Skip to main content

Common Table Expressions

A common table expression (CTE) is a named, temporary result set that is valid only for the current statement. It is not stored as an object and is used only during query execution. Unlike derived tables, CTEs can be self-referencing and can be referenced multiple times within the same query.

Scenarios:

  • CTEs can be used to reuse the same subquery in multiple places, avoiding the need to write the same logic repeatedly.
  • CTEs can simplify recursive queries, such as when searching for hierarchical data.
  • CTEs can break down complex queries into smaller, more manageable parts, making the query logic clearer and easier to understand.

seekdb supports both non-recursive and recursive CTEs.

CTE syntax

Common table expressions are optional parts of DML statement syntax and are defined using the WITH clause. If the WITH clause contains multiple clauses, they can be separated by commas. Each clause provides a subquery to generate a result set and associates the subquery with a name. The syntax is as follows:

 WITH [RECURSIVE]
cte_name [(column_name [, column_name] ...)] AS (subquery)
[, cte_name [(column_name [, column_name] ...)] AS (subquery)] ...

Parameter description

ParameterDescription
[RECURSIVE]An optional keyword used to specify whether to create a recursive CTE.
  • Recursive CTE: Specify the RECURSIVE keyword. A recursive CTE is an expression that references itself within the CTE. It is used to process data with recursive structures, such as trees and graphs. A recursive CTE includes a base query (starting condition) in its definition and performs recursive operations on the results of the base query until the stopping condition is met.
  • Non-recursive CTE: Do not specify the RECURSIVE keyword. A non-recursive CTE is an expression that does not reference itself within the CTE. It is used to build temporary result sets and does not involve recursive operations.
cte_nameThe name of the CTE, which can be referenced by a table that contains the WITH clause.
column_nameThe list of selected column names used to specify aliases for columns in the CTE. This allows you to use more readable column names in the main query.
AS(subquery)The CTE subquery used to generate the CTE result set. The AS keyword must be followed by parentheses.

If a list of names in parentheses follows the CTE name, these names are column names, and the number must match the number of columns in the SELECT statement of the CTE. If no column names are specified, the column names will come from the AS(subquery) part, specifically from the first SELECT list.

Scenarios for using the WITH clause

The WITH clause can be used in the following scenarios:

  • At the beginning of a SELECT statement.

    WITH ... SELECT ...
  • At the beginning of a subquery, including derived table subqueries.

    SELECT ... WHERE id IN (WITH ... SELECT ...) ...
    SELECT * FROM (WITH ... SELECT ...) AS dt ...
  • Immediately before a statement that contains a SELECT statement.

    INSERT ... WITH ... SELECT ...
    REPLACE ... WITH ... SELECT ...
    CREATE TABLE ... WITH ... SELECT ...
    CREATE VIEW ... WITH ... SELECT ...

Only one WITH clause is allowed at the same level. If the WITH clause contains multiple clauses, they can be separated by commas.

WITH cte1 AS (...), cte2 AS (...) SELECT ...

The WITH clause can define one or more common table expressions, but each CTE name must be unique within the clause. The following example is invalid:

WITH cte1 AS (...), cte1 AS (...) SELECT ...

Structure of a recursive CTE

A recursive CTE has the following structure:

  • If a CTE in the WITH clause references itself, the WITH clause must start with WITH RECURSIVE. Otherwise, the RECURSIVE keyword is not required.

  • A recursive CTE subquery consists of two parts separated by UNION [ALL | DISTINCT]:

    info
    • UNION or UNION DISTINCT: Merges the result sets of two or more SELECT statements into a single set and removes duplicate rows.

    • UNION ALL: Merges the result sets of two or more SELECT statements into a single set without removing duplicate rows.

    SELECT ...      -- Returns the initial row set
    UNION ALL
    SELECT ... -- Returns additional rows

    The first SELECT generates one or more initial rows for the CTE and does not reference the CTE name. The second SELECT generates additional rows through recursive operations by referencing the CTE name in its FROM clause. The recursion stops when this part no longer generates new rows. Therefore, a recursive CTE consists of a non-recursive SELECT part and a recursive SELECT part. Each SELECT part can be a union of multiple SELECT statements.

  • The data types of the result columns of the CTE are inferred only from the columns of the non-recursive SELECT part, and all columns can be nullable. The data types of the recursive SELECT part are ignored.

  • Each iteration of the recursive part operates only on the rows generated in the previous iteration. If the recursive part contains multiple query blocks, each query block's iterations are scheduled in an unspecified order, and each query block operates on the rows generated in its previous iteration or by other query blocks since the last iteration.

Here is an example:

WITH RECURSIVE cte1 (n) AS
(
SELECT 1 /*Non-recursive part, which retrieves a single row to generate the initial row set*/
UNION ALL
SELECT n + 2 FROM cte1 WHERE n < 10 /*Recursive part, generates a new value that is 2 greater than the n value in the previous row set until n is no longer less than 10*/
)
SELECT * FROM cte1;

Limitations

The following syntax constraints apply to recursive CTE subqueries:

  • The recursive SELECT part cannot contain the following structures:

    • Aggregate functions, such as SUM()

    • Window functions

    • GROUP BY

    • ORDER BY

    • DISTINCT

  • The recursive SELECT part must reference a subquery in its FROM clause only once. It can reference tables other than the CTE and join with the CTE. If such a join is used, the CTE cannot be on the right side of a LEFT JOIN.

For recursive CTEs, the cost estimates displayed by EXPLAIN represent the cost of each iteration, which may be significantly different from the total cost. The optimizer cannot predict the number of iterations because it cannot predict when the WHERE clause will become false.

  • The use of the LIMIT structure is not supported between the recursive SELECT and the non-recursive SELECT.

Example

Here is a simple example that creates a student table with the following columns: student ID, name, and mentor ID. This example demonstrates the difference between recursive and non-recursive CTEs.

First, we create the student table and insert some data:

CREATE TABLE student (
student_id INT PRIMARY KEY,
name VARCHAR(100),
mentor_id INT,
FOREIGN KEY (mentor_id) REFERENCES student(student_id)
);
Query OK, 0 rows affected (0.135 sec)

INSERT INTO student (student_id, name, mentor_id) VALUES
(1, 'Alice', NULL),
(2, 'Bob', 1),
(3, 'Charlie', 1),
(4, 'David', 2),
(5, 'Eve', 3);
Query OK, 5 rows affected (0.002 sec)
Records: 5 Duplicates: 0 Warnings: 0

In this data model, Alice is the top-level student and has no mentor. Bob and Charlie are Alice's students, David is Bob's student, and Eve is Charlie's student.

Non-recursive CTE example

A non-recursive CTE does not reference itself. For example, if we want to select all of Alice's direct students (i.e., the first-level students), we can use a non-recursive CTE like this:

WITH Alice_Students AS (
SELECT * FROM student WHERE mentor_id = 1
)
SELECT * FROM Alice_Students;

The execution result is:

+------------+---------+-----------+
| student_id | name | mentor_id |
+------------+---------+-----------+
| 2 | Bob | 1 |
| 3 | Charlie | 1 |
+------------+---------+-----------+
2 rows in set (0.003 sec)

Recursive CTE example

Now, if we want to find all of Alice's direct and indirect students (i.e., all students at every level), we need to use a recursive CTE. A recursive CTE references itself to query down each level.

WITH RECURSIVE Student_Hierarchy AS (
-- Anchor member: Select Alice as the starting point
SELECT student_id, name, mentor_id FROM student WHERE mentor_id IS NULL
UNION ALL
-- Recursive member: Select direct students based on the ID of the previous-level student
SELECT s.student_id, s.name, s.mentor_id
FROM student s
INNER JOIN Student_Hierarchy sh ON s.mentor_id = sh.student_id
)
SELECT * FROM Student_Hierarchy;

This recursive CTE first selects Alice as the starting point and then recursively selects each student's direct students until there are no more students.

The execution result is:

+------------+---------+-----------+
| student_id | name | mentor_id |
+------------+---------+-----------+
| 1 | Alice | NULL |
| 3 | Charlie | 1 |
| 2 | Bob | 1 |
| 5 | Eve | 3 |
| 4 | David | 2 |
+------------+---------+-----------+
5 rows in set (0.008 sec)

In this example, the recursive CTE allows us to traverse the student hierarchy, while the non-recursive CTE only selects students at a fixed level.