Common Table Expressions
A common table expression (CTE) is a named, temporary result set that is valid only for the current statement. It is not stored as an object and is used only during query execution. Unlike derived tables, CTEs can be self-referencing and can be referenced multiple times within the same query.
Scenarios:
- CTEs can be used to reuse the same subquery in multiple places, avoiding the need to write the same logic repeatedly.
- CTEs can simplify recursive queries, such as when searching for hierarchical data.
- CTEs can break down complex queries into smaller, more manageable parts, making the query logic clearer and easier to understand.
seekdb supports both non-recursive and recursive CTEs.
CTE syntax
Common table expressions are optional parts of DML statement syntax and are defined using the WITH clause. If the WITH clause contains multiple clauses, they can be separated by commas. Each clause provides a subquery to generate a result set and associates the subquery with a name. The syntax is as follows:
WITH [RECURSIVE]
cte_name [(column_name [, column_name] ...)] AS (subquery)
[, cte_name [(column_name [, column_name] ...)] AS (subquery)] ...
Parameter description
| Parameter | Description |
|---|---|
[RECURSIVE] | An optional keyword used to specify whether to create a recursive CTE.
|
cte_name | The name of the CTE, which can be referenced by a table that contains the WITH clause. |
column_name | The list of selected column names used to specify aliases for columns in the CTE. This allows you to use more readable column names in the main query. |
AS(subquery) | The CTE subquery used to generate the CTE result set. The AS keyword must be followed by parentheses. |
If a list of names in parentheses follows the CTE name, these names are column names, and the number must match the number of columns in the SELECT statement of the CTE. If no column names are specified, the column names will come from the AS(subquery) part, specifically from the first SELECT list.
Scenarios for using the WITH clause
The WITH clause can be used in the following scenarios:
-
At the beginning of a
SELECTstatement.WITH ... SELECT ... -
At the beginning of a subquery, including derived table subqueries.
SELECT ... WHERE id IN (WITH ... SELECT ...) ...
SELECT * FROM (WITH ... SELECT ...) AS dt ... -
Immediately before a statement that contains a
SELECTstatement.INSERT ... WITH ... SELECT ...
REPLACE ... WITH ... SELECT ...
CREATE TABLE ... WITH ... SELECT ...
CREATE VIEW ... WITH ... SELECT ...
Only one WITH clause is allowed at the same level. If the WITH clause contains multiple clauses, they can be separated by commas.
WITH cte1 AS (...), cte2 AS (...) SELECT ...
The WITH clause can define one or more common table expressions, but each CTE name must be unique within the clause. The following example is invalid:
WITH cte1 AS (...), cte1 AS (...) SELECT ...
Structure of a recursive CTE
A recursive CTE has the following structure:
-
If a CTE in the
WITHclause references itself, theWITHclause must start withWITH RECURSIVE. Otherwise, theRECURSIVEkeyword is not required. -
A recursive CTE subquery consists of two parts separated by
UNION [ALL | DISTINCT]:info-
UNIONorUNION DISTINCT: Merges the result sets of two or moreSELECTstatements into a single set and removes duplicate rows. -
UNION ALL: Merges the result sets of two or moreSELECTstatements into a single set without removing duplicate rows.
SELECT ... -- Returns the initial row set
UNION ALL
SELECT ... -- Returns additional rowsThe first
SELECTgenerates one or more initial rows for the CTE and does not reference the CTE name. The secondSELECTgenerates additional rows through recursive operations by referencing the CTE name in itsFROMclause. The recursion stops when this part no longer generates new rows. Therefore, a recursive CTE consists of a non-recursiveSELECTpart and a recursiveSELECTpart. EachSELECTpart can be a union of multipleSELECTstatements. -
-
The data types of the result columns of the CTE are inferred only from the columns of the non-recursive
SELECTpart, and all columns can be nullable. The data types of the recursiveSELECTpart are ignored. -
Each iteration of the recursive part operates only on the rows generated in the previous iteration. If the recursive part contains multiple query blocks, each query block's iterations are scheduled in an unspecified order, and each query block operates on the rows generated in its previous iteration or by other query blocks since the last iteration.
Here is an example:
WITH RECURSIVE cte1 (n) AS
(
SELECT 1 /*Non-recursive part, which retrieves a single row to generate the initial row set*/
UNION ALL
SELECT n + 2 FROM cte1 WHERE n < 10 /*Recursive part, generates a new value that is 2 greater than the n value in the previous row set until n is no longer less than 10*/
)
SELECT * FROM cte1;
Limitations
The following syntax constraints apply to recursive CTE subqueries:
-
The recursive
SELECTpart cannot contain the following structures:-
Aggregate functions, such as
SUM() -
Window functions
-
GROUP BY -
ORDER BY -
DISTINCT
-
-
The recursive
SELECTpart must reference a subquery in itsFROMclause only once. It can reference tables other than the CTE and join with the CTE. If such a join is used, the CTE cannot be on the right side of aLEFT JOIN.
For recursive CTEs, the cost estimates displayed by EXPLAIN represent the cost of each iteration, which may be significantly different from the total cost. The optimizer cannot predict the number of iterations because it cannot predict when the WHERE clause will become false.
- The use of the
LIMITstructure is not supported between the recursiveSELECTand the non-recursiveSELECT.
Example
Here is a simple example that creates a student table with the following columns: student ID, name, and mentor ID. This example demonstrates the difference between recursive and non-recursive CTEs.
First, we create the student table and insert some data:
CREATE TABLE student (
student_id INT PRIMARY KEY,
name VARCHAR(100),
mentor_id INT,
FOREIGN KEY (mentor_id) REFERENCES student(student_id)
);
Query OK, 0 rows affected (0.135 sec)
INSERT INTO student (student_id, name, mentor_id) VALUES
(1, 'Alice', NULL),
(2, 'Bob', 1),
(3, 'Charlie', 1),
(4, 'David', 2),
(5, 'Eve', 3);
Query OK, 5 rows affected (0.002 sec)
Records: 5 Duplicates: 0 Warnings: 0
In this data model, Alice is the top-level student and has no mentor. Bob and Charlie are Alice's students, David is Bob's student, and Eve is Charlie's student.
Non-recursive CTE example
A non-recursive CTE does not reference itself. For example, if we want to select all of Alice's direct students (i.e., the first-level students), we can use a non-recursive CTE like this:
WITH Alice_Students AS (
SELECT * FROM student WHERE mentor_id = 1
)
SELECT * FROM Alice_Students;
The execution result is:
+------------+---------+-----------+
| student_id | name | mentor_id |
+------------+---------+-----------+
| 2 | Bob | 1 |
| 3 | Charlie | 1 |
+------------+---------+-----------+
2 rows in set (0.003 sec)
Recursive CTE example
Now, if we want to find all of Alice's direct and indirect students (i.e., all students at every level), we need to use a recursive CTE. A recursive CTE references itself to query down each level.
WITH RECURSIVE Student_Hierarchy AS (
-- Anchor member: Select Alice as the starting point
SELECT student_id, name, mentor_id FROM student WHERE mentor_id IS NULL
UNION ALL
-- Recursive member: Select direct students based on the ID of the previous-level student
SELECT s.student_id, s.name, s.mentor_id
FROM student s
INNER JOIN Student_Hierarchy sh ON s.mentor_id = sh.student_id
)
SELECT * FROM Student_Hierarchy;
This recursive CTE first selects Alice as the starting point and then recursively selects each student's direct students until there are no more students.
The execution result is:
+------------+---------+-----------+
| student_id | name | mentor_id |
+------------+---------+-----------+
| 1 | Alice | NULL |
| 3 | Charlie | 1 |
| 2 | Bob | 1 |
| 5 | Eve | 3 |
| 4 | David | 2 |
+------------+---------+-----------+
5 rows in set (0.008 sec)
In this example, the recursive CTE allows us to traverse the student hierarchy, while the non-recursive CTE only selects students at a fixed level.