Lateral Derived Tables
This topic introduces the concept of lateral derived tables, their limitations, and provides examples.
Lateral Derived Tables
A derived table is a subquery used in the FROM clause. The result set of the subquery is temporarily treated as a table in the outer query. Derived tables are commonly used for grouping and aggregating data or creating a dataset that meets specific conditions for further use in the main query.
A lateral derived table is a special type of derived table that allows the use of the LATERAL keyword to specify a derived table that can reference columns from other tables or derived tables defined earlier in the same FROM clause. This means that the subquery within the lateral derived table can depend on tables defined in the same FROM clause, allowing access to their column values.
The main feature of lateral derived tables is their ability to reference columns from tables defined before them in the FROM clause, which regular derived tables cannot do.
Limitations of LATERAL
- The
LATERALkeyword must be used in theFROMclause. It can appear in a comma-separated list of tables or in a join expression such asJOIN,INNER JOIN,CROSS JOIN,LEFT [OUTER] JOIN, orRIGHT [OUTER] JOIN. - When
LATERALis used in aJOINstatement and references fields from the left table, the allowed join types areINNER JOIN,CROSS JOIN, andLEFT [OUTER] JOIN. - When
LATERALis used in aJOINstatement and references fields from the right table, the allowed join types areINNER JOIN,CROSS JOIN, andRIGHT [OUTER] JOIN. - When using an aggregate function in a lateral derived table, the aggregate function cannot directly depend on the outer query that contains the lateral derived table. In other words, the aggregate function within the lateral derived table cannot directly reference the results of the outer query.
Using the LATERAL Keyword
SELECT select_list
FROM table_name1,
LATERAL (SELECT select_list
FROM table_name2
WHERE table_name2.col_name = table_name1.col_name) AS lateral_derived_table_name
[...];
Parameters:
| Parameter | Description |
|---|---|
| select_list | A list of columns to retrieve, which can include column names, expressions, and aggregate functions. Multiple columns can be separated by commas. |
| table_name1 | The primary table to query. |
| LATERAL | Specifies a lateral derived table. |
| table_name2 | The secondary table referenced in the LATERAL subquery, providing additional information for each row of table_name1. |
| lateral_derived_table_name | An alias for the secondary table, used for subsequent queries. |
| [...] | Optional query clauses such as WHERE. |
Examples
Create test tables and add test data
-
Create the
studentstable.CREATE TABLE students (
id INT AUTO_INCREMENT PRIMARY KEY,
name VARCHAR(50) NOT NULL,
age INT
); -
Insert three rows of data into the
studentstable.INSERT INTO students (name, age)
VALUES ('name1', 20),
('name2', 22),
('name3', 21); -
Create the
scorestable.CREATE TABLE scores (
id INT AUTO_INCREMENT PRIMARY KEY,
student_id INT,
subject VARCHAR(50) NOT NULL,
score DECIMAL(5, 2),
FOREIGN KEY (student_id) REFERENCES students(id)
); -
Insert nine rows of data into the
scorestable.INSERT INTO scores (student_id, subject, score)
VALUES (1, 'A', 86.5),
(1, 'B', 90.0),
(1, 'C', 91.5),
(2, 'A', 86.0),
(2, 'B', 92.0),
(2, 'C', 89.5),
(3, 'A', 93.0),
(3, 'B', 92.5),
(3, 'C', 85.0);
Query test data
Retrieve the name, average score, and highest score for each student from the students and scores tables.
Q1: Use multiple subqueries to retrieve the required data. Perform two independent grouping and aggregation operations on the scores table, scanning it twice to calculate the average and highest scores. Then, use the WHERE clause to join the results with the students table.
SELECT st.name, sc.avg_score, scs.max_score
FROM students st,
(SELECT student_id, AVG(score) avg_score
FROM scores
GROUP BY student_id) sc,
(SELECT student_id, MAX(score) max_score
FROM scores
GROUP BY student_id) scs
WHERE sc.student_id = st.id
AND scs.student_id = st.id;
The result is as follows:
+-------+-----------+-----------+
| name | avg_score | max_score |
+-------+-----------+-----------+
| name1 | 89.333333 | 91.50 |
| name2 | 89.166667 | 92.00 |
| name3 | 90.166667 | 93.00 |
+-------+-----------+-----------+
3 rows in set
Q2: Use the LATERAL keyword to retrieve the required data. Use the LATERAL keyword (lateral derived table) to calculate the average and highest scores for each student in a single SQL statement. The scores table is scanned only once for each student ID.
SELECT st.name, ld_tbl.avg_score, ld_tbl.max_score
FROM students st,
LATERAL (SELECT AVG(score) avg_score, MAX(score) max_score
FROM scores sc
WHERE sc.student_id = st.id) ld_tbl;
The result is as follows:
+-------+-----------+-----------+
| name | avg_score | max_score |
+-------+-----------+-----------+
| name1 | 89.333333 | 91.50 |
| name2 | 89.166667 | 92.00 |
| name3 | 90.166667 | 93.00 |
+-------+-----------+-----------+
3 rows in set
Compared to Q1, Q2 may offer better performance when handling large datasets because it avoids scanning the scores table multiple times. Q2 is also more concise and easier to understand and maintain.