Skip to main content

Lateral Derived Tables

This topic introduces the concept of lateral derived tables, their limitations, and provides examples.

Lateral Derived Tables

A derived table is a subquery used in the FROM clause. The result set of the subquery is temporarily treated as a table in the outer query. Derived tables are commonly used for grouping and aggregating data or creating a dataset that meets specific conditions for further use in the main query.

A lateral derived table is a special type of derived table that allows the use of the LATERAL keyword to specify a derived table that can reference columns from other tables or derived tables defined earlier in the same FROM clause. This means that the subquery within the lateral derived table can depend on tables defined in the same FROM clause, allowing access to their column values.

The main feature of lateral derived tables is their ability to reference columns from tables defined before them in the FROM clause, which regular derived tables cannot do.

Limitations of LATERAL

  • The LATERAL keyword must be used in the FROM clause. It can appear in a comma-separated list of tables or in a join expression such as JOIN, INNER JOIN, CROSS JOIN, LEFT [OUTER] JOIN, or RIGHT [OUTER] JOIN.
  • When LATERAL is used in a JOIN statement and references fields from the left table, the allowed join types are INNER JOIN, CROSS JOIN, and LEFT [OUTER] JOIN.
  • When LATERAL is used in a JOIN statement and references fields from the right table, the allowed join types are INNER JOIN, CROSS JOIN, and RIGHT [OUTER] JOIN.
  • When using an aggregate function in a lateral derived table, the aggregate function cannot directly depend on the outer query that contains the lateral derived table. In other words, the aggregate function within the lateral derived table cannot directly reference the results of the outer query.

Using the LATERAL Keyword

SELECT select_list
FROM table_name1,
LATERAL (SELECT select_list
FROM table_name2
WHERE table_name2.col_name = table_name1.col_name) AS lateral_derived_table_name
[...];

Parameters:

ParameterDescription
select_listA list of columns to retrieve, which can include column names, expressions, and aggregate functions. Multiple columns can be separated by commas.
table_name1The primary table to query.
LATERALSpecifies a lateral derived table.
table_name2The secondary table referenced in the LATERAL subquery, providing additional information for each row of table_name1.
lateral_derived_table_nameAn alias for the secondary table, used for subsequent queries.
[...]Optional query clauses such as WHERE.

Examples

Create test tables and add test data

  1. Create the students table.

    CREATE TABLE students (
    id INT AUTO_INCREMENT PRIMARY KEY,
    name VARCHAR(50) NOT NULL,
    age INT
    );
  2. Insert three rows of data into the students table.

    INSERT INTO students (name, age) 
    VALUES ('name1', 20),
    ('name2', 22),
    ('name3', 21);
  3. Create the scores table.

    CREATE TABLE scores (
    id INT AUTO_INCREMENT PRIMARY KEY,
    student_id INT,
    subject VARCHAR(50) NOT NULL,
    score DECIMAL(5, 2),
    FOREIGN KEY (student_id) REFERENCES students(id)
    );
  4. Insert nine rows of data into the scores table.

    INSERT INTO scores (student_id, subject, score)
    VALUES (1, 'A', 86.5),
    (1, 'B', 90.0),
    (1, 'C', 91.5),
    (2, 'A', 86.0),
    (2, 'B', 92.0),
    (2, 'C', 89.5),
    (3, 'A', 93.0),
    (3, 'B', 92.5),
    (3, 'C', 85.0);

Query test data

Retrieve the name, average score, and highest score for each student from the students and scores tables.

Q1: Use multiple subqueries to retrieve the required data. Perform two independent grouping and aggregation operations on the scores table, scanning it twice to calculate the average and highest scores. Then, use the WHERE clause to join the results with the students table.

SELECT st.name, sc.avg_score, scs.max_score
FROM students st,
(SELECT student_id, AVG(score) avg_score
FROM scores
GROUP BY student_id) sc,
(SELECT student_id, MAX(score) max_score
FROM scores
GROUP BY student_id) scs
WHERE sc.student_id = st.id
AND scs.student_id = st.id;

The result is as follows:

+-------+-----------+-----------+
| name | avg_score | max_score |
+-------+-----------+-----------+
| name1 | 89.333333 | 91.50 |
| name2 | 89.166667 | 92.00 |
| name3 | 90.166667 | 93.00 |
+-------+-----------+-----------+
3 rows in set

Q2: Use the LATERAL keyword to retrieve the required data. Use the LATERAL keyword (lateral derived table) to calculate the average and highest scores for each student in a single SQL statement. The scores table is scanned only once for each student ID.

SELECT st.name, ld_tbl.avg_score, ld_tbl.max_score
FROM students st,
LATERAL (SELECT AVG(score) avg_score, MAX(score) max_score
FROM scores sc
WHERE sc.student_id = st.id) ld_tbl;

The result is as follows:

+-------+-----------+-----------+
| name | avg_score | max_score |
+-------+-----------+-----------+
| name1 | 89.333333 | 91.50 |
| name2 | 89.166667 | 92.00 |
| name3 | 90.166667 | 93.00 |
+-------+-----------+-----------+
3 rows in set

Compared to Q1, Q2 may offer better performance when handling large datasets because it avoids scanning the scores table multiple times. Q2 is also more concise and easier to understand and maintain.

References