Skip to main content
Version: V1.0.0

Subqueries

A subquery is a query nested within another query. The outer query is referred to as the parent or outer query. The result of the subquery is passed back to the parent or outer query. The parent query then uses this value in its computation to determine the final output.

SQL allows for multiple levels of nested queries, meaning that a subquery can itself contain other subqueries. Additionally, subqueries can appear in various clauses of an SQL statement, such as SELECT, FROM, and WHERE.

Subqueries

In a database, subqueries can be categorized into correlated and noncorrelated subqueries. A correlated subquery is one whose execution depends on variables from the outer query. Therefore, a correlated subquery is typically evaluated multiple times. A noncorrelated subquery is one whose execution does not depend on variables from the outer query. A noncorrelated subquery is usually evaluated only once. The following examples show a noncorrelated subquery and a correlated subquery.

  • Noncorrelated subquery:

    SELECT * FROM t1 WHERE t1.a IN (SELECT t2.a FROM t2);
  • Correlated subquery that uses the outer query variable t1.b:

    SELECT * FROM t1 WHERE t1.a IN (SELECT t2.a FROM T2 WHERE t2.b = t1.b);

Scalar subqueries

A scalar subquery is a subquery that returns a single element. The result of a scalar subquery can be used as a scalar operand and has the basic characteristics of a scalar operand, such as data type, length, and whether it can be NULL. For example, the result of the following query can be used as a scalar operand:

CREATE TABLE t1 (c1 INTEGER, c2 VARCHAR(20) NOT NULL);
INSERT INTO t1 VALUES (1, 'OceanBase');
SELECT (SELECT c2 FROM t1);
+---------------------+
| (SELECT c2 FROM t1) |
+---------------------+
| OceanBase |
+---------------------+
1 row in set (0.001 sec)

The preceding query returns a value of the VARCHAR type with a length of 20, OceanBase, which is in the default character set and collation. The nullability of the value selected by a scalar subquery is not copied. Although the c2 column has a NOT NULL constraint, if the result of the subquery is NULL, the result will still be NULL.

Although a scalar subquery returns a single element, it cannot be used in all statements. If a statement only allows literals, a scalar subquery cannot be used to replace it.

The following example helps you better understand scalar subqueries.

The query returns 2 because the t2 table contains a column named c1 that is the same as the c1 column in the t1 table.

CREATE TABLE t1 (c1 INT);
INSERT INTO t1 VALUES (1);
CREATE TABLE t2 (c1 INT);
INSERT INTO t2 VALUES (2);
SELECT (SELECT c1 FROM t2) FROM t1;
+---------------------+
| (SELECT s1 FROM t2) |
+---------------------+
| 2 |
+---------------------+
1 row in set (0.001 sec)

In addition, a scalar subquery can be part of an expression, for example:

CREATE TABLE t1 (c1 INTEGER, c2 VARCHAR(20) NOT NULL);
INSERT INTO t1 VALUES (1, 'OceanBase');
SELECT HEX((SELECT c2 FROM t1 WHERE c1=1)) FROM t1;
+-------------------------------------+
| HEX((SELECT c2 FROM t1 WHERE c1=1)) |
+-------------------------------------+
| 4F6365616E42617365 |
+-------------------------------------+
1 row in set (0.001 sec)

Comparisons with subqueries

The following is a common syntax of a subquery:

operand operator {ANY | SOME | ALL} (subquery)
operand LIKE (subquery)
operand {IN | NOT IN | EXISTS | NOT EXISTS} (subquery)

In this syntax, operand is the operand used to compare with the result of the subquery. operator can be one of the following operators:

  • Equal (= or <=>)

  • Greater than (>)

  • Less than (<)

  • Greater than or equal to (>=)

  • Less than or equal to (<=)

  • Not equal (!= or <>)

  • LIKE for character or string comparisons.

The following subquery is used to query the values in the c1 column of the t1 table that are the same as the maximum value in the c2 column of the t2 table. This query cannot be rewritten as a join between the t1 and t2 tables.

CREATE TABLE t1 (c1 INT);
INSERT INTO t1 VALUES (1);
CREATE TABLE t2 (c2 INT);
INSERT INTO t2 VALUES (2);
SELECT * FROM t1 WHERE c1 = (SELECT MAX(c2) FROM t2);
tip

When comparing with a scalar, the subquery must return a scalar. When comparing with a row constructor, the subquery must be a row subquery and return a row with the same properties as the row constructor.

Clearly, only the result of a scalar subquery can be directly compared. If a subquery returns a set of values, the set must be processed. seekdb provides the following methods to convert a set into a scalar value:

  • Aggregation functions: Aggregate all values in the set into a single value and compare it with the operand.

  • ANY | SOME: Compare all values in the set with the operand. If any row satisfies the comparison condition, TRUE is returned. SOME is an alias for ANY.

  • ALL: Compare all values in the set with the operand. If all rows satisfy the comparison condition, TRUE is returned.

  • IN: Return TRUE if the operand is in the set. IN is equivalent to = ANY.

  • NOT IN: Return TRUE if the operand is not in the set. NOT IN is equivalent to <> ALL.

  • EXIST: Return TRUE if the subquery returns at least one row.

  • NOT EXIST: Return TRUE if the subquery returns no rows.

Row subqueries

A scalar subquery or a column subquery returns a scalar or a single column value, while a row subquery returns a single row, which can contain multiple column values.

The following subquery returns only one row:

CREATE TABLE t1(id INTEGER PRIMARY KEY,c1 INTEGER,c2 INTEGER);
CREATE TABLE t2(id INTEGER PRIMARY KEY,c3 INTEGER,c4 INTEGER);
SELECT * FROM t1
WHERE ROW(c1,c2) = (SELECT c3, c4 FROM t2 WHERE id = 10);

If the values of c3 and c4 match any pair of values in the c1 and c2 columns of the t1 table, all matching pairs of values in the c1 and c2 columns are returned. If no match is found, an empty set is returned. Additionally, if no WHERE condition is specified in the subquery, it may return multiple rows, which will cause an error. A row subquery can return only one row.

The expression ROW(c1,c2) can be abbreviated as (c1,c2). ROW() is called a row constructor. The row constructor and the row returned by the subquery must contain the same number of values. A row constructor is used to compare with a subquery that returns two or more columns. A row constructor cannot be used with a column subquery that returns a single column. The following query will return a syntax error:

SELECT * FROM t1 WHERE ROW(1) = (SELECT column1 FROM t2);
ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your OceanBase version for the right syntax to use near ') = (SELECT c1 FROM t2)' at line 1

In the optimizer, a row constructor is expanded. The following two SQL statements are equivalent:

SELECT * FROM t1 WHERE (c1,c2) = (1,1);
SELECT * FROM t1 WHERE c1 = 1 AND c2 = 1;

Correlated subqueries

A correlated subquery is one that references columns from the outer query. For example, the following subquery references the c2 column of the t1 table in its WHERE condition:

SELECT * FROM t1 WHERE c1 IN (SELECT c3 FROM t2 WHERE t2.c4 = t1.c2);

For such statements, the query is evaluated from the innermost subquery to the outermost query. For example, in the following statement, x.c2 must be a column in the t2 table, not the t1 table, because the SELECT c1 FROM t2 AS x statement first renames the t2 table, while SELECT c1 FROM t1 is the outer query.

SELECT c1 FROM t1 AS x
WHERE x.c1 = (SELECT c1 FROM t2 AS x
WHERE x.c1 = (SELECT c1 FROM t3
WHERE x.c2 = t3.c1));

For subqueries in HAVING or ORDER BY clauses, seekdb will also look for column names in the outer SELECT list.

If an aggregate function in a correlated subquery contains an external reference, the function can only contain external references and cannot be contained within another function or expression.

Derived tables

A derived table is an expression that generates a table within the FROM clause of a query. For example, the query in the FROM clause of a SELECT statement can be a derived table:

SELECT ... FROM (subquery) [AS] table_name ...

As a derived table in the FROM clause, [AS] table_name is no longer an optional alias. Instead, it must be the name specified for the result set of the subquery. All columns of a derived table must have unique names.

Derived tables are often used for step-by-step calculations. For example, the following query calculates the average value for each group after grouping:

SELECT AVG(sum_c1)
FROM (SELECT SUM(c1) AS sum_cumn1
FROM t1 GROUP BY c1) AS t1;

A derived table can return a scalar, a column, a row, or a table.

Derived tables have the following limitations:

  • A derived table cannot be a correlated subquery.

  • A derived table cannot reference other tables in the same SELECT statement.

  • A derived table cannot contain external references.

Lateral derived tables

A lateral derived table is a feature that allows a subquery in the FROM clause to reference previously defined tables or derived tables in the FROM list.

Lateral derived tables provide more flexibility and efficiency when using subqueries, especially when the subquery depends on columns from the outer query.

Here is an example:

SELECT *
FROM A, LATERAL (SELECT * FROM B WHERE B.col1 = A.col1) AS derived_table;

For more information about lateral derived tables, see Lateral derived tables.

Subquery unnesting

Subquery unnesting is a database optimization strategy that moves some subqueries into the parent query. In essence, it converts certain subqueries into equivalent multi-table join operations. One clear benefit of this strategy is that it can leverage certain access paths, join methods, and join orders more effectively, reducing the number of query levels. Here's an example of subquery unnesting, where a subquery is rewritten as a join statement.

CREATE TABLE t1(a INT PRIMARY KEY, b INT, c INT);
Query OK, 0 rows affected

CREATE TABLE t2(a INT PRIMARY KEY, b INT, c INT);
Query OK, 0 rows affected
  • A subquery with no dependencies.

    EXPLAIN SELECT * FROM t1 WHERE t1.a IN (SELECT t2.a FROM t2);
    | ======================================
    |ID|OPERATOR |NAME|EST. ROWS|COST |
    --------------------------------------
    |0 |MERGE JOIN | |100001 |131664|
    |1 | TABLE SCAN|t1 |100000 |48372 |
    |2 | TABLE SCAN|t2 |100000 |41911 |
    ======================================
    ...
  • A subquery with dependencies is unnested and rewritten as a join.

    EXPLAIN SELECT * FROM t1 WHERE T1.A IN (SELECT T2.B FROM T2 WHERE T2.C = T1.C);
    | ===========================================
    |ID|OPERATOR |NAME |EST. ROWS|COST |
    --------------------------------------------
    |0 |MERGE JOIN | |9703 |215436|
    |1 | TABLE SCAN |t1 |100000 |64066 |
    |2 | SORT | |10001 |129621|
    |3 | SUBPLAN SCAN |VIEW1|10001 |111242|
    |4 | HASH DISTINCT| |10001 |109862|
    |5 | TABLE SCAN |t2 |100000 |64066 |
    ============================================
    ...