Skip to main content

Connection

Connection statements in a database are used to combine two or more tables in the database based on the connection conditions. The set generated by the "connection" operation can be saved as a table or used as a table.

The meaning of a connection statement is to combine the attributes of two tables based on their values. The types of connections in a database generally include inner join, outer join, semi join, and anti join. Semi joins and anti joins are rewritten using subqueries, and SQL does not have syntax for expressing semi joins and anti joins.

Connection conditions

Connection conditions can be divided into equi-joins (e.g., t1.a = t2.b) and non-equi-joins (e.g., t1.a < t2.b). Compared to non-equi-join conditions, equi-join conditions allow the use of efficient join algorithms in the database, such as hash join and merge-sort join.

Self-Join

A self-join is a join operation in which a table is joined with itself. The following example shows a self-join.

CREATE TABLE t1(a INT PRIMARY KEY, b INT, c INT);
Query OK, 0 rows affected

SELECT * FROM t1 AS ta, t1 AS tb WHERE ta.b = tb.b;

Inner-Join

An inner join is the most basic type of join operation in a database. An inner join combines the columns of two tables (such as A and B) based on the join conditions, generating a new result table. The query compares each row in table A with each row in table B and finds the combinations that satisfy the join conditions. When the join conditions are met, the matching rows from tables A and B are combined into a single row in the result set. The result set generated by the join can be defined as follows: first, perform a Cartesian product (cross join) of the two tables, combining each row in table A with each row in table B, and then return the records that satisfy the join conditions.

CREATE TABLE t1(c1 INT,c2 INT);
Query OK, 0 rows affected
CREATE TABLE t2(c1 INT,c2 INT);
Query OK, 0 rows affected

SELECT * FROM t1 JOIN t2 USING(c1);

Outer-Join

An outer join does not require that every record in one table has a matching record in the other table. The table that retains all records, including those without matches, is called the preserving table.

Outer joins can be further divided into left outer joins, right outer joins, and full outer joins based on whether the rows of the left table, right table, or both tables are retained.

  • In a left outer join, if a row in the left table does not have a matching row in the right table, NULL is automatically filled in the right table.

  • In a right outer join, if a row in the right table does not have a matching row in the left table, NULL is automatically filled in the left table.

  • In a full outer join, if a row in the left table or the right table does not have a matching row, NULL is automatically filled.

CREATE TABLE t1(c1 INT,c2 INT);
Query OK, 0 rows affected
CREATE TABLE t2(c1 INT,c2 INT);
Query OK, 0 rows affected

SELECT * FROM t1 LEFT JOIN t2 ON t1.c1 = t2.c1;
SELECT * FROM t1 RIGHT JOIN t2 ON t1.c1 = t2.c1;
SELECT * FROM t1 FULL JOIN t2 ON t1.c1 = t2.c1;

Semi-Join

When performing a left or right semi-join between tables A and B, it returns all rows from table A that have a match in table B or all rows from table B that have a match in table A.

Semi joins can only be rewritten using subqueries, as shown below.

CREATE TABLE t1(a INT PRIMARY KEY, b INT, c INT);
Query OK, 0 rows affected

CREATE TABLE t2(a INT PRIMARY KEY, b INT, c INT);
Query OK, 0 rows affected

INSERT INTO t1 VALUES (1, 1, 1),(2, 2, 2);
INSERT INTO t2 VALUES (1, 1, 1),(2, 2, 2);

SELECT * FROM t1 WHERE t1.a IN (SELECT t2.b FROM t2 WHERE t2.c = t1.c);

When viewing the query plan using EXPLAIN, you can see that dependent subqueries are rewritten into semi joins.

EXPLAIN SELECT * FROM t1 WHERE t1.a IN (SELECT t2.b FROM t2 WHERE t2.c = t1.c);
| ========================================
|ID|OPERATOR |NAME|EST. ROWS|COST|
----------------------------------------
|0 |MERGE SEMI JOIN| |2 |76 |
|1 | TABLE SCAN |t1 |2 |37 |
|2 | SORT | |2 |38 |
|3 | TABLE SCAN |t2 |2 |37 |
========================================
...

Anti-Join

When performing a left or right anti-join between tables A and B, it returns all rows from table A that do not have a match in table B or all rows from table B that do not have a match in table A.

Similar to semi joins, anti joins can only be rewritten using subqueries, as shown below.

CREATE TABLE t1(a INT PRIMARY KEY, b INT, c INT);
Query OK, 0 rows affected

CREATE TABLE t2(a INT PRIMARY KEY, b INT, c INT);
Query OK, 0 rows affected

INSERT INTO t1 VALUES (1, 1, 1),(2, 2, 2);
INSERT INTO t2 VALUES (1, 1, 1),(2, 2, 2);

SELECT * FROM t1 WHERE t1.a NOT IN (SELECT t2.b FROM t2 WHERE t2.c = t1.c);

When viewing the query plan using EXPLAIN, you can see that dependent subqueries are rewritten into anti joins:

EXPLAIN SELECT * FROM t1 WHERE t1.a NOT IN (SELECT t2.b FROM t2 WHERE t2.c = t1.c);
| =============================================
|ID|OPERATOR |NAME|EST. ROWS|COST|
---------------------------------------------
|0 |HASH RIGHT ANTI JOIN| |0 |77 |
|1 | TABLE SCAN |t2 |2 |37 |
|2 | TABLE SCAN |t1 |2 |37 |
=============================================
...