User-defined variables
Values stored in user-defined variables can be referenced by other statements to pass values from one statement to another.
User variables are written in the format @var_name, where var_name consists of letters, numbers, . , _, and $. If a user variable name is referenced as a string or identifier (for example, @'my-obvar', @"my-obvar", or @`my-obvar`), the name can contain other characters. User variable names are case-insensitive and can be up to 64 characters long.
User-defined variables are session-specific. A user-defined variable defined by one client is not visible or usable by other clients. When the client exits, all variables specified for the client session are automatically released.
One way to set a user-defined variable is by using the SET statement:
SET @var_name = expr [, @var_name = expr] ...
For more information, see SET.
For the SET statement, both = and := can be used as assignment operators.
You can assign values of the following data types to a user variable: integer, decimal, floating-point, binary, or nonbinary string, or NULL. Assigning decimal or floating-point values does not preserve the precision or scale of the value. Values of other data types are converted to the allowed data types. For example, values of time or spatial data types are converted to binary strings.
To read a user-defined variable, you can use the SELECT statement.
If a nonbinary (character) string value is assigned to a user variable, it has the same character set and collation as the string. Hexadecimal or bit values assigned to a user variable are treated as binary strings. To assign a hexadecimal or bit value as a number to a user variable, you can add 0 or use the CAST function CAST(... AS UNSIGNED):
SET @v1 = X'43';
SET @v2 = X'43'+0;
SET @v3 = CAST(X'43' AS UNSIGNED);
SELECT @v1, @v2, @v3;
+------+------+------+
| @v1 | @v2 | @v3 |
+------+------+------+
| C | 67 | 67 |
+------+------+------+
1 row in set (0.001 sec)
SET @v1 = b'1000011';
SET @v2 = b'1000011'+0;
SET @v3 = CAST(b'1000011' AS UNSIGNED);
SELECT @v1, @v2, @v3;
+------+------+------+
| @v1 | @v2 | @v3 |
+------+------+------+
| C | 67 | 67 |
+------+------+------+
1 row in set (0.000 sec)
If you select the value of a user variable from a result set, it is returned to the client as a string. If you reference an uninitialized variable, its value is NULL and its type is string.
Limitations
User-defined variables in seekdb have the following limitations:
- seekdb can assign values to user variables in statements other than
SET, but this feature may be removed in future versions. - The evaluation order of expressions involving user variables is undefined. For example,
SELECT @a, @a:=@a+123cannot guarantee that@ais evaluated before the assignment expression. To avoid this issue, do not assign a value to a variable and read its value in the same statement. - When parallel execution is used, the correctness of user-defined variable calculations cannot be guaranteed. To ensure correctness, use
/*+USE_DAS(...)*/to specify serial execution. Future versions plan to optimize this to automatically use serial execution without specifying the hint.
Compatibility with seekdb
- In SQL statements, avoid using the
@var:=valsyntax and only read user variables, which is compatible with MySQL. For example,SELECT c1, c2 FROM t1 WHERE c1=@var;. - During SQL execution, using
:=to modify user variable values in real-time results in undefined behavior for user variables, which depends on the execution order. - During SQL execution, using
:=to modify user variable values in real-time, seekdb maintains compatibility with MySQL 5.6/5.7/8.0 in the following scenarios, while the results in other scenarios depend on the specific execution plan type and may differ from MySQL behavior.- When it appears in the
INSERT VALUESclause, for example,INSERT INTO t1 VALUES(@a:=1, @a);.
- When it appears in the
If user variable assignment is performed in an SQL statement and the execution result is affected by the previous value of the variable, there may be compatibility differences with MySQL when the execution fails and is retried, due to changes in the initial value of the user variable.
Initializing User Variables
If a user-defined variable is not initialized, the database defaults it to NULL. Therefore, when simulating window function behavior using user variables, it is common to initialize the user variable in the FROM subquery of the SQL statement using SELECT @var:=val FROM dual, forming a JOIN query with the main query, for example:
SELECT A.c1, (@rownum:= @rownum+1) AS rownum FROM t1 A,(SELECT @rownum:=0) B;
SELECT @rownum;
If the query uses the NESTED LOOP JOIN algorithm and B table is the inner table for the JOIN, and the outer table data is an empty set, the seekdb execution engine will adopt a short-circuit execution strategy and will not drive the B table for calculation. Therefore, the user variable initialization operation on the B table will not be executed, while MySQL will execute the initialization statement SELECT @var:=val FROM dual regardless of whether the outer table is empty.
-- MySQL will execute the rownum:=0 initialization action regardless of whether there is data in t1
SELECT A.c1, (@rownum:= @rownum+1) AS rownum FROM t1 A,(SELECT @rownum:=0) B;
Empty set, 2 warnings
SELECT @rownum;
+---------+
| @rownum |
+---------+
| 0 |
+---------+
1 row in set
-- seekdb uses Nested Loop Join. If t1 data is empty, it will not execute rownum:=0
SELECT /*+use_nl(A, B)*/ A.c1, (@rownum:= @rownum+1) AS rownum FROM t1 A,(SELECT @rownum:=0) B;
Empty set
SELECT @rownum;
+---------+
| @rownum |
+---------+
| NULL |
+---------+
1 row in set
To maintain compatibility with MySQL, seekdb treats user variable assignment statements in the FROM clause that do not involve physical tables as initialization operations. These initialization statements will be prioritized and executed in the specified order at the beginning of statement execution. For example:
SELECT /*+use_nl(t1, A)*/ * FROM t1, (SELECT @rownum:=0 FROM DUAL) A;
-- SELECT @rownum:=0 is a variable initialization operation and is executed first
SELECT * FROM t1 WHERE (SELECT @rownum:=0 FROM DUAL)=0;
-- SELECT @rownum:=0 is not in the FROM clause and is not an initialization statement. Whether it is executed depends on whether t1 has data
SELECT /*+use_nl(t1, A)*/ * FROM t1, (SELECT @rownum:=(SELECT c1 FROM t2)) A;
-- SELECT @rownum:=(SELECT c1 FROM t1) is in the FROM clause, but it accesses the physical table t2, so it is not an initialization statement
SELECT * FROM t1 WHERE (SELECT rownum FROM (SELECT @rownum:=0) t)=0;
-- SELECT @rownum:=0 is in the FROM clause and is an initialization statement
@var:=val in the FROM Subquery
When a user variable assignment operation appears in the FROM subquery, MySQL prohibits the merging of FROM subqueries and no longer iterates data references between the two layers of queries in a streaming manner. Instead, it materializes the results of the FROM subquery containing the := operator into a temporary table for the upper-level query to access, for example:
CREATE TABLE t1(c1 INT);
Query OK, 0 rows affected (0.039 sec)
INSERT INTO t1 VALUES(1), (2), (3);
Query OK, 3 rows affected (0.003 sec)
Records: 3 Duplicates: 0 Warnings: 0
SELECT @a, c1 FROM (SELECT @a:=c1, c1 FROM t1) t;
+------+------+
| @a | c1 |
+------+------+
| 3 | 1 |
| 3 | 2 |
| 3 | 3 |
+------+------+
3 rows in set (0.002 sec)
From the final result, we can see that SELECT @a:=c1 and c1 FROM t1 are materialized before executing the outer query SELECT @a, c1 FROM t.
MySQL's behavior is stable and is used by users to simulate various grouping calculations in window functions. The seekdb optimizer is compatible with this behavior.
When a user variable := operation appears in the FROM subquery, the OB optimizer will prohibit VIEW MERGE rewriting between layers of FROM subqueries and add a MATERIAL operator between each layer to block streaming data calculation between queries, ensuring compatibility with MySQL behavior.
EXPLAIN SELECT @a, c1 FROM (SELECT @a:=c1, c1 FROM t1) t;
+-------------------------------------------------------------------+
| Query Plan |
+-------------------------------------------------------------------+
| ================================================= |
| |ID|OPERATOR |NAME|EST.ROWS|EST.TIME(us)| |
| ------------------------------------------------- |
| |0 |SUBPLAN SCAN |t |3 |3 | |
| |1 |└─TABLE FULL SCAN|t1 |3 |3 | |
| ================================================= |
| Outputs & filters: |
| ------------------------------------- |
| 0 - output([('a')], [t.c1]), filter(nil) |
| access([t.c1]) |
| 1 - output([(T_OP_ASSIGN, 'a', t1.c1)], [t1.c1]), filter(nil) |
| access([t1.c1]), partitions(p0) |
| is_index_back=false, is_global_index=false, |
| range_key([t1.__pk_increment]), range(MIN ; MAX)always true |
+-------------------------------------------------------------------+
14 rows in set (0.002 sec)
Note that using := in the FROM subquery may cause the subquery to be calculated and materialized early, which may affect the optimization of the original statement, such as pushing join conditions down to the base table, and prevent the use of index conditions on the inner table in a NESTED LOOP JOIN, thus affecting execution efficiency, for example:
CREATE TABLE t1(a INT, b INT);
CREATE TABLE t2(a INT PRIMARY KEY, b INT);
EXPLAIN SELECT /*+use_nl(t1 t)*/ * FROM t1 JOIN (SELECT @b:=b AS b, a FROM t2) t ON t1.a=t.a WHERE t1.b=1;
+------------------------------------------------------------------------------------+
| Query Plan |
+------------------------------------------------------------------------------------+
| =================================================== |
| |ID|OPERATOR |NAME|EST.ROWS|EST.TIME(us)| |
| --------------------------------------------------- |
| |0 |MERGE JOIN | |1 |5 | |
| |1 |├─SUBPLAN SCAN |t |1 |3 | |
| |2 |│ └─TABLE FULL SCAN|t2 |1 |3 | |
| |3 |└─SORT | |1 |3 | |
| |4 | └─TABLE FULL SCAN|t1 |1 |3 | |
| =================================================== |
| Outputs & filters: |
| ------------------------------------- |
| 0 - output([t1.a], [t1.b], [t.b], [t.a]), filter(nil) |
| equal_conds([t1.a = t.a]), other_conds(nil) |
| merge_directions([ASC]) |
| 1 - output([t.a], [t.b]), filter(nil) |
| access([t.a], [t.b]) |
| 2 - output([(T_OP_ASSIGN, 'b', t2.b)], [t2.a]), filter(nil) |
| access([t2.a], [t2.b]), partitions(p0) |
| is_index_back=false, is_global_index=false, |
| range_key([t2.a]), range(MIN ; MAX)always true |
| 3 - output([t1.a], [t1.b]), filter(nil) |
| sort_keys([t1.a, ASC]) |
| 4 - output([t1.a], [t1.b]), filter([t1.b = 1]) |
| access([t1.a], [t1.b]), partitions(p0) |
| is_index_back=false, is_global_index=false, filter_before_indexback[false], |
| range_key([t1.__pk_increment]), range(MIN ; MAX)always true |
+------------------------------------------------------------------------------------+
26 rows in set (0.011 sec)
@var:=val in DML Statements
seekdb allows using user variables := to extract new and old values during DML execution, optimizing client-server interactions, but with the following limitations:
-
Due to internal execution constraints, DML cannot guarantee the strict consistency of expression calculation order with user-defined order. Therefore, avoid assigning and accessing the same user variable in DML clauses, for example:
INSERT INTO t1(c1, c2) VALUES(@a:=1, @a+1); UPDATE t1 SET c1=@a:=1, c2=@a+1 WHERE c1=1;In both examples, it cannot be guaranteed that@a:=1will be executed before@a+1. -
In DML statements, using
@var:=valis only compatible with MySQL when affected_row=1. When affected_row>1, due to potential differences in SQL execution order compared to MySQL, the final user variable results may vary.
@var:=val appears in the SELECT clause
In some business scenarios, expressions may be redundantly calculated multiple times within the same clause. To reuse the calculation results, you can assign values to user variables and reference them later, for example, SELECT @a:=(c1+1), @a+c2, @a*c3 FROM t1. However, the official MySQL documentation clearly states that the execution order of expressions containing user variables cannot be guaranteed, so the results may not meet user expectations.
After investigating the specific behavior of MySQL, we found that when the above query appears in the SELECT clause of the top-level query or the FROM subquery, the execution order of expressions follows the user-defined order. However, in other clauses, this order is not guaranteed. For example:
SELECT @a:=(c1+1), @a+c2, @a*c3 FROM t1;
-- In the top-level query, expressions are executed in the user-defined order.
SELECT * FROM (SELECT @a:=(c1+1), @a+c2, @a*c3 FROM t1) t;
-- In the FROM subquery, expressions are executed in the user-defined order.
SELECT * FROM t1 WHERE (c1, c2) in (SELECT @a:=c1, @a+1 FROM t2);
-- In the WHERE subquery, the execution order of expressions containing user variables is not guaranteed, and the query results may be uncertain.
seekdb only supports deterministic behavior of MySQL. For non-deterministic behaviors, it may not be compatible with MySQL.
@var:=val appears in the WHERE clause
The result of the := operation in the WHERE clause depends on the execution order of the predicate conditions. MySQL's behavior is implementation-dependent, and seekdb does not guarantee compatibility in such scenarios. For example:
SELECT * FROM t1 WHERE (@a:=c1)>10;
-- The query results and the final value of @a depend on the execution, and are not guaranteed to be compatible with MySQL.
SELECT * FROM t1 WHERE c1 in (SELECT @a:=1 FROM t2);
-- Located in the WHERE clause, the results depend on the implementation and are not guaranteed to be compatible with MySQL.
@var:=val appears in the ORDER BY clause
The ORDER BY clause reorders the data, which disrupts the iteration order. When user variables := are used in the ORDER BY clause, the results may be unpredictable. seekdb cannot fully guarantee compatibility with MySQL in such scenarios. When user variables are used in the ORDER BY clause, the behavior is implementation-defined. For example:
CREATE TABLE t1(a INT PRIMARY KEY, b INT);
INSERT INTO t1 VALUES(1, 1), (2, 1), (3, 1), (4, 1), (5, 5), (6, 6);
-- MySQL behavior
SELECT @rownum:=@rownum+1 AS rn, MAX(a), sum(a) FROM t1, (SELECT @rownum:=1) t GROUP BY b ORDER BY rn;
+------+--------+--------+
| rn | max(a) | sum(a) |
+------+--------+--------+
| 2 | 4 | 10 |
| 3 | 5 | 5 |
| 4 | 6 | 6 |
+------+--------+--------+
3 rows in set, 2 warnings
-- When the ORDER BY order is ascending, the final result of rownum points to the last row value, which is 4.
SELECT @rownum;
+---------+
| @rownum |
+---------+
| 4 |
+---------+
1 row in set
SELECT @rownum:=@rownum+1 AS rn, MAX(a), SUM(a) FROM t1, (SELECT @rownum:=1) t GROUP BY b ORDER BY rn desc;
+------+--------+--------+
| rn | max(a) | sum(a) |
+------+--------+--------+
| 4 | 6 | 6 |
| 3 | 5 | 5 |
| 2 | 4 | 10 |
+------+--------+--------+
3 rows in set, 2 warnings
-- When the ORDER BY order is descending, the final result of rownum points to the last row value, which is 2.
SELECT @rownum;
+---------+
| @rownum |
+---------+
| 2 |
+---------+
1 row in set
-- seekdb behavior
SELECT @rownum:=@rownum+1 AS rn, MAX(a), SUM(a) FROM t1, (SELECT @rownum:=1) t GROUP BY b ORDER BY rn;
+------+--------+--------+
| rn | MAX(a) | SUM(a) |
+------+--------+--------+
| 2 | 4 | 10 |
| 3 | 5 | 5 |
| 4 | 6 | 6 |
+------+--------+--------+
3 rows in set, 1 warning (0.002 sec)
-- When the ORDER BY order is ascending, the final result of rownum points to the last row value, which is 4.
SELECT @rownum;
+---------+
| @rownum |
+---------+
| 4 |
+---------+
1 row in set (0.001 sec)
SELECT @rownum:=@rownum+1 AS rn, MAX(a), SUM(a) FROM t1, (SELECT @rownum:=1) t GROUP BY b ORDER BY rn desc;
+------+--------+--------+
| rn | MAX(a) | SUM(a) |
+------+--------+--------+
| 4 | 6 | 6 |
| 3 | 5 | 5 |
| 2 | 4 | 10 |
+------+--------+--------+
3 rows in set, 1 warning (0.002 sec)
-- When the ORDER BY order is descending, the final result of rownum points to the first row value, which is still 4.
SELECT @rownum;
+---------+
| @rownum |
+---------+
| 4 |
+---------+
1 row in set (0.000 sec)
Therefore, if user-defined variables := are used for assignment operations in the ORDER BY clause, or if columns resulting from := operations are indirectly referenced for sorting, seekdb cannot guarantee that the results will remain compatible with MySQL. Therefore, do not use the user-defined variable after the ORDER BY clause for further operations.
@var:=val will cause the query to be executed in serial
When a query contains the @var:=val operation, although seekdb cannot be fully compatible with MySQL in all scenarios, seekdb must ensure that queries containing such operations can be self-explained. Therefore, queries with @var:=val assignment operations will be executed in serial. For example:
EXPLAIN EXTENDED_NOADDR SELECT /*+parallel(2)*/ @a:=c1 FROM t1;
+-------------------------------------------------------------------+
| Query Plan |
+-------------------------------------------------------------------+
| =========================================================== |
| |ID|OPERATOR |NAME|EST.ROWS|EST.TIME(us)| |
| ----------------------------------------------------------- |
| |0 |DISTRIBUTED TABLE FULL SCAN|t1 |1 |4 | |
| =========================================================== |
| Outputs & filters: |
| ------------------------------------- |
| 0 - output([(T_OP_ASSIGN, 'a', t1.c1)]), filter(nil) |
| access([t1.c1]), partitions(p0) |
| is_index_back=false, is_global_index=false, |
| range_key([t1.__pk_increment]), range(MIN ; MAX)always true |
| Used Hint: |
| ------------------------------------- |
| /*+ |
| |
| PARALLEL(2) |
| */ |
| Qb name trace: |
| ------------------------------------- |
| stmt_id:0, stmt_type:T_EXPLAIN |
| stmt_id:1, SEL$1 |
| Outline Data: |
| ------------------------------------- |
| /*+ |
| BEGIN_OUTLINE_DATA |
| FULL(@"SEL$1" "test"."t1"@"SEL$1") |
| USE_DAS(@"SEL$1" "test"."t1"@"SEL$1") |
| PARALLEL(2) |
| OPTIMIZER_FEATURES_ENABLE('4.0.0.0') |
| END_OUTLINE_DATA |
| */ |
| Optimization Info: |
| ------------------------------------- |
| t1: |
| table_rows:1 |
| physical_range_rows:1 |
| logical_range_rows:1 |
| index_back_rows:0 |
| output_rows:1 |
| table_dop:1 |
| dop_method:DAS DOP |
| avaiable_index_name:[t1] |
| stats version:0 |
| dynamic sampling level:0 |
| Plan Type: |
| LOCAL |
| Note: |
| Degree of Parallelism is 1 because of hint |
+-------------------------------------------------------------------+
48 rows in set