Skip to main content

Data type conversion rules

Type conversion during expression evaluation

When an operator is used with operands of different types in an expression, type conversion is performed to make the operands compatible.

Conversion methods

seekdb supports explicit and implicit data type conversion.

Implicit data type conversion occurs when an operation requires a parameter of a specified data type, but the actual parameter value provided is not of that data type. In such cases, seekdb automatically converts the actual parameter value to the specified data type before proceeding with the operation.

For example, seekdb can automatically convert a string to a number and vice versa when needed.

SELECT 1+'1';
+-------+
| 1+'1' |
+-------+
| 2 |
+-------+
1 row in set (0.001 sec)

Explicit data type conversion in seekdb is achieved using the CAST function. Here's an example:

SELECT 31.4, CAST(31.4 AS TIME);
+------+--------------------+
| 31.4 | CAST(31.4 AS TIME) |
+------+--------------------+
| 31.4 | 00:00:31 |
+------+--------------------+
1 row in set (0.001 sec)

For more information about the CAST function, see CAST.

Conversion rules

Conversion rules for comparison operators

  • If one or both parameters are NULL, the comparison result is NULL, except for the <=> comparison operator. For NULL <=> NULL, the result is true, and no conversion is required.

  • If both parameters are strings, they are compared as strings.

  • If both parameters are integers, they are compared as integers.

  • If not compared with a number, hexadecimal values are treated as binary strings.

  • Single-row subqueries from one or more tables are not considered constants. For example, if a subquery returns an integer to be compared with a DATETIME value, both operands are treated as integers. Since integers cannot be converted to time values, to compare the operands as DATETIME values, use CAST() to explicitly convert the subquery value to DATETIME.

    If one parameter is a TIMESTAMP or DATETIME column and the other is a constant, the constant is converted to a timestamp before the comparison. For safety, always use complete date-time, date, or time strings when performing comparisons. For example, when using BETWEEN with date or time values, use CAST() to explicitly convert the values to the required data type for optimal results.

  • If one parameter is a decimal value, the comparison type depends on the other parameter. If the other parameter is a decimal or integer value, the parameters are compared as decimal values. If the other parameter is a floating-point value, the parameters are compared as floating-point values.

  • In all other cases, the parameters are compared as floating-point (real) numbers. For example, string and numeric operands are compared as floating-point numbers.

    Comparisons between floating-point numbers and large values of the INTEGER type are approximate, because integers are converted to double-precision floating-point numbers before the comparison, and not all 64-bit integers can be accurately represented.

Type downgrading

To achieve efficient comparisons, constants are aligned with column types as much as possible while maintaining the semantics of the comparison. Currently, most comparisons between columns and constants are supported under standard semantics. Remaining comparisons that require casting are listed separately:

  • For int or uint columns, constants of character type, decimal type, and floating-point types (double/float) can be downgraded to integer columns.
  • For decimal columns, floating-point types (double/float) can be downgraded to decimal columns.
  • For year columns, numeric types (including integers, floating-point types, and decimal type), character types, and time types (including date, datetime, timestamp, and time) can be downgraded to year columns.
  • For date columns, character types, datetime, timestamp, and time types can be downgraded to date columns.
  • For timestamp columns, datetime can be downgraded to timestamp.

The following scenarios involve casting on columns (excluding complex types such as arrays, XML, and JSON):

  • For float columns, comparisons are always performed using double, so casting is added when comparing with any constant.
  • For fixed double columns, such as those defined with double(10, 2), if the constant has more decimal places, the precision is aligned during the comparison, and casting is added to the column.
  • For character columns, comparisons between character and numeric types are performed based on numeric order, and the column is cast to a numeric type. Additionally, casting may be added to the column based on the character set comparison rules.

Non-standard comparisons

The logic and type downgrading for non-standard comparisons are similar. They only handle comparisons between constants and columns. If two columns are being compared, no action is taken. Currently, only comparisons between integer constants and character columns are supported. Note that if range is set, the sorting results may not conform to standard semantics. Therefore, the setting should be chosen based on the data characteristics.

Here are some examples:

  1. Create a table named t1 with a biz_day column of type varchar(10).

    CREATE TABLE t1 (biz_day varchar(10));
    Query OK, 0 rows affected (0.216 sec)
  2. View the execution plan under default settings.

    EXPLAIN SELECT * FROM t1 WHERE biz_day BETWEEN 20200101 AND 20200201;
    +----------------------------------------------------------------------------------------------------------------------------------------------------------------+
    | Query Plan |
    +----------------------------------------------------------------------------------------------------------------------------------------------------------------+
    | =============================================== |
    | |ID|OPERATOR |NAME|EST.ROWS|EST.TIME(us)| |
    | ----------------------------------------------- |
    | |0 |TABLE FULL SCAN|t1 |1 |3 | |
    | =============================================== |
    | Outputs & filters: |
    | ------------------------------------- |
    | 0 - output([t1.biz_day]), filter([cast(t1.biz_day, DECIMAL(-1, -1)) >= cast(20200101, DECIMAL(20, 0))], [cast(t1.biz_day, DECIMAL(-1, -1)) <= cast(20200201, |
    | DECIMAL(20, 0))]), rowset=16 |
    | access([t1.biz_day]), partitions(p0) |
    | is_index_back=false, is_global_index=false, filter_before_indexback[false,false], |
    | range_key([t1.__pk_increment]), range(MIN ; MAX)always true |
    +----------------------------------------------------------------------------------------------------------------------------------------------------------------+
    12 rows in set (0.003 sec)
    • Query plan: By default, the query optimizer converts the biz_day column from a string type to a numeric type before the comparison.
    • Filter condition: filter([cast(t1.biz_day, DECIMAL(-1, -1)) &gt;= 20200101], [cast(t1.biz_day, DECIMAL(-1, -1)) &lt;= 20200201]) indicates that the biz_day column is converted to a numeric type for range comparison.
  3. Set non_standard_comparison_level to equal and view the execution plan.

    EXPLAIN SELECT /*+opt_param('non_standard_comparison_level', 'equal')*/ * FROM t1 WHERE biz_day BETWEEN 20200101 AND 20200201;
    +----------------------------------------------------------------------------------------------------------------------------------------------------------------+
    | Query Plan |
    +----------------------------------------------------------------------------------------------------------------------------------------------------------------+
    | ====================================================== |
    | |ID|OPERATOR |NAME|EST.ROWS|EST.TIME(us)| |
    | ------------------------------------------------------ |
    | |0 |COLUMN TABLE FULL SCAN|t1 |1 |3 | |
    | ====================================================== |
    | Outputs & filters: |
    | ------------------------------------- |
    | 0 - output([t1.biz_day]), filter([cast(t1.biz_day, DECIMAL(-1, -1)) >= cast(20200101, DECIMAL(20, 0))], [cast(t1.biz_day, DECIMAL(-1, -1)) <= cast(20200201, |
    | DECIMAL(20, 0))]), rowset=16 |
    | access([t1.biz_day]), partitions(p0) |
    | is_index_back=false, is_global_index=false, filter_before_indexback[false,false], |
    | range_key([t1.__pk_increment]), range(MIN ; MAX)always true |
    +----------------------------------------------------------------------------------------------------------------------------------------------------------------+
    12 rows in set (0.003 sec)
    • Query plan: After setting non_standard_comparison_level to equal, the query optimizer converts the biz_day column from a string type to a decimal type before the comparison.
    • Filter condition: filter([cast(t1.biz_day, DECIMAL(-1, -1)) >= cast(20200101, DECIMAL(20, 0))], [cast(t1.biz_day, DECIMAL(-1, -1)) <= cast(20200201, DECIMAL(20, 0))]) indicates that the biz_day column is converted to a decimal type for range comparison. Since between is a range query, the equal value comparison rule for equal does not convert the constant type.
  4. Set non_standard_comparison_level to range and view the execution plan.

    EXPLAIN SELECT /*+opt_param('non_standard_comparison_level', 'range')*/ * FROM t1 WHERE biz_day BETWEEN 20200101 AND 20200201;
    +----------------------------------------------------------------------------------------------------------------------------------------------------------------+
    | Query Plan |
    +----------------------------------------------------------------------------------------------------------------------------------------------------------------+
    | =============================================== |
    | |ID|OPERATOR |NAME|EST.ROWS|EST.TIME(us)| |
    | ----------------------------------------------- |
    | |0 |TABLE FULL SCAN|t1 |1 |3 | |
    | =============================================== |
    | Outputs & filters: |
    | ------------------------------------- |
    | 0 - output([t1.biz_day]), filter([cast(t1.biz_day, DECIMAL(-1, -1)) >= cast(20200101, DECIMAL(20, 0))], [cast(t1.biz_day, DECIMAL(-1, -1)) <= cast(20200201, |
    | DECIMAL(20, 0))]), rowset=16 |
    | access([t1.biz_day]), partitions(p0) |
    | is_index_back=false, is_global_index=false, filter_before_indexback[false,false], |
    | range_key([t1.__pk_increment]), range(MIN ; MAX)always true |
    +----------------------------------------------------------------------------------------------------------------------------------------------------------------+
    12 rows in set (0.002 sec)
    • Query plan: After setting non_standard_comparison_level to range, the query optimizer converts integer constants to a string type before the comparison.
    • Filter condition: filter([t1.biz_day >= demote_cast(20200101, VARCHAR(10))], [t1.biz_day <= demote_cast(20200201, VARCHAR(10))]) indicates that integer constants are converted to a string type for range comparison. After non-standard comparison, the comparison is performed based on character order. For example, '202002010' belongs to the character order ['20200101', '20200201'], but it does not fall within the numeric order range [20200101, 20200201].
  • The conversion methods from strings to floating-point numbers and from integers to floating-point numbers may not be the same. The results may be affected by factors such as the compiler version. One way to avoid such issues is to use CAST() to prevent the value from being implicitly converted to a floating-point number.

  • When numeric or time values are implicitly converted to strings, the resulting value has a character set and collation determined by the character_set_connection and collation_connection system variables.

    This means that such conversions generate non-binary strings (CHAR, VARCHAR, or LONGTEXT values). If the connection character set is set to binary, the conversion result is a binary string (BINARY, VARBINARY, or LONGBLOB value).

  • The type conversion during integer expression evaluation is slightly different. For example, when an integer expression is used as part of a CREATE TABLE statement, the new table created will have a column of type INT or BIGINT based on the result type of the expression:

    CREATE TABLE t SELECT integer_expr;

    If the expression's maximum length is not suitable for INT, BIGINT is used. However, you can force the use of BIGINT instead of INT by using an expression with sufficient length:

    CREATE TABLE t SELECT 000000000000000000000;
    Query OK, 1 row affected (0.145 sec)

Implicit data type conversion rules

seekdb automatically converts a value from one data type to another when the conversion makes sense.

The following table shows the implicit conversion matrix for all data types. It does not consider the direction of conversion or the context in which the conversion occurs.

Data typeBOOLINTSMALLINTMEDIUMINTBIGINTSERIALDECIMALNUMERICFLOATDOUBLEBITDATETIMETIMESTAMPDATETIME
YEARYesYesYesYesYesYesYesYesYesYesYesYesYesYesYes
CHARYesYesYesYesYesYesYesYesYesYesYesYesYesYesYes
VARCHARYesYesYesYesYesYesYesYesYesYesYesYesYesYesYes
BINARYYesYesYesYesYesYesYesYesYesYesYesYesYesYesYes
VARBINARYYesYesYesYesYesYesYesYesYesYesYesYesYesYesYes
TEXTYesYesYesYesYesYesYesYesYesYesYesYesYesYesYes
TINYBLOBYesYesYesYesYesYesYesYesYesYesYesYesYesYesYes
MEDIUMBLOBYesYesYesYesYesYesYesYesYesYesYesYesYesYesYes
LONGBLOBYesYesYesYesYesYesYesYesYesYesYesYesYesYesYes
TINYTEXTYesYesYesYesYesYesYesYesYesYesYesYesYesYesYes
MEDIUMTEXTYesYesYesYesYesYesYesYesYesYesYesYesYesYesYes
LONGTEXTYesYesYesYesYesYesYesYesYesYesYesYesYesYesYes
ENUMYesYesYesYesYesYesYesYesYesYesYesYesYesYesYes
SETYesYesYesYesYesYesYesYesYesYesYesYesYesYesYes
BLOBYesYesYesYesYesYesYesYesYesYesYesYesYesYesYes

Implicit data type conversion examples

SELECT CAST(BOOL_COLUMN AS YEAR) FROM YOUR_TABLE;
+---------------------------+
| CAST(BOOL_COLUMN AS YEAR) |
+---------------------------+
| 2001 |
+---------------------------+
2 rows in set (0.001 sec)