Column-level character sets and collations
Each "character" column (i.e., a column of type CHAR, VARCHAR, TEXT, or any synonym) has a column character set and a column collation.
Specify the character set and collation for a column
The CREATE TABLE and ALTER TABLE statements have clauses for specifying the character set and collation for a column. The syntax is as follows:
CREATE TABLE table_name (
column_name {CHAR | VARCHAR | TEXT} (column_length)
[CHARACTER SET charset_name]
[COLLATE collation_name]
);
ALTER TABLE table_name MODIFY
col_name {CHAR | VARCHAR | TEXT} (column_length)
CHARACTER SET [=] charset_name
[COLLATE [=] collation_name]
Here are some examples:
CREATE TABLE t (
col1 VARCHAR(5)
CHARACTER SET utf8mb4
COLLATE utf8mb4_unicode_ci);
Query OK, 0 rows affected (0.055 sec)
ALTER TABLE t MODIFY
col1 VARCHAR(5)
CHARACTER SET gbk
COLLATE gbk_chinese_ci;
Query OK, 0 rows affected (0.477 sec)
Choose the character set and collation for a column
seekdb chooses the character set and collation for a column as follows:
-
If both
CHARACTER SET charset_nameandCOLLATE collation_nameare specified, the character setcharset_nameand collationcollation_nameare used. -
If
CHARACTER SET charset_nameis specified butCOLLATEis not, the character setcharset_nameand its default collation are used. To view the default collation for each character set, use theSHOW CHARACTER SETstatement. -
If
COLLATE collation_nameis specified butCHARACTER SETis not, the character set associated withcollation_nameand the specified collation are used. -
If neither
CHARACTER SETnorCOLLATEis specified, the character set and collation of the table are used.
If the character set and collation for a column are not specified in the column definition, the character set and collation of the table are used as the default values for the column.
Additionally, after setting the COLLATE for a column, you can directly use ORDER BY to sort the data based on the specified collation. You can also use COLLATE after ORDER BY to set the same character set with different collations.
CREATE TABLE t(a VARCHAR(10) COLLATE gb18030_2022_radical_ci);
Query OK, 0 rows affected (0.066 sec)
INSERT INTO t VALUES('a'),('A'),('b'),('B');
Query OK, 4 rows affected (0.003 sec)
Records: 4 Duplicates: 0 Warnings: 0
SELECT a FROM t ORDER BY a;
+------+
| a |
+------+
| a |
| A |
| b |
| B |
+------+
4 rows in set (0.002 sec)
SELECT a FROM t ORDER BY a COLLATE gb18030_2022_chinese_cs;
+------+
| a |
+------+
| A |
| B |
| a |
| b |
+------+
4 rows in set (0.002 sec)
Conditions for converting column character sets
To convert a binary or non-binary string column to a specified character set, you can use ALTER TABLE. For the conversion to be successful, one of the following conditions must be met:
-
If the column has a binary data type (
BINARY,VARBINARY, orBLOB), all values in the column must be encoded using a single character set (i.e., the character set to which the column is being converted). If a binary column is used to store information from multiple character sets, seekdb cannot determine which character set is used for each value and cannot correctly convert the data. -
If the column has a non-binary data type (
CHAR,VARCHAR, orTEXT), its content should be encoded according to the column's character set. If the column content is encoded using different character sets, you can first convert the column to a binary data type and then convert it to a non-binary column with the desired character set.