Character Sets
The default character set for the seekdb service is utf8mb4.
seekdb currently supports the following character sets:
- binary
- gbk
- gb18030
- utf16
- utf8mb4/utf8mb3
  Info: To support seamless migration, seekdb treats UTF8 as a synonym of UTF8MB4 in syntax. utf8mb3 is an alias for utf8mb4.
- latin1
- gb2312
- gb18030_2022
- ascii
- tis620
- ujis
- euckr
- eucjpms
- cp932
- utf16le
- sjis
- dec8
- hkscs
- hkscs31
- big5
- cp850
- hp8
- macroman
- swe7
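As a minimal sketch of the synonym behavior described in the note on utf8mb4/utf8mb3, the following statements create a table with the UTF8 keyword and then inspect its definition. The table name t_utf8_demo is hypothetical, and the sketch assumes that seekdb accepts the MySQL-style DEFAULT CHARSET table option. If it does, the definition is expected to report utf8mb4 rather than utf8.

```sql
-- Hypothetical table name; assumes the MySQL-style DEFAULT CHARSET option is accepted.
CREATE TABLE t_utf8_demo (c VARCHAR(10)) DEFAULT CHARSET = utf8;

-- Per the note on synonyms, the definition is expected to show utf8mb4, not utf8.
SHOW CREATE TABLE t_utf8_demo;

-- Remove the demo table.
DROP TABLE t_utf8_demo;
```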
Implicit conversion from gb18030 to gb18030_2022 is not supported in current versions of seekdb. However, you can explicitly convert a gb18030 string to gb18030_2022 by using CONVERT. This conversion does not go through Unicode; it keeps the original byte encoding. In the following example, the encoding of the character ‘龴’ is 0xFE59 both before and after the conversion.
SELECT HEX(CONVERT(_gb18030 0xFE59 USING gb18030_2022)), HEX(CONVERT(_gb18030_2022 0xFE59 USING gb18030));
The result is as follows:
+--------------------------------------------------+--------------------------------------------------+
| HEX(CONVERT(_gb18030 0xFE59 USING gb18030_2022)) | HEX(CONVERT(_gb18030_2022 0xFE59 USING gb18030)) |
+--------------------------------------------------+--------------------------------------------------+
| FE59                                             | FE59                                             |
+--------------------------------------------------+--------------------------------------------------+
1 row in set (0.001 sec)
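For contrast, a conversion between unrelated character sets such as utf8mb4 and gbk is expected to go through Unicode, so the byte sequence changes. The sketch below assumes the standard UTF-8 and GBK mappings for the character ‘字’ (0xE5AD97 in UTF-8, 0xD7D6 in GBK). It is an illustration, not output captured from seekdb.

```sql
-- '字' is 0xE5AD97 in utf8mb4; under the standard GBK mapping it is 0xD7D6.
-- Unlike gb18030 -> gb18030_2022, the returned bytes are expected to differ
-- from the input bytes.
SELECT HEX(CONVERT(_utf8mb4 0xE5AD97 USING gbk)) AS gbk_bytes;
```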
View supported character sets
Use the following SHOW CHARSET statement to view the available character sets.
SHOW CHARSET;
The returned result is as follows:
+--------------+---------------------------+-------------------------+--------+
| Charset      | Description               | Default collation       | Maxlen |
+--------------+---------------------------+-------------------------+--------+
| binary       | Binary pseudo charset     | binary                  |      1 |
| utf8mb4      | UTF-8 Unicode             | utf8mb4_general_ci      |      4 |
| gbk          | GBK charset               | gbk_chinese_ci          |      2 |
| utf16        | UTF-16 Unicode            | utf16_general_ci        |      4 |
| gb18030      | GB18030 charset           | gb18030_chinese_ci      |      4 |
| latin1       | cp1252 West European      | latin1_swedish_ci       |      1 |
| gb2312       | GB2312 Simplified Chinese | gb2312_chinese_ci       |      2 |
| gb18030_2022 | GB18030-2022 charset      | gb18030_2022_chinese_ci |      4 |
| ascii        | US ASCII                  | ascii_general_ci        |      1 |
| tis620       | TIS620 Thai               | tis620_thai_ci          |      1 |
| ujis         | EUC-JP Japanese           | ujis_japanese_ci        |      3 |
| euckr        | EUC-KR Korean             | euckr_korean_ci         |      2 |
| eucjpms      | UJIS for Windows Japanese | eucjpms_japanese_ci     |      3 |
| cp932        | SJIS for Windows Japanese | cp932_japanese_ci       |      2 |
| utf16le      | UTF-16LE Unicode          | utf16le_general_ci      |      4 |
| sjis         | SJIS                      | sjis_japanese_ci        |      2 |
| big5         | BIG5                      | big5_chinese_ci         |      2 |
| hkscs        | HKSCS                     | hkscs_bin               |      2 |
| hkscs31      | HKSCS-ISO UNICODE 31      | hkscs31_bin             |      2 |
| dec8         | DEC West European         | dec8_swedish_ci         |      1 |
| cp850        | DOS West European         | cp850_general_ci        |      1 |
| hp8          | HP West European          | hp8_english_ci          |      1 |
| macroman     | Mac West European         | macroman_general_ci     |      1 |
| swe7         | 7bit West European        | swe7_swedish_ci         |      1 |
+--------------+---------------------------+-------------------------+--------+
24 rows in set (0.008 sec)
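When only a few character sets are of interest, the output can be narrowed. The following sketch assumes that seekdb accepts the MySQL-style LIKE clause on SHOW CHARSET and the WHERE clause on SHOW COLLATION. Treat both as assumptions to verify against your version.

```sql
-- List only the character sets whose names start with 'gb'
-- (assumes MySQL-style LIKE filtering is supported).
SHOW CHARSET LIKE 'gb%';

-- List the collations that belong to a single character set
-- (assumes MySQL-style WHERE filtering is supported).
SHOW COLLATION WHERE Charset = 'gbk';
```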
Specify a non-default character set
seekdb allows you to specify a non-default character set for communication with the server. For example, to use the gbk character set, execute the following statement after connecting to the server:
SET NAMES gbk;
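To confirm what SET NAMES changed, you can inspect the session character set variables. This sketch assumes that seekdb exposes the MySQL-compatible variables character_set_client, character_set_connection, and character_set_results, which SET NAMES sets together in MySQL.

```sql
SET NAMES gbk;

-- Assumption: as in MySQL, the three variables character_set_client,
-- character_set_connection, and character_set_results now report gbk.
SHOW VARIABLES LIKE 'character_set_%';
```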
Note: The SET NAMES statement does not change the encoding of the characters that the client sends. It only declares to the server which encoding the client uses. For example, if the client uses the gb18030_2022 encoding, you must declare gb18030_2022 with SET NAMES. Otherwise, garbled characters are generated. The following example shows what happens when the declared character set does not match the encoding that the client actually uses:
/* The client uses the utf8mb4 character set. Table t is created with the default character set, which is also utf8mb4. */
CREATE TABLE t(c VARCHAR(100));
Query OK, 0 rows affected (0.069 sec)
/* Inserted a string encoded in utf8mb4. */
INSERT INTO t VALUES (0x4368617264657379206572726f6c6520696e6465782074657374696e672063686563657379);
Query OK, 1 row affected (0.003 sec)
/* Modified the character set of the current session, but did not change the character set actually used by the client */
SET NAMES gb18030_2022;
Query OK, 0 rows affected (0.000 sec)
/* The client still uses utf8mb4 encoding to insert characters. */
INSERT INTO t VALUES ('character set');
Query OK, 1 row affected (0.002 sec)
/* The data in table t does not appear to be scrambled. */
SELECT * FROM t;
+-----------+
| c         |
+-----------+
| Character |
| Character |
+-----------+
2 rows in set (0.002 sec)
/* Changed the session character set back to utf8mb4. */
SET NAMES utf8mb4;
Query OK, 0 rows affected (0.000 sec)
SELECT * FROM t;
+--------------+
| c            |
+--------------+
| Description  |
| Feature      |
+--------------+
2 rows in set (0.001 sec)