TOKENIZE
Description
This function splits text into tokens by using the specified tokenizer and optional JSON configuration parameters.
Syntax
TOKENIZE('text', ['parser'], ['behavior_ctrl'])
Parameters
| Parameter | Description |
|---|---|
| text | The text to be tokenized. It can be of the TEXT, CHAR, or VARCHAR data type. |
| parser | The name of the tokenizer to use. Valid values: BENG (basic English), NGRAM (N-gram, for Chinese), SPACE (space-delimited), and IK (for Chinese). |
| behavior_ctrl | A JSON array of optional configuration parameters, such as the output option used in the example below. |
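Because parser and behavior_ctrl are optional, the function can also be called with fewer arguments. The following queries are a minimal sketch of those shorter forms, assuming that omitted arguments fall back to the default tokenizer settings; the exact output depends on the tokenizer and is not shown here.
-- Tokenize with the default settings; only the text argument is required.
SELECT TOKENIZE('I Love China');
-- Tokenize with the space tokenizer and no extra JSON parameters.
SELECT TOKENIZE('I Love China', 'space');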
Examples
Use the TOKENIZE function to split the string I Love China into words by using the beng (basic English) tokenizer, and use JSON parameters to set the output options.
SELECT TOKENIZE('I Love China','beng', '[{"output": "all"}]');
The following result is returned:
+--------------------------------------------------------+
| TOKENIZE('I Love China','beng', '[{"output": "all"}]') |
+--------------------------------------------------------+
| {"tokens": [{"love": 1}, {"china": 1}], "doc_len": 2} |
+--------------------------------------------------------+
1 row in set (0.001 sec)
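The text argument accepts TEXT, CHAR, and VARCHAR values, so the function can also be applied to a table column. The following query is a sketch that assumes a hypothetical articles table with a VARCHAR column named body:
-- Tokenize the body column of each row; articles and body are placeholder names.
SELECT id, TOKENIZE(body, 'beng', '[{"output": "all"}]') AS tokens FROM articles LIMIT 10;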