CREATE INDEX
Description
This statement is used to create an index. An index is a structure created on a table to sort the values of one or more columns in the database table. Its main purpose is to improve query speed and reduce the performance overhead of the database system.
In the current version of seekdb, the supported index types include unique and non-unique indexes. When creating an index, you can use the STORING(column_name,...) clause to specify redundant columns in the index table. In addition to the user-specified columns, the index table will automatically include the primary key columns of the primary table (or the Hidden PK if no primary key exists). Storing additional columns in the index provides the database optimizer with more options.
Limitations and considerations
- If all index columns in a table are deleted, the created index becomes invalid.
- Full-text indexes can be created only on columns of the
CHAR,VARCHAR, orTEXTtype. - Only local (
LOCAL) full-text indexes can be created in the current version. - The
UNIQUEkeyword cannot be specified when creating a full-text index. - To create a full-text index that involves multiple columns, ensure that these columns have the same character set.
- Only when you create a full-text index can you use
WITH PARSER {SPACE | NGRAM | BENG | IK}to specify a tokenizer.
Permissions
To execute the CREATE INDEX statement, the current user must have the INDEX permission on the corresponding object. For more information about the seekdb permissions, see seekdb permission classification.
Syntax
CREATE [hint_options] [index_type] INDEX [IF NOT EXISTS] index_name
[USING BTREE] ON table_name (sort_column_key [, sort_column_key... ])
[WITH PARSER tokenizer_option]
[PARSER_PROPERTIES[=](parser_properties_list)]
[index_option...] [partition_option] [index_column_group_option]
index_type:
SPATIAL
| UNIQUE
| FULLTEXT
sort_column_key:
column_name [(integer)] [ASC] [ID id]
| (index_expr) [ASC] [ID id]
tokenizer_option:
SPACE
| NGRAM
| BENG
| IK
| NGRAM2
parser_properties_list:
parser_properties, [parser_properties]
parser_properties:
min_token_size = int_value
| max_token_size = int_value
| ngram_token_size = int_value
| ik_mode = 'char_value'
| min_ngram_size = int_value
| max_ngram_size = int_value
index_option:
GLOBAL
| LOCAL
| BLOCK_SIZE integer
| COMMENT STRING_VALUE
| STORING (column_name [, column_name...])
| WITH_ROWID
| WITH PARSER STRING_VALUE
| index_using_algorithm
| visibility_option
| DATA_TABLE_ID data_table_id
| INDEX_TABLE_ID index_table_id
| VIRTUAL_COLUMN_ID virtual_column_id
| MAX_USED_PART_ID used_part_id
index_column_group_option:
WITH COLUMN GROUP([all columns, ]each column)
Parameters
| Parameter | Description |
|---|---|
| hint_options | The hint options. This parameter is optional. Currently, only the parallel option is supported, for example, /*+ parallel(10) */. |
| index_type | The type of the index. This parameter is optional. For more information about the index types, see index_type. |
| IF NOT EXISTS | Indicates whether to create the index if it already exists. If the index exists and the IF NOT EXISTS option is not specified, an error is returned. |
| index_name | The name of the index to be created. |
| USING BTREE | The index type. This parameter is optional. Currently, only USING BTREE is supported. |
| table_name | The name of the table on which to create the index. You can specify the table name directly or use the schema_name.table_name format to specify the table name and the name of the database to which the table belongs. |
| sort_column_key | The key of the sorting column. You can specify multiple sorting columns when you create an index. Separate the sorting columns with commas (,). For more information, see sort_column_key. |
| WITH PARSER tokenizer_option | The tokenizer for the full-text index. This parameter is optional. For more information, see tokenizer_option. |
| PARSER_PROPERTIES[=](parser_properties_list) | The properties of the tokenizer. This parameter is optional. For more information, see parser_properties. |
| index_option | The index options. You can specify multiple index options when you create an index. Separate the index options with spaces. For more information, see index_option. |
| partition_option | The options for creating index partitions. You can specify hash partitions, key partitions, range partitions, list partitions, or external table partitions. |
| index_column_group_option | The index options. For more information, see index_column_group_option. |
index_type
SPATIAL:This keyword specifies to create a spatial index.UNIQUE:This keyword specifies to create a unique index.FULLTEXT:This keyword specifies to create a full-text index.
For more information about how to create indexes of different types, see Create an index.
sort_column_key
-
column_name [(integer)] [ASC] [ID id]:This option specifies a column name as the sort key.-
column_name:The name of the column to be sorted. -
integer:The length limit of the sort key. This option is optional. -
ASC:This option specifies to sort in ascending order. Currently, descending (DESC) order is not supported. -
ID id:The ID of the sort key. This option is optional.
For example, create an index named
index3on thec1column of thet3table and specify that the index is sorted in ascending order.CREATE INDEX index3 ON t3 (c1 ASC); -
-
(index_expr) [ASC] [ID id]:This option specifies an index expression as the sort key. An index expression can be defined by using an expression or a function. It can contain the following options:-
(index_expr):The index expression. It can be a Boolean expression, such asc1=c1. Currently, seekdb does not support creating function indexes on generated columns. For information about the expressions supported by function indexes, see List of system functions supported by function indexes. -
ASC:This option specifies to sort in ascending order. Currently, descending (DESC) order is not supported. -
ID id:The ID of the sort key. This option is optional.
For example, create an index named
index4on thet4table, specifyc1+c2as the index expression, and specify that the index is sorted in ascending order.CREATE INDEX index4 ON t4 ((c1 + c2) ASC); -
You can specify multiple sort columns when you create an index. Separate the sort columns with commas (,). For example, create an index named index5 on the t5 table and specify that the index uses the c1 column and the c2+c3 expression as the sort keys.
CREATE INDEX index5 ON t5 (c1, (c2+c3));
tokenizer_option
-
SPACE:The default value. This option specifies to tokenize by using spaces. You can specify the following properties:Property Value Range min_token_size [1, 16] max_token_size [10, 84] -
NGRAM:This option specifies to tokenize by using the N-Gram (Chinese) tokenizer. You can specify the following properties:Property Value Range ngram_token_size [1, 10] -
NGRAM2:This option specifies to split text into continuous characters in themin_ngram_sizetomax_ngram_sizerange. You can specify the following properties:Property Value Range min_ngram_size [1, 16] max_ngram_size [1, 16] infoseekdb supports the
NGRAM2tokenizer. -
BENG:This option specifies to tokenize by using the Beng (basic English) tokenizer. You can specify the following properties:Property Value Range min_token_size [1, 16] max_token_size [10, 84] -
IK:This option specifies to tokenize by using the IK (Chinese) tokenizer. Currently, only theutf-8character set is supported. You can specify the following property:Property Value Range ik_mode smartmax_word
infoseekdb supports the
IKtokenizer.
You can use the TOKENIZE function to view the tokenization result of text in the specified tokenizer and JSON-formatted parameters.
parser_properties
-
min_token_size:The minimum token length. The default value is 3, and the value range is 1 to 16. -
max_token_size:The maximum token length. The default value is 84, and the value range is 10 to 84. -
ngram_token_size:The token length of theNGRAMtokenizer. This property is valid only for theNGRAMtokenizer. The default value is 2, and the value range is 1 to 10. -
ik_mode: The tokenization mode of theIKtokenizer. Valid values:smart: The default value. This mode uses the words in the dictionary to improve the accuracy of tokenization. The boundaries of the words in the dictionary are prioritized, which may reduce unnecessary expansion.max_word: This mode identifies the words defined in the dictionary but does not affect the maximum expansion of tokenization. Even if the dictionary defines words, themax_wordmode still tries to split the text into more words.
Here is an example:
-
Create a table named
tbl1.CREATE TABLE tbl1 (col1 VARCHAR(200), col2 TEXT); -
Create a full-text index on the
col2column of thetbl1table, specify theIKtokenizer, and set the tokenizer properties by usingPARSER_PROPERTIES.CREATE FULLTEXT INDEX fidx_tbl1 ON tbl1(col2) WITH PARSER IK PARSER_PROPERTIES=(ik_mode='max_word');
index_option
-
GLOBAL:This option specifies to create a global index. -
LOCAL:This option specifies to create a local index. -
BLOCK_SIZE integer:The size of an index block, that is, the number of bytes in each index block. -
COMMENT STRING_VALUE:The comment of the index. -
STORING (column_name [, column_name...]):The columns to be stored in the index. Separate the columns with commas (,). -
WITH_ROWID:This option specifies to create an index that contains the row identifier. -
WITH PARSER STRING_VALUE:The parser required by the index. -
index_using_algorithm:The algorithm used by the index. You can specify USING BTREE or USING HASH. However, you cannot create a multi-value index by using the USING HASH option. -
visibility_option:The visibility of the index. -
DATA_TABLE_ID data_table_id:The ID of the data table to which the index belongs. -
INDEX_TABLE_ID index_table_id:The ID of the index table. -
VIRTUAL_COLUMN_ID virtual_column_id:The ID of the virtual column. -
MAX_USED_PART_ID used_part_id:The maximum partition ID used by the index.
index_column_group_option
WITH COLUMN GROUP(all columns, each column): specifies to create a rowstore and columnstore redundant index.WITH COLUMN GROUP(each column): specifies to create a columnstore index.
Example
Run the following SQL statement to create a columnstore index for a table.
-
Run the following SQL statement to create a table named
test_tbl1.CREATE TABLE test_tbl1 (col1 INT, col2 VARCHAR(50)); -
Run the following SQL statement to create a columnstore index named
idx1_test_tbl1for thetest_tbl1table, which references thecol1column.CREATE INDEX idx1_test_tbl1 ON test_tbl1 (col1) WITH COLUMN GROUP(each column);