Version: V1.0.0

CREATE INDEX

Description

This statement is used to create an index. An index is a structure created on a table to sort the values of one or more columns in the database table. Its main purpose is to improve query speed and reduce the performance overhead of the database system.

In the current version of seekdb, the supported index types include unique and non-unique indexes. When creating an index, you can use the STORING(column_name,...) clause to specify redundant columns in the index table. In addition to the user-specified columns, the index table will automatically include the primary key columns of the primary table (or the Hidden PK if no primary key exists). Storing additional columns in the index provides the database optimizer with more options.

Limitations and considerations

If all index columns in a table are deleted, the created index becomes invalid.
Full-text indexes can be created only on columns of the CHAR, VARCHAR, or TEXT type.
Only local (LOCAL) full-text indexes can be created in the current version.
The UNIQUE keyword cannot be specified when creating a full-text index.
To create a full-text index that involves multiple columns, ensure that these columns have the same character set.
Only when you create a full-text index can you use WITH PARSER {SPACE | NGRAM | BENG | IK} to specify a tokenizer.

Permissions

To execute the CREATE INDEX statement, the current user must have the INDEX permission on the corresponding object. For more information about the seekdb permissions, see seekdb permission classification.

Syntax

CREATE [hint_options] [index_type] INDEX [IF NOT EXISTS] index_name
    [USING BTREE] ON table_name (sort_column_key [, sort_column_key... ])
    [WITH PARSER tokenizer_option]
    [PARSER_PROPERTIES[=](parser_properties_list)]
    [index_option...] [partition_option] [index_column_group_option]

index_type:
    SPATIAL
    | UNIQUE
    | FULLTEXT

sort_column_key:
      column_name [(integer)] [ASC] [ID id]
    | (index_expr) [ASC] [ID id]

tokenizer_option:
    SPACE
    | NGRAM
    | BENG
    | IK
    | NGRAM2

parser_properties_list:
    parser_properties, [parser_properties]

parser_properties:
    min_token_size = int_value
    | max_token_size = int_value
    | ngram_token_size = int_value
    | ik_mode = 'char_value'
    | min_ngram_size = int_value
    | max_ngram_size = int_value

index_option:
      GLOBAL
    | LOCAL
    | BLOCK_SIZE integer
    | COMMENT STRING_VALUE
    | STORING (column_name [, column_name...])
    | WITH_ROWID
    | WITH PARSER STRING_VALUE
    | index_using_algorithm
    | visibility_option
    | DATA_TABLE_ID data_table_id
    | INDEX_TABLE_ID index_table_id
    | VIRTUAL_COLUMN_ID virtual_column_id
    | MAX_USED_PART_ID used_part_id

index_column_group_option：
    WITH COLUMN GROUP([all columns, ]each column)

Parameters

Parameter	Description
hint_options	The hint options. This parameter is optional. Currently, only the `parallel` option is supported, for example, `/+ parallel(10) /`.
index_type	The type of the index. This parameter is optional. For more information about the index types, see index_type.
IF NOT EXISTS	Indicates whether to create the index if it already exists. If the index exists and the `IF NOT EXISTS` option is not specified, an error is returned.
index_name	The name of the index to be created.
USING BTREE	The index type. This parameter is optional. Currently, only `USING BTREE` is supported.
table_name	The name of the table on which to create the index. You can specify the table name directly or use the `schema_name.table_name` format to specify the table name and the name of the database to which the table belongs.
sort_column_key	The key of the sorting column. You can specify multiple sorting columns when you create an index. Separate the sorting columns with commas (`,`). For more information, see sort_column_key.
WITH PARSER tokenizer_option	The tokenizer for the full-text index. This parameter is optional. For more information, see tokenizer_option.
PARSER_PROPERTIES[=](parser_properties_list)	The properties of the tokenizer. This parameter is optional. For more information, see parser_properties.
index_option	The index options. You can specify multiple index options when you create an index. Separate the index options with spaces. For more information, see index_option.
partition_option	The options for creating index partitions. You can specify hash partitions, key partitions, range partitions, list partitions, or external table partitions.
index_column_group_option	The index options. For more information, see index_column_group_option.

index_type

SPATIAL：This keyword specifies to create a spatial index.
UNIQUE：This keyword specifies to create a unique index.
FULLTEXT：This keyword specifies to create a full-text index.

For more information about how to create indexes of different types, see Create an index.

sort_column_key

column_name [(integer)] [ASC] [ID id]：This option specifies a column name as the sort key.
- column_name：The name of the column to be sorted.
- integer：The length limit of the sort key. This option is optional.
- ASC：This option specifies to sort in ascending order. Currently, descending (DESC) order is not supported.
- ID id：The ID of the sort key. This option is optional.
For example, create an index named index3 on the c1 column of the t3 table and specify that the index is sorted in ascending order.
```
CREATE INDEX index3 ON t3 (c1 ASC);
```
(index_expr) [ASC] [ID id]：This option specifies an index expression as the sort key. An index expression can be defined by using an expression or a function. It can contain the following options:
- (index_expr)：The index expression. It can be a Boolean expression, such as c1=c1. Currently, seekdb does not support creating function indexes on generated columns. For information about the expressions supported by function indexes, see List of system functions supported by function indexes.
- ASC：This option specifies to sort in ascending order. Currently, descending (DESC) order is not supported.
- ID id：The ID of the sort key. This option is optional.
For example, create an index named index4 on the t4 table, specify c1+c2 as the index expression, and specify that the index is sorted in ascending order.
```
CREATE INDEX index4 ON t4 ((c1 + c2) ASC);
```

You can specify multiple sort columns when you create an index. Separate the sort columns with commas (,). For example, create an index named index5 on the t5 table and specify that the index uses the c1 column and the c2+c3 expression as the sort keys.

CREATE INDEX index5 ON t5 (c1, (c2+c3));

tokenizer_option

SPACE：The default value. This option specifies to tokenize by using spaces. You can specify the following properties:

Property Value Range
min_token_size [1, 16]
max_token_size [10, 84]
NGRAM：This option specifies to tokenize by using the N-Gram (Chinese) tokenizer. You can specify the following properties:

Property Value Range
ngram_token_size [1, 10]
NGRAM2：This option specifies to split text into continuous characters in the min_ngram_size to max_ngram_size range. You can specify the following properties:

Property Value Range
min_ngram_size [1, 16]
max_ngram_size [1, 16]

info
seekdb supports the NGRAM2 tokenizer.
BENG：This option specifies to tokenize by using the Beng (basic English) tokenizer. You can specify the following properties:

Property Value Range
min_token_size [1, 16]
max_token_size [10, 84]
IK：This option specifies to tokenize by using the IK (Chinese) tokenizer. Currently, only the utf-8 character set is supported. You can specify the following property:
Property Value Range
ik_mode
smart
max_word
info
seekdb supports the IK tokenizer.

Property	Value Range
min_token_size	[1, 16]
max_token_size	[10, 84]

Property	Value Range
ngram_token_size	[1, 10]

Property	Value Range
min_ngram_size	[1, 16]
max_ngram_size	[1, 16]

Property	Value Range
min_token_size	[1, 16]
max_token_size	[10, 84]

Property	Value Range
ik_mode	`smart` `max_word`

You can use the TOKENIZE function to view the tokenization result of text in the specified tokenizer and JSON-formatted parameters.

parser_properties

min_token_size：The minimum token length. The default value is 3, and the value range is 1 to 16.
max_token_size：The maximum token length. The default value is 84, and the value range is 10 to 84.
ngram_token_size：The token length of the NGRAM tokenizer. This property is valid only for the NGRAM tokenizer. The default value is 2, and the value range is 1 to 10.
ik_mode: The tokenization mode of the IK tokenizer. Valid values:
- smart: The default value. This mode uses the words in the dictionary to improve the accuracy of tokenization. The boundaries of the words in the dictionary are prioritized, which may reduce unnecessary expansion.
- max_word: This mode identifies the words defined in the dictionary but does not affect the maximum expansion of tokenization. Even if the dictionary defines words, the max_word mode still tries to split the text into more words.

Here is an example:

Create a table named tbl1.

CREATE TABLE tbl1 (col1 VARCHAR(200), col2 TEXT);

Create a full-text index on the col2 column of the tbl1 table, specify the IK tokenizer, and set the tokenizer properties by using PARSER_PROPERTIES.
```
CREATE FULLTEXT INDEX fidx_tbl1 ON tbl1(col2) WITH PARSER IK PARSER_PROPERTIES=(ik_mode='max_word');
```

index_option

GLOBAL：This option specifies to create a global index.
LOCAL：This option specifies to create a local index.
BLOCK_SIZE integer：The size of an index block, that is, the number of bytes in each index block.
COMMENT STRING_VALUE：The comment of the index.
STORING (column_name [, column_name...])：The columns to be stored in the index. Separate the columns with commas (,).
WITH_ROWID：This option specifies to create an index that contains the row identifier.
WITH PARSER STRING_VALUE：The parser required by the index.
index_using_algorithm：The algorithm used by the index. You can specify USING BTREE or USING HASH. However, you cannot create a multi-value index by using the USING HASH option.
visibility_option：The visibility of the index.
DATA_TABLE_ID data_table_id：The ID of the data table to which the index belongs.
INDEX_TABLE_ID index_table_id：The ID of the index table.
VIRTUAL_COLUMN_ID virtual_column_id：The ID of the virtual column.
MAX_USED_PART_ID used_part_id：The maximum partition ID used by the index.

index_column_group_option

WITH COLUMN GROUP(all columns, each column) : specifies to create a rowstore and columnstore redundant index.
WITH COLUMN GROUP(each column) : specifies to create a columnstore index.

Example

Run the following SQL statement to create a columnstore index for a table.

Run the following SQL statement to create a table named test_tbl1.
```
CREATE TABLE test_tbl1 (col1 INT, col2 VARCHAR(50));
```
Run the following SQL statement to create a columnstore index named idx1_test_tbl1 for the test_tbl1 table, which references the col1 column.
```
CREATE INDEX idx1_test_tbl1 ON test_tbl1 (col1) WITH COLUMN GROUP(each column);
```

CREATE INDEX

Description

Limitations and considerations

Permissions

Syntax

Parameters

index_type

sort_column_key

tokenizer_option

parser_properties

index_option

index_column_group_option

Example

References

Contents

Description​

Limitations and considerations​

Permissions​

Syntax​

Parameters​

index_type​

sort_column_key​

tokenizer_option​

parser_properties​

index_option​

index_column_group_option​

Example​

References​

Contents

Description

Limitations and considerations

Permissions

Syntax

Parameters

index_type

sort_column_key

tokenizer_option

parser_properties

index_option

index_column_group_option

Example

References