Skip to main content
Version: V1.0.0

CREATE INDEX

Description

This statement is used to create an index. An index is a structure created on a table to sort the values of one or more columns in the database table. Its main purpose is to improve query speed and reduce the performance overhead of the database system.

In the current version of seekdb, the supported index types include unique and non-unique indexes. When creating an index, you can use the STORING(column_name,...) clause to specify redundant columns in the index table. In addition to the user-specified columns, the index table will automatically include the primary key columns of the primary table (or the Hidden PK if no primary key exists). Storing additional columns in the index provides the database optimizer with more options.

Limitations and considerations

  • If all index columns in a table are deleted, the created index becomes invalid.
  • Full-text indexes can be created only on columns of the CHAR, VARCHAR, or TEXT type.
  • Only local (LOCAL) full-text indexes can be created in the current version.
  • The UNIQUE keyword cannot be specified when creating a full-text index.
  • To create a full-text index that involves multiple columns, ensure that these columns have the same character set.
  • Only when you create a full-text index can you use WITH PARSER {SPACE | NGRAM | BENG | IK} to specify a tokenizer.

Permissions

To execute the CREATE INDEX statement, the current user must have the INDEX permission on the corresponding object. For more information about the seekdb permissions, see seekdb permission classification.

Syntax

CREATE [hint_options] [index_type] INDEX [IF NOT EXISTS] index_name
[USING BTREE] ON table_name (sort_column_key [, sort_column_key... ])
[WITH PARSER tokenizer_option]
[PARSER_PROPERTIES[=](parser_properties_list)]
[index_option...] [partition_option] [index_column_group_option]

index_type:
SPATIAL
| UNIQUE
| FULLTEXT

sort_column_key:
column_name [(integer)] [ASC] [ID id]
| (index_expr) [ASC] [ID id]

tokenizer_option:
SPACE
| NGRAM
| BENG
| IK
| NGRAM2

parser_properties_list:
parser_properties, [parser_properties]

parser_properties:
min_token_size = int_value
| max_token_size = int_value
| ngram_token_size = int_value
| ik_mode = 'char_value'
| min_ngram_size = int_value
| max_ngram_size = int_value

index_option:
GLOBAL
| LOCAL
| BLOCK_SIZE integer
| COMMENT STRING_VALUE
| STORING (column_name [, column_name...])
| WITH_ROWID
| WITH PARSER STRING_VALUE
| index_using_algorithm
| visibility_option
| DATA_TABLE_ID data_table_id
| INDEX_TABLE_ID index_table_id
| VIRTUAL_COLUMN_ID virtual_column_id
| MAX_USED_PART_ID used_part_id

index_column_group_option:
WITH COLUMN GROUP([all columns, ]each column)

Parameters

ParameterDescription
hint_optionsThe hint options. This parameter is optional. Currently, only the parallel option is supported, for example, /*+ parallel(10) */.
index_typeThe type of the index. This parameter is optional. For more information about the index types, see index_type.
IF NOT EXISTSIndicates whether to create the index if it already exists. If the index exists and the IF NOT EXISTS option is not specified, an error is returned.
index_nameThe name of the index to be created.
USING BTREEThe index type. This parameter is optional. Currently, only USING BTREE is supported.
table_nameThe name of the table on which to create the index. You can specify the table name directly or use the schema_name.table_name format to specify the table name and the name of the database to which the table belongs.
sort_column_keyThe key of the sorting column. You can specify multiple sorting columns when you create an index. Separate the sorting columns with commas (,). For more information, see sort_column_key.
WITH PARSER tokenizer_optionThe tokenizer for the full-text index. This parameter is optional. For more information, see tokenizer_option.
PARSER_PROPERTIES[=](parser_properties_list)The properties of the tokenizer. This parameter is optional. For more information, see parser_properties.
index_optionThe index options. You can specify multiple index options when you create an index. Separate the index options with spaces. For more information, see index_option.
partition_optionThe options for creating index partitions. You can specify hash partitions, key partitions, range partitions, list partitions, or external table partitions.
index_column_group_optionThe index options. For more information, see index_column_group_option.

index_type

  • SPATIAL:This keyword specifies to create a spatial index.
  • UNIQUE:This keyword specifies to create a unique index.
  • FULLTEXT:This keyword specifies to create a full-text index.

For more information about how to create indexes of different types, see Create an index.

sort_column_key

  • column_name [(integer)] [ASC] [ID id]:This option specifies a column name as the sort key.

    • column_name:The name of the column to be sorted.

    • integer:The length limit of the sort key. This option is optional.

    • ASC:This option specifies to sort in ascending order. Currently, descending (DESC) order is not supported.

    • ID id:The ID of the sort key. This option is optional.

    For example, create an index named index3 on the c1 column of the t3 table and specify that the index is sorted in ascending order.

    CREATE INDEX index3 ON t3 (c1 ASC);
  • (index_expr) [ASC] [ID id]:This option specifies an index expression as the sort key. An index expression can be defined by using an expression or a function. It can contain the following options:

    • (index_expr):The index expression. It can be a Boolean expression, such as c1=c1. Currently, seekdb does not support creating function indexes on generated columns. For information about the expressions supported by function indexes, see List of system functions supported by function indexes.

    • ASC:This option specifies to sort in ascending order. Currently, descending (DESC) order is not supported.

    • ID id:The ID of the sort key. This option is optional.

    For example, create an index named index4 on the t4 table, specify c1+c2 as the index expression, and specify that the index is sorted in ascending order.

    CREATE INDEX index4 ON t4 ((c1 + c2) ASC);

You can specify multiple sort columns when you create an index. Separate the sort columns with commas (,). For example, create an index named index5 on the t5 table and specify that the index uses the c1 column and the c2+c3 expression as the sort keys.

CREATE INDEX index5 ON t5 (c1, (c2+c3));

tokenizer_option

  • SPACE:The default value. This option specifies to tokenize by using spaces. You can specify the following properties:

    PropertyValue Range
    min_token_size[1, 16]
    max_token_size[10, 84]
  • NGRAM:This option specifies to tokenize by using the N-Gram (Chinese) tokenizer. You can specify the following properties:

    PropertyValue Range
    ngram_token_size[1, 10]
  • NGRAM2:This option specifies to split text into continuous characters in the min_ngram_size to max_ngram_size range. You can specify the following properties:

    PropertyValue Range
    min_ngram_size[1, 16]
    max_ngram_size[1, 16]
    info

    seekdb supports the NGRAM2 tokenizer.

  • BENG:This option specifies to tokenize by using the Beng (basic English) tokenizer. You can specify the following properties:

    PropertyValue Range
    min_token_size[1, 16]
    max_token_size[10, 84]
  • IK:This option specifies to tokenize by using the IK (Chinese) tokenizer. Currently, only the utf-8 character set is supported. You can specify the following property:

    PropertyValue Range
    ik_mode
    • smart
    • max_word
    info

    seekdb supports the IK tokenizer.

You can use the TOKENIZE function to view the tokenization result of text in the specified tokenizer and JSON-formatted parameters.

parser_properties

  • min_token_size:The minimum token length. The default value is 3, and the value range is 1 to 16.

  • max_token_size:The maximum token length. The default value is 84, and the value range is 10 to 84.

  • ngram_token_size:The token length of the NGRAM tokenizer. This property is valid only for the NGRAM tokenizer. The default value is 2, and the value range is 1 to 10.

  • ik_mode: The tokenization mode of the IK tokenizer. Valid values:

    • smart: The default value. This mode uses the words in the dictionary to improve the accuracy of tokenization. The boundaries of the words in the dictionary are prioritized, which may reduce unnecessary expansion.
    • max_word: This mode identifies the words defined in the dictionary but does not affect the maximum expansion of tokenization. Even if the dictionary defines words, the max_word mode still tries to split the text into more words.

Here is an example:

  1. Create a table named tbl1.

    CREATE TABLE tbl1 (col1 VARCHAR(200), col2 TEXT);
  2. Create a full-text index on the col2 column of the tbl1 table, specify the IK tokenizer, and set the tokenizer properties by using PARSER_PROPERTIES.

    CREATE FULLTEXT INDEX fidx_tbl1 ON tbl1(col2) WITH PARSER IK PARSER_PROPERTIES=(ik_mode='max_word');

index_option

  • GLOBAL:This option specifies to create a global index.

  • LOCAL:This option specifies to create a local index.

  • BLOCK_SIZE integer:The size of an index block, that is, the number of bytes in each index block.

  • COMMENT STRING_VALUE:The comment of the index.

  • STORING (column_name [, column_name...]):The columns to be stored in the index. Separate the columns with commas (,).

  • WITH_ROWID:This option specifies to create an index that contains the row identifier.

  • WITH PARSER STRING_VALUE:The parser required by the index.

  • index_using_algorithm:The algorithm used by the index. You can specify USING BTREE or USING HASH. However, you cannot create a multi-value index by using the USING HASH option.

  • visibility_option:The visibility of the index.

  • DATA_TABLE_ID data_table_id:The ID of the data table to which the index belongs.

  • INDEX_TABLE_ID index_table_id:The ID of the index table.

  • VIRTUAL_COLUMN_ID virtual_column_id:The ID of the virtual column.

  • MAX_USED_PART_ID used_part_id:The maximum partition ID used by the index.

index_column_group_option

  • WITH COLUMN GROUP(all columns, each column) : specifies to create a rowstore and columnstore redundant index.
  • WITH COLUMN GROUP(each column) : specifies to create a columnstore index.

Example

Run the following SQL statement to create a columnstore index for a table.

  1. Run the following SQL statement to create a table named test_tbl1.

    CREATE TABLE test_tbl1 (col1 INT, col2 VARCHAR(50));
  2. Run the following SQL statement to create a columnstore index named idx1_test_tbl1 for the test_tbl1 table, which references the col1 column.

    CREATE INDEX idx1_test_tbl1 ON test_tbl1 (col1) WITH COLUMN GROUP(each column);

References