IVF series index
This topic describes the syntax and examples of creating, searching, and dropping IVF series indexes in seekdb.
Syntax and description
Create
IVF series indexes include IVF and IVF_PQ indexes. You can create an IVF series index in two ways: create an index when you create a table and create an index after you create a table. When you create an index, note the following:
- The
VECTORkeyword must be specified. - The parameters and descriptions of an index created after you create a table are the same as those of an index created when you create a table.
- If the amount of data is large, we recommend that you write all the data and then create an index to obtain the best search performance.
- We recommend that you create an IVF or IVF_PQ index after you write all the data and rebuild the index after you write a large amount of incremental data. For more information about how to create an index, see the examples in the following sections.
- The length of the index name for an IVF index cannot exceed 33 characters. Otherwise, an exception may be returned because the name of the index auxiliary table exceeds the
index_namelimit. In later versions, the length of the index name will be supported.
Create an index when you create a table
Syntax:
CREATE TABLE table_name (
column_name1 data_type1,
column_name2 VECTOR(dim),
...,
VECTOR INDEX index_name (column_name2) WITH (param1=value1, param2=value2, ...)
);
Create an index after you create a table
Syntax:
-- You can set the parallelism to improve the index construction performance. The maximum value of the parallelism parameter is twice the number of CPU cores.
CREATE [/*+ paralell $value*/] VECTOR INDEX index_name ON table_name(column_name2) WITH (param1=value1, param2=value2, ...);
The following table describes the parameters.
| Parameter | Default Value | Value Range | Required | Description | Notes |
|---|---|---|---|---|---|
| distance | l2/inner_product/cosine | Yes | The type of vector distance algorithm. | l2 indicates the Euclidean distance, inner_product indicates the inner product distance, and cosine indicates the cosine distance. | |
| type | ivf_flat/ivf_pq | Yes | The type of IVF index. | ||
| lib | ob | ob | No | The type of vector index library. | |
| nlist | 128 | [1,65536] | No | The number of cluster centers. | |
| sample_per_nlist | 256 | [1,int64_max] | Yes | The number of samples for each cluster center. This parameter is used for an index created after you create a table. | |
| nbits | 8 | [1,24] | No | The number of bits for quantization. This parameter is applicable only to an IVF_PQ index. | We recommend that you set this parameter to 8. We recommend that you set this parameter to a value in the [8,10] range. A larger value indicates higher quantization accuracy and higher search accuracy, but also affects the search performance. |
| m | No default value. Must be specified. | [1,65536] | Yes | The dimension of the quantized vector. This parameter is applicable only to an IVF_PQ index. | A larger value indicates higher search accuracy but also affects the search performance and index construction performance. |
Search
IVF series indexes support approximate nearest neighbor (ANN) search, which does not guarantee 100% accuracy of the search results. The recall rate is used to measure the accuracy of the search results. For example, if you query 10 nearest neighbors and the search result can stably return 9 correct results, the recall rate is 90%.
SELECT ... FROM table_name
ORDER BY distance_function(column_name, vector_expr) [APPROXIMATE|APPROX]
LIMIT num (OFFSET num);
Syntax requirements:
- You must specify the
APPROXIMATE/APPROXkeyword. Otherwise, the search will not use the vector index but will perform a full table scan. - The
ORDER BYandLIMITclauses must be specified. - The
ORDER BYclause supports only a single vector condition. - The value of
LIMIT + OFFSETmust be in the range of(0, 16384]. - If you do not specify the
LIMITclause, an error is returned.
Rules for using distance functions:
- If you specify the
APPROXIMATE/APPROXkeyword, the search will use the vector index only if the distance function is supported in the current version and is compatible with the vector index algorithm. - If you specify the
APPROXIMATE/APPROXkeyword, the search will not use the vector index if the distance function is not compatible with the vector index algorithm, but no error will be returned. - If you specify the
APPROXIMATE/APPROXkeyword, the search will not use the vector index and an error will be returned if the distance function is not supported in the current version. - If you do not specify the
APPROXIMATE/APPROXkeyword, the search will not use the vector index, but no error will be returned if the distance function is supported in the current version.
Other considerations:
- The
WHEREclause specifies the filter conditions for the vector index search. - The recall rate is affected by the construction parameters and search parameters.
- The search parameters of an index are specified when you create the index and cannot be modified later. However, you can set the number of cluster centers for search by using the session variable
ob_ivf_nprobes. If you set this session variable, its value will be used for search. For more information about how to set this session variable, see ob_ivf_nprobes.
Drop
Syntax:
DROP INDEX index_name ON table_name;
Examples
Create an index when you create a table
IVF example
CREATE TABLE ivf_vecindex_suite_table_test (c1 INT, c2 VECTOR(3), PRIMARY KEY(c1), VECTOR INDEX idx2(c2) WITH (distance=l2, type=ivf_flat));
Create an index after you create a table
IVF example
Create a test table.
CREATE TABLE vec_table_ivf (c1 INT, c2 VECTOR(3), PRIMARY KEY(c1));
Create an IVF index.
CREATE VECTOR INDEX vec_idx3 ON vec_table_ivf(c2) WITH (distance=l2, type=ivf_flat);
Drop
DROP INDEX vec_idx1 ON vec_table;
Query the dropped index.
SHOW INDEX FROM vec_table;
The return result is as follows:
Empty set
References
- For more information about how to query the estimated memory and actual memory occupied by an IVF series index, see Index memory management.
- For more information about how to monitor and maintain an IVF series index, see Index monitoring and maintenance.
- For more information about how to optimize the performance of an IVF series index, see Index performance optimization.