Skip to main content

IVF series index

This topic describes the syntax and examples of creating, searching, and dropping IVF series indexes in seekdb.

Syntax and description

Create

IVF series indexes include IVF and IVF_PQ indexes. You can create an IVF series index in two ways: create an index when you create a table and create an index after you create a table. When you create an index, note the following:

  • The VECTOR keyword must be specified.
  • The parameters and descriptions of an index created after you create a table are the same as those of an index created when you create a table.
  • If the amount of data is large, we recommend that you write all the data and then create an index to obtain the best search performance.
  • We recommend that you create an IVF or IVF_PQ index after you write all the data and rebuild the index after you write a large amount of incremental data. For more information about how to create an index, see the examples in the following sections.
  • The length of the index name for an IVF index cannot exceed 33 characters. Otherwise, an exception may be returned because the name of the index auxiliary table exceeds the index_name limit. In later versions, the length of the index name will be supported.

Create an index when you create a table

Syntax:

CREATE TABLE table_name (
column_name1 data_type1,
column_name2 VECTOR(dim),
...,
VECTOR INDEX index_name (column_name2) WITH (param1=value1, param2=value2, ...)
);

Create an index after you create a table

Syntax:

-- You can set the parallelism to improve the index construction performance. The maximum value of the parallelism parameter is twice the number of CPU cores.
CREATE [/*+ paralell $value*/] VECTOR INDEX index_name ON table_name(column_name2) WITH (param1=value1, param2=value2, ...);

The following table describes the parameters.

ParameterDefault ValueValue RangeRequiredDescriptionNotes
distancel2/inner_product/cosineYesThe type of vector distance algorithm.l2 indicates the Euclidean distance, inner_product indicates the inner product distance, and cosine indicates the cosine distance.
typeivf_flat/ivf_pqYesThe type of IVF index.
libobobNoThe type of vector index library.
nlist128[1,65536]NoThe number of cluster centers.
sample_per_nlist256[1,int64_max]YesThe number of samples for each cluster center. This parameter is used for an index created after you create a table.
nbits8[1,24]NoThe number of bits for quantization.
This parameter is applicable only to an IVF_PQ index.
We recommend that you set this parameter to 8. We recommend that you set this parameter to a value in the [8,10] range. A larger value indicates higher quantization accuracy and higher search accuracy, but also affects the search performance.
mNo default value. Must be specified.[1,65536]YesThe dimension of the quantized vector.
This parameter is applicable only to an IVF_PQ index.
A larger value indicates higher search accuracy but also affects the search performance and index construction performance.

IVF series indexes support approximate nearest neighbor (ANN) search, which does not guarantee 100% accuracy of the search results. The recall rate is used to measure the accuracy of the search results. For example, if you query 10 nearest neighbors and the search result can stably return 9 correct results, the recall rate is 90%.

SELECT ... FROM table_name 
ORDER BY distance_function(column_name, vector_expr) [APPROXIMATE|APPROX]
LIMIT num (OFFSET num);

Syntax requirements:

  • You must specify the APPROXIMATE/APPROX keyword. Otherwise, the search will not use the vector index but will perform a full table scan.
  • The ORDER BY and LIMIT clauses must be specified.
  • The ORDER BY clause supports only a single vector condition.
  • The value of LIMIT + OFFSET must be in the range of (0, 16384].
  • If you do not specify the LIMIT clause, an error is returned.

Rules for using distance functions:

  • If you specify the APPROXIMATE/APPROX keyword, the search will use the vector index only if the distance function is supported in the current version and is compatible with the vector index algorithm.
  • If you specify the APPROXIMATE/APPROX keyword, the search will not use the vector index if the distance function is not compatible with the vector index algorithm, but no error will be returned.
  • If you specify the APPROXIMATE/APPROX keyword, the search will not use the vector index and an error will be returned if the distance function is not supported in the current version.
  • If you do not specify the APPROXIMATE/APPROX keyword, the search will not use the vector index, but no error will be returned if the distance function is supported in the current version.

Other considerations:

  • The WHERE clause specifies the filter conditions for the vector index search.
  • The recall rate is affected by the construction parameters and search parameters.
  • The search parameters of an index are specified when you create the index and cannot be modified later. However, you can set the number of cluster centers for search by using the session variable ob_ivf_nprobes. If you set this session variable, its value will be used for search. For more information about how to set this session variable, see ob_ivf_nprobes.

Drop

Syntax:

DROP INDEX index_name ON table_name;

Examples

Create an index when you create a table

IVF example

CREATE TABLE ivf_vecindex_suite_table_test (c1 INT, c2 VECTOR(3), PRIMARY KEY(c1), VECTOR INDEX idx2(c2) WITH (distance=l2, type=ivf_flat));

Create an index after you create a table

IVF example

Create a test table.

CREATE TABLE vec_table_ivf (c1 INT, c2 VECTOR(3), PRIMARY KEY(c1));

Create an IVF index.

CREATE VECTOR INDEX vec_idx3 ON vec_table_ivf(c2) WITH (distance=l2, type=ivf_flat);

Drop

DROP INDEX vec_idx1 ON vec_table;

Query the dropped index.

SHOW INDEX FROM vec_table;

The return result is as follows:

Empty set

References