Skip to main content

HNSW index

This topic describes how to create, search, and delete HNSW indexes in SeekDB.

Syntax and description

Create

The HNSW series index includes HNSW, HNSW_SQ, and HNSW_BQ indexes. You can create an HNSW series index when you create a table or after you create a table. When you create an HNSW series index, note the following:

  • The VECTOR keyword is required.
  • The parameters and their descriptions for creating an index after you create a table are the same as those for creating an index when you create a table.
  • If the data volume is large, we recommend that you write the data and then create an index to obtain the best search performance.
  • We recommend that you create an HNSW_SQ index after you write the data and rebuild the index after you write a large amount of incremental data. For more information about how to create an HNSW_SQ index, see the example in the following section.
  • When you create an HNSW index, the index name cannot exceed 25 characters. Otherwise, an exception may be returned because the index auxiliary table name exceeds the index_name limit. In later versions, longer index names will be supported.
  • We recommend that you create an HNSW series index on a heap table to obtain the best search performance.

Create an index when you create a table

Syntax:

CREATE TABLE table_name (
column_name1 data_type1,
column_name2 VECTOR(dim),
...,
VECTOR INDEX index_name (column_name2) WITH (param1=value1, param2=value2, ...)
);

Create an index after you create a table

Syntax:

-- You can set the parallelism to improve the index construction performance. The maximum parallelism is the number of CPU cores multiplied by 2.
CREATE [/*+ paralell $value*/] VECTOR INDEX index_name ON table_name(column_name2) WITH (param1=value1, param2=value2, ...);

The following table describes the parameters.

ParameterDefault valueValue rangeRequiredDescriptionNotes
distancel2/inner_product/cosineYesThe type of vector distance algorithm.l2 indicates the Euclidean distance, inner_product indicates the inner product distance, and cosine indicates the cosine distance.
typehnsw/hnsw_sq/hnsw_bqYesThe index type.
libvsagvsagNoThe type of vector index library.Only the VSAG vector library is supported.
m16[5,128]NoThe maximum number of neighbors for each node.A larger value indicates a slower index construction but better search performance.
ef_construction200[5,1000]NoThe size of the candidate set during index construction.A larger value indicates a slower index construction but better index quality. ef_construction must be greater than m.
ef_search64[1,1000]NoThe size of the candidate set during search.A larger value indicates a slower search but higher recall rate.
extra_info_max_size0[0,16384]NoThe maximum size of the primary key information (unit: bytes). The primary key information is stored in the index to speed up the search.0: The primary key information is not stored.
1: The primary key information is forcibly stored, and the size limit is ignored. In this case, the primary key type (see the following section) must be a supported type.
Greater than 1: The maximum size of the primary key information (unit: bytes). In this case, the following conditions must be met:
  • The size of the primary key information (calculated as described in the following section) must be less than the set size limit.
  • The primary key type must be a supported type.
  • The table must have a primary key.
refine_k4.0[1.0,1000.0]NoThis parameter is specified only when you create an HNSW_BQ index.
This parameter is of the float type and is used to adjust the reordering ratio of the quantized vector index.
You can specify this parameter when you create an index or during search:
  • If you do not specify this parameter during search, the value specified when you create an index is used.
  • If you specify this parameter during search, the value specified during search is used.
refine_typesq8sq8/fp32NoThis parameter is specified only when you create an HNSW_BQ index.
This parameter specifies the construction accuracy of the quantized vector index.
This value improves the efficiency by reducing the memory overhead and index construction time during index construction, but may affect the recall rate.
bq_bits_query320/4/32NoThis parameter is specified only when you create an HNSW_BQ index.
This parameter specifies the search accuracy of the quantized vector index, in bits.
This value improves the efficiency by reducing the memory overhead and index construction time during index construction, but may affect the recall rate.
bq_use_fhttruetrue/falseNoThis parameter is specified only when you create an HNSW_BQ index.
This parameter specifies whether to use FHT for search. FHT (Fast Hadamard Transform) is an algorithm for accelerating vector inner product calculation.

The following primary key types are supported for the extra_info_max_size parameter:

The size of the primary key information is calculated as follows:

SET @table_name = 'test'; -- Replace with the actual table name.
SELECT
CASE
WHEN COUNT(*) <> COUNT(result_value) THEN 'not support'
ELSE COALESCE(SUM(result_value), 'not support')
END AS extra_info_size
FROM (
SELECT
CASE
WHEN vdt.data_type_class IN (1, 2, 3, 4, 6, 8, 9, 14, 27, 28) THEN 8 -- The size of the primary key information of a numeric type is 8 bytes.
WHEN oc.data_type = 22 THEN oc.data_length -- The size of the primary key information of a VARCHAR type is the data length.
ELSE NULL -- The size of the primary key information of other types is not supported.
END AS result_value
FROM
oceanbase.__all_column oc
JOIN
oceanbase.__all_virtual_data_type vdt
ON
oc.data_type = vdt.data_type
WHERE
oc.rowkey_position != 0
AND oc.table_id = (SELECT table_id FROM oceanbase.__all_table WHERE table_name = @table_name)
) AS result_table;
-- The calculation result is 8 bytes.

Search in the HNSW series index is an approximate nearest neighbor (ANN) search and does not guarantee 100% result accuracy. The recall rate is used to measure the accuracy of the search. For example, if 9 out of 10 nearest neighbors can be stably returned, the recall rate is 90%.

SELECT ... FROM table_name 
ORDER BY distance_function(column_name, vector_expr) [APPROXIMATE|APPROX]
LIMIT num (OFFSET num);

Syntax requirements:

  • The APPROXIMATE/APPROX keyword must be specified to use the vector index instead of a full table scan.
  • The ORDER BY and LIMIT clauses must be specified.
  • The ORDER BY clause supports only one vector condition.
  • The value of LIMIT + OFFSET must be in the range (0, 16384].
  • If the LIMIT clause is not specified, an error is returned.

Rules for using distance functions:

  • If the APPROXIMATE/APPROX keyword is specified, the search uses the vector index if the distance function supported by the current version matches the vector index algorithm.
  • If the APPROXIMATE/APPROX keyword is specified, and the distance function does not match the vector index algorithm, the search does not use the vector index, but no error is returned.
  • If the APPROXIMATE/APPROX keyword is specified, and the distance function is not supported by the current version, the search does not use the vector index, and an error is returned.
  • If the APPROXIMATE/APPROX keyword is not specified, and the distance function is supported by the current version, the search does not use the vector index, but no error is returned.

Other considerations:

  • The WHERE condition is used as a filter for vector index search.
  • The recall rate is affected by the build parameters and search parameters.
  • Index search parameters can be set when creating the index. If you need to adjust the parameters after the index is created, you can modify the table-level attributes using DBMS_VECTOR.REBUILD_INDEX or specify the query-level parameter ef_search in the query.

Delete

The syntax for deleting a vector index is as follows:

DROP INDEX index_name ON table_name;

Create, search, and delete examples

Create during table creation

HNSW example

Create a test table.

CREATE TABLE t1(c1 INT, c0 INT, c2 VECTOR(10), c3 VECTOR(10), PRIMARY KEY(c1), VECTOR INDEX idx1(c2) WITH (distance=l2, type=hnsw, lib=vsag),  VECTOR INDEX idx2(c3) WITH (distance=l2, type=hnsw, lib=vsag));

Write test data.

INSERT INTO t1 VALUES(1, 1,'[0.203846,0.205289,0.880265,0.824340,0.615737,0.496899,0.983632,0.865571,0.248373,0.542833]', '[0.203846,0.205289,0.880265,0.824340,0.615737,0.496899,0.983632,0.865571,0.248373,0.542833]');
INSERT INTO t1 VALUES(2, 2, '[0.735541,0.670776,0.903237,0.447223,0.232028,0.659316,0.765661,0.226980,0.579658,0.933939]', '[0.213846,0.205289,0.880265,0.824340,0.615737,0.496899,0.983632,0.865571,0.248373,0.542833]');
INSERT INTO t1 VALUES(3, 3, '[0.327936,0.048756,0.084670,0.389642,0.970982,0.370915,0.181664,0.940780,0.013905,0.628127]', '[0.223846,0.205289,0.880265,0.824340,0.615737,0.496899,0.983632,0.865571,0.248373,0.542833]');

Perform approximate nearest neighbor search.

SELECT * FROM t1 ORDER BY l2_distance(c2, [0.712338,0.603321,0.133444,0.428146,0.876387,0.763293,0.408760,0.765300,0.560072,0.900498]) APPROXIMATE LIMIT 1;

The returned result is as follows:

+----+------+-------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------+
| c1 | c0 | c2 | c3 |
+----+------+-------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------+
| 3 | 3 | [0.327936,0.048756,0.08467,0.389642,0.970982,0.370915,0.181664,0.94078,0.013905,0.628127] | [0.223846,0.205289,0.880265,0.82434,0.615737,0.496899,0.983632,0.865571,0.248373,0.542833] |
+----+------+-------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------+
1 row in set
HNSW_SQ example
CREATE TABLE t2 (c1 INT AUTO_INCREMENT, c2 VECTOR(3), PRIMARY KEY(c1), VECTOR INDEX idx1(c2) WITH (distance=l2, type=hnsw_sq, lib=vsag));
HNSW_BQ example
CREATE TABLE t3 (c1 INT AUTO_INCREMENT, c2 VECTOR(3), PRIMARY KEY(c1), VECTOR INDEX idx3(c2) WITH (distance=l2, type=hnsw_bq, lib=vsag));

The distance parameter of an HNSW_BQ index supports l2 and cosine.

Create after table creation

HNSW example

Create a test table.

CREATE TABLE vec_table_hnsw (id INT, c2 VECTOR(10));

Create an HNSW index.

CREATE VECTOR INDEX vec_idx1 ON vec_table_hnsw(c2) WITH (distance=l2, type=hnsw);

View the created table.

SHOW CREATE TABLE vec_table_hnsw;

The returned result is as follows:

+-----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table | Create Table |
+-----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| vec_table_hnsw | CREATE TABLE `vec_table_hnsw` (
`id` int(11) DEFAULT NULL,
`c2` VECTOR(10) DEFAULT NULL,
VECTOR KEY `vec_idx1` (`c2`) WITH (DISTANCE=L2, TYPE=HNSW, LIB=VSAG, M=16, EF_CONSTRUCTION=200, EF_SEARCH=64) BLOCK_SIZE 16384
) DEFAULT CHARSET = utf8mb4 ROW_FORMAT = DYNAMIC COMPRESSION = 'zstd_1.3.8' REPLICA_NUM = 2 BLOCK_SIZE = 16384 USE_BLOOM_FILTER = FALSE TABLET_SIZE = 134217728 PCTFREE = 0 |
+-----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set

View the created index.

SHOW INDEX FROM vec_table_hnsw;

The returned result is as follows:

+-----------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+-----------+---------------+---------+------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment | Visible | Expression |
+-----------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+-----------+---------------+---------+------------+
| vec_table | 1 | vec_idx1 | 1 | c2 | A | NULL | NULL | NULL | YES | VECTOR | available | | YES | NULL |
+-----------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+-----------+---------------+---------+------------+
1 row in set

HNSW_SQ example

Create a test table.

CREATE TABLE vec_table_hnsw_sq (c1 INT AUTO_INCREMENT, c2 VECTOR(3), PRIMARY KEY(c1));

Create an HNSW_SQ index.

CREATE VECTOR INDEX vec_idx2 ON vec_table_hnsw_sq(c2) WITH (distance=l2, type=hnsw_sq, lib=vsag, m=16, ef_construction = 200);
HNSW_BQ example
CREATE VECTOR INDEX vec_idx3 ON vec_table_hnsw_bq(c2) WITH (distance=l2, type=hnsw_bq, lib=vsag, m=16, ef_construction = 200);

The distance parameter of an HNSW_BQ index supports l2 and cosine.

Delete

DROP INDEX vec_idx1 ON vec_table;

View the deleted index.

SHOW INDEX FROM vec_table;

The returned result is as follows:

Empty set

References