HNSW index
This topic describes how to create, search, and delete HNSW indexes in SeekDB.
Syntax and description
Create
The HNSW series index includes HNSW, HNSW_SQ, and HNSW_BQ indexes. You can create an HNSW series index when you create a table or after you create a table. When you create an HNSW series index, note the following:
- The
VECTORkeyword is required. - The parameters and their descriptions for creating an index after you create a table are the same as those for creating an index when you create a table.
- If the data volume is large, we recommend that you write the data and then create an index to obtain the best search performance.
- We recommend that you create an HNSW_SQ index after you write the data and rebuild the index after you write a large amount of incremental data. For more information about how to create an HNSW_SQ index, see the example in the following section.
- When you create an HNSW index, the index name cannot exceed 25 characters. Otherwise, an exception may be returned because the index auxiliary table name exceeds the
index_namelimit. In later versions, longer index names will be supported. - We recommend that you create an HNSW series index on a heap table to obtain the best search performance.
Create an index when you create a table
Syntax:
CREATE TABLE table_name (
column_name1 data_type1,
column_name2 VECTOR(dim),
...,
VECTOR INDEX index_name (column_name2) WITH (param1=value1, param2=value2, ...)
);
Create an index after you create a table
Syntax:
-- You can set the parallelism to improve the index construction performance. The maximum parallelism is the number of CPU cores multiplied by 2.
CREATE [/*+ paralell $value*/] VECTOR INDEX index_name ON table_name(column_name2) WITH (param1=value1, param2=value2, ...);
The following table describes the parameters.
| Parameter | Default value | Value range | Required | Description | Notes |
|---|---|---|---|---|---|
| distance | l2/inner_product/cosine | Yes | The type of vector distance algorithm. | l2 indicates the Euclidean distance, inner_product indicates the inner product distance, and cosine indicates the cosine distance. | |
| type | hnsw/hnsw_sq/hnsw_bq | Yes | The index type. | ||
| lib | vsag | vsag | No | The type of vector index library. | Only the VSAG vector library is supported. |
| m | 16 | [5,128] | No | The maximum number of neighbors for each node. | A larger value indicates a slower index construction but better search performance. |
| ef_construction | 200 | [5,1000] | No | The size of the candidate set during index construction. | A larger value indicates a slower index construction but better index quality. ef_construction must be greater than m. |
| ef_search | 64 | [1,1000] | No | The size of the candidate set during search. | A larger value indicates a slower search but higher recall rate. |
| extra_info_max_size | 0 | [0,16384] | No | The maximum size of the primary key information (unit: bytes). The primary key information is stored in the index to speed up the search. | 0: The primary key information is not stored.1: The primary key information is forcibly stored, and the size limit is ignored. In this case, the primary key type (see the following section) must be a supported type.Greater than 1: The maximum size of the primary key information (unit: bytes). In this case, the following conditions must be met:
|
| refine_k | 4.0 | [1.0,1000.0] | No | This parameter is specified only when you create an HNSW_BQ index. This parameter is of the float type and is used to adjust the reordering ratio of the quantized vector index. | You can specify this parameter when you create an index or during search:
|
| refine_type | sq8 | sq8/fp32 | No | This parameter is specified only when you create an HNSW_BQ index. This parameter specifies the construction accuracy of the quantized vector index. | This value improves the efficiency by reducing the memory overhead and index construction time during index construction, but may affect the recall rate. |
| bq_bits_query | 32 | 0/4/32 | No | This parameter is specified only when you create an HNSW_BQ index. This parameter specifies the search accuracy of the quantized vector index, in bits. | This value improves the efficiency by reducing the memory overhead and index construction time during index construction, but may affect the recall rate. |
| bq_use_fht | true | true/false | No | This parameter is specified only when you create an HNSW_BQ index. This parameter specifies whether to use FHT for search. FHT (Fast Hadamard Transform) is an algorithm for accelerating vector inner product calculation. |
The following primary key types are supported for the extra_info_max_size parameter:
- Numeric types: Integer, float, and Bit_value types.
- Date and time types
- Character types: VARCHAR type.
The size of the primary key information is calculated as follows:
SET @table_name = 'test'; -- Replace with the actual table name.
SELECT
CASE
WHEN COUNT(*) <> COUNT(result_value) THEN 'not support'
ELSE COALESCE(SUM(result_value), 'not support')
END AS extra_info_size
FROM (
SELECT
CASE
WHEN vdt.data_type_class IN (1, 2, 3, 4, 6, 8, 9, 14, 27, 28) THEN 8 -- The size of the primary key information of a numeric type is 8 bytes.
WHEN oc.data_type = 22 THEN oc.data_length -- The size of the primary key information of a VARCHAR type is the data length.
ELSE NULL -- The size of the primary key information of other types is not supported.
END AS result_value
FROM
oceanbase.__all_column oc
JOIN
oceanbase.__all_virtual_data_type vdt
ON
oc.data_type = vdt.data_type
WHERE
oc.rowkey_position != 0
AND oc.table_id = (SELECT table_id FROM oceanbase.__all_table WHERE table_name = @table_name)
) AS result_table;
-- The calculation result is 8 bytes.
Search
Search in the HNSW series index is an approximate nearest neighbor (ANN) search and does not guarantee 100% result accuracy. The recall rate is used to measure the accuracy of the search. For example, if 9 out of 10 nearest neighbors can be stably returned, the recall rate is 90%.
SELECT ... FROM table_name
ORDER BY distance_function(column_name, vector_expr) [APPROXIMATE|APPROX]
LIMIT num (OFFSET num);
Syntax requirements:
- The
APPROXIMATE/APPROXkeyword must be specified to use the vector index instead of a full table scan. - The
ORDER BYandLIMITclauses must be specified. - The
ORDER BYclause supports only one vector condition. - The value of
LIMIT + OFFSETmust be in the range (0, 16384]. - If the
LIMITclause is not specified, an error is returned.
Rules for using distance functions:
- If the
APPROXIMATE/APPROXkeyword is specified, the search uses the vector index if the distance function supported by the current version matches the vector index algorithm. - If the
APPROXIMATE/APPROXkeyword is specified, and the distance function does not match the vector index algorithm, the search does not use the vector index, but no error is returned. - If the
APPROXIMATE/APPROXkeyword is specified, and the distance function is not supported by the current version, the search does not use the vector index, and an error is returned. - If the
APPROXIMATE/APPROXkeyword is not specified, and the distance function is supported by the current version, the search does not use the vector index, but no error is returned.
Other considerations:
- The
WHEREcondition is used as a filter for vector index search. - The recall rate is affected by the build parameters and search parameters.
- Index search parameters can be set when creating the index. If you need to adjust the parameters after the index is created, you can modify the table-level attributes using DBMS_VECTOR.REBUILD_INDEX or specify the query-level parameter
ef_searchin the query.
Delete
The syntax for deleting a vector index is as follows:
DROP INDEX index_name ON table_name;
Create, search, and delete examples
Create during table creation
HNSW example
Create a test table.
CREATE TABLE t1(c1 INT, c0 INT, c2 VECTOR(10), c3 VECTOR(10), PRIMARY KEY(c1), VECTOR INDEX idx1(c2) WITH (distance=l2, type=hnsw, lib=vsag), VECTOR INDEX idx2(c3) WITH (distance=l2, type=hnsw, lib=vsag));
Write test data.
INSERT INTO t1 VALUES(1, 1,'[0.203846,0.205289,0.880265,0.824340,0.615737,0.496899,0.983632,0.865571,0.248373,0.542833]', '[0.203846,0.205289,0.880265,0.824340,0.615737,0.496899,0.983632,0.865571,0.248373,0.542833]');
INSERT INTO t1 VALUES(2, 2, '[0.735541,0.670776,0.903237,0.447223,0.232028,0.659316,0.765661,0.226980,0.579658,0.933939]', '[0.213846,0.205289,0.880265,0.824340,0.615737,0.496899,0.983632,0.865571,0.248373,0.542833]');
INSERT INTO t1 VALUES(3, 3, '[0.327936,0.048756,0.084670,0.389642,0.970982,0.370915,0.181664,0.940780,0.013905,0.628127]', '[0.223846,0.205289,0.880265,0.824340,0.615737,0.496899,0.983632,0.865571,0.248373,0.542833]');
Perform approximate nearest neighbor search.
SELECT * FROM t1 ORDER BY l2_distance(c2, [0.712338,0.603321,0.133444,0.428146,0.876387,0.763293,0.408760,0.765300,0.560072,0.900498]) APPROXIMATE LIMIT 1;
The returned result is as follows:
+----+------+-------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------+
| c1 | c0 | c2 | c3 |
+----+------+-------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------+
| 3 | 3 | [0.327936,0.048756,0.08467,0.389642,0.970982,0.370915,0.181664,0.94078,0.013905,0.628127] | [0.223846,0.205289,0.880265,0.82434,0.615737,0.496899,0.983632,0.865571,0.248373,0.542833] |
+----+------+-------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------+
1 row in set
HNSW_SQ example
CREATE TABLE t2 (c1 INT AUTO_INCREMENT, c2 VECTOR(3), PRIMARY KEY(c1), VECTOR INDEX idx1(c2) WITH (distance=l2, type=hnsw_sq, lib=vsag));
HNSW_BQ example
CREATE TABLE t3 (c1 INT AUTO_INCREMENT, c2 VECTOR(3), PRIMARY KEY(c1), VECTOR INDEX idx3(c2) WITH (distance=l2, type=hnsw_bq, lib=vsag));
The distance parameter of an HNSW_BQ index supports l2 and cosine.
Create after table creation
HNSW example
Create a test table.
CREATE TABLE vec_table_hnsw (id INT, c2 VECTOR(10));
Create an HNSW index.
CREATE VECTOR INDEX vec_idx1 ON vec_table_hnsw(c2) WITH (distance=l2, type=hnsw);
View the created table.
SHOW CREATE TABLE vec_table_hnsw;
The returned result is as follows:
+-----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table | Create Table |
+-----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| vec_table_hnsw | CREATE TABLE `vec_table_hnsw` (
`id` int(11) DEFAULT NULL,
`c2` VECTOR(10) DEFAULT NULL,
VECTOR KEY `vec_idx1` (`c2`) WITH (DISTANCE=L2, TYPE=HNSW, LIB=VSAG, M=16, EF_CONSTRUCTION=200, EF_SEARCH=64) BLOCK_SIZE 16384
) DEFAULT CHARSET = utf8mb4 ROW_FORMAT = DYNAMIC COMPRESSION = 'zstd_1.3.8' REPLICA_NUM = 2 BLOCK_SIZE = 16384 USE_BLOOM_FILTER = FALSE TABLET_SIZE = 134217728 PCTFREE = 0 |
+-----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set
View the created index.
SHOW INDEX FROM vec_table_hnsw;
The returned result is as follows:
+-----------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+-----------+---------------+---------+------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment | Visible | Expression |
+-----------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+-----------+---------------+---------+------------+
| vec_table | 1 | vec_idx1 | 1 | c2 | A | NULL | NULL | NULL | YES | VECTOR | available | | YES | NULL |
+-----------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+-----------+---------------+---------+------------+
1 row in set
HNSW_SQ example
Create a test table.
CREATE TABLE vec_table_hnsw_sq (c1 INT AUTO_INCREMENT, c2 VECTOR(3), PRIMARY KEY(c1));
Create an HNSW_SQ index.
CREATE VECTOR INDEX vec_idx2 ON vec_table_hnsw_sq(c2) WITH (distance=l2, type=hnsw_sq, lib=vsag, m=16, ef_construction = 200);
HNSW_BQ example
CREATE VECTOR INDEX vec_idx3 ON vec_table_hnsw_bq(c2) WITH (distance=l2, type=hnsw_bq, lib=vsag, m=16, ef_construction = 200);
The distance parameter of an HNSW_BQ index supports l2 and cosine.
Delete
DROP INDEX vec_idx1 ON vec_table;
View the deleted index.
SHOW INDEX FROM vec_table;
The returned result is as follows:
Empty set
References
- For information about how to estimate the memory usage of HNSW series indexes and how to query the actual memory usage, see Index memory management.
- For information about how to monitor and maintain HNSW series indexes, see Index monitoring and maintenance.