Skip to main content

In-memory sparse vector index

This topic describes how to create, query, and use in-memory sparse vector indexes in seekdb.

Overview

In-memory sparse vector indexes are an efficient index type provided by seekdb for sparse vector data (vectors where most elements are zero). In-memory sparse vector indexes must be fully loaded into memory and support DML and real-time queries.

To improve the query performance of sparse vectors, seekdb integrates the sparse vector index (SINDI) from the VSAG algorithm library. This index performs better than disk-based sparse vector indexes and is suitable for use when memory resources are sufficient.

Feature support

In-memory sparse vector indexes support the following features:

ModuleFeatureDescription
DDLCreate a sparse vector index during table creationYou can create a sparse vector index on a SPARSEVECTOR column when creating a table. The maximum supported dimension is 500,000.
DDLCreate a sparse vector index after table creationSupports creating a sparse vector index on a SPARSEVECTOR column of an existing table. The maximum supported dimension is 500,000.
DMLInsert, update, deleteThe syntax for DML operations is exactly the same as that for regular vector indexes.
RetrievalVector retrievalSupports retrieval using SQL functions.
RetrievalQuery parametersSupports setting query-level parameters through the parameters clause during retrieval.
DBMS_VECTORREFRESH_INDEXPerforms incremental index refresh.
DBMS_VECTORREBUILD_INDEXPerforms full index rebuild.

Index memory estimation and actual usage query

Supports index memory estimation through the DBMS_VECTOR system package. The usage is the same as that for dense indexes. Here, only the special requirements for sparse vector indexes are described:

  • The IDX_TYPE parameter must be set to SINDI, case-insensitive.

Creation syntax and description

In-memory sparse vector indexes support two creation methods: creation during table creation and creation after table creation. When creating an index, note the following:

  • The maximum supported dimension for columns on which sparse vector indexes are created is 500,000.
  • Sparse vector indexes must be created on columns of the SPARSEVECTOR type.
  • The VECTOR keyword is required when creating an index.
  • The index type must be set to sindi, which indicates creating an in-memory sparse vector index.
  • Only the inner_product (inner product) distance algorithm is supported.
  • The parameters and descriptions for an index created after table creation are the same as those for an index created during table creation.

Create during table creation

Supports using the CREATE TABLE statement to create a sparse vector index.

Syntax

CREATE TABLE table_name (
column_name1 data_type1,
column_name2 SPARSEVECTOR,
...,
VECTOR INDEX index_name (column_name2) WITH (param1=value1, param2=value2, ...)
);

Parameter description

ParameterDefault valueValue rangeRequiredDescriptionRemarks
distanceinner_productYesSpecifies the vector distance algorithm type.Sparse vector indexes support only inner product (inner_product) as the distance algorithm.
typesindiYesSpecifies the index algorithm type.Indicates creating an in-memory sparse vector index.
libvsagvsagNoSpecifies the vector index library type.Currently, only the VSAG vector library is supported.
prunefalsetrue/falseNoWhether to perform pruning on vectors.When prune is true, you need to set the refine and drop_ratio_build parameters. When prune is false, full-precision retrieval can be provided. If refine is set to true or drop_ratio_build is not 0, an error will be returned.
refinefalsetrue/falseNoWhether reranking is needed.When set to true, the original sparse vectors are retrieved for the search results to perform high-precision distance calculation and reranking, which means an additional copy of the original vector data needs to be stored. Can be set only when prune=true.
drop_ratio_build0[0, 0.9]NoThe pruning ratio for sparse vector data.When a new sparse vector is inserted, the query_length * drop_ratio_build smallest values are pruned based on value size. If refine is true, the original vector data is preserved. Otherwise, only the pruned data is retained. Can be set only when prune=true.
drop_ratio_search0[0, 0.9]NoThe pruning ratio for sparse vector values during retrieval.The larger the value, the more pruning is performed, the lower the accuracy, and the higher the performance. Can also be set through the parameters clause during retrieval, and query parameters have higher priority.
refine_k4.0[1.0, 1000.0]NoIndicates the proportion of results participating in reranking.Retrieves limit_k * refine_k results and obtains the original vectors for reranking. Meaningful only when refine=true. Can also be set through the parameters clause during retrieval, and query parameters have higher priority.

Create after table creation

Supports creating a sparse vector index on a SPARSEVECTOR column of an existing table.

Syntax

CREATE VECTOR INDEX index_name ON table_name(column_name) WITH (param1=value1, param2=value2, ...);

Parameter description

The parameter description is the same as that for creating an index during table creation. For details, see the section above.

Create, update, and delete examples

Create during table creation

Create the test table sparse_t1 and create a sparse vector index:

CREATE TABLE sparse_t1 (
c1 INT PRIMARY KEY,
c2 SPARSEVECTOR,
VECTOR INDEX sparse_idx1(c2)
WITH (lib=vsag, type=sindi, distance=inner_product)
);

Insert sparse vector data into the test table:

INSERT INTO sparse_t1 VALUES(1, '{1:0.1, 2:0.2, 3:0.3}');
INSERT INTO sparse_t1 VALUES(2, '{3:0.3, 2:0.2, 4:0.4}');
INSERT INTO sparse_t1 VALUES(3, '{3:0.3, 4:0.4, 5:0.5}');

Query the test table:

SELECT * FROM sparse_t1;

The return result is as follows:

+----+---------------------+
| c1 | c2 |
+----+---------------------+
| 1 | {1:0.1,2:0.2,3:0.3} |
| 2 | {2:0.2,3:0.3,4:0.4} |
| 3 | {3:0.3,4:0.4,5:0.5} |
+----+---------------------+
3 rows in set

Create after table creation

Create a sparse vector index after creating the test table:

CREATE TABLE sparse_t2 (
c1 INT PRIMARY KEY,
c2 SPARSEVECTOR
);

CREATE VECTOR INDEX sparse_idx2 ON sparse_t2(c2)
WITH (lib=vsag, type=sindi, distance=inner_product,
prune=true, refine=true, drop_ratio_build=0.1,
drop_ratio_search=0.5, refine_k=2.0);

Insert sparse vector data into the test table:

INSERT INTO sparse_t2 VALUES(1, '{1:0.1, 2:0.2, 3:0.3}');

Query the test table:

SELECT * FROM sparse_t2;

The return result is as follows:

+----+---------------------+
| c1 | c2 |
+----+---------------------+
| 1 | {1:0.1,2:0.2,3:0.3} |
+----+---------------------+
1 row in set

Update

When updating sparse vector data, the index is automatically maintained:

UPDATE sparse_t1 SET c2 = '{1:0.1}' WHERE c1 = 1;

Delete

The delete operation is the same as that for regular vector indexes. You can directly delete the data:

DELETE FROM sparse_t1 WHERE c1 = 1;

Retrieval

The retrieval syntax for sparse vector indexes is similar to that for dense vector indexes, using the APPROXIMATE/APPROX keyword for approximate nearest neighbor retrieval.

Syntax

SELECT ... FROM table_name 
ORDER BY inner_product(column_name, query_vector) [APPROXIMATE|APPROX]
LIMIT n [PARAMETERS(param1=value1, param2=value2)];

Where:

  • column_name: The SPARSEVECTOR column specified when creating the sparse vector index.
  • query_vector: The query vector, which can be a string in sparse vector format, such as '{1:2.4, 3:1.5}'.
  • n: The number of result rows to return.
  • PARAMETERS: Optional query-level parameters for setting drop_ratio_search and refine_k.

Retrieval usage notes

For detailed requirements, see Dense vector index. Here, only the special requirements for sparse vector indexes are described:

  • Query parameter priority: Query-level parameters set by PARAMETERS > Query parameters set when building the index > Default values.
  • drop_ratio_search: Value range [0, 0.9], default value 0. The pruning ratio for sparse vector values during retrieval. The larger the value, the more pruning is performed, the lower the accuracy, and the higher the performance. Prunes the query_length * drop_ratio_search smallest values based on value size. Since pruning all values is meaningless, at least one value is always retained.
  • refine_k: Value range [1.0, 1000.0], default value 4.0. Indicates the proportion of results participating in reranking. Queries limit_k * refine_k results and obtains the original vectors for reranking. Effective only when refine=true.

Usage examples

Regular query

CREATE TABLE t1 (
c1 INT PRIMARY KEY,
c2 SPARSEVECTOR,
VECTOR INDEX idx1(c2)
WITH (lib=vsag, type=sindi, distance=inner_product)
);

INSERT INTO t1 VALUES(1, '{1:0.1, 2:0.2, 3:0.3}');
INSERT INTO t1 VALUES(2, '{3:0.3, 2:0.2, 4:0.4}');
INSERT INTO t1 VALUES(3, '{3:0.3, 4:0.4, 5:0.5}');
INSERT INTO t1 VALUES(4, '{5:0.5, 4:0.4, 6:0.6}');
INSERT INTO t1 VALUES(5, '{5:0.5, 6:0.6, 7:0.7}');

SELECT * FROM t1
ORDER BY negative_inner_product(c2, '{3:0.3, 4:0.4}')
APPROXIMATE LIMIT 4;

The return result is as follows:

+----+---------------------+
| c1 | c2 |
+----+---------------------+
| 2 | {2:0.2,3:0.3,4:0.4} |
| 3 | {3:0.3,4:0.4,5:0.5} |
| 4 | {4:0.4,5:0.5,6:0.6} |
| 1 | {1:0.1,2:0.2,3:0.3} |
+----+---------------------+

Use query parameters

SELECT *, negative_inner_product(c2, '{3:0.3, 4:0.4}')  
AS score FROM t1
ORDER BY score APPROXIMATE LIMIT 4
PARAMETERS(drop_ratio_search=0.5);

The return result is as follows:

+----+---------------------+---------------------+
| c1 | c2 | score |
+----+---------------------+---------------------+
| 4 | {4:0.4,5:0.5,6:0.6} | -0.1600000113248825 |
| 3 | {3:0.3,4:0.4,5:0.5} | -0.2500000149011612 |
| 2 | {2:0.2,3:0.3,4:0.4} | -0.2500000149011612 |
+----+---------------------+---------------------+
3 rows in set

Index monitoring and maintenance

In-memory sparse vector indexes provide monitoring views and support using the DBMS_VECTOR system package for index maintenance, including incremental refresh and full rebuild. The usage is the same as that for dense indexes.