In-memory sparse vector index

This topic describes how to create, query, and use in-memory sparse vector indexes in seekdb.

Overview

In-memory sparse vector indexes are an efficient index type provided by seekdb for sparse vector data (vectors where most elements are zero). In-memory sparse vector indexes must be fully loaded into memory and support DML and real-time queries.

To improve the query performance of sparse vectors, seekdb integrates the sparse vector index (SINDI) from the VSAG algorithm library. This index performs better than disk-based sparse vector indexes and is suitable for use when memory resources are sufficient.

Feature support

In-memory sparse vector indexes support the following features:

Module	Feature	Description
DDL	Create a sparse vector index during table creation	You can create a sparse vector index on a `SPARSEVECTOR` column when creating a table. The maximum supported dimension is 500,000.
DDL	Create a sparse vector index after table creation	Supports creating a sparse vector index on a `SPARSEVECTOR` column of an existing table. The maximum supported dimension is 500,000.
DML	Insert, update, delete	The syntax for DML operations is exactly the same as that for regular vector indexes.
Retrieval	Vector retrieval	Supports retrieval using SQL functions.
Retrieval	Query parameters	Supports setting query-level parameters through the `parameters` clause during retrieval.
DBMS_VECTOR	`REFRESH_INDEX`	Performs incremental index refresh.
DBMS_VECTOR	`REBUILD_INDEX`	Performs full index rebuild.

Index memory estimation and actual usage query

Supports index memory estimation through the DBMS_VECTOR system package. The usage is the same as that for dense indexes. Here, only the special requirements for sparse vector indexes are described:

The IDX_TYPE parameter must be set to SINDI, case-insensitive.

Creation syntax and description

In-memory sparse vector indexes support two creation methods: creation during table creation and creation after table creation. When creating an index, note the following:

The maximum supported dimension for columns on which sparse vector indexes are created is 500,000.
Sparse vector indexes must be created on columns of the SPARSEVECTOR type.
The VECTOR keyword is required when creating an index.
The index type must be set to sindi, which indicates creating an in-memory sparse vector index.
Only the inner_product (inner product) distance algorithm is supported.
The parameters and descriptions for an index created after table creation are the same as those for an index created during table creation.

Create during table creation

Supports using the CREATE TABLE statement to create a sparse vector index.

Syntax

CREATE TABLE table_name (
    column_name1 data_type1,
    column_name2 SPARSEVECTOR,
    ...,
    VECTOR INDEX index_name (column_name2) WITH (param1=value1, param2=value2, ...)
);

Parameter description

Parameter	Default value	Value range	Required	Description	Remarks
`distance`		`inner_product`	Yes	Specifies the vector distance algorithm type.	Sparse vector indexes support only inner product (`inner_product`) as the distance algorithm.
`type`		`sindi`	Yes	Specifies the index algorithm type.	Indicates creating an in-memory sparse vector index.
`lib`	`vsag`	`vsag`	No	Specifies the vector index library type.	Currently, only the VSAG vector library is supported.
`prune`	`false`	`true`/`false`	No	Whether to perform pruning on vectors.	When `prune` is `true`, you need to set the `refine` and `drop_ratio_build` parameters. When `prune` is `false`, full-precision retrieval can be provided. If `refine` is set to `true` or `drop_ratio_build` is not `0`, an error will be returned.
`refine`	`false`	`true`/`false`	No	Whether reranking is needed.	When set to `true`, the original sparse vectors are retrieved for the search results to perform high-precision distance calculation and reranking, which means an additional copy of the original vector data needs to be stored. Can be set only when `prune=true`.
`drop_ratio_build`	`0`	`[0, 0.9]`	No	The pruning ratio for sparse vector data.	When a new sparse vector is inserted, the `query_length * drop_ratio_build` smallest values are pruned based on value size. If `refine` is `true`, the original vector data is preserved. Otherwise, only the pruned data is retained. Can be set only when `prune=true`.
`drop_ratio_search`	`0`	`[0, 0.9]`	No	The pruning ratio for sparse vector values during retrieval.	The larger the value, the more pruning is performed, the lower the accuracy, and the higher the performance. Can also be set through the `parameters` clause during retrieval, and query parameters have higher priority.
`refine_k`	`4.0`	`[1.0, 1000.0]`	No	Indicates the proportion of results participating in reranking.	Retrieves `limit_k * refine_k` results and obtains the original vectors for reranking. Meaningful only when `refine=true`. Can also be set through the `parameters` clause during retrieval, and query parameters have higher priority.

Create after table creation

Supports creating a sparse vector index on a SPARSEVECTOR column of an existing table.

Syntax

CREATE VECTOR INDEX index_name ON table_name(column_name) WITH (param1=value1, param2=value2, ...);

Parameter description

The parameter description is the same as that for creating an index during table creation. For details, see the section above.

Create, update, and delete examples

Create during table creation

Create the test table sparse_t1 and create a sparse vector index:

CREATE TABLE sparse_t1 (
    c1 INT PRIMARY KEY,
    c2 SPARSEVECTOR,
    VECTOR INDEX sparse_idx1(c2) 
    WITH (lib=vsag, type=sindi, distance=inner_product)
);

Insert sparse vector data into the test table:

INSERT INTO sparse_t1 VALUES(1, '{1:0.1, 2:0.2, 3:0.3}');
INSERT INTO sparse_t1 VALUES(2, '{3:0.3, 2:0.2, 4:0.4}');
INSERT INTO sparse_t1 VALUES(3, '{3:0.3, 4:0.4, 5:0.5}');

Query the test table:

SELECT * FROM sparse_t1;

The return result is as follows:

+----+---------------------+
| c1 | c2                  |
+----+---------------------+
|  1 | {1:0.1,2:0.2,3:0.3} |
|  2 | {2:0.2,3:0.3,4:0.4} |
|  3 | {3:0.3,4:0.4,5:0.5} |
+----+---------------------+
3 rows in set

Create after table creation

Create a sparse vector index after creating the test table:

CREATE TABLE sparse_t2 (
    c1 INT PRIMARY KEY,
    c2 SPARSEVECTOR
);

CREATE VECTOR INDEX sparse_idx2 ON sparse_t2(c2) 
WITH (lib=vsag, type=sindi, distance=inner_product, 
      prune=true, refine=true, drop_ratio_build=0.1, 
      drop_ratio_search=0.5, refine_k=2.0);

Insert sparse vector data into the test table:

INSERT INTO sparse_t2 VALUES(1, '{1:0.1, 2:0.2, 3:0.3}');

Query the test table:

SELECT * FROM sparse_t2;

The return result is as follows:

+----+---------------------+
| c1 | c2                  |
+----+---------------------+
|  1 | {1:0.1,2:0.2,3:0.3} |
+----+---------------------+
1 row in set

Update

When updating sparse vector data, the index is automatically maintained:

UPDATE sparse_t1 SET c2 = '{1:0.1}' WHERE c1 = 1;

Delete

The delete operation is the same as that for regular vector indexes. You can directly delete the data:

DELETE FROM sparse_t1 WHERE c1 = 1;

Retrieval

The retrieval syntax for sparse vector indexes is similar to that for dense vector indexes, using the APPROXIMATE/APPROX keyword for approximate nearest neighbor retrieval.

Syntax

SELECT ... FROM table_name 
ORDER BY inner_product(column_name, query_vector) [APPROXIMATE|APPROX] 
LIMIT n [PARAMETERS(param1=value1, param2=value2)];

Where:

column_name: The SPARSEVECTOR column specified when creating the sparse vector index.
query_vector: The query vector, which can be a string in sparse vector format, such as '{1:2.4, 3:1.5}'.
n: The number of result rows to return.
PARAMETERS: Optional query-level parameters for setting drop_ratio_search and refine_k.

Retrieval usage notes

For detailed requirements, see Dense vector index. Here, only the special requirements for sparse vector indexes are described:

Query parameter priority: Query-level parameters set by PARAMETERS > Query parameters set when building the index > Default values.
drop_ratio_search: Value range [0, 0.9], default value 0. The pruning ratio for sparse vector values during retrieval. The larger the value, the more pruning is performed, the lower the accuracy, and the higher the performance. Prunes the query_length * drop_ratio_search smallest values based on value size. Since pruning all values is meaningless, at least one value is always retained.
refine_k: Value range [1.0, 1000.0], default value 4.0. Indicates the proportion of results participating in reranking. Queries limit_k * refine_k results and obtains the original vectors for reranking. Effective only when refine=true.

Usage examples

Regular query

CREATE TABLE t1 (
    c1 INT PRIMARY KEY, 
    c2 SPARSEVECTOR,
    VECTOR INDEX idx1(c2) 
    WITH (lib=vsag, type=sindi, distance=inner_product)
);

INSERT INTO t1 VALUES(1, '{1:0.1, 2:0.2, 3:0.3}');
INSERT INTO t1 VALUES(2, '{3:0.3, 2:0.2, 4:0.4}');
INSERT INTO t1 VALUES(3, '{3:0.3, 4:0.4, 5:0.5}');
INSERT INTO t1 VALUES(4, '{5:0.5, 4:0.4, 6:0.6}');
INSERT INTO t1 VALUES(5, '{5:0.5, 6:0.6, 7:0.7}');

SELECT * FROM t1 
ORDER BY negative_inner_product(c2, '{3:0.3, 4:0.4}') 
APPROXIMATE LIMIT 4;

The return result is as follows:

+----+---------------------+
| c1 | c2                  |
+----+---------------------+
|  2 | {2:0.2,3:0.3,4:0.4} |
|  3 | {3:0.3,4:0.4,5:0.5} |
|  4 | {4:0.4,5:0.5,6:0.6} |
|  1 | {1:0.1,2:0.2,3:0.3} |
+----+---------------------+

Use query parameters

SELECT *, negative_inner_product(c2, '{3:0.3, 4:0.4}')  
AS score  FROM t1  
ORDER BY score APPROXIMATE LIMIT 4  
PARAMETERS(drop_ratio_search=0.5);

The return result is as follows:

+----+---------------------+---------------------+
| c1 | c2                  | score               |
+----+---------------------+---------------------+
|  4 | {4:0.4,5:0.5,6:0.6} | -0.1600000113248825 |
|  3 | {3:0.3,4:0.4,5:0.5} | -0.2500000149011612 |
|  2 | {2:0.2,3:0.3,4:0.4} | -0.2500000149011612 |
+----+---------------------+---------------------+
3 rows in set

Index monitoring and maintenance

In-memory sparse vector indexes provide monitoring views and support using the DBMS_VECTOR system package for index maintenance, including incremental refresh and full rebuild. The usage is the same as that for dense indexes.

For detailed information about sparse vector data types, see Vector data type.
For detailed information about vector distance functions, see Vector functions.
For monitoring and maintenance of dense vector indexes, see Vector index monitoring/maintenance.
For index memory estimation and actual usage query of vector indexes, see Index memory estimation and actual usage query.

Overview​

Feature support​

Index memory estimation and actual usage query​

Creation syntax and description​

Create during table creation​

Syntax​

Parameter description​

Create after table creation​

Syntax​

Parameter description​

Create, update, and delete examples​

Create during table creation​

Create after table creation​

Update​

Delete​

Retrieval​

Syntax​

Retrieval usage notes​

Usage examples​

Regular query​

Use query parameters​

Index monitoring and maintenance​

Related documentation​

Contents

Overview

Feature support

Index memory estimation and actual usage query

Creation syntax and description

Create during table creation

Syntax

Parameter description

Create after table creation

Syntax

Parameter description

Create, update, and delete examples

Create during table creation

Create after table creation

Update

Delete

Retrieval

Syntax

Retrieval usage notes

Usage examples

Regular query

Use query parameters

Index monitoring and maintenance

Related documentation