Dense vector index

This topic describes how to create, query, maintain, and drop a dense vector index in seekdb.

Index types

The following table describes the vector index types supported by seekdb.

Index type	Description	Scenarios
HNSW	The maximum dimension of indexed columns is 4096. The HNSW index is a memory-based index that must be fully loaded into memory. It supports DML and real-time queries.
HNSW_SQ	The HNSW_SQ index offers similar construction speed, query performance, and recall rate as the HNSW index, but reduces overall memory usage to 1/2 to 1/3 of the original.	Scenarios with high performance and recall rate requirements.
HNSW_BQ	The HNSW_BQ index has a slightly lower recall rate compared to the HNSW index, but significantly reduces memory usage. The BQ quantization compression algorithm (Rabitq) can compress vectors to 1/32 of their original size. The memory optimization effect of the HNSW_BQ index becomes more pronounced as the vector dimension increases.
IVF	An IVF index implemented based on database tables, which does not require resident memory.	Scenarios with lower performance requirements but large data volumes and cost sensitivity.
IVF_PQ	An IVF_PQ index implemented based on database tables, which does not require resident memory. On top of IVF, PQ quantization technology is applied. The recall rate of the index is slightly lower than that of the IVF index, but the performance is higher. The PQ quantization compression algorithm can generally compress vectors to 1/16 to 1/32 of their original size.	Scenarios with lower performance requirements but large data volumes and cost sensitivity.
IVF_SQ (Experimental feature)	An IVF_SQ index implemented based on database tables, which does not require resident memory. On top of IVF, SQ quantization technology is applied. The recall rate of the index is slightly lower than that of the IVF index, but the performance is higher. The SQ quantization compression algorithm can generally compress vectors to 1/3 to 1/4 of their original size.	Scenarios with lower performance requirements but large data volumes and cost sensitivity.

Some other notes:

Dense vector indexes support L2, inner product (IP), and cosine distance as the index distance algorithm.
Vector index queries support calling some distance functions. For more information, see Use SQL functions.
Vector queries with filter conditions are supported. The filter conditions can be scalar conditions or spatial relationships, such as ST_Intersects. Multi-value indexes, full-text indexes, and global indexes are not supported as pre-filterers.
You can create vector and full-text indexes on the same table.
For more information about how vector indexes support offline DDL operations, see Offline DDL.

The limitations are described as follows:

For V1.0.0, creating columnstore vector indexes is currently not supported.

Index memory estimation and actual usage query

You can estimate the memory required for vector indexes using the DBMS_VECTOR system package:

Before creating a table, you can estimate index memory requirements by using the INDEX_VECTOR_MEMORY_ADVISOR procedure.
After a table is created and data is inserted, you can estimate index memory requirements by using the INDEX_VECTOR_MEMORY_ESTIMATE procedure.

The vector index memory estimation provides two key pieces of information: the minimum memory configuration required to create a vector index, and the actual memory usage after creating HNSW_SQ and IVF indexes.

We also provide the configuration item load_vector_index_on_follower to control whether the follower role automatically loads in-memory vector indexes. For syntax and examples, see load_vector_index_on_follower. If weak reads are not needed, you can disable this configuration item to reduce the memory used by vector indexes.

Creation syntax and description

seekdb vector indexes can be created during table creation or after the table is created. When creating a vector index, note the following:

The VECTOR keyword is required when creating a vector index.
The parameters and descriptions for an index created after the table is created are the same as those for an index created during table creation.
If a large amount of data is involved, we recommend that you write the data first and then create the index to achieve the optimal query performance.
It is recommended to create HNSW_SQ, IVF, IVF_SQ, and IVF_PQ indexes after data is inserted, and to rebuild the indexes after a significant amount of new data is added. For detailed instructions on creating each index, see the specific examples below.

tab HNSW/HNSW_SQ/HNSW_BQ

Syntax for creating an index during table creation:

CREATE TABLE table_name (
    column_name1 data_type1,
    column_name2 data_type2,
    ...,
    VECTOR INDEX index_name (column_name) WITH (param1=value1, param2=value2, ...)
);

Syntax for creating an index after table creation:

-- Creating an index after table creation supports setting parallel degree to improve index construction performance. The maximum parallel degree should not exceed CPU cores * 2
CREATE [/*+ paralell $value*/] VECTOR INDEX index_name ON table_name(column_name) WITH (param1=value1, param2=value2, ...);

param parameter description:

Parameter	Default value	Value range	Required	Description	Remarks
distance		l2/inner_product/cosine	Yes	The vector distance function type.	l2 indicates the Euclidean distance, inner_product indicates the inner product distance, and cosine indicates the cosine distance.
type		currently supported `hnsw` / `hnsw_sq`/ `hnsw_bq`.	Yes	The index type.
lib	vsag	vsag	No	The vector index library type.	At present, only the VSAG vector library is supported.
m	16	[5,128]	No	The maximum number of neighbors of each node.	The larger the value, the slower the index construction, but the better the query performance.
ef_construction	200	[5,1000]	No	The size of the candidate set during index construction.	The larger the value, the slower the index construction, but the better the index quality. `ef_construction` must be greater than `m`.
ef_search	64	[1,1000]	No	The size of the candidate set during a query.	The larger the value, the slower the query, but the higher the recall rate.
extra_info_max_size	0	[0,16384]	No	The maximum size of each primary key information (in bytes). Storing the primary key of the table in the index can speed up queries.	`0`: The primary key information is not stored. `1`: The primary key information is forcibly stored, regardless of the size limit. In this case, the primary key type (see below) must be a supported type. `Greater than 1`: The maximum size of the primary key information (in bytes) is specified. In this case, the following conditions must be met: The size of the primary key information (calculation method see below) must be less than the specified size limit. The primary key type must be a supported type. The table is not a table without a primary key.
refine_k	4.0	[1.0,1000.0]	No	This parameter is supported starting from V1.0.0. You can specify this parameter only when you create an HNSW_BQ index. This parameter is a floating-point number used to adjust the rearrangement ratio for quantized vector indexes.	This parameter can be specified when you create an index or during a query: If this parameter is not specified during a query, the value specified when the index is created is used. If this parameter is specified during a query, the value specified during the query is used.
refine_type	sq8 This parameter is supported starting from V1.0.0. You can specify this parameter only when you create an HNSW_BQ index. This parameter specifies the construction precision of quantized vector indexes.	This parameter improves the efficiency of index construction by reducing the memory usage and the construction time, but may affect the recall rate.
bq_bits_query	32	0/4/32	No	This parameter is supported starting from V1.0.0. You can specify this parameter only when you create an HNSW_BQ index. This parameter specifies the query precision of quantized vector indexes in bits.	This parameter improves the efficiency of index construction by reducing the memory usage and the construction time, but may affect the recall rate.
bq_use_fht	true This parameter is supported starting from V1.0.0. You can specify this parameter only when you create an HNSW_BQ index. This parameter specifies whether to use FHT for queries. FHT (Fast Hadamard Transform) is an algorithm used to accelerate vector inner product calculations.

The supported primary key types for extra_info_max_size include:

Numeric types: Integer types, floating-point types, and BIT_VALUE types.
Datetime types
Character types: VARCHAR type is supported.

The calculation method for the primary key information size:

SET @table_name = 'test'; -- Replace this with the table name to be queried.

SELECT 
    CASE 
        WHEN COUNT(*) <> COUNT(result_value) THEN 'not support' 
        ELSE COALESCE(SUM(result_value), 'not support') 
    END AS extra_info_size
FROM (
    SELECT
        CASE 
            WHEN vdt.data_type_class IN (1, 2, 3, 4, 6, 8, 9, 14, 27, 28) THEN 8 -- For numeric types, extra_info_size += 8
            WHEN oc.data_type = 22 THEN oc.data_length -- For varchar types, extra_info_size += data_length
            ELSE NULL -- Other types are not supported
        END AS result_value
    FROM 
        oceanbase.__all_column oc
    JOIN 
        oceanbase.__all_virtual_data_type vdt 
    ON 
        oc.data_type = vdt.data_type
    WHERE 
        oc.rowkey_position != 0 
        AND oc.table_id = (SELECT table_id FROM oceanbase.__all_table WHERE table_name = @table_name)
) AS result_table;

-- The result is 8 bytes.

tab IVF/IVF_SQ (Experimental feature)/IVF_PQ

Syntax for creating an index during table creation:

CREATE TABLE table_name (
    column_name1 data_type1,
    column_name2 data_type2,
    ...,
    VECTOR INDEX index_name (column_name) WITH (param1=value1, param2=value2, ...)
);

Syntax for creating an index after table creation:

-- Creating an index after table creation supports setting parallel degree to improve index construction performance. The maximum parallel degree should not exceed CPU cores * 2
CREATE [/*+ paralell $value*/] VECTOR INDEX index_name ON table_name(column_name) WITH (param1=value1, param2=value2, ...);

param parameter description:

Parameter	Default value	Value range	Required?	Description	Remarks
distance		l2/inner_product/cosine	Yes	The vector distance function type.	l2 indicates the Euclidean distance, inner_product indicates the inner product distance, and cosine indicates the cosine distance.
type		ivf_flat/ivf_sq8/ivf_pq	Yes	The IVF index type.
lib	ob	ob	No	The vector index library type.
nlist	128	[1,65536]	No	The number of clusters.
sample_per_nlist	256	[1,int64_max]	Yes	The number of samples for each cluster center, which is used when creating an index after table creation.
nbits	8	[1,24]	No	The number of quantization bits. This parameter is supported starting from V1.0.0. You can specify this parameter only when you create an IVF_PQ index.	The recommended value is 8. The recommended value range is [8,10]. The larger the value, the higher the quantization accuracy and query accuracy, but the query performance will be affected.
m	No default value, must be specified	[1,65536]	Yes	The dimension of the quantized vectors. This parameter is supported starting from V1.0.0. You can specify this parameter only when you create an IVF_PQ index.	The larger the value, the slower the index construction, and the higher the query accuracy, but the query performance will be affected.

Query syntax and description

Vector index queries are approximate nearest neighbor queries and do not guarantee 100% accuracy. The accuracy of vector queries is measured by recall. For example, if a query for the 10 nearest neighbors can stably return 9 correct results, the recall is 90%. The recall is described as follows:

The recall is affected by the build parameters and query parameters.
The index query parameters are specified when the index is created and cannot be modified. However, you can set session variables to specify the parameters. The ob_hnsw_ef_search variable specifies the parameters for the HNSW/HNSW_SQ/HNSW_BQ index, and the ob_ivf_nprobes variable specifies the parameters for the IVF index. If you set a session variable, its value is prioritized. For more information, see ob_hnsw_ef_search and ob_ivf_nprobes.

The syntax for dense vector indexes is as follows:

SELECT ... FROM $table_name ORDER BY $distance_function($column_name, $vector_expr) [APPROXIMATE|APPROX] LIMIT $num (OFFSET $num);

Query usage notes are as follows:

Syntax requirements:
- The APPROXIMATE/APPROX keyword must be specified for the query to use the vector index instead of a full table scan.
- The query must include the ORDER BY and LIMIT clauses.
- The ORDER BY clause only supports a single vector condition.
- The value of LIMIT + OFFSET must be in the range (0, 16384].
Rules for distance functions:
- If APPROXIMATE/APPROX is specified, a supported distance function is called, and it matches the vector index algorithm, the query will use the vector index.
- If APPROXIMATE/APPROX is specified, but the distance function does not match the vector index algorithm, the query will not use the vector index, but no error is returned.
- If APPROXIMATE/APPROX is specified, but the distance function is not supported in the current version, the query will not use the vector index, and an error is returned.
- If APPROXIMATE/APPROX is not specified, and a supported distance function is called, the query will not use the vector index, but no error is returned.
Other notes:
- The WHERE condition will serve as a filter after the vector index query.
- Specifying the LIMIT clause is required; otherwise, an error will be returned.

Create, query, and delete examples

Create an index during table creation

Example of dense vector index

HNSW example

tip

When you create an HNSW index, the index name must be less than 25 characters in length. Otherwise, an exception may occur because the auxiliary table name exceeds the index_name limit. In future versions, the index name can be longer.

Create a test table.

CREATE TABLE t1(c1 INT, c0 INT, c2 VECTOR(10), c3 VECTOR(10), PRIMARY KEY(c1), VECTOR INDEX idx1(c2) WITH (distance=l2, type=hnsw, lib=vsag),  VECTOR INDEX idx2(c3) WITH (distance=l2, type=hnsw, lib=vsag));

Write test data.

INSERT INTO t1 VALUES(1, 1,'[0.203846,0.205289,0.880265,0.824340,0.615737,0.496899,0.983632,0.865571,0.248373,0.542833]', '[0.203846,0.205289,0.880265,0.824340,0.615737,0.496899,0.983632,0.865571,0.248373,0.542833]');

INSERT INTO t1 VALUES(2, 2, '[0.735541,0.670776,0.903237,0.447223,0.232028,0.659316,0.765661,0.226980,0.579658,0.933939]', '[0.213846,0.205289,0.880265,0.824340,0.615737,0.496899,0.983632,0.865571,0.248373,0.542833]');

INSERT INTO t1 VALUES(3, 3, '[0.327936,0.048756,0.084670,0.389642,0.970982,0.370915,0.181664,0.940780,0.013905,0.628127]', '[0.223846,0.205289,0.880265,0.824340,0.615737,0.496899,0.983632,0.865571,0.248373,0.542833]');

Perform an approximate nearest neighbor query.

SELECT * FROM t1 ORDER BY l2_distance(c2, [0.712338,0.603321,0.133444,0.428146,0.876387,0.763293,0.408760,0.765300,0.560072,0.900498]) APPROXIMATE LIMIT 1;

The query result is as follows:

+----+------+-------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------+
| c1 | c0   | c2                                                                                        | c3                                                                                         |
+----+------+-------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------+
|  3 |    3 | [0.327936,0.048756,0.08467,0.389642,0.970982,0.370915,0.181664,0.94078,0.013905,0.628127] | [0.223846,0.205289,0.880265,0.82434,0.615737,0.496899,0.983632,0.865571,0.248373,0.542833] |
+----+------+-------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------+
1 row in set

HNSW_SQ example

CREATE TABLE t2 (c1 INT AUTO_INCREMENT, c2 VECTOR(3), PRIMARY KEY(c1), VECTOR INDEX idx1(c2) WITH (distance=l2, type=hnsw_sq, lib=vsag));

HNSW_BQ example

CREATE TABLE t3 (c1 INT AUTO_INCREMENT, c2 VECTOR(3), PRIMARY KEY(c1), VECTOR INDEX idx3(c2) WITH (distance=l2, type=hnsw_bq, lib=vsag));

The distance parameter of HNSW_BQ supports only the l2 value.

IVF example

tip

When you create an IVF index, the index name must be less than 33 characters in length. Otherwise, an exception may occur because the auxiliary table name exceeds the index_name limit. In future versions, the index name can be longer.

CREATE TABLE ivf_vecindex_suite_table_test (c1 INT, c2 VECTOR(3), PRIMARY KEY(c1), VECTOR INDEX idx2(c2) WITH (distance=l2, type=ivf_flat));

Create an index after table creation

tip

Currently, only dense vector indexes can be created after table creation.

Example of HNSW index

Create a test table.

CREATE TABLE vec_table_hnsw (id INT, c2 VECTOR(10));

Create an HNSW index.

CREATE VECTOR INDEX vec_idx1 ON vec_table_hnsw(c2) WITH (distance=l2, type=hnsw);

View the created table.

SHOW CREATE TABLE vec_table_hnsw;

The return result is as follows:

+-----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table     | Create Table                                                                                                                                                                                                                                                                                                                                                                                         |
+-----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| vec_table_hnsw | CREATE TABLE `vec_table_hnsw` (
  `id` int(11) DEFAULT NULL,
  `c2` VECTOR(10) DEFAULT NULL,
  VECTOR KEY `vec_idx1` (`c2`) WITH (DISTANCE=L2, TYPE=HNSW, LIB=VSAG, M=16, EF_CONSTRUCTION=200, EF_SEARCH=64) BLOCK_SIZE 16384
) DEFAULT CHARSET = utf8mb4 ROW_FORMAT = DYNAMIC COMPRESSION = 'zstd_1.3.8' REPLICA_NUM = 2 BLOCK_SIZE = 16384 USE_BLOOM_FILTER = FALSE TABLET_SIZE = 134217728 PCTFREE = 0 |
+-----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set

SHOW INDEX FROM vec_table_hnsw;
+-----------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+-----------+---------------+---------+------------+
| Table     | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment   | Index_comment | Visible | Expression |
+-----------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+-----------+---------------+---------+------------+
| vec_table |          1 | vec_idx1 |            1 | c2          | A         |        NULL | NULL     | NULL   | YES  | VECTOR     | available |               | YES     | NULL       |
+-----------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+-----------+---------------+---------+------------+
1 row in set

Example of HNSW_SQ index

Create a test table.

CREATE TABLE vec_table_hnsw_sq (c1 INT AUTO_INCREMENT, c2 VECTOR(3), PRIMARY KEY(c1));

Create an HNSW_SQ index.

CREATE VECTOR INDEX vec_idx2 ON vec_table_hnsw_sq(c2) WITH (distance=l2, type=hnsw_sq, lib=vsag, m=16, ef_construction = 200);

Example of HNSW_BQ index

CREATE VECTOR INDEX vec_idx3 ON vec_table_hnsw_bq(c2) WITH (distance=l2, type=hnsw_bq, lib=vsag, m=16, ef_construction = 200);

The distance parameter of the HNSW_BQ index can be used only with the L2 algorithm.

Example of IVF index

Create a test table.

CREATE TABLE vec_table_ivf (c1 INT, c2 VECTOR(3), PRIMARY KEY(c1));

Create an IVF index.

CREATE VECTOR INDEX vec_idx3 ON vec_table_ivf(c2) WITH (distance=l2, type=ivf_flat);

Drop an index

DROP INDEX vec_idx1 ON vec_table;

View the dropped index.

SHOW INDEX FROM vec_table;

The return result is as follows:

Empty set

Maintenance

When there is a large amount of incremental data, the query performance decreases. To reduce the amount of incremental data in the table, seekdb introduced the DBMS_VECTOR package for maintaining vector indexes.

Incremental refresh

tip

IVF/IVF_SQ/IVF_PQ indexes do not support incremental refresh.

If a large amount of data is written after the index is created, we recommend that you perform an incremental refresh by using the REFRESH_INDEX procedure. For more information, see REFRESH_INDEX.

The system checks for incremental data every 15 minutes. If more than 10,000 incremental data records are found, the system automatically performs an incremental refresh.

Full refresh (rebuild)

Manual full table rebuild

If a large amount of data is updated or deleted after an index is created, it is recommended to use the REBUILD_INDEX procedure to perform a full refresh. For details and examples, see REBUILD_INDEX.

A full refresh is automatically checked every 24 hours. If the newly added data exceeds 20% of the original data, a full refresh will be triggered automatically. The full refresh runs asynchronously in the background: a new index is created first, and then the old index is replaced. During the rebuild process, the old index remains available, but the overall process is relatively slow.

We also provide the configuration item vector_index_memory_saving_mode to control the memory usage during index rebuild. Enabling this mode can reduce the memory consumption during vector index rebuild for partitioned tables. Typically, vector index rebuild requires memory equivalent to twice the index size. After enabling the memory-saving mode, the system will temporarily delete the memory index of a partition after building that partition to release memory, effectively reducing the total memory required for the rebuild operation. For syntax and examples, see vector_index_memory_saving_mode.

Notes:

When executing offline DDL operations (such as ALTER TABLE to modify the table structure or primary key), the index table will be rebuilt. Since parallel degree cannot be specified for index rebuild, the system uses single-threaded execution by default. Therefore, when the data volume is large, the rebuild process will be slow, affecting the efficiency of the entire offline DDL operation.
When rebuilding an index, if you need to modify index parameters, you must specify both type and distance in the parameter list, and type and distance must match the original index type. For example, if the original index type is hnsw and the distance algorithm is l2, you must specify both type=hnsw and distance=l2 during rebuild.
When rebuilding an index, the following are supported:
- Modifying m, ef_search, and ef_construction values.
- Online rebuild of the ef_search parameter.
- Index type rebuild between hnsw - hnsw_sq.
- Index type rebuild between ivf_flat - ivf_flat, ivf_sq8 - ivf_sq8, ivf_pq - ivf_pq.
- Setting parallel degree during rebuild. For examples, see REBUILD_INDEX.
When rebuilding an index, the following are not supported:
- Modifying type and distance types.
- Index rebuild between hnsw - ivf.
- Index rebuild between hnsw - hnsw_bq.
- Cross rebuild between ivf_flat, ivf_pq, and ivf_sq8.

Automatic partition rebuild (recommended)

tip

This feature is supported starting from V1.0.0. If your vector database is upgraded from an earlier version to V1.0.0, you need to manually rebuild all vector indexes for the entire table after the upgrade. Otherwise, automatic partition rebuild tasks may not be executed after the upgrade.

This feature only supports HNSW/HNSW_SQ/HNSW_BQ indexes.

There are two scenarios that trigger automatic partition rebuild tasks in the current version:

When executing vector index query statements.
Scheduled checks, with configurable execution cycle.

Configure execution cycle

In the seekdb database, configure the execution cycle through the configuration item vector_index_optimize_duty_time. Example:
```
ALTER SYSTEM SET vector_index_optimize_duty_time='[23:00:00, 24:00:00]';
```
After the above configuration, partition rebuild tasks will only be executed between 23:00:00 and 24:00:00, and will not be initiated at other times. For detailed parameter descriptions, see the corresponding configuration item documentation.
View task progress/history

You can view task progress and history through the CDB/DBA_OB_VECTOR_INDEX_TASKS or CDB/DBA_OB_VECTOR_INDEX_TASK_HISTORY view.

Determine the current task status through the status field:
- 0 (PREPARE): The task is waiting to be executed.
- 1 (RUNNING): The task is being executed.
- 2 (PENDING): The task is paused.
- 3 (FINISHED): The task has been completed.
Completed tasks, i.e., tasks with status=FINISHED, will be archived to the history table regardless of whether they succeeded. For detailed usage examples, see the corresponding view documentation.
Cancel task

To cancel a task, obtain the trace_id from the DBA_OB_VECTOR_INDEX_TASKS or CDB_OB_VECTOR_INDEX_TASKS view, then execute the following command:
```
ALTER SYSTEM CANCEL TASK <trace_id>;
```
Example:
```
ALTER SYSTEM CANCEL TASK "Y61480BA2D976-00063084E80435E2-0-1";
```

Performance optimization

tip

Only the IVF index is supported.

seekdb provides an automatic performance optimization mechanism for the IVF index to improve query performance through cache management and regular maintenance.

Optimization mechanism

IVF index performance optimization includes two types of automated tasks:

Cache warming task: Periodically checks all IVF indexes. If it finds that the cache corresponding to an index does not exist, it automatically triggers cache warming and loads the index data into memory. Additionally, cache warming is automatically performed when an IVF index is created.
Cache cleanup task: Periodically checks all IVF caches. If it finds that the cache corresponds to an index that has been deleted, it automatically cleans up the invalid cache and releases memory resources. Additionally, cache cleanup is automatically performed when an IVF index is deleted.

Configure the optimization cycle

The system allows you to customize the execution time window for performance optimization tasks to avoid impacting performance during peak business hours.

In the seekdb database, you can set the execution cycle using the vector_index_optimize_duty_time parameter:

ALTER SYSTEM SET vector_index_optimize_duty_time='[23:00:00, 24:00:00]';

The configuration is described as follows:

The time format is [start time, end time].
The above configuration means that optimization tasks will only be executed between 23:00:00 and 24:00:00.
Optimization tasks will not be initiated at other times to avoid impacting normal business operations.

Monitor optimization tasks

seekdb vector indexes provide monitoring capabilities for optimization tasks:

You can view tasks that are being executed or waiting to be executed through the DBA_OB_VECTOR_INDEX_TASKS view.
You can view historical task records through the DBA_OB_VECTOR_INDEX_TASK_HISTORY view.

Usage examples:

View the current task status

View tasks that are being executed or waiting to be executed through the DBA_OB_VECTOR_INDEX_TASKS view:

SELECT * FROM oceanbase.DBA_OB_VECTOR_INDEX_TASKS;

Sample return result:

+----------+---------------------+---------+----------------------------+----------------------------+--------------+----------+-----------+------------------+----------+------------------------------------+
| TABLE_ID | TABLET_ID           | TASK_ID | START_TIME                 | MODIFY_TIME                | TRIGGER_TYPE | STATUS   | TASK_TYPE | TASK_SCN         | RET_CODE | TRACE_ID                           |
+----------+---------------------+---------+----------------------------+----------------------------+--------------+----------+-----------+------------------+----------+------------------------------------+
|   500020 | 1152921504606846990 | 2002281 | 1970-08-23 17:10:23.174127 | 1970-08-23 17:10:23.174137 | USER         | FINISHED |         2 | 1750671687770026 |        0 | YAFF00B9E4D97-00063839E6BD9BBC-0-1 |
+----------+---------------------+---------+----------------------------+----------------------------+--------------+----------+-----------+------------------+----------+------------------------------------+
1 row in set

Description of the task status:

STATUS = 0: PREPARE, the task is waiting to be executed.
STATUS = 1: RUNNING, the task is being executed.
STATUS = 3: FINISHED, the task has been completed.

Description of the task type:

TASK_TYPE = 2: IVF cache warming task.
TASK_TYPE = 3: IVF invalid cache cleanup task.

View the history task records

Completed tasks (with STATUS = 3) are automatically archived to the history table every 10 seconds, regardless of whether they were successful. View the history through the DBA_OB_VECTOR_INDEX_TASKS_HISTORY view:

-- Query the history of a specified task ID
SELECT * FROM oceanbase.DBA_OB_VECTOR_INDEX_TASKS_HISTORY WHERE TASK_ID=2002281;

Sample return result:

+----------+---------------------+---------+----------------------------+----------------------------+--------------+----------+-----------+------------------+----------+------------------------------------+
| TABLE_ID | TABLET_ID           | TASK_ID | START_TIME                 | MODIFY_TIME                | TRIGGER_TYPE | STATUS   | TASK_TYPE | TASK_SCN         | RET_CODE | TRACE_ID                           |
+----------+---------------------+---------+----------------------------+----------------------------+--------------+----------+-----------+------------------+----------+------------------------------------+
|   500020 | 1152921504606846990 | 2002281 | 1970-08-23 17:10:23.174127 | 1970-08-23 17:10:23.174137 | AUTO         | FINISHED |         2 | 1750671687770026 |        0 | YAFF00B9E4D97-00063839E6BD9BBC-0-1 |
+----------+---------------------+---------+----------------------------+----------------------------+--------------+----------+-----------+------------------+----------+------------------------------------+
1 row in set

Cancel an optimization task

You can cancel a specified task by using the following command:

-- trace_id is obtained from the DBA_OB_VECTOR_INDEX_TASKS_HISTORY view
ALTER SYSTEM CANCEL TASK <trace_id>;

tip

You can cancel a task only in the failed retry phase by executing the ALTER SYSTEM CANCEL TASK statement. If a background task is stuck in a specific execution phase, it cannot be canceled by using this statement.

Example:

-- Log in to the system and obtain the trace_id of the specified task
SELECT * FROM oceanbase.DBA_OB_VECTOR_INDEX_TASK_HISTORY WHERE TASK_ID=2037736;
+----------+---------------------+---------+----------------------------+----------------------------+--------------+----------+-----------+------------------+----------+------------------------------------+
| TABLE_ID | TABLET_ID           | TASK_ID | START_TIME                 | MODIFY_TIME                | TRIGGER_TYPE | STATUS   | TASK_TYPE | TASK_SCN         | RET_CODE | TRACE_ID                           |
+----------+---------------------+---------+----------------------------+----------------------------+--------------+----------+-----------+------------------+----------+------------------------------------+
|   500041 | 1152921504606847008 | 2037736 | 1970-08-23 17:10:23.203821 | 1970-08-23 17:10:23.203821 | USER         | PREPARED |         2 | 1750682301145225 |       -1 | YAFF00B9E4D97-00063839E6BDDEE0-0-1 |
+----------+---------------------+---------+----------------------------+----------------------------+--------------+----------+-----------+------------------+----------+------------------------------------+
1 row in set

-- Cancel the task
ALTER SYSTEM CANCEL TASK "YAFF00B9E4D97-00063839E6BDDEE0-0-1";

After the task is canceled, the task status changes to CANCELLED.

-- Log in to the user database and query the task status
SELECT * FROM oceanbase.DBA_OB_VECTOR_INDEX_TASK_HISTORY;
+----------+---------------------+---------+----------------------------+----------------------------+--------------+----------+-----------+------------------+----------+------------------------------------+
| TABLE_ID | TABLET_ID           | TASK_ID | START_TIME                 | MODIFY_TIME                | TRIGGER_TYPE | STATUS   | TASK_TYPE | TASK_SCN         | RET_CODE | TRACE_ID                           |
+----------+---------------------+---------+----------------------------+----------------------------+--------------+----------+-----------+------------------+----------+------------------------------------+
|   500041 | 1152921504606847008 | 2037736 | 1970-08-23 17:10:23.203821 | 1970-08-23 17:10:23.203821 | USER         | FINISHED |         2 | 1750682301145225 |    -4072 | YAFF00B9E4D97-00063839E6BDDEE0-0-1 |
+----------+---------------------+---------+----------------------------+----------------------------+--------------+----------+-----------+------------------+----------+------------------------------------+
1 row in set

References

Use SQL functions

Index types​

Index memory estimation and actual usage query​

Creation syntax and description​

Query syntax and description​

Create, query, and delete examples​

Create an index during table creation​

Example of dense vector index​

HNSW example​

HNSW_SQ example​

HNSW_BQ example​

IVF example​

Create an index after table creation​

Example of HNSW index​

Example of HNSW_SQ index​

Example of HNSW_BQ index​

Example of IVF index​

Drop an index​

Maintenance​

Incremental refresh​

Full refresh (rebuild)​

Manual full table rebuild​

Automatic partition rebuild (recommended)​

Performance optimization​

Optimization mechanism​

Configure the optimization cycle​

Monitor optimization tasks​

Cancel an optimization task​

References​

Contents

Index types

Index memory estimation and actual usage query

Creation syntax and description

Query syntax and description

Create, query, and delete examples

Create an index during table creation

Example of dense vector index

HNSW example

HNSW_SQ example

HNSW_BQ example

IVF example

Create an index after table creation

Example of HNSW index

Example of HNSW_SQ index

Example of HNSW_BQ index

Example of IVF index

Drop an index

Maintenance

Incremental refresh

Full refresh (rebuild)

Manual full table rebuild

Automatic partition rebuild (recommended)

Performance optimization

Optimization mechanism

Configure the optimization cycle

Monitor optimization tasks

Cancel an optimization task

References