Monitor and maintain vector indexes
This topic describes how to monitor and maintain vector indexes in seekdb.
Monitor
You can query system views to obtain the basic information and real-time status of vector indexes.
- You can query the [G]V$OB_HNSW_INDEX_INFO view to obtain the basic information and real-time status of HNSW series vector indexes.
- You can query the [G]V$OB_IVF_INDEX_INFO view to obtain the basic information and real-time status of IVF series vector indexes.
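As a minimal sketch, the two views above can be queried like any other system view. The `oceanbase.` schema qualifier is an assumption; adjust it if the views are exposed elsewhere in your deployment. `SELECT *` is used because the column list is not given in this topic.

```sql
-- Assumption: the views are reachable under the oceanbase schema.
-- The V$ variants named above can be queried the same way.
SELECT * FROM oceanbase.GV$OB_HNSW_INDEX_INFO;  -- HNSW series vector indexes
SELECT * FROM oceanbase.GV$OB_IVF_INDEX_INFO;   -- IVF series vector indexes
```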
Maintain
Search performance degrades as incremental data accumulates. To reduce the amount of data in the incremental data table, seekdb provides the DBMS_VECTOR package for maintaining vector indexes.
Incremental refresh
This feature is supported for HNSW series vector indexes, semantic indexes, and in-memory sparse indexes, but not for IVF series vector indexes.
If you use an asynchronous embedding mode for a semantic index, the incremental refresh will trigger additional embedding of data to ensure that the incremental data is correctly converted into vectors and added to the index.
If a large amount of data is written after the index is created, we recommend that you perform an incremental refresh by using the REFRESH_INDEX procedure. For more information, see REFRESH_INDEX.
The system checks for incremental data every 15 minutes. If the number of incremental data entries exceeds 10,000, the system automatically performs an incremental refresh.
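A manual incremental refresh might look like the following sketch. The object names (`idx1`, `t1`, `c2`) are hypothetical, and the argument order (index name, table name, vector column) is an assumption; confirm the exact signature in the REFRESH_INDEX reference before use.

```sql
-- Hypothetical objects: vector index idx1 on column c2 of table t1.
-- Assumed argument order: index name, table name, vector column.
CALL DBMS_VECTOR.REFRESH_INDEX('idx1', 't1', 'c2');
```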
Full refresh (rebuild)
Manually rebuild the entire table
If a large amount of data is updated or deleted after the index is created, we recommend that you perform a full refresh by using the REBUILD_INDEX procedure. For more information, see REBUILD_INDEX.
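A manual full refresh might look like the following sketch. As before, `idx1`, `t1`, and `c2` are hypothetical names, and the three-argument form (index name, table name, vector column) is an assumption; see the REBUILD_INDEX reference for the full parameter list, including the index parameters and parallelism.

```sql
-- Hypothetical objects: vector index idx1 on column c2 of table t1.
-- Assumed argument order: index name, table name, vector column.
CALL DBMS_VECTOR.REBUILD_INDEX('idx1', 't1', 'c2');
```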
The system checks for new data every 24 hours. If the new data exceeds 20% of the original data, the system automatically performs a full refresh. The full refresh is performed in the background asynchronously. First, the system creates a new index and then replaces the old index. During the rebuild process, the old index remains available, but the overall process is relatively slow.
We also provide the vector_index_memory_saving_mode parameter to control the memory usage during index rebuild. Enabling this mode reduces the memory consumption during the rebuild of partitioned table vector indexes. Typically, vector index rebuild consumes twice the memory of the index. When this mode is enabled, the system temporarily deletes the memory index of each partition after the partition is built, thereby effectively reducing the total memory required for the rebuild. For more information, see vector_index_memory_saving_mode.
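Before rebuilding a large partitioned index, it can help to check the current value of this parameter. The statement below is a sketch that assumes seekdb accepts the `SHOW PARAMETERS` syntax; the valid values are described in the parameter documentation and are not shown here.

```sql
-- Assumption: SHOW PARAMETERS is available in seekdb.
SHOW PARAMETERS LIKE 'vector_index_memory_saving_mode';
```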
Consider the following notes:
- When you perform an offline DDL operation (such as ALTER TABLE to modify the table schema or primary key), the index table is rebuilt. Since parallelism cannot be specified during index rebuild, the system uses a single thread by default. Therefore, the rebuild process is relatively slow when the data volume is large, which affects the overall efficiency of the offline DDL operation.
- If you need to modify the index parameters during index rebuild, you must specify both type and distance in the parameter list, and they must be consistent with the original index type. For example, if the original index type is hnsw and the distance algorithm is l2, you must specify type=hnsw and distance=l2 during the rebuild.
- During index rebuild, you can:
  - Modify the values of m, ef_search, and ef_construction.
  - Online rebuild the ef_search parameter.
  - Rebuild the index type between hnsw and hnsw_sq.
  - Rebuild the index type between ivf_flat and ivf_flat, or between ivf_pq and ivf_pq.
  - Specify the parallelism during rebuild. For more information, see REBUILD_INDEX.
- During index rebuild, you cannot:
  - Modify the type and distance types.
  - Rebuild the index type between hnsw and ivf.
  - Rebuild the index type between hnsw and hnsw_bq.
  - Rebuild the index type between ivf_flat and ivf_pq.
Automatic partition rebuild (recommended)
This feature is supported for HNSW series vector indexes, semantic indexes, and in-memory sparse indexes, but not for IVF series vector indexes.
During automatic partition rebuild for a semantic index, both incremental data and snapshot tasks are processed to ensure the consistency and integrity of the index data.
The current version triggers automatic partition rebuild tasks in the following two scenarios:
- When a vector index search statement is executed.
- During a scheduled check, which can be manually configured.
- Configure the execution cycle
  In the oceanbase database, configure the execution cycle by using the vector_index_optimize_duty_time parameter. Here is an example:
      ALTER SYSTEM SET vector_index_optimize_duty_time='[23:00:00, 24:00:00]';
  After the above configuration, the partition rebuild task will only execute between 23:00:00 and 24:00:00, and will not initiate during other periods. For more information about the parameters, see the corresponding parameter documentation.
- View task progress and history
  You can query the CDB/DBA_OB_VECTOR_INDEX_TASKS or CDB/DBA_OB_VECTOR_INDEX_TASK_HISTORY view to view the task progress and history.
  You can determine the current status of a task by checking the status field:
  - 0 (PREPARE): The task is waiting to be executed.
  - 1 (RUNNING): The task is being executed.
  - 2 (PENDING): The task is paused.
  - 3 (FINISHED): The task has been completed.
  The tasks that have completed, regardless of whether they were successful, are stored in the history table. For more information, see the corresponding view documentation.
- Cancel a task
  To cancel a task, you can obtain the trace_id from the DBA_OB_VECTOR_INDEX_TASKS or CDB_OB_VECTOR_INDEX_TASKS view and execute the following command:
      ALTER SYSTEM CANCEL TASK <trace_id>;
  Here is an example:
      ALTER SYSTEM CANCEL TASK "Y61480BA2D976-00063084E80435E2-0-1";
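To find the trace_id of a task before canceling it, a query along the following lines could be used. This is a sketch: the `oceanbase.` schema qualifier and the exact column names (`trace_id`, `status`) are assumptions based on the field names mentioned above; check the view documentation for the full column list.

```sql
-- Tasks that have not finished yet: 0 (PREPARE), 1 (RUNNING), 2 (PENDING).
SELECT trace_id, status
FROM oceanbase.DBA_OB_VECTOR_INDEX_TASKS
WHERE status IN (0, 1, 2);

-- Completed tasks, successful or not, are kept in the history view.
SELECT * FROM oceanbase.DBA_OB_VECTOR_INDEX_TASK_HISTORY;
```

The trace_id returned by the first query is the value to pass to ALTER SYSTEM CANCEL TASK.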