
Monitor and maintain vector indexes

This topic describes how to monitor and maintain vector indexes in seekdb.

Monitor

Query the following system views to obtain the basic information and real-time status of vector indexes:

  • [G]V$OB_HNSW_INDEX_INFO: HNSW series vector indexes.
  • [G]V$OB_IVF_INDEX_INFO: IVF series vector indexes.
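
As a minimal sketch, the views can be queried like any other system view. The oceanbase schema qualifier is an assumption here, and SELECT * is used because this topic does not list the view columns.

```sql
-- HNSW series vector indexes (V$ and GV$ variants of the same view).
SELECT * FROM oceanbase.V$OB_HNSW_INDEX_INFO;
SELECT * FROM oceanbase.GV$OB_HNSW_INDEX_INFO;

-- IVF series vector indexes.
SELECT * FROM oceanbase.GV$OB_IVF_INDEX_INFO;
```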

Maintain

Search performance degrades as incremental data accumulates. To reduce the amount of data in the incremental data table, seekdb provides the DBMS_VECTOR package for maintaining vector indexes.

Incremental refresh

tip

This feature is supported for HNSW series vector indexes, semantic indexes, and in-memory sparse indexes, but not for IVF series vector indexes.

tip

If you use an asynchronous embedding mode for a semantic index, the incremental refresh will trigger additional embedding of data to ensure that the incremental data is correctly converted into vectors and added to the index.

If a large amount of data is written after the index is created, we recommend that you perform an incremental refresh by using the REFRESH_INDEX procedure. For more information, see REFRESH_INDEX.

The system checks for incremental data every 15 minutes. If the number of incremental data entries exceeds 10,000, the system automatically performs an incremental refresh.
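
A manual incremental refresh might look like the following sketch. The index, table, and column names (idx1, t1, c2) are hypothetical, and the positional argument list (index name, table name, vector column, refresh threshold, refresh type) is an assumption; confirm it against the REFRESH_INDEX reference before use.

```sql
-- Hypothetical objects: index idx1 on vector column c2 of table t1.
-- Assumed arguments: index name, table name, vector column,
-- refresh threshold (number of incremental rows), refresh type.
CALL DBMS_VECTOR.REFRESH_INDEX('idx1', 't1', 'c2', 10000, 'FAST');
```

The threshold of 10000 mirrors the value that the automatic check described above uses.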

Full refresh (rebuild)

Manually rebuild the entire table

If a large amount of data is updated or deleted after the index is created, we recommend that you perform a full refresh by using the REBUILD_INDEX procedure. For more information, see REBUILD_INDEX.

The system checks for new data every 24 hours. If the new data exceeds 20% of the original data, the system automatically performs a full refresh. The full refresh runs asynchronously in the background: the system first builds a new index and then replaces the old one. The old index remains available during the rebuild, but the overall process is relatively slow.
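
A manual full refresh might be invoked as in the sketch below. The object names (idx1, t1, c2) are hypothetical, and the three-argument form, which relies on defaults for the remaining arguments, is an assumption; see the REBUILD_INDEX reference for the exact signature.

```sql
-- Hypothetical objects: index idx1 on vector column c2 of table t1.
-- Assumed arguments: index name, table name, vector column.
CALL DBMS_VECTOR.REBUILD_INDEX('idx1', 't1', 'c2');
```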

The vector_index_memory_saving_mode parameter controls memory usage during index rebuild. Enabling this mode reduces memory consumption when rebuilding vector indexes on partitioned tables. Typically, a vector index rebuild consumes twice the memory of the index. When this mode is enabled, the system temporarily deletes the in-memory index of each partition after that partition is built, which reduces the total memory required for the rebuild. For more information, see vector_index_memory_saving_mode.
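
A sketch of enabling the mode follows. It assumes the parameter is a Boolean switch that can be set with ALTER SYSTEM SET, in the same way as the other parameters in this topic; check the parameter reference for the accepted values and scope.

```sql
-- Assumption: the parameter accepts a Boolean value.
ALTER SYSTEM SET vector_index_memory_saving_mode = True;
```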

Consider the following notes:

  • When you perform an offline DDL operation (such as ALTER TABLE to modify the table schema or primary key), the index table is rebuilt. Because parallelism cannot be specified for this DDL-triggered rebuild, the system uses a single thread by default. With a large data volume the rebuild is therefore relatively slow, which reduces the overall efficiency of the offline DDL operation.

  • If you modify index parameters during an index rebuild, you must specify both type and distance in the parameter list, and they must be consistent with the original index. For example, if the original index type is hnsw and the distance algorithm is l2, you must specify type=hnsw and distance=l2 during the rebuild.

  • During index rebuild, you can:

    • Modify the values of m, ef_search, and ef_construction.
    • Change the ef_search parameter through an online rebuild.
    • Switch the index type between hnsw and hnsw_sq.
    • Rebuild an ivf_flat index as ivf_flat, or an ivf_pq index as ivf_pq (the IVF type must stay the same).
    • Specify the parallelism during rebuild. For more information, see REBUILD_INDEX.
  • During index rebuild, you cannot:

    • Change the type or distance values, other than the type conversions listed above.
    • Rebuild the index type between hnsw and ivf.
    • Rebuild the index type between hnsw and hnsw_bq.
    • Rebuild the index type between ivf_flat and ivf_pq.
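
The notes above can be illustrated with a rebuild that changes m and ef_construction while keeping type and distance unchanged. This is a sketch only: the object names (idx1, t1, c2) and the parameter values are hypothetical, and the positional argument list is an assumption that must be checked against the REBUILD_INDEX reference.

```sql
-- Original index (hypothetical): type=hnsw, distance=l2.
-- type and distance are repeated unchanged; only m and ef_construction differ.
CALL DBMS_VECTOR.REBUILD_INDEX(
  'idx1',       -- index name (hypothetical)
  't1',         -- table name (hypothetical)
  'c2',         -- vector column (hypothetical)
  0.2,          -- assumed argument: change-rate threshold
  NULL,         -- assumed argument: index organization, left at its default
  'EUCLIDEAN',  -- assumed argument: distance metric
  'type=hnsw, distance=l2, m=32, ef_construction=400',  -- index parameters
  4             -- assumed argument: rebuild parallelism
);
```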

Automatic partition rebuild

tip

This feature is supported for HNSW series vector indexes, semantic indexes, and in-memory sparse indexes, but not for IVF series vector indexes.

tip

During automatic partition rebuild for a semantic index, both incremental data and snapshot tasks are processed to ensure the consistency and integrity of the index data.

The current version triggers automatic partition rebuild tasks in the following two scenarios:

  • When a vector index search statement is executed.
  • During a scheduled check, which you can configure manually.

To configure the schedule and manage rebuild tasks:

  1. Configure the execution cycle

    In the oceanbase database, configure the execution cycle by using the vector_index_optimize_duty_time parameter. Here is an example:

    ALTER SYSTEM SET vector_index_optimize_duty_time='[23:00:00, 24:00:00]';

    With this configuration, partition rebuild tasks run only between 23:00:00 and 24:00:00 and do not start at other times. For more information about the parameters, see the corresponding parameter documentation.

  2. View task progress and history

    You can query the CDB/DBA_OB_VECTOR_INDEX_TASKS or CDB/DBA_OB_VECTOR_INDEX_TASK_HISTORY view to view the task progress and history.

    You can determine the current status of a task by checking the status field:

    • 0 (PREPARE): The task is waiting to be executed.
    • 1 (RUNNING): The task is being executed.
    • 2 (PENDING): The task is paused.
    • 3 (FINISHED): The task has been completed.

    Completed tasks, whether successful or not, are stored in the history table. For more information, see the corresponding view documentation.

  3. Cancel a task

    To cancel a task, you can obtain the trace_id from the DBA_OB_VECTOR_INDEX_TASKS or CDB_OB_VECTOR_INDEX_TASKS view and execute the following command:

    ALTER SYSTEM CANCEL TASK <trace_id>;

    Here is an example:

    ALTER SYSTEM CANCEL TASK "Y61480BA2D976-00063084E80435E2-0-1";
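
Steps 2 and 3 can be combined as in the following sketch: list the tasks that have not finished, note the trace_id of the one to stop, then cancel it. The oceanbase schema qualifier is an assumption; status and trace_id are the fields named above, and no other column names are assumed.

```sql
-- Tasks that are waiting (0), running (1), or paused (2).
SELECT trace_id, status
FROM oceanbase.DBA_OB_VECTOR_INDEX_TASKS
WHERE status IN (0, 1, 2);

-- Completed tasks, successful or not.
SELECT * FROM oceanbase.DBA_OB_VECTOR_INDEX_TASK_HISTORY;

-- Cancel one task, using a trace_id returned by the first query.
ALTER SYSTEM CANCEL TASK "Y61480BA2D976-00063084E80435E2-0-1";
```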
