Benchmark testing with VectorDBBench

VectorDBBench is a benchmarking tool that produces comparable performance results for mainstream vector databases and cloud services. This topic explains how to use VectorDBBench to test the performance of the seekdb vector database. VectorDBBench is designed for ease of use, so you can readily reproduce test results or test new systems.

Prerequisites

  • Deploy seekdb.

  • Install Python 3.11 or later. The following example uses Conda for installation:

    # Download and install Conda
    mkdir -p ~/miniconda3
    wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O ~/miniconda3/miniconda.sh
    bash ~/miniconda3/miniconda.sh -b -u -p ~/miniconda3
    rm ~/miniconda3/miniconda.sh

    # Reopen your terminal (or refresh the current shell as follows), then initialize Conda
    source ~/miniconda3/bin/activate
    conda init --all

    # Create and activate the Python environment required by VectorDBBench
    conda create -n vdb python=3.11
    conda activate vdb

  • Connect to the database and tune the memory and query parameters used by HNSW vector index searches:

    -- Set ob_vector_memory_limit_percentage to 30%.
    ALTER SYSTEM SET ob_vector_memory_limit_percentage = 30;
    -- Set ob_query_timeout to 24 hours (the value is in microseconds).
    SET GLOBAL ob_query_timeout = 86400000000;
    -- Set max_allowed_packet to 1 GB (the value is in bytes).
    SET GLOBAL max_allowed_packet = 1073741824;
    -- Set ddl_thread_score and parallel_servers_target to configure parallelism when creating indexes.
    ALTER SYSTEM SET ddl_thread_score = 8; -- Parallelism for DDL operations
    SET GLOBAL parallel_servers_target = 624; -- Number of parallel queries the database server can handle simultaneously

    Here, ob_vector_memory_limit_percentage = 30 is only an example value. Adjust it based on the database memory and workload.
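The large numeric values above are easier to audit when they are derived from their units. The following is a minimal sketch in Python, assuming that ob_query_timeout is expressed in microseconds and max_allowed_packet in bytes, which matches the "24 hours" and "1 GB" comments above. The variable names are chosen for illustration only.

```python
# Derive the values used in the SQL statements above from readable units.
# Assumption: ob_query_timeout is in microseconds, max_allowed_packet in bytes.
HOURS = 24
MICROSECONDS_PER_SECOND = 1_000_000

ob_query_timeout = HOURS * 60 * 60 * MICROSECONDS_PER_SECOND  # 24 hours
max_allowed_packet = 1 * 1024**3                              # 1 GB

# Print the statements so they can be compared with the ones above.
print(f"SET GLOBAL ob_query_timeout = {ob_query_timeout};")
print(f"SET GLOBAL max_allowed_packet = {max_allowed_packet};")
```

If you need a different timeout or packet size, change the constants and regenerate the statements instead of editing the raw numbers by hand.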

The recommended resource specifications for the database are as follows:

Parameter    Value
Memory       64 GB
CPU          16 cores

Testing methods

Clone the VectorDBBench code

Tip: We recommend that you deploy VectorDBBench and seekdb on separate servers to avoid CPU resource contention and improve the reliability of test results.

Clone the VectorDBBench test tool code to your local server.

git clone https://github.com/zilliztech/VectorDBBench.git

Install dependencies

Go to the VectorDBBench directory and install the dependencies.

cd VectorDBBench
pip install .
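Before you run a benchmark, it can help to confirm that the installation succeeded. The following is a minimal sketch and not part of VectorDBBench: check_environment is a hypothetical helper that checks the Python version requirement from the prerequisites and looks for the vectordbbench command on the PATH. It assumes that pip install . registers that command in the active environment.

```python
import shutil
import sys


def check_environment(min_version=(3, 11), executable="vectordbbench"):
    """Return a list of problems found; an empty list means the checks passed."""
    problems = []
    # The prerequisites call for Python 3.11 or later.
    if sys.version_info[:2] < min_version:
        found = ".".join(map(str, sys.version_info[:2]))
        wanted = ".".join(map(str, min_version))
        problems.append(f"Python {wanted}+ required, found {found}")
    # shutil.which returns None when the command is not on the PATH.
    if shutil.which(executable) is None:
        problems.append(f"'{executable}' not found on PATH")
    return problems


if __name__ == "__main__":
    issues = check_environment()
    print("Environment looks ready." if not issues else "\n".join(issues))
```

Run the script inside the activated vdb environment so that the check sees the same PATH as the benchmark commands.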

Run the test

Run VectorDBBench. Two examples follow: one for an HNSW index and one for an IVF index.

HNSW index example

# Replace $host, $port, and $user with the actual seekdb connection information.
vectordbbench oceanbasehnsw --host $host --port $port --user $user --database test --m 16 --ef-construction 200 --ef-search 40 --k 10 --case-type Performance768D1M --index-type HNSW

For more information about the parameters, run the following command:

vectordbbench oceanbasehnsw --help

Commonly used options are described as follows:

  • --num-concurrency: Sets the concurrency level. VectorDBBench runs vector queries at the specified concurrency levels and reports the highest QPS (queries per second) as the final result.
  • --skip-drop-old/--skip-load: Skip deleting old data and skip loading data, respectively. With both options on the command line, the command only runs vector queries and neither deletes old data nor reloads data.
  • --k: The number of nearest neighbors (top-k) to return for each vector query.
  • --ef-search: HNSW query parameter that sets the size of the candidate set used during a query.
  • --index-type: The index type. Currently supports HNSW, HNSW_SQ, and HNSW_BQ.
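The options above can be combined to compare several --ef-search values without reloading the dataset each time. The following is a minimal sketch under stated assumptions: build_hnsw_command is a hypothetical helper, the host, port, and user values are placeholders, and the extra --ef-search values (80 and 160) are illustrative. It only uses the flags shown in the HNSW example and assumes that data loaded by the first run can be reused by later query-only runs.

```python
import shlex


def build_hnsw_command(host, port, user, ef_search, query_only=False):
    """Build the argument list for one oceanbasehnsw run (hypothetical helper)."""
    cmd = [
        "vectordbbench", "oceanbasehnsw",
        "--host", host, "--port", str(port), "--user", user,
        "--database", "test",
        "--m", "16", "--ef-construction", "200",
        "--ef-search", str(ef_search),
        "--k", "10",
        "--case-type", "Performance768D1M",
        "--index-type", "HNSW",
    ]
    if query_only:
        # Reuse the data loaded by an earlier run and only execute queries.
        cmd += ["--skip-drop-old", "--skip-load"]
    return cmd


# Placeholder connection values; replace them with your seekdb settings.
commands = [
    build_hnsw_command("127.0.0.1", 2881, "root", ef, query_only=(i > 0))
    for i, ef in enumerate([40, 80, 160])
]

for cmd in commands:
    print(shlex.join(cmd))  # pass cmd to subprocess.run(cmd, check=True) to execute
```

The first command loads the data and builds the index. The later commands change only the query parameter, so they add the two skip options.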

IVF index example

vectordbbench oceanbaseivf --host $host --port $port --user $user --database test --nlist 1000 --sample_per_nlist 256 --ivf_nprobes 100 --case-type Performance768D1M --index-type IVF_FLAT

Commonly used options are described as follows:

  • --sample_per_nlist: The number of vectors sampled per cluster center. The default value is 256.
  • --ivf_nprobes: The number of nearest cluster centers to search for each vector index query. The default value is 8. A larger value increases recall but also increases search time.
  • --index-type: The index type. Currently supports IVF_FLAT.
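To see what the values in the IVF example imply, the arithmetic can be worked through directly. The following is a minimal sketch, assuming that the total number of vectors sampled for clustering is nlist multiplied by sample_per_nlist (one reading of "sampled per cluster center") and that each query scans ivf_nprobes of the nlist clusters. The variable names mirror the command-line options and are otherwise illustrative.

```python
# Values taken from the IVF example command above.
nlist = 1000
sample_per_nlist = 256
ivf_nprobes = 100

# Assumption: clustering samples sample_per_nlist vectors for each of the nlist centers.
training_samples = nlist * sample_per_nlist

# Share of clusters scanned per query; a higher share trades speed for recall.
probed_fraction = ivf_nprobes / nlist

print(f"Vectors sampled for clustering: {training_samples}")
print(f"Clusters searched per query: {ivf_nprobes} of {nlist} ({probed_fraction:.0%})")
```

With the default --ivf_nprobes value of 8 and the same --nlist, each query would scan less than 1% of the clusters, which is why the example raises the value to 100.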

For more information about the parameters, run the following command:

vectordbbench oceanbaseivf --help

FAQs

Is it normal for the first test execution to be slow?

Yes. The first test execution downloads the required dataset from AWS S3 storage, which can take a long time.

Can I customize and modify the test code?

Yes. After you modify the test code, run pip install . again in the VectorDBBench directory before you run the test.