Benchmark testing with VectorDBBench
VectorDBBench is a benchmarking tool that produces performance test results for mainstream vector databases and cloud services. This topic explains how to use VectorDBBench to test the performance of the seekdb vector database. VectorDBBench is designed for ease of use, so you can readily reproduce test results or benchmark new systems.
Prerequisites
- Deploy seekdb.
- Install Python 3.11 or later. The following example uses Conda for installation:
# Download and install Conda
mkdir -p ~/miniconda3
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O ~/miniconda3/miniconda.sh
bash ~/miniconda3/miniconda.sh -b -u -p ~/miniconda3
rm ~/miniconda3/miniconda.sh
# Reopen your terminal and initialize Conda
source ~/miniconda3/bin/activate
conda init --all
# Create and activate the Python environment required by VectorDBBench
conda create -n vdb python=3.11
conda activate vdb
- Connect to the database and optimize memory and query parameters for HNSW vector index searches:
-- Set ob_vector_memory_limit_percentage to 30%.
ALTER SYSTEM SET ob_vector_memory_limit_percentage = 30;
-- Set ob_query_timeout to 24 hours.
SET GLOBAL ob_query_timeout = 86400000000;
-- Set max_allowed_packet to 1 GB.
SET GLOBAL max_allowed_packet=1073741824;
-- Set ddl_thread_score and parallel_servers_target to configure parallelism when creating indexes
ALTER SYSTEM SET ddl_thread_score = 8; -- Parallelism for DDL operations
SET GLOBAL parallel_servers_target = 624; -- Number of parallel queries the database server can handle simultaneously
Here, ob_vector_memory_limit_percentage = 30 is only an example value. Adjust it based on the database memory and workload.
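The large numeric values in the statements above are unit conversions: the comments indicate that 86400000000 corresponds to 24 hours (that is, microseconds) and 1073741824 to 1 GB (that is, bytes). The following is a minimal Python sketch that derives these values and prints the corresponding statements. The helper names are hypothetical and are not part of seekdb or VectorDBBench.

```python
# Illustrative helpers (hypothetical names) for deriving the values used above.

def hours_to_microseconds(hours: int) -> int:
    """Convert hours to microseconds, the unit implied for ob_query_timeout."""
    return hours * 60 * 60 * 1_000_000


def gib_to_bytes(gib: int) -> int:
    """Convert binary gigabytes to bytes, the unit implied for max_allowed_packet."""
    return gib * 1024 ** 3


def session_statements(timeout_hours: int = 24, packet_gib: int = 1) -> list[str]:
    """Build the two SET GLOBAL statements from human-readable inputs."""
    return [
        f"SET GLOBAL ob_query_timeout = {hours_to_microseconds(timeout_hours)};",
        f"SET GLOBAL max_allowed_packet = {gib_to_bytes(packet_gib)};",
    ]


if __name__ == "__main__":
    for statement in session_statements():
        print(statement)
```

Deriving the values this way makes it easier to choose a different timeout or packet size without miscounting zeros.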
Recommended configuration
The recommended resource specifications for the database are as follows:
| Parameter | Value |
|---|---|
| Memory | 64 GB |
| CPU | 16 cores |
Testing methods
Clone the VectorDBBench code
We recommend that you deploy VectorDBBench and seekdb on separate servers to avoid CPU resource contention and improve the reliability of test results.
Clone the VectorDBBench test tool code to your local server.
git clone https://github.com/zilliztech/VectorDBBench.git
Install dependencies
Go to the VectorDBBench directory and install the dependencies.
cd VectorDBBench
pip install .
Run the test
Run VectorDBBench. Two examples are provided here: HNSW index and IVF index.
HNSW index example
# Replace $host, $port, and $user with the actual seekdb connection information.
vectordbbench oceanbasehnsw --host $host --port $port --user $user --database test --m 16 --ef-construction 200 --ef-search 40 --k 10 --case-type Performance768D1M --index-type HNSW
For more information about the parameters, run the following command:
vectordbbench oceanbasehnsw --help
Commonly used options are described as follows:
- --num-concurrency: Adjusts the concurrency level. VectorDBBench executes vector queries with the specified concurrency and selects the highest QPS (queries per second) as the final result.
- --skip-drop-old/--skip-load: Skips the deletion of old data and the data loading step. With both options on the command line, the command only performs vector queries and does not delete old data or reload data.
- --k: Specifies the number of top-k nearest neighbor results to return in a vector query.
- --ef-search: HNSW query parameter that indicates the size of the candidate set during a query.
- --index-type: Specifies the index type. Currently supports HNSW, HNSW_SQ, and HNSW_BQ.
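The options above can be combined to compare several ef-search values without reloading the dataset each time. The following Python sketch only builds the command lines, using the flags shown in this topic. It assumes that ef-search takes effect at query time, so runs after the first can add --skip-drop-old and --skip-load. The function names and the host, port, and user values are placeholders, not part of VectorDBBench.

```python
import shlex


def hnsw_command(host: str, port: int, user: str, ef_search: int,
                 reuse_data: bool) -> list[str]:
    """Build one vectordbbench invocation using only flags shown in this topic."""
    args = [
        "vectordbbench", "oceanbasehnsw",
        "--host", host, "--port", str(port), "--user", user,
        "--database", "test",
        "--m", "16", "--ef-construction", "200",
        "--ef-search", str(ef_search),
        "--k", "10",
        "--case-type", "Performance768D1M",
        "--index-type", "HNSW",
    ]
    if reuse_data:
        # Keep the existing data and index; only run the query phase.
        args += ["--skip-drop-old", "--skip-load"]
    return args


def ef_search_sweep(host: str, port: int, user: str,
                    ef_values: list[int]) -> list[list[str]]:
    """The first run loads data; later runs reuse it and vary only ef-search."""
    return [
        hnsw_command(host, port, user, ef, reuse_data=(i > 0))
        for i, ef in enumerate(ef_values)
    ]


if __name__ == "__main__":
    # Host, port, and user are placeholder assumptions; replace them.
    for cmd in ef_search_sweep("127.0.0.1", 2881, "root", [40, 80, 160]):
        print(shlex.join(cmd))
```

The script prints the commands rather than executing them, so you can review them before running each one in the shell.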
IVF index example
vectordbbench oceanbaseivf --host $host --port $port --user $user --database test --nlist 1000 --sample_per_nlist 256 --ivf_nprobes 100 --case-type Performance768D1M --index-type IVF_FLAT
Commonly used options are described as follows:
- --sample_per_nlist: The amount of data sampled per cluster center. The default value is 256.
- --ivf_nprobes: Sets how many of the nearest cluster centers to search for each vector index query. The default value is 8. The larger the value, the higher the recall rate, but the search time also increases.
- --index-type: Specifies the index type. Currently supports IVF_FLAT.
For more information about the parameters, run the following command:
vectordbbench oceanbaseivf --help
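To get a feel for the IVF values in the example command, the following Python sketch works through the arithmetic. It rests on two assumptions that this topic does not confirm: that the total number of sampled vectors is nlist multiplied by sample_per_nlist, and that clusters are roughly equal in size, so the share of probed clusters approximates the share of data scanned per query. The function name is hypothetical.

```python
def ivf_estimates(nlist: int, sample_per_nlist: int, nprobes: int) -> dict:
    """Rough estimates for an IVF configuration (assumptions noted above)."""
    if not 0 < nprobes <= nlist:
        raise ValueError("nprobes must be between 1 and nlist")
    return {
        # Assumed total sample size: one batch of samples per cluster center.
        "training_samples": nlist * sample_per_nlist,
        # Share of clusters visited per query; approximates the scanned
        # share of the data only if clusters are similar in size.
        "probed_fraction": nprobes / nlist,
    }


if __name__ == "__main__":
    # Values taken from the IVF example command in this topic.
    print(ivf_estimates(nlist=1000, sample_per_nlist=256, nprobes=100))
```

With the example values, about one tenth of the clusters are probed per query. Raising ivf_nprobes increases that share, which matches the trade-off between recall and search time described above.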
FAQs
Is it normal for the first test execution to be slow?
Yes. The first test run downloads the required dataset from AWS S3, which can take a long time.
Can I customize and modify the test code?
Yes. After you modify the test code, run pip install . again in the VectorDBBench directory before running the test.