Experience vector search
Vector search overview
In today's era of information explosion, users often need to quickly retrieve the information they need from massive amounts of data. For example, online literature databases, e-commerce platform product catalogs, and growing multimedia content libraries all require efficient retrieval systems to quickly locate content of interest to users. As data volumes continue to grow, traditional keyword-based retrieval methods can no longer meet users' needs for retrieval accuracy and speed. Vector search technology can effectively solve these problems. Vector search encodes different types of data such as text, images, and audio into mathematical vectors and performs retrieval in vector space. This method allows systems to capture deep semantic information of data, thereby providing more accurate and efficient retrieval results.
seekdb provides the capability to store, index, and search embedding vector data, and supports storing vector data together with other data.
seekdb supports up to 16,000 dimensions of float-type dense vectors, sparse vectors, and various types of vector distance calculations such as Manhattan distance, Euclidean distance, inner product, and cosine distance. It supports creating vector indexes based on HNSW/IVF, and supports incremental updates and deletions without affecting recall.
seekdb vector search has hybrid search capabilities with scalar filtering. It also provides flexible access interfaces, supporting SQL access through MySQL protocol clients in various languages, as well as Python SDK access. It has also completed adaptation to AI application development frameworks LlamaIndex and DB-GPT, and AI application development platform Dify, better serving AI application development.
This topic demonstrates how to quickly perform vector search using SQL.
Prerequisites
- Ensure that seekdb is installed.
- You are connected to seekdb.
Quick start
-
Create vector columns and indexes.
When creating a table, you can use the
VECTOR(dim)data type to declare a column as a vector column and specify its dimension. Vector indexes must be created on vector columns, and at least two parameters,typeanddistance, must be provided.The example creates a vector column
embeddingwith a dimension of3, and creates an HNSW index on theembeddingcolumn, specifying the distance algorithm as L2.CREATE TABLE t1(
id INT PRIMARY KEY,
doc VARCHAR(200),
embedding VECTOR(3),
VECTOR INDEX idx1(embedding) WITH (distance=L2, type=hnsw)
); -
Insert vector data.
To simulate a vector search scenario, you need to construct some vector data first. Each row of data includes a description of the data and the corresponding vector. In the example, it is assumed that
'apple'corresponds to the vector'[1.2,0.7,1.1]', and'carrot'corresponds to the vector'[5.3,4.8,5.4]', and so on.INSERT INTO t1
VALUES (1, 'apple', '[1.2,0.7,1.1]'),
(2, 'banana', '[0.6,1.2,0.8]'),
(3, 'orange','[1.1,1.1,0.9]'),
(4, 'carrot', '[5.3,4.8,5.4]'),
(5, 'spinach', '[4.9,5.3,4.8]'),
(6, 'tomato','[5.2,4.9,5.1]');For convenience of demonstration, this example simplifies the vector dimension to only 3 dimensions, and the vectors are manually generated. In actual applications, you need to use embedding models to generate vectors from real text, and the dimensions can reach hundreds or thousands.
You can check whether the data is inserted successfully by querying the table.
SELECT * FROM t1;The expected result is as follows:
+----+---------+---------------+
| id | doc | embedding |
+----+---------+---------------+
| 1 | apple | [1.2,0.7,1.1] |
| 2 | banana | [0.6,1.2,0.8] |
| 3 | orange | [1.1,1.1,0.9] |
| 4 | carrot | [5.3,4.8,5.4] |
| 5 | spinach | [4.9,5.3,4.8] |
| 6 | tomato | [5.2,4.9,5.1] |
+----+---------+---------------+
6 rows in set -
Perform vector search.
To perform vector search, you need to provide a vector as the search condition. Suppose we need to find all
'fruits', and the corresponding vector is[0.9, 1.0, 0.9], then the corresponding SQL is:SELECT id, doc FROM t1
ORDER BY l2_distance(embedding, '[0.9, 1.0, 0.9]')
APPROXIMATE LIMIT 3;The expected result is as follows:
+----+--------+
| id | doc |
+----+--------+
| 3 | orange |
| 2 | banana |
| 1 | apple |
+----+--------+
3 rows in set
Comparison between exact search and approximate search
Perform exact search
Exact search uses a full scan strategy, performing exact search by calculating the distance between the query vector and all vectors in the dataset. This method can guarantee complete accuracy of search results, but since full distance calculation is required, search performance will significantly decrease as the data scale grows.
When performing exact search, the system calculates and compares the distance between the query vector vₑ and all vectors in the vector space. After completing the full distance calculation, the system selects the k vectors with the closest distance as the search results.
Example: Euclidean similarity search
Euclidean similarity search is used to retrieve the top-k vectors closest to the query vector in vector space, using Euclidean distance as the metric. The following example demonstrates how to use exact search to retrieve the top 5 vectors closest to the query vector from a table:
-- Create a test table
CREATE TABLE t1 (
id INT PRIMARY KEY,
c1 VECTOR(3)
);
-- Insert data
INSERT INTO t1 VALUES
(1, '[0.1, 0.2, 0.3]'),
(2, '[0.2, 0.3, 0.4]'),
(3, '[0.3, 0.4, 0.5]'),
(4, '[0.4, 0.5, 0.6]'),
(5, '[0.5, 0.6, 0.7]'),
(6, '[0.6, 0.7, 0.8]'),
(7, '[0.7, 0.8, 0.9]'),
(8, '[0.8, 0.9, 1.0]'),
(9, '[0.9, 1.0, 0.1]'),
(10, '[1.0, 0.1, 0.2]');
-- Perform exact search
SELECT c1
FROM t1
ORDER BY l2_distance(c1, '[0.1, 0.2, 0.3]') LIMIT 5;
The result is as follows:
+---------------+
|| c1 |
+---------------+
|| [0.1,0.2,0.3] |
|| [0.2,0.3,0.4] |
|| [0.3,0.4,0.5] |
|| [0.4,0.5,0.6] |
|| [0.5,0.6,0.7] |
+---------------+
5 rows in set
Perform approximate search using vector indexes
Vector index search uses an approximate nearest neighbor (ANN) strategy, accelerating the search process through pre-built index structures. Although it cannot guarantee 100% accuracy of results, it can significantly improve search performance, achieving a good balance between accuracy and performance in practical applications.
Example: HNSW index approximate search
-- Create an HNSW vector index with the table
CREATE TABLE t2 (
id INT PRIMARY KEY,
vec VECTOR(3),
VECTOR INDEX idx(vec) WITH (distance=l2, type=hnsw, lib=vsag)
);
-- Insert test data
INSERT INTO t2 VALUES
(1, '[0.1, 0.2, 0.3]'),
(2, '[0.2, 0.3, 0.4]'),
(3, '[0.3, 0.4, 0.5]'),
(4, '[0.4, 0.5, 0.6]'),
(5, '[0.5, 0.6, 0.7]'),
(6, '[0.6, 0.7, 0.8]'),
(7, '[0.7, 0.8, 0.9]'),
(8, '[0.8, 0.9, 1.0]'),
(9, '[0.9, 1.0, 0.1]'),
(10, '[1.0, 0.1, 0.2]');
-- Perform approximate search, returning the 5 most similar records
SELECT id, vec
FROM t2
ORDER BY l2_distance(vec, '[0.1, 0.2, 0.3]')
APPROXIMATE
LIMIT 5;
The result is as follows. Due to the small data volume, it is consistent with the exact search result above:
+------+---------------+
|| id | vec |
+------+---------------+
|| 1 | [0.1,0.2,0.3] |
|| 2 | [0.2,0.3,0.4] |
|| 3 | [0.3,0.4,0.5] |
|| 4 | [0.4,0.5,0.6] |
|| 5 | [0.5,0.6,0.7] |
+------+---------------+
5 rows in set
Summary
A comparison of the two search methods is as follows:
| Comparison item | Exact search | Approximate search |
|---|---|---|
| Execution method | Full table scan (TABLE FULL SCAN) followed by sorting | Direct search through vector index (VECTOR INDEX SCAN) |
| Performance characteristics | Requires scanning all table data and sorting, performance significantly decreases as data volume grows | Directly locates target data through index, stable performance |
| Result accuracy | 100% accurate, guarantees returning true nearest neighbors | Approximately accurate, may have minor errors |
| Applicable scenarios | Small data volumes, scenarios with high accuracy requirements | Large-scale datasets, scenarios with high performance requirements |
What's next
For more guides on experiencing seekdb's AI Native features and building AI applications based on seekdb, see:
- Experience full-text indexing
- Experience hybrid search
- Experience AI function service
- Experience semantic indexing
- Experience the Vibe Coding paradigm with Cursor Agent + OceanBase MCP
- Build a knowledge base desktop application based on seekdb
- Build a cultural tourism assistant with multi-model integration based on seekdb
- Build an image search application based on seekdb
In addition to using SQL for operations, you can also use the Python SDK (pyseekdb) provided by seekdb. For usage instructions, see Experience embedded seekdb and pyseekdb overview.