Experience hybrid vector index in seekdb
This tutorial introduces seekdb's hybrid vector index and walks through a typical use case. With a hybrid vector index, you store text directly and run semantic retrieval on it, without converting the text to vectors yourself.
Overview
A hybrid vector index is a vector index that automatically converts text to vectors and indexes them. It hides the vector layer from users, so it takes fewer steps to use than an ordinary vector index.
- Vector index process without hybrid vector index:
Text → Manually call the `AI_EMBED` function to generate vectors → Insert vectors → Use vector retrieval
- Hybrid vector index process:
Text → Direct insertion → Direct text retrieval
seekdb converts text to vectors and builds the index internally. During retrieval, you provide only the original text; the system embeds it and searches the vector index for you.
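To make the difference between the two processes concrete, here is a minimal Python sketch. It is a conceptual model only, not seekdb's implementation: `toy_embed`, `ManualStore`, and `HybridStore` are hypothetical names, the bigram-hashing embedder stands in for a real embedding model, and the search is an exact brute-force scan rather than an HNSW index.

```python
import math


def toy_embed(text, dim=8):
    """Hypothetical stand-in for an embedding model: hashes character bigrams."""
    vec = [0.0] * dim
    lowered = text.lower()
    for a, b in zip(lowered, lowered[1:]):
        vec[(ord(a) * 31 + ord(b)) % dim] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def l2(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


class ManualStore:
    """Process without a hybrid index: the caller supplies vectors."""

    def __init__(self):
        self.rows = []

    def insert(self, row_id, vector):
        self.rows.append((row_id, vector))

    def search(self, query_vector, k):
        ranked = sorted(self.rows, key=lambda row: l2(row[1], query_vector))
        return [row_id for row_id, _ in ranked[:k]]


class HybridStore:
    """Hybrid process: the store embeds text on insert and on search."""

    def __init__(self, embed):
        self.embed = embed
        self.inner = ManualStore()

    def insert(self, row_id, text):
        self.inner.insert(row_id, self.embed(text))

    def search(self, query_text, k):
        return self.inner.search(self.embed(query_text), k)


docs = {1: "Rose", 2: "Sunflower", 3: "Lily"}
manual = ManualStore()
hybrid = HybridStore(toy_embed)
for row_id, text in docs.items():
    manual.insert(row_id, toy_embed(text))  # caller embeds first
    hybrid.insert(row_id, text)             # caller passes text only

manual_hits = manual.search(toy_embed("flower"), k=3)
hybrid_hits = hybrid.search("flower", k=3)
print(manual_hits == hybrid_hits)
```

Both stores return the same ids because they hold the same vectors; the hybrid one simply moves the embedding call behind `insert` and `search`.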
Prerequisites
- Contact the administrator to obtain the corresponding database connection string, then execute the following command to connect to the database:
# host: seekdb database connection IP.
# port: seekdb database connection port.
# database_name: Name of the database to access.
# user_name: Database username.
# password: Database password.
obclient -h$host -P$port -u$user_name -p$password -D$database_name
- Ensure that you have the required permissions for the AI function service, and that an embedding model has been registered in the database using the `CREATE_AI_MODEL` and `CREATE_AI_MODEL_ENDPOINT` procedures:
CALL DBMS_AI_SERVICE.DROP_AI_MODEL ('ob_embed');
CALL DBMS_AI_SERVICE.DROP_AI_MODEL_ENDPOINT ('ob_embed_endpoint');
CALL DBMS_AI_SERVICE.CREATE_AI_MODEL(
'ob_embed', '{
"type": "dense_embedding",
"model_name": "BAAI/bge-m3"
}');
-- Replace the access_key value below with your actual access key.
CALL DBMS_AI_SERVICE.CREATE_AI_MODEL_ENDPOINT (
'ob_embed_endpoint', '{
"ai_model_name": "ob_embed",
"url": "https://api.siliconflow.cn/v1/embeddings",
"access_key": "sk-xxxxxxxxxxxxxxxxxxxxxxxxxxx",
"provider": "siliconflow"
}');
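The second argument of each procedure is a JSON document passed as a SQL string literal, so everything between the quotes is part of the string and presumably has to parse as JSON on its own. If you generate these statements from a script, one way to keep the payload well formed is to serialize it with a JSON library. The Python sketch below does that; `call_statement` is a hypothetical helper, and the procedure names and values are the ones used in this tutorial.

```python
import json


def call_statement(procedure, name, config):
    """Render a CALL statement whose second argument is a JSON string literal."""
    payload = json.dumps(config, indent=2)
    # Double single quotes for the SQL literal. Values that contain
    # backslashes would need extra escaping, which this sketch skips.
    escaped = payload.replace("'", "''")
    return f"CALL DBMS_AI_SERVICE.{procedure}('{name}', '{escaped}');"


model_sql = call_statement(
    "CREATE_AI_MODEL",
    "ob_embed",
    {"type": "dense_embedding", "model_name": "BAAI/bge-m3"},
)
endpoint_sql = call_statement(
    "CREATE_AI_MODEL_ENDPOINT",
    "ob_embed_endpoint",
    {
        "ai_model_name": "ob_embed",
        "url": "https://api.siliconflow.cn/v1/embeddings",
        "access_key": "sk-xxxxxxxxxxxxxxxxxxxxxxxxxxx",  # placeholder, not a real key
        "provider": "siliconflow",
    },
)
print(model_sql)
print(endpoint_sql)
```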
The hybrid vector index feature currently supports only the HNSW and HNSW_BQ index types.
Step 1: Create a hybrid vector index
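You can create a hybrid vector index in two ways: when you create the table, or afterward on an existing table.
A hybrid vector index must be created on a VARCHAR column, and you must specify the embedding model and the vector dimension. In the examples below, model=ob_embed is the model registered in Prerequisites, and dim=1024 matches the 1024-dimensional dense vectors that BAAI/bge-m3 produces.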
Create during table creation
CREATE TABLE items (
id INT PRIMARY KEY,
doc VARCHAR(100),
VECTOR INDEX vector_idx(doc)
WITH (distance=l2, lib=vsag, type=hnsw, model=ob_embed, dim=1024, sync_mode=immediate)
);
Create after table creation
CREATE TABLE items1 (
id INT PRIMARY KEY,
doc VARCHAR(100)
);
CREATE VECTOR INDEX vector_idx
ON items1 (doc)
WITH (distance=l2, lib=vsag, type=hnsw, model=ob_embed, dim=1024, sync_mode=immediate);
Step 2: Insert text data (no manual vectorization required)
When you insert text, the system embeds it automatically; you do not need to call the `AI_EMBED` function:
INSERT INTO items(id, doc) VALUES(1, 'Rose');
INSERT INTO items(id, doc) VALUES(2, 'Sunflower');
INSERT INTO items(id, doc) VALUES(3, 'Lily');
Step 3: Use text for direct retrieval
Pass the original query text to the semantic_distance function to run vector retrieval; you do not need to generate a query vector yourself:
SELECT id, doc FROM items
ORDER BY semantic_distance(doc, 'flower')
APPROXIMATE LIMIT 3;
The following result is returned:
+----+-----------+
| id | doc       |
+----+-----------+
|  1 | Rose      |
|  2 | Sunflower |
|  3 | Lily      |
+----+-----------+
3 rows in set
The system automatically converts the query text 'flower' to a vector and then retrieves the most similar text in the vector index.
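The index in this tutorial is created with distance=l2, so "most similar" presumably means the smallest Euclidean distance between the query vector and each stored vector. The Python sketch below works through that ranking with hand-picked 2-D vectors. The vectors are invented for illustration (the index above uses 1024 dimensions), and the scan is exact, whereas an HNSW index returns approximate nearest neighbors.

```python
import math


def l2_distance(u, v):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# Hypothetical 2-D embeddings, chosen by hand for illustration only.
stored = {
    "Rose": [1.0, 0.0],
    "Sunflower": [0.0, 2.0],
    "Lily": [3.0, 4.0],
}
query = [0.0, 0.0]  # stands in for the embedded query text

distances = {name: l2_distance(vec, query) for name, vec in stored.items()}
ranked = sorted(stored, key=lambda name: distances[name])

for name in ranked:
    print(name, distances[name])
```

The closest vector comes first, which is the same ordering that ORDER BY on a distance function with LIMIT asks the database for.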
Advanced: Use vector retrieval
If you already have a vector for the query (for example, one generated in advance with the AI_EMBED function), you can search the hybrid vector index with that vector directly. This avoids re-embedding the same query on every retrieval:
-- First get the query vector
SET @query_vector = AI_EMBED("ob_embed", "flower");
-- Use vectors for index retrieval
SELECT id, doc FROM items
ORDER BY semantic_vector_distance(doc, @query_vector)
APPROXIMATE LIMIT 3;
The following result is returned:
+----+-----------+
| id | doc       |
+----+-----------+
|  1 | Rose      |
|  2 | Sunflower |
|  3 | Lily      |
+----+-----------+
3 rows in set
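The saving is easiest to see by counting embedding calls. The Python sketch below is a toy model under stated assumptions: `CountingEmbedder`, `search_by_text`, and `search_by_vector` are hypothetical stand-ins for the embedding model, semantic_distance, and semantic_vector_distance, and the two-number "embedding" carries no semantic meaning.

```python
class CountingEmbedder:
    """Toy embedder that records how many times it is called."""

    def __init__(self):
        self.calls = 0

    def __call__(self, text):
        self.calls += 1
        return [float(len(text)), float(sum(map(ord, text)) % 97)]


def search_by_vector(index, query_vector, k):
    """Rank stored rows by squared L2 distance to a ready-made vector."""
    def dist(vec):
        return sum((a - b) ** 2 for a, b in zip(vec, query_vector))

    ranked = sorted(index, key=lambda item: dist(item[1]))
    return [row_id for row_id, _ in ranked[:k]]


def search_by_text(index, embed, query_text, k):
    """Embed the query text on every call, then search by vector."""
    return search_by_vector(index, embed(query_text), k)


embed = CountingEmbedder()
rows = [(1, "Rose"), (2, "Sunflower"), (3, "Lily")]
index = [(row_id, embed(text)) for row_id, text in rows]

# Text retrieval: one embedding call per query.
embed.calls = 0
for _ in range(5):
    text_hits = search_by_text(index, embed, "flower", 3)
calls_text = embed.calls

# Vector retrieval: embed once, reuse the vector.
embed.calls = 0
query_vector = embed("flower")
for _ in range(5):
    vector_hits = search_by_vector(index, query_vector, 3)
calls_vector = embed.calls

print(calls_text, calls_vector, text_hits == vector_hits)
```

Five text queries cost five embedding calls, while five vector queries reuse a single one and return the same ids. With a remote embedding endpoint, each avoided call is also an avoided network round trip.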
Summary
This tutorial covered the core features of seekdb's hybrid vector index:
- Simplified usage process: Achieve semantic retrieval by directly storing text without manually converting to vectors.
- Automatic embedding: The system automatically converts text to vectors and builds indexes. During retrieval, you only need to provide the original text.
- Performance optimization: Supports direct vector retrieval to avoid repeated embedding operations.
The hybrid vector index removes the manual embedding steps from vector retrieval, which makes it well suited to building intelligent search applications.
What's next
- Learn about vector index maintenance and monitoring
- Learn more about AI function service features
- Explore hybrid search to combine keyword matching and semantic understanding for more accurate and comprehensive search results.
More information
For more guides on experiencing seekdb's AI Native features and building AI applications based on seekdb, see:
- Experience vector search
- Experience full-text indexing
- Experience hybrid search
- Experience AI function service
- Experience the Vibe Coding paradigm with Cursor Agent + OceanBase MCP
- Build a knowledge base desktop application based on seekdb
- Build a cultural tourism assistant with multi-model integration based on seekdb
- Build an image search application based on seekdb
In addition to using SQL for operations, you can also use the Python SDK (pyseekdb) provided by seekdb. For usage instructions, see Experience embedded seekdb using Python SDK and pyseekdb overview.