Experience hybrid search in seekdb
This tutorial introduces seekdb's hybrid search feature. It shows how hybrid search combines keyword matching on full-text indexes with semantic search on vector indexes, so that you can see how the feature applies in practice.
Overview
Hybrid search combines vector-based semantic retrieval with keyword retrieval based on full-text indexes, and ranks the merged results to return more accurate and complete results. Vector search excels at approximate semantic matching but is weak at matching exact keywords, numbers, and proper nouns; full-text retrieval compensates for this weakness. seekdb provides hybrid search through the DBMS_HYBRID_SEARCH system package, which supports the following scenarios:
- Pure vector search: Find relevant content based on semantic similarity, suitable for semantic search, recommendation systems, and other scenarios.
- Pure full-text search: Find content based on keyword matching, suitable for document search, product search, and other scenarios.
- Hybrid search: Combines keyword matching and semantic understanding to provide more accurate and comprehensive search results.
This feature is widely used in intelligent search, document search, product recommendation, and other scenarios.
Prerequisites
- Contact the administrator to obtain the corresponding database connection string, then execute the following command to connect to the database:
obclient -h$host -P$port -u$user_name -p$password -D$database_name
The parameters are as follows:
  - host: seekdb database connection IP.
  - port: seekdb database connection port.
  - database_name: Name of the database to access.
  - user_name: Database username.
  - password: Database password.
- A test table has been created, and vector indexes and full-text indexes have been created in the table:
CREATE TABLE doc_table(
c1 INT,
vector VECTOR(3),
query VARCHAR(255),
content VARCHAR(255),
VECTOR INDEX idx1(vector) WITH (distance=l2, type=hnsw, lib=vsag),
FULLTEXT INDEX idx2(query),
FULLTEXT INDEX idx3(content)
);
INSERT INTO doc_table VALUES
(1, '[1,2,3]', "hello world", "oceanbase Elasticsearch database"),
(2, '[1,2,1]', "hello world, what is your name", "oceanbase mysql database"),
(3, '[1,1,1]', "hello world, how are you", "oceanbase oracle database"),
(4, '[1,3,1]', "real world, where are you from", "postgres oracle database"),
(5, '[1,3,2]', "real world, how old are you", "redis oracle database"),
(6, '[2,1,1]', "hello world, where are you from", "starrocks oceanbase database");
Step 1: Pure vector search
Vector search finds semantically relevant content by calculating vector similarity, suitable for semantic search, recommendation systems, and other scenarios.
Set search parameters and use vector search to find records most similar to the query vector [1,2,3]:
SET @parm = '{
"knn" : {
"field": "vector",
"k": 3,
"query_vector": [1,2,3]
}
}';
SELECT JSON_PRETTY(DBMS_HYBRID_SEARCH.SEARCH('doc_table', @parm));
The following result is returned:
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| JSON_PRETTY(DBMS_HYBRID_SEARCH.SEARCH('doc_table', @parm)) |
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| [
{
"c1": 1,
"query": "hello world",
"_score": 1.0,
"vector": "[1,2,3]",
"content": "oceanbase Elasticsearch database"
},
{
"c1": 5,
"query": "real world, how old are you",
"_score": 0.41421356,
"vector": "[1,3,2]",
"content": "redis oracle database"
},
{
"c1": 2,
"query": "hello world, what is your name",
"_score": 0.33333333,
"vector": "[1,2,1]",
"content": "oceanbase mysql database"
}
] |
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set
The results are sorted by vector similarity, where _score represents the similarity score. A higher score indicates greater similarity.
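The scores in the sample output are consistent with a similarity of 1 / (1 + d), where d is the L2 distance between the query vector and the stored vector. This relationship is inferred from the sample output and is an assumption, not a formula stated in this tutorial. A minimal Python sketch that recomputes the top three scores under that assumption:

```python
import math

# Query vector and the vectors stored in doc_table (c1 -> vector).
query_vector = [1, 2, 3]
rows = {
    1: [1, 2, 3],
    2: [1, 2, 1],
    3: [1, 1, 1],
    4: [1, 3, 1],
    5: [1, 3, 2],
    6: [2, 1, 1],
}

def l2_distance(a, b):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def similarity(a, b):
    """Assumed distance-to-score mapping: 1 / (1 + distance)."""
    return 1.0 / (1.0 + l2_distance(a, b))

scores = {c1: similarity(query_vector, vec) for c1, vec in rows.items()}

# Keep the k = 3 highest-scoring rows, as in the "knn" parameters above.
top_k = sorted(scores, key=scores.get, reverse=True)[:3]

for c1 in top_k:
    print(c1, round(scores[c1], 8))
```

Under this assumption the sketch ranks rows 1, 5, and 2 first, with scores that match the _score values above to eight decimal places.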
Step 2: Pure full-text search
Full-text search finds content through keyword matching, suitable for document search, product search, and other scenarios.
Set search parameters and use full-text search to find records containing keywords in the query and content fields:
SET @parm = '{
"query": {
"query_string": {
"fields": ["query", "content"],
"query": "hello oceanbase"
}
}
}';
SELECT JSON_PRETTY(DBMS_HYBRID_SEARCH.SEARCH('doc_table', @parm));
The following result is returned:
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| JSON_PRETTY(DBMS_HYBRID_SEARCH.SEARCH('doc_table', @parm)) |
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| [
{
"c1": 1,
"query": "hello world",
"_score": 0.37162162162162166,
"vector": "[1,2,3]",
"content": "oceanbase Elasticsearch database"
},
{
"c1": 2,
"query": "hello world, what is your name",
"_score": 0.3503184713375797,
"vector": "[1,2,1]",
"content": "oceanbase mysql database"
},
{
"c1": 3,
"query": "hello world, how are you",
"_score": 0.3503184713375797,
"vector": "[1,1,1]",
"content": "oceanbase oracle database"
},
{
"c1": 6,
"query": "hello world, where are you from",
"_score": 0.3503184713375797,
"vector": "[2,1,1]",
"content": "starrocks oceanbase database"
}
] |
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set
The results are sorted by keyword matching degree, where _score represents the matching score. A higher score indicates better matching.
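To see why rows 4 and 5 are absent from the result, the following Python sketch checks which rows contain at least one of the query terms in the query or content field. The tokenizer and the any-term matching rule are simplifying assumptions for illustration; the sketch does not reproduce seekdb's tokenizer or its _score calculation.

```python
import re

# Rows from doc_table: c1 -> (query, content).
docs = {
    1: ("hello world", "oceanbase Elasticsearch database"),
    2: ("hello world, what is your name", "oceanbase mysql database"),
    3: ("hello world, how are you", "oceanbase oracle database"),
    4: ("real world, where are you from", "postgres oracle database"),
    5: ("real world, how old are you", "redis oracle database"),
    6: ("hello world, where are you from", "starrocks oceanbase database"),
}

def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms (simplified)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

terms = tokenize("hello oceanbase")

# A row matches if any query term appears in either searched field.
matched = [
    c1 for c1, fields in docs.items()
    if terms & tokenize(" ".join(fields))
]
print(matched)  # [1, 2, 3, 6]
```

In this data set every matched row happens to contain both terms, so the sample output alone does not show whether seekdb requires any term or all terms to match.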
Step 3: Hybrid search
Hybrid search combines keyword matching and semantic understanding to provide more accurate and comprehensive search results, leveraging the advantages of both full-text indexes and vector indexes.
Set search parameters that run full-text search and vector search in the same request:
SET @parm = '{
"query": {
"query_string": {
"fields": ["query", "content"],
"query": "hello oceanbase"
}
},
"knn" : {
"field": "vector",
"k": 5,
"query_vector": [1,2,3]
}
}';
SELECT json_pretty(DBMS_HYBRID_SEARCH.SEARCH('doc_table', @parm));
The following result is returned:
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| json_pretty(DBMS_HYBRID_SEARCH.SEARCH('doc_table', @parm)) |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| [
{
"c1": 1,
"query": "hello world",
"_score": 1.3716216216216217,
"vector": "[1,2,3]",
"content": "oceanbase Elasticsearch database"
},
{
"c1": 2,
"query": "hello world, what is your name",
"_score": 0.6836518013375796,
"vector": "[1,2,1]",
"content": "oceanbase mysql database"
},
{
"c1": 3,
"query": "hello world, how are you",
"_score": 0.6593354613375797,
"vector": "[1,1,1]",
"content": "oceanbase oracle database"
},
{
"c1": 5,
"query": "real world, how old are you",
"_score": 0.41421356,
"vector": "[1,3,2]",
"content": "redis oracle database"
},
{
"c1": 6,
"query": "hello world, where are you from",
"_score": 0.3503184713375797,
"vector": "[2,1,1]",
"content": "starrocks oceanbase database"
},
{
"c1": 4,
"query": "real world, where are you from",
"_score": 0.30901699,
"vector": "[1,3,1]",
"content": "postgres oracle database"
}
] |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set
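The hybrid result merges the records returned by the full-text search and the vector search. Each record's final _score is the sum of its keyword matching score (_keyword_score) and its semantic similarity score (_semantic_score); a record returned by only one of the two searches, such as c1 = 5, keeps that single score. The results are ranked by the final _score.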
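The summation can be checked against the sample outputs in this tutorial. The following Python sketch adds the per-record scores and sorts by the total. The function name merge_by_sum is hypothetical, and the vector score for c1 = 3 is inferred by subtracting its keyword score from its hybrid score.

```python
# Keyword scores from the full-text search output (c1 -> score).
keyword_scores = {
    1: 0.37162162162162166,
    2: 0.3503184713375797,
    3: 0.3503184713375797,
    6: 0.3503184713375797,
}

# Vector scores for k = 5 (c1 -> score). The value for c1 = 3 is inferred
# from the hybrid output; it is not printed on its own in the tutorial.
vector_scores = {
    1: 1.0,
    5: 0.41421356,
    2: 0.33333333,
    3: 0.30901699,
    4: 0.30901699,
}

def merge_by_sum(keyword, vector):
    """Sum both scores per record; a score missing on one side counts as 0."""
    ids = set(keyword) | set(vector)
    return {i: keyword.get(i, 0.0) + vector.get(i, 0.0) for i in ids}

hybrid = merge_by_sum(keyword_scores, vector_scores)
ranking = sorted(hybrid, key=hybrid.get, reverse=True)

print(ranking)  # [1, 2, 3, 5, 6, 4]
```

The resulting order matches the hybrid search output: row 6 appears only in the full-text result and row 4 only in the vector result, so each keeps a single score.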
Parameter tuning
In hybrid search, you can adjust the weight ratio of full-text search and vector search through the boost parameter to optimize search results. For example, to increase the weight of full-text search:
SET @parm = '{
"query": {
"query_string": {
"fields": ["query", "content"],
"query": "hello oceanbase",
"boost": 2.0
}
},
"knn" : {
"field": "vector",
"k": 5,
"query_vector": [1,2,3],
"boost": 1.0
}
}';
SELECT json_pretty(DBMS_HYBRID_SEARCH.SEARCH('doc_table', @parm));
By adjusting the boost parameter, you can control the weight of keyword search and semantic search in the final ranking. For example, if you focus more on keyword matching, you can increase the boost value of query_string; if you focus more on semantic similarity, you can increase the boost value of knn.
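This tutorial does not state how boost enters the final score. The following Python sketch assumes that each boost acts as a linear multiplier on its side of the sum, which is an assumption for illustration only. The function weighted_merge is hypothetical, and the scores are taken from the sample outputs above.

```python
# Scores taken from the sample outputs in this tutorial (c1 -> score).
keyword_scores = {
    1: 0.37162162162162166,
    2: 0.3503184713375797,
    3: 0.3503184713375797,
    6: 0.3503184713375797,
}
vector_scores = {
    1: 1.0,
    5: 0.41421356,
    2: 0.33333333,
    3: 0.30901699,
    4: 0.30901699,
}

def weighted_merge(keyword, vector, keyword_boost=1.0, knn_boost=1.0):
    """Assumed rule: final score = keyword_boost * keyword + knn_boost * vector."""
    ids = set(keyword) | set(vector)
    return {
        i: keyword_boost * keyword.get(i, 0.0) + knn_boost * vector.get(i, 0.0)
        for i in ids
    }

def rank(scores):
    """Return record ids ordered from highest to lowest score."""
    return sorted(scores, key=scores.get, reverse=True)

default_rank = rank(weighted_merge(keyword_scores, vector_scores))
boosted_rank = rank(weighted_merge(keyword_scores, vector_scores, 2.0, 1.0))

print(default_rank)  # [1, 2, 3, 5, 6, 4]
print(boosted_rank)  # [1, 2, 3, 6, 5, 4]
```

Under this assumed rule, doubling the full-text weight moves row 6, which matches only by keyword, above row 5, which matches only by vector. Run the query above to confirm the actual ranking in seekdb.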
Summary
In this tutorial, you have tried the core features of seekdb hybrid search:
- Pure vector search: Find relevant content through semantic similarity, suitable for semantic search scenarios.
- Pure full-text search: Find content through keyword matching, suitable for precise search scenarios.
- Hybrid search: Combines keywords and semantic understanding to provide more comprehensive and accurate search results.
Hybrid search is well suited to processing large volumes of unstructured data and to building intelligent search and recommendation systems, because it improves both the accuracy and the coverage of retrieval results.
What's next
- Explore AI function service features
- View hybrid vector index to simplify vector search processes
More information
For more guides on experiencing seekdb's AI Native features and building AI applications based on seekdb, see:
- Experience vector search
- Experience full-text indexing
- Experience AI function service
- Experience semantic indexing
- Experience the Vibe Coding paradigm with Cursor Agent + OceanBase MCP
- Build a knowledge base desktop application based on seekdb
- Build a cultural tourism assistant with multi-model integration based on seekdb
- Build an image search application based on seekdb
In addition to using SQL for operations, you can also use the Python SDK (pyseekdb) provided by seekdb. For usage instructions, see Experience embedded seekdb using Python SDK and pyseekdb overview.