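Hybrid search with vector indexes
This document describes hybrid search in seekdb, which combines full-text indexes with vector indexes.
Hybrid search combines vector-based semantic search with keyword search based on full-text indexes, and ranks the combined results to return more accurate and more complete matches. Vector search is good at approximate semantic matching but is weak at matching exact keywords, numbers, and proper nouns; full-text search compensates for this weakness. As a result, hybrid search has become a key feature of vector databases and is widely used across products. seekdb implements efficient hybrid queries by integrating its full-text and vector index capabilities.
Usage
Hybrid search is provided through the new system package DBMS_HYBRID_SEARCH, which contains two functions: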
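| Function | Description |
|---|---|
| DBMS_HYBRID_SEARCH.SEARCH | Returns the search results in JSON format, sorted by relevance. |
| DBMS_HYBRID_SEARCH.GET_SQL | Returns the SQL statement that is actually executed, as a string. |
For the detailed syntax and parameter descriptions, see DBMS_HYBRID_SEARCH.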
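Usage scenarios
Create sample tables and insert data
To demonstrate hybrid search, this section creates the following sample tables and inserts data into them. The search examples in the scenarios that follow use these tables.
- products table: a basic product information table used to demonstrate scalar search. It contains the product ID, name, description, brand, category, tags, price, stock quantity, release date, on-sale flag, and a vector column named vec.
CREATE TABLE products (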
`product_id` varchar(50) DEFAULT NULL,
`product_name` varchar(255) DEFAULT NULL,
`description` text DEFAULT NULL,
`brand` varchar(100) DEFAULT NULL,
`category` varchar(100) DEFAULT NULL,
`tags` varchar(255) DEFAULT NULL,
`price` decimal(10,2) DEFAULT NULL,
`stock_quantity` int(11) DEFAULT NULL,
`release_date` datetime DEFAULT NULL,
`is_on_sale` tinyint(1) DEFAULT NULL,
`vec` VECTOR(4) DEFAULT NULL
);
Insert data:
INSERT INTO products VALUES
('prod-001', 'Gamer-Pro Mechanical Keyboard', 'A responsive mechanical keyboard with customizable RGB lighting for the ultimate gaming experience.',
'GamerZone', 'Gaming', 'best-seller,gaming-gear,rgb', 149.00, 100, '2023-07-20 00:00:00.000000', 1, '[0.5,0.1,0.6,0.9]'),
('prod-002', 'Gamer-Pro Headset', 'High-fidelity gaming headset with a noise-cancelling microphone.',
'GamerZone', 'Gaming', 'best-seller,gaming-gear,audio', 149.00, 100, '2023-07-20 00:00:00.000000', 1, '[0.1,0.9,0.2,0]'),
('prod-003', 'Eco-Friendly Yoga Mat', 'A non-slip yoga mat made from sustainable and eco-friendly materials.',
'NatureFirst', 'Sports', 'eco-friendly,health', 49.99, 200, '2023-04-22 00:00:00.000000', 0, '[0.1,0.9,0.3,0]');
- products_fulltext table: has the same columns as the products table, with full-text indexes on the product_name, description, and tags columns. It is used to demonstrate full-text search.
CREATE TABLE products_fulltext (
product_id VARCHAR(50),
product_name VARCHAR(255),
description TEXT,
brand VARCHAR(100),
category VARCHAR(100),
tags VARCHAR(255),
price DECIMAL(10, 2),
stock_quantity INT,
release_date DATETIME,
is_on_sale TINYINT(1),
vec vector(4),
-- Create full-text indexes on the columns used for full-text search
FULLTEXT INDEX idx_product_name(product_name),
FULLTEXT INDEX idx_description(description),
FULLTEXT INDEX idx_tags(tags)
);
Insert data:
INSERT INTO products_fulltext VALUES
('prod-001', 'Gamer-Pro Mechanical Keyboard', 'A responsive mechanical keyboard with customizable RGB lighting for the ultimate gaming experience.',
'GamerZone', 'Gaming', 'best-seller,gaming-gear,rgb', 149.00, 100, '2023-07-20 00:00:00.000000', 1, '[0.5,0.1,0.6,0.9]'),
('prod-002', 'Gamer-Pro Headset', 'High-fidelity gaming headset with a noise-cancelling microphone.',
'GamerZone', 'Gaming', 'best-seller,gaming-gear,audio', 149.00, 100, '2023-07-20 00:00:00.000000', 1, '[0.1,0.9,0.2,0]'),
('prod-003', 'Eco-Friendly Yoga Mat', 'A non-slip yoga mat made from sustainable and eco-friendly materials.',
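'NatureFirst', 'Sports', 'eco-friendly,health', 49.99, 200, '2023-04-22 00:00:00.000000', 0, '[0.1,0.9,0.3,0]');
- doc_table table: a document table with a scalar column, a vector column, and full-text indexed columns. It is used to demonstrate full-text search with scalar filter conditions and hybrid search.
CREATE TABLE doc_table(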
c1 INT,
vector VECTOR(3),
query VARCHAR(255),
content VARCHAR(255),
VECTOR INDEX idx1(vector) WITH (distance=l2, type=hnsw, lib=vsag),
FULLTEXT INDEX idx2(query),
FULLTEXT INDEX idx3(content)
);
Insert data:
INSERT INTO doc_table VALUES
(1, '[1,2,3]', "hello world", "oceanbase Elasticsearch database"),
(2, '[1,2,1]', "hello world, what is your name", "oceanbase mysql database"),
(3, '[1,1,1]', "hello world, how are you", "oceanbase oracle database"),
(4, '[1,3,1]', "real world, where are you from", "postgres oracle database"),
(5, '[1,3,2]', "real world, how old are you", "redis oracle database"),
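(6, '[2,1,1]', "hello world, where are you from", "starrocks oceanbase database");
- products_vector table: has a structure similar to the products table, but explicitly creates a vector index on the vec column. It is used to demonstrate pure vector search.
CREATE TABLE products_vector (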
`product_id` varchar(50) DEFAULT NULL,
`product_name` varchar(255) DEFAULT NULL,
`description` text DEFAULT NULL,
`brand` varchar(100) DEFAULT NULL,
`category` varchar(100) DEFAULT NULL,
`tags` varchar(255) DEFAULT NULL,
`price` decimal(10,2) DEFAULT NULL,
`stock_quantity` int(11) DEFAULT NULL,
`release_date` datetime DEFAULT NULL,
`is_on_sale` tinyint(1) DEFAULT NULL,
`vec` VECTOR(4) DEFAULT NULL,
-- Create a vector index on the column used for vector search
VECTOR INDEX idx1(vec) WITH (distance=l2, type=hnsw, lib=vsag)
);
Insert data:
INSERT INTO products_vector VALUES
('prod-001', 'Gamer-Pro Mechanical Keyboard', 'A responsive mechanical keyboard with customizable RGB lighting for the ultimate gaming experience.',
'GamerZone', 'Gaming', 'best-seller,gaming-gear,rgb', 149.00, 100, '2023-07-20 00:00:00.000000', 1, '[0.5,0.1,0.6,0.9]'),
('prod-002', 'Gamer-Pro Headset', 'High-fidelity gaming headset with a noise-cancelling microphone.',
'GamerZone', 'Gaming', 'best-seller,gaming-gear,audio', 149.00, 100, '2023-07-20 00:00:00.000000', 1, '[0.1,0.9,0.2,0]'),
('prod-003', 'Eco-Friendly Yoga Mat', 'A non-slip yoga mat made from sustainable and eco-friendly materials.',
'NatureFirst', 'Sports', 'eco-friendly,health', 49.99, 200, '2023-04-22 00:00:00.000000', 0, '[0.1,0.9,0.3,0]');
- products_multi_vector table: a table with multiple vector columns, used to demonstrate multi-route vector search.
CREATE TABLE products_multi_vector (
product_id VARCHAR(50),
product_name VARCHAR(255),
description TEXT,
vec1 VECTOR(4),
vec2 VECTOR(4),
vec3 VECTOR(4),
VECTOR INDEX idx1(vec1) WITH (distance=l2, type=hnsw, lib=vsag),
VECTOR INDEX idx2(vec2) WITH (distance=l2, type=hnsw, lib=vsag),
VECTOR INDEX idx3(vec3) WITH (distance=l2, type=hnsw, lib=vsag)
);
Insert data:
INSERT INTO products_multi_vector VALUES
('prod-001', 'Gamer-Pro Mechanical Keyboard', 'A responsive mechanical keyboard', '[0.5,0.1,0.6,0.9]', '[0.2,0.3,0.4,0.5]', '[0.1,0.2,0.3,0.4]'),
('prod-002', 'Gamer-Pro Headset', 'High-fidelity gaming headset', '[0.1,0.9,0.2,0]', '[0.3,0.4,0.5,0.6]', '[0.2,0.3,0.4,0.5]'),
('prod-003', 'Eco-Friendly Yoga Mat', 'A non-slip yoga mat', '[0.1,0.9,0.3,0]', '[0.4,0.5,0.6,0.7]', '[0.3,0.4,0.5,0.6]');
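Scalar search
Typical use cases for scalar search include:
- Product filtering on an e-commerce platform: a user wants to see all products of a specific brand. For example, all products of the GamerZone brand.
- Content management systems: an administrator needs to filter articles or documents by a specific attribute. For example, finding all articles by a specific author.
- User management systems: finding users with a specific status or role. For example, finding all VIP users.
Example:
- Set the search parameters.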
SET @parm = '{
"query": {
"bool": {
"must": [
{"term": {"brand": "GamerZone"}}
]
}
}
}';
- Search for all records whose brand is "GamerZone".
SELECT json_pretty(DBMS_HYBRID_SEARCH.SEARCH('products', @parm));
The result is as follows:
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| json_pretty(DBMS_HYBRID_SEARCH.SEARCH('products', @parm)) |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| [
{
"vec": "[0.5,0.1,0.6,0.9]",
"tags": "best-seller,gaming-gear,rgb",
"brand": "GamerZone",
"price": 149.00,
"_score": 1,
"category": "Gaming",
"is_on_sale": 1,
"product_id": "prod-001",
"description": "A responsive mechanical keyboard with customizable RGB lighting for the ultimate gaming experience.",
"product_name": "Gamer-Pro Mechanical Keyboard",
"release_date": "2023-07-20 00:00:00.000000",
"stock_quantity": 100
},
{
"vec": "[0.1,0.9,0.2,0]",
"tags": "best-seller,gaming-gear,audio",
"brand": "GamerZone",
"price": 149.00,
"_score": 1,
"category": "Gaming",
"is_on_sale": 1,
"product_id": "prod-002",
"description": "High-fidelity gaming headset with a noise-cancelling microphone.",
"product_name": "Gamer-Pro Headset",
"release_date": "2023-07-20 00:00:00.000000",
"stock_quantity": 100
}
] |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set
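To make the shape of the term query concrete, the following Python sketch builds the same parameter with json.dumps and applies an equivalent equality check to the three sample rows in memory. It is only an illustration: the `rows` list and the `matches_term` helper are hypothetical names introduced here, and the real matching is performed by seekdb, not by this code.

```python
import json

# Same structure as the @parm value used in the scalar search example.
params = {"query": {"bool": {"must": [{"term": {"brand": "GamerZone"}}]}}}
param_json = json.dumps(params)

# Minimal in-memory copy of the sample rows (only the columns needed here).
rows = [
    {"product_id": "prod-001", "brand": "GamerZone"},
    {"product_id": "prod-002", "brand": "GamerZone"},
    {"product_id": "prod-003", "brand": "NatureFirst"},
]

def matches_term(row, term):
    """Return True if every field in the term clause equals the row value."""
    return all(row.get(field) == value for field, value in term.items())

# Every clause under "must" has to match for a row to be returned.
must_clauses = params["query"]["bool"]["must"]
hits = [
    row["product_id"]
    for row in rows
    if all(matches_term(row, clause["term"]) for clause in must_clauses)
]

print(param_json)
print(hits)
```

The string in `param_json` is what would be assigned to @parm; the two GamerZone rows from the sample data are the ones that pass the check.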
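Scalar range search
Typical use cases for scalar range search include:
- Price range filtering: an e-commerce platform filters products by price range. For example, finding products priced in the range [30, 80].
- Time range queries: finding orders or logs within a specific period. For example, finding orders from the last 30 days.
- Numeric range filtering: filtering by rating, stock quantity, and other numeric ranges. For example, finding products rated between 4 and 5.
Example:
- Set the search parameters.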
SET @parm = '{
"query": {
"range" : {
"price" : {
"gte" : 30,
"lte" : 80
}
}
}
}';
- Search for all records whose price is in the range [30, 80], with both bounds included.
SELECT json_pretty(DBMS_HYBRID_SEARCH.SEARCH('products', @parm));
The result is as follows:
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| json_pretty(DBMS_HYBRID_SEARCH.SEARCH('products', @parm)) |
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| [
{
"vec": "[0.1,0.9,0.3,0]",
"tags": "eco-friendly,health",
"brand": "NatureFirst",
"price": 49.99,
"_score": 1,
"category": "Sports",
"is_on_sale": 0,
"product_id": "prod-003",
"description": "A non-slip yoga mat made from sustainable and eco-friendly materials.",
"product_name": "Eco-Friendly Yoga Mat",
"release_date": "2023-04-22 00:00:00.000000",
"stock_quantity": 200
}
] |
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set
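The range clause can be read as a pair of inclusive bounds. The sketch below applies the same gte and lte bounds to the sample prices in plain Python, using Decimal to mirror the decimal(10,2) column. The `in_range` helper and the `rows` list are hypothetical and exist only for this illustration.

```python
from decimal import Decimal

# The "range" clause from the example: 30 <= price <= 80.
range_clause = {"price": {"gte": 30, "lte": 80}}

# Prices copied from the sample INSERT statement.
rows = [
    {"product_id": "prod-001", "price": Decimal("149.00")},
    {"product_id": "prod-002", "price": Decimal("149.00")},
    {"product_id": "prod-003", "price": Decimal("49.99")},
]

def in_range(row, clause):
    """Check the inclusive lower (gte) and upper (lte) bounds for each field."""
    for field, bounds in clause.items():
        value = row[field]
        if "gte" in bounds and value < bounds["gte"]:
            return False
        if "lte" in bounds and value > bounds["lte"]:
            return False
    return True

hits = [row["product_id"] for row in rows if in_range(row, range_clause)]
print(hits)
```

Only the yoga mat at 49.99 falls inside the bounds, which agrees with the single row in the sample result.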
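Full-text search
Typical use cases for full-text search include:
- Document search: searching a large number of documents for content that contains specific keywords. For example, searching an FAQ for documents that contain "how to use".
- Product search: keyword-based fuzzy search over product names and descriptions. For example, searching for products that contain "database".
- Knowledge base search: searching FAQ and help documents for related questions. For example, searching a customer service knowledge base for answers to related questions.
Example:
- Set the search parameters.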
SET @query_str_with_mini = '{
"query": {
"query_string": {
"type": "best_fields",
"fields": ["product_name^3", "description^2.5", "tags^1.5"],
"query": "Gamer-Pro^2 keyboard^1.5 audio^1.2",
"boost": 1.5
}
}
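}';
- Search for records whose product_name, description, or tags fields contain any of the keywords "Gamer-Pro", "keyboard", and "audio", and rank them by the configured field and keyword weights.
SELECT json_pretty(DBMS_HYBRID_SEARCH.SEARCH('products_fulltext', @query_str_with_mini));
The result is as follows: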
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| json_pretty(DBMS_HYBRID_SEARCH.SEARCH('products_fulltext', @query_str_with_mini)) |
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| [
{
"vec": "[0.5,0.1,0.6,0.9]",
"tags": "best-seller,gaming-gear,rgb",
"brand": "GamerZone",
"price": 149.00,
"_score": 4.569735248749978,
"category": "Gaming",
"is_on_sale": 1,
"product_id": "prod-001",
"description": "A responsive mechanical keyboard with customizable RGB lighting for the ultimate gaming experience.",
"product_name": "Gamer-Pro Mechanical Keyboard",
"release_date": "2023-07-20 00:00:00.000000",
"stock_quantity": 100
},
{
"vec": "[0.1,0.9,0.2,0]",
"tags": "best-seller,gaming-gear,audio",
"brand": "GamerZone",
"price": 149.00,
"_score": 1.7338881172399914,
"category": "Gaming",
"is_on_sale": 1,
"product_id": "prod-002",
"description": "High-fidelity gaming headset with a noise-cancelling microphone.",
"product_name": "Gamer-Pro Headset",
"release_date": "2023-07-20 00:00:00.000000",
"stock_quantity": 100
}
] |
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set
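The fields and query values in this example use a name^weight notation. As a small aid to reading it, the Python sketch below splits each entry into a name and a numeric weight. The `parse_boosted` helper is hypothetical, and the default weight of 1.0 for entries without "^" is an assumption. How seekdb combines these weights into the final _score is not described in this document, so the sketch stops at parsing.

```python
def parse_boosted(spec):
    """Split 'name^2.5' into ('name', 2.5); without '^' the weight is assumed to be 1.0."""
    name, sep, boost = spec.partition("^")
    return name, float(boost) if sep else 1.0

# Values copied from the query_string parameter in the example.
fields = ["product_name^3", "description^2.5", "tags^1.5"]
query = "Gamer-Pro^2 keyboard^1.5 audio^1.2"

field_boosts = dict(parse_boosted(f) for f in fields)
term_boosts = dict(parse_boosted(t) for t in query.split())

print(field_boosts)
print(term_boosts)
```

Read this way, a match in product_name is weighted more heavily than one in tags, and the keyword "Gamer-Pro" more heavily than "audio".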
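Full-text search with scalar filter conditions
Typical use cases for full-text search with scalar filter conditions include:
- Precise search: running a text search under specific conditions. For example, searching for specific keywords among published articles.
- Permission control: searching only within the data a user is authorized to access. For example, an order system searching for product information among orders from a specific period.
- Category search: running a keyword search within a specific category. For example, a user system searching for specific user information among active users.
Example:
- Set the search parameters.
-- Filter: specify the scalar filter condition c1 >= 2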
SET @query_str = '{
"query": {
"bool" : {
"must" : [
{"query_string": {
"fields": ["query", "content"],
"query": "hello what oceanbase mysql"}
}
],
"filter" : [
{"range": {"c1": {"gte" : 2}}}
]
}
}
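}';
- Run a full-text search for "hello what oceanbase mysql" on the query and content fields, keeping only records whose c1 is greater than or equal to 2.
SELECT json_pretty(DBMS_HYBRID_SEARCH.SEARCH('doc_table', @query_str));
The result is as follows: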
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| json_pretty(DBMS_HYBRID_SEARCH.SEARCH('doc_table', @query_str)) |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| [
{
"c1": 2,
"query": "hello world, what is your name",
"_score": 2.170969786679347,
"vector": "[1,2,1]",
"content": "oceanbase mysql database"
},
{
"c1": 3,
"query": "hello world, how are you",
"_score": 0.3503184713375797,
"vector": "[1,1,1]",
"content": "oceanbase oracle database"
},
{
"c1": 6,
"query": "hello world, where are you from",
"_score": 0.3503184713375797,
"vector": "[2,1,1]",
"content": "starrocks oceanbase database"
}
] |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set
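The interaction between the must clause and the filter clause can be traced on the six sample rows. The Python sketch below is a simplified emulation, not the engine's algorithm: it assumes that a row matches when any search term appears in the query or content column (which is consistent with the sample result), uses a naive lowercase tokenizer, and ignores scoring entirely. The names `tokens` and `text_match` are hypothetical.

```python
import re

# (c1, query, content) copied from the doc_table sample data.
rows = [
    (1, "hello world", "oceanbase Elasticsearch database"),
    (2, "hello world, what is your name", "oceanbase mysql database"),
    (3, "hello world, how are you", "oceanbase oracle database"),
    (4, "real world, where are you from", "postgres oracle database"),
    (5, "real world, how old are you", "redis oracle database"),
    (6, "hello world, where are you from", "starrocks oceanbase database"),
]

def tokens(text):
    """Naive tokenizer: lowercase alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

search_terms = tokens("hello what oceanbase mysql")

def text_match(query, content):
    """Assumed OR semantics: at least one search term occurs in either column."""
    return bool(search_terms & (tokens(query) | tokens(content)))

# Step 1: the "must" clause selects rows by text match.
matched = [c1 for c1, query, content in rows if text_match(query, content)]
# Step 2: the "filter" clause keeps only rows with c1 >= 2.
filtered = [c1 for c1 in matched if c1 >= 2]

print(matched)
print(filtered)
```

Row 1 matches the text but is removed by the filter, while rows 4 and 5 pass the filter but contain none of the search terms, which leaves rows 2, 3, and 6 as in the sample result.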
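Vector search
Typical use cases for vector search include:
- Semantic search: finding related content by semantic similarity. For example, finding semantically related questions and answers in a knowledge base.
- Recommendation systems: recommending similar products based on user preferences. For example, recommending similar products on an e-commerce platform.
- Image search: finding similar images by image features. For example, finding similar images in an image library.
- Intelligent Q&A: finding semantically related questions and answers in a knowledge base. For example, in the knowledge base of a customer service system.
Example:
- Set the search parameters.
-- field specifies the vector field, k specifies the number of results to return (the k nearest results), and query_vector specifies the query vector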
SET @parm = '{
"knn" : {
"field": "vec",
"k": 3,
"query_vector": [0.5,0.1,0.6,0.9]
}
}';
- Search for the records whose vec is most similar to [0.5,0.1,0.6,0.9].
SELECT json_pretty(DBMS_HYBRID_SEARCH.SEARCH('products_vector', @parm));
The result is as follows:
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| json_pretty(DBMS_HYBRID_SEARCH.SEARCH('products_vector', @parm)) |
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| [
{
"vec": "[0.5,0.1,0.6,0.9]",
"tags": "best-seller,gaming-gear,rgb",
"brand": "GamerZone",
"price": 149.00,
"_score": 1.0,
"category": "Gaming",
"is_on_sale": 1,
"product_id": "prod-001",
"description": "A responsive mechanical keyboard with customizable RGB lighting for the ultimate gaming experience.",
"product_name": "Gamer-Pro Mechanical Keyboard",
"release_date": "2023-07-20 00:00:00.000000",
"stock_quantity": 100
},
{
"vec": "[0.1,0.9,0.3,0]",
"tags": "eco-friendly,health",
"brand": "NatureFirst",
"price": 49.99,
"_score": 0.43405784,
"category": "Sports",
"is_on_sale": 0,
"product_id": "prod-003",
"description": "A non-slip yoga mat made from sustainable and eco-friendly materials.",
"product_name": "Eco-Friendly Yoga Mat",
"release_date": "2023-04-22 00:00:00.000000",
"stock_quantity": 200
},
{
"vec": "[0.1,0.9,0.2,0]",
"tags": "best-seller,gaming-gear,audio",
"brand": "GamerZone",
"price": 149.00,
"_score": 0.42910841,
"category": "Gaming",
"is_on_sale": 1,
"product_id": "prod-002",
"description": "High-fidelity gaming headset with a noise-cancelling microphone.",
"product_name": "Gamer-Pro Headset",
"release_date": "2023-07-20 00:00:00.000000",
"stock_quantity": 100
}
] |
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set
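The _score values in this sample result are numerically consistent with 1 / (1 + d), where d is the L2 (Euclidean) distance between the query vector and the stored vector. This is an observation about the sample output for an index created with distance=l2, not a formula stated by this document. The Python sketch below recomputes the scores under that assumption; `l2_similarity` is a hypothetical helper name.

```python
import math

def l2_similarity(a, b):
    """Assumed scoring: 1 / (1 + Euclidean distance)."""
    return 1.0 / (1.0 + math.dist(a, b))

query_vector = [0.5, 0.1, 0.6, 0.9]

# vec values copied from the products_vector sample data.
vectors = {
    "prod-001": [0.5, 0.1, 0.6, 0.9],
    "prod-002": [0.1, 0.9, 0.2, 0.0],
    "prod-003": [0.1, 0.9, 0.3, 0.0],
}

scores = {pid: l2_similarity(query_vector, vec) for pid, vec in vectors.items()}
# Higher score means closer to the query vector.
ranked = sorted(scores, key=scores.get, reverse=True)

for pid in ranked:
    print(pid, round(scores[pid], 8))
```

Under this assumption, an exact match scores 1.0, and the ordering prod-001, prod-003, prod-002 agrees with the sample result.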
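Vector search with scalar filter conditions
Typical use cases for vector search with scalar filter conditions include:
- Precise search: running a similarity search under specific conditions. For example, searching for similar content only among published articles.
- Permission control: searching only within the data a user is authorized to access. For example, an order system searching for product information among orders from a specific period.
- Category search: running a similarity search within a specific category. For example, a user system searching for specific user information among active users.
Example:
- Set the search parameters.
-- Specify the scalar filter condition brand = "GamerZone"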
SET @parm = '{
"knn" : {
"field": "vec",
"k": 3,
"query_vector": [0.1,0.5,0.3,0.7],
"filter" : [
{"term" : {"brand": "GamerZone"} }
]
}
}';
- Search for the records whose vec is most similar to [0.1,0.5,0.3,0.7] and whose brand is "GamerZone".
SELECT json_pretty(DBMS_HYBRID_SEARCH.SEARCH('products_vector', @parm));
The result is as follows:
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| json_pretty(DBMS_HYBRID_SEARCH.SEARCH('products_vector', @parm)) |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| [
{
"vec": "[0.5,0.1,0.6,0.9]",
"tags": "best-seller,gaming-gear,rgb",
"brand": "GamerZone",
"price": 149.00,
"_score": 0.59850837,
"category": "Gaming",
"is_on_sale": 1,
"product_id": "prod-001",
"description": "A responsive mechanical keyboard with customizable RGB lighting for the ultimate gaming experience.",
"product_name": "Gamer-Pro Mechanical Keyboard",
"release_date": "2023-07-20 00:00:00.000000",
"stock_quantity": 100
},
{
"vec": "[0.1,0.9,0.2,0]",
"tags": "best-seller,gaming-gear,audio",
"brand": "GamerZone",
"price": 149.00,
"_score": 0.55175342,
"category": "Gaming",
"is_on_sale": 1,
"product_id": "prod-002",
"description": "High-fidelity gaming headset with a noise-cancelling microphone.",
"product_name": "Gamer-Pro Headset",
"release_date": "2023-07-20 00:00:00.000000",
"stock_quantity": 100
}
] |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set
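The effect of the filter can be seen by scoring all three sample rows and then dropping the ones that fail the brand condition. The Python sketch below does this in memory, again assuming the 1 / (1 + L2 distance) scoring that the sample output is consistent with. It does not show whether seekdb applies the filter before or after the index lookup; `l2_similarity` and `rows` are hypothetical names.

```python
import math

def l2_similarity(a, b):
    """Assumed scoring: 1 / (1 + Euclidean distance)."""
    return 1.0 / (1.0 + math.dist(a, b))

query_vector = [0.1, 0.5, 0.3, 0.7]

# brand and vec values copied from the products_vector sample data.
rows = [
    {"product_id": "prod-001", "brand": "GamerZone", "vec": [0.5, 0.1, 0.6, 0.9]},
    {"product_id": "prod-002", "brand": "GamerZone", "vec": [0.1, 0.9, 0.2, 0.0]},
    {"product_id": "prod-003", "brand": "NatureFirst", "vec": [0.1, 0.9, 0.3, 0.0]},
]

# Score every row, then keep only those that satisfy the term filter.
all_scores = {r["product_id"]: l2_similarity(query_vector, r["vec"]) for r in rows}
filtered = {
    r["product_id"]: all_scores[r["product_id"]]
    for r in rows
    if r["brand"] == "GamerZone"
}
ranked = sorted(filtered, key=filtered.get, reverse=True)

for pid in ranked:
    print(pid, round(filtered[pid], 8))
```

Without the filter, prod-003 would rank between the two GamerZone products under this scoring; with the filter it is excluded even though k is 3.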
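Multi-route vector search
Multi-route vector search runs a search on several vector indexes and returns the most similar records.
Example:
- Set the search parameters.
-- Specify three vector query routes; each route specifies the vector index field, the number of results to return, and the query vector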
SET @param_multi_knn = '{
"knn" : [{
"field": "vec1",
"k": 5,
"query_vector": [0.5,0.1,0.6,0.9]
},
{
"field": "vec2",
"k": 5,
"query_vector": [0.2,0.3,0.4,0.5]
},
{
"field": "vec3",
"k": 5,
"query_vector": [0.1,0.2,0.3,0.4]
}
],
"size" : 5
}'; -
执行查询并返回查询结果。
SELECT json_pretty(DBMS_HYBRID_SEARCH.SEARCH('products_multi_vector', @param_multi_knn));+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| json_pretty(DBMS_HYBRID_SEARCH.SEARCH('products_multi_vector', @param_multi_knn)) |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| [
{
"vec1": "[0.5,0.1,0.6,0.9]",
"vec2": "[0.2,0.3,0.4,0.5]",
"vec3": "[0.1,0.2,0.3,0.4]",
"_score": 3.0,
"product_id": "prod-001",
"description": "A responsive mechanical keyboard",
"product_name": "Gamer-Pro Mechanical Keyboard"
},
{
"vec1": "[0.1,0.9,0.2,0]",
"vec2": "[0.3,0.4,0.5,0.6]",
"vec3": "[0.2,0.3,0.4,0.5]",
"_score": 2.0957750699999997,
"product_id": "prod-002",
"description": "High-fidelity gaming headset",
"product_name": "Gamer-Pro Headset"
},
{
"vec1": "[0.1,0.9,0.3,0]",
"vec2": "[0.4,0.5,0.6,0.7]",
"vec3": "[0.3,0.4,0.5,0.6]",
"_score": 1.86262927,
"product_id": "prod-003",
"description": "A non-slip yoga mat",
"product_name": "Eco-Friendly Yoga Mat"
}
] |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set
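Full-text and vector hybrid search
Typical use cases for full-text and vector hybrid search include:
- Intelligent search: combines keyword matching with semantic understanding. For example, when a user enters "I need a gaming keyboard", the system matches the keywords "gaming" and "keyboard" and also understands the semantics of "gaming equipment".
- Document search: supports both exact keyword matching and semantic understanding across a large document collection. For example, a search for "database optimization" matches documents that contain those words and also finds semantically related content such as "performance tuning" and "query optimization".
- Product recommendation: an e-commerce platform supports search by product name as well as by a description of the need. For example, given the user description "a laptop suitable for office work", the system matches the keywords and also understands the "business and office" semantic requirement.
An example follows:
- Set the search parameters.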
SET @parm = '{
"query": {
"bool": {
"should": [
{"match": {"query": "hi hello"}},
{"match": { "content": "oceanbase mysql" }}
]
}
},
"knn" : {
"field": "vector",
"k": 5,
"query_vector": [1,2,3]
},
"_source" : ["query", "content", "_keyword_score", "_semantic_score"]
}';
- Run the query and return the results.
SELECT json_pretty(DBMS_HYBRID_SEARCH.SEARCH('doc_table', @parm));
The result is as follows:
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| json_pretty(dbms_hybrid_search.search('doc_table', @parm)) |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| [
{
"query": "hello world, what is your name",
"_score": 2.835628417884166,
"content": "oceanbase mysql database",
"_keyword_score": 2.5022950878841663,
"_semantic_score": 0.33333333
},
{
"query": "hello world",
"_score": 1.7219400929592013,
"content": "oceanbase Elasticsearch database",
"_keyword_score": 0.7219400929592014,
"_semantic_score": 1.0
},
{
"query": "hello world, how are you",
"_score": 1.0096539326751595,
"content": "oceanbase oracle database",
"_keyword_score": 0.7006369426751594,
"_semantic_score": 0.30901699
},
{
"query": "real world, how old are you",
"_score": 0.41421356,
"content": "redis oracle database",
"_keyword_score": null,
"_semantic_score": 0.41421356
},
{
"query": "real world, where are you from",
"_score": 0.30901699,
"content": "postgres oracle database",
"_keyword_score": null,
"_semantic_score": 0.30901699
}
] |
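In the sample output above, each `_score` appears to be the sum of `_keyword_score` and `_semantic_score`, with a null keyword score counted as 0. This relationship is inferred from the displayed numbers only; the document does not state the default weights. The following is a minimal sketch that checks the inference against the values shown. The helper name `combined_score` is hypothetical and not part of seekdb.

```python
import math

# Rows copied from the sample output above; None stands for JSON null.
rows = [
    {"_score": 2.835628417884166, "_keyword_score": 2.5022950878841663, "_semantic_score": 0.33333333},
    {"_score": 1.7219400929592013, "_keyword_score": 0.7219400929592014, "_semantic_score": 1.0},
    {"_score": 1.0096539326751595, "_keyword_score": 0.7006369426751594, "_semantic_score": 0.30901699},
    {"_score": 0.41421356, "_keyword_score": None, "_semantic_score": 0.41421356},
    {"_score": 0.30901699, "_keyword_score": None, "_semantic_score": 0.30901699},
]


def combined_score(keyword_score, semantic_score):
    """Assumed fusion rule: add the two sub-scores, treating null as 0."""
    return (keyword_score or 0.0) + (semantic_score or 0.0)


# Collect any row whose reported _score differs from the assumed sum.
mismatches = [
    row for row in rows
    if not math.isclose(
        row["_score"],
        combined_score(row["_keyword_score"], row["_semantic_score"]),
        rel_tol=1e-9,
    )
]
print("rows that do not match the assumed rule:", len(mismatches))
```

Rows matched only by the vector sub-query (null `_keyword_score`) therefore rank by their semantic score alone, which is why they appear at the bottom of this result.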
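Full-text and vector RRF hybrid search
By default, the result sets of the full-text sub-query and the vector sub-query are merged by weighted fusion. You can use the rank syntax to switch the fusion method to RRF (Reciprocal Rank Fusion), which merges the results by rank. Typical use cases include:
- Multi-dimensional ranking: results from several search dimensions must be considered together. For example, an academic search system that searches a paper repository needs to weigh both keyword match quality and semantic relevance.
- Fairness requirements: results from different search methods should each receive reasonable exposure. For example, an e-commerce platform needs to consider textual information such as product titles and descriptions as well as visual information such as product images and videos.
- Complex queries: search scenarios that involve multiple query conditions. For example, a medical system needs to consider a patient's symptom description together with the patient's medical history and test results.
An example follows:
Set the search parameters.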
SET @rrf_query_param = '{
"query": {
"query_string": {
"fields": ["title", "author", "description"],
"query": "fiction American Dream"
}
},
"knn" : {
"field": "vector_embedding",
"k": 5,
"query_vector": [0.1, 0.2, 0.3, 0.4]
},
"rank" : {
"rrf" : {
"rank_window_size" : 10,
"rank_constant" : 60
}
}
}';
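When the parameter is assembled in application code rather than typed by hand, building it from a dictionary avoids JSON quoting mistakes. The following is a minimal sketch: the keys (`query`, `knn`, `rank`, `rrf`, `rank_window_size`, `rank_constant`) are taken from the example above, while the helper name `build_rrf_param` and its argument names are hypothetical.

```python
import json


def build_rrf_param(fields, text, vector_field, query_vector,
                    k=5, rank_window_size=10, rank_constant=60):
    """Return the hybrid search parameter as a JSON string (hypothetical helper)."""
    param = {
        # Full-text sub-query over several columns.
        "query": {"query_string": {"fields": list(fields), "query": text}},
        # Vector sub-query.
        "knn": {"field": vector_field, "k": k, "query_vector": list(query_vector)},
        # Ask for RRF fusion instead of the default weighted fusion.
        "rank": {"rrf": {"rank_window_size": rank_window_size,
                         "rank_constant": rank_constant}},
    }
    return json.dumps(param)


param_json = build_rrf_param(
    ["title", "author", "description"],
    "fiction American Dream",
    "vector_embedding",
    [0.1, 0.2, 0.3, 0.4],
)
print(param_json)
```

The resulting string can then be supplied as the second argument of DBMS_HYBRID_SEARCH.SEARCH, preferably as a bound parameter of your database driver rather than by string concatenation.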
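The RRF algorithm computes the final relevance score of a document by fusing its ranks across the result sets of the sub-queries. For a document d, the score is computed as follows:
score = 0.0
for q in queries:
    if d in result(q):
        score += 1.0 / (k + rank(result(q), d))  # k is the configured rank_constant
return score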
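The pseudocode above can be made concrete with a short, runnable sketch. It assumes that ranks are 1-based and that each sub-query result is an ordered list of document IDs, neither of which the document states explicitly. It also does not model `rank_window_size`. The function names and sample document IDs are illustrative only and do not reflect how seekdb implements RRF internally.

```python
def rrf_scores(result_lists, rank_constant=60):
    """Fuse ranked ID lists with Reciprocal Rank Fusion and return {doc_id: score}."""
    scores = {}
    for results in result_lists:
        # Assumption: the best hit in each list has rank 1.
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (rank_constant + rank)
    return scores


def rrf_rank(result_lists, rank_constant=60):
    """Return document IDs ordered by descending RRF score (ties broken by ID)."""
    scores = rrf_scores(result_lists, rank_constant)
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical result lists from the two sub-queries.
fulltext_hits = ["doc_a", "doc_b", "doc_c"]
vector_hits = ["doc_b", "doc_d", "doc_a"]

scores = rrf_scores([fulltext_hits, vector_hits])
ranking = rrf_rank([fulltext_hits, vector_hits])
print(ranking)
```

Because only ranks enter the formula, the raw keyword and vector scores do not need to be on comparable scales. A document that appears in both lists (such as `doc_b` here) accumulates two terms and tends to outrank documents that appear in only one.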
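Summary
The examples in this document illustrate where hybrid search is useful:
- Smarter search: adds semantic understanding on top of traditional keyword search, returning results that are more precise and closer to the user's intent.
- Better user experience: supports natural-language queries, which simplifies use and makes information retrieval more efficient.
- Support for diverse workloads: applies to e-commerce, content management, knowledge bases, intelligent customer service, and similar scenarios, covering everything from basic filtering to intelligent recommendation.
- Combined technical strengths: pairs exact matching with semantic understanding to improve both the accuracy and the completeness of search results.
Hybrid search is well suited to processing large volumes of unstructured data and to building intelligent search and recommendation systems.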