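Hybrid search with vector indexes

This document describes hybrid search in seekdb, which combines full-text indexes and vector indexes.

Hybrid search combines vector-based semantic search with keyword search based on full-text indexes, and ranks the combined results to return more accurate and complete answers. Vector search is good at approximate semantic matching but weak at matching exact keywords, numbers, and proper nouns; full-text search compensates for this weakness. Hybrid search has therefore become a key feature of vector databases and is widely used across products. seekdb implements efficient hybrid queries by integrating its full-text and vector index capabilities.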

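Usage

Hybrid search is provided by the new system package DBMS_HYBRID_SEARCH, which contains two functions:

Method                        Description
DBMS_HYBRID_SEARCH.SEARCH     Returns the search results in JSON format, sorted by relevance.
DBMS_HYBRID_SEARCH.GET_SQL    Returns the SQL statement that is actually executed, as a string.

For detailed syntax and parameter descriptions, see DBMS_HYBRID_SEARCH.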

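Usage scenarios

Create sample tables and insert data

To demonstrate hybrid search, this section creates the following sample tables and inserts data into them. The tables are used in the search examples for the scenarios below.

  • products: a basic product information table used to demonstrate scalar search. It contains the product ID, name, description, brand, category, tags, price, stock quantity, release date, on-sale flag, and a vector column vec.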

    CREATE TABLE products (
    `product_id` varchar(50) DEFAULT NULL,
    `product_name` varchar(255) DEFAULT NULL,
    `description` text DEFAULT NULL,
    `brand` varchar(100) DEFAULT NULL,
    `category` varchar(100) DEFAULT NULL,
    `tags` varchar(255) DEFAULT NULL,
    `price` decimal(10,2) DEFAULT NULL,
    `stock_quantity` int(11) DEFAULT NULL,
    `release_date` datetime DEFAULT NULL,
    `is_on_sale` tinyint(1) DEFAULT NULL,
    `vec` VECTOR(4) DEFAULT NULL
    );

    Insert data.

    INSERT INTO products VALUES
    ('prod-001', 'Gamer-Pro Mechanical Keyboard', 'A responsive mechanical keyboard with customizable RGB lighting for the ultimate gaming experience.',
    'GamerZone', 'Gaming', 'best-seller,gaming-gear,rgb', 149.00, 100, '2023-07-20 00:00:00.000000', 1, '[0.5,0.1,0.6,0.9]'),

    ('prod-002', 'Gamer-Pro Headset', 'High-fidelity gaming headset with a noise-cancelling microphone.',
    'GamerZone', 'Gaming', 'best-seller,gaming-gear,audio', 149.00, 100, '2023-07-20 00:00:00.000000', 1, '[0.1,0.9,0.2,0]'),

    ('prod-003', 'Eco-Friendly Yoga Mat', 'A non-slip yoga mat made from sustainable and eco-friendly materials.',
    'NatureFirst', 'Sports', 'eco-friendly,health', 49.99, 200, '2023-04-22 00:00:00.000000', 0, '[0.1,0.9,0.3,0]');
  • products_fulltext: based on the products table, with full-text indexes created on the product_name, description, and tags columns. It is used to demonstrate full-text search.

    CREATE TABLE products_fulltext (
    product_id VARCHAR(50),
    product_name VARCHAR(255),
    description TEXT,
    brand VARCHAR(100),
    category VARCHAR(100),
    tags VARCHAR(255),
    price DECIMAL(10, 2),
    stock_quantity INT,
    release_date DATETIME,
    is_on_sale TINYINT(1),
    vec vector(4),
    -- Create full-text indexes on the columns that need full-text search
    FULLTEXT INDEX idx_product_name(product_name),
    FULLTEXT INDEX idx_description(description),
    FULLTEXT INDEX idx_tags(tags)
    );

    Insert data.

    INSERT INTO products_fulltext VALUES
    ('prod-001', 'Gamer-Pro Mechanical Keyboard', 'A responsive mechanical keyboard with customizable RGB lighting for the ultimate gaming experience.',
    'GamerZone', 'Gaming', 'best-seller,gaming-gear,rgb', 149.00, 100, '2023-07-20 00:00:00.000000', 1, '[0.5,0.1,0.6,0.9]'),

    ('prod-002', 'Gamer-Pro Headset', 'High-fidelity gaming headset with a noise-cancelling microphone.',
    'GamerZone', 'Gaming', 'best-seller,gaming-gear,audio', 149.00, 100, '2023-07-20 00:00:00.000000', 1, '[0.1,0.9,0.2,0]'),

    ('prod-003', 'Eco-Friendly Yoga Mat', 'A non-slip yoga mat made from sustainable and eco-friendly materials.',
    'NatureFirst', 'Sports', 'eco-friendly,health', 49.99, 200, '2023-04-22 00:00:00.000000', 0, '[0.1,0.9,0.3,0]');
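  • doc_table: a document table with a scalar column, a vector column, and full-text indexed columns. It is used to demonstrate full-text search with scalar filter conditions and hybrid search.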

    CREATE TABLE doc_table(
    c1 INT,
    vector VECTOR(3),
    query VARCHAR(255),
    content VARCHAR(255),
    VECTOR INDEX idx1(vector) WITH (distance=l2, type=hnsw, lib=vsag),
    FULLTEXT INDEX idx2(query),
    FULLTEXT INDEX idx3(content)
    );

    Insert data.

    INSERT INTO doc_table VALUES
    (1, '[1,2,3]', "hello world", "oceanbase Elasticsearch database"),

    (2, '[1,2,1]', "hello world, what is your name", "oceanbase mysql database"),

    (3, '[1,1,1]', "hello world, how are you", "oceanbase oracle database"),

    (4, '[1,3,1]', "real world, where are you from", "postgres oracle database"),

    (5, '[1,3,2]', "real world, how old are you", "redis oracle database"),

    (6, '[2,1,1]', "hello world, where are you from", "starrocks oceanbase database");
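  • products_vector: has a structure similar to the products table, but explicitly creates a vector index on the vec column. It is used to demonstrate pure vector search.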

    CREATE TABLE products_vector (
    `product_id` varchar(50) DEFAULT NULL,
    `product_name` varchar(255) DEFAULT NULL,
    `description` text DEFAULT NULL,
    `brand` varchar(100) DEFAULT NULL,
    `category` varchar(100) DEFAULT NULL,
    `tags` varchar(255) DEFAULT NULL,
    `price` decimal(10,2) DEFAULT NULL,
    `stock_quantity` int(11) DEFAULT NULL,
    `release_date` datetime DEFAULT NULL,
    `is_on_sale` tinyint(1) DEFAULT NULL,
    `vec` VECTOR(4) DEFAULT NULL,
    -- Create a vector index on the column that needs vector search
    VECTOR INDEX idx1(vec) WITH (distance=l2, type=hnsw, lib=vsag)
    );

    Insert data.

    INSERT INTO products_vector VALUES
    ('prod-001', 'Gamer-Pro Mechanical Keyboard', 'A responsive mechanical keyboard with customizable RGB lighting for the ultimate gaming experience.',
    'GamerZone', 'Gaming', 'best-seller,gaming-gear,rgb', 149.00, 100, '2023-07-20 00:00:00.000000', 1, '[0.5,0.1,0.6,0.9]'),

    ('prod-002', 'Gamer-Pro Headset', 'High-fidelity gaming headset with a noise-cancelling microphone.',
    'GamerZone', 'Gaming', 'best-seller,gaming-gear,audio', 149.00, 100, '2023-07-20 00:00:00.000000', 1, '[0.1,0.9,0.2,0]'),

    ('prod-003', 'Eco-Friendly Yoga Mat', 'A non-slip yoga mat made from sustainable and eco-friendly materials.',
    'NatureFirst', 'Sports', 'eco-friendly,health', 49.99, 200, '2023-04-22 00:00:00.000000', 0, '[0.1,0.9,0.3,0]');
  • products_multi_vector: a table with multiple vector columns, used to demonstrate multi-path vector search.

    CREATE TABLE products_multi_vector (
    product_id VARCHAR(50),
    product_name VARCHAR(255),
    description TEXT,
    vec1 VECTOR(4),
    vec2 VECTOR(4),
    vec3 VECTOR(4),
    VECTOR INDEX idx1(vec1) WITH (distance=l2, type=hnsw, lib=vsag),
    VECTOR INDEX idx2(vec2) WITH (distance=l2, type=hnsw, lib=vsag),
    VECTOR INDEX idx3(vec3) WITH (distance=l2, type=hnsw, lib=vsag)
    );

    Insert data.

    INSERT INTO products_multi_vector VALUES
    ('prod-001', 'Gamer-Pro Mechanical Keyboard', 'A responsive mechanical keyboard', '[0.5,0.1,0.6,0.9]', '[0.2,0.3,0.4,0.5]', '[0.1,0.2,0.3,0.4]'),
    ('prod-002', 'Gamer-Pro Headset', 'High-fidelity gaming headset', '[0.1,0.9,0.2,0]', '[0.3,0.4,0.5,0.6]', '[0.2,0.3,0.4,0.5]'),
    ('prod-003', 'Eco-Friendly Yoga Mat', 'A non-slip yoga mat', '[0.1,0.9,0.3,0]', '[0.4,0.5,0.6,0.7]', '[0.3,0.4,0.5,0.6]');

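Scalar search

Typical use cases for scalar search include:

  • Product filtering on e-commerce platforms: a user wants to see all products of a particular brand. For example, all GamerZone products.
  • Content management systems: an administrator needs to filter articles or documents by a specific attribute. For example, find all articles by a specific author.
  • User management systems: find users with a specific status or role. For example, find all VIP users.

Example:

  1. Set the search parameters.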

    SET @parm = '{
    "query": {
    "bool": {
    "must": [
    {"term": {"brand": "GamerZone"}}
    ]
    }
    }
    }';
  2. Search for all records whose brand is "GamerZone".

    SELECT json_pretty(DBMS_HYBRID_SEARCH.SEARCH('products', @parm));

    The result is as follows:

    +---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
    | json_pretty(DBMS_HYBRID_SEARCH.SEARCH('products', @parm)) |
    +---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
    | [
    {
    "vec": "[0.5,0.1,0.6,0.9]",
    "tags": "best-seller,gaming-gear,rgb",
    "brand": "GamerZone",
    "price": 149.00,
    "_score": 1,
    "category": "Gaming",
    "is_on_sale": 1,
    "product_id": "prod-001",
    "description": "A responsive mechanical keyboard with customizable RGB lighting for the ultimate gaming experience.",
    "product_name": "Gamer-Pro Mechanical Keyboard",
    "release_date": "2023-07-20 00:00:00.000000",
    "stock_quantity": 100
    },
    {
    "vec": "[0.1,0.9,0.2,0]",
    "tags": "best-seller,gaming-gear,audio",
    "brand": "GamerZone",
    "price": 149.00,
    "_score": 1,
    "category": "Gaming",
    "is_on_sale": 1,
    "product_id": "prod-002",
    "description": "High-fidelity gaming headset with a noise-cancelling microphone.",
    "product_name": "Gamer-Pro Headset",
    "release_date": "2023-07-20 00:00:00.000000",
    "stock_quantity": 100
    }
    ] |
    +---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
    1 row in set
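
To make the meaning of the term clause concrete, the following Python sketch applies the same condition to the three sample rows in memory. It assumes that term means exact equality on the named column, which is consistent with the two GamerZone rows returned above. The helper names matches_term and scalar_search are hypothetical and are not part of seekdb; the sketch only models the filter, it does not call the database. It also builds the parameter with json.dumps instead of writing the JSON string by hand.

```python
import json

# Two columns of the sample rows inserted into `products`.
PRODUCTS = [
    {"product_id": "prod-001", "brand": "GamerZone"},
    {"product_id": "prod-002", "brand": "GamerZone"},
    {"product_id": "prod-003", "brand": "NatureFirst"},
]

# Build the search parameter as a dict and serialize it to a JSON string.
param = {"query": {"bool": {"must": [{"term": {"brand": "GamerZone"}}]}}}
param_json = json.dumps(param)


def matches_term(row, term_clause):
    """Assumed semantics: each field in the term clause must equal the row value exactly."""
    return all(row.get(field) == value for field, value in term_clause.items())


def scalar_search(rows, search_param):
    """Keep the rows that satisfy every term clause listed under bool.must."""
    clauses = [c["term"] for c in search_param["query"]["bool"]["must"] if "term" in c]
    return [row for row in rows if all(matches_term(row, c) for c in clauses)]


hits = scalar_search(PRODUCTS, json.loads(param_json))
print([row["product_id"] for row in hits])  # ['prod-001', 'prod-002']
```

The string in param_json is what would be assigned to @parm in the SQL example.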

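Scalar range search

Typical use cases for scalar range search include:

  • Price range filtering: an e-commerce platform filters products by price range. For example, find products priced in the range [30, 80].
  • Time range queries: find orders or logs within a specific period. For example, find orders from the last 30 days.
  • Numeric range filtering: filter by ranges of numeric values such as ratings or stock quantity. For example, find products rated in the range [4, 5].

Example:

  1. Set the search parameters.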

    SET @parm = '{
    "query": {
    "range" : {
    "price" : {
    "gte" : 30,
    "lte" : 80
    }
    }
    }
    }';
  2. Search for all records whose price is in the range [30, 80].

    SELECT json_pretty(DBMS_HYBRID_SEARCH.SEARCH('products', @parm));

    The result is as follows:

    +----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
    | json_pretty(DBMS_HYBRID_SEARCH.SEARCH('products', @parm)) |
    +----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
    | [
    {
    "vec": "[0.1,0.9,0.3,0]",
    "tags": "eco-friendly,health",
    "brand": "NatureFirst",
    "price": 49.99,
    "_score": 1,
    "category": "Sports",
    "is_on_sale": 0,
    "product_id": "prod-003",
    "description": "A non-slip yoga mat made from sustainable and eco-friendly materials.",
    "product_name": "Eco-Friendly Yoga Mat",
    "release_date": "2023-04-22 00:00:00.000000",
    "stock_quantity": 200
    }
    ] |
    +----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
    1 row in set
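
The range clause can be read as a pair of inclusive bounds. The Python sketch below applies it to the price values of the three sample rows, assuming that gte and lte mean "greater than or equal to" and "less than or equal to"; the function name in_range is hypothetical. Only prod-003 (49.99) falls inside [30, 80], which agrees with the single row returned above.

```python
from decimal import Decimal

# price column of the three sample rows in `products`.
PRICES = {
    "prod-001": Decimal("149.00"),
    "prod-002": Decimal("149.00"),
    "prod-003": Decimal("49.99"),
}

# The same range clause as in the search parameter.
RANGE_CLAUSE = {"price": {"gte": 30, "lte": 80}}


def in_range(value, bounds):
    """Assumed semantics: gte and lte are inclusive lower and upper bounds."""
    if "gte" in bounds and value < bounds["gte"]:
        return False
    if "lte" in bounds and value > bounds["lte"]:
        return False
    return True


matched = [pid for pid, price in PRICES.items() if in_range(price, RANGE_CLAUSE["price"])]
print(matched)  # ['prod-003']
```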

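Full-text search

Typical use cases for full-text search include:

  • Document search: search a large number of documents for content containing specific keywords. For example, search the FAQ for documents containing "how to use".
  • Product search: fuzzy search on product names and descriptions. For example, search for products containing "database".
  • Knowledge base search: search FAQs and help documents for related questions. For example, search a customer service knowledge base for answers to related questions.

Example:

  1. Set the search parameters.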

    SET @query_str_with_mini = '{
    "query": {
    "query_string": {
    "type": "best_fields",
    "fields": ["product_name^3", "description^2.5", "tags^1.5"],
    "query": "Gamer-Pro^2 keyboard^1.5 audio^1.2",
    "boost": 1.5
    }
    }
    }';
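  2. Search for records whose product_name, description, or tags columns contain the keywords "Gamer-Pro", "keyboard", or "audio", and sort them by the configured field and keyword weights.

    SELECT json_pretty(DBMS_HYBRID_SEARCH.SEARCH('products_fulltext', @query_str_with_mini));

    The result is as follows: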

    +------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
    | json_pretty(DBMS_HYBRID_SEARCH.SEARCH('products_fulltext', @query_str_with_mini)) |
    +------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
    | [
    {
    "vec": "[0.5,0.1,0.6,0.9]",
    "tags": "best-seller,gaming-gear,rgb",
    "brand": "GamerZone",
    "price": 149.00,
    "_score": 4.569735248749978,
    "category": "Gaming",
    "is_on_sale": 1,
    "product_id": "prod-001",
    "description": "A responsive mechanical keyboard with customizable RGB lighting for the ultimate gaming experience.",
    "product_name": "Gamer-Pro Mechanical Keyboard",
    "release_date": "2023-07-20 00:00:00.000000",
    "stock_quantity": 100
    },
    {
    "vec": "[0.1,0.9,0.2,0]",
    "tags": "best-seller,gaming-gear,audio",
    "brand": "GamerZone",
    "price": 149.00,
    "_score": 1.7338881172399914,
    "category": "Gaming",
    "is_on_sale": 1,
    "product_id": "prod-002",
    "description": "High-fidelity gaming headset with a noise-cancelling microphone.",
    "product_name": "Gamer-Pro Headset",
    "release_date": "2023-07-20 00:00:00.000000",
    "stock_quantity": 100
    }
    ] |
    +------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
    1 row in set
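
In the parameter above, both the fields entries and the query keywords carry a ^ suffix, which the step description treats as a weight. The following Python sketch only shows how such name^weight tokens can be split into pairs; the function parse_boost is hypothetical, and the default weight of 1.0 for a token without a suffix is an assumption. How seekdb combines these weights with "type": "best_fields" and the top-level boost is not described here, so the sketch does not attempt to reproduce the _score values.

```python
def parse_boost(token, default=1.0):
    """Split 'name^weight' into (name, weight); a token without '^' gets the default weight."""
    name, sep, weight = token.rpartition("^")
    if not sep:
        return token, default
    return name, float(weight)


# Field weights and keyword weights taken from the search parameter.
fields = [parse_boost(f) for f in ["product_name^3", "description^2.5", "tags^1.5"]]
terms = [parse_boost(t) for t in "Gamer-Pro^2 keyboard^1.5 audio^1.2".split()]

print(fields)  # [('product_name', 3.0), ('description', 2.5), ('tags', 1.5)]
print(terms)   # [('Gamer-Pro', 2.0), ('keyboard', 1.5), ('audio', 1.2)]
```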

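Full-text search with scalar filter conditions

Typical use cases for full-text search with scalar filter conditions include:

  • Precise search: run a text search under specific conditions. For example, search for specific keywords in published articles.
  • Access control: search within the data a user is authorized to access. For example, an order system searches for product information in orders from a specific period.
  • Category search: run a keyword search within a specific category. For example, a user system searches for specific user information among active users.

Example:

  1. Set the search parameters.

    -- Filter: specify the scalar filter condition c1 >= 2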
    SET @query_str = '{
    "query": {
    "bool" : {
    "must" : [
    {"query_string": {
    "fields": ["query", "content"],
    "query": "hello what oceanbase mysql"}
    }
    ],
    "filter" : [
    {"range": {"c1": {"gte" : 2}}}
    ]
    }
    }
    }';
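  2. Search the query and content columns for the keywords, keeping only records whose c1 is greater than or equal to 2.

    SELECT json_pretty(DBMS_HYBRID_SEARCH.SEARCH('doc_table', @query_str));

    The result is as follows: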

    +---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
    | json_pretty(DBMS_HYBRID_SEARCH.SEARCH('doc_table', @query_str)) |
    +---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
    | [
    {
    "c1": 2,
    "query": "hello world, what is your name",
    "_score": 2.170969786679347,
    "vector": "[1,2,1]",
    "content": "oceanbase mysql database"
    },
    {
    "c1": 3,
    "query": "hello world, how are you",
    "_score": 0.3503184713375797,
    "vector": "[1,1,1]",
    "content": "oceanbase oracle database"
    },
    {
    "c1": 6,
    "query": "hello world, where are you from",
    "_score": 0.3503184713375797,
    "vector": "[2,1,1]",
    "content": "starrocks oceanbase database"
    }
    ] |
    +---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
    1 row in set
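
The following Python sketch illustrates why rows 2, 3, and 6 are returned. It assumes that a row matches when any query keyword appears in either column, and that the filter clause removes rows with c1 < 2 regardless of how well they match. The tokenizer and the matched-keyword count are simplifications introduced for illustration: the count is only a stand-in for relevance and does not reproduce the _score values that seekdb computes.

```python
import re

# (c1, query, content) for the six sample rows in doc_table.
DOCS = [
    (1, "hello world", "oceanbase Elasticsearch database"),
    (2, "hello world, what is your name", "oceanbase mysql database"),
    (3, "hello world, how are you", "oceanbase oracle database"),
    (4, "real world, where are you from", "postgres oracle database"),
    (5, "real world, how old are you", "redis oracle database"),
    (6, "hello world, where are you from", "starrocks oceanbase database"),
]


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words (a simplified tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def search(docs, keywords, min_c1):
    """Return (c1, matched_keyword_count) for matching rows, best match first."""
    terms = set(tokenize(keywords))
    hits = []
    for c1, query_col, content_col in docs:
        if c1 < min_c1:  # scalar filter: c1 >= min_c1
            continue
        matched = sum(len(terms & set(tokenize(col))) for col in (query_col, content_col))
        if matched:
            hits.append((c1, matched))
    return sorted(hits, key=lambda hit: -hit[1])


print(search(DOCS, "hello what oceanbase mysql", min_c1=2))  # [(2, 4), (3, 2), (6, 2)]
```

Row 1 also contains "hello" and "oceanbase" but is excluded by the filter, and rows 4 and 5 pass the filter but contain none of the keywords.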

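Vector search

Typical use cases for vector search include:

  • Semantic search: find related content by semantic similarity. For example, find semantically related questions and answers in a knowledge base.
  • Recommendation systems: recommend similar products based on user preferences. For example, recommend similar products on an e-commerce platform.
  • Image search: find similar images by image features. For example, find similar images in an image library.
  • Intelligent Q&A: find semantically related questions and answers in a knowledge base. For example, in the knowledge base of a customer service system.

Example:

  1. Set the search parameters.

    -- field specifies the vector column, k specifies the number of results to return (the k nearest results), and query_vector specifies the query vector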
    SET @parm = '{
    "knn" : {
    "field": "vec",
    "k": 3,
    "query_vector": [0.5,0.1,0.6,0.9]
    }
    }';
  2. Search for the records whose vec is most similar to [0.5,0.1,0.6,0.9].

    SELECT json_pretty(DBMS_HYBRID_SEARCH.SEARCH('products_vector', @parm));

    The result is as follows:

    +--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
    | json_pretty(DBMS_HYBRID_SEARCH.SEARCH('products_vector', @parm)) |
    +--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
    | [
    {
    "vec": "[0.5,0.1,0.6,0.9]",
    "tags": "best-seller,gaming-gear,rgb",
    "brand": "GamerZone",
    "price": 149.00,
    "_score": 1.0,
    "category": "Gaming",
    "is_on_sale": 1,
    "product_id": "prod-001",
    "description": "A responsive mechanical keyboard with customizable RGB lighting for the ultimate gaming experience.",
    "product_name": "Gamer-Pro Mechanical Keyboard",
    "release_date": "2023-07-20 00:00:00.000000",
    "stock_quantity": 100
    },
    {
    "vec": "[0.1,0.9,0.3,0]",
    "tags": "eco-friendly,health",
    "brand": "NatureFirst",
    "price": 49.99,
    "_score": 0.43405784,
    "category": "Sports",
    "is_on_sale": 0,
    "product_id": "prod-003",
    "description": "A non-slip yoga mat made from sustainable and eco-friendly materials.",
    "product_name": "Eco-Friendly Yoga Mat",
    "release_date": "2023-04-22 00:00:00.000000",
    "stock_quantity": 200
    },
    {
    "vec": "[0.1,0.9,0.2,0]",
    "tags": "best-seller,gaming-gear,audio",
    "brand": "GamerZone",
    "price": 149.00,
    "_score": 0.42910841,
    "category": "Gaming",
    "is_on_sale": 1,
    "product_id": "prod-002",
    "description": "High-fidelity gaming headset with a noise-cancelling microphone.",
    "product_name": "Gamer-Pro Headset",
    "release_date": "2023-07-20 00:00:00.000000",
    "stock_quantity": 100
    }
    ] |
    +--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
    1 row in set
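
The _score values above are consistent with 1 / (1 + d), where d is the L2 (Euclidean) distance between the stored vector and query_vector. This is an observation drawn from the sample output for an index created with distance=l2, not a documented formula, so treat the Python sketch below as a way to check the numbers rather than a description of the implementation. The names l2_score and knn are hypothetical.

```python
import math

# vec column of the three sample rows in products_vector.
VECTORS = {
    "prod-001": [0.5, 0.1, 0.6, 0.9],
    "prod-002": [0.1, 0.9, 0.2, 0.0],
    "prod-003": [0.1, 0.9, 0.3, 0.0],
}


def l2_score(a, b):
    """Assumed scoring: 1 / (1 + Euclidean distance), so an exact match scores 1.0."""
    return 1.0 / (1.0 + math.dist(a, b))


def knn(vectors, query_vector, k):
    """Brute-force nearest neighbours: score every row and keep the k best."""
    scored = [(pid, l2_score(vec, query_vector)) for pid, vec in vectors.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:k]


top = knn(VECTORS, [0.5, 0.1, 0.6, 0.9], k=3)
print([(pid, round(score, 6)) for pid, score in top])
# [('prod-001', 1.0), ('prod-003', 0.434058), ('prod-002', 0.429108)]
```

The sketch scans every row, whereas the HNSW index used by the table is an approximate structure; on three rows the two give the same ordering.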

Vector search with scalar filter conditions

Typical use cases for vector search with scalar filter conditions include:

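  • Precise search: run a similarity search under specific conditions. For example, search only among articles in the published state.
  • Access control: search within the data a user is authorized to access. For example, an order system searches for product information in orders from a specific period.
  • Category search: run a similarity search within a specific category. For example, a user system searches for specific user information among active users.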

Example:

  1. Set the search parameters.

    -- Specify the scalar filter condition brand = "GamerZone"
    SET @parm = '{
    "knn" : {
    "field": "vec",
    "k": 3,
    "query_vector": [0.1,0.5,0.3,0.7],
    "filter" : [
    {"term" : {"brand": "GamerZone"} }
    ]
    }
    }';
  2. Search for records whose vec is similar to [0.1,0.5,0.3,0.7] and whose brand is "GamerZone".

    SELECT json_pretty(DBMS_HYBRID_SEARCH.SEARCH('products_vector', @parm));

    The result is as follows:

    +---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
    | json_pretty(DBMS_HYBRID_SEARCH.SEARCH('products_vector', @parm)) |
    +---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
    | [
    {
    "vec": "[0.5,0.1,0.6,0.9]",
    "tags": "best-seller,gaming-gear,rgb",
    "brand": "GamerZone",
    "price": 149.00,
    "_score": 0.59850837,
    "category": "Gaming",
    "is_on_sale": 1,
    "product_id": "prod-001",
    "description": "A responsive mechanical keyboard with customizable RGB lighting for the ultimate gaming experience.",
    "product_name": "Gamer-Pro Mechanical Keyboard",
    "release_date": "2023-07-20 00:00:00.000000",
    "stock_quantity": 100
    },
    {
    "vec": "[0.1,0.9,0.2,0]",
    "tags": "best-seller,gaming-gear,audio",
    "brand": "GamerZone",
    "price": 149.00,
    "_score": 0.55175342,
    "category": "Gaming",
    "is_on_sale": 1,
    "product_id": "prod-002",
    "description": "High-fidelity gaming headset with a noise-cancelling microphone.",
    "product_name": "Gamer-Pro Headset",
    "release_date": "2023-07-20 00:00:00.000000",
    "stock_quantity": 100
    }
    ] |
    +---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
    1 row in set
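
The Python sketch below reproduces this result on the sample rows. It makes two assumptions that are inferred from the output rather than documented: the score is 1 / (1 + L2 distance), and rows that fail the filter are excluded from the candidates, which is why only two rows come back even though k is 3. Whether seekdb applies the filter before or during the index search is not stated here; the sketch simply filters first. The function name filtered_knn is hypothetical.

```python
import math

# (product_id, brand, vec) for the three sample rows in products_vector.
ROWS = [
    ("prod-001", "GamerZone", [0.5, 0.1, 0.6, 0.9]),
    ("prod-002", "GamerZone", [0.1, 0.9, 0.2, 0.0]),
    ("prod-003", "NatureFirst", [0.1, 0.9, 0.3, 0.0]),
]


def filtered_knn(rows, query_vector, k, brand):
    """Drop rows whose brand differs, then rank the rest by 1 / (1 + L2 distance)."""
    candidates = [(pid, vec) for pid, row_brand, vec in rows if row_brand == brand]
    scored = [(pid, 1.0 / (1.0 + math.dist(vec, query_vector))) for pid, vec in candidates]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:k]


top = filtered_knn(ROWS, [0.1, 0.5, 0.3, 0.7], k=3, brand="GamerZone")
print([(pid, round(score, 6)) for pid, score in top])
# [('prod-001', 0.598508), ('prod-002', 0.551753)]
```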

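Multi-path vector search

Multi-path vector search searches across multiple vector indexes and returns the most similar records.

Example:

  1. Set the search parameters.

    -- Specify a 3-path vector query; each path specifies the vector index column, the number of results to return, and the query vector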
    SET @param_multi_knn = '{
    "knn" : [{
    "field": "vec1",
    "k": 5,
    "query_vector": [0.5,0.1,0.6,0.9]
    },
    {
    "field": "vec2",
    "k": 5,
    "query_vector": [0.2,0.3,0.4,0.5]
    },
    {
    "field": "vec3",
    "k": 5,
    "query_vector": [0.1,0.2,0.3,0.4]
    }
    ],
    "size" : 5
    }';
  2. Run the query and return the results.

    SELECT json_pretty(DBMS_HYBRID_SEARCH.SEARCH('products_multi_vector', @param_multi_knn));

    The result is as follows:

    +---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
    | json_pretty(DBMS_HYBRID_SEARCH.SEARCH('products_multi_vector', @param_multi_knn)) |
    +---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
    | [
    {
    "vec1": "[0.5,0.1,0.6,0.9]",
    "vec2": "[0.2,0.3,0.4,0.5]",
    "vec3": "[0.1,0.2,0.3,0.4]",
    "_score": 3.0,
    "product_id": "prod-001",
    "description": "A responsive mechanical keyboard",
    "product_name": "Gamer-Pro Mechanical Keyboard"
    },
    {
    "vec1": "[0.1,0.9,0.2,0]",
    "vec2": "[0.3,0.4,0.5,0.6]",
    "vec3": "[0.2,0.3,0.4,0.5]",
    "_score": 2.0957750699999997,
    "product_id": "prod-002",
    "description": "High-fidelity gaming headset",
    "product_name": "Gamer-Pro Headset"
    },
    {
    "vec1": "[0.1,0.9,0.3,0]",
    "vec2": "[0.4,0.5,0.6,0.7]",
    "vec3": "[0.3,0.4,0.5,0.6]",
    "_score": 1.86262927,
    "product_id": "prod-003",
    "description": "A non-slip yoga mat",
    "product_name": "Eco-Friendly Yoga Mat"
    }
    ] |
    +---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
    1 row in set
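
The `_score` values in the output above can be reproduced by hand. The following Python sketch assumes that each vector path contributes `1 / (1 + Euclidean distance)` and that the per-path scores are summed. This rule is inferred from the sample output only and is not confirmed by this document; the function names `path_score` and `multi_vector_score` are hypothetical.

```python
import math

def path_score(query_vector, stored_vector):
    """Assumed per-path score: 1 / (1 + Euclidean distance)."""
    return 1.0 / (1.0 + math.dist(query_vector, stored_vector))

def multi_vector_score(queries, row):
    """Assumed fusion rule: sum the per-path scores over all knn entries."""
    return sum(path_score(q["query_vector"], row[q["field"]]) for q in queries)

# The three knn entries from @param_multi_knn.
queries = [
    {"field": "vec1", "query_vector": [0.5, 0.1, 0.6, 0.9]},
    {"field": "vec2", "query_vector": [0.2, 0.3, 0.4, 0.5]},
    {"field": "vec3", "query_vector": [0.1, 0.2, 0.3, 0.4]},
]

# Vector columns of the three rows shown in the sample output.
rows = {
    "prod-001": {"vec1": [0.5, 0.1, 0.6, 0.9], "vec2": [0.2, 0.3, 0.4, 0.5], "vec3": [0.1, 0.2, 0.3, 0.4]},
    "prod-002": {"vec1": [0.1, 0.9, 0.2, 0.0], "vec2": [0.3, 0.4, 0.5, 0.6], "vec3": [0.2, 0.3, 0.4, 0.5]},
    "prod-003": {"vec1": [0.1, 0.9, 0.3, 0.0], "vec2": [0.4, 0.5, 0.6, 0.7], "vec3": [0.3, 0.4, 0.5, 0.6]},
}

scores = {pid: multi_vector_score(queries, row) for pid, row in rows.items()}
for pid, score in scores.items():
    print(pid, round(score, 6))
```

Under these assumptions, `prod-001` matches all three query vectors exactly, so each path contributes 1.0 and the total is 3.0. The other two rows agree with the sample output to about six decimal places.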

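Hybrid full-text and vector search

Typical use cases for hybrid full-text and vector search include:

  • Intelligent search: combines keyword matching with semantic understanding. For example, when a user enters "I need a gaming keyboard", the system matches the keywords "gaming" and "keyboard" and also understands the semantics of "gaming gear".
  • Document search: supports both exact keyword matching and semantic understanding across a large document set. For example, a search for "database optimization" matches documents that contain those words and also finds semantically related content such as "performance tuning" and "query optimization".
  • Product recommendation: an e-commerce platform supports search by product name as well as by a description of what the user needs. For example, given the description "a laptop suitable for office work", the system matches the keywords and also understands the semantic need for "business office" use.

Example:

  1. Set the search parameters.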

    SET @parm = '{
    "query": {
    "bool": {
    "should": [
    {"match": {"query": "hi hello"}},
    {"match": { "content": "oceanbase mysql" }}
    ]
    }
    },
    "knn" : {
    "field": "vector",
    "k": 5,
    "query_vector": [1,2,3]
    },
    "_source" : ["query", "content", "_keyword_score", "_semantic_score"]
    }';
  2. Run the query and return the results.

    SELECT json_pretty(DBMS_HYBRID_SEARCH.SEARCH('doc_table', @parm));

    The result is as follows:

    +---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
    | json_pretty(dbms_hybrid_search.search('doc_table', @parm)) |
    +---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
    | [
    {
    "query": "hello world, what is your name",
    "_score": 2.835628417884166,
    "content": "oceanbase mysql database",
    "_keyword_score": 2.5022950878841663,
    "_semantic_score": 0.33333333
    },
    {
    "query": "hello world",
    "_score": 1.7219400929592013,
    "content": "oceanbase Elasticsearch database",
    "_keyword_score": 0.7219400929592014,
    "_semantic_score": 1.0
    },
    {
    "query": "hello world, how are you",
    "_score": 1.0096539326751595,
    "content": "oceanbase oracle database",
    "_keyword_score": 0.7006369426751594,
    "_semantic_score": 0.30901699
    },
    {
    "query": "real world, how old are you",
    "_score": 0.41421356,
    "content": "redis oracle database",
    "_keyword_score": null,
    "_semantic_score": 0.41421356
    },
    {
    "query": "real world, where are you from",
    "_score": 0.30901699,
    "content": "postgres oracle database",
    "_keyword_score": null,
    "_semantic_score": 0.30901699
    }
    ] |
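
In the result above, each `_score` equals the sum of `_keyword_score` and `_semantic_score`, with a `null` keyword score contributing nothing. The following Python sketch checks this against the sample rows. It assumes equal weights of 1 for both subqueries, which is inferred from the output rather than stated in this document; `fused_score` is a hypothetical helper.

```python
def fused_score(keyword_score, semantic_score):
    """Assumed default fusion: plain sum, treating a missing (null) score as 0."""
    return (keyword_score or 0.0) + (semantic_score or 0.0)

# (_keyword_score, _semantic_score, _score) copied from the sample output.
sample = [
    (2.5022950878841663, 0.33333333, 2.835628417884166),
    (0.7219400929592014, 1.0, 1.7219400929592013),
    (0.7006369426751594, 0.30901699, 1.0096539326751595),
    (None, 0.41421356, 0.41421356),
    (None, 0.30901699, 0.30901699),
]

# Rows whose reported _score differs from the assumed sum.
mismatches = [row for row in sample if abs(fused_score(row[0], row[1]) - row[2]) > 1e-9]
print("rows that do not match the assumed sum:", mismatches)
```

The last two rows match only through the vector subquery, which is why their `_score` equals their `_semantic_score`.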

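Hybrid full-text and vector search with RRF

By default, the result sets of the full-text subquery and the vector subquery are combined by weighted fusion. You can use the rank syntax to switch the fusion method to RRF (Reciprocal Rank Fusion), which combines results by rank instead of by score. Typical use cases include:

  • Multi-dimensional ranking: results from several search dimensions must be considered together. For example, an academic search system that searches a paper library needs to consider both keyword match quality and semantic relevance.
  • Fairness requirements: results from different search methods should each receive reasonable exposure. For example, an e-commerce platform needs to consider both textual information such as product titles and descriptions and visual information such as product images and videos.
  • Complex queries: search scenarios that involve multiple query conditions. For example, a medical system needs to consider both the patient's symptom description and the patient's medical history and test results.

Example: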

Set the search parameters.

    SET @rrf_query_param = '{
    "query": {
    "query_string": {
    "fields": ["title", "author", "description"],
    "query": "fiction American Dream"
    }
    },
    "knn" : {
    "field": "vector_embedding",
    "k": 5,
    "query_vector": [0.1, 0.2, 0.3, 0.4]
    },
    "rank" : {
    "rrf" : {
    "rank_window_size" : 10,
    "rank_constant" : 60
    }
    }
    }';

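The RRF algorithm computes the final relevance score of a document by fusing its ranks across the result sets of the subqueries. For a document d, the score is computed as follows:

    score = 0.0
    for q in queries:
        if d in result(q):
            score += 1.0 / (k + rank(result(q), d))  # k is the configured rank_constant
    return score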
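
The pseudocode above can be turned into a small runnable Python sketch. It assumes that ranks are 1-based and that `rank_window_size` truncates each subquery's result list before fusion. Both are assumptions borrowed from common RRF implementations and are not confirmed for seekdb. The function `rrf_scores` and the document IDs are hypothetical.

```python
def rrf_scores(result_lists, rank_constant=60, rank_window_size=10):
    """Fuse ranked ID lists with Reciprocal Rank Fusion.

    Assumptions: ranks start at 1, and only the first rank_window_size
    entries of each list take part in the fusion.
    """
    scores = {}
    for results in result_lists:
        for rank, doc_id in enumerate(results[:rank_window_size], start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (rank_constant + rank)
    return scores

# Hypothetical ranked results of the full-text and vector subqueries.
keyword_hits = ["book-a", "book-b", "book-c"]
vector_hits = ["book-b", "book-c", "book-d"]

scores = rrf_scores([keyword_hits, vector_hits])
ranking = sorted(scores, key=scores.get, reverse=True)
for doc_id in ranking:
    print(doc_id, round(scores[doc_id], 6))
```

In this example `book-b` ranks first because it appears in both lists (1/62 + 1/61), ahead of `book-a`, which is the top full-text hit but is missing from the vector results (1/61). Because only ranks enter the formula, the differing score scales of the two subqueries do not affect the fused order.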

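Summary

The examples in this document show what hybrid search offers in practice:

  • Smarter search: adds semantic understanding on top of traditional keyword search, returning results that are more precise and closer to the user's intent.
  • Better user experience: supports natural-language queries, which simplifies usage and makes it faster to find information.
  • Support for diverse workloads: applies to e-commerce, content management, knowledge bases, intelligent customer service, and similar scenarios, covering everything from basic filtering to intelligent recommendation.
  • Combined technical strengths: pairs exact matching with semantic understanding to improve both the accuracy and the coverage of search results.

Hybrid search is well suited to handling large volumes of unstructured data and to building intelligent search and recommendation systems.

Related documentation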