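Version: V1.1.0

hybrid_search - Hybrid search

hybrid_search() combines full-text search and vector similarity search, and ranks the merged results.

Info

This interface is available only when you connect through a Client. For details about Client, see Client.

Prerequisites

You have installed pyseekdb. For installation details, see Quick Start.
You have connected to the database. For connection details, see Client.
You have created a collection and inserted data into it. For details, see create_collection - Create a Collection and add - Insert data.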

Request parameters

hybrid_search(
    query={
        "where_document": ,
        "where": ,
        "n_results": 
    },
    knn={
        "query_texts": ,
        "where": ,
        "n_results": 
    },
    rank=,  
    n_results=,
    include=
)

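query: full-text search configuration, with the following parameters:

Parameter	Type	Required	Description	Example
`where`	dict	Optional	Metadata filter conditions.	`{"category": {"$eq": "AI"}}`
`where_document`	dict	Optional	Document filter conditions.	`{"$contains": "machine"}`
`n_results`	int	Required	Number of results returned by the full-text search.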
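
To make the filter syntax above concrete, here is a minimal pure-Python sketch of how conditions such as `{"category": {"$eq": "AI"}}` and `{"$contains": "machine"}` could be evaluated against one record. It covers only the operators that appear on this page (`$eq`, `$gte`, `$contains`). The helper names `matches_where` and `matches_where_document` are hypothetical, and treating `$contains` as a plain substring test is an assumption; the database's full-text matching may tokenize text differently.

```python
from typing import Any


def matches_where(metadata: dict[str, Any], where: dict[str, Any]) -> bool:
    """Return True if the metadata satisfies every {field: {operator: value}} condition.

    Illustrative only: this is not pyseekdb's implementation.
    """
    for field, condition in where.items():
        actual = metadata.get(field)
        for op, expected in condition.items():
            if op == "$eq":
                if actual != expected:
                    return False
            elif op == "$gte":
                if actual is None or actual < expected:
                    return False
            else:
                raise ValueError(f"operator not covered by this sketch: {op}")
    return True


def matches_where_document(document: str, where_document: dict[str, str]) -> bool:
    """Return True if the document text satisfies the condition.

    Assumption: "$contains" is modeled as a plain substring test.
    """
    for op, expected in where_document.items():
        if op == "$contains":
            if expected not in document:
                return False
        else:
            raise ValueError(f"operator not covered by this sketch: {op}")
    return True


record_metadata = {"category": "AI", "year": 2021}
record_document = "An introduction to machine learning"

print(matches_where(record_metadata, {"category": {"$eq": "AI"}}))
print(matches_where_document(record_document, {"$contains": "machine"}))
```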

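knn: vector search configuration, with the following parameters:

Parameter	Type	Required	Description	Example
`query_embeddings`	List[float] or List[List[float]]	Optional	A single vector, or a list of vectors for batch queries. If provided, it is used directly and `embedding_function` is ignored. If not provided, `query_texts` must be provided and the `collection` must have an `embedding_function`.	[1.0, 2.0, 3.0]
`query_texts`	str or List[str]	Optional	A single query text, or a list of query texts for batch queries. The texts are converted to vectors with the `embedding_function` of the `collection`. Required if `query_embeddings` is not provided, in which case the `collection` must have an `embedding_function`.	["my query text"]
`where`	dict	Optional	Metadata filter conditions.	`{"category": {"$eq": "AI"}}`
`n_results`	int	Required	Number of results returned by the vector search.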
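
The rule that at least one of `query_embeddings` and `query_texts` must be supplied can be checked before calling hybrid_search(). The sketch below is one way to assemble the `knn` dict; `build_knn_config` is a hypothetical helper and not part of pyseekdb. Letting `query_embeddings` win when both are passed mirrors the "used directly" wording in the table.

```python
from typing import Any, Optional


def build_knn_config(
    n_results: int,
    query_embeddings: Optional[list] = None,
    query_texts: Optional[Any] = None,
    where: Optional[dict] = None,
) -> dict:
    """Assemble a knn dict for hybrid_search() (hypothetical helper)."""
    if query_embeddings is None and query_texts is None:
        raise ValueError("provide query_embeddings or query_texts")

    knn: dict = {"n_results": n_results}
    if query_embeddings is not None:
        # Vectors are used as-is; no embedding_function is needed.
        knn["query_embeddings"] = query_embeddings
    else:
        # Texts rely on the collection's embedding_function.
        knn["query_texts"] = query_texts
    if where is not None:
        knn["where"] = where
    return knn


knn = build_knn_config(
    n_results=10,
    query_texts=["AI research"],
    where={"year": {"$gte": 2020}},
)
print(knn)
```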

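The remaining parameters are as follows:

Parameter	Type	Required	Description	Example
`rank`	dict	Optional	Ranking configuration.	`{"rrf": {"rank_window_size": 60, "rank_constant": 60}}`
`n_results`	int	Optional	Number of results to return. The default value is 10.	3
`include`	List[str]	Optional	List of fields to include in the results, chosen from `"documents"`, `"metadatas"`, and `"embeddings"`.	["documents", "metadatas", "embeddings"]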
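
For readers unfamiliar with the `rrf` option, the following sketch shows the commonly published Reciprocal Rank Fusion formula, in which each document scores the sum of `1 / (rank_constant + rank)` over the result lists it appears in. This illustrates the general technique, not pyseekdb's internal code. Reading `rank_window_size` as "how many top entries of each list take part in the fusion" is an assumption, and `rrf_fuse` is a hypothetical name.

```python
def rrf_fuse(
    rankings: list[list[str]],
    rank_constant: int = 60,
    rank_window_size: int = 60,
    n_results: int = 10,
) -> list[str]:
    """Fuse several ranked ID lists with Reciprocal Rank Fusion (illustrative)."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        # Only the top rank_window_size entries of each list contribute.
        for rank, doc_id in enumerate(ranking[:rank_window_size], start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (rank_constant + rank)
    # Highest fused score first; ties are broken by ID for determinism.
    ordered = sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))
    return ordered[:n_results]


full_text_ids = ["a", "b", "c"]   # ranking from the full-text search
vector_ids = ["b", "d", "a"]      # ranking from the vector search
print(rrf_fuse([full_text_ids, vector_ids], n_results=3))
```

A document that ranks well in both lists ("b" and "a" here) ends up ahead of one that appears in only a single list, which is the behavior hybrid ranking is meant to provide.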

Info

The embedding_function that is used is the one associated with the collection (set during create_collection() or get_collection()). You cannot override it on a per-operation basis.

Request example

import pyseekdb

# Create a client
client = pyseekdb.Client()

collection = client.get_collection("my_collection")
collection1 = client.get_collection("my_collection1")

# Hybrid search with query_embeddings (embedding_function not used)
results = collection.hybrid_search(
    query={
        "where_document": {"$contains": "machine learning"},
        "n_results": 10
    },
    knn={
        "query_embeddings": [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]],  # Used directly
        "n_results": 10
    },
    rank={"rrf": {}},
    n_results=5
)

# Hybrid search with both full-text and vector search (using query_texts)
results = collection1.hybrid_search(
    query={
        "where_document": {"$contains": "machine learning"},
        "where": {"category": {"$eq": "science"}},
        "n_results": 10
    },
    knn={
        "query_texts": ["AI research"],  # Will be embedded automatically
        "where": {"year": {"$gte": 2020}},
        "n_results": 10
    },
    rank={"rrf": {}},  # Reciprocal Rank Fusion
    n_results=5,
    include=["documents", "metadatas", "embeddings"]
)

# Hybrid search with multiple query texts (batch)
results = collection1.hybrid_search(
    query={
        "where_document": {"$contains": "AI"},
        "n_results": 10
    },
    knn={
        "query_texts": ["machine learning", "neural networks"],  # Multiple queries
        "n_results": 10
    },
    rank={"rrf": {}},
    n_results=5
)

Response parameters

A dictionary of search results that contains IDs, distances, metadatas, documents, and other fields.
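
This page does not spell out the exact layout of the returned dictionary, so the sketch below rests on a loudly stated assumption: the keys are `ids`, `documents`, `metadatas`, and `distances`, and each holds one inner list per query. Check the actual return value of your pyseekdb version before relying on this. `flatten_results` is a hypothetical helper that turns such a dictionary into one row per hit.

```python
def flatten_results(results: dict) -> list[dict]:
    """Turn a per-query result dict into a flat list of rows.

    Assumption: each key maps to a list with one inner list per query.
    """
    rows = []
    for query_index, ids in enumerate(results.get("ids", [])):
        for position, doc_id in enumerate(ids):
            row = {"query_index": query_index, "id": doc_id}
            for key in ("documents", "metadatas", "distances"):
                values = results.get(key)
                if values:  # a field may be missing if it was not requested
                    row[key] = values[query_index][position]
            rows.append(row)
    return rows


# Hand-written sample in the assumed layout, not real pyseekdb output.
sample = {
    "ids": [["doc1", "doc2"]],
    "documents": [["machine learning intro", "neural networks"]],
    "distances": [[0.12, 0.34]],
}
for row in flatten_results(sample):
    print(row)
```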
