hybrid_search - Hybrid search
hybrid_search() runs a full-text search and a vector similarity search, then combines the two result sets with a ranking step.
This API is only available when using the Client. For more information about the Client, see Client.
Prerequisites
- You have installed pyseekdb. For more information about how to install pyseekdb, see Get Started.
- You have connected to the database. For more information about how to connect to the database, see Client.
- You have created a collection and inserted data. For more information about how to create a collection and insert data, see create_collection - Create a collection and add - Insert Data.
Request parameters
hybrid_search(
    query={
        "where_document": ,
        "where": ,
        "n_results":
    },
    knn={
        "query_texts": ,
        "where": ,
        "n_results":
    },
    rank=,
    n_results=,
    include=
)
- query: full-text search configuration, including the following parameters:
|Parameter|Type|Required|Description|Example value|
|---|---|---|---|---|
|where|dict|Optional|Metadata filter conditions.|{"category": {"$eq": "AI"}}|
|where_document|dict|Optional|Document filter conditions.|{"$contains": "machine"}|
|n_results|int|Yes|Number of results for full-text search.|-|
- knn: vector search configuration, including the following parameters:
|Parameter|Type|Required|Description|Example value|
|---|---|---|---|---|
|query_embeddings|List[float] or List[List[float]]|Yes|A single vector or a list of vectors for batch queries. If provided, it is used directly (embedding_function is ignored). If not provided, query_texts must be provided and the collection must have an embedding_function.|[1.0, 2.0, 3.0]|
|query_texts|str or List[str]|Optional|A single query text or a list of query texts for batch queries. Used when query_embeddings is not provided; the collection must have an embedding_function to embed the text.|["my query text"]|
|where|dict|Optional|Metadata filter conditions.|{"category": {"$eq": "AI"}}|
|n_results|int|Yes|Number of results for vector search.|-|
Other parameters are as follows:
|Parameter|Type|Required|Description|Example value|
|---|---|---|---|---|
|rank|dict|Optional|Ranking configuration.|{"rrf": {"rank_window_size": 60, "rank_constant": 60}}|
|n_results|int|Yes|Number of similar results to return. Default value is 10.|3|
|include|List[str]|Optional|List of fields to include in the results.|["documents", "metadatas", "embeddings"]|
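The rrf ranking configuration above refers to Reciprocal Rank Fusion. As a rough illustration of what rank_constant and rank_window_size usually mean, here is a minimal pure-Python sketch of the textbook RRF formula, score = sum of 1 / (rank_constant + rank). It is not the database's implementation; the function name reciprocal_rank_fusion is hypothetical, and treating rank_window_size as a per-list cutoff is an assumption borrowed from other RRF implementations.

```python
from collections import defaultdict
from typing import Dict, List


def reciprocal_rank_fusion(
    rankings: List[List[str]],
    rank_constant: int = 60,
    rank_window_size: int = 60,
) -> List[str]:
    """Fuse several ranked ID lists with the textbook RRF formula.

    Illustrative only: this is not how hybrid_search() is implemented.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        # Assumption: only the top `rank_window_size` entries of each list count.
        for position, doc_id in enumerate(ranking[:rank_window_size], start=1):
            scores[doc_id] += 1.0 / (rank_constant + position)
    # Highest fused score first; ties are broken by ID for a stable order.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hand-written rankings standing in for full-text and vector results.
full_text_hits = ["doc3", "doc1", "doc7"]
vector_hits = ["doc1", "doc5", "doc3"]
fused = reciprocal_rank_fusion([full_text_hits, vector_hits])
print(fused)
```

Documents that appear near the top of both lists accumulate the highest scores. In this formula, a larger rank_constant flattens the difference between adjacent ranks.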
The embedding_function is associated with the collection (set during create_collection() or get_collection()) and cannot be overridden for an individual operation.
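The choice between query_embeddings and query_texts described in the knn table can be made explicit with a small helper. The sketch below is a hypothetical convenience function (build_knn is not part of pyseekdb); it only assembles the dictionary that would be passed as the knn argument and mirrors the documented precedence, where query_embeddings wins if both are given.

```python
from typing import Any, Dict, List, Optional, Union


def build_knn(
    n_results: int,
    query_texts: Optional[Union[str, List[str]]] = None,
    query_embeddings: Optional[List[Any]] = None,
    where: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: assemble the `knn` dict for hybrid_search()."""
    if query_embeddings is None and query_texts is None:
        raise ValueError("Provide query_embeddings or query_texts")
    knn: Dict[str, Any] = {"n_results": n_results}
    if query_embeddings is not None:
        # Precomputed vectors are used directly; embedding_function is ignored.
        knn["query_embeddings"] = query_embeddings
    else:
        # Texts rely on the collection's embedding_function to be embedded.
        knn["query_texts"] = query_texts
    if where is not None:
        knn["where"] = where
    return knn


knn = build_knn(10, query_texts=["AI research"], where={"year": {"$gte": 2020}})
print(knn)
```

The resulting dictionary can then be passed as knn=knn to hybrid_search(), alongside the query, rank, n_results, and include arguments.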
Request example
import pyseekdb

# Create a client
client = pyseekdb.Client()
collection = client.get_collection("my_collection")
collection1 = client.get_collection("my_collection1")

# Hybrid search with query_embeddings (embedding_function not used)
results = collection.hybrid_search(
    query={
        "where_document": {"$contains": "machine learning"},
        "n_results": 10
    },
    knn={
        "query_embeddings": [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]],  # Used directly
        "n_results": 10
    },
    rank={"rrf": {}},
    n_results=5
)

# Hybrid search with both full-text and vector search (using query_texts)
results = collection1.hybrid_search(
    query={
        "where_document": {"$contains": "machine learning"},
        "where": {"category": {"$eq": "science"}},
        "n_results": 10
    },
    knn={
        "query_texts": ["AI research"],  # Will be embedded automatically
        "where": {"year": {"$gte": 2020}},
        "n_results": 10
    },
    rank={"rrf": {}},  # Reciprocal Rank Fusion
    n_results=5,
    include=["documents", "metadatas", "embeddings"]
)

# Hybrid search with multiple query texts (batch)
results = collection1.hybrid_search(
    query={
        "where_document": {"$contains": "AI"},
        "n_results": 10
    },
    knn={
        "query_texts": ["machine learning", "neural networks"],  # Multiple queries
        "n_results": 10
    },
    rank={"rrf": {}},
    n_results=5
)
Return parameters
A dictionary containing the search results, including IDs, distances, metadata, and documents.
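The exact layout of the returned dictionary is not spelled out here. The sketch below assumes a common layout for this kind of API: keys named "ids", "distances", and "documents", each holding one inner list per query. Both the key names and the nesting are assumptions to verify against a real return value, and iter_hits is a hypothetical helper. The sample data is hand-written, not actual output.

```python
from typing import Any, Dict, Iterator, Tuple


def iter_hits(results: Dict[str, Any]) -> Iterator[Tuple[str, Any, Any]]:
    """Yield (id, distance, document) for the first query in `results`.

    Assumes each key maps to a list of lists (one inner list per query).
    """
    ids = results.get("ids") or [[]]
    # Missing fields are padded with None so the zip below stays aligned.
    distances = results.get("distances") or [[None] * len(ids[0])]
    documents = results.get("documents") or [[None] * len(ids[0])]
    yield from zip(ids[0], distances[0], documents[0])


# Hand-written stand-in for a real return value; the shape is an assumption.
sample_results = {
    "ids": [["doc1", "doc3"]],
    "distances": [[0.12, 0.34]],
    "documents": [["Machine learning basics", "Neural networks intro"]],
}

for doc_id, distance, document in iter_hits(sample_results):
    print(doc_id, distance, document)
```

If the real result uses a different shape, only the indexing inside iter_hits needs to change; the calling loop stays the same.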