Overview
In seekdb, an embedding function (EF) converts input text into a numeric vector (an embedding). Embeddings are used for semantic search (vector similarity search) and are commonly combined with other filters in hybrid search workflows.
seekdb provides ready-to-use embedding functions for popular providers. You can also implement your own embedding function if you need to integrate a model or service that is not built in.
How embedding functions work
You can bind an embedding function to a collection by passing embedding_function when you create or retrieve the collection. Once it is bound:
- Ingest: if you pass documents (and do not pass embeddings) to add/update/upsert, the embedding function is called to generate vectors automatically (see the sketch after this list).
- Query: if you pass query_texts (and do not pass query_embeddings) to query, the embedding function is called to generate query vectors automatically.
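For example, here is a minimal sketch of both paths. It assumes a collection that already has an embedding function bound, and the vector values are purely illustrative:
# documents only: the bound embedding function generates the vectors
collection.add(
    ids=["doc-1"],
    documents=["seekdb converts this text into a vector"],
)

# embeddings supplied: stored as-is, the embedding function is not called
# (values are illustrative and must match the collection's dimension)
collection.add(
    ids=["doc-2"],
    documents=["another document"],
    embeddings=[[0.1, 0.2, 0.3]],
)

# query_texts -> embedded automatically; query_embeddings -> used directly
results = collection.query(query_texts="example query", n_results=2)
results = collection.query(query_embeddings=[[0.1, 0.2, 0.3]], n_results=2)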
Supported providers
seekdb supports the following embedding functions:
- Amazon Bedrock
- Cohere
- Google Vertex AI
- Jina AI
- Ollama
- OpenAI
- Qwen
- Sentence Transformer
- SiliconFlow
- Tencent Hunyuan
- Voyage AI
Typical usage (binding to a collection)
The snippet below shows the typical pattern. Replace ef with an embedding function instance from any supported provider.
import pyseekdb

# Use any embedding function from a supported provider
from pyseekdb.utils.embedding_functions import SomePlatformEmbeddingFunction

# Initialize the embedding function
ef = SomePlatformEmbeddingFunction(api_key="your-api-key")

# Create a collection and bind the embedding function
db = pyseekdb.Client(path="./seekdb.db")
collection = db.create_collection(
    name="my_collection",
    embedding_function=ef,
)

# Ingest documents. The embeddings are generated automatically.
collection.add(
    ids=["1", "2"],
    documents=["Hello world", "How are you?"],
)

# Query by text. The query embeddings are generated automatically.
results = collection.query(query_texts="How are you?", n_results=1)
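When you reopen an existing collection, pass the same embedding function again at retrieval time so text-based ingest and query keep working. The sketch below assumes a get_collection method that mirrors create_collection; check your client's API for the exact retrieval method name.
# Retrieve an existing collection and rebind the embedding function
# (retrieval method name assumed here).
collection = db.get_collection(
    name="my_collection",
    embedding_function=ef,
)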
Default embedding function: all-MiniLM-L6-v2
If you do not specify embedding_function when creating a collection, seekdb uses DefaultEmbeddingFunction by default.
The default model is all-MiniLM-L6-v2 (384 dimensions) from Sentence Transformers. It runs locally and may download model files automatically on first use.
import pyseekdb
db = pyseekdb.Client(path="./seekdb.db")
# If embedding_function is not specified, DefaultEmbeddingFunction is used.
collection = db.create_collection(name="default_collection")
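You can also instantiate the default embedding function explicitly and inspect its output. The import path below is an assumption based on the provider import pattern shown earlier.
# Import path assumed to mirror the provider embedding functions above.
from pyseekdb.utils.embedding_functions import DefaultEmbeddingFunction

ef = DefaultEmbeddingFunction()
vectors = ef(["Hello world"])  # embedding functions are callable on text
print(len(vectors[0]))         # 384 dimensions for all-MiniLM-L6-v2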
For more information about the default embedding function, please refer to Default embedding function.
Custom embedding functions
If the built-in embedding functions do not meet your requirements, you can implement the EmbeddingFunction interface to provide your own embedding logic. A custom embedding function should:
- Implement __call__: accept str | List[str] and return List[List[float]]
- (Recommended) Implement dimension: return the embedding dimension as an int so the collection can validate dimensional consistency
Example:
from typing import Any, Dict, List, Union

# register_embedding_function is assumed to be exported by pyseekdb
# alongside EmbeddingFunction.
from pyseekdb import EmbeddingFunction, register_embedding_function

Documents = Union[str, List[str]]
Embeddings = List[List[float]]

@register_embedding_function
class MyCustomEmbeddingFunction(EmbeddingFunction[Documents]):
    def __init__(self, model_name: str = "my-model"):
        self.model_name = model_name

    def __call__(self, input: Documents) -> Embeddings:
        # Your embedding logic; normalize a single string to a list first
        texts = input if isinstance(input, list) else [input]
        return [[0.1, 0.2, 0.3] for _ in texts]

    @property
    def dimension(self) -> int:
        return 3  # Replace with your model's embedding dimension

    @staticmethod
    def name() -> str:
        return "my_custom_embedding"

    def get_config(self) -> Dict[str, Any]:
        return {"model_name": self.model_name}

    @staticmethod
    def build_from_config(config: Dict[str, Any]) -> "MyCustomEmbeddingFunction":
        return MyCustomEmbeddingFunction(model_name=config.get("model_name", "my-model"))
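Once defined, the custom class is bound to a collection exactly like a built-in embedding function (the collection name and sample data below are illustrative):
import pyseekdb

db = pyseekdb.Client(path="./seekdb.db")
collection = db.create_collection(
    name="custom_ef_collection",
    embedding_function=MyCustomEmbeddingFunction(model_name="my-model"),
)

# Vectors are produced by MyCustomEmbeddingFunction.__call__
collection.add(ids=["1"], documents=["embedded by the custom function"])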
For more information about custom embedding functions, please refer to: