Version: V1.1.0

Overview

In seekdb, an embedding function (EF) converts input text into a numeric vector (an embedding). Embeddings are used for semantic search (vector similarity search) and are commonly combined with other filters in hybrid search workflows.
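To make "vector similarity" concrete, here is a minimal sketch that ranks two toy vectors against a query vector by cosine similarity. The page does not state which distance metric seekdb applies, so cosine similarity is an assumption chosen for illustration, and the three-dimensional vectors and the `doc_a` / `doc_b` names are made up rather than real model output.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy vectors standing in for embeddings (hypothetical values).
query = [1.0, 0.0, 1.0]
docs = {"doc_a": [1.0, 0.1, 0.9], "doc_b": [0.0, 1.0, 0.0]}

# Rank document ids from most to least similar to the query.
ranked = sorted(docs, key=lambda k: cosine_similarity(query, docs[k]), reverse=True)
```

Here `doc_a` ranks first because it points in nearly the same direction as the query, while `doc_b` is orthogonal to it.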

seekdb provides ready-to-use embedding functions for popular providers. You can also implement your own embedding function if you need to integrate a model or service that is not built in.

How embedding functions work

You can bind an embedding function to a collection by passing embedding_function when you create or retrieve the collection. Once it is bound:

  • Ingest: if you pass documents (and do not pass embeddings) to add / update / upsert, the embedding function is called to generate vectors automatically.
  • Query: if you pass query_texts (and do not pass query_embeddings) to query, the embedding function is called to generate query vectors automatically.
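The rule in both bullets is the same: explicit vectors take precedence, and the embedding function runs only when text is supplied without vectors. The sketch below models that rule in plain Python. It is not seekdb's implementation; `resolve_embeddings` and `fake_ef` are hypothetical names used only to illustrate the behavior described above.

```python
from typing import Callable, Optional

Vector = list[float]


def resolve_embeddings(
    texts: Optional[list[str]],
    embeddings: Optional[list[Vector]],
    embedding_function: Optional[Callable[[list[str]], list[Vector]]],
) -> list[Vector]:
    """Return caller-supplied vectors if present; otherwise embed the texts."""
    if embeddings is not None:
        return embeddings  # explicit vectors win; the function is not called
    if texts is None:
        raise ValueError("provide either texts or embeddings")
    if embedding_function is None:
        raise ValueError("texts were given but no embedding function is bound")
    return embedding_function(texts)


calls: list[list[str]] = []


def fake_ef(texts: list[str]) -> list[Vector]:
    """Hypothetical embedding function that records each call."""
    calls.append(texts)
    return [[float(len(t))] for t in texts]


auto = resolve_embeddings(["ab"], None, fake_ef)        # fake_ef is called
manual = resolve_embeddings(["ab"], [[9.0]], fake_ef)   # fake_ef is skipped
```

After both calls, `calls` holds a single entry, which shows that the second call never reached the embedding function.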

Supported providers

seekdb supports the following embedding functions:

Typical usage (binding to a collection)

The snippet below shows the typical pattern. SomePlatformEmbeddingFunction is a placeholder: replace it with the embedding function class of any supported provider, so that ef is an instance of that class.

import pyseekdb
# Use any embedding function from a supported provider
from pyseekdb.utils.embedding_functions import SomePlatformEmbeddingFunction

# Initialize the embedding function
ef = SomePlatformEmbeddingFunction(api_key="your-api-key")

# Create a collection and bind the embedding function
db = pyseekdb.Client(path="./seekdb.db")
collection = db.create_collection(
    name="my_collection",
    embedding_function=ef,
)

# Ingest documents. The embeddings are generated automatically.
collection.add(
    ids=["1", "2"],
    documents=["Hello world", "How are you?"],
)

# Query by text. The query embeddings are generated automatically.
results = collection.query(query_texts="How are you?", n_results=1)

Default embedding function: all-MiniLM-L6-v2

If you do not specify embedding_function when creating a collection, seekdb uses DefaultEmbeddingFunction.

The default model is all-MiniLM-L6-v2 (384 dimensions) from Sentence Transformers. It runs locally and may download model files automatically on first use.

import pyseekdb

db = pyseekdb.Client(path="./seekdb.db")

# If embedding_function is not specified, DefaultEmbeddingFunction is used.
collection = db.create_collection(name="default_collection")

For more information about the default embedding function, please refer to Default embedding function.

Custom embedding functions

If the built-in embedding functions do not meet your requirements, you can implement the EmbeddingFunction interface to provide your own embedding logic. A custom embedding function should:

  • Implement __call__: accept str | List[str] and return List[List[float]]
  • (Recommended) Implement dimension: return the embedding dimension as an int so the collection can validate dimensional consistency

Example:

from typing import Any, Dict, List, Union

# Assumption: register_embedding_function is importable from the package root,
# like EmbeddingFunction; adjust the import path if your version differs.
from pyseekdb import EmbeddingFunction, register_embedding_function

Documents = Union[str, List[str]]
Embeddings = List[List[float]]

@register_embedding_function
class MyCustomEmbeddingFunction(EmbeddingFunction[Documents]):
    def __init__(self, model_name: str = "my-model"):
        self.model_name = model_name

    def __call__(self, input: Documents) -> Embeddings:
        # Normalize a single string to a one-element list.
        texts = input if isinstance(input, list) else [input]
        # Your embedding logic: return one vector per input text.
        return [[0.1, 0.2, 0.3] for _ in texts]

    @property
    def dimension(self) -> int:
        return 3  # Replace with your model dimension

    @staticmethod
    def name() -> str:
        return "my_custom_embedding"

    def get_config(self) -> Dict[str, Any]:
        # Settings that build_from_config needs to recreate this instance
        return {"model_name": self.model_name}

    @staticmethod
    def build_from_config(config: Dict[str, Any]) -> "MyCustomEmbeddingFunction":
        return MyCustomEmbeddingFunction(model_name=config.get("model_name", "my-model"))
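Before binding a custom function to a collection, it can help to check it against the two requirements listed above: `__call__` accepts a string or a list of strings and returns one vector per input, and every vector has length `dimension`. The following is a minimal sketch that needs no seekdb installation. `ToyHashEmbedding` and `check_embedding_contract` are hypothetical helpers, not part of pyseekdb, and the hash-based vectors carry no semantic meaning. The equality check between string and list input assumes a deterministic model; a non-deterministic one would need a tolerance instead.

```python
import hashlib
from typing import List, Union


class ToyHashEmbedding:
    """Hypothetical embedding function: maps text to `dim` floats in [0, 1]."""

    def __init__(self, dim: int = 8):
        if not 1 <= dim <= 32:  # a SHA-256 digest has 32 bytes
            raise ValueError("dim must be between 1 and 32")
        self._dim = dim

    def __call__(self, input: Union[str, List[str]]) -> List[List[float]]:
        # Accept a single string or a list of strings.
        texts = [input] if isinstance(input, str) else list(input)
        return [self._embed(text) for text in texts]

    def _embed(self, text: str) -> List[float]:
        # Scale the first `dim` digest bytes into the range [0, 1].
        digest = hashlib.sha256(text.encode("utf-8")).digest()
        return [byte / 255.0 for byte in digest[: self._dim]]

    @property
    def dimension(self) -> int:
        return self._dim


def check_embedding_contract(ef) -> None:
    """Raise AssertionError if `ef` breaks the __call__/dimension contract."""
    single = ef("hello")
    batch = ef(["hello", "world"])
    assert len(single) == 1, "a single string must yield exactly one vector"
    assert len(batch) == 2, "a list must yield one vector per item"
    assert single[0] == batch[0], "str and [str] inputs should embed identically"
    for vector in single + batch:
        assert len(vector) == ef.dimension, "vector length must match dimension"
        assert all(isinstance(v, float) for v in vector), "components must be floats"


# Passes silently when the contract holds.
check_embedding_contract(ToyHashEmbedding(dim=4))
```

The same check can be pointed at any object with the same `__call__` and `dimension` shape, including a real model wrapper.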

For more information about custom embedding functions, please refer to: