Version: V1.1.0

Overview

In seekdb, an embedding function (EF) converts text (or other input) into vectors so you can run semantic search (vector similarity search). seekdb ships with embedding functions for several popular providers. You can use these built-in implementations or implement the EmbeddingFunction interface yourself.

How it works

You bind an embedding function to a collection by passing embeddingFunction when you create or get the collection. Once bound, the collection uses the EF to generate vectors automatically in these cases:

  • Ingest: When you call add, update, or upsert with documents (and do not pass embeddings), the EF converts the text to vectors, which are then stored alongside the documents.
  • Query: When you call query with queryTexts (and do not pass queryEmbeddings), the EF converts the query text to a vector and that vector is used for the search.
Tip: If you pass embeddings or queryEmbeddings explicitly, seekdb uses those vectors as-is and does not call the collection’s embedding function.
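The precedence rule described in this section can be sketched as a small standalone function. This is only an illustration of the documented behavior, not seekdb's internal code: `TextEmbedder`, `AddInput`, `resolveEmbeddings`, and `countingEf` are hypothetical names introduced for this sketch, and nothing here is imported from seekdb.

```typescript
// Hypothetical minimal shape of an embedding function (not imported from seekdb).
interface TextEmbedder {
  generate(texts: string[]): Promise<number[][]>;
}

// Hypothetical input shape mirroring the documents/embeddings fields of add().
interface AddInput {
  documents?: string[];
  embeddings?: number[][];
}

// Returns the vectors that would be stored: explicit embeddings win,
// otherwise the embedding function is called on the documents.
async function resolveEmbeddings(
  input: AddInput,
  ef: TextEmbedder,
): Promise<number[][]> {
  if (input.embeddings !== undefined) {
    return input.embeddings; // used as-is; ef.generate is never called
  }
  if (input.documents === undefined) {
    throw new Error("Provide either documents or embeddings");
  }
  return ef.generate(input.documents);
}

// Toy embedder that counts how often it is called (for demonstration only).
let calls = 0;
const countingEf: TextEmbedder = {
  async generate(texts) {
    calls += 1;
    return texts.map((t) => [t.length, 0, 0]);
  },
};
```

Calling `resolveEmbeddings` with both `documents` and `embeddings` returns the supplied vectors and leaves `calls` at 0; calling it with `documents` alone invokes the toy embedder once. The same reasoning applies to query with queryTexts and queryEmbeddings.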

Supported providers

seekdb provides embedding functions for the following providers:

Typical usage (binding to a collection)

The example below shows how to bind an EF to a collection. You can replace ef with any provider’s embedding function instance. client must be a SeekdbClient instance; it is created here with example connection settings.

import { SeekdbClient } from "seekdb";
import { OpenAIEmbeddingFunction } from "@seekdb/openai";

const client = new SeekdbClient({
  host: "127.0.0.1",
  port: 2881,
  user: "root",
  password: "",
  database: "test",
});

// 1. Create an embedding function instance
const ef = new OpenAIEmbeddingFunction({
  modelName: "text-embedding-3-small",
});

// 2. Bind the EF when creating the collection
const collection = await client.createCollection({
  name: "my_collection",
  embeddingFunction: ef,
});

// 3. Add documents; vectors are generated and stored automatically
await collection.add({
  ids: ["1", "2"],
  documents: ["Hello world", "How are you?"],
  metadatas: [{ id: 1 }, { id: 2 }],
});

// 4. Query by text; the query vector is generated and used for search
const results = await collection.query({
  queryTexts: "How are you?",
  nResults: 1,
});
console.log(results);
Tip: You can also call ef.generate(["text"]) directly to produce embeddings, which is useful for debugging or for offline preprocessing in your application.
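When debugging, it can help to compare the vectors that generate returns for two inputs. The sketch below uses a stand-in object with the same generate(texts) signature so that it runs without a provider or an API key. `fakeEf`, `cosineSimilarity`, and `compareTexts` are hypothetical helpers introduced here, and the toy character-count vectors carry no semantic meaning; with a real EF you would pass its output to `cosineSimilarity` instead.

```typescript
// Cosine similarity between two equal-length vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error(`Dimension mismatch: ${a.length} vs ${b.length}`);
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // avoid division by zero
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Stand-in for a real embedding function (assumption for this sketch):
// counts vowels, other letters a-z, and all remaining characters.
const fakeEf = {
  async generate(texts: string[]): Promise<number[][]> {
    return texts.map((text) => {
      const v = [0, 0, 0];
      const lower = text.toLowerCase();
      for (let i = 0; i < lower.length; i++) {
        const ch = lower.charAt(i);
        if ("aeiou".indexOf(ch) !== -1) v[0] += 1;
        else if (ch >= "a" && ch <= "z") v[1] += 1;
        else v[2] += 1;
      }
      return v;
    });
  },
};

// Embed two texts in one call and compare the resulting vectors.
async function compareTexts(a: string, b: string): Promise<number> {
  const [va, vb] = await fakeEf.generate([a, b]);
  return cosineSimilarity(va, vb);
}
```

Batching both texts into a single generate call mirrors how you would preprocess many documents offline: one call, one vector per input, in the same order.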

Default embedding function (all-MiniLM-L6-v2)

If you do not pass embeddingFunction when creating a collection, seekdb uses a default embedding pipeline based on the local model Xenova/all-MiniLM-L6-v2 (384 dimensions). The model runs locally and does not require an API key. The first time you run an operation that needs embeddings, the model files are downloaded automatically.

// No embeddingFunction: default pipeline is used
const collection = await client.createCollection({
  name: "default_collection",
});

To use the default embedding logic explicitly (for example, so every collection is created with an embeddingFunction argument), install @seekdb/default-embed and pass its instance:

import { DefaultEmbeddingFunction } from "@seekdb/default-embed";

const defaultEmbed = new DefaultEmbeddingFunction({
  // If downloads fail, try region: "intl"
});

const collection = await client.createCollection({
  name: "local_embed_collection",
  embeddingFunction: defaultEmbed,
});

For more details, see Default embedding function.

Custom embedding functions

If the built-in embedding functions do not meet your needs, you can implement the EmbeddingFunction interface. A custom EF must:

  • Implement generate(texts: string[]): Promise<number[][]>.
  • Expose a name property (a unique string for this embedding function).
  • Implement getConfig() and return a config object (used for serialization and restoration).
  • Implement the static method buildFromConfig to construct a new instance from a config object.
  • (Recommended) Expose dimension so the collection can validate dimensionality when created.

After implementing your class, register it with registerEmbeddingFunction so it can be serialized and restored. Minimal example:

import type { EmbeddingFunction } from "seekdb";
import { registerEmbeddingFunction } from "seekdb";

class MyCustomEmbeddingFunction implements EmbeddingFunction {
  readonly name = "my_custom_embedding";
  dimension = 3;

  async generate(texts: string[]) {
    // Implement your embedding logic (e.g., call an internal model service)
    return texts.map(() => [0.1, 0.2, 0.3]);
  }

  getConfig() {
    return {};
  }

  static buildFromConfig() {
    return new MyCustomEmbeddingFunction();
  }
}

// Register the custom embedding function
registerEmbeddingFunction("my_custom_embedding", MyCustomEmbeddingFunction);
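The minimal example returns an empty config. When an EF has settings, getConfig() and buildFromConfig are meant to round-trip them, which the sketch below illustrates with a deterministic hashing embedder. To run standalone, it declares a local interface rather than importing from seekdb; the interface shape, the class `HashEmbeddingFunction`, and its `dimension` config field are assumptions made for illustration, and the hashed vectors are not semantically meaningful.

```typescript
// Local stand-in for the interface described above (assumed shape, not imported from seekdb).
interface EmbeddingFunctionLike {
  readonly name: string;
  dimension?: number;
  generate(texts: string[]): Promise<number[][]>;
  getConfig(): Record<string, unknown>;
}

interface HashEmbeddingConfig {
  dimension: number;
}

class HashEmbeddingFunction implements EmbeddingFunctionLike {
  readonly name = "hash_embedding";
  dimension: number;

  constructor(config: HashEmbeddingConfig = { dimension: 8 }) {
    if (!Number.isInteger(config.dimension) || config.dimension <= 0) {
      throw new Error("dimension must be a positive integer");
    }
    this.dimension = config.dimension;
  }

  // Buckets each whitespace-separated token by a simple string hash,
  // then L2-normalizes the resulting count vector.
  async generate(texts: string[]): Promise<number[][]> {
    return texts.map((text) => {
      const vector = new Array<number>(this.dimension).fill(0);
      const tokens = text.toLowerCase().split(/\s+/).filter(Boolean);
      for (const token of tokens) {
        let hash = 0;
        for (let i = 0; i < token.length; i++) {
          hash = (hash * 31 + token.charCodeAt(i)) >>> 0; // keep within uint32
        }
        vector[hash % this.dimension] += 1;
      }
      const norm = Math.sqrt(vector.reduce((sum, x) => sum + x * x, 0));
      return norm === 0 ? vector : vector.map((x) => x / norm);
    });
  }

  // Everything needed to rebuild an equivalent instance.
  getConfig(): Record<string, unknown> {
    return { dimension: this.dimension };
  }

  // Rebuilds an instance from the object returned by getConfig().
  static buildFromConfig(config: Record<string, unknown>): HashEmbeddingFunction {
    return new HashEmbeddingFunction({ dimension: Number(config["dimension"]) });
  }
}
```

An instance rebuilt with `HashEmbeddingFunction.buildFromConfig(ef.getConfig())` has the same dimension as `ef` and produces the same vectors for the same input, which is the property serialization and restoration rely on. In a real project the class would implement seekdb's EmbeddingFunction and be registered with registerEmbeddingFunction, as in the minimal example.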

For full requirements and more examples, see: