Version: V1.1.0

Sentence Transformer

Sentence Transformer is a framework for converting sentences, phrases, or short texts into high-dimensional vectors (embeddings). Semantically similar sentences map to nearby vectors, so you can compare them using cosine similarity or Euclidean distance. The framework is built on pre-trained Transformer models (such as BERT and RoBERTa) and uses pooling (for example, mean or CLS) to produce fixed-size sentence vectors. Typical use cases include semantic search, text clustering, sentence classification, information retrieval, and RAG.
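The cosine-similarity comparison mentioned above can be sketched in plain NumPy. The short vectors below are illustrative stand-ins for real sentence embeddings (all-MiniLM-L6-v2 actually produces 384-dimensional vectors):

```python
import numpy as np

def cosine_similarity(a, b):
    # Cosine similarity: dot product divided by the product of the
    # vectors' Euclidean norms; 1.0 means identical direction.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy stand-ins for sentence embeddings.
v1 = np.array([0.2, 0.8, 0.1])
v2 = np.array([0.25, 0.75, 0.05])  # embedding of a semantically close sentence
v3 = np.array([-0.9, 0.1, 0.4])    # embedding of an unrelated sentence

print(cosine_similarity(v1, v2))  # close to 1.0 -> similar
print(cosine_similarity(v1, v3))  # much lower -> dissimilar
```

Semantically similar sentences yield vectors with similarity close to 1.0, which is what makes nearest-neighbor search over embeddings work.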

Dependencies and authentication

  • Install pyseekdb and sentence-transformers.
  • The sentence-transformers package handles model loading and encoding. Models run locally; the first time you use a given model, its weights are downloaded from Hugging Face Hub and cached. No API key is required.
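A typical installation command for the two packages named above (package names taken from this page; exact version pins are up to you):

```shell
pip install pyseekdb sentence-transformers
```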

Example: create a Sentence Transformer embedding function

Import and initialize SentenceTransformerEmbeddingFunction with a model name. The all-MiniLM-L6-v2 model is a lightweight general-purpose option suitable for many use cases.

import pyseekdb
from pyseekdb.utils.embedding_functions import SentenceTransformerEmbeddingFunction

# Embedding function backed by a locally run Sentence Transformer model
ef = SentenceTransformerEmbeddingFunction(model_name="all-MiniLM-L6-v2")
db = pyseekdb.Client(path="./seekdb.db")
collection = db.create_collection(name="my_collection", embedding_function=ef)

Parameters:

  • model_name: Sentence Transformer model name (for example, all-MiniLM-L6-v2, paraphrase-multilingual-MiniLM-L12-v2).
  • name: Collection name (required when creating the collection; example uses my_collection).
  • embedding_function: The embedding function instance (here, ef) used to convert text to vectors.