Sentence Transformer
Sentence Transformer is a framework for converting sentences, phrases, or short texts into high-dimensional vectors (embeddings). Semantically similar sentences map to nearby vectors, so you can compare them using cosine similarity or Euclidean distance. The framework is built on pre-trained Transformer models (such as BERT and RoBERTa) and uses pooling (for example, mean or CLS) to produce fixed-size sentence vectors. Typical use cases include semantic search, text clustering, sentence classification, information retrieval, and RAG.
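The similarity comparison described above can be illustrated without any model at all. The sketch below uses NumPy and made-up 3-dimensional vectors standing in for real embeddings (which have hundreds of dimensions, e.g. 384 for all-MiniLM-L6-v2); only the cosine formula itself is the real mechanism.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity: 1.0 for identical directions, near 0.0 for unrelated ones."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy "embeddings" -- the values are invented for illustration, not model output.
cat = np.array([0.9, 0.1, 0.2])
kitten = np.array([0.85, 0.15, 0.25])
invoice = np.array([0.05, 0.9, 0.4])

print(cosine_similarity(cat, kitten))   # close to 1.0: semantically similar
print(cosine_similarity(cat, invoice))  # much lower: unrelated
```

The same comparison works on real sentence embeddings, since they are just fixed-size vectors.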
Dependencies and authentication
- Install pyseekdb and sentence-transformers (for example, pip install pyseekdb sentence-transformers).
- The sentence-transformers package handles model loading and encoding. Models run locally; the first time you use a given model, its weights are downloaded from the Hugging Face Hub and cached. No API key is required.
Example: create a Sentence Transformer embedding function
Import and initialize SentenceTransformerEmbeddingFunction with a model name. The all-MiniLM-L6-v2 model is a lightweight general-purpose option suitable for many use cases.
import pyseekdb
from pyseekdb.utils.embedding_functions import SentenceTransformerEmbeddingFunction

# Create the embedding function; model weights are downloaded and cached on first use
ef = SentenceTransformerEmbeddingFunction(model_name="all-MiniLM-L6-v2")

# Create a local database and a collection that embeds documents with ef
db = pyseekdb.Client(path="./seekdb.db")
collection = db.create_collection(name="my_collection", embedding_function=ef)
Parameters:
- model_name: Sentence Transformer model name (for example, all-MiniLM-L6-v2 or paraphrase-multilingual-MiniLM-L12-v2).
- name: Collection name (required when creating the collection; the example uses my_collection).
- embedding_function: The embedding function instance (here, ef) used to convert text to vectors.
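The overview above notes that sentence vectors come from pooling the Transformer's per-token embeddings. Mean pooling can be sketched with NumPy; the token vectors and attention mask below are toy numbers (real ones come from the model), but the pooling arithmetic is the actual technique.

```python
import numpy as np

# Toy per-token embeddings for a 4-token input, 3 dimensions each
# (a real model such as all-MiniLM-L6-v2 produces 384-dimensional vectors).
token_embeddings = np.array([
    [0.2, 0.8, 0.1],
    [0.4, 0.6, 0.3],
    [0.1, 0.9, 0.2],
    [0.3, 0.7, 0.4],
])

# Attention mask: 1 for real tokens, 0 for padding; padded positions
# must not contribute to the sentence vector.
mask = np.array([1, 1, 1, 0])

# Masked mean pooling: sum the real token vectors, divide by their count.
summed = (token_embeddings * mask[:, None]).sum(axis=0)
sentence_vector = summed / mask.sum()
print(sentence_vector)  # one fixed-size vector, regardless of sentence length
```

This is why every sentence, short or long, maps to a vector of the same dimensionality and can be compared directly.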