OpenAI
OpenAI embedding models convert text (words, sentences, or paragraphs) into high-dimensional vectors (embeddings) using pre-trained models such as text-embedding-3-large. These vectors capture semantic meaning so you can measure similarity (for example, via cosine similarity) without additional training. Typical use cases include semantic search, question answering, text clustering and classification, recommendation systems, and retrieval-augmented generation (RAG).
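To make the similarity measure concrete, here is a minimal sketch of cosine similarity in plain Python. The three-dimensional vectors are made-up stand-ins for real embeddings, and `cosine_similarity` is an illustrative helper, not part of pyseekdb or the OpenAI SDK.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy vectors standing in for real embeddings (illustrative values only).
cat = [0.9, 0.1, 0.0]
kitten = [0.8, 0.2, 0.1]
invoice = [0.0, 0.1, 0.9]

# The two related items score higher than the unrelated pair.
print(cosine_similarity(cat, kitten) > cosine_similarity(cat, invoice))  # True
```

Values close to 1 indicate vectors pointing in nearly the same direction, which is how semantically similar texts are expected to appear.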
Using the OpenAI service is subject to OpenAI's pricing and may incur fees. Before proceeding, review the pricing on OpenAI's official website or in its documentation and confirm that you accept it. If you do not agree, do not proceed.
Dependencies and authentication
- Install the pyseekdb and openai packages. The openai package is the official OpenAI Python SDK.
- Create an API key in the OpenAI Platform and set it in your environment so OpenAIEmbeddingFunction can read it.
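Before creating the embedding function, it can help to confirm that the key is actually visible to your process. The sketch below uses only the standard library; `read_api_key` is a hypothetical helper written for this example, and the `sk-placeholder` value is not a real key.

```python
import os


def read_api_key(env_name="OPENAI_API_KEY", environ=None):
    """Return the API key stored in env_name, or raise if it is missing or blank."""
    environ = os.environ if environ is None else environ
    key = environ.get(env_name, "").strip()
    if not key:
        raise RuntimeError(
            f"{env_name} is not set; export it before creating the embedding function"
        )
    return key


# Example with an explicit mapping so the snippet runs without a real key.
fake_env = {"OPENAI_API_KEY": "sk-placeholder"}
print(read_api_key(environ=fake_env) == "sk-placeholder")  # True
```

Called without the `environ` argument, the helper reads from the real process environment, so a missing key surfaces as a clear error instead of a failed request later.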
Examples: create an OpenAI embedding function
Import and initialize OpenAIEmbeddingFunction. API keys are usually provided via environment variables.
- Basic usage

  If you do not specify a model, the default is text-embedding-3-small.

      from pyseekdb.utils.embedding_functions import OpenAIEmbeddingFunction

      # Default model text-embedding-3-small and default env var OPENAI_API_KEY
      ef = OpenAIEmbeddingFunction()

- Advanced usage

  Override the API key environment variable name and set dimensions, timeout, and max_retries as needed.

      from pyseekdb.utils.embedding_functions import OpenAIEmbeddingFunction

      # Cost-effective model
      ef = OpenAIEmbeddingFunction(model_name="text-embedding-3-small")

      # Higher-accuracy model with timeout and retries
      ef = OpenAIEmbeddingFunction(
          model_name="text-embedding-3-large",
          timeout=30,
          max_retries=3,
      )

      # Dynamic-dimension model with output dimension 512
      ef = OpenAIEmbeddingFunction(
          model_name="text-embedding-3-small",
          dimensions=512,  # Reduce from default 1536 to 512
      )

      # Read the key from a custom environment variable
      # (MY_OPENAI_API_KEY is a placeholder name)
      ef = OpenAIEmbeddingFunction(api_key_env="MY_OPENAI_API_KEY")
Parameters:
- model_name: OpenAI embedding model name (for example, text-embedding-3-small, text-embedding-3-large).
- api_key_env: Environment variable name for the API key (default: OPENAI_API_KEY).
- timeout: Request timeout (seconds), optional.
- max_retries: Maximum number of retries on failure, optional.
- dimensions: Output vector dimension, optional. Only effective when the model supports dynamic dimensions; for example, text-embedding-3-small can be reduced from the default 1536 to a smaller value.
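Because the dimensions value determines the size of every stored vector, a small pre-flight check can catch mistakes early. The following is a sketch under stated assumptions: `check_dimensions` and the `NATIVE_DIMENSIONS` table are hypothetical and not part of pyseekdb, and the native sizes listed should be verified against OpenAI's current model documentation.

```python
# Native output sizes as published by OpenAI; treat this table as an assumption
# and check the current model documentation before relying on it.
NATIVE_DIMENSIONS = {
    "text-embedding-3-small": 1536,
    "text-embedding-3-large": 3072,
}


def check_dimensions(model_name, dimensions=None):
    """Return the vector size to expect, or raise if the request looks invalid."""
    if model_name not in NATIVE_DIMENSIONS:
        raise ValueError(f"unknown model for this sketch: {model_name}")
    native = NATIVE_DIMENSIONS[model_name]
    if dimensions is None:
        return native  # No override: the model returns its native size.
    if not 1 <= dimensions <= native:
        raise ValueError(
            f"dimensions must be between 1 and {native} for {model_name}"
        )
    return dimensions


print(check_dimensions("text-embedding-3-small"))       # 1536
print(check_dimensions("text-embedding-3-small", 512))  # 512
```

The returned value is the vector length to expect from the embedding function, which is useful when sizing a vector column or index.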