Version: V1.1.0

OpenAI

OpenAI embedding models convert text (words, sentences, or paragraphs) into high-dimensional vectors (embeddings) using pre-trained models such as text-embedding-3-large. These vectors capture semantic meaning so you can measure similarity (for example, via cosine similarity) without additional training. Typical use cases include semantic search, question answering, text clustering and classification, recommendation systems, and retrieval-augmented generation (RAG).
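As an illustration of the similarity measure mentioned above, here is a minimal sketch of cosine similarity in plain Python. The function name `cosine_similarity` and the short example vectors are illustrative assumptions; real embeddings returned by the model have far more components (1536 by default for text-embedding-3-small).

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


# Toy vectors standing in for embeddings (hypothetical values).
query = [0.1, 0.3, 0.5]
doc_a = [0.2, 0.6, 1.0]   # same direction as the query
doc_b = [0.5, -0.1, 0.0]  # different direction

print(cosine_similarity(query, doc_a))
print(cosine_similarity(query, doc_b))
```

A value close to 1 means the two texts point in nearly the same direction in embedding space, and values near 0 indicate little semantic overlap.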

tip

Use of the OpenAI service is subject to OpenAI's pricing and may incur fees. Before proceeding, review the pricing on OpenAI's official website or in its documentation and confirm that you accept it. If you do not agree, do not proceed.

Dependencies and authentication

  • Install the pyseekdb and openai packages. The openai package is the official OpenAI Python SDK.
  • Create an API key in the OpenAI Platform and set it in your environment so OpenAIEmbeddingFunction can read it.
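After installing the packages (for example with `pip install pyseekdb openai`, assuming a standard pip setup), it can help to confirm that the key is visible to the Python process before creating the embedding function. The sketch below uses only the standard library; the helper name `require_api_key` is hypothetical and is not part of pyseekdb.

```python
import os


def require_api_key(env_name="OPENAI_API_KEY", environ=None):
    """Return the key stored in env_name, or raise a clear error if it is missing.

    `environ` defaults to os.environ; passing a dict makes the check easy to test.
    """
    env = os.environ if environ is None else environ
    key = env.get(env_name, "").strip()
    if not key:
        raise RuntimeError(
            f"Set the {env_name} environment variable before creating "
            "the embedding function."
        )
    return key


# Example with an explicit mapping (placeholder value, not a real key).
print(require_api_key("OPENAI_API_KEY", {"OPENAI_API_KEY": "sk-placeholder"}))
```

Calling `require_api_key()` with no arguments checks the real process environment, so a missing key surfaces early rather than on the first embedding request.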

Examples: create an OpenAI embedding function

Import and initialize OpenAIEmbeddingFunction. API keys are usually provided via environment variables.

  • Basic usage

    If you do not specify a model, the default is text-embedding-3-small.

    from pyseekdb.utils.embedding_functions import OpenAIEmbeddingFunction

    # Default model text-embedding-3-small and default env var OPENAI_API_KEY
    ef = OpenAIEmbeddingFunction()
  • Advanced usage

    Override the API key environment variable name and set dimensions, timeout, and max_retries as needed.

    from pyseekdb.utils.embedding_functions import OpenAIEmbeddingFunction

    # Cost-effective model
    ef = OpenAIEmbeddingFunction(model_name="text-embedding-3-small")

    # Higher-accuracy model with timeout and retries
    ef = OpenAIEmbeddingFunction(
        model_name="text-embedding-3-large",
        timeout=30,
        max_retries=3,
    )

    # Dynamic-dimension model with output dimension 512
    ef = OpenAIEmbeddingFunction(
        model_name="text-embedding-3-small",
        dimensions=512,  # Reduce from default 1536 to 512
    )

Parameters:

  • model_name: OpenAI embedding model name (for example, text-embedding-3-small, text-embedding-3-large).
  • api_key_env: Environment variable name for the API key (default: OPENAI_API_KEY).
  • timeout: Request timeout (seconds), optional.
  • max_retries: Maximum number of retries on failure, optional.
  • dimensions: Output vector dimension, optional. Takes effect only when the model supports dynamic dimensions; for example, text-embedding-3-small can be reduced from its default of 1536 to a smaller value.
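The parameters above can be collected and sanity-checked before they are passed to the constructor. The following is a minimal sketch, not part of pyseekdb: the helper `build_embedding_kwargs` and the `DEFAULT_DIMENSIONS` table are hypothetical, and rejecting `dimensions` for models outside the table is a conservative assumption rather than documented library behavior. The environment variable name `MY_OPENAI_KEY` is also only an example.

```python
# Native output sizes of the two models named in this page.
DEFAULT_DIMENSIONS = {
    "text-embedding-3-small": 1536,
    "text-embedding-3-large": 3072,
}


def build_embedding_kwargs(
    model_name="text-embedding-3-small",
    api_key_env="OPENAI_API_KEY",
    timeout=None,
    max_retries=None,
    dimensions=None,
):
    """Build a keyword-argument dict, omitting optional values left as None."""
    if dimensions is not None:
        native = DEFAULT_DIMENSIONS.get(model_name)
        if native is None:
            # Assumption: treat unknown models as not supporting dynamic dimensions.
            raise ValueError(f"dimensions is not known to be supported by {model_name}")
        if not 1 <= dimensions <= native:
            raise ValueError(f"dimensions must be between 1 and {native}")
    kwargs = {"model_name": model_name, "api_key_env": api_key_env}
    optional = {
        "timeout": timeout,
        "max_retries": max_retries,
        "dimensions": dimensions,
    }
    kwargs.update({name: value for name, value in optional.items() if value is not None})
    return kwargs


kwargs = build_embedding_kwargs(
    model_name="text-embedding-3-large",
    api_key_env="MY_OPENAI_KEY",  # hypothetical custom variable name
    timeout=30,
    max_retries=3,
    dimensions=1024,
)
print(kwargs)

# With pyseekdb installed, the dict could then be unpacked into the constructor:
# from pyseekdb.utils.embedding_functions import OpenAIEmbeddingFunction
# ef = OpenAIEmbeddingFunction(**kwargs)
```

Keeping the validation separate from the constructor call means a mistyped dimension is caught locally, before any network request is made.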