Experience embedded seekdb with Python SDK

This example shows how to get started with embedded seekdb using pyseekdb, a Python client provided by OceanBase.

Background information

pyseekdb

pyseekdb is a Python client provided by OceanBase. It exposes a unified API with three connection modes: embedded-mode seekdb, server-mode seekdb, and OceanBase databases.

Installing this client also installs embedded-mode seekdb, so you can connect directly to embedded seekdb and perform operations such as creating databases. Alternatively, you can connect remotely to a seekdb instance deployed in client/server mode or to an OceanBase database.

seekdb deployment modes

seekdb provides flexible deployment modes that cover use cases ranging from rapid prototyping to large-scale user workloads.

  • Embedded mode

    seekdb runs as a lightweight library embedded in your application and is installed with a single pip command. This mode is ideal for personal learning or prototyping and runs easily on a variety of end devices.

  • Client/Server mode

    A lightweight and easy-to-use deployment mode recommended for both testing and production, delivering stable and efficient service.

    For information about using seekdb in client/server mode, see Experience seekdb in client/server mode.

Install pyseekdb

Prerequisites

Ensure that your environment meets the following requirements:

  • Operating system: Linux (glibc >= 2.28)
  • Python version: 3.11 or later
  • System architecture: x86_64, aarch64
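
The requirements above can be checked from Python before installing. The following is a minimal sketch that uses only the standard library. The names parse_version and check_environment are illustrative helpers and are not part of pyseekdb. It assumes that platform.libc_ver() reports glibc correctly on your system; on some systems it returns empty strings, in which case the sketch reports the glibc version as unknown.

```python
import platform
import sys

# Requirements taken from the prerequisites list above.
SUPPORTED_ARCHS = {"x86_64", "aarch64"}
MIN_PYTHON = (3, 11)
MIN_GLIBC = (2, 28)


def parse_version(text):
    """Turn a dotted version string such as '2.28' into a tuple of ints."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break  # stop at the first non-numeric component
        parts.append(int(piece))
    return tuple(parts)


def check_environment(system, machine, python_version, libc_name, libc_version):
    """Return a list of problems; an empty list means all checks passed."""
    problems = []
    if system != "Linux":
        problems.append(f"unsupported operating system: {system}")
    elif libc_name != "glibc" or parse_version(libc_version) < MIN_GLIBC:
        found = f"{libc_name} {libc_version}".strip() or "unknown"
        problems.append(f"glibc >= 2.28 required, found: {found}")
    if tuple(python_version[:2]) < MIN_PYTHON:
        problems.append("Python 3.11 or later required")
    if machine not in SUPPORTED_ARCHS:
        problems.append(f"unsupported architecture: {machine}")
    return problems


if __name__ == "__main__":
    libc_name, libc_version = platform.libc_ver()
    issues = check_environment(
        platform.system(),
        platform.machine(),
        sys.version_info,
        libc_name,
        libc_version,
    )
    if issues:
        for issue in issues:
            print(f"Problem: {issue}")
    else:
        print("Environment meets the listed requirements")
```

The check function takes its inputs as arguments rather than reading them itself, so the same logic can be exercised with values from a different machine.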

Installation

Install pyseekdb with pip, which automatically detects the default Python version and platform.

pip install pyseekdb

If your pip version is outdated, upgrade pip before installing pyseekdb.

pip install --upgrade pip
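
To confirm that the installation succeeded, you can look up the package in the active Python environment. This is a minimal sketch based on the standard library module importlib.metadata. The helper name installed_version is illustrative, and the sketch assumes that the distribution name is pyseekdb, matching the pip command above.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version("pyseekdb")
    if version is None:
        print("pyseekdb is not installed in this environment")
    else:
        print(f"pyseekdb {version} is installed")
```

If the package is reported as missing right after installation, pip and the Python interpreter you are running may belong to different environments.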

Experience seekdb with Python SDK

The following example uses embedded-mode seekdb to demonstrate basic operations with embedding functions, helping you quickly understand how to use seekdb.

  1. Connect to seekdb.
  2. Create a collection with embedding functions.
  3. Add data using documents (vectors are automatically generated).
  4. Query using texts (vectors are automatically generated).
  5. Print query results.
"""
Simple Example: Basic usage of SeekDBClient with Embedding Functions

This example demonstrates the most common operations with embedding functions:
1. Create a client connection
2. Create a collection with embedding function
3. Add data using documents (embeddings auto-generated)
4. Query using query texts (embeddings auto-generated)
5. Print query results

This is a minimal example to get you started quickly with embedding functions.
"""
import pyseekdb

# ==================== Step 1: Create Client Connection ====================
# You can use embedded mode, server mode, or OceanBase mode
# For this example, we'll use embedded mode (you can change to server or OceanBase mode)

# Embedded mode (local SeekDB)
client = pyseekdb.Client()
# Alternative: Server mode (connecting to remote SeekDB server)
# client = pyseekdb.Client(
#     host="127.0.0.1",
#     port=2881,
#     database="test",
#     user="root",
#     password=""
# )

# Alternative: Remote server mode (OceanBase Server)
# client = pyseekdb.Client(
#     host="127.0.0.1",
#     port=2881,
#     tenant="test",  # OceanBase default tenant
#     database="test",
#     user="root",
#     password=""
# )

# ==================== Step 2: Create a Collection with Embedding Function ====================
# A collection is like a table that stores documents with vector embeddings
collection_name = "my_simple_collection"

# Create collection with default embedding function
# The embedding function will automatically convert documents to embeddings
collection = client.create_collection(
    name=collection_name,
)

print(f"Created collection '{collection_name}' with dimension: {collection.dimension}")
print(f"Embedding function: {collection.embedding_function}")

# ==================== Step 3: Add Data to Collection ====================
# With embedding function, you can add documents directly without providing embeddings
# The embedding function will automatically generate embeddings from documents

documents = [
    "Machine learning is a subset of artificial intelligence",
    "Python is a popular programming language",
    "Vector databases enable semantic search",
    "Neural networks are inspired by the human brain",
    "Natural language processing helps computers understand text"
]

ids = ["id1", "id2", "id3", "id4", "id5"]

# Add data with documents only - embeddings will be auto-generated by embedding function
collection.add(
    ids=ids,
    documents=documents,  # embeddings will be automatically generated
    metadatas=[
        {"category": "AI", "index": 0},
        {"category": "Programming", "index": 1},
        {"category": "Database", "index": 2},
        {"category": "AI", "index": 3},
        {"category": "NLP", "index": 4}
    ]
)

print(f"\nAdded {len(documents)} documents to collection")
print("Note: Embeddings were automatically generated from documents using the embedding function")

# ==================== Step 4: Query the Collection ====================
# With embedding function, you can query using text directly
# The embedding function will automatically convert query text to query vector

# Query using text - query vector will be auto-generated by embedding function
query_text = "artificial intelligence and machine learning"

results = collection.query(
    query_texts=query_text,  # Query text - will be embedded automatically
    n_results=3  # Return top 3 most similar documents
)

print(f"\nQuery: '{query_text}'")
print(f"Query results: {len(results['ids'][0])} items found")

# ==================== Step 5: Print Query Results ====================
for i in range(len(results['ids'][0])):
    print(f"\nResult {i+1}:")
    print(f" ID: {results['ids'][0][i]}")
    print(f" Distance: {results['distances'][0][i]:.4f}")
    if results.get('documents'):
        print(f" Document: {results['documents'][0][i]}")
    if results.get('metadatas'):
        print(f" Metadata: {results['metadatas'][0][i]}")

# ==================== Step 6: Cleanup ====================
# Delete the collection
client.delete_collection(collection_name)
print(f"\nDeleted collection '{collection_name}'")
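
The example above reads query results by position: each field holds one list per query, indexed as results['ids'][0][i]. When you need one record per hit, for instance to pass results to other code, the columns can be regrouped. The following is a minimal sketch that assumes only the result layout used in the example. The helper name flatten_query_results is illustrative and not part of pyseekdb, and the sample values are hand-written placeholders, not real query output.

```python
def flatten_query_results(results, query_index=0):
    """Convert column-oriented query results into one dict per hit."""
    ids = results["ids"][query_index]
    # These fields are optional, mirroring the results.get(...) checks above.
    distances = results.get("distances")
    documents = results.get("documents")
    metadatas = results.get("metadatas")

    rows = []
    for i, item_id in enumerate(ids):
        rows.append({
            "id": item_id,
            "distance": distances[query_index][i] if distances else None,
            "document": documents[query_index][i] if documents else None,
            "metadata": metadatas[query_index][i] if metadatas else None,
        })
    return rows


if __name__ == "__main__":
    # Hand-written placeholder data in the same layout as the example results.
    sample = {
        "ids": [["id1", "id4"]],
        "distances": [[0.25, 0.5]],
        "documents": [[
            "Machine learning is a subset of artificial intelligence",
            "Neural networks are inspired by the human brain",
        ]],
        "metadatas": [[
            {"category": "AI", "index": 0},
            {"category": "AI", "index": 3},
        ]],
    }
    for row in flatten_query_results(sample):
        print(row["id"], row["distance"], row["metadata"]["category"])
```

Fields missing from the result are returned as None, so the caller does not need to check each column separately.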

More information