Version: V1.1.0

Experience vector search with Python SDK

This guide walks you through vector search using pyseekdb (OceanBase's Python client) with embedded seekdb in a Linux environment.

Tip: pyseekdb also runs on macOS and Windows. On Windows, only server-mode seekdb is supported. For setup on macOS and Windows, see pyseekdb quick start.

In this example, you will:

  1. Install pyseekdb and run embedded seekdb.
  2. Connect to seekdb and create a database.
  3. Connect to the database and create a collection with an embedding function.
  4. Add documents (vectors are generated automatically).
  5. Run a vector search (query vectors are generated automatically) and print results.
  6. Clean up the database and collection.

Background information

pyseekdb

pyseekdb is OceanBase's Python client for seekdb. It uses a single API surface and supports three connection modes: embedded seekdb, server-mode seekdb, and OceanBase Database.

Installing the client also installs embedded seekdb so you can create databases and run workloads locally. Alternatively, you can connect to an existing server-mode seekdb or OceanBase instance.

seekdb deployment modes

seekdb can run in different modes depending on your needs:

  • Embedded mode: seekdb runs as a lightweight library inside your application. Install with pip. Suited for learning, prototyping, and running on resource-constrained devices.

  • Client/server mode: seekdb runs as a standalone service that is easy to set up. Recommended for testing and production. For more information, see Experience seekdb with SQL.

Step 1: Install pyseekdb and run embedded seekdb

Prerequisites

  • OS: Linux (glibc >= 2.28)
  • Python: 3.11 or later
  • Architecture: x86_64 or aarch64
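
Before installing, it can help to confirm that the current interpreter and machine meet the requirements listed above. The following is a minimal sketch using only the Python standard library. The helper names `parse_version` and `check_prerequisites` are hypothetical and are not part of pyseekdb; the thresholds are copied from the list above.

```python
import platform
import sys

# Thresholds copied from the prerequisites list in this guide.
MIN_PYTHON = (3, 11)
MIN_GLIBC = (2, 28)
SUPPORTED_ARCHS = {"x86_64", "aarch64"}


def parse_version(text):
    """Turn a dotted version string such as '2.31' into a tuple of ints."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break  # stop at the first non-numeric component
        parts.append(int(piece))
    return tuple(parts)


def check_prerequisites():
    """Return a dict mapping each requirement to True or False.

    Hypothetical helper for illustration; not part of pyseekdb.
    """
    libc_name, libc_version = platform.libc_ver()
    return {
        "linux": sys.platform.startswith("linux"),
        "python": sys.version_info[:2] >= MIN_PYTHON,
        "arch": platform.machine() in SUPPORTED_ARCHS,
        "glibc": libc_name == "glibc"
        and parse_version(libc_version) >= MIN_GLIBC,
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'ok' if ok else 'NOT met'}")
```

Any entry reported as not met suggests that embedded seekdb may not install or run on this machine, in which case the server-mode setup mentioned in the tip above is the alternative.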

Install

pip installs the package for the Python interpreter it belongs to and selects the build that matches your platform.

pip install -U pyseekdb

If your pip is outdated, upgrade it first:

pip install --upgrade pip

The client install includes embedded seekdb, so you can create databases and collections without a separate server.

Step 2: Connect to seekdb and create a database

Use the Admin Client to connect and create a database named query_test.

import pyseekdb

# Create embedded admin client
admin = pyseekdb.AdminClient(path="./seekdb.db")
# Create database
admin.create_database("query_test")

Step 3: Connect to the database and create a collection with an embedding function

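Use the Client to connect to query_test and create a collection. This example passes only a name to create_collection; the later steps rely on the collection generating embeddings automatically.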

import pyseekdb

# Create embedded client
client = pyseekdb.Client(path="./seekdb.db", database="query_test")
# Create collection
collection = client.create_collection(
    name="query_demo"
)

Step 4: Insert data

Use add to insert documents into the collection.

Tip: For the add API, see add - Insert data.

import pyseekdb

client = pyseekdb.Client(path="./seekdb.db", database="query_test")
collection = client.get_collection("query_demo")

documents = [
    "Machine learning is a subset of artificial intelligence",
    "Python is a popular programming language",
    "Vector databases enable semantic search",
    "Neural networks are inspired by the human brain",
    "Natural language processing helps computers understand text"
]

ids = ["id1", "id2", "id3", "id4", "id5"]

# Add documents; embeddings are generated by the embedding function
collection.add(
    ids=ids,
    documents=documents,
    metadatas=[
        {"category": "AI", "index": 0},
        {"category": "Programming", "index": 1},
        {"category": "Database", "index": 2},
        {"category": "AI", "index": 3},
        {"category": "NLP", "index": 4}
    ]
)
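
The add call above pairs ids, documents, and metadatas by position, so a list that is one element short silently shifts every later record. The sketch below builds the three lists from one source and checks them before they would be passed to add. `validate_batch` is a hypothetical helper, not part of pyseekdb, and the uniqueness check is an assumption about what a sensible batch looks like rather than a documented pyseekdb rule.

```python
def validate_batch(ids, documents, metadatas=None):
    """Raise ValueError if the parallel lists cannot describe the same records.

    Hypothetical helper for illustration; not part of pyseekdb.
    """
    if len(ids) != len(documents):
        raise ValueError(f"{len(ids)} ids but {len(documents)} documents")
    if metadatas is not None and len(metadatas) != len(ids):
        raise ValueError(f"{len(ids)} ids but {len(metadatas)} metadata entries")
    if len(set(ids)) != len(ids):
        # Assumption: ids should be unique within one batch.
        raise ValueError("ids must be unique within a batch")


# Same records as in the step above, kept together as (text, category) pairs.
records = [
    ("Machine learning is a subset of artificial intelligence", "AI"),
    ("Python is a popular programming language", "Programming"),
    ("Vector databases enable semantic search", "Database"),
    ("Neural networks are inspired by the human brain", "AI"),
    ("Natural language processing helps computers understand text", "NLP"),
]

# Derive the three parallel lists from the single source of truth.
documents = [text for text, _ in records]
ids = [f"id{n}" for n in range(1, len(records) + 1)]
metadatas = [
    {"category": category, "index": index}
    for index, (_, category) in enumerate(records)
]

validate_batch(ids, documents, metadatas)
print(f"{len(ids)} records ready to add")
```

Deriving all three lists from one sequence of records keeps them aligned by construction; the explicit check is then a guard for batches assembled elsewhere.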

Step 5: Run a vector search and print results

Use query to perform a vector search.

Tip: For the query API, see query - Vector search.
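
To make the idea of "most similar" concrete, here is a toy brute-force search in plain Python. It is only a conceptual sketch: the three-dimensional vectors are made up, the function names are hypothetical, and cosine distance is an assumption chosen for illustration. This guide does not describe which embedding model, index, or distance metric seekdb actually uses.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity; 0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def top_n(query_vector, vectors_by_id, n):
    """Brute-force search: score every stored vector, keep the n closest."""
    scored = [
        (cosine_distance(query_vector, vector), doc_id)
        for doc_id, vector in vectors_by_id.items()
    ]
    scored.sort()  # smallest distance first
    return [(doc_id, distance) for distance, doc_id in scored[:n]]


# Made-up vectors standing in for document embeddings.
toy_vectors = {
    "a": [1.0, 0.0, 0.0],
    "b": [0.0, 1.0, 0.0],
    "c": [1.0, 1.0, 0.0],
}

# A made-up query vector that points mostly along the first axis.
for doc_id, distance in top_n([1.0, 0.1, 0.0], toy_vectors, n=2):
    print(doc_id, round(distance, 4))
```

In the real workflow the embedding function produces the vectors for both documents and query text, and n_results plays the role of n here.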

import pyseekdb

client = pyseekdb.Client(path="./seekdb.db", database="query_test")
collection = client.get_collection("query_demo")

# Perform query
query_text = "artificial intelligence and machine learning"

results = collection.query(
    query_texts=query_text,  # Query text - will be embedded automatically
    n_results=3  # Return top 3 most similar documents
)

print(f"\nQuery: '{query_text}'")
print(f"Query results: {len(results['ids'][0])} items found")

for i in range(len(results['ids'][0])):
    print(f"\nResult {i+1}:")
    print(f" ID: {results['ids'][0][i]}")
    print(f" Distance: {results['distances'][0][i]:.4f}")
    if results.get('documents'):
        print(f" Document: {results['documents'][0][i]}")
    if results.get('metadatas'):
        print(f" Metadata: {results['metadatas'][0][i]}")
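
The printing loop above indexes the result as results['ids'][0][i], which suggests a column-oriented layout: one list per field, with one inner list per query. If per-hit records are easier to work with, a small adapter like the one below can reshape it. `flatten_results` is a hypothetical helper, and the sample dictionary contains placeholder values that only mirror the shape used in this guide, not real query output.

```python
# Maps each result column to the key used in the per-hit record.
FIELDS = {
    "distances": "distance",
    "documents": "document",
    "metadatas": "metadata",
}


def flatten_results(results, query_index=0):
    """Convert a column-oriented result dict into one dict per hit.

    Hypothetical helper for illustration; not part of pyseekdb.
    """
    rows = []
    for position, doc_id in enumerate(results["ids"][query_index]):
        row = {"id": doc_id}
        for column_name, field_name in FIELDS.items():
            column = results.get(column_name)
            if column:  # skip columns that are missing or empty
                row[field_name] = column[query_index][position]
        rows.append(row)
    return rows


# Placeholder data shaped like the result accessed in the loop above.
sample = {
    "ids": [["x1", "x2"]],
    "distances": [[0.25, 0.5]],
    "documents": [["first text", "second text"]],
    "metadatas": None,
}

for row in flatten_results(sample):
    print(row)
```

Each row can then be filtered or sorted with ordinary list operations instead of parallel indexing.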

Step 6: Clean up

When you are done, delete the collection and the database. See delete_collection and delete_database for more information.

import pyseekdb

admin = pyseekdb.AdminClient(path="./seekdb.db")
client = pyseekdb.Client(path="./seekdb.db", database="query_test")

# Delete collection
client.delete_collection("query_demo")
print("\nDeleted collection")

# Delete database
admin.delete_database("query_test")
print("\nDeleted database")

Complete example

import pyseekdb

# Create database
admin = pyseekdb.AdminClient(path="./seekdb.db")
admin.create_database("query_test")

# Create collection
client = pyseekdb.Client(path="./seekdb.db", database="query_test")
collection = client.create_collection(name="query_demo")

# Add data
documents = [
    "Machine learning is a subset of artificial intelligence",
    "Python is a popular programming language",
    "Vector databases enable semantic search",
    "Neural networks are inspired by the human brain",
    "Natural language processing helps computers understand text"
]
ids = ["id1", "id2", "id3", "id4", "id5"]
collection.add(
    ids=ids,
    documents=documents,
    metadatas=[
        {"category": "AI", "index": 0},
        {"category": "Programming", "index": 1},
        {"category": "Database", "index": 2},
        {"category": "AI", "index": 3},
        {"category": "NLP", "index": 4}
    ]
)

# Query
query_text = "artificial intelligence and machine learning"
results = collection.query(query_texts=query_text, n_results=3)
print(f"\nQuery: '{query_text}'")
print(f"Query results: {len(results['ids'][0])} items found")
for i in range(len(results['ids'][0])):
    print(f"\nResult {i+1}:")
    print(f" ID: {results['ids'][0][i]}")
    print(f" Distance: {results['distances'][0][i]:.4f}")
    if results.get('documents'):
        print(f" Document: {results['documents'][0][i]}")
    if results.get('metadatas'):
        print(f" Metadata: {results['metadatas'][0][i]}")

# Cleanup
client.delete_collection("query_demo")
print("\nDeleted collection")
admin.delete_database("query_test")
print("\nDeleted database")

More information