Experience vector search with Python SDK
This guide walks you through vector search using pyseekdb (OceanBase's Python client) with embedded seekdb in a Linux environment.
pyseekdb also runs on macOS and Windows. On Windows, only server-mode seekdb is supported. For setup on macOS and Windows, see pyseekdb quick start.
In this example, you will:
- Install pyseekdb and run embedded seekdb.
- Connect to seekdb and create a database.
- Connect to the database and create a collection with an embedding function.
- Add documents (vectors are generated automatically).
- Run a vector search (query vectors are generated automatically) and print results.
- Clean up the database and collection.
Background information
pyseekdb
pyseekdb is OceanBase's Python client for seekdb. It uses a single API surface and supports three connection modes: embedded seekdb, server-mode seekdb, and OceanBase Database.
Installing the client also installs embedded seekdb so you can create databases and run workloads locally. Alternatively, you can connect to an existing server-mode seekdb or OceanBase instance.
seekdb deployment modes
seekdb can run in different modes depending on your needs:
- Embedded mode: seekdb runs as a lightweight library inside your application and is installed with pip. Suited for learning, prototyping, and running on resource-constrained devices.
- Client/server mode: seekdb runs as a standalone service that is easy to set up. Recommended for testing and production. For more information, see Experience seekdb with SQL.
Step 1: Install pyseekdb and run embedded seekdb
Prerequisites
- OS: Linux (glibc >= 2.28)
- Python: 3.11 or later
- Architecture: x86_64 or aarch64
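The prerequisites above can be checked from Python before installing. The following is a minimal sketch that uses only the standard library; `check_prerequisites` is a hypothetical helper written for this guide and is not part of pyseekdb.

```python
import platform
import sys


def check_prerequisites(py_version, system, machine, libc):
    """Return a list of problems; an empty list means every check passed.

    py_version: (major, minor) tuple, e.g. sys.version_info[:2]
    system:     value of platform.system(), e.g. "Linux"
    machine:    value of platform.machine(), e.g. "x86_64"
    libc:       (name, version) pair from platform.libc_ver()
    """
    problems = []
    if tuple(py_version[:2]) < (3, 11):
        problems.append("Python 3.11 or later is required")
    if system != "Linux":
        problems.append("this guide targets Linux")
    if machine not in ("x86_64", "aarch64"):
        problems.append("architecture must be x86_64 or aarch64")
    if system == "Linux":
        name, version = libc
        try:
            parts = tuple(int(p) for p in version.split(".")[:2])
        except ValueError:
            parts = ()  # version string missing or not numeric
        if name != "glibc" or len(parts) < 2 or parts < (2, 28):
            problems.append("glibc 2.28 or later is required")
    return problems


if __name__ == "__main__":
    issues = check_prerequisites(
        sys.version_info[:2],
        platform.system(),
        platform.machine(),
        platform.libc_ver(),
    )
    print("OK" if not issues else "\n".join(issues))
```

Passing the platform values in as arguments keeps the check easy to try with values from another machine.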
Install
pip installs the package for your default Python interpreter and platform.
pip install -U pyseekdb
If your pip is outdated, upgrade it first:
pip install --upgrade pip
The client install includes embedded seekdb, so you can create databases and collections without a separate server.
Step 2: Connect to seekdb and create a database
Use the Admin Client to connect and create a database named query_test.
- For more on the Admin Client, see Admin Client.
- For creating a database, see create_database.
import pyseekdb
# Create embedded admin client
admin = pyseekdb.AdminClient(path="./seekdb.db")
# Create database
admin.create_database("query_test")
Step 3: Connect to the database and create a collection with an embedding function
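Use the Client to connect to query_test and create a collection named query_demo. The call below does not pass an embedding function explicitly, so the collection relies on the default one.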
- For the Client API, see Client.
- For creating a collection, see create_collection.
import pyseekdb
# Create embedded client
client = pyseekdb.Client(path="./seekdb.db", database="query_test")
# Create collection
collection = client.create_collection(
    name="query_demo"
)
Step 4: Insert data
Use add to insert documents into the collection.
For the add API, see add - Insert data.
import pyseekdb
client = pyseekdb.Client(path="./seekdb.db", database="query_test")
collection = client.get_collection("query_demo")
documents = [
    "Machine learning is a subset of artificial intelligence",
    "Python is a popular programming language",
    "Vector databases enable semantic search",
    "Neural networks are inspired by the human brain",
    "Natural language processing helps computers understand text"
]
ids = ["id1", "id2", "id3", "id4", "id5"]
# Add documents; embeddings are generated by the embedding function
collection.add(
    ids=ids,
    documents=documents,
    metadatas=[
        {"category": "AI", "index": 0},
        {"category": "Programming", "index": 1},
        {"category": "Database", "index": 2},
        {"category": "AI", "index": 3},
        {"category": "NLP", "index": 4}
    ]
)
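The add call above passes ids, documents, and metadatas as parallel lists, so entry i of each list describes the same record. The sketch below is a client-side sanity check you could run before calling add. `validate_batch` is a hypothetical helper, and whether pyseekdb itself rejects misaligned lists or duplicate IDs is not stated in this guide, so treat it as a precaution rather than a description of library behavior.

```python
def validate_batch(ids, documents, metadatas=None):
    """Raise ValueError if the parallel lists are misaligned or an ID repeats.

    Returns the number of records when everything lines up.
    """
    if len(ids) != len(documents):
        raise ValueError(f"{len(ids)} ids but {len(documents)} documents")
    if metadatas is not None and len(metadatas) != len(ids):
        raise ValueError(f"{len(ids)} ids but {len(metadatas)} metadatas")
    seen = set()
    for item_id in ids:
        if item_id in seen:
            raise ValueError(f"duplicate id: {item_id!r}")
        seen.add(item_id)
    return len(ids)


# Example with two aligned records
count = validate_batch(
    ["id1", "id2"],
    ["first document", "second document"],
    [{"index": 0}, {"index": 1}],
)
print(f"{count} records ready to add")
```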
Step 5: Run a vector search and print results
Use query to perform a vector search.
For the query API, see query - Vector search.
import pyseekdb
client = pyseekdb.Client(path="./seekdb.db", database="query_test")
collection = client.get_collection("query_demo")
# Perform query
query_text = "artificial intelligence and machine learning"
results = collection.query(
    query_texts=query_text,  # Query text - will be embedded automatically
    n_results=3  # Return top 3 most similar documents
)
print(f"\nQuery: '{query_text}'")
print(f"Query results: {len(results['ids'][0])} items found")
for i in range(len(results['ids'][0])):
    print(f"\nResult {i+1}:")
    print(f" ID: {results['ids'][0][i]}")
    print(f" Distance: {results['distances'][0][i]:.4f}")
    if results.get('documents'):
        print(f" Document: {results['documents'][0][i]}")
    if results.get('metadatas'):
        print(f" Metadata: {results['metadatas'][0][i]}")
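Conceptually, a vector search compares the query vector with each stored vector and returns the n_results closest ones, which is why each hit carries a distance. The sketch below illustrates that idea with hand-written three-dimensional vectors and cosine distance. Both are assumptions made for illustration: real embeddings have far more dimensions, and this guide does not state which distance metric or index seekdb uses.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity; 0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def top_n(query_vec, doc_vecs, n_results):
    """Rank every stored vector by distance to the query and keep the closest."""
    scored = [
        (doc_id, cosine_distance(query_vec, vec))
        for doc_id, vec in doc_vecs.items()
    ]
    scored.sort(key=lambda pair: pair[1])  # smallest distance first
    return scored[:n_results]


# Toy vectors, not real embeddings
toy_vectors = {
    "id1": [0.9, 0.1, 0.0],
    "id2": [0.1, 0.9, 0.0],
    "id3": [0.0, 0.2, 0.9],
}
query_vector = [1.0, 0.0, 0.0]

for doc_id, distance in top_n(query_vector, toy_vectors, n_results=2):
    print(f"{doc_id}: {distance:.4f}")
```

A smaller distance means a closer match, so id1, whose vector points almost the same way as the query, is ranked first.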
Step 6: Clean up
When you are done, delete the collection and the database. See delete_collection and delete_database for more information.
import pyseekdb
admin = pyseekdb.AdminClient(path="./seekdb.db")
client = pyseekdb.Client(path="./seekdb.db", database="query_test")
# Delete collection
client.delete_collection("query_demo")
print("\nDeleted collection")
# Delete database
admin.delete_database("query_test")
print("\nDeleted database")
Complete example
import pyseekdb
# Create database
admin = pyseekdb.AdminClient(path="./seekdb.db")
admin.create_database("query_test")
# Create collection
client = pyseekdb.Client(path="./seekdb.db", database="query_test")
collection = client.create_collection(name="query_demo")
# Add data
documents = [
    "Machine learning is a subset of artificial intelligence",
    "Python is a popular programming language",
    "Vector databases enable semantic search",
    "Neural networks are inspired by the human brain",
    "Natural language processing helps computers understand text"
]
ids = ["id1", "id2", "id3", "id4", "id5"]
collection.add(
    ids=ids,
    documents=documents,
    metadatas=[
        {"category": "AI", "index": 0},
        {"category": "Programming", "index": 1},
        {"category": "Database", "index": 2},
        {"category": "AI", "index": 3},
        {"category": "NLP", "index": 4}
    ]
)
# Query
query_text = "artificial intelligence and machine learning"
results = collection.query(query_texts=query_text, n_results=3)
print(f"\nQuery: '{query_text}'")
print(f"Query results: {len(results['ids'][0])} items found")
for i in range(len(results['ids'][0])):
    print(f"\nResult {i+1}:")
    print(f" ID: {results['ids'][0][i]}")
    print(f" Distance: {results['distances'][0][i]:.4f}")
    if results.get('documents'):
        print(f" Document: {results['documents'][0][i]}")
    if results.get('metadatas'):
        print(f" Metadata: {results['metadatas'][0][i]}")
# Cleanup
client.delete_collection("query_demo")
print("\nDeleted collection")
admin.delete_database("query_test")
print("\nDeleted database")
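The printing loop in the example indexes the result as results['ids'][0][i]: the outer list has one entry per query text, and the inner list holds that query's hits. If you prefer one record per hit, a small helper can flatten this shape. `flatten_hits` is a hypothetical helper, and the sample dictionary is a hand-written placeholder that only mimics the shape used in this guide; its distances are made up and are not real query output.

```python
def flatten_hits(results, query_index=0):
    """Turn the nested per-query lists into a list of one dict per hit."""
    ids = results["ids"][query_index]
    distances = results["distances"][query_index]
    documents = results.get("documents")
    metadatas = results.get("metadatas")
    hits = []
    for i, doc_id in enumerate(ids):
        hits.append({
            "id": doc_id,
            "distance": distances[i],
            # documents and metadatas may be absent, as the example's checks suggest
            "document": documents[query_index][i] if documents else None,
            "metadata": metadatas[query_index][i] if metadatas else None,
        })
    return hits


# Placeholder result with the same shape as in the example (values are made up)
sample = {
    "ids": [["id1", "id4"]],
    "distances": [[0.1, 0.2]],
    "documents": [[
        "Machine learning is a subset of artificial intelligence",
        "Neural networks are inspired by the human brain",
    ]],
    "metadatas": [[
        {"category": "AI", "index": 0},
        {"category": "AI", "index": 3},
    ]],
}

for hit in flatten_hits(sample):
    print(hit["id"], hit["distance"], hit["metadata"]["category"])
```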
More information
- For more information on pyseekdb, see pyseekdb quick start.
- For more pyseekdb usage examples, see:
- For SQL-based workflows, see Experience vector search with SQL and Experience hybrid search with SQL.