Experience embedded seekdb with Python SDK
This example demonstrates how to quickly experience embedded seekdb through pyseekdb (a Python client provided by OceanBase) in a Linux environment.
pyseekdb also runs on macOS and Windows, but on those platforms only the server mode of seekdb is supported. For more information about how to use pyseekdb on macOS and Windows, see Get started with pyseekdb.
In this example, we will perform the following steps:
- Deploy pyseekdb and embedded seekdb.
- Connect to seekdb and create a database.
- Connect to the database and create a collection with embedding functions.
- Use documents to add data (vectors are generated automatically).
- Perform a hybrid search (vectors are generated automatically) and print the query results.
- Clean up the environment.
Background information
pyseekdb
pyseekdb is a Python client provided by OceanBase. It exposes a unified API with three connection modes: embedded-mode seekdb, server-mode seekdb, and OceanBase databases.
Installing this client also installs embedded-mode seekdb, so you can connect to embedded seekdb directly and perform operations such as creating databases. Alternatively, you can connect remotely to a seekdb instance deployed in client/server mode or to an OceanBase database.
seekdb deployment modes
seekdb provides two deployment modes that cover use cases from rapid prototyping to large-scale production workloads.
- Embedded mode
  seekdb embeds as a lightweight library installable with a single pip command, ideal for personal learning or prototyping, and can easily run on various end devices.
- Client/Server mode
  A lightweight and easy-to-use deployment mode recommended for both testing and production, delivering stable and efficient service.
For information about using seekdb in client/server mode, see Experience client/server mode seekdb with SQL.
Step 1: Install pyseekdb and deploy embedded seekdb
Prerequisites
Ensure that your environment meets the following requirements:
- Operating system: Linux (glibc >= 2.28)
- Python version: Python 3.11 or later
- System architecture: x86_64, aarch64
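The prerequisites above can be checked from Python before you install anything. The following is a minimal sketch that uses only the standard library; the function name check_environment and the wording of its messages are illustrative and are not part of pyseekdb.

```python
import platform
import sys

SUPPORTED_ARCHS = {"x86_64", "aarch64"}
MIN_PYTHON = (3, 11)
MIN_GLIBC = (2, 28)


def parse_version(text):
    """Turn a dotted version string such as '2.31' into a tuple of ints."""
    return tuple(int(part) for part in text.split(".") if part.isdigit())


def check_environment(system, machine, python_version, libc_name, libc_version):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if system != "Linux":
        problems.append(f"operating system is {system}, expected Linux")
    if machine not in SUPPORTED_ARCHS:
        problems.append(f"architecture is {machine}, expected x86_64 or aarch64")
    if tuple(python_version[:2]) < MIN_PYTHON:
        problems.append("Python is older than 3.11")
    if libc_name != "glibc" or parse_version(libc_version) < MIN_GLIBC:
        problems.append("glibc 2.28 or later was not detected")
    return problems


if __name__ == "__main__":
    # platform.libc_ver() returns a (name, version) pair, e.g. ('glibc', '2.31').
    libc_name, libc_version = platform.libc_ver()
    issues = check_environment(
        platform.system(), platform.machine(), sys.version_info, libc_name, libc_version
    )
    print("Environment looks OK" if not issues else "\n".join(issues))
```

Run the script with the same interpreter that you plan to use for pip, because the Python version check applies to the interpreter that executes it.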
Installation
Install pyseekdb with pip. pip automatically detects the default Python version and platform.
pip install pyseekdb
If your pip version is outdated, upgrade pip before installing.
pip install --upgrade pip
Installing this client also installs embedded-mode seekdb, allowing you to directly connect to embedded seekdb to perform operations such as creating databases.
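To confirm that the installation landed in the interpreter you intend to use, you can query the installed package metadata. This is a minimal sketch based on the standard importlib.metadata module; the helper name installed_version is illustrative, and the distribution name pyseekdb is taken from the pip command above.

```python
from importlib import metadata


def installed_version(distribution):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version("pyseekdb")
    if version is None:
        print("pyseekdb is not installed in this interpreter")
    else:
        print(f"pyseekdb {version} is installed")
```

If the script reports that pyseekdb is missing right after a successful pip install, pip and python most likely point to different environments.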
Step 2: Connect to seekdb and create a database
Use the Admin Client to connect to seekdb and create a database named hybrid_search_test.
- For more information about the Admin Client, see Admin Client.
- For more information about creating a database, see create_database - Create a database.
import pyseekdb
# Create embedded admin client
admin = pyseekdb.AdminClient(path="./seekdb.db")
# Create database
admin.create_database("hybrid_search_test")
Step 3: Connect to the database and create a collection with embedding functions
Use Client to connect to the hybrid_search_test database and create a collection.
- For more information about Client, see Client.
- For more information about creating a collection, see create_collection - Create a Collection.
import pyseekdb
# Create embedded client
client = pyseekdb.Client(path="./seekdb.db", database="hybrid_search_test")
# Create collection
collection = client.create_collection(
    name="hybrid_search_demo"
)
Step 4: Insert data
Use the add method to insert data into the collection.
For more information about inserting data, see add - Insert data.
import pyseekdb
# Create embedded client
client = pyseekdb.Client(path="./seekdb.db", database="hybrid_search_test")
# Get the collection
collection = client.get_collection("hybrid_search_demo")
# Define documents
documents = [
    "Machine learning is revolutionizing artificial intelligence and data science",
    "Python programming language is essential for machine learning developers",
    "Deep learning neural networks enable advanced AI applications",
    "Data science combines statistics, programming, and domain expertise",
    "Natural language processing uses machine learning to understand text",
    "Computer vision algorithms process images using deep learning techniques",
    "Reinforcement learning trains agents through reward-based feedback",
    "Python libraries like TensorFlow and PyTorch simplify machine learning",
    "Artificial intelligence systems can learn from large datasets",
    "Neural networks mimic the structure of biological brain connections"
]
# Define metadatas
metadatas = [
    {"category": "AI", "topic": "machine learning", "year": 2023, "popularity": 95},
    {"category": "Programming", "topic": "python", "year": 2023, "popularity": 88},
    {"category": "AI", "topic": "deep learning", "year": 2024, "popularity": 92},
    {"category": "Data Science", "topic": "data analysis", "year": 2023, "popularity": 85},
    {"category": "AI", "topic": "nlp", "year": 2024, "popularity": 90},
    {"category": "AI", "topic": "computer vision", "year": 2023, "popularity": 87},
    {"category": "AI", "topic": "reinforcement learning", "year": 2024, "popularity": 89},
    {"category": "Programming", "topic": "python", "year": 2023, "popularity": 91},
    {"category": "AI", "topic": "general ai", "year": 2023, "popularity": 93},
    {"category": "AI", "topic": "neural networks", "year": 2024, "popularity": 94}
]
ids = [f"doc_{i+1}" for i in range(len(documents))]
# Insert data
collection.add(ids=ids, documents=documents, metadatas=metadatas)
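The add call receives three parallel lists, so a mismatch in length or a repeated ID is an easy mistake when you adapt this step to your own data. The following is a minimal client-side sketch that checks the lists before they are passed to add. The helper validate_batch is hypothetical and not part of pyseekdb; this document does not state which checks pyseekdb performs itself.

```python
def validate_batch(ids, documents, metadatas):
    """Raise ValueError if the three parallel lists are inconsistent."""
    if not (len(ids) == len(documents) == len(metadatas)):
        raise ValueError(
            f"length mismatch: {len(ids)} ids, {len(documents)} documents, "
            f"{len(metadatas)} metadatas"
        )
    # Collect every id that occurs more than once.
    duplicates = sorted({i for i in ids if ids.count(i) > 1})
    if duplicates:
        raise ValueError(f"duplicate ids: {duplicates}")
    for doc_id, meta in zip(ids, metadatas):
        if not isinstance(meta, dict):
            raise ValueError(f"metadata for {doc_id} is not a dict")


# Small illustrative batch built the same way as in this step.
documents = ["first text", "second text"]
metadatas = [{"category": "AI"}, {"category": "Programming"}]
ids = [f"doc_{i+1}" for i in range(len(documents))]
validate_batch(ids, documents, metadatas)  # passes silently
```

Calling validate_batch(ids, documents, metadatas) immediately before collection.add surfaces these mistakes with a clear message instead of leaving them to the database layer.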
Step 5: Perform a hybrid search and print the query results
Use the hybrid_search method to perform a hybrid search query and print the query results.
For more information about hybrid search, see hybrid_search - Hybrid search.
import pyseekdb
# Create embedded client
client = pyseekdb.Client(path="./seekdb.db", database="hybrid_search_test")
# Get the collection
collection = client.get_collection("hybrid_search_demo")
# Perform hybrid search
hybrid_result = collection.hybrid_search(
    query={"where_document": {"$contains": "machine learning"}, "n_results": 10},
    knn={"query_texts": ["AI research"], "n_results": 10},
    rank={"rrf": {}},
    n_results=5
)
# Print results
print("\nhybrid_search() Results:")
print(f" ids: {hybrid_result['ids'][0]}")
print(f" Document: {hybrid_result['documents'][0]}")
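The rank={"rrf": {}} argument selects reciprocal rank fusion (RRF) to merge the full-text results and the vector results into one list. The following pure-Python sketch illustrates the general RRF technique only. It is not seekdb's implementation: the constant k=60 is a commonly used value and an assumption here, and the two input lists are hypothetical stand-ins for the results of the query and knn parts.

```python
def rrf_fuse(rankings, k=60, n_results=5):
    """Fuse ranked id lists with reciprocal rank fusion.

    Each id scores sum(1 / (k + rank)) over the lists it appears in,
    where rank starts at 1. k=60 is an assumed constant, not a
    documented seekdb default.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Sort by descending score; break ties by id so the output is deterministic.
    ordered = sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))
    return ordered[:n_results]


# Hypothetical ranked lists standing in for the two parts of the search.
full_text_hits = ["doc_1", "doc_2", "doc_5", "doc_8"]
vector_hits = ["doc_9", "doc_1", "doc_3", "doc_5"]
print(rrf_fuse([full_text_hits, vector_hits], n_results=3))
```

In this sketch, doc_1 and doc_5 appear in both lists, so their scores add up and they rank ahead of documents that appear in only one list. That is why a hybrid search favors results that match both the keyword filter and the semantic query.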
Step 6: Clean up the environment
If you no longer need the example database and collection, you can use the delete_collection method to delete the collection and the delete_database method to delete the database.
- For more information about deleting a collection, see delete_collection - Delete a Collection.
- For more information about deleting a database, see delete_database - Delete a database.
import pyseekdb
# Create embedded admin client and client
admin = pyseekdb.AdminClient(path="./seekdb.db")
client = pyseekdb.Client(path="./seekdb.db", database="hybrid_search_test")
# Delete collection
client.delete_collection("hybrid_search_demo")
print("\nDeleted collection")
# Delete database
admin.delete_database("hybrid_search_test")
print("\nDeleted database")
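If an earlier step raises an exception, the cleanup code never runs and the example database is left behind. One way to guard against this is a small context manager that always deletes the database on exit. The following is a minimal sketch: temporary_database is a hypothetical helper, it relies only on the create_database and delete_database methods used in this tutorial, and StubAdmin is a stand-in so that the sketch runs without seekdb. Whether a database that still contains collections can be deleted is not covered here, so delete collections inside the with block first if needed.

```python
from contextlib import contextmanager


@contextmanager
def temporary_database(admin, name):
    """Create a database and delete it on exit, even if the body fails."""
    admin.create_database(name)
    try:
        yield name
    finally:
        admin.delete_database(name)


class StubAdmin:
    """Hypothetical stand-in that records calls instead of touching seekdb."""

    def __init__(self):
        self.calls = []

    def create_database(self, name):
        self.calls.append(("create", name))

    def delete_database(self, name):
        self.calls.append(("delete", name))


admin = StubAdmin()
try:
    with temporary_database(admin, "hybrid_search_test"):
        raise RuntimeError("simulated failure in a later step")
except RuntimeError:
    pass
print(admin.calls)  # the delete call is recorded despite the failure
```

With a real admin client in place of the stub, the same with block could wrap Steps 3 to 5.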
Complete sample
import pyseekdb
# ==================== Create Database ====================
# Create embedded admin client
admin = pyseekdb.AdminClient(path="./seekdb.db")
# Create database
admin.create_database("hybrid_search_test")
# ==================== Create Collection ====================
# Create embedded client
client = pyseekdb.Client(path="./seekdb.db", database="hybrid_search_test")
# Create collection
collection = client.create_collection(
    name="hybrid_search_demo"
)
# ==================== Add Data to Collection ====================
# Define documents
documents = [
    "Machine learning is revolutionizing artificial intelligence and data science",
    "Python programming language is essential for machine learning developers",
    "Deep learning neural networks enable advanced AI applications",
    "Data science combines statistics, programming, and domain expertise",
    "Natural language processing uses machine learning to understand text",
    "Computer vision algorithms process images using deep learning techniques",
    "Reinforcement learning trains agents through reward-based feedback",
    "Python libraries like TensorFlow and PyTorch simplify machine learning",
    "Artificial intelligence systems can learn from large datasets",
    "Neural networks mimic the structure of biological brain connections"
]
# Define metadatas
metadatas = [
    {"category": "AI", "topic": "machine learning", "year": 2023, "popularity": 95},
    {"category": "Programming", "topic": "python", "year": 2023, "popularity": 88},
    {"category": "AI", "topic": "deep learning", "year": 2024, "popularity": 92},
    {"category": "Data Science", "topic": "data analysis", "year": 2023, "popularity": 85},
    {"category": "AI", "topic": "nlp", "year": 2024, "popularity": 90},
    {"category": "AI", "topic": "computer vision", "year": 2023, "popularity": 87},
    {"category": "AI", "topic": "reinforcement learning", "year": 2024, "popularity": 89},
    {"category": "Programming", "topic": "python", "year": 2023, "popularity": 91},
    {"category": "AI", "topic": "general ai", "year": 2023, "popularity": 93},
    {"category": "AI", "topic": "neural networks", "year": 2024, "popularity": 94}
]
ids = [f"doc_{i+1}" for i in range(len(documents))]
# Insert data
collection.add(ids=ids, documents=documents, metadatas=metadatas)
# ==================== Perform Hybrid Search ====================
# Perform hybrid search
hybrid_result = collection.hybrid_search(
    query={"where_document": {"$contains": "machine learning"}, "n_results": 10},
    knn={"query_texts": ["AI research"], "n_results": 10},
    rank={"rrf": {}},
    n_results=5
)
# ==================== Print Query Results ====================
# Print results
print("\nhybrid_search() Results:")
print(f" ids: {hybrid_result['ids'][0]}")
print(f" Document: {hybrid_result['documents'][0]}")
# ==================== Cleanup ====================
# Delete collection
client.delete_collection("hybrid_search_demo")
print("\nDeleted collection")
# Delete database
admin.delete_database("hybrid_search_test")
print("\nDeleted database")
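The sample prints the returned IDs and documents as two separate lists, which makes it hard to see which document belongs to which ID. The following minimal sketch pairs them line by line. The helper format_hits is illustrative, and mock_result is a hand-written stand-in that only mirrors the nesting the sample indexes into (result['ids'][0] and result['documents'][0]); it is not real seekdb output.

```python
def format_hits(result):
    """Pair each returned id with its document for the first query."""
    ids = result["ids"][0]
    documents = result["documents"][0]
    return [f"{doc_id}: {text}" for doc_id, text in zip(ids, documents)]


# Hand-written stand-in with the same nesting the sample indexes into.
mock_result = {
    "ids": [["doc_1", "doc_2"]],
    "documents": [["first text", "second text"]],
}
for line in format_hits(mock_result):
    print(line)
```

In the complete sample, you would pass hybrid_result to format_hits in place of mock_result.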
More information
- For a more detailed introduction to pyseekdb and its usage, see pyseekdb.
- For more pyseekdb usage examples, see:
  - Complete example: Demonstrates all capabilities currently supported by pyseekdb.
  - Hybrid search example: Demonstrates the usage of seekdb hybrid search.
  - Hybrid search example: Demonstrates the usage of seekdb hybrid search, and compares it with vector retrieval capabilities.
- In addition to the Python SDK, seekdb also supports operations through SQL. For SQL usage, see Experience seekdb in client/server mode.