
Experience embedded seekdb with Python SDK

This example shows how to quickly try out embedded seekdb in a Linux environment using pyseekdb, the Python client provided by OceanBase.

tip

You can also use pyseekdb on macOS and Windows, but on those platforms it can connect only to seekdb running in server mode; embedded mode is not supported. For more information about using pyseekdb on macOS and Windows, see Get started with pyseekdb.

This example walks through the following steps:

  1. Install pyseekdb and deploy embedded seekdb.
  2. Connect to seekdb and create a database.
  3. Connect to the database and create a collection with embedding functions.
  4. Add data as documents (vectors are generated automatically).
  5. Run a hybrid search (query vectors are generated automatically) and print the results.
  6. Clean up the environment.

Background information

pyseekdb

pyseekdb is a Python client provided by OceanBase. It exposes a unified API that supports three connection modes: embedded-mode seekdb, server-mode seekdb, and OceanBase Database.

Installing pyseekdb also installs embedded-mode seekdb, so you can connect to embedded seekdb directly and perform operations such as creating databases. Alternatively, you can connect remotely to a seekdb instance deployed in client/server mode or to an OceanBase database.

seekdb deployment modes

seekdb provides flexible deployment modes that cover everything from rapid prototyping to large-scale user workloads.

  • Embedded mode

    seekdb runs as a lightweight library embedded in your application and is installed with a single pip command. This mode is ideal for personal learning and prototyping, and it can run on a wide range of end devices.

  • Client/Server mode

    A lightweight, easy-to-use deployment mode recommended for both testing and production that provides stable and efficient service.

    For information about using seekdb in client/server mode, see Experience client/server mode seekdb with SQL.

Step 1: Install pyseekdb and deploy embedded seekdb

Prerequisites

Ensure that your environment meets the following requirements:

  • Operating system: Linux (glibc >= 2.28)
  • Python version: Python 3.11 or later
  • System architecture: x86_64 or aarch64
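To confirm that a machine meets these requirements before installing, you can run a short check that uses only the Python standard library. This is a sketch, not an official pyseekdb tool. It assumes that platform.libc_ver() correctly reports glibc for the interpreter you run it with, and the helper names parse_version and check_requirements are hypothetical.

```python
import platform
import sys

# Requirements taken from the prerequisites list above.
MIN_GLIBC = (2, 28)
MIN_PYTHON = (3, 11)
SUPPORTED_ARCHS = {"x86_64", "aarch64"}


def parse_version(text):
    """Turn a dotted version string such as '2.31' into a tuple of ints."""
    return tuple(int(part) for part in text.split(".") if part.isdigit())


def check_requirements(system, libc_name, libc_version, py_version, machine):
    """Return a list of unmet requirements (an empty list means all are met)."""
    problems = []
    if system != "Linux":
        problems.append(f"OS is {system}; embedded mode needs Linux")
    if libc_name != "glibc" or parse_version(libc_version) < MIN_GLIBC:
        problems.append(
            f"glibc >= 2.28 required, found {libc_name or 'none'} {libc_version}"
        )
    if tuple(py_version[:2]) < MIN_PYTHON:
        problems.append(
            f"Python >= 3.11 required, found {py_version[0]}.{py_version[1]}"
        )
    if machine not in SUPPORTED_ARCHS:
        problems.append(f"architecture {machine} is not x86_64 or aarch64")
    return problems


if __name__ == "__main__":
    # platform.libc_ver() returns ('', '') when glibc cannot be detected.
    libc_name, libc_version = platform.libc_ver()
    issues = check_requirements(
        platform.system(), libc_name, libc_version,
        sys.version_info, platform.machine(),
    )
    print("All prerequisites met" if not issues else "\n".join(issues))
```

If the script reports any problem, install pyseekdb on a machine that meets the requirements, or connect to a server-mode seekdb instead.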

Installation

Install pyseekdb with pip. pip automatically selects the distribution that matches your default Python version and platform.

pip install pyseekdb

If your pip version is outdated, upgrade pip first and then install pyseekdb:

pip install --upgrade pip

Because installing pyseekdb also installs embedded-mode seekdb, no separate deployment step is needed. You can connect to embedded seekdb directly and perform operations such as creating databases.
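To confirm that the installation succeeded, you can read the installed package version from its metadata without importing pyseekdb. This is a minimal sketch that uses only the standard library importlib.metadata module. installed_version is a hypothetical helper name.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version of `package`, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version("pyseekdb")
    print(f"pyseekdb {version}" if version else "pyseekdb is not installed")
```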

Step 2: Connect to seekdb and create a database

Use AdminClient to connect to embedded seekdb and create a database named hybrid_search_test.

import pyseekdb

# Create embedded admin client
admin = pyseekdb.AdminClient(path="./seekdb.db")
# Create database
admin.create_database("hybrid_search_test")

Step 3: Connect to the database and create a collection with embedding functions

Use Client to connect to the hybrid_search_test database and create a collection named hybrid_search_demo.

import pyseekdb

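# Create embedded client connected to the hybrid_search_test database
client = pyseekdb.Client(path="./seekdb.db", database="hybrid_search_test")
# Create collection. No embedding function is passed here, so the collection
# relies on its default one to generate vectors from documents automatically.
collection = client.create_collection(
    name="hybrid_search_demo"
)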

Step 4: Insert data

Use the add method to insert data into the collection.

tip

For more information about inserting data, see add - Insert data.

import pyseekdb

# Create embedded client
client = pyseekdb.Client(path="./seekdb.db", database="hybrid_search_test")
# Get the collection created in Step 3
collection = client.get_collection("hybrid_search_demo")

# Define documents
documents = [
    "Machine learning is revolutionizing artificial intelligence and data science",
    "Python programming language is essential for machine learning developers",
    "Deep learning neural networks enable advanced AI applications",
    "Data science combines statistics, programming, and domain expertise",
    "Natural language processing uses machine learning to understand text",
    "Computer vision algorithms process images using deep learning techniques",
    "Reinforcement learning trains agents through reward-based feedback",
    "Python libraries like TensorFlow and PyTorch simplify machine learning",
    "Artificial intelligence systems can learn from large datasets",
    "Neural networks mimic the structure of biological brain connections"
]
# Define metadata (one dictionary per document, in the same order)
metadatas = [
    {"category": "AI", "topic": "machine learning", "year": 2023, "popularity": 95},
    {"category": "Programming", "topic": "python", "year": 2023, "popularity": 88},
    {"category": "AI", "topic": "deep learning", "year": 2024, "popularity": 92},
    {"category": "Data Science", "topic": "data analysis", "year": 2023, "popularity": 85},
    {"category": "AI", "topic": "nlp", "year": 2024, "popularity": 90},
    {"category": "AI", "topic": "computer vision", "year": 2023, "popularity": 87},
    {"category": "AI", "topic": "reinforcement learning", "year": 2024, "popularity": 89},
    {"category": "Programming", "topic": "python", "year": 2023, "popularity": 91},
    {"category": "AI", "topic": "general ai", "year": 2023, "popularity": 93},
    {"category": "AI", "topic": "neural networks", "year": 2024, "popularity": 94}
]

# Generate one id per document: doc_1, doc_2, ..., doc_10
ids = [f"doc_{i+1}" for i in range(len(documents))]
# Insert data; vectors are generated automatically from the documents
collection.add(ids=ids, documents=documents, metadatas=metadatas)
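The add call takes parallel lists: the i-th id, document, and metadata dictionary together describe one record. When you build these lists from your own data, a small check before calling add catches length mismatches and duplicate ids early. The sketch below is a hypothetical helper, check_batch, that is not part of pyseekdb. It uses only the standard library.

```python
from collections import Counter


def check_batch(ids, documents, metadatas):
    """Validate the parallel lists intended for collection.add().

    Hypothetical pre-flight helper (not a pyseekdb API): raises ValueError if
    the lists differ in length or an id repeats; otherwise returns
    (id, document, metadata) tuples so each record can be inspected.
    """
    if not (len(ids) == len(documents) == len(metadatas)):
        raise ValueError(
            f"length mismatch: {len(ids)} ids, {len(documents)} documents, "
            f"{len(metadatas)} metadatas"
        )
    duplicates = sorted(i for i, count in Counter(ids).items() if count > 1)
    if duplicates:
        raise ValueError(f"duplicate ids: {duplicates}")
    return list(zip(ids, documents, metadatas))


if __name__ == "__main__":
    documents = [
        "Machine learning is revolutionizing artificial intelligence and data science",
        "Python programming language is essential for machine learning developers",
    ]
    metadatas = [
        {"category": "AI", "topic": "machine learning", "year": 2023, "popularity": 95},
        {"category": "Programming", "topic": "python", "year": 2023, "popularity": 88},
    ]
    ids = [f"doc_{i+1}" for i in range(len(documents))]
    for record in check_batch(ids, documents, metadatas):
        print(record)
```

If the check passes, pass the same three lists to collection.add unchanged.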

Step 5: Perform a hybrid search and print the query results

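Use the hybrid_search method to run a hybrid search and print the results. In this example, the query argument performs a full-text search for documents that contain "machine learning", and the knn argument performs a vector search for documents similar to "AI research". Each of these searches retrieves up to 10 candidates. rank={"rrf": {}} fuses the two result lists with Reciprocal Rank Fusion (RRF), and n_results=5 returns the top five fused results.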

tip

For more information about hybrid search, see hybrid_search - Hybrid search.
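Reciprocal Rank Fusion merges ranked lists without comparing their raw scores. Each document receives 1 / (k + rank) from every list in which it appears, and these contributions are summed. The sketch below illustrates the technique in plain Python. It is not seekdb's implementation. The constant k = 60 is the value commonly used in the RRF literature and is an assumption here, not seekdb's documented default. The input rankings are illustrative only and are not real seekdb results.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60, n_results=5):
    """Fuse several best-first lists of ids with Reciprocal Rank Fusion.

    A document at 1-based position r in a list contributes 1 / (k + r);
    contributions from all lists are summed into a fused score.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for position, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + position)
    # Highest fused score first; ties broken by id so output is deterministic.
    fused = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [doc_id for doc_id, _ in fused[:n_results]]


if __name__ == "__main__":
    # Illustrative inputs only, not actual seekdb output.
    full_text_hits = ["doc_1", "doc_2", "doc_5", "doc_8"]
    vector_hits = ["doc_9", "doc_1", "doc_3", "doc_5"]
    print(reciprocal_rank_fusion([full_text_hits, vector_hits], n_results=5))
```

Documents that appear in both lists, such as doc_1 and doc_5 here, rise to the top. This is the behavior that makes RRF a common default for combining keyword and vector results.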

import pyseekdb

# Create embedded client
client = pyseekdb.Client(path="./seekdb.db", database="hybrid_search_test")
# Get the collection
collection = client.get_collection("hybrid_search_demo")

# Perform hybrid search
hybrid_result = collection.hybrid_search(
    # Full-text search: documents that contain "machine learning"
    query={"where_document": {"$contains": "machine learning"}, "n_results": 10},
    # Vector search: documents similar to "AI research"
    knn={"query_texts": ["AI research"], "n_results": 10},
    # Fuse both result lists with Reciprocal Rank Fusion
    rank={"rrf": {}},
    # Return the top 5 fused results
    n_results=5
)

# Print results
print("\nhybrid_search() Results:")
print(f" ids: {hybrid_result['ids'][0]}")
print(f" documents: {hybrid_result['documents'][0]}")
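The print statements read element [0] of hybrid_result['ids'] and hybrid_result['documents']. This suggests that each key holds one list per query, and this example issues a single query. To print each id next to its document, you can pair the two lists. The sketch below assumes only that result shape. print_hits is a hypothetical helper, and the sample dictionary is hand-written rather than real output.

```python
def print_hits(result, query_index=0):
    """Print each returned id next to its document text, best match first.

    Assumes `result` maps "ids" and "documents" to lists of per-query lists,
    matching the result["ids"][0] / result["documents"][0] indexing above.
    """
    ids = result["ids"][query_index]
    documents = result["documents"][query_index]
    lines = [
        f"{rank}. {doc_id}: {text}"
        for rank, (doc_id, text) in enumerate(zip(ids, documents), start=1)
    ]
    for line in lines:
        print(line)
    return lines


if __name__ == "__main__":
    # Hand-written stand-in for a hybrid_search() result, not real output.
    sample = {
        "ids": [["doc_1", "doc_5"]],
        "documents": [[
            "Machine learning is revolutionizing artificial intelligence and data science",
            "Natural language processing uses machine learning to understand text",
        ]],
    }
    print_hits(sample)
```

With the real result, call print_hits(hybrid_result) after the search.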

Step 6: Clean up the environment

If you no longer need the example database and collection, delete the collection with the delete_collection method and the database with the delete_database method.

import pyseekdb

# Create embedded admin client and client
admin = pyseekdb.AdminClient(path="./seekdb.db")
client = pyseekdb.Client(path="./seekdb.db", database="hybrid_search_test")

# Delete collection
client.delete_collection("hybrid_search_demo")
print("\nDeleted collection")

# Delete database
admin.delete_database("hybrid_search_test")
print("\nDeleted database")
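If a step fails partway through a script, the cleanup calls never run, and the example collection and database stay in the embedded instance. One way to make sure cleanup always runs is to wrap the work in try/finally. The sketch below is a hypothetical helper, run_with_cleanup. It calls only the delete_collection and delete_database methods shown above, and it takes the client objects as arguments so that it can be exercised without a running seekdb.

```python
def run_with_cleanup(admin, client, work,
                     collection_name="hybrid_search_demo",
                     database_name="hybrid_search_test"):
    """Run work(client), then always delete the demo collection and database.

    admin and client are expected to be the pyseekdb AdminClient and Client
    objects from this example; only delete_collection and delete_database
    are called on them.
    """
    try:
        return work(client)
    finally:
        try:
            # Drop the collection first, as in the cleanup step above.
            client.delete_collection(collection_name)
        finally:
            # Drop the database even if deleting the collection failed.
            admin.delete_database(database_name)
```

For example, you could pass a function that runs the hybrid_search call from Step 5 as work. If work fails before the collection exists, delete_collection may raise its own error, which replaces the original exception. Adjust the helper if that matters for your use.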

Complete sample

import pyseekdb

# ==================== Create Database ====================
# Create embedded admin client
admin = pyseekdb.AdminClient(path="./seekdb.db")
# Create database
admin.create_database("hybrid_search_test")


# ==================== Create Collection ====================
# Create embedded client
client = pyseekdb.Client(path="./seekdb.db", database="hybrid_search_test")
# Create collection
collection = client.create_collection(
    name="hybrid_search_demo"
)

# ==================== Add Data to Collection ====================
# Define documents
documents = [
    "Machine learning is revolutionizing artificial intelligence and data science",
    "Python programming language is essential for machine learning developers",
    "Deep learning neural networks enable advanced AI applications",
    "Data science combines statistics, programming, and domain expertise",
    "Natural language processing uses machine learning to understand text",
    "Computer vision algorithms process images using deep learning techniques",
    "Reinforcement learning trains agents through reward-based feedback",
    "Python libraries like TensorFlow and PyTorch simplify machine learning",
    "Artificial intelligence systems can learn from large datasets",
    "Neural networks mimic the structure of biological brain connections"
]
# Define metadata
metadatas = [
    {"category": "AI", "topic": "machine learning", "year": 2023, "popularity": 95},
    {"category": "Programming", "topic": "python", "year": 2023, "popularity": 88},
    {"category": "AI", "topic": "deep learning", "year": 2024, "popularity": 92},
    {"category": "Data Science", "topic": "data analysis", "year": 2023, "popularity": 85},
    {"category": "AI", "topic": "nlp", "year": 2024, "popularity": 90},
    {"category": "AI", "topic": "computer vision", "year": 2023, "popularity": 87},
    {"category": "AI", "topic": "reinforcement learning", "year": 2024, "popularity": 89},
    {"category": "Programming", "topic": "python", "year": 2023, "popularity": 91},
    {"category": "AI", "topic": "general ai", "year": 2023, "popularity": 93},
    {"category": "AI", "topic": "neural networks", "year": 2024, "popularity": 94}
]

ids = [f"doc_{i+1}" for i in range(len(documents))]
# Insert data
collection.add(ids=ids, documents=documents, metadatas=metadatas)

# ==================== Perform Hybrid Search ====================
# Perform hybrid search
hybrid_result = collection.hybrid_search(
    query={"where_document": {"$contains": "machine learning"}, "n_results": 10},
    knn={"query_texts": ["AI research"], "n_results": 10},
    rank={"rrf": {}},
    n_results=5
)

# ==================== Print Query Results ====================
# Print results
print("\nhybrid_search() Results:")
print(f" ids: {hybrid_result['ids'][0]}")
print(f" documents: {hybrid_result['documents'][0]}")

# ==================== Cleanup ====================
# Delete collection
client.delete_collection("hybrid_search_demo")
print("\nDeleted collection")

# Delete database
admin.delete_database("hybrid_search_test")
print("\nDeleted database")

More information