Version: V1.1.0

Experience hybrid search with Python SDK

This guide walks you through hybrid search using pyseekdb (OceanBase's Python client) with embedded seekdb in a Linux environment.

tip

pyseekdb also runs on macOS and Windows. On Windows, only server-mode seekdb is supported. For setup on macOS and Windows, see pyseekdb quick start.

In this example, you will:

  1. Install pyseekdb and run embedded seekdb.
  2. Connect to seekdb and create a database.
  3. Connect to the database and create a collection with an embedding function.
  4. Add documents (vectors are generated automatically).
  5. Run a hybrid search (query vectors are generated automatically) and print results.
  6. Clean up the database and collection.

Background information

pyseekdb

pyseekdb is OceanBase's Python client for seekdb. It uses a single API surface and supports three connection modes: embedded seekdb, server-mode seekdb, and OceanBase Database.

Installing the client also installs embedded seekdb so you can create databases and run workloads locally. Alternatively, you can connect to an existing server-mode seekdb or OceanBase instance.

seekdb deployment modes

seekdb can run in different modes depending on your needs:

  • Embedded mode: seekdb runs as a lightweight library inside your application. Install with pip. Suited for learning, prototyping, and running on resource-constrained devices.

  • Client/server mode: Recommended for testing and production. Easy to set up and run as a standalone service. For more information, see Experience seekdb with SQL.

Step 1: Install pyseekdb and deploy embedded seekdb

Prerequisites

Ensure that your environment meets the following requirements:

  • Operating system: Linux (glibc >= 2.28)
  • Python version: Python 3.11 or later
  • System architecture: x86_64, aarch64
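
Before installing, the requirements above can be checked from Python itself. The following is a minimal sketch that uses only the standard library. The helper names (parse_version, check_environment) are hypothetical and not part of pyseekdb, and platform.libc_ver() may return empty strings on some systems, in which case the glibc version cannot be confirmed this way.

```python
import platform
import re
import sys

# Thresholds copied from the prerequisites list above.
MIN_PYTHON = (3, 11)
MIN_GLIBC = (2, 28)
SUPPORTED_ARCHS = {"x86_64", "aarch64"}


def parse_version(text):
    """Turn a dotted version such as '2.35' into a tuple of ints, e.g. (2, 35)."""
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def check_environment():
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    if platform.system() != "Linux":
        problems.append(f"operating system is {platform.system()}, expected Linux")
    if sys.version_info[:2] < MIN_PYTHON:
        problems.append(f"Python {platform.python_version()} is older than 3.11")
    if platform.machine() not in SUPPORTED_ARCHS:
        problems.append(f"architecture {platform.machine()} is not x86_64 or aarch64")
    libc_name, libc_version = platform.libc_ver()
    if libc_name != "glibc":
        problems.append("could not confirm that the C library is glibc")
    elif parse_version(libc_version) < MIN_GLIBC:
        problems.append(f"glibc {libc_version} is older than 2.28")
    return problems


for problem in check_environment():
    print("Requirement not met:", problem)
```

An empty result only means these four checks passed; it does not guarantee that installation will succeed.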

Installation

Install pyseekdb with pip, which automatically selects the package that matches your default Python version and platform.

pip install -U pyseekdb

If your pip version is outdated, upgrade pip before installing.

pip install --upgrade pip

Because the client package bundles embedded-mode seekdb, you can connect to embedded seekdb and perform operations such as creating databases right after installation.

Step 2: Connect to seekdb and create a database

Use the Admin Client to connect to seekdb and create a database named hybrid_search_test.

import pyseekdb

# Create embedded admin client
admin = pyseekdb.AdminClient(path="./seekdb.db")
# Create database
admin.create_database("hybrid_search_test")

Step 3: Connect to the database and create a collection with embedding functions

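Use Client to connect to the hybrid_search_test database and create a collection. The call below does not pass an embedding function, so the collection relies on pyseekdb's default embedding function.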

import pyseekdb

# Create embedded client
client = pyseekdb.Client(path="./seekdb.db", database="hybrid_search_test")
# Create collection
collection = client.create_collection(
    name="hybrid_search_demo"
)

Step 4: Insert data

Use the add method to insert data into the collection.

tip

For more information about inserting data, see add - Insert data.

import pyseekdb

# Create embedded client
client = pyseekdb.Client(path="./seekdb.db", database="hybrid_search_test")
# Get collection
collection = client.get_collection("hybrid_search_demo")

# Define documents
documents = [
    "Machine learning is revolutionizing artificial intelligence and data science",
    "Python programming language is essential for machine learning developers",
    "Deep learning neural networks enable advanced AI applications",
    "Data science combines statistics, programming, and domain expertise",
    "Natural language processing uses machine learning to understand text",
    "Computer vision algorithms process images using deep learning techniques",
    "Reinforcement learning trains agents through reward-based feedback",
    "Python libraries like TensorFlow and PyTorch simplify machine learning",
    "Artificial intelligence systems can learn from large datasets",
    "Neural networks mimic the structure of biological brain connections"
]
# Define metadatas
metadatas = [
    {"category": "AI", "topic": "machine learning", "year": 2023, "popularity": 95},
    {"category": "Programming", "topic": "python", "year": 2023, "popularity": 88},
    {"category": "AI", "topic": "deep learning", "year": 2024, "popularity": 92},
    {"category": "Data Science", "topic": "data analysis", "year": 2023, "popularity": 85},
    {"category": "AI", "topic": "nlp", "year": 2024, "popularity": 90},
    {"category": "AI", "topic": "computer vision", "year": 2023, "popularity": 87},
    {"category": "AI", "topic": "reinforcement learning", "year": 2024, "popularity": 89},
    {"category": "Programming", "topic": "python", "year": 2023, "popularity": 91},
    {"category": "AI", "topic": "general ai", "year": 2023, "popularity": 93},
    {"category": "AI", "topic": "neural networks", "year": 2024, "popularity": 94}
]

ids = [f"doc_{i+1}" for i in range(len(documents))]
# Insert data
collection.add(ids=ids, documents=documents, metadatas=metadatas)
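
The add call above receives three parallel lists, so entry i of ids, documents, and metadatas describes the same record. A minimal sketch of a pre-insert check follows. validate_batch is a hypothetical helper, not a pyseekdb function, and pyseekdb may apply its own validation regardless.

```python
from collections import Counter


def validate_batch(ids, documents, metadatas):
    """Raise ValueError if the three parallel lists do not line up."""
    if not (len(ids) == len(documents) == len(metadatas)):
        raise ValueError(
            f"length mismatch: {len(ids)} ids, {len(documents)} documents, "
            f"{len(metadatas)} metadatas"
        )
    # Each record needs its own ID so it can be addressed later.
    duplicates = sorted(i for i, count in Counter(ids).items() if count > 1)
    if duplicates:
        raise ValueError(f"duplicate ids: {duplicates}")
    for position, metadata in enumerate(metadatas):
        if not isinstance(metadata, dict):
            raise ValueError(f"metadata at position {position} is not a dict")


# Two records taken from the example data above.
documents = [
    "Machine learning is revolutionizing artificial intelligence and data science",
    "Python programming language is essential for machine learning developers",
]
metadatas = [
    {"category": "AI", "topic": "machine learning", "year": 2023, "popularity": 95},
    {"category": "Programming", "topic": "python", "year": 2023, "popularity": 88},
]
ids = [f"doc_{i+1}" for i in range(len(documents))]

validate_batch(ids, documents, metadatas)
print(f"{len(ids)} records ready to insert")
```

Running a check like this before collection.add makes a misaligned list fail early with a readable message.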

Step 5: Perform a hybrid search and print the query results

Use the hybrid_search method to perform a hybrid search query and print the query results.

tip

For more information about hybrid search, see hybrid_search - Hybrid search.

import pyseekdb

# Create embedded client
client = pyseekdb.Client(path="./seekdb.db", database="hybrid_search_test")
# Get collection
collection = client.get_collection("hybrid_search_demo")

# Perform hybrid search
hybrid_result = collection.hybrid_search(
    query={"where_document": {"$contains": "machine learning"}, "n_results": 10},
    knn={"query_texts": ["AI research"], "n_results": 10},
    rank={"rrf": {}},
    n_results=5
)

# Print results
print("\nhybrid_search() Results:")
print(f" ids: {hybrid_result['ids'][0]}")
print(f" Document: {hybrid_result['documents'][0]}")
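
The rank={"rrf": {}} argument selects Reciprocal Rank Fusion (RRF) to merge the full-text ranking with the vector ranking. As a rough illustration of the general technique, and not of seekdb's internal implementation, each document receives the sum of 1 / (k + rank) over the lists it appears in. The sketch below assumes the commonly used constant k = 60, which may differ from seekdb's default, and it uses hand-written, hypothetical rankings rather than real query output.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60, n_results=5):
    """Fuse several ranked ID lists into a single list of (id, score) pairs."""
    scores = defaultdict(float)
    for ranking in rankings:
        # Ranks start at 1, so the top hit of a list contributes 1 / (k + 1).
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest score first; ties are broken by ID to keep the order deterministic.
    ordered = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return ordered[:n_results]


# Hypothetical rankings for illustration only, not actual seekdb results.
keyword_hits = ["doc_1", "doc_2", "doc_5", "doc_8"]
vector_hits = ["doc_9", "doc_1", "doc_3", "doc_5"]

fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
for doc_id, score in fused:
    print(f"{doc_id}: {score:.5f}")
```

In this sketch, documents that appear in both lists (doc_1 and doc_5) rank above documents that appear in only one, which is the behavior that makes RRF useful for combining keyword and vector results.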

Step 6: Clean up the environment

If you no longer need the example data, delete the collection with the delete_collection method, and then delete the database with the delete_database method.

import pyseekdb

# Create embedded admin client and embedded client
admin = pyseekdb.AdminClient(path="./seekdb.db")
client = pyseekdb.Client(path="./seekdb.db", database="hybrid_search_test")

# Delete collection
client.delete_collection("hybrid_search_demo")
print("\nDeleted collection")

# Delete database
admin.delete_database("hybrid_search_test")
print("\nDeleted database")

Complete example

import pyseekdb

# ==================== Create Database ====================
# Create embedded admin client
admin = pyseekdb.AdminClient(path="./seekdb.db")
# Create database
admin.create_database("hybrid_search_test")


# ==================== Create Collection ====================
# Create embedded client
client = pyseekdb.Client(path="./seekdb.db", database="hybrid_search_test")
# Create collection
collection = client.create_collection(
    name="hybrid_search_demo"
)

# ==================== Add Data to Collection ====================
# Define documents
documents = [
    "Machine learning is revolutionizing artificial intelligence and data science",
    "Python programming language is essential for machine learning developers",
    "Deep learning neural networks enable advanced AI applications",
    "Data science combines statistics, programming, and domain expertise",
    "Natural language processing uses machine learning to understand text",
    "Computer vision algorithms process images using deep learning techniques",
    "Reinforcement learning trains agents through reward-based feedback",
    "Python libraries like TensorFlow and PyTorch simplify machine learning",
    "Artificial intelligence systems can learn from large datasets",
    "Neural networks mimic the structure of biological brain connections"
]
# Define metadatas
metadatas = [
    {"category": "AI", "topic": "machine learning", "year": 2023, "popularity": 95},
    {"category": "Programming", "topic": "python", "year": 2023, "popularity": 88},
    {"category": "AI", "topic": "deep learning", "year": 2024, "popularity": 92},
    {"category": "Data Science", "topic": "data analysis", "year": 2023, "popularity": 85},
    {"category": "AI", "topic": "nlp", "year": 2024, "popularity": 90},
    {"category": "AI", "topic": "computer vision", "year": 2023, "popularity": 87},
    {"category": "AI", "topic": "reinforcement learning", "year": 2024, "popularity": 89},
    {"category": "Programming", "topic": "python", "year": 2023, "popularity": 91},
    {"category": "AI", "topic": "general ai", "year": 2023, "popularity": 93},
    {"category": "AI", "topic": "neural networks", "year": 2024, "popularity": 94}
]

ids = [f"doc_{i+1}" for i in range(len(documents))]
# Insert data
collection.add(ids=ids, documents=documents, metadatas=metadatas)

# ==================== Perform Hybrid Search ====================
# Perform hybrid search
hybrid_result = collection.hybrid_search(
    query={"where_document": {"$contains": "machine learning"}, "n_results": 10},
    knn={"query_texts": ["AI research"], "n_results": 10},
    rank={"rrf": {}},
    n_results=5
)

# ==================== Print Query Results ====================
# Print results
print("\nhybrid_search() Results:")
print(f" ids: {hybrid_result['ids'][0]}")
print(f" Document: {hybrid_result['documents'][0]}")

# ==================== Cleanup ====================
# Delete collection
client.delete_collection("hybrid_search_demo")
print("\nDeleted collection")

# Delete database
admin.delete_database("hybrid_search_test")
print("\nDeleted database")

More information