Integrate seekdb vector search with Jina AI

seekdb supports vector data storage, vector indexes, and embedding-based vector search. You can store vectorized data in seekdb and then run similarity searches over it.

Jina AI is an AI platform focused on multimodal search and vector search. It offers core components and tools for building enterprise-grade Retrieval-Augmented Generation (RAG) applications based on multimodal search, helping organizations and developers create advanced search-driven generative AI solutions.

Prerequisites

  • You have deployed seekdb.

  • You have an existing MySQL-compatible database and account in your seekdb environment, and the database account has been granted read and write privileges on the database.

  • You have installed Python 3.11 or later.

  • You have installed required dependencies:

    python3 -m pip install pyobvector requests sqlalchemy
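
To quickly confirm that the dependencies are importable, you can optionally run a one-line check (purely a sanity check, not part of the tutorial):

python3 -c "import pyobvector, requests, sqlalchemy; print('dependencies OK')"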

Step 1: Obtain the database connection information

Contact your seekdb deployment engineer or administrator to obtain the database connection string. For example:

obclient -h$host -P$port -u$user_name -p$password -D$database_name

Parameters:

  • $host: The IP address for connecting to seekdb.

  • $port: The port number for connecting to seekdb. Default is 2881.

  • $database_name: The name of the database to access.

    Tip: The connected user must have CREATE, INSERT, DROP, and SELECT privileges on the database.

  • $user_name: The username for connecting to the database.

  • $password: The password for the account.
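
For instance, with hypothetical placeholder values (host 127.0.0.1, the default port 2881, a user named test_user, and a database named test_db), the filled-in command might look like this:

obclient -h127.0.0.1 -P2881 -utest_user -p'******' -Dtest_db

If the account is missing any of the privileges mentioned in the tip above, an administrator can grant them with standard MySQL-style syntax (the database and user names here are placeholders, and the test_user account is assumed to already exist):

GRANT CREATE, INSERT, DROP, SELECT ON test_db.* TO test_user;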

Step 2: Build your AI assistant

Set your Jina AI API key and seekdb connection details as environment variables

Get your Jina AI API key and configure it, along with your seekdb connection details, as environment variables:

export OCEANBASE_DATABASE_URL=YOUR_OCEANBASE_DATABASE_URL
export OCEANBASE_DATABASE_USER=YOUR_OCEANBASE_DATABASE_USER
export OCEANBASE_DATABASE_DB_NAME=YOUR_OCEANBASE_DATABASE_DB_NAME
export OCEANBASE_DATABASE_PASSWORD=YOUR_OCEANBASE_DATABASE_PASSWORD
export JINAAI_API_KEY=YOUR_JINAAI_API_KEY
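
If you want your script to fail fast when one of these variables is missing, a minimal check like the following sketch can be placed at the top of the example code (the variable names match the exports above):

import os

# Abort early if any required environment variable is missing.
required = [
    'OCEANBASE_DATABASE_URL',
    'OCEANBASE_DATABASE_USER',
    'OCEANBASE_DATABASE_DB_NAME',
    'OCEANBASE_DATABASE_PASSWORD',
    'JINAAI_API_KEY',
]
missing = [name for name in required if not os.getenv(name)]
if missing:
    raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")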

Example code snippets

Get embeddings from Jina AI

Jina AI offers several embedding models. You can choose the one that best fits your needs.

| Model | Parameter size | Embedding dimension | Description |
| --- | --- | --- | --- |
| jina-embeddings-v3 | 570M | Flexible (default: 1024) | Multilingual text embeddings; supports 94 languages in total |
| jina-embeddings-v2-small-en | 33M | 512 | English monolingual embeddings |
| jina-embeddings-v2-base-en | 137M | 768 | English monolingual embeddings |
| jina-embeddings-v2-base-zh | 161M | 768 | Chinese-English bilingual embeddings |
| jina-embeddings-v2-base-de | 161M | 768 | German-English bilingual embeddings |
| jina-embeddings-v2-base-code | 161M | 768 | English and programming languages |

Here is an example using jina-embeddings-v3. The following helper function, generate_embeddings, calls the Jina AI embedding API:

import os
import requests
from sqlalchemy import Column, Integer, String
from pyobvector import ObVecClient, VECTOR, IndexParam, cosine_distance

JINAAI_API_KEY = os.getenv('JINAAI_API_KEY')

# Step 1. Text data vectorization
def generate_embeddings(text: str):
    JINAAI_API_URL = 'https://api.jina.ai/v1/embeddings'
    JINAAI_HEADERS = {
        'Content-Type': 'application/json',
        'Authorization': f'Bearer {JINAAI_API_KEY}'
    }
    JINAAI_REQUEST_DATA = {
        'input': [text],
        'model': 'jina-embeddings-v3'
    }

    response = requests.post(JINAAI_API_URL, headers=JINAAI_HEADERS, json=JINAAI_REQUEST_DATA)
    response_json = response.json()
    return response_json['data'][0]['embedding']


TEXTS = [
    'Jina AI offers best-in-class embeddings, reranker and prompt optimizer, enabling advanced multimodal AI.',
    'OceanBase Database is an enterprise-level, native distributed database independently developed by the OceanBase team. It is cloud-native, highly consistent, and highly compatible with Oracle and MySQL.',
    'OceanBase is a native distributed relational database that supports HTAP hybrid transaction analysis and processing. It features enterprise-level characteristics such as high availability, transparent scalability, and multi-tenancy, and is compatible with MySQL/Oracle protocols.'
]
data = []
for text in TEXTS:
    # Generate the embedding for the text via the Jina AI API.
    embedding = generate_embeddings(text)
    data.append({
        'content': text,
        'content_vec': embedding
    })

print(f"Successfully processed {len(data)} texts")

Define the vector table structure and store vectors in seekdb

Create a table called jinaai_oceanbase_demo_documents with a column for the text (content) and a column for the embedding vector (content_vec), and define a vector index on the embedding column. The dimension of content_vec (1024) must match the embedding dimension of the model chosen above; jina-embeddings-v3 defaults to 1024. Then insert the vector data into seekdb:

# Step 2. Connect to seekdb.
OCEANBASE_DATABASE_URL = os.getenv('OCEANBASE_DATABASE_URL')
OCEANBASE_DATABASE_USER = os.getenv('OCEANBASE_DATABASE_USER')
OCEANBASE_DATABASE_DB_NAME = os.getenv('OCEANBASE_DATABASE_DB_NAME')
OCEANBASE_DATABASE_PASSWORD = os.getenv('OCEANBASE_DATABASE_PASSWORD')

client = ObVecClient(
    uri=OCEANBASE_DATABASE_URL,
    user=OCEANBASE_DATABASE_USER,
    password=OCEANBASE_DATABASE_PASSWORD,
    db_name=OCEANBASE_DATABASE_DB_NAME
)

# Step 3. Create the vector table.
table_name = "jinaai_oceanbase_demo_documents"
client.drop_table_if_exist(table_name)

cols = [
    Column("id", Integer, primary_key=True, autoincrement=True),
    Column("content", String(500), nullable=False),
    Column("content_vec", VECTOR(1024))
]

# Create an HNSW vector index on the embedding column, using cosine distance.
vector_index_params = IndexParam(
    index_name="idx_content_vec",
    field_name="content_vec",
    index_type="HNSW",
    distance_metric="cosine"
)

client.create_table_with_index_params(
    table_name=table_name,
    columns=cols,
    vidxs=[vector_index_params]
)

print('- Inserting Data to OceanBase...')
client.insert(table_name, data=data)
Use the Jina AI embedding API to generate an embedding for your query text. Then, search for the most relevant document by calculating the cosine distance between the query embedding and each embedding in the vector table:

# Step 4. Query the most relevant document based on the query.
query = 'What is OceanBase?'
# Generate the embedding for the query via the Jina AI API.
query_embedding = generate_embeddings(query)

res = client.ann_search(
    table_name,
    vec_data=query_embedding,
    vec_column_name="content_vec",
    distance_func=cosine_distance,  # Use the cosine distance function
    with_dist=True,
    topk=1,
    output_column_names=["id", "content"],
)

print('- The Most Relevant Document and Its Distance to the Query:')
for row in res.fetchall():
    print(f'  - ID: {row[0]}\n'
          f'    content: {row[1]}\n'
          f'    distance: {row[2]}')

Expected result

  - ID: 2
    content: OceanBase Database is an enterprise-level, native distributed database independently developed by the OceanBase team. It is cloud-native, highly consistent, and highly compatible with Oracle and MySQL.
    distance: 0.14733879001870276
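
When you are done experimenting, you can drop the demo table with the same helper that was used before creating it (this reuses the client and table_name from the snippets above):

# Clean up the demo table created in Step 3.
client.drop_table_if_exist(table_name)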