
seekdb Vector with Jina AI

seekdb provides a vector data type, vector indexes, and embedding-based vector search. You can store vectorized data in seekdb and search it later.

Jina AI is an AI platform focused on multimodal and vector search. It provides the core components and tools needed to build enterprise-grade, search-augmented generative AI applications, helping enterprises and developers build RAG (Retrieval-Augmented Generation) applications based on multimodal search.

Prerequisites

  • You have deployed the seekdb database.

  • You have a usable database and account in seekdb, and the account has read and write permissions on that database.

  • Python 3.11 or later is installed.

  • Dependencies are installed.

    python3 -m pip install cffi pyseekdb requests 

Step 1: Obtain the database connection information

Contact your seekdb deployment engineer or database administrator to obtain the database connection string, for example:

mysql -h$host -P$port -u$user_name -p$password -D$database_name

Parameter description:

  • $host: the IP address for connecting to seekdb.

  • $port: the port for connecting to seekdb, defaulting to 2881.

  • $database_name: the name of the database to access.

    Tip: The account must have the CREATE, INSERT, DROP, and SELECT privileges on the database.

  • $user_name: the database connection account.

  • $password: the account password.
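The flags in the connection string map one-to-one onto the keyword arguments that the sample code later passes to pyseekdb.Client (host, port, user, password, database). The following is a minimal sketch of that mapping, assuming each value is written directly after its flag as in the example above; parse_connection_string and FLAG_TO_KWARG are hypothetical names used only for illustration.

```python
import shlex

# Hypothetical mapping from the mysql client flags in the example connection
# string to the keyword arguments used by pyseekdb.Client in the sample code.
FLAG_TO_KWARG = {"-h": "host", "-P": "port", "-u": "user", "-p": "password", "-D": "database"}


def parse_connection_string(conn: str) -> dict:
    """Hypothetical helper: turn 'mysql -h... -P... -u... -p... -D...' into Client kwargs.

    Assumes each value is attached directly to its flag (e.g. -h127.0.0.1).
    The port falls back to seekdb's default, 2881, when -P is absent.
    """
    params = {"port": "2881"}
    for token in shlex.split(conn)[1:]:  # skip the leading "mysql"
        flag, value = token[:2], token[2:]
        if flag in FLAG_TO_KWARG and value:
            params[FLAG_TO_KWARG[flag]] = value
    params["port"] = int(params["port"])
    return params


params = parse_connection_string("mysql -h127.0.0.1 -P2881 -uroot -pmy_password -Dtest")
# client = pyseekdb.Client(**params)  # requires a reachable seekdb instance
```

shlex is used instead of a plain str.split so that a quoted password containing spaces stays a single token.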

Step 2: Build your AI assistant

Set the environment variables

Obtain a Jina AI API key, then export it together with the seekdb connection information as environment variables:

export SEEKDB_DATABASE_HOST=YOUR_SEEKDB_DATABASE_HOST
export SEEKDB_DATABASE_PORT=YOUR_SEEKDB_DATABASE_PORT  # optional; the sample code defaults to 2881
export SEEKDB_DATABASE_USER=YOUR_SEEKDB_DATABASE_USER
export SEEKDB_DATABASE_DB_NAME=YOUR_SEEKDB_DATABASE_DB_NAME
export SEEKDB_DATABASE_PASSWORD=YOUR_SEEKDB_DATABASE_PASSWORD
export JINAAI_API_KEY=YOUR_JINAAI_API_KEY
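Because os.getenv returns None for an unset variable, a missing key would otherwise only surface later as an authentication or connection error. A minimal sketch of a fail-fast check, reading the same variable names that the sample code passes to os.getenv; load_settings and REQUIRED_VARS are hypothetical names, not part of pyseekdb.

```python
import os
from typing import Mapping

# Variables the sample code reads; SEEKDB_DATABASE_PORT is optional.
REQUIRED_VARS = (
    "SEEKDB_DATABASE_HOST",
    "SEEKDB_DATABASE_USER",
    "SEEKDB_DATABASE_DB_NAME",
    "SEEKDB_DATABASE_PASSWORD",
    "JINAAI_API_KEY",
)


def load_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Hypothetical helper: collect settings and report every missing variable at once."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    settings = {name: env[name] for name in REQUIRED_VARS}
    # Fall back to seekdb's default port when the variable is not set.
    settings["SEEKDB_DATABASE_PORT"] = int(env.get("SEEKDB_DATABASE_PORT", "2881"))
    return settings
```

Passing a plain dict instead of os.environ makes the check easy to exercise without touching the real environment.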

Sample code snippet

Obtain Jina AI embeddings

Jina AI provides several embedding models. Choose one based on the languages and content you need to embed:

| Model | Parameter size | Embedding dimension | Supported text |
|---|---|---|---|
| jina-embeddings-v3 | 570M | Flexible (default: 1024) | Multilingual text embeddings; supports 94 languages in total |
| jina-embeddings-v2-small-en | 33M | 512 | English monolingual embeddings |
| jina-embeddings-v2-base-en | 137M | 768 | English monolingual embeddings |
| jina-embeddings-v2-base-zh | 161M | 768 | Chinese-English bilingual embeddings |
| jina-embeddings-v2-base-de | 161M | 768 | German-English bilingual embeddings |
| jina-embeddings-v2-base-code | 161M | 768 | English and programming languages |
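Whichever model you choose, the dimension passed to HNSWConfiguration when creating the table must equal the length of the vectors the model returns. A small sketch that keeps the defaults from the table above in one place; JINA_MODEL_DIMENSIONS and index_dimension_for are hypothetical names, and the values are copied from the table rather than queried from Jina AI.

```python
# Default output dimensions, copied from the model table above.
JINA_MODEL_DIMENSIONS = {
    "jina-embeddings-v3": 1024,  # flexible; 1024 is the default
    "jina-embeddings-v2-small-en": 512,
    "jina-embeddings-v2-base-en": 768,
    "jina-embeddings-v2-base-zh": 768,
    "jina-embeddings-v2-base-de": 768,
    "jina-embeddings-v2-base-code": 768,
}


def index_dimension_for(model: str) -> int:
    """Hypothetical helper: vector dimension to use for a model's HNSW index."""
    try:
        return JINA_MODEL_DIMENSIONS[model]
    except KeyError:
        raise ValueError(f"Unknown model {model!r}; add its dimension to the table") from None


# Example: HNSWConfiguration(dimension=index_dimension_for("jina-embeddings-v2-base-en"), distance='cosine')
```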

The following example uses jina-embeddings-v3. It defines a generate_embeddings helper function that calls the Jina AI embeddings API and returns the vector for a single text:

import os
import requests
import pyseekdb
from pyseekdb import HNSWConfiguration

JINAAI_API_KEY = os.getenv('JINAAI_API_KEY')

# Step 1. Vectorize the text data.
def generate_embeddings(text: str):
    JINAAI_API_URL = 'https://api.jina.ai/v1/embeddings'
    JINAAI_HEADERS = {
        'Content-Type': 'application/json',
        'Authorization': f'Bearer {JINAAI_API_KEY}'
    }
    JINAAI_REQUEST_DATA = {
        'input': [text],
        'model': 'jina-embeddings-v3'
    }

    response = requests.post(JINAAI_API_URL, headers=JINAAI_HEADERS, json=JINAAI_REQUEST_DATA, timeout=30)
    # Raise a clear HTTP error (for example, an invalid API key) instead of a KeyError below.
    response.raise_for_status()
    response_json = response.json()
    return response_json['data'][0]['embedding']


TEXTS = [
'Jina AI offers best-in-class embeddings, reranker and prompt optimizer, enabling advanced multimodal AI.',
'seekdb Database is an enterprise-level, native distributed database independently developed by the seekdb team. It is cloud-native, highly consistent, and highly compatible with Oracle and MySQL.',
'seekdb is a native distributed relational database that supports HTAP hybrid transaction analysis and processing. It features enterprise-level characteristics such as high availability, transparent scalability, and multi-tenancy, and is compatible with MySQL/Oracle protocols.'
]
ids = []
embeddings = []
documents = []

for i, text in enumerate(TEXTS):
    # Generate the embedding for the text via Jina AI API.
    embedding = generate_embeddings(text)
    ids.append(f"item{i+1}")
    embeddings.append(embedding)
    documents.append(text)
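The loop above sends one HTTP request per text. Since the request body's input field is already a list, all texts can be embedded in a single call. The sketch below is one possible batched variant; generate_embeddings_batch, build_embedding_request, and parse_embedding_response are hypothetical names, and sorting on each item's index field is a defensive assumption about the response shape. Very large corpora may still need to be split into several requests; check Jina AI's documented request limits.

```python
import os

JINAAI_API_URL = "https://api.jina.ai/v1/embeddings"


def build_embedding_request(texts: list[str], model: str = "jina-embeddings-v3") -> dict:
    """Build one request body that embeds every text in a single call."""
    return {"input": list(texts), "model": model}


def parse_embedding_response(response_json: dict) -> list[list[float]]:
    """Return embeddings in input order.

    Sorting on 'index' is a defensive assumption; it leaves the order
    unchanged when the items already arrive in input order.
    """
    items = sorted(response_json["data"], key=lambda item: item.get("index", 0))
    return [item["embedding"] for item in items]


def generate_embeddings_batch(texts: list[str]) -> list[list[float]]:
    """Hypothetical batched variant of generate_embeddings: one HTTP request for all texts."""
    # Imported here so the pure helpers above work without requests installed.
    import requests

    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {os.getenv('JINAAI_API_KEY')}",
    }
    response = requests.post(
        JINAAI_API_URL, headers=headers, json=build_embedding_request(texts), timeout=30
    )
    response.raise_for_status()
    return parse_embedding_response(response.json())
```

With this helper, the loop could be replaced by embeddings = generate_embeddings_batch(TEXTS), ids = [f"item{i+1}" for i in range(len(TEXTS))], and documents = list(TEXTS).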

Create a table and store data in seekdb

Create a table named jinaai_seekdb_demo_documents and store the vector data in seekdb:

# Step 2. Connect to seekdb.
SEEKDB_DATABASE_HOST = os.getenv('SEEKDB_DATABASE_HOST')
SEEKDB_DATABASE_PORT = int(os.getenv('SEEKDB_DATABASE_PORT', 2881))
SEEKDB_DATABASE_USER = os.getenv('SEEKDB_DATABASE_USER')
SEEKDB_DATABASE_DB_NAME = os.getenv('SEEKDB_DATABASE_DB_NAME')
SEEKDB_DATABASE_PASSWORD = os.getenv('SEEKDB_DATABASE_PASSWORD')

client = pyseekdb.Client(
    host=SEEKDB_DATABASE_HOST,
    port=SEEKDB_DATABASE_PORT,
    database=SEEKDB_DATABASE_DB_NAME,
    user=SEEKDB_DATABASE_USER,
    password=SEEKDB_DATABASE_PASSWORD
)
# Step 3. Create the vector table.
table_name = "jinaai_seekdb_demo_documents"
# The dimension must match the model output: jina-embeddings-v3 returns 1024-dimensional vectors by default.
config = HNSWConfiguration(dimension=1024, distance='cosine')
collection = client.create_collection(
    name=table_name,
    configuration=config,
    # Embeddings are computed with the Jina AI API and passed in explicitly.
    embedding_function=None
)

print('- Inserting Data to seekdb...')
collection.add(
ids=ids,
embeddings=embeddings,
documents=documents
)
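For larger datasets it can help to check the data before calling collection.add and to insert it in slices rather than in one call. A minimal sketch, assuming the ids, embeddings, and documents lists built above and the 1024-dimensional index; validate_rows and batched_rows are hypothetical helpers, and the batch size of 100 is an arbitrary assumption rather than a seekdb limit.

```python
from typing import Iterator


def validate_rows(ids: list, embeddings: list, documents: list, dimension: int = 1024) -> None:
    """Check that the three lists line up and every vector matches the index dimension."""
    if not (len(ids) == len(embeddings) == len(documents)):
        raise ValueError("ids, embeddings, and documents must have the same length")
    for doc_id, vector in zip(ids, embeddings):
        if len(vector) != dimension:
            raise ValueError(f"{doc_id}: expected {dimension} dimensions, got {len(vector)}")


def batched_rows(ids: list, embeddings: list, documents: list,
                 batch_size: int = 100) -> Iterator[tuple[list, list, list]]:
    """Yield (ids, embeddings, documents) slices of at most batch_size rows."""
    for start in range(0, len(ids), batch_size):
        end = start + batch_size
        yield ids[start:end], embeddings[start:end], documents[start:end]


# Usage with the sample data (requires the collection created above):
# validate_rows(ids, embeddings, documents, dimension=1024)
# for batch_ids, batch_embeddings, batch_documents in batched_rows(ids, embeddings, documents):
#     collection.add(ids=batch_ids, embeddings=batch_embeddings, documents=batch_documents)
```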

Generate the query vector with the Jina AI API, then use the HNSW vector index to find the documents whose vectors have the smallest cosine distance to the query vector:

# Step 4. Query the most relevant document based on the query.
query = 'What is seekdb?'
# Generate the embedding for the query via Jina AI API.
query_embedding = generate_embeddings(query)

res = collection.query(
query_embeddings=query_embedding,
n_results=1
)

print('- The Most Relevant Document and Its Distance to the Query:')
# Each result field is a list with one inner list per query embedding.
for doc_id, document, distance in zip(
    res['ids'][0],
    res['documents'][0],
    res['distances'][0]
):
    print(f' - ID: {doc_id}')
    print(f' content: {document}')
    print(f' distance: {distance:.6f}')
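To see what the query is ranking on, here is a brute-force sketch of cosine distance, assuming seekdb's 'cosine' distance follows the usual definition of 1 minus cosine similarity (0 for vectors pointing the same way, 2 for opposite vectors). The HNSW index approximates this exhaustive search without comparing the query against every stored vector; cosine_distance and rank_by_cosine are hypothetical helpers for illustration.

```python
import math


def cosine_distance(a: list[float], b: list[float]) -> float:
    """Cosine distance = 1 - cosine similarity."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / (norm_a * norm_b)


def rank_by_cosine(query: list[float], vectors: list[list[float]], k: int = 1) -> list[tuple[float, int]]:
    """Exhaustive top-k search: (distance, position) pairs, closest first."""
    scored = sorted((cosine_distance(query, v), i) for i, v in enumerate(vectors))
    return scored[:k]


# Example with the sample data: rank_by_cosine(query_embedding, embeddings, k=1)
```

Comparing this exhaustive result with collection.query on a small dataset is a simple way to sanity-check the index.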

Expected results

- ID: item2
content: seekdb Database is an enterprise-level, native distributed database independently developed by the seekdb team. It is cloud-native, highly consistent, and highly compatible with Oracle and MySQL.
distance: 0.158139
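To use the results in application code rather than printing them, the nested result dictionary can be flattened into one record per document. This sketch assumes the res['ids'][0], res['documents'][0], and res['distances'][0] shape used by the sample code, and that similarity is 1 minus cosine distance; under that assumption the distance of 0.158139 above corresponds to a cosine similarity of 0.841861. to_records is a hypothetical helper.

```python
def to_records(res: dict, query_index: int = 0) -> list[dict]:
    """Flatten one query's results into dicts, adding similarity = 1 - distance."""
    return [
        {"id": doc_id, "document": doc, "distance": dist, "similarity": 1.0 - dist}
        for doc_id, doc, dist in zip(
            res["ids"][query_index],
            res["documents"][query_index],
            res["distances"][query_index],
        )
    ]
```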