seekdb Vector with Jina AI
seekdb provides vector type storage, vector indexing, and embedding vector search capabilities. You can store vectorized data in seekdb for subsequent searches.
Jina AI is an AI platform focused on multimodal and vector search. It provides the core components and tools for building enterprise-grade, search-enhanced generative AI applications, including RAG (Retrieval-Augmented Generation) applications based on multimodal search.
Prerequisites
- You have deployed the seekdb database.
- Your environment has a usable MySQL database and account, and the database account has read and write permissions.
- Python 3.11 or later is installed.
- Dependencies are installed.
python3 -m pip install cffi pyseekdb requests
Step 1: Obtain the database connection information
Contact the seekdb database deployment personnel or administrator to obtain the corresponding database connection string, for example:
mysql -h$host -P$port -u$user_name -p$password -D$database_name
Parameter description:
- $host: the IP address for connecting to seekdb.
- $port: the port for connecting to seekdb. The default is 2881.
- $database_name: the name of the database to access.
  Tip: The user needs to have the CREATE, INSERT, DROP, and SELECT permissions for the database.
- $user_name: the database connection account.
- $password: the account password.
Step 2: Build your AI assistant
Set the Jina AI API key environment variable
Obtain the Jina AI API key and configure it along with the seekdb connection information in the environment variables.
export SEEKDB_DATABASE_HOST=YOUR_SEEKDB_DATABASE_HOST
export SEEKDB_DATABASE_PORT=YOUR_SEEKDB_DATABASE_PORT
export SEEKDB_DATABASE_USER=YOUR_SEEKDB_DATABASE_USER
export SEEKDB_DATABASE_DB_NAME=YOUR_SEEKDB_DATABASE_DB_NAME
export SEEKDB_DATABASE_PASSWORD=YOUR_SEEKDB_DATABASE_PASSWORD
export JINAAI_API_KEY=YOUR_JINAAI_API_KEY
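Before running the sample, it can help to confirm that the settings are actually visible to Python. The following is a minimal sketch, not part of the official sample: `missing_settings` and `REQUIRED_VARS` are hypothetical names, and the variable list is taken from the names that the sample code on this page reads with `os.getenv`.

```python
import os

# Names read by the sample code on this page. SEEKDB_DATABASE_PORT is left out
# because the sample falls back to 2881 when it is not set.
REQUIRED_VARS = (
    "SEEKDB_DATABASE_HOST",
    "SEEKDB_DATABASE_USER",
    "SEEKDB_DATABASE_DB_NAME",
    "SEEKDB_DATABASE_PASSWORD",
    "JINAAI_API_KEY",
)


def missing_settings(env=None):
    """Return the required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_settings()
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
    else:
        print("All required environment variables are set.")
```

Passing a plain dictionary instead of `os.environ` makes the check easy to exercise without touching the real environment.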
Sample code snippet
Obtain Jina AI embeddings
Jina AI provides various embedding models, and users can choose the corresponding model based on their needs.
| Model | Parameter Size | Embedding Dimension | Description |
|---|---|---|---|
| jina-embeddings-v3 | 570M | flexible embedding size (Default: 1024) | multilingual text embeddings; supports 94 languages in total |
| jina-embeddings-v2-small-en | 33M | 512 | English monolingual embeddings |
| jina-embeddings-v2-base-en | 137M | 768 | English monolingual embeddings |
| jina-embeddings-v2-base-zh | 161M | 768 | Chinese-English Bilingual embeddings |
| jina-embeddings-v2-base-de | 161M | 768 | German-English Bilingual embeddings |
| jina-embeddings-v2-base-code | 161M | 768 | English and programming languages |
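The embedding dimension matters later, because the vector index has to be created with the same dimension as the model output. As a rough sketch, the table above can be captured in a lookup so that the dimension is not hard-coded in several places. `MODEL_DIMENSIONS` and `embedding_dimension` are hypothetical names introduced here, and the value for jina-embeddings-v3 is only its default size.

```python
# Embedding dimensions copied from the model table on this page.
MODEL_DIMENSIONS = {
    "jina-embeddings-v3": 1024,  # default; v3 supports flexible sizes
    "jina-embeddings-v2-small-en": 512,
    "jina-embeddings-v2-base-en": 768,
    "jina-embeddings-v2-base-zh": 768,
    "jina-embeddings-v2-base-de": 768,
    "jina-embeddings-v2-base-code": 768,
}


def embedding_dimension(model: str) -> int:
    """Return the (default) embedding dimension for a model in the table."""
    try:
        return MODEL_DIMENSIONS[model]
    except KeyError:
        raise ValueError(f"Unknown model: {model!r}") from None


print(embedding_dimension("jina-embeddings-v3"))
```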
Here's an example using jina-embeddings-v3. Define a generate_embeddings helper function to call the Jina AI embedding API:
import os

import requests
import pyseekdb
from pyseekdb import HNSWConfiguration

JINAAI_API_KEY = os.getenv('JINAAI_API_KEY')

# Step 1. Text data vectorization
def generate_embeddings(text: str):
    """Return the jina-embeddings-v3 embedding for a single text."""
    JINAAI_API_URL = 'https://api.jina.ai/v1/embeddings'
    JINAAI_HEADERS = {
        'Content-Type': 'application/json',
        'Authorization': f'Bearer {JINAAI_API_KEY}'
    }
    JINAAI_REQUEST_DATA = {
        'input': [text],
        'model': 'jina-embeddings-v3'
    }
    response = requests.post(JINAAI_API_URL, headers=JINAAI_HEADERS, json=JINAAI_REQUEST_DATA)
    # Fail early on HTTP errors (for example, an invalid API key).
    response.raise_for_status()
    response_json = response.json()
    return response_json['data'][0]['embedding']

TEXTS = [
    'Jina AI offers best-in-class embeddings, reranker and prompt optimizer, enabling advanced multimodal AI.',
    'seekdb Database is an enterprise-level, native distributed database independently developed by the seekdb team. It is cloud-native, highly consistent, and highly compatible with Oracle and MySQL.',
    'seekdb is a native distributed relational database that supports HTAP hybrid transaction analysis and processing. It features enterprise-level characteristics such as high availability, transparent scalability, and multi-tenancy, and is compatible with MySQL/Oracle protocols.'
]

ids = []
embeddings = []
documents = []
for i, text in enumerate(TEXTS):
    # Generate the embedding for the text via Jina AI API.
    embedding = generate_embeddings(text)
    ids.append(f"item{i+1}")
    embeddings.append(embedding)
    documents.append(text)
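The loop above sends one request per text. Because the request body already carries `input` as a list, several texts can in principle be sent in a single call. The sketch below only builds the request body and parses a response, so it runs without network access. `build_embedding_request` and `extract_embeddings` are hypothetical helpers, and the `index` field on each returned item is an assumption about the response format that should be checked against the Jina AI API reference.

```python
def build_embedding_request(texts, model="jina-embeddings-v3"):
    """Build the JSON body for one request that embeds several texts."""
    return {"input": list(texts), "model": model}


def extract_embeddings(response_json):
    """Return the embeddings from a response, ordered like the inputs.

    Assumption: each item in 'data' may carry an 'index' field giving its
    input position. If the field is absent, the original order is kept.
    """
    items = sorted(response_json["data"], key=lambda item: item.get("index", 0))
    return [item["embedding"] for item in items]


# A hand-written stand-in for a real API response, with items out of order.
fake_response = {
    "data": [
        {"index": 1, "embedding": [0.3, 0.4]},
        {"index": 0, "embedding": [0.1, 0.2]},
    ]
}
print(build_embedding_request(["first text", "second text"]))
print(extract_embeddings(fake_response))
```

In a real call, the body returned by `build_embedding_request` would be passed as the `json` argument of `requests.post`, in the same way as in `generate_embeddings`.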
Create a table and store data in seekdb
Create a table named jinaai_seekdb_demo_documents and store the vector data in seekdb:
# Step 2. Connect to seekdb.
SEEKDB_DATABASE_HOST = os.getenv('SEEKDB_DATABASE_HOST')
SEEKDB_DATABASE_PORT = int(os.getenv('SEEKDB_DATABASE_PORT', '2881'))
SEEKDB_DATABASE_USER = os.getenv('SEEKDB_DATABASE_USER')
SEEKDB_DATABASE_DB_NAME = os.getenv('SEEKDB_DATABASE_DB_NAME')
SEEKDB_DATABASE_PASSWORD = os.getenv('SEEKDB_DATABASE_PASSWORD')

client = pyseekdb.Client(
    host=SEEKDB_DATABASE_HOST,
    port=SEEKDB_DATABASE_PORT,
    database=SEEKDB_DATABASE_DB_NAME,
    user=SEEKDB_DATABASE_USER,
    password=SEEKDB_DATABASE_PASSWORD
)

# Step 3. Create the vector table.
table_name = "jinaai_seekdb_demo_documents"
# The dimension must match the embedding model (1024 is the jina-embeddings-v3 default).
config = HNSWConfiguration(dimension=1024, distance='cosine')
collection = client.create_collection(
    name=table_name,
    configuration=config,
    embedding_function=None
)

print('- Inserting Data to seekdb...')
collection.add(
    ids=ids,
    embeddings=embeddings,
    documents=documents
)
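The index above is created with `dimension=1024`, so every stored vector needs to have exactly that length. A small pre-insert check can surface a mismatch (for example, after switching to a 768-dimensional model) before any data reaches the database. This is a sketch in plain Python: `check_batch` is a hypothetical helper and is not part of pyseekdb.

```python
def check_batch(ids, embeddings, documents, dimension=1024):
    """Raise ValueError if the batch is inconsistent with the index settings."""
    if not (len(ids) == len(embeddings) == len(documents)):
        raise ValueError(
            f"Length mismatch: {len(ids)} ids, {len(embeddings)} embeddings, "
            f"{len(documents)} documents"
        )
    for doc_id, vector in zip(ids, embeddings):
        if len(vector) != dimension:
            raise ValueError(
                f"{doc_id}: expected {dimension} dimensions, got {len(vector)}"
            )


# Example with a tiny 3-dimensional batch.
check_batch(["item1"], [[0.1, 0.2, 0.3]], ["some text"], dimension=3)
print("Batch is consistent.")
```

Calling `check_batch(ids, embeddings, documents)` right before `collection.add(...)` would apply the same check to the data from this tutorial.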
Semantic search
Generate the query text vector using the Jina AI API, then search for the most relevant documents based on the cosine distance between the query vector and each vector in the vector table:
# Step 4. Query the most relevant document based on the query.
query = 'What is seekdb?'
# Generate the embedding for the query via Jina AI API.
query_embedding = generate_embeddings(query)

res = collection.query(
    query_embeddings=query_embedding,
    n_results=1
)

print('- The Most Relevant Document and Its Distance to the Query:')
for doc_id, document, distance in zip(
    res['ids'][0],
    res['documents'][0],
    res['distances'][0]
):
    print(f' - ID: {doc_id}')
    print(f' content: {document}')
    print(f' distance: {distance:.6f}')
Expected results
- The Most Relevant Document and Its Distance to the Query:
 - ID: item2
 content: seekdb Database is an enterprise-level, native distributed database independently developed by the seekdb team. It is cloud-native, highly consistent, and highly compatible with Oracle and MySQL.
 distance: 0.158139
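To help read the distance value, the sketch below computes cosine distance in plain Python. It assumes the conventional definition, 1 - cosine similarity, under which 0 means the vectors point in the same direction and larger values mean less similar. This page does not spell out the exact formula seekdb uses for `distance='cosine'`, so treat the sketch as an illustration rather than a description of the database internals. `cosine_distance` is a hypothetical helper.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity for two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("Vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("Cosine distance is undefined for zero vectors")
    return 1.0 - dot / (norm_a * norm_b)


print(cosine_distance([1.0, 0.0], [1.0, 0.0]))  # same direction: 0.0
print(cosine_distance([1.0, 0.0], [0.0, 1.0]))  # orthogonal: 1.0
```

Under this definition, a distance of about 0.16 indicates that the query and the returned document are fairly close in the embedding space.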