Integrate seekdb Vector with Cloudflare Workers AI
seekdb provides capabilities for storing vector data, building vector indexes, and performing embedding-based searches. You can store vectorized data in seekdb for subsequent searches.
Cloudflare Workers AI is a service provided by Cloudflare that allows developers to run machine learning models on its global network. Developers can easily integrate AI functionality into their applications using REST APIs.
Prerequisites
- You have deployed seekdb.
- Your environment has a usable database and account, and the database account has read and write permissions.
- Python 3.11 or later is installed.
- Dependencies are installed.
python3 -m pip install cffi pyseekdb requests httpx tqdm
Step 1: Obtain Database Connection Information
Contact the seekdb deployment personnel or administrator to obtain the database connection string, for example:
mysql -h$host -P$port -u$user_name -p$password -D$database_name
Parameter Description:
- $host: The IP address for connecting to seekdb.
- $port: The port for connecting to seekdb. The default is 2881.
- $database_name: The name of the database to access.
  Tip: The user needs CREATE, INSERT, DROP, and SELECT permissions on the database.
- $user_name: The database connection account.
- $password: The account password.
Step 2: Build Your AI Assistant
Set the Cloudflare API Key Environment Variable
Obtain a Cloudflare API key and your Cloudflare account ID, then set them as environment variables together with the seekdb connection information:
export SEEKDB_DATABASE_HOST=YOUR_SEEKDB_DATABASE_HOST
export SEEKDB_DATABASE_PORT=YOUR_SEEKDB_DATABASE_PORT
export SEEKDB_DATABASE_USER=YOUR_SEEKDB_DATABASE_USER
export SEEKDB_DATABASE_DB_NAME=YOUR_SEEKDB_DATABASE_DB_NAME
export SEEKDB_DATABASE_PASSWORD=YOUR_SEEKDB_DATABASE_PASSWORD
export CLOUDFLARE_API_KEY=YOUR_CLOUDFLARE_API_KEY
export account_id=your_account_id
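A missing variable otherwise only surfaces later as a failed connection or an authentication error. The following is a minimal sketch of a startup check, assuming the variable names that the Python code in this guide reads with os.getenv. The helper name load_settings is hypothetical and is not part of pyseekdb or the Cloudflare API.

```python
import os

# Names match the variables the code in this guide reads with os.getenv.
REQUIRED_VARS = (
    "SEEKDB_DATABASE_HOST",
    "SEEKDB_DATABASE_USER",
    "SEEKDB_DATABASE_DB_NAME",
    "SEEKDB_DATABASE_PASSWORD",
    "CLOUDFLARE_API_KEY",
    "account_id",
)


def load_settings(env=None):
    """Hypothetical helper: collect settings and report all missing variables at once."""
    env = os.environ if env is None else env
    # Unset and empty values are both treated as missing (an assumption of this sketch).
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    settings = {name: env[name] for name in REQUIRED_VARS}
    # The port is optional and falls back to the seekdb default of 2881.
    settings["SEEKDB_DATABASE_PORT"] = int(env.get("SEEKDB_DATABASE_PORT", 2881))
    return settings
```

Calling load_settings() once at the top of the script reports every missing variable together instead of one failure at a time.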
Example Code Snippet
Here's an example using the bge-base-en-v1.5 model to generate vector data with the Cloudflare Workers AI Embedding API:
import os
import httpx
import pyseekdb
from tqdm import tqdm
from pyseekdb import HNSWConfiguration
documents = [
    "Machine learning is the core technology of artificial intelligence",
    "Python is the preferred programming language for data science",
    "Cloud computing provides elastic and scalable computing resources",
    "Blockchain technology ensures data security and transparency",
    "Natural language processing helps computers understand human language"
]
BASE_URL = "https://api.cloudflare.com/client/v4/accounts"
model_name = "@cf/baai/bge-base-en-v1.5"
account_id = os.getenv('account_id')
CLOUDFLARE_API_KEY = os.getenv('CLOUDFLARE_API_KEY')
api_url = f"{BASE_URL}/{account_id}/ai/run/{model_name}"
# Create an HTTP client
httpclient = httpx.Client()
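httpclient.headers.update({
    "Authorization": f"Bearer {CLOUDFLARE_API_KEY}",
    "Accept-Encoding": "identity"
})
# Request embeddings for all documents in a single call
payload = {"text": documents}
response = httpclient.post(api_url, json=payload)
response.raise_for_status()  # Fail early on authentication or request errors
embedding_response = response.json()["result"]["data"]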
ids = []
embeddings = []
documents_list = []
for i, text in enumerate(tqdm(documents, desc="Creating embeddings")):
    # Use the pre-computed embedding from the response
    embedding = embedding_response[i]
    ids.append(f"{i+1}")
    embeddings.append(embedding)
    documents_list.append(text)
print(f"Successfully processed {len(documents_list)} texts")
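Before storing the vectors, it can help to confirm that the response contains one embedding per document and that each vector has the dimension the collection will be created with (768 in this guide). The check below is a minimal sketch. The helper name check_embeddings is hypothetical, and it assumes the response data is a list of numeric lists, as the code above indexes it.

```python
def check_embeddings(vectors, expected_count, expected_dim=768):
    """Hypothetical helper: verify the count and dimension of returned embeddings."""
    if len(vectors) != expected_count:
        raise ValueError(
            f"expected {expected_count} embeddings, got {len(vectors)}"
        )
    for index, vector in enumerate(vectors):
        if len(vector) != expected_dim:
            raise ValueError(
                f"embedding {index} has dimension {len(vector)}, "
                f"expected {expected_dim}"
            )
    # Return the input unchanged so the call can wrap an assignment.
    return vectors
```

For example, embeddings = check_embeddings(embedding_response, len(documents)) raises a descriptive error instead of letting a malformed response reach the insert step.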
Define the Table Structure and Store Data in seekdb
Create a table named cloudflare_seekdb_demo_documents and store the data in seekdb:
SEEKDB_DATABASE_HOST = os.getenv('SEEKDB_DATABASE_HOST')
SEEKDB_DATABASE_PORT = int(os.getenv('SEEKDB_DATABASE_PORT', 2881))
SEEKDB_DATABASE_USER = os.getenv('SEEKDB_DATABASE_USER')
SEEKDB_DATABASE_DB_NAME = os.getenv('SEEKDB_DATABASE_DB_NAME')
SEEKDB_DATABASE_PASSWORD = os.getenv('SEEKDB_DATABASE_PASSWORD')
client = pyseekdb.Client(
    host=SEEKDB_DATABASE_HOST,
    port=SEEKDB_DATABASE_PORT,
    database=SEEKDB_DATABASE_DB_NAME,
    user=SEEKDB_DATABASE_USER,
    password=SEEKDB_DATABASE_PASSWORD
)
table_name = "cloudflare_seekdb_demo_documents"
# bge-base-en-v1.5 produces 768-dimensional vectors
config = HNSWConfiguration(dimension=768, distance='cosine')
collection = client.create_collection(
    name=table_name,
    configuration=config,
    embedding_function=None
)
print('- Inserting Data to seekdb...')
collection.add(
    ids=ids,
    embeddings=embeddings,
    documents=documents_list
)
Semantic Search
Generate a vector for the query text using the Cloudflare Workers AI API, then search for the most relevant documents based on the cosine distance between the query vector and each vector in the vector table:
query = 'Programming languages for data analysis'
payload = {"text": query}
response = httpclient.post(api_url, json=payload)
query_embedding = response.json()["result"]["data"]
res = collection.query(
    query_embeddings=query_embedding,
    n_results=1
)
print('- The Most Relevant Document and Its Distance to the Query:')
for doc_id, document, distance in zip(
    res['ids'][0],
    res['documents'][0],
    res['distances'][0]
):
    print(f'- ID: {doc_id}')
    print(f' content: {document}')
    print(f' distance: {distance:.6f}')
Expected Result
- ID: 2
content: Python is the preferred programming language for data science
distance: 0.139745
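To make the reported distance easier to interpret, the sketch below computes cosine distance in plain Python. It assumes the common definition, 1 minus cosine similarity, so 0 means the vectors point in the same direction and larger values mean less similar. Whether seekdb uses exactly this formula internally is an assumption here, and the function name cosine_distance is hypothetical.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two equal-length vectors (assumed definition)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)
```

Under this definition, a distance of about 0.14 between the query and the returned document corresponds to a cosine similarity of about 0.86.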