Version: V1.1.0

Integrate seekdb Vector with Cloudflare Workers AI

seekdb provides capabilities for storing vector data, building vector indexes, and performing embedding-based searches. You can store vectorized data in seekdb for subsequent searches.

Cloudflare Workers AI is a service provided by Cloudflare that allows developers to run machine learning models on its global network. Developers can easily integrate AI functionality into their applications using REST APIs.

Prerequisites

You have deployed seekdb.
You have a usable database and a database account with read and write permissions.
Python 3.11 or later is installed.

The required dependencies are installed, including httpx and tqdm, which the example code imports:

python3 -m pip install cffi pyseekdb requests httpx tqdm

Step 1: Obtain Database Connection Information

Contact your seekdb administrator or deployment team to obtain the database connection string. For example:

mysql -h$host -P$port -u$user_name -p$password -D$database_name

Parameter Description:

$host: The IP address for connecting to seekdb.
$port: The port for connecting to seekdb. The default is 2881.
$database_name: The name of the database to access.
$user_name: The database connection account.
$password: The account password.

Tip: The user needs CREATE, INSERT, DROP, and SELECT permissions on the database.
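
To illustrate how the parts of the connection string map to individual settings, the sketch below splits a mysql command of the form shown above into its parameters. The helper name parse_mysql_command is hypothetical (it is not part of pyseekdb), the sample values are placeholders, and the sketch assumes each flag is written directly next to its value, as in the example.

```python
import shlex

# Map each mysql command-line flag to a descriptive setting name.
FLAG_NAMES = {"-h": "host", "-P": "port", "-u": "user", "-p": "password", "-D": "database"}


def parse_mysql_command(command: str) -> dict:
    """Extract connection settings from a command such as
    'mysql -h$host -P$port -u$user_name -p$password -D$database_name'."""
    params = {"port": "2881"}  # Default seekdb port, used when -P is absent
    for token in shlex.split(command)[1:]:  # Skip the leading 'mysql'
        name = FLAG_NAMES.get(token[:2])    # Flags are case-sensitive: -P vs -p
        if name and len(token) > 2:
            params[name] = token[2:]
    params["port"] = int(params["port"])
    return params


# Placeholder values for illustration only
params = parse_mysql_command("mysql -h127.0.0.1 -P2881 -uroot -psecret -Dtest")
print(params["host"], params["port"], params["database"])
```

The resulting values can then be exported as the environment variables that the example script reads.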

Step 2: Build Your AI Assistant

Set the Cloudflare API Key Environment Variable

Obtain the Cloudflare API key and account ID, then set them as environment variables together with the seekdb connection information. The variable names must match the ones read by the example code.

export SEEKDB_DATABASE_HOST=YOUR_SEEKDB_DATABASE_HOST
export SEEKDB_DATABASE_PORT=YOUR_SEEKDB_DATABASE_PORT
export SEEKDB_DATABASE_USER=YOUR_SEEKDB_DATABASE_USER
export SEEKDB_DATABASE_DB_NAME=YOUR_SEEKDB_DATABASE_DB_NAME
export SEEKDB_DATABASE_PASSWORD=YOUR_SEEKDB_DATABASE_PASSWORD
export CLOUDFLARE_API_KEY=YOUR_CLOUDFLARE_API_KEY
export account_id=YOUR_ACCOUNT_ID
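
Before running the script, it can help to confirm that every variable is actually set. The sketch below is one possible check, not part of the original example. The helper name load_settings and the REQUIRED_VARS list are illustrative, and the variable names follow those read by the example code in this guide (SEEKDB_DATABASE_HOST, plus an optional SEEKDB_DATABASE_PORT that falls back to 2881).

```python
import os

# Variables the example script reads; names are taken from this guide's code.
REQUIRED_VARS = [
    "SEEKDB_DATABASE_HOST",
    "SEEKDB_DATABASE_USER",
    "SEEKDB_DATABASE_DB_NAME",
    "SEEKDB_DATABASE_PASSWORD",
    "CLOUDFLARE_API_KEY",
    "account_id",
]


def load_settings(env=None) -> dict:
    """Return the settings as a dict, or raise if any required variable is unset."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    settings = {name: env[name] for name in REQUIRED_VARS}
    # The port is optional and defaults to 2881.
    settings["SEEKDB_DATABASE_PORT"] = int(env.get("SEEKDB_DATABASE_PORT", "2881"))
    return settings


# Demonstration with placeholder values instead of the real environment
sample_env = {name: "placeholder" for name in REQUIRED_VARS}
settings = load_settings(sample_env)
print(settings["SEEKDB_DATABASE_PORT"])
```

Calling load_settings() with no argument checks the real process environment and fails early with a list of the variables that are missing.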

Example Code Snippet

The following example uses the bge-base-en-v1.5 model to generate embeddings for a small set of documents through the Cloudflare Workers AI Embedding API:

import os

import httpx
import pyseekdb
from pyseekdb import HNSWConfiguration
from tqdm import tqdm

documents = [
    "Machine learning is the core technology of artificial intelligence",
    "Python is the preferred programming language for data science",
    "Cloud computing provides elastic and scalable computing resources",
    "Blockchain technology ensures data security and transparency",
    "Natural language processing helps computers understand human language"
]
# Workers AI REST endpoint: {BASE_URL}/{account_id}/ai/run/{model_name}
BASE_URL = "https://api.cloudflare.com/client/v4/accounts"
model_name = "@cf/baai/bge-base-en-v1.5"
account_id = os.getenv('account_id')
CLOUDFLARE_API_KEY = os.getenv('CLOUDFLARE_API_KEY')
api_url = f"{BASE_URL}/{account_id}/ai/run/{model_name}"

# Create an HTTP client that sends the API key as a bearer token
httpclient = httpx.Client()
httpclient.headers.update({
    "Authorization": f"Bearer {CLOUDFLARE_API_KEY}",
    "Accept-Encoding": "identity"
})

# Embed all documents in a single request
payload = {"text": documents}
response = httpclient.post(api_url, json=payload)
response.raise_for_status()  # Stop early on HTTP errors such as 401 or 404
embedding_response = response.json()["result"]["data"]

ids = []
embeddings = []
documents_list = []

for i, text in enumerate(tqdm(documents, desc="Creating embeddings")):
    # Use the pre-computed embedding from the response
    embedding = embedding_response[i]
    ids.append(f"{i+1}")
    embeddings.append(embedding)
    documents_list.append(text)


print(f"Successfully processed {len(documents_list)} texts")
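
The snippet above indexes straight into the JSON response, so a failed request or an unexpected vector size only shows up later as a confusing error. The sketch below shows one way to validate the response first. It assumes the response body carries the usual Cloudflare API fields (success, errors, and result.data holding one vector per input text); the helper name extract_embeddings is hypothetical, and the sample body uses tiny 3-dimensional vectors purely for illustration.

```python
def extract_embeddings(body: dict, expected_count: int, expected_dim: int = 768) -> list:
    """Return the embedding vectors from a Workers AI response body after basic checks."""
    # Assumed response envelope: {"success": bool, "errors": [...], "result": {"data": [...]}}
    if not body.get("success", False):
        raise RuntimeError(f"Workers AI request failed: {body.get('errors')}")
    data = body["result"]["data"]
    if len(data) != expected_count:
        raise ValueError(f"Expected {expected_count} embeddings, got {len(data)}")
    for vector in data:
        if len(vector) != expected_dim:
            raise ValueError(f"Expected dimension {expected_dim}, got {len(vector)}")
    return data


# Illustrative response body with two 3-dimensional vectors (not real model output)
sample_body = {
    "success": True,
    "errors": [],
    "result": {"shape": [2, 3], "data": [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]},
}
vectors = extract_embeddings(sample_body, expected_count=2, expected_dim=3)
print(len(vectors), len(vectors[0]))
```

With the real model, expected_dim stays at 768, which matches the dimension configured for the vector index in the next section.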

Define the Table Structure and Store Data in seekdb

Create a table named cloudflare_seekdb_demo_documents and store the data in seekdb:

# Read the connection settings from the environment
SEEKDB_DATABASE_HOST = os.getenv('SEEKDB_DATABASE_HOST')
SEEKDB_DATABASE_PORT = int(os.getenv('SEEKDB_DATABASE_PORT', '2881'))
SEEKDB_DATABASE_USER = os.getenv('SEEKDB_DATABASE_USER')
SEEKDB_DATABASE_DB_NAME = os.getenv('SEEKDB_DATABASE_DB_NAME')
SEEKDB_DATABASE_PASSWORD = os.getenv('SEEKDB_DATABASE_PASSWORD')

client = pyseekdb.Client(
    host=SEEKDB_DATABASE_HOST,
    port=SEEKDB_DATABASE_PORT,
    database=SEEKDB_DATABASE_DB_NAME,
    user=SEEKDB_DATABASE_USER,
    password=SEEKDB_DATABASE_PASSWORD
)
table_name = "cloudflare_seekdb_demo_documents"
# bge-base-en-v1.5 produces 768-dimensional vectors
config = HNSWConfiguration(dimension=768, distance='cosine')
collection = client.create_collection(
    name=table_name,
    configuration=config,
    embedding_function=None
)

print('- Inserting Data to seekdb...')
collection.add(
    ids=ids,
    embeddings=embeddings,
    documents=documents
)

Semantic Search

Generate a vector for the query text using the Cloudflare Workers AI API, then search for the most relevant documents based on the cosine distance between the query vector and each vector in the vector table:

query = 'Programming languages for data analysis'

payload = {"text": query}
response = httpclient.post(api_url, json=payload)
query_embedding = response.json()["result"]["data"]

res = collection.query(
    query_embeddings=query_embedding,
    n_results=1
)

print('- The Most Relevant Document and Its Distance to the Query:')
for doc_id, document, distance in zip(
    res['ids'][0],
    res['documents'][0],
    res['distances'][0]
):
    print(f'- ID: {doc_id}')
    print(f'    content: {document}')
    print(f'    distance: {distance:.6f}')
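
To make the ranking criterion concrete, the sketch below computes cosine distance by hand and sorts a few toy vectors by it. It assumes the common definition, one minus the cosine similarity, and uses a brute-force scan over 2-dimensional placeholder vectors; seekdb itself answers the query through the HNSW index rather than this loop, and the function names here are illustrative only.

```python
import math


def cosine_distance(a, b):
    """Cosine distance, assumed here to be 1 - cosine similarity."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def rank_by_distance(query_vec, doc_vecs):
    """Return (doc_id, distance) pairs sorted from most to least similar."""
    scored = [(doc_id, cosine_distance(query_vec, vec)) for doc_id, vec in doc_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1])


# Toy 2-dimensional vectors standing in for real 768-dimensional embeddings
toy_docs = {"1": [1.0, 0.0], "2": [0.6, 0.8], "3": [0.0, 1.0]}
ranking = rank_by_distance([0.0, 1.0], toy_docs)
for doc_id, distance in ranking:
    print(f"ID: {doc_id}  distance: {distance:.6f}")
```

A smaller distance means the two vectors point in a more similar direction, which is why the query above returns the document with the lowest value.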

Expected Result

- The Most Relevant Document and Its Distance to the Query:
- ID: 2
    content: Python is the preferred programming language for data science
    distance: 0.139745
