seekdb Vector integration with Ollama

seekdb provides capabilities for storing vector data, indexing vectors, and performing embedding-based vector searches. You can store vectorized data in seekdb for use in subsequent searches.

Ollama is an open-source tool that allows you to easily run and manage large language models (LLMs) such as gpt-oss, Gemma 3, DeepSeek-R1, and Qwen3 locally.

Prerequisites

  • You have deployed seekdb.

  • Your environment has a usable database and account, and the account has read and write permissions on the database.

  • Python 3.11 or later is installed.

  • Dependencies are installed.

    python3 -m pip install cffi pyseekdb requests ollama

Step 1: Obtain database connection information

Contact the seekdb deployment team or administrator to obtain the database connection string, for example:

mysql -h$host -P$port -u$user_name -p$password -D$database_name

Parameter description:

  • $host: the IP address for connecting to seekdb.

  • $port: the port for connecting to seekdb, defaulting to 2881.

  • $database_name: the name of the database to access.

    tip

    The user needs to have the CREATE, INSERT, DROP, and SELECT privileges for the database.

  • $user_name: the database connection account.

  • $password: the account password.
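To make the mapping between the command-line flags and the parameters concrete, the values can be pulled out of such a connection string with a short sketch (the host, port, and names below are hypothetical examples, not real credentials):

```python
import re

# A hypothetical connection command of the form shown above
conn = "mysql -h127.0.0.1 -P2881 -uroot -pMyPass -Dtest_db"

# Extract each flag's value: -h host, -P port, -u user, -p password, -D database
pattern = r"-h(\S+)\s+-P(\S+)\s+-u(\S+)\s+-p(\S+)\s+-D(\S+)"
host, port, user, password, database = re.search(pattern, conn).groups()

print(host, port, user, password, database)
```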

Step 2: Build your AI assistant

Set environment variables

Obtain the seekdb connection information and set it as environment variables. The variable names must match those read by the example code below.

export SEEKDB_DATABASE_HOST=YOUR_SEEKDB_DATABASE_HOST
export SEEKDB_DATABASE_PORT=YOUR_SEEKDB_DATABASE_PORT
export SEEKDB_DATABASE_USER=YOUR_SEEKDB_DATABASE_USER
export SEEKDB_DATABASE_DB_NAME=YOUR_SEEKDB_DATABASE_DB_NAME
export SEEKDB_DATABASE_PASSWORD=YOUR_SEEKDB_DATABASE_PASSWORD

Install Ollama and pull the Qwen3-embedding model

curl -fsSL https://ollama.ai/install.sh | sh
export PATH="/usr/local/bin:$PATH"
ollama --version
ollama pull qwen3-embedding:latest

Example code snippet

Load data

Here's an example using the Qwen3-embedding model. We'll use the vector search FAQ from the seekdb Vector documentation as private knowledge, splitting the file into segments at each "# " heading marker.

import os

import ollama
import pyseekdb
import requests
from tqdm import tqdm
from pyseekdb import HNSWConfiguration

# Download the FAQ text and split it into segments at each "# " heading
url = "https://raw.githubusercontent.com/oceanbase/oceanbase-doc/refs/heads/V4.3.5/en-US/640.ob-vector-search/800.ob-vector-search-faq.md"
response = requests.get(url)
text_lines = response.text.split("# ")


def emb_text(text):
    # Generate the embedding for the text via the Ollama API
    response = ollama.embeddings(model="qwen3-embedding", prompt=text)
    return response["embedding"]


ids = []
embeddings = []
documents = []

for i, text in enumerate(tqdm(text_lines, desc="Creating embeddings")):
    embedding = emb_text(text)
    ids.append(f"{i+1}")
    embeddings.append(embedding)
    documents.append(text)

print(f"Successfully processed {len(documents)} texts")

Define a table and store data in seekdb

Create a table named ollama_seekdb_demo_documents and store each text segment, together with the embedding vector generated by Qwen3-embedding, in seekdb:

SEEKDB_DATABASE_HOST = os.getenv('SEEKDB_DATABASE_HOST')
SEEKDB_DATABASE_PORT = int(os.getenv('SEEKDB_DATABASE_PORT', 2881))
SEEKDB_DATABASE_USER = os.getenv('SEEKDB_DATABASE_USER')
SEEKDB_DATABASE_DB_NAME = os.getenv('SEEKDB_DATABASE_DB_NAME')
SEEKDB_DATABASE_PASSWORD = os.getenv('SEEKDB_DATABASE_PASSWORD')

client = pyseekdb.Client(
    host=SEEKDB_DATABASE_HOST,
    port=SEEKDB_DATABASE_PORT,
    database=SEEKDB_DATABASE_DB_NAME,
    user=SEEKDB_DATABASE_USER,
    password=SEEKDB_DATABASE_PASSWORD
)

table_name = "ollama_seekdb_demo_documents"

# Qwen3-embedding produces 4096-dimensional vectors; index them with L2 distance
config = HNSWConfiguration(dimension=4096, distance='l2')
collection = client.create_collection(
    name=table_name,
    configuration=config,
    embedding_function=None
)

print('- Inserting Data to seekdb...')
collection.add(
    ids=ids,
    embeddings=embeddings,
    documents=documents
)

Query data

Generate an embedding vector for the query text using the Qwen3-embedding model, then search for the most relevant documents based on the L2 distance between the query vector and each vector in the vector table:

query = 'Which mode supports vector search?'
query_embedding = emb_text(query)

res = collection.query(
    query_embeddings=query_embedding,
    n_results=1
)

print('- The Most Relevant Document and Its Distance to the Query:')
for doc_id, document, distance in zip(
    res['ids'][0],
    res['documents'][0],
    res['distances'][0]
):
    print(f'- ID: {doc_id}')
    print(f'  content: {document}')
    print(f'  distance: {distance:.6f}')

Expected result

- ID: 3
content: In which mode is vector search supported?

Vector search is supported in the MySQL-compatible mode of OceanBase Database. It is not supported in the Oracle-compatible mode.

distance: 0.547024
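The distance reported above is the L2 (Euclidean) distance between the query vector and the stored document vector. As a sanity check of the metric itself, it can be reproduced in plain Python on two small vectors (the vectors below are made up for illustration; real Qwen3-embedding vectors have 4096 dimensions):

```python
import math

def l2_distance(a, b):
    # Euclidean distance: square root of the sum of squared component differences
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Two small made-up vectors for illustration
query_vec = [0.1, 0.2, 0.3]
doc_vec = [0.2, 0.0, 0.3]

# sqrt(0.01 + 0.04 + 0.0) = sqrt(0.05)
print(f"{l2_distance(query_vec, doc_vec):.6f}")  # prints 0.223607
```

A smaller distance means the stored document is closer to the query in the embedding space, which is why the top result above is the FAQ entry that answers the question.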