
Integrate seekdb with Google Gemini

seekdb provides capabilities for vector type storage, vector indexing, and embedding vector search. You can store vectorized data in seekdb for subsequent searches.

Google Gemini AI is a series of multimodal large language models (LLMs) developed by Google. It is designed to understand and process various types of data, including text, code, images, audio, and video.

Prerequisites

  • seekdb is deployed.

  • Your environment has a usable database and account, and the account has read and write permissions on the database.

  • Python 3.11 or later is installed.

  • Dependencies are installed.

    python3 -m pip install cffi pyseekdb requests google-genai

Step 1: Obtain database connection information

Contact the seekdb deployment team or administrator to obtain the database connection string, for example:

mysql -h$host -P$port -u$user_name -p$password -D$database_name

Parameter description:

  • $host: The IP address for connecting to seekdb.

  • $port: The port for connecting to seekdb, which is 2881 by default.

  • $database_name: The name of the database to be accessed.

    tip

    The user for the connection must have the CREATE, INSERT, DROP, and SELECT permissions on the database.

  • $user_name: The database connection account.

  • $password: The account password.
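For example, for a local deployment on the default port with a database named test (all values below are hypothetical placeholders; substitute your own), the filled-in command looks like:

```shell
# Hypothetical values -- replace host, user, password, and database with your own
mysql -h127.0.0.1 -P2881 -utest_user -p'your_password' -Dtest
```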

Step 2: Build your AI assistant

Set environment variables

Obtain your Google Gemini API key and the seekdb connection information, then configure the following environment variables.

export SEEKDB_DATABASE_HOST=YOUR_SEEKDB_DATABASE_HOST
export SEEKDB_DATABASE_PORT=YOUR_SEEKDB_DATABASE_PORT
export SEEKDB_DATABASE_USER=YOUR_SEEKDB_DATABASE_USER
export SEEKDB_DATABASE_DB_NAME=YOUR_SEEKDB_DATABASE_DB_NAME
export SEEKDB_DATABASE_PASSWORD=YOUR_SEEKDB_DATABASE_PASSWORD
export GEMINI_API_KEY=YOUR_GEMINI_API_KEY

Sample code snippet

Load data

Here is an example using the text-embedding-004 model to generate vector data with the Google Gemini AI API:

import os

import pyseekdb
from google import genai
from google.genai import types
from pyseekdb import HNSWConfiguration
from tqdm import tqdm

documents = [
"Artificial intelligence was founded as an academic discipline in 1956.",
"Alan Turing was the first person to conduct substantial research in AI.",
"Born in Maida Vale, London, Turing was raised in southern England.",
]

genai_client = genai.Client(api_key=os.environ["GEMINI_API_KEY"])

def generate_embeddings(text):
    """Generate an embedding for a single text using the Gemini API."""
    response = genai_client.models.embed_content(
        model="text-embedding-004",
        contents=[text],
        config=types.EmbedContentConfig(output_dimensionality=768),
    )
    return response.embeddings[0].values


ids = []
embeddings = []
documents_list = []

for i, text in enumerate(tqdm(documents, desc="Creating embeddings")):
    # Generate the embedding for the text via the Gemini API
    embedding = generate_embeddings(text)
    ids.append(f"{i+1}")
    embeddings.append(embedding)
    documents_list.append(text)

print(f"Successfully processed {len(documents_list)} texts")

Define a table and store data in seekdb

Create a table named gemini_seekdb_demo_documents and store the data in seekdb:

SEEKDB_DATABASE_HOST = os.getenv('SEEKDB_DATABASE_HOST')
SEEKDB_DATABASE_PORT = int(os.getenv('SEEKDB_DATABASE_PORT', 2881))
SEEKDB_DATABASE_USER = os.getenv('SEEKDB_DATABASE_USER')
SEEKDB_DATABASE_DB_NAME = os.getenv('SEEKDB_DATABASE_DB_NAME')
SEEKDB_DATABASE_PASSWORD = os.getenv('SEEKDB_DATABASE_PASSWORD')

client = pyseekdb.Client(
    host=SEEKDB_DATABASE_HOST,
    port=SEEKDB_DATABASE_PORT,
    database=SEEKDB_DATABASE_DB_NAME,
    user=SEEKDB_DATABASE_USER,
    password=SEEKDB_DATABASE_PASSWORD,
)

table_name = "gemini_seekdb_demo_documents"
config = HNSWConfiguration(dimension=768, distance='l2')
collection = client.create_collection(
    name=table_name,
    configuration=config,
    embedding_function=None
)

print('- Inserting Data to seekdb...')
collection.add(
    ids=ids,
    embeddings=embeddings,
    documents=documents
)

Generate a vector for the query text using the text-embedding-004 model, then search for the most relevant documents based on the L2 distance between the query vector and each vector in the vector table:

query = 'When was artificial intelligence founded?'
# Use the same model and output dimensionality as at ingestion time
quest_embed = genai_client.models.embed_content(
    model="text-embedding-004",
    contents=[query],
    config=types.EmbedContentConfig(output_dimensionality=768),
)
res = collection.query(
    query_embeddings=quest_embed.embeddings[0].values,
    n_results=1
)

print('- The Most Relevant Document and Its Distance to the Query:')
for doc_id, document, distance in zip(
    res['ids'][0],
    res['documents'][0],
    res['distances'][0]
):
    print(f'  - ID: {doc_id}')
    print(f'    content: {document}')
    print(f'    distance: {distance:.6f}')
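The L2 (Euclidean) distance used for ranking above can be sketched in plain Python. The vectors below are toy 3-dimensional stand-ins for the 768-dimensional embeddings; a smaller distance means the document is closer to the query:

```python
import math

def l2_distance(a, b):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Toy vectors standing in for 768-dimensional embeddings
query_vec = [0.1, 0.2, 0.3]
doc_vec = [0.1, 0.2, 0.7]

print(l2_distance(query_vec, doc_vec))  # smaller distance = more similar
```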

Expected result

  - ID: 1
    content: Artificial intelligence was founded as an academic discipline in 1956.
    distance: 0.601928