
Integrate seekdb with CamelAI

seekdb provides vector storage, vector indexing, and embedding-based vector search. You can store vectorized data in seekdb and query it in later searches.

CamelAI lets teams interact with data in natural language, turning questions into precise SQL queries, intelligent analysis, and visualizations.

Prerequisites

  • You have deployed seekdb.

  • You have a database in seekdb and an account with read and write permissions on it.

  • You have installed Python 3.11 or later.

  • You have installed the required dependencies.

    python3 -m pip install "unstructured[pdf]" camel-ai pyobvector

Step 1: Obtain database connection information

Contact the seekdb deployment personnel or administrator to obtain the database connection string. For example:

mysql -h$host -P$port -u$user_name -p$password -D$database_name

Parameter description:

  • $host: the IP address for connecting to seekdb.

  • $port: the port for connecting to seekdb, which defaults to 2881.

  • $database_name: the name of the database to access.

    tip

    The user must have the CREATE, INSERT, DROP, and SELECT permissions on the database.

  • $user_name: the database connection account.

  • $password: the account password.
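If the administrator gives you the connection string in the mysql command form shown above, you can extract the parameters programmatically. The sketch below uses a hypothetical helper, parse_mysql_cli. It assumes each option is attached to its value (-h127.0.0.1, not -h 127.0.0.1) and that OceanBaseStorage expects its uri as host:port. It maps the values onto the environment variables used in Step 2.

```python
import shlex


def parse_mysql_cli(command: str) -> dict:
    """Extract connection settings from a mysql CLI command string.

    Hypothetical helper: assumes the attached option form shown above
    (-hHOST, -PPORT, -uUSER, -pPASSWORD, -DDATABASE).
    """
    flags = {"-h": "host", "-P": "port", "-u": "user", "-p": "password", "-D": "database"}
    settings = {"port": "2881"}  # seekdb's default port if -P is omitted
    for token in shlex.split(command):
        prefix, value = token[:2], token[2:]
        # Skip the program name and options given without an attached value.
        if prefix in flags and value:
            settings[flags[prefix]] = value
    return settings


def to_env(settings: dict) -> dict:
    """Map parsed settings onto the environment variables used in Step 2."""
    return {
        # Assumption: the storage uri is expected in host:port form.
        "SEEKDB_DATABASE_URL": f"{settings['host']}:{settings['port']}",
        "SEEKDB_DATABASE_USER": settings["user"],
        "SEEKDB_DATABASE_PASSWORD": settings.get("password", ""),
        "SEEKDB_DATABASE_DB_NAME": settings["database"],
    }


cmd = "mysql -h127.0.0.1 -P2881 -uroot -pmy_pass -Dtest"
for name, value in to_env(parse_mysql_cli(cmd)).items():
    print(f"export {name}={value}")
```

shlex.split is used instead of str.split so that quoted values, such as a password containing spaces, stay in one token.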

Step 2: Build your AI assistant

Set environment variables

Obtain a Jina AI API key. Then set the key and the seekdb connection information as environment variables. SEEKDB_DATABASE_URL takes the form host:port, for example 127.0.0.1:2881.

export SEEKDB_DATABASE_URL=YOUR_SEEKDB_DATABASE_URL
export SEEKDB_DATABASE_USER=YOUR_SEEKDB_DATABASE_USER
export SEEKDB_DATABASE_DB_NAME=YOUR_SEEKDB_DATABASE_DB_NAME
export SEEKDB_DATABASE_PASSWORD=YOUR_SEEKDB_DATABASE_PASSWORD
export JINAAI_API_KEY=YOUR_JINAAI_API_KEY
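A missing variable otherwise surfaces later as a confusing connection or authentication error. The minimal sketch below checks that all five variables are set before any connection is attempted. The function name load_settings is hypothetical. It checks only that each variable is set, not whether the value is valid, so an intentionally empty password is accepted.

```python
import os

# The variables exported above.
REQUIRED_VARS = (
    "SEEKDB_DATABASE_URL",
    "SEEKDB_DATABASE_USER",
    "SEEKDB_DATABASE_DB_NAME",
    "SEEKDB_DATABASE_PASSWORD",
    "JINAAI_API_KEY",
)


def load_settings(environ=os.environ) -> dict:
    """Return the required settings, or raise listing every unset variable."""
    missing = [name for name in REQUIRED_VARS if name not in environ]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: environ[name] for name in REQUIRED_VARS}


if __name__ == "__main__":
    try:
        load_settings()
        print("All required variables are set.")
    except RuntimeError as exc:
        print(exc)
```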

Load data

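CamelAI supports various embedding models, such as OpenAIEmbedding, VisionLanguageEmbedding, and JinaEmbedding. This example uses JinaEmbedding with the jina-embeddings-v3 model to embed a small set of AI-related documents.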

import os

from camel.embeddings import JinaEmbedding
from camel.retrievers import VectorRetriever
from camel.storages import OceanBaseStorage
from camel.types import EmbeddingModelType

documents = [
"""Artificial Intelligence (AI) is a branch of computer science that aims to create systems capable of performing tasks that typically require human intelligence. AI encompasses multiple subfields including machine learning, deep learning, natural language processing, and computer vision.""",
"""Machine Learning is a subset of artificial intelligence that enables computers to learn and improve without being explicitly programmed. The main types of machine learning include supervised learning, unsupervised learning, and reinforcement learning.""",
"""Deep Learning is a branch of machine learning that uses multi-layered neural networks to simulate how the human brain works. Deep learning has achieved breakthrough progress in areas such as image recognition, speech recognition, and natural language processing.""",
"""Natural Language Processing (NLP) is a branch of artificial intelligence that focuses on enabling computers to understand, interpret, and generate human language. NLP applications include machine translation, sentiment analysis, text summarization, and chatbots.""",
"""Computer Vision is a field of artificial intelligence that aims to enable computers to identify and understand content in digital images and videos. Applications include facial recognition, object detection, medical image analysis, and autonomous vehicles.""",
"""Reinforcement Learning is a machine learning method where an agent learns how to make decisions through interaction with an environment. The agent optimizes its behavioral strategy through trial and error and reward mechanisms.""",
"""Neural Networks are computational models inspired by biological neural systems, composed of interconnected nodes (neurons). Neural networks can learn complex patterns and relationships and serve as the foundation for deep learning.""",
"""Large Language Models (LLMs) are natural language processing models based on deep learning. These models are trained on vast amounts of text data and can generate human-like text and answer questions.""",
"""Transformer architecture is a neural network architecture that has revolutionized natural language processing. It uses attention mechanisms to process sequential data and forms the basis for models like GPT and BERT.""",
"""Generative AI refers to artificial intelligence systems that can create new content, including text, images, audio, and video. Examples include ChatGPT for text generation, DALL-E for image creation, and various AI tools for creative applications."""
]

# Read the Jina AI API key and create the embedding model (jina-embeddings-v3).
JINAAI_API_KEY = os.getenv("JINAAI_API_KEY")
embedding = JinaEmbedding(
    api_key=JINAAI_API_KEY,
    model_type=EmbeddingModelType.JINA_EMBEDDINGS_V3,
)
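An embedding model maps every text to a vector of the same fixed length. JinaEmbedding reports that length through get_output_dim(), and the next step passes it to OceanBaseStorage as vector_dim. The toy feature-hashing embedding below illustrates that contract. It is a hypothetical offline stand-in and is not part of CamelAI. Every input yields a vector of the same dimension, and the output is L2-normalized so that cosine similarity reduces to a dot product. It captures only word overlap, not meaning, so it is suitable for testing the pipeline's plumbing, not for real retrieval.

```python
import hashlib
import math
import re


class HashingEmbedding:
    """Hypothetical offline stand-in for an embedding model (feature hashing)."""

    def __init__(self, dim: int = 64):
        self.dim = dim

    def get_output_dim(self) -> int:
        return self.dim

    def embed(self, text: str) -> list[float]:
        vec = [0.0] * self.dim
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            # Stable hash: the built-in hash() is salted per process.
            bucket = int(hashlib.md5(word.encode()).hexdigest(), 16) % self.dim
            vec[bucket] += 1.0
        norm = math.sqrt(sum(v * v for v in vec))
        # L2-normalize so that cosine similarity equals the dot product.
        return [v / norm for v in vec] if norm else vec


toy = HashingEmbedding(dim=64)
vector = toy.embed("Generative AI refers to artificial intelligence systems")
print(len(vector) == toy.get_output_dim())
```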

Connect to seekdb, define the vector table structure, and store the data in seekdb

Create a table named my_seekdb_vector_table with the columns id, embedding, and metadata. Then use the Jina AI Embeddings API to generate a vector for each text segment and store it in seekdb:

# Read the seekdb connection settings from the environment.
OB_URI = os.getenv("SEEKDB_DATABASE_URL")
OB_USER = os.getenv("SEEKDB_DATABASE_USER")
OB_DB_NAME = os.getenv("SEEKDB_DATABASE_DB_NAME")
OB_PASSWORD = os.getenv("SEEKDB_DATABASE_PASSWORD")

# Connect to seekdb and create the vector table. The vector dimension must
# match the output dimension of the embedding model.
ob_storage = OceanBaseStorage(
    vector_dim=embedding.get_output_dim(),
    table_name="my_seekdb_vector_table",
    uri=OB_URI,
    user=OB_USER,
    password=OB_PASSWORD,
    db_name=OB_DB_NAME,
    distance="cosine",
)

vector_retriever = VectorRetriever(
embedding_model=embedding, storage=ob_storage
)

# Embed each document and write it to the table.
for i, doc in enumerate(documents):
    print(f"Processing document {i+1}/{len(documents)}")
    vector_retriever.process(content=doc)
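Because the table was created with distance="cosine", rows are ranked by cosine distance (1 minus cosine similarity) to the query vector. The sketch below illustrates that ranking with a brute-force, in-memory search over rows shaped like the table's id, embedding, and metadata columns. The Row layout and the helper names are assumptions for illustration. This is not how seekdb executes the query: a vector index can answer it without comparing the query against every row.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Row:
    # Mirrors the id / embedding / metadata layout described above (assumed).
    id: str
    embedding: list[float]
    metadata: dict = field(default_factory=dict)


def cosine_distance(a: list[float], b: list[float]) -> float:
    """1 - cosine similarity: 0 means same direction, 2 means opposite."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / (norm_a * norm_b)


def top_k(rows: list[Row], query: list[float], k: int = 1) -> list[tuple[Row, float]]:
    """Exhaustive search: score every row and return the k closest."""
    scored = [(row, cosine_distance(row.embedding, query)) for row in rows]
    scored.sort(key=lambda pair: pair[1])
    return scored[:k]


rows = [
    Row("a", [1.0, 0.0], {"text": "about AI"}),
    Row("b", [0.0, 1.0], {"text": "about cooking"}),
    Row("c", [0.7, 0.7], {"text": "about both"}),
]
for row, dist in top_k(rows, query=[0.9, 0.1], k=2):
    print(row.id, round(dist, 3))
```

Exhaustive search is exact but scales linearly with the number of rows. This is the cost that a vector index is designed to avoid.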

Generate a vector for the query text using the Jina AI API and search for the most relevant documents based on the cosine distance between the query vector and each vector in the vector table:

retrieved_info = vector_retriever.query(query="What is generative AI?", top_k=1)
print(retrieved_info)

Expected result

[{'similarity score': '0.8538218656447916', 'content path': 'Generative AI refers to artificial intelligence systems that can create new content, including text,', 'metadata': {'piece_num': 1}, 'extra_info': {}, 'text': 'Generative AI refers to artificial intelligence systems that can create new content, including text, images, audio, and video. Examples include ChatGPT for text generation, DALL-E for image creation, and various AI tools for creative applications.'}]
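The query returns a list of dictionaries in which the similarity score is a string. The minimal post-processing sketch below converts the scores to floats, drops weak matches, and sorts the remaining passages from best to worst. It assumes the 'similarity score' and 'text' keys shown in the expected result above. The function name best_passages and its 0.5 default threshold are hypothetical choices that should be tuned for your data.

```python
def best_passages(results: list[dict], min_score: float = 0.5) -> list[tuple[float, str]]:
    """Return (score, text) pairs at or above min_score, best match first.

    Assumes each result has a 'similarity score' (string) and a 'text' key.
    """
    hits = [(float(item["similarity score"]), item["text"]) for item in results]
    hits = [hit for hit in hits if hit[0] >= min_score]
    return sorted(hits, key=lambda hit: hit[0], reverse=True)


# Abbreviated copy of the expected result above (text trimmed, keys reduced).
retrieved_info = [
    {
        "similarity score": "0.8538218656447916",
        "metadata": {"piece_num": 1},
        "text": "Generative AI refers to artificial intelligence systems that can create new content, including text, images, audio, and video.",
    }
]

for score, text in best_passages(retrieved_info):
    print(f"{score:.3f}  {text}")
```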