Integrate seekdb vector search with LlamaIndex
seekdb supports vector data storage, vector indexing, and embedding-based vector search. You can store embedding vectors in seekdb and run similarity searches over them.
LlamaIndex is a framework for building context-augmented generative AI applications with large language models (LLMs), including agents and workflows. It provides a wealth of capabilities such as data connectors, data indexes, agents, observability/evaluation integrations, and workflows.
This topic demonstrates how to integrate the vector search feature of seekdb with the Tongyi Qianwen (Qwen) API and LlamaIndex for Document Question Answering (DQA).
Prerequisites
- You have deployed the seekdb database.
- Your environment has a database and an account with read and write privileges.
- Vector search is enabled. It is controlled by the `ob_vector_memory_limit_percentage` parameter; we recommend keeping the default value of `0` (adaptive mode). For more precise configuration settings, see the relevant configuration documentation.
- You have installed Python 3.9 or later.
- You have installed the required dependencies:

  python3 -m pip install llama-index-vector-stores-oceanbase llama-index
  python3 -m pip install llama-index-embeddings-dashscope
  python3 -m pip install llama-index-llms-dashscope

- You have obtained a Qwen API key.
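You can verify the Python version requirement with a quick check before installing the dependencies (a minimal sketch):

```python
import sys

# The LlamaIndex integration used below requires Python 3.9 or later.
if sys.version_info < (3, 9):
    raise SystemExit(f"Python 3.9+ required, found {sys.version.split()[0]}")
print("Python version OK:", sys.version.split()[0])
```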
Step 1: Obtain the database connection information
Contact the seekdb database deployment personnel or administrator to obtain the database connection string. For example:
obclient -h$host -P$port -u$user_name -p$password -D$database_name
Parameters:

- `$host`: The IP address for connecting to the seekdb database.
- `$port`: The port for connecting to the seekdb database. The default value is `2881`, which can be customized during deployment.
- `$database_name`: The name of the database to access.

  **Notice**: The user connecting to the database must have the `CREATE`, `INSERT`, `DROP`, and `SELECT` privileges on the database.

- `$user_name`: The database account, in the format of `username`.
- `$password`: The password for the account.
For more information about the connection string, see Connect to OceanBase Database by using OBClient.
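Before moving on, you may want to confirm that the database endpoint is reachable from the machine that will run the Python code. A minimal sketch using only the standard library (the host and port below are placeholders; substitute the values from your connection string):

```python
import socket

def endpoint_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Replace with your seekdb host and port, e.g.:
# print(endpoint_reachable("127.0.0.1", 2881))
```

This only checks network reachability; authentication and privileges are still verified when the client connects in Step 2.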
Step 2: Build your AI assistant
Set the environment variable for the Qwen API key
Create a Qwen API key and set it as an environment variable.
export DASHSCOPE_API_KEY="YOUR_DASHSCOPE_API_KEY"
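The code in the next steps reads this variable through `os.environ`. A fail-fast check (a sketch; the helper name is our own) can catch a missing key early instead of failing later inside an API call:

```python
import os

def require_api_key(name: str = "DASHSCOPE_API_KEY") -> str:
    """Return the API key from the environment, or raise a clear error if unset."""
    key = os.environ.get(name, "")
    if not key:
        raise RuntimeError(f"{name} is not set; export it before running the example.")
    return key
```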
Download the sample data
mkdir -p '/root/llamaindex/paul_graham/'
wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/paul_graham/paul_graham_essay.txt' -O '/root/llamaindex/paul_graham/paul_graham_essay.txt'
Load the data text
import os

from llama_index.core import (
    Settings,
    SimpleDirectoryReader,
    StorageContext,
    VectorStoreIndex,
)
from llama_index.embeddings.dashscope import DashScopeEmbedding
from llama_index.llms.dashscope import DashScope, DashScopeGenerationModels
from llama_index.vector_stores.oceanbase import OceanBaseVectorStore
from pyobvector import ObVecClient

# Connect to seekdb. Replace the URI, user, password, and database
# name with the connection information obtained in Step 1.
client = ObVecClient(
    uri="127.0.0.1:2881",
    user="root@test",
    password="",
    db_name="test",
)

# Use the DashScope (Qwen) embedding model globally.
Settings.embed_model = DashScopeEmbedding()

# Configure the Qwen LLM used to answer questions.
dashscope_llm = DashScope(
    model_name=DashScopeGenerationModels.QWEN_MAX,
    api_key=os.environ.get("DASHSCOPE_API_KEY", ""),
)

# Load the sample document from disk.
documents = SimpleDirectoryReader("/root/llamaindex/paul_graham/").load_data()

# Store the embeddings in seekdb. dim must match the embedding model's
# output dimension; drop_old recreates the table if it already exists;
# normalize stores unit-length vectors.
oceanbase = OceanBaseVectorStore(
    client=client,
    dim=1536,
    drop_old=True,
    normalize=True,
)

storage_context = StorageContext.from_defaults(vector_store=oceanbase)
index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)
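The `normalize=True` option above stores unit-length vectors, so an inner-product search ranks results the same way cosine similarity would. A pure-Python sketch of the idea:

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length (L2 norm of 1)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

a = l2_normalize([3.0, 4.0])  # -> [0.6, 0.8]
b = l2_normalize([4.0, 3.0])  # -> [0.8, 0.6]

# For unit vectors, the inner product equals the cosine similarity.
print(dot(a, b))  # ≈ 0.96
```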
Vector search
This step queries the document paul_graham_essay.txt with the question "What did the author do growing up?".
# set Logging to DEBUG for more detailed outputs
query_engine = index.as_query_engine(llm=dashscope_llm)
res = query_engine.query("What did the author do growing up?")
res.response
Expected result:
'Growing up, the author worked on writing and programming outside of school. In terms of writing, he wrote short stories, which he now considers to be awful, as they had very little plot and focused mainly on characters with strong feelings. For programming, he started in 9th grade by trying to write programs on an IBM 1401 at his school, using an early version of Fortran. Later, after getting a TRS-80 microcomputer, he began to write more practical programs, including simple games, a program to predict the flight height of model rockets, and a word processor that his father used for writing.'