Qwen

Qwen is a large language model developed by Alibaba Cloud. It can understand and analyze user input. You can use the API service of Qwen in the Model Experience Center of Alibaba Cloud.

seekdb provides vector storage, vector indexes, and embedding-based vector search. You can use the Qwen API to generate embeddings for your data, store the vectors in seekdb, and then use seekdb's vector search capabilities to query relevant data.

Prerequisites

You have deployed seekdb.
Your environment contains a MySQL database and account, and the database account has read and write permissions.
You have installed Python 3.9 or later and the corresponding pip.
You have installed pyseekdb, the DashScope SDK, and the other packages used by the scripts in this topic.
```
python3 -m pip install pyseekdb dashscope pandas cffi
```
You have prepared the Qwen API key.

Step 1: Obtain the seekdb connection string

Contact the seekdb deployment personnel or administrator to obtain the corresponding database connection string, for example:

mysql -h$host -P$port -u$user_name -p$password -D$database_name

Parameters:

$host: the IP address for connecting to seekdb.
$port: the port for connecting to seekdb, which is 2881 by default.
$user_name: the database connection account.
$password: the account password.
$database_name: the name of the database to be accessed.

Tip: The user for the connection must have the CREATE, INSERT, DROP, and SELECT permissions on the database.
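
The scripts later in this topic need the same values as separate fields rather than as one command line. The following is a minimal sketch of splitting the command into those fields. The helper name `parse_mysql_command` and the sample values are hypothetical, and the sketch assumes each value is attached directly to its flag, as in the example above.

```python
import shlex

# Maps each mysql client flag to the field it carries.
FLAGS = {"-h": "host", "-P": "port", "-u": "user", "-p": "password", "-D": "database"}

def parse_mysql_command(command: str) -> dict:
    """Split a `mysql -h.. -P.. -u.. -p.. -D..` command into connection fields."""
    params = {}
    for token in shlex.split(command)[1:]:  # skip the leading "mysql"
        name = FLAGS.get(token[:2])
        if name is not None:
            params[name] = token[2:]
    # Fall back to the default seekdb port when -P is absent.
    params["port"] = int(params.get("port", 2881))
    return params

example = parse_mysql_command("mysql -h127.0.0.1 -P2881 -uroot -psecret -Dtest")
print(example)
```

The resulting fields correspond to the host, port, user, password, and database values configured in the next step.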

Step 2: Configure the Qwen API key and seekdb connection information in the environment variables

For Unix-based systems (such as Ubuntu or macOS), run the following commands in the terminal:

export DASHSCOPE_API_KEY="YOUR_DASHSCOPE_API_KEY"
export SEEKDB_DATABASE_HOST=YOUR_SEEKDB_DATABASE_HOST
export SEEKDB_DATABASE_PORT=YOUR_SEEKDB_DATABASE_PORT
export SEEKDB_DATABASE_USER=YOUR_SEEKDB_DATABASE_USER
export SEEKDB_DATABASE_DB_NAME=YOUR_SEEKDB_DATABASE_DB_NAME
export SEEKDB_DATABASE_PASSWORD=YOUR_SEEKDB_DATABASE_PASSWORD

For Windows, run the following commands in the command prompt:

set DASHSCOPE_API_KEY=YOUR_DASHSCOPE_API_KEY
set SEEKDB_DATABASE_HOST=YOUR_SEEKDB_DATABASE_HOST
set SEEKDB_DATABASE_PORT=YOUR_SEEKDB_DATABASE_PORT
set SEEKDB_DATABASE_USER=YOUR_SEEKDB_DATABASE_USER
set SEEKDB_DATABASE_DB_NAME=YOUR_SEEKDB_DATABASE_DB_NAME
set SEEKDB_DATABASE_PASSWORD=YOUR_SEEKDB_DATABASE_PASSWORD
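
Before running the scripts in the following steps, it can help to confirm that the variables are visible to Python. The following is a minimal sketch. The helper name `missing_env_vars` is hypothetical, and `SEEKDB_DATABASE_PORT` is treated as optional here because the scripts in this topic fall back to 2881 when it is unset.

```python
import os

# Variables the scripts in this topic read without a default value.
REQUIRED_VARS = [
    "DASHSCOPE_API_KEY",
    "SEEKDB_DATABASE_HOST",
    "SEEKDB_DATABASE_USER",
    "SEEKDB_DATABASE_DB_NAME",
    "SEEKDB_DATABASE_PASSWORD",
]

def missing_env_vars(env=None) -> list:
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]

missing = missing_env_vars()
if missing:
    print("Missing environment variables:", ", ".join(missing))
else:
    print("All required environment variables are set.")
```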

Step 3: Store the vector data in seekdb

Prepare the test data

Download the CSV file containing 1,000 rows of fine food reviews. The last column in the CSV file contains the vector values, so you do not need to compute the vectors. You can also use the following code to recompute the embedding column (vector column) and generate a new CSV file.

import dashscope
import pandas as pd
from dashscope import TextEmbedding

input_datapath = "./fine_food_reviews.csv"
# Here, the text_embedding_v1 embedding model is used. You can adjust it as needed.
def generate_embeddings(text):
    rsp = dashscope.TextEmbedding.call(model=TextEmbedding.Models.text_embedding_v1, input=text)
    embeddings = [record['embedding'] for record in rsp.output['embeddings']]
    return embeddings if isinstance(text, list) else embeddings[0]

df = pd.read_csv(input_datapath, index_col=0)
# The actual generation process will take a few minutes. The embeddings are generated by calling the Qwen Embedding API row by row.
df["embedding"] = df.combined.apply(generate_embeddings)
output_datapath = './fine_food_reviews_self_embeddings.csv'
df.to_csv(output_datapath)
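
The code above reads `rsp.output` directly, which fails with an unhelpful error if the embedding request itself was rejected (for example, because of an invalid API key). The following is a minimal sketch of checking the response first. It assumes the response object exposes `status_code`, `code`, `message`, and `output` attributes, and it uses stand-in objects instead of real API calls so that it runs without network access. The helper name `extract_embeddings` is hypothetical.

```python
from http import HTTPStatus
from types import SimpleNamespace

def extract_embeddings(rsp):
    """Return the embedding vectors from a response, or raise if the call failed."""
    if rsp.status_code != HTTPStatus.OK:
        raise RuntimeError(f"Embedding request failed: {rsp.code}: {rsp.message}")
    return [record["embedding"] for record in rsp.output["embeddings"]]

# Stand-in responses that mimic the assumed attribute layout.
ok = SimpleNamespace(
    status_code=200, code="", message="",
    output={"embeddings": [{"embedding": [0.1, 0.2]}]},
)
bad = SimpleNamespace(status_code=401, code="InvalidApiKey", message="Invalid API key", output=None)

print(extract_embeddings(ok))
try:
    extract_embeddings(bad)
except RuntimeError as err:
    print(err)
```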

Run the following script to insert the test data into seekdb. Make sure the script is in the same directory as the test data.

import os,csv,json
import pyseekdb
from pyseekdb import HNSWConfiguration

ids = []
embeddings = []
documents = []
metadatas = []
file_name = "fine_food_reviews_self_embeddings.csv"
file_path = os.path.join("./", file_name)
# Open and read the CSV file.
with open(file_path, mode='r', newline='', encoding='utf-8') as csvfile:
    csvreader = csv.reader(csvfile)
    headers = next(csvreader)
    print("Headers:", headers)
    for i, row in enumerate(csvreader):
        if not row or len(row) < 9:
            print(f"Skipping row {i+2}: incomplete data")
            continue

        ids.append(row[0])
        embeddings.append(json.loads(row[8]))
        documents.append(row[6])
        metadata = {
            "product_id": str(row[1]),
            "user_id": str(row[2]),
            "score": str(row[3]),
            "summary": str(row[4]),
            "n_tokens": str(row[7])
        }
        metadatas.append(metadata)

# Connect to seekdb by using pyseekdb
client = pyseekdb.Client(
    host=os.getenv('SEEKDB_DATABASE_HOST'),
    port=int(os.getenv('SEEKDB_DATABASE_PORT', 2881)),
    database=os.getenv('SEEKDB_DATABASE_DB_NAME'),
    user=os.getenv('SEEKDB_DATABASE_USER'),
    password=os.getenv('SEEKDB_DATABASE_PASSWORD')
)

table_name = 'fine_food_reviews'
config = HNSWConfiguration(dimension=1536, distance='cosine')
collection = client.create_collection(
    name=table_name,
    configuration=config,
    embedding_function=None
)

# Insert 100 rows each time.
batch_size = 100
total_records = len(ids)

for i in range(0, total_records, batch_size):
    end_idx = min(i + batch_size, total_records)
    batch_ids = ids[i:end_idx]
    batch_embeddings = embeddings[i:end_idx]
    batch_documents = documents[i:end_idx]
    batch_metadatas = metadatas[i:end_idx]

    try:
        collection.add(
            ids=batch_ids,
            embeddings=batch_embeddings,
            documents=batch_documents,
            metadatas=batch_metadatas
        )
        print(f"Batch {i//batch_size + 1} inserted successfully!")
    except Exception as e:
        print(f"Batch {i//batch_size + 1} insertion failed: {e}")
        break

else:
    # Runs only when the loop finished without hitting `break`.
    print("All data insertion completed!")
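
The batching in the script above can be checked in isolation. The following is a minimal sketch of the same slicing logic as a standalone generator. The helper name `iter_batches` is hypothetical and is not part of pyseekdb.

```python
from typing import Iterator, Tuple

def iter_batches(total: int, batch_size: int) -> Iterator[Tuple[int, int]]:
    """Yield (start, end) index pairs that cover `total` records in order."""
    for start in range(0, total, batch_size):
        yield start, min(start + batch_size, total)

# 250 records with a batch size of 100 give two full batches and one partial batch.
print(list(iter_batches(250, 100)))  # [(0, 100), (100, 200), (200, 250)]
```

Each pair can be used to slice the `ids`, `embeddings`, `documents`, and `metadatas` lists in step, so that every batch stays aligned across the four lists.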

Step 4: Query the seekdb database

Save the following Python script as query.py.

import os
import sys
import pyseekdb
import dashscope
from dashscope import TextEmbedding

# Read the query text from the command line.
if len(sys.argv) != 2:
    print("Usage: python3 query.py '<query text>'")
    sys.exit(1)
queryStatement = sys.argv[1]

# Connect to seekdb by using pyseekdb
client = pyseekdb.Client(
    host=os.getenv('SEEKDB_DATABASE_HOST'), 
    port=int(os.getenv('SEEKDB_DATABASE_PORT', 2881)), 
    database=os.getenv('SEEKDB_DATABASE_DB_NAME'), 
    user=os.getenv('SEEKDB_DATABASE_USER'), 
    password=os.getenv('SEEKDB_DATABASE_PASSWORD')
)

# Define the function for generating text vectors.
def generate_embeddings(text):
    rsp = dashscope.TextEmbedding.call(model=TextEmbedding.Models.text_embedding_v1, input=text)
    embeddings = [record['embedding'] for record in rsp.output['embeddings']]
    return embeddings if isinstance(text, list) else embeddings[0]

def query_ob(query, tableName, top_k=1):
    query_embedding = generate_embeddings(query)
    collection = client.get_collection(name=tableName)
    res = collection.query(
        query_embeddings=query_embedding,
        n_results=top_k
    )
    print('- The Most Relevant Document and Its Distance to the Query:')
    for i, (doc_id, document, distance) in enumerate(zip(
        res['ids'][0], 
        res['documents'][0], 
        res['distances'][0]
    )):
        print(f'  - ID: {doc_id}')
        print(f'    content: {document}')
        print(f'    distance: {distance:.6f}')
        
# Specify the table name.
table_name = 'fine_food_reviews'
query_ob(queryStatement,table_name,1)

Run the script with a query to retrieve the most relevant review.

python3 query.py 'pet food'

The expected result is as follows:

- The Most Relevant Document and Its Distance to the Query:
  - ID: 444
    content: Title: Healthy Dog Food; Content: This is a very healthy dog food. Good for their digestion. Also good for small puppies. My dog eats her required amount at every feeding.
    distance: 0.509108
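
The collection in this topic is created with `distance='cosine'`, so a smaller distance means a closer match. The following is a minimal sketch of how such a value can be computed, assuming the common definition of cosine distance as 1 minus cosine similarity. The exact formula that seekdb applies is not stated in this topic, and the helper name `cosine_distance` is hypothetical.

```python
import math

def cosine_distance(a, b):
    """Return 1 - cosine similarity for two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

print(cosine_distance([1.0, 0.0], [1.0, 0.0]))  # identical direction: 0.0
print(cosine_distance([1.0, 0.0], [0.0, 1.0]))  # orthogonal vectors: 1.0
```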
