
Qwen

Tongyi Qianwen (Qwen) is a large language model (LLM) developed by Alibaba Cloud for interpreting and analyzing user inputs. You can access the Qwen API through Alibaba Cloud Model Studio.

seekdb provides vector storage, vector indexing, and embedding-based vector search. With the Qwen API, you can convert data into vectors, store those vectors in seekdb, and then use seekdb's vector search to find relevant data.
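The following plain-Python example makes embedding-based search concrete. It makes no seekdb or Qwen calls. It ranks stored vectors by L2 (Euclidean) distance to a query vector, which is the exact ranking that seekdb's HNSW vector index approximates without scanning every row. The helper names `l2_distance` and `exact_top_k` and the toy 3-dimensional vectors are illustrative assumptions. The vectors used later in this guide have 1536 dimensions.

```python
import math
from typing import Sequence


def l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean (L2) distance between two vectors of equal dimension."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def exact_top_k(query: Sequence[float], rows: dict, k: int = 1) -> list:
    """Return the keys of the k stored vectors closest to `query` (brute-force scan)."""
    ranked = sorted(rows, key=lambda key: l2_distance(query, rows[key]))
    return ranked[:k]


# Toy vectors standing in for review embeddings (hypothetical data).
stored = {
    "review-a": [0.9, 0.1, 0.0],
    "review-b": [0.0, 1.0, 0.0],
    "review-c": [0.1, 0.0, 0.9],
}
print(exact_top_k([1.0, 0.0, 0.0], stored, k=2))  # → ['review-a', 'review-c']
```

A brute-force scan costs one distance computation per stored row. This is why seekdb builds a vector index (HNSW in Step 3) instead of comparing the query against all 1,000 rows.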

Prerequisites

  • You have deployed seekdb.

  • You have an existing MySQL database and account available in your environment, and the database account has been granted read and write privileges.

  • You have installed Python 3.9 or later and pip.

  • You have installed Poetry, pyobvector, and the DashScope SDK. The installation commands are as follows:

    pip install poetry
    pip install pyobvector
    pip install dashscope

    If you plan to regenerate the embeddings in Step 3, also install pandas:

    pip install pandas
  • You have obtained the Qwen API key.
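To confirm these prerequisites before running the scripts, you can use a small check like the following. It is a minimal sketch that assumes the import names pyobvector, dashscope, and sqlalchemy (sqlalchemy is installed as a dependency of pyobvector and is imported by the scripts in this guide). The function names `missing_modules` and `check_environment` are hypothetical helpers, not part of any SDK.

```python
import importlib.util
import sys

# Import names of the packages the scripts in this guide use.
# Add "pandas" if you plan to regenerate embeddings in Step 3.
REQUIRED_MODULES = ["pyobvector", "dashscope", "sqlalchemy"]


def missing_modules(names):
    """Return the subset of `names` that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def check_environment(names=REQUIRED_MODULES):
    """Exit with a message if Python is older than 3.9 or a package is missing."""
    if sys.version_info < (3, 9):
        raise SystemExit(f"Python 3.9 or later is required, found {sys.version.split()[0]}")
    missing = missing_modules(names)
    if missing:
        raise SystemExit("Missing packages: " + ", ".join(missing))
    print("Environment looks ready.")
```

Call check_environment() at the top of a script, or run it once in an interactive session after installing the packages.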

Step 1: Obtain the connection string of seekdb

Contact the seekdb deployment engineer or administrator to obtain the connection string of seekdb, for example:

obclient -h$host -P$port -u$user_name -p$password -D$database_name

Parameters:

  • $host: The IP address for connecting to seekdb.

  • $port: The port number for connecting to seekdb. The default is 2881.

  • $database_name: The name of the database to be accessed.

    Tip: The user account used for the connection must have the CREATE, INSERT, DROP, and SELECT privileges on the database.

  • $user_name: The database account.

  • $password: The password of the account.
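The scripts in this guide pass these same values to pyobvector's ObVecClient as uri="host:port", user, password, and db_name. The helper below maps the obclient parameters onto those keyword arguments. It also applies the rule stated in the scripts' comments: any "@" in the user name or password is written as "%40". It is a sketch. The function name `pyobvector_client_kwargs` and the example values are hypothetical, and the client construction is left commented out so the snippet runs without a database.

```python
def pyobvector_client_kwargs(host, port, user_name, password, database_name):
    """Build ObVecClient keyword arguments from obclient-style connection parameters.

    Any "@" in the user name or password is replaced with "%40", as this guide requires.
    """
    def escape(value):
        return value.replace("@", "%40")

    return {
        "uri": f"{host}:{port}",
        "user": escape(user_name),
        "password": escape(password),
        "db_name": database_name,
    }


# Hypothetical example values; substitute the ones from your connection string.
kwargs = pyobvector_client_kwargs("127.0.0.1", 2881, "root", "p@ss", "test")
print(kwargs)  # → {'uri': '127.0.0.1:2881', 'user': 'root', 'password': 'p%40ss', 'db_name': 'test'}

# With pyobvector installed and seekdb reachable:
#   from pyobvector import ObVecClient
#   client = ObVecClient(**kwargs)
```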

Step 2: Configure the environment variable for the Qwen API key

For a Unix-based system (such as Ubuntu or macOS), run the following command in the terminal:

export DASHSCOPE_API_KEY="YOUR_DASHSCOPE_API_KEY"

For Windows, run the following command in Command Prompt:

set DASHSCOPE_API_KEY=YOUR_DASHSCOPE_API_KEY

Replace YOUR_DASHSCOPE_API_KEY with your actual Qwen API key.
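The DashScope SDK reads DASHSCOPE_API_KEY from the environment. A missing or unreplaced key surfaces only when the first embedding request fails. The helper below is a minimal sketch that fails fast before any API call is made. The function name `get_dashscope_api_key` is a hypothetical helper, and treating the literal placeholder as unset is an assumption of this example.

```python
import os


def get_dashscope_api_key(env_var="DASHSCOPE_API_KEY"):
    """Return the Qwen (DashScope) API key from the environment, or raise if it is unset."""
    key = os.environ.get(env_var, "").strip()
    # Treat an empty value or the unreplaced placeholder as "not configured".
    if not key or key == "YOUR_DASHSCOPE_API_KEY":
        raise RuntimeError(f"{env_var} is not set; export it before running the scripts.")
    return key
```

If you prefer to set the key explicitly in code, you can assign it with dashscope.api_key = get_dashscope_api_key() after importing dashscope.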

Step 3: Store the vector data in seekdb

  1. Prepare the test data. Download the CSV file that already contains the vectorized data. The file contains 1,000 food reviews, and its last column holds the vector values, so you do not need to compute the vectors yourself. If you want to recompute the embeddings in the "embedding" column (the vector column), use the following code to generate a new CSV file:

    from http import HTTPStatus

    import dashscope
    import pandas as pd

    input_datapath = "./fine_food_reviews.csv"

    # Here the text_embedding_v1 model is used. You can change the model as needed.
    def generate_embeddings(text):
        rsp = dashscope.TextEmbedding.call(
            model=dashscope.TextEmbedding.Models.text_embedding_v1,
            input=text,
        )
        # Stop on API errors instead of failing later on a missing output.
        if rsp.status_code != HTTPStatus.OK:
            raise RuntimeError(f"Embedding request failed: {rsp.code} {rsp.message}")
        embeddings = [record['embedding'] for record in rsp.output['embeddings']]
        return embeddings if isinstance(text, list) else embeddings[0]

    df = pd.read_csv(input_datapath, index_col=0)
    # Calling the Tongyi Qianwen Embedding API row by row takes a few minutes.
    df["embedding"] = df.combined.apply(generate_embeddings)
    output_datapath = './fine_food_reviews_self_embeddings.csv'
    df.to_csv(output_datapath)
  2. Run the following script to insert the test data into seekdb. Run it from the directory that contains the test data file, because the script opens the CSV file by a relative path.

    import os
    import csv
    import json
    from pyobvector import ObVecClient, VECTOR
    from sqlalchemy import Column, Integer, String

    # Use pyobvector to connect to seekdb. If @ is in the username or password, replace it with %40.
    client = ObVecClient(uri="host:port", user="username", password="****", db_name="test")

    # The test dataset is prepared in advance and has been vectorized. By default, it is placed in the same directory as the Python script. If you have vectorized it yourself, replace it with the corresponding file.
    file_name = "fine_food_reviews.csv"
    file_path = os.path.join("./", file_name)

    # Define the columns. The vectorized column is placed in the last field.
    cols = [
        Column('id', Integer, primary_key=True, autoincrement=False),
        Column('product_id', String(256), nullable=True),
        Column('user_id', String(256), nullable=True),
        Column('score', Integer, nullable=True),
        Column('summary', String(2048), nullable=True),
        Column('text', String(8192), nullable=True),
        Column('combined', String(8192), nullable=True),
        Column('n_tokens', Integer, nullable=True),
        Column('embedding', VECTOR(1536)),
    ]

    # Table name
    table_name = 'fine_food_reviews'

    # If the table does not exist, create it together with its vector index.
    if not client.check_table_exists(table_name):
        client.create_table(table_name, columns=cols)
        # Create an HNSW index on the vector column, using L2 distance.
        client.create_index(
            table_name=table_name,
            is_vec_index=True,
            index_name='vidx',
            column_names=['embedding'],
            vidx_params='distance=l2, type=hnsw, lib=vsag',
        )

    # Open and read the CSV file.
    with open(file_path, mode='r', newline='', encoding='utf-8') as csvfile:
        csvreader = csv.reader(csvfile)
        # Read the header row.
        headers = next(csvreader)
        print("Headers:", headers)
        batch = []  # Buffer rows and insert them into the database every 10 rows.
        for i, row in enumerate(csvreader):
            # The CSV file has 9 fields: id, product_id, user_id, score, summary, text, combined, n_tokens, embedding.
            if not row:
                break
            food_review_line = {
                'id': row[0], 'product_id': row[1], 'user_id': row[2], 'score': row[3],
                'summary': row[4], 'text': row[5], 'combined': row[6], 'n_tokens': row[7],
                # The embedding is stored as a JSON array string; parse it into a list of floats.
                'embedding': json.loads(row[8]),
            }
            batch.append(food_review_line)
            # Insert data every 10 rows.
            if (i + 1) % 10 == 0:
                client.insert(table_name, batch)
                batch = []  # Clear the buffer.
        # Insert the remaining rows (if any).
        if batch:
            client.insert(table_name, batch)

    # Check the data in the table to ensure that all data has been inserted.
    count_sql = f"select count(*) from {table_name};"
    cursor = client.perform_raw_text_sql(count_sql)
    result = cursor.fetchone()
    print(f"Total number of imported data: {result[0]}")
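The import script sends rows to seekdb in batches of 10 and parses each embedding from its JSON text. The embedding column is declared as VECTOR(1536), so a vector of a different length will be rejected. This matters especially if you regenerated the embeddings with another model. The plain-Python helpers below are a sketch of both steps with a dimension check added. The names `parse_embedding`, `batched`, and `EMBEDDING_DIM` are hypothetical and not part of pyobvector. Each yielded batch could then be passed to client.insert(table_name, batch) as in the script.

```python
import json
from itertools import islice

EMBEDDING_DIM = 1536  # Must match VECTOR(1536) in the table definition.


def parse_embedding(cell, dim=EMBEDDING_DIM):
    """Parse the JSON array from the CSV 'embedding' column and verify its dimension."""
    vector = json.loads(cell)
    if len(vector) != dim:
        raise ValueError(f"expected {dim} dimensions, got {len(vector)}")
    return [float(x) for x in vector]


def batched(rows, size=10):
    """Yield lists of up to `size` rows; the last batch holds any remainder."""
    it = iter(rows)
    while batch := list(islice(it, size)):
        yield batch
```

Validating the dimension in Python reports the offending row immediately, instead of failing partway through a batch insert.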

Step 4: Query seekdb data

  1. Save the following Python script as query.py.

    import sys
    from http import HTTPStatus

    import dashscope
    from pyobvector import ObVecClient
    from sqlalchemy import func

    # Get the query statement from the command line.
    if len(sys.argv) != 2:
        print("Usage: python3 query.py '<query statement>'")
        sys.exit(1)
    queryStatement = sys.argv[1]

    # Use pyobvector to connect to seekdb. If the username or password contains @, replace it with %40.
    client = ObVecClient(uri="host:port", user="username", password="****", db_name="test")

    # Define a function to generate text vectors.
    def generate_embeddings(text):
        rsp = dashscope.TextEmbedding.call(
            model=dashscope.TextEmbedding.Models.text_embedding_v1,
            input=text,
        )
        # Stop on API errors instead of failing later on a missing output.
        if rsp.status_code != HTTPStatus.OK:
            raise RuntimeError(f"Embedding request failed: {rsp.code} {rsp.message}")
        embeddings = [record['embedding'] for record in rsp.output['embeddings']]
        return embeddings if isinstance(text, list) else embeddings[0]

    def query_ob(query, tableName, vector_name="embedding", top_k=1):
        embedding = generate_embeddings(query)
        # Execute approximate nearest neighbor search using L2 distance.
        res = client.ann_search(
            table_name=tableName,
            vec_data=embedding,
            vec_column_name=vector_name,
            distance_func=func.l2_distance,
            topk=top_k,
            output_column_names=['combined'],
        )
        # Print each result as "<summary>: <text>".
        for row in res:
            print(str(row[0]).replace("Title: ", "").replace("; Content: ", ": "))

    # Table name
    table_name = 'fine_food_reviews'
    query_ob(queryStatement, table_name, 'embedding', 1)
  2. Run the script with a query to retrieve the most relevant review.

    python3 query.py 'pet food'

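    The output is similar to the following. The exact review returned depends on the vectors stored in the table. The query vector and the stored vectors must be produced by the same embedding model for the distances to be meaningful.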

    This is so good!: I purchased this after my sister sent a small bag to me in a gift box. I loved it so much I wanted to find it to buy for myself and keep it around. I always look on Amazon because you can find everything here and true enough, I found this wonderful candy. It is nice to keep in your purse for when you are out and about and get a dry throat or a tickle in the back of your throat. It is also nice to have in a candy dish at home for guests to try.
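query.py accepts exactly one argument and always returns a single review (top_k=1). To see more candidates, you could parse an optional result count with argparse and pass it through to query_ob. The formatting step that turns a stored "Title: ...; Content: ..." string into the output shown above can also be isolated. Both functions below are a hypothetical sketch. The --top-k option and the names `parse_args` and `format_review` are not part of the original script.

```python
import argparse


def parse_args(argv=None):
    """Parse the query text and an optional number of results for query.py."""
    parser = argparse.ArgumentParser(description="Search fine_food_reviews in seekdb.")
    parser.add_argument("query", help="natural-language query, for example 'pet food'")
    parser.add_argument("--top-k", type=int, default=1, help="number of reviews to return")
    return parser.parse_args(argv)


def format_review(combined):
    """Turn 'Title: <summary>; Content: <text>' into '<summary>: <text>', as query.py does."""
    return combined.replace("Title: ", "").replace("; Content: ", ": ")


args = parse_args(["pet food", "--top-k", "3"])
print(args.query, args.top_k)  # → pet food 3
print(format_review("Title: This is so good!; Content: I purchased this after my sister sent a small bag to me."))
```

In query.py, you would then call query_ob(args.query, table_name, 'embedding', args.top_k) in place of reading sys.argv[1] directly.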