
Experience hybrid vector index in seekdb

This tutorial walks you through getting started with hybrid vector indexes in seekdb and shows how they are used in practice. With a hybrid vector index, you store plain text and run semantic retrieval on it directly, without converting the text to vectors yourself.

Overview

A hybrid vector index is a vector index that automatically converts text to vectors and indexes them. This seekdb feature makes vectors transparent to users: compared with an ordinary vector index, it removes the manual embedding step from both data insertion and retrieval.

  • Workflow with an ordinary vector index:
    Text → Call the `AI_EMBED` function manually to generate vectors → Insert the vectors → Retrieve by vector
  • Workflow with a hybrid vector index:
    Text → Insert the text directly → Retrieve by text directly

seekdb converts text to vectors and builds the index internally. At retrieval time, you provide only the original text; the system embeds it and searches the vector index automatically, which makes vector retrieval much easier to use.
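To make the difference between the two workflows concrete, here is a conceptual model in plain Python. It is not seekdb's implementation and not the pyseekdb API: `toy_embed`, `ManualVectorIndex`, and `HybridVectorIndex` are hypothetical names, and `toy_embed` is a stand-in for a registered embedding model that produces 3-dimensional vectors with no semantic meaning. The sketch only shows where the embedding call happens. With an ordinary index the caller makes it; with a hybrid-style index the index makes it internally, on both insert and search.

```python
import math

Vector = list[float]


def toy_embed(text: str) -> Vector:
    """Hypothetical stand-in for a registered embedding model such as ob_embed.

    Returns a 3-dimensional vector built from character counts. It has no
    semantic meaning; a real model such as BAAI/bge-m3 returns 1024 dimensions.
    """
    t = text.lower()
    vowels = sum(ch in "aeiou" for ch in t)
    return [float(len(t)), float(vowels), float(t.count("o"))]


class ManualVectorIndex:
    """Ordinary vector index: the caller supplies vectors on insert and on search."""

    def __init__(self) -> None:
        self.vectors: dict[int, Vector] = {}

    def insert(self, row_id: int, vector: Vector) -> None:
        self.vectors[row_id] = vector

    def search(self, query_vector: Vector, k: int) -> list[int]:
        # Rank every row by L2 distance to the query vector and keep the top k.
        ranked = sorted(self.vectors, key=lambda r: math.dist(self.vectors[r], query_vector))
        return ranked[:k]


class HybridVectorIndex:
    """Hybrid-style index: accepts text and calls the embedding model itself."""

    def __init__(self, embed) -> None:
        self.embed = embed
        self.docs: dict[int, str] = {}
        self.index = ManualVectorIndex()

    def insert(self, row_id: int, doc: str) -> None:
        self.docs[row_id] = doc
        self.index.insert(row_id, self.embed(doc))  # embedding happens on insert

    def search(self, query_text: str, k: int) -> list[int]:
        return self.index.search(self.embed(query_text), k)  # and on search


if __name__ == "__main__":
    docs = {1: "Rose", 2: "Sunflower", 3: "Lily"}

    # Ordinary index: the caller embeds each document and the query.
    manual = ManualVectorIndex()
    for row_id, doc in docs.items():
        manual.insert(row_id, toy_embed(doc))

    # Hybrid-style index: the caller passes text only.
    hybrid = HybridVectorIndex(toy_embed)
    for row_id, doc in docs.items():
        hybrid.insert(row_id, doc)

    assert manual.search(toy_embed("flower"), 3) == hybrid.search("flower", 3)
```

Both indexes return the same rows for the same data; only the caller's code changes. Because the toy vectors carry no meaning, the ranking they produce says nothing about real semantic similarity.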

Prerequisites

  • Contact your administrator for the database connection details, then run the following command to connect to the database:
    # host: seekdb database connection IP.
    # port: seekdb database connection port.
    # database_name: Name of the database to access.
    # user_name: Database username.
    # password: Database password.
    obclient -h$host -P$port -u$user_name -p$password -D$database_name
  • Make sure that you have permissions to use the AI function service and that an embedding model has been registered in the database with the `CREATE_AI_MODEL` and `CREATE_AI_MODEL_ENDPOINT` procedures. For example:
    -- Drop any existing definitions with these names before re-creating them.
    CALL DBMS_AI_SERVICE.DROP_AI_MODEL ('ob_embed');
    CALL DBMS_AI_SERVICE.DROP_AI_MODEL_ENDPOINT ('ob_embed_endpoint');

    CALL DBMS_AI_SERVICE.CREATE_AI_MODEL(
    'ob_embed', '{
    "type": "dense_embedding",
    "model_name": "BAAI/bge-m3"
    }');

    -- Replace access_key with your actual API key. Keep comments outside the
    -- quoted JSON string; text inside the quotes becomes part of the JSON.
    CALL DBMS_AI_SERVICE.CREATE_AI_MODEL_ENDPOINT (
    'ob_embed_endpoint', '{
    "ai_model_name": "ob_embed",
    "url": "https://api.siliconflow.cn/v1/embeddings",
    "access_key": "sk-xxxxxxxxxxxxxxxxxxxxxxxxxxx",
    "provider": "siliconflow"
    }');
Note: The hybrid vector index feature currently supports only the HNSW and HNSW_BQ index types.

Step 1: Create a hybrid vector index

You can create a hybrid vector index either when you create the table or after the table has been created.

Note: A hybrid vector index must be created on a VARCHAR column, and you must specify the embedding model (`model`) and the vector dimension (`dim`).

Create during table creation

CREATE TABLE items (
id INT PRIMARY KEY,
doc VARCHAR(100),
VECTOR INDEX vector_idx(doc)
WITH (distance=l2, lib=vsag, type=hnsw, model=ob_embed, dim=1024, sync_mode=immediate)
);

Create after table creation

CREATE TABLE items1 (
id INT PRIMARY KEY,
doc VARCHAR(100)
);

CREATE VECTOR INDEX vector_idx
ON items1 (doc)
WITH (distance=l2, lib=vsag, type=hnsw, model=ob_embed, dim=1024, sync_mode=immediate);
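The `dim` and `distance` parameters have to agree with the model and with how rows are ranked. `BAAI/bge-m3` produces 1024-dimensional dense vectors, consistent with `dim=1024` in both examples, and `distance=l2` ranks rows by Euclidean distance. The Python sketch below illustrates both ideas. `check_dim` and `l2_distance` are hypothetical helpers; the dimension check only shows why the two values must match and is not seekdb's own validation logic.

```python
import math

# Assumption for illustration: the registered model (BAAI/bge-m3) returns
# 1024-dimensional dense vectors, matching dim=1024 in the index definition.
INDEX_DIM = 1024


def check_dim(vector: list[float], dim: int = INDEX_DIM) -> list[float]:
    """Hypothetical helper: reject vectors whose length differs from the index dim."""
    if len(vector) != dim:
        raise ValueError(f"expected a {dim}-dimensional vector, got {len(vector)}")
    return vector


def l2_distance(a: list[float], b: list[float]) -> float:
    """Euclidean (L2) distance, the metric selected by distance=l2."""
    return math.dist(check_dim(a), check_dim(b))


if __name__ == "__main__":
    origin = [0.0] * INDEX_DIM
    point = [0.0] * INDEX_DIM
    point[0], point[1] = 3.0, 4.0
    print(l2_distance(origin, point))  # 3-4-5 triangle: prints 5.0

    try:
        check_dim([0.0] * 768)  # e.g. a vector from a 768-dimensional model
    except ValueError as err:
        print(err)
```

If you register a different embedding model, the sketch suggests the practical rule: set `dim` to that model's output dimension rather than copying 1024.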

Step 2: Insert text data (no manual vectorization required)

When you insert text into the `items` table, seekdb embeds it automatically, so you do not need to call the `AI_EMBED` function:

INSERT INTO items(id, doc) VALUES(1, 'Rose');
INSERT INTO items(id, doc) VALUES(2, 'Sunflower');
INSERT INTO items(id, doc) VALUES(3, 'Lily');

Step 3: Use text for direct retrieval

Pass the original query text to the `semantic_distance` function to run vector retrieval; you do not need to generate a query vector yourself:

SELECT id, doc FROM items
ORDER BY semantic_distance(doc, 'flower')
APPROXIMATE LIMIT 3;

The following result is returned:

+----+-----------+
| id | doc       |
+----+-----------+
|  1 | Rose      |
|  2 | Sunflower |
|  3 | Lily      |
+----+-----------+
3 rows in set

seekdb converts the query text 'flower' to a vector and then searches the vector index for the most similar documents. `APPROXIMATE LIMIT 3` returns the 3 closest rows found by the approximate index search.
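Conceptually, `ORDER BY semantic_distance(...) APPROXIMATE LIMIT k` asks for the k rows whose embeddings are closest to the query embedding. The Python sketch below computes the exact answer by scanning every vector. An HNSW index finds approximate neighbors without a full scan, so its results can occasionally differ from the exact top k; recall@k measures how much of the exact answer an approximate search recovered. `exact_top_k` and `recall_at_k` are hypothetical helpers, and the vectors are made up.

```python
import heapq
import math


def exact_top_k(query: list[float], vectors: dict[int, list[float]], k: int) -> list[int]:
    """Exact k nearest neighbours by L2 distance (a full scan over all rows).

    This is the reference answer that an approximate HNSW search tries to
    reproduce without comparing the query against every stored vector.
    """
    return heapq.nsmallest(k, vectors, key=lambda row_id: math.dist(vectors[row_id], query))


def recall_at_k(approx_ids: list[int], exact_ids: list[int]) -> float:
    """Fraction of the exact top-k rows that an approximate search also returned."""
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)


if __name__ == "__main__":
    # Made-up 2-dimensional vectors, for illustration only.
    vectors = {1: [0.9, 0.1], 2: [0.8, 0.3], 3: [0.6, 0.1], 4: [-0.5, 0.9]}
    exact = exact_top_k([1.0, 0.0], vectors, k=3)
    print(exact)  # [1, 2, 3]
    # Suppose an approximate search had returned rows 1, 2 and 4 instead:
    print(recall_at_k([1, 2, 4], exact))
```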

Advanced: Use vector retrieval

If you already have a vector for the query (for example, one generated in advance with the `AI_EMBED` function), you can search the hybrid vector index with that vector directly. This avoids re-embedding the same query text on every search:

-- First get the query vector
SET @query_vector = AI_EMBED("ob_embed", "flower");

-- Use vectors for index retrieval
SELECT id, doc FROM items
ORDER BY semantic_vector_distance(doc, @query_vector)
APPROXIMATE LIMIT 3;

The following result is returned:

+----+-----------+
| id | doc       |
+----+-----------+
|  1 | Rose      |
|  2 | Sunflower |
|  3 | Lily      |
+----+-----------+
3 rows in set
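The benefit of searching with a precomputed vector is that the embedding model runs once per distinct query text instead of once per search. The Python sketch below counts model calls under both approaches. `CountingEmbedder`, `embed_per_search`, and `embed_once_and_reuse` are hypothetical names; `CountingEmbedder` stands in for `AI_EMBED("ob_embed", ...)` and returns toy vectors with no meaning.

```python
class CountingEmbedder:
    """Hypothetical stand-in for AI_EMBED("ob_embed", text) that counts model calls."""

    def __init__(self) -> None:
        self.calls = 0

    def __call__(self, text: str) -> tuple[float, float]:
        self.calls += 1
        # Toy vector with no semantic meaning.
        return (float(len(text)), float(text.count("o")))


def embed_per_search(queries: list[str], embed: CountingEmbedder) -> list[tuple[float, float]]:
    """Text retrieval: every search embeds its query text again."""
    return [embed(q) for q in queries]


def embed_once_and_reuse(queries: list[str], embed: CountingEmbedder) -> list[tuple[float, float]]:
    """Vector retrieval: embed each distinct query text once, then reuse the vector."""
    cache: dict[str, tuple[float, float]] = {}
    vectors = []
    for q in queries:
        if q not in cache:
            cache[q] = embed(q)  # like SET @query_vector = AI_EMBED(...)
        vectors.append(cache[q])
    return vectors


if __name__ == "__main__":
    queries = ["flower", "flower", "flower", "tree"]

    per_search = CountingEmbedder()
    reused = CountingEmbedder()
    assert embed_per_search(queries, per_search) == embed_once_and_reuse(queries, reused)
    print(per_search.calls, reused.calls)  # 4 2
```

The vectors used for searching are identical either way; reuse only saves model calls, which matters most when the same query is issued many times.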

Summary

In this tutorial, you explored the core features of hybrid vector indexes in seekdb:

  • Simpler workflow: Store text directly and run semantic retrieval on it, with no manual conversion to vectors.
  • Automatic embedding: seekdb converts text to vectors and builds the index automatically. At retrieval time, you provide only the original text.
  • Reusable query vectors: You can also search with a precomputed vector, which avoids embedding the same query text repeatedly.

Hybrid vector indexes make vector retrieval much simpler to use and are well suited to building intelligent search applications.

What's next

More information

For more guides on experiencing seekdb's AI Native features and building AI applications based on seekdb, see:

Besides SQL, you can also work with seekdb through its Python SDK, pyseekdb. For usage instructions, see Experience embedded seekdb using Python SDK and pyseekdb overview.