Use cases and examples of AI functions
This topic describes the features of AI functions in seekdb.
AI functions integrate AI model capabilities directly into data processing within databases through SQL expressions. This greatly simplifies operations such as data extraction, analysis, summarization, and storage using large AI models, making it an important new feature in the fields of databases and data warehouses. seekdb provides AI model and endpoint management through the DBMS_AI_SERVICE package, introduces several built-in AI function expressions, and supports monitoring AI model usage through views.
Prerequisites
Before using AI functions, make sure you have the necessary privileges. For more information about the privileges, see AI function privileges.
Considerations
Hybrid search relies on the model management and embedding capabilities of the AI function service. Before deleting an AI model, check whether it is referenced by hybrid search to avoid potential issues.
AI model management
The DBMS_AI_SERVICE package provides the ability to manage AI models and endpoints. It supports the following operations:
| Operation | Description |
|---|---|
| CREATE_AI_MODEL | Creates an AI model object. |
| DROP_AI_MODEL | Drops an AI model object. |
| CREATE_AI_MODEL_ENDPOINT | Creates an AI model endpoint object. |
| ALTER_AI_MODEL_ENDPOINT | Modifies an AI model endpoint object. |
| DROP_AI_MODEL_ENDPOINT | Drops an AI model endpoint object. |
By using this system package, you can directly manage AI models and endpoints within seekdb, without relying on external services.
Monitor AI model usage
seekdb allows you to query and monitor information about AI models and their usage through the following views:
- CDB/DBA_OB_AI_MODELS: Query information about AI models.
- CDB/DBA_OB_AI_MODEL_ENDPOINTS: Monitor the calls of AI models.
AI function expressions
seekdb supports the following AI function expressions, allowing you to call AI models directly within seekdb using SQL statements and greatly simplifying the process:
| Function | Description |
|---|---|
AI_COMPLETE | Calls a specified text generation large language model (LLM) to process prompts and data, and then parses the results. |
AI_PROMPT | Constructs and formats prompts. Supports dynamic data insertion. |
AI_EMBED | Calls an embedding model to convert text data into vector data. |
AI_RERANK | Calls a reranking model to sort text based on prompts by similarity. |
When using AI function expressions, make sure you have registered the AI models and endpoint information in the database.
AI_COMPLETE and AI_PROMPT
The AI_COMPLETE function specifies a registered large language model (LLM) for text generation using the model_key, processes the user-provided prompt and data, and returns the text generated by the model. Users can customize the prompt and the format of the data from the database through the prompt parameter. This approach not only enables flexible processing of textual data, but also allows for batch processing directly within the database, effectively avoiding the overhead of repeatedly transferring data between the database and the language model.
In many AI application scenarios, prompts are often highly structured and require dynamic injection of specific data. Manually concatenating prompts and input content using functions like CONCAT each time is not only costly in terms of development, but also prone to formatting errors. To support prompt reuse and dynamic combination of prompts and data, seekdb provides the AI_PROMPT function. AI_PROMPT upgrades prompts from "static text" to a "reusable, parameterizable" function template format, which can be used directly in place of the prompt parameter within AI_COMPLETE. This greatly simplifies the process of constructing prompts, improving both development efficiency and accuracy.
AI_PROMPT function
The AI_PROMPT function is used to construct and format prompts, supporting dynamic data insertion.
Syntax
The syntax for the AI_PROMPT function is as follows:
AI_PROMPT('template', expr0 [ , expr1, ... ]);
Parameters:
| Parameter | Description | Type | Nullable |
|---|---|---|---|
| template | The prompt template entered by the user. | VARCHAR(max_length) | No |
| expr | The data entered by the user. | VARCHAR(max_length) | No |
Both the template and expr parameters are required and cannot be null. The expr parameter only supports the VARCHAR type and does not support the JSON type.
Return value:
- JSON, the formatted prompt string.
Examples
The AI_PROMPT function organizes the template string and dynamic data into JSON format:
- The first parameter (the template string
template) is placed in thetemplatefield of the returned JSON. - Subsequent parameters (data values
expr0,expr1, ...) are placed in theargsarray of the returned JSON. - Placeholders in the template such as
{0},{1}, etc., correspond by index to the data in theargsarray and will be automatically replaced when used in theAI_COMPLETEfunction.
For example:
SELECT AI_PROMPT('Recommend {0} of the most popular {1} to me.', 'ten', 'mobile phones');
Return result:
{
"template": "Recommend {0} of the most popular {1} to me.",
"args": ["ten", "mobile phones"]
}
Based on the previous example, using the AI_PROMPT function within the AI_COMPLETE function:
SELECT AI_COMPLETE("ob_complete", AI_PROMPT('Recommend {0} of the most popular {1} to me. just output name in json array format', 'two', 'mobile phones')) AS ans;
Return result:
+--------------------------------------------------+
| ans |
+--------------------------------------------------+
| ["iPhone 15 Pro Max","Samsung Galaxy S24 Ultra"] |
+--------------------------------------------------+
AI_COMPLETE function
Syntax
The syntax for the AI_COMPLETE function is as follows:
AI_COMPLETE(model_key, prompt[, parameters])
-- If you use the AI_PROMPT function, replace the prompt parameter with the AI_PROMPT function. See the AI_PROMPT function example.
AI_COMPLETE(model_key, AI_PROMPT(prompt_template, data))
Parameters:
| Parameter | Description | Type | Nullable |
|---|---|---|---|
| model_key | The model registered in the database. | VARCHAR(128) | No |
| prompt | The prompt provided by the user. | VARCHAR/TEXT(LONGTEXT) | No |
| parameters | Optional configuration for the API, such as temperature, top_p, and max_tokens. These options vary by vendor and are added directly to the message body. Typically, you can use the default settings without specifying these options. | JSON | Yes |
Both model_key and prompt are required. If either is NULL, the function will return an error.
Return value:
- text: The text generated by the LLM based on the prompt.
Examples
-
Sentiment analysis example
SELECT AI_COMPLETE("ob_complete","Your task is to perform sentiment analysis on the provided text and determine whether the sentiment is positive or negative.
The text to analyze is as follows:
<text>
What a beautiful day!
</text>
Judgment criteria:
If the text expresses a positive sentiment, output 1; if it expresses a negative sentiment, output -1. Do not output anything else.\n") AS ans;Result:
+-----+
| ans |
+-----+
| 1 |
+-----+ -
Translation example
CREATE TABLE comments (
id INT AUTO_INCREMENT PRIMARY KEY,
content TEXT
);
INSERT INTO comments (content) VALUES ('hello world!');
-- Use the concat expression to replace the processed data with column names from the database table, enabling batch processing of database data without copying data to and from the LLM.
SELECT AI_COMPLETE("ob_complete",
concat("You are a professional translator. Please translate the following English text into Chinese. The text to be translated is:<text>",
content,
"</text>")) AS ans FROM comments;Result:
+-------------+
| ans |
+-------------+
| 你好,世界! |
+-------------+ -
Classification example
SELECT AI_COMPLETE("ob_complete","You are a classification expert. You will receive various issue texts and need to categorize them into the appropriate department. The department list is [\"Hardware\",\"Software\",\"Other\"]. The text to analyze is as follows:
<text>
The screen quality is terrible.
</text>") AS res;Result:
+--------+
| res |
+--------+
| Hardware |
+--------+
AI_EMBED
The AI_EMBED function uses the model_key parameter to specify a registered embedding model, which converts your text data into vector representations. If the model supports multiple dimensions, you can use the dim parameter to specify the output dimension.
Use AI_EMBED
Syntax:
AI_EMBED(model_key, input, [dim])
Parameters:
| Parameter | Description | Type | Nullable |
|---|---|---|---|
| model_key | The embedding model registered in your database. | VARCHAR(128) | No |
| input | The text you want to convert into a vector. | VARCHAR | No |
| dim | Specifies the output dimension of the vector. Some model providers support configuring this value. | INT64 | Yes |
Both model_key and input are required. If either is NULL, the function will return an error.
Return value:
- A string in vector format, that is, the embedding model’s vector representation of your text.
Examples
-
Embed single row of data.
SELECT AI_EMBED("ob_embed","Hello world") AS embedding;Result:
+----------------+
| embedding |
+----------------+
| [0.1, 0.2, 0.3]|
+----------------+ -
Embed table columns.
CREATE TABLE comments (
id INT AUTO_INCREMENT PRIMARY KEY,
content TEXT
);
INSERT INTO comments (content) VALUES ('hello world!');
SELECT AI_EMBED("ob_embed",content) AS embedding FROM comments;Result:
+----------------+
| embedding |
+----------------+
| [0.1, 0.2, 0.3]|
+----------------+
AI_RERANK
The AI_RERANK function uses the model_key parameter to specify a registered reranking model. It organizes your query and document list according to the provider's rules, sends them to the specified model, and returns the sorted results. This function is suitable for reranking scenarios in Retrieval-Augmented Generation (RAG).
Use AI_RERANK
Syntax:
AI_RERANK(model_key, query, documents[, document_key])
Parameters:
| Parameter | Description | Type | Nullable |
|---|---|---|---|
| model_key | The reranking model registered in your database. | VARCHAR(128) | No |
| query | The search text you want to use. | VARCHAR(1024) | No |
| documents | The list of documents to be ranked. | JSON array, for example, '["apple", "banana"]' | No |
All of the parameters model_key and input are required. If either is NULL, the function will return an error.
Return value:
- A JSON array containing the documents and their relevance scores, sorted in descending order by relevance.
Examples
SELECT AI_RERANK("ob_rerank","Apple",'["apple","banana","fruit","vegetable"]');
Result:
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| ai_rerank("ob_rerank","Apple",'["apple","banana","fruit","vegetable"]') |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| [{"index": 0, "document": {"text": "apple"}, "relevance_score": 0.9912109375}, {"index": 1, "document": {"text": "banana"}, "relevance_score": 0.0033512115478515625}, {"index": 2, "document": {"text": "fruit"}, "relevance_score": 0.0003669261932373047}, {"index": 3, "document": {"text": "vegetable"}, "relevance_score": 0.00001996755599975586}] |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+