Use SQL functions

This topic describes the vector functions supported by seekdb and the considerations for using them.

Considerations

Vectors with different dimensions are not allowed to perform the following operations. An error different vector dimensions %d and %d is returned.
When the result exceeds the floating-point number range, an error value out of range: overflow / underflow is returned.
Dense vector indexes support L2, inner product, and cosine distance as index distance algorithms. Memory-based sparse vector indexes support inner product as the distance algorithm. For details, see Create vector indexes.
Vector index search supports calling the L2_distance, Cosine_distance, and Inner_product distance functions in this document.

Distance functions

Distance functions are used to calculate the distance between two vectors. The calculation method varies depending on the distance algorithm used.

L2_distance

Euclidean distance reflects the distance between the coordinates of the compared vectors -- essentially the straight-line distance between two vectors. It is calculated by applying the Pythagorean theorem to vector coordinates:

Pythagorean theorem

The function syntax is as follows:

l2_distance(vector v1, vector v2)

The parameters are described as follows:

Apart from the vector type, other types that can be forcibly converted to the vector type are accepted, including single-level array types (such as [1,2,3,..]) and string types (such as '[1,2,3]').
The dimensions of the two parameters must be the same.
If a single-level array type parameter exists, its elements cannot be NULL.

The return values are described as follows:

The return value is a distance(double) distance value.
If any parameter is NULL, the return value is NULL.

Here is an example:

CREATE TABLE t1(c1 vector(3));
INSERT INTO t1 VALUES('[1,2,3]');
SELECT l2_distance(c1, [1,2,3]), l2_distance([1,2,3],[1,1,1]), l2_distance('[1,1,1]','[1,2,3]') FROM t1;

The return result is as follows:

+--------------------------+------------------------------+----------------------------------+
| l2_distance(c1, [1,2,3]) | l2_distance([1,2,3],[1,1,1]) | l2_distance('[1,1,1]','[1,2,3]') |
+--------------------------+------------------------------+----------------------------------+
|                        0 |             2.23606797749979 |                 2.23606797749979 |
+--------------------------+------------------------------+----------------------------------+
1 row in set

L2_squared

L2 Squared distance is the square of the Euclidean distance (L2 Distance). It omits the square root operation in the Euclidean distance formula, thereby reducing computational cost while maintaining the relative order of distances. The calculation method is as follows:

L2 Squared

The syntax is as follows:

l2_squared(vector v1, vector v2)

The parameters are described as follows:

Apart from the vector type, other types that can be forcibly converted to the vector type are accepted, including single-level array types (such as [1,2,3,..]) and string types (such as '[1,2,3]').
The dimensions of the two parameters must be the same.
If a single-level array type parameter exists, its elements cannot be NULL.

The return values are described as follows:

The return value is a distance(double) distance value.
If any parameter is NULL, the return value is NULL.

Here is an example:

CREATE TABLE t1(c1 vector(3));
INSERT INTO t1 VALUES('[1,2,3]');
SELECT l2_squared(c1, [1,2,3]), l2_squared([1,2,3],[1,1,1]), l2_squared('[1,1,1]','[1,2,3]') FROM t1;

The return result is as follows:

+-------------------------+-----------------------------+---------------------------------+
| l2_squared(c1, [1,2,3]) | l2_squared([1,2,3],[1,1,1]) | l2_squared('[1,1,1]','[1,2,3]') |
+-------------------------+-----------------------------+---------------------------------+
|                       0 |                           5 |                               5 |
+-------------------------+-----------------------------+---------------------------------+
1 row in set

L1_distance

The Manhattan distance is used to calculate the sum of absolute axis distances between two points in a standard coordinate system. The calculation formula is as follows:

Manhattan distance

The function syntax is as follows:

l1_distance(vector v1, vector v2)

The parameters are described as follows:

Apart from the vector type, other types that can be forcibly converted to the vector type are accepted, including single-level array types (such as [1,2,3,..]) and string types (such as '[1,2,3]').
The dimensions of the two parameters must be the same.
If a single-level array type parameter exists, its elements cannot be NULL.

The return values are described as follows:

The return value is a distance(double) distance value.
If any parameter is NULL, the return value is NULL.

Here is an example:

CREATE TABLE t2(c1 vector(3));
INSERT INTO t2 VALUES('[1,2,3]');
INSERT INTO t2 VALUES('[1,1,1]');
SELECT l1_distance(c1, [1,2,3]) FROM t2;

The return result is as follows:

+--------------------------+
| l1_distance(c1, [1,2,3]) |
+--------------------------+
|                        0 |
|                        3 |
+--------------------------+
2 rows in set

Cosine_distance

Cosine similarity measures the angular difference between two vectors and reflects their directional similarity, regardless of their lengths (magnitude). The value range of cosine similarity is [-1, 1], where 1 indicates vectors in exactly the same direction, 0 indicates orthogonality, and -1 indicates completely opposite directions.

The calculation method for cosine similarity is as follows:

Cosine similarity

Since cosine similarity closer to 1 indicates greater similarity, cosine distance (or cosine dissimilarity) is sometimes used as a measure of distance between vectors. Cosine distance can be calculated by subtracting cosine similarity from 1:

Cosine distance

The value range of cosine distance is [0, 2], where 0 indicates exactly the same direction (no distance) and 2 indicates completely opposite directions.

The function syntax is as follows:

cosine_distance(vector v1, vector v2)

The parameters are described as follows:

Apart from the vector type, other types that can be forcibly converted to the vector type are accepted, including single-level array types (such as [1,2,3,..]) and string types (such as '[1,2,3]').
The dimensions of the two parameters must be the same.
If a single-level array type parameter exists, its elements cannot be NULL.

The return values are described as follows:

The return value is a distance(double) distance value.
If any parameter is NULL, the return value is NULL.

Here is an example:

CREATE TABLE t3(c1 vector(3));
INSERT INTO t3 VALUES('[1,2,3]');
INSERT INTO t3 VALUES('[1,2,1]');
SELECT cosine_distance(c1, [1,2,3]) FROM t3;

+------------------------------+
| cosine_distance(c1, [1,2,3]) |
+------------------------------+
|                            0 |
|          0.12712843905603044 |
+------------------------------+
2 rows in set

Inner_product

The inner product, also known as the dot product or scalar product, represents a type of multiplication between two vectors. In geometric terms, the inner product indicates the direction and magnitude relationship between two vectors. The calculation method for the inner product is as follows:

Inner product

The syntax is as follows:

inner_product(vector v1, vector v2)

The parameters are described as follows:

Apart from the vector type, other types that can be forcibly converted to the vector type are accepted, including single-level array types (such as [1,2,3,..]) and string types (such as '[1,2,3]').
The dimensions of the two parameters must be the same.
If a single-level array type parameter exists, its elements cannot be NULL.
For sparse vectors using this function, one parameter can be a string in sparse vector format, such as c2,'{1:2.4}'. Two parameters cannot both be strings.

The return values are described as follows:

The return value is a distance(double) distance value.
If any parameter is NULL, the return value is NULL.

Dense vector example:

CREATE TABLE t4(c1 vector(3));
INSERT INTO t4 VALUES('[1,2,3]');
INSERT INTO t4 VALUES('[1,2,1]');
SELECT inner_product(c1, [1,2,3]) FROM t4;

The return result is as follows:

+----------------------------+
| inner_product(c1, [1,2,3]) |
+----------------------------+
|                         14 |
|                          8 |
+----------------------------+
2 rows in set

Sparse vector example:

CREATE TABLE t4(c1 INT, c2 SPARSEVECTOR, c3 SPARSEVECTOR);
INSERT INTO t4 VALUES(1, '{1:1.1, 2:2.2}', '{1:2.4}');
INSERT INTO t4 VALUES(2, '{1:1.5, 3:3.6}', '{4:4.5}');
SELECT inner_product(c2,c3) FROM t4;

The return result is as follows:

```shell
+----------------------+
| inner_product(c2,c3) |
+----------------------+
|    2.640000104904175 |
|                    0 |
+----------------------+
2 rows in set

Vector_distance

The vector_distance function calculates the distance between two vectors. You can specify parameters to select different distance algorithms.

The syntax is as follows:

vector_distance(vector v1, vector v2 [, string metric])

The vector v1/v2 parameters are described as follows:

Apart from the vector type, other types that can be forcibly converted to the vector type are accepted, including single-level array types (such as [1,2,3,..]) and string types (such as '[1,2,3]').
The dimensions of the two parameters must be the same.
If a single-level array type parameter exists, its elements cannot be NULL.

The metric parameter is used to specify the distance algorithm. Options:

If not specified, the default algorithm is euclidean.
If specified, the only valid values are:
- euclidean. Represents Euclidean distance, same as L2_distance.
- manhattan. Represents Manhattan distance, same as L1_distance.
- cosine. Represents cosine distance, same as Cosine_distance.
- dot. Represents inner product, same as Inner_product.

The return values are described as follows:

The return value is a distance(double) distance value.
If any parameter is NULL, the return value is NULL.

Here is an example:

CREATE TABLE t5(c1 vector(3));
INSERT INTO t5 VALUES('[1,2,3]');
INSERT INTO t5 VALUES('[1,2,1]');
SELECT vector_distance(c1, [1,2,3], euclidean) FROM t5;

The return result is as follows:

+-----------------------------------------+
| vector_distance(c1, [1,2,3], euclidean) |
+-----------------------------------------+
|                                       0 |
|                                       2 |
+-----------------------------------------+
2 rows in set

Arithmetic functions

Arithmetic functions provide element-wise addition (+), subtraction (-), and multiplication (*) operations between vector types and vector types, single-level array types, and special string types, as well as between single-level array types and single-level array types, and special string types. The calculation method is element-wise, as shown for addition:

Addition

The syntax is as follows:

v1 + v2
v1 - v2
v1 * v2

The parameters are described as follows:

Apart from the vector type, other types that can be forcibly converted to the vector type are accepted, including single-level array types (such as [1,2,3,..]) and string types (such as '[1,2,3]'). Note: Two parameters cannot both be string types. At least one parameter must be a vector or single-level array type.
The dimensions of the two parameters must be the same.
If a single-level array type parameter exists, its elements cannot be NULL.

The return values are described as follows:

When at least one of the two parameters is a vector type, the return value is of the same vector type as the vector parameter.
When both parameters are single-level array types, the return value is of the array(float) type.
If any parameter is NULL, the return value is NULL.

Here is an example:

CREATE TABLE t6(c1 vector(3));
INSERT INTO t6 VALUES('[1,2,3]');
SELECT [1,2,3] + '[1.12,1000.0001, -1.2222]', c1 - [1,2,3] FROM t6;

The return result is as follows:

+---------------------------------------+--------------+
| [1,2,3] + '[1.12,1000.0001, -1.2222]' | c1 - [1,2,3] |
+---------------------------------------+--------------+
| [2.12,1002,1.7778]                    | [0,0,0]      |
+---------------------------------------+--------------+
1 row in set

Comparison functions

Comparison functions provide comparison calculations between vector types and vector types, single-level array types, and special string types, including the comparison operators =, !=, >, >=, <, and <=. The calculation method is element-wise dictionary order comparison.

The syntax is as follows:

v1 = v2
v1 != v2
v1 > v2
v1 < v2
v1 >= v2
v1 <= v2

The parameters are described as follows:

Apart from the vector type, other types that can be forcibly converted to the vector type are accepted, including single-level array types (such as [1,2,3,..]) and string types (such as '[1,2,3]').

tip
One of the two parameters must be a vector type.
The dimensions of the two parameters must be the same.
If a single-level array type parameter exists, its elements cannot be NULL.

The return values are described as follows:

The return value is of the bool type.
If any parameter is NULL, the return value is NULL.

Here is an example:

CREATE TABLE t7(c1 vector(3));
INSERT INTO t7 VALUES('[1,2,3]');
SELECT c1 = '[1,2,3]' FROM t7;

The return result is as follows:

+----------------+
| c1 = '[1,2,3]' |
+----------------+
|              1 |
+----------------+
1 row in set

Aggregate functions

tip

Vector columns cannot be used as GROUP BY conditions, and DISTINCT is not supported.

Sum

The Sum function is used to calculate the sum of vectors in a vector column of a table, using element-wise accumulation to obtain the sum vector.

The syntax is as follows:

sum(vector v1)

The parameters are described as follows:

Only the vector type is supported.

The return values are described as follows:

The return value is a sum (vector) value.

Here is an example:

CREATE TABLE t8(c1 vector(3));
INSERT INTO t8 VALUES('[1,2,3]');
SELECT sum(c1) FROM t8;

The return result is as follows:

+---------+
| sum(c1) |
+---------+
| [1,2,3] |
+---------+
1 row in set

Avg

The Avg function is used to calculate the average of vectors in a vector column of a table.

The syntax is as follows:

avg(vector v1)

The parameters are described as follows:

Only the vector type is supported.

The return values are described as follows:

The return value is an avg (vector) value.
NULL rows in the vector column are not counted.
When the input parameter is empty, the output is NULL.

Here is an example:

CREATE TABLE t9(c1 vector(3));
INSERT INTO t9 VALUES('[1,2,3]');
SELECT avg(c1) FROM t9;

The return result is as follows:

+---------+
| avg(c1) |
+---------+
| [1,2,3] |
+---------+
1 row in set

Other common vector functions

Vector_norm

The Vector_norm function calculates the Euclidean norm (or length) of a vector, which represents the Euclidean distance between the vector and the origin. The calculation formula is as follows:

Euclidean norm

The syntax is as follows:

vector_norm(vector v1)

The parameters are described as follows:

Apart from the vector type, other types that can be forcibly converted to the vector type are accepted, including single-level array types (such as [1,2,3,..]) and string types (such as '[1,2,3]').
If a single-level array type parameter exists, its elements cannot be NULL.

The return values are described as follows:

The return value is a norm(double) modulus value.
If the parameter is NULL, the return value is NULL.

Here is an example:

CREATE TABLE t10(c1 vector(3));
INSERT INTO t10 VALUES('[1,2,3]');
SELECT vector_norm(c1),vector_norm([1,2,3]) FROM t10;

The return result is as follows:

+--------------------+----------------------+
| vector_norm(c1)    | vector_norm([1,2,3]) |
+--------------------+----------------------+
| 3.7416573867739413 |   3.7416573867739413 |
+--------------------+----------------------+
1 row in set

Vector_dims

The Vector_dims function is used to return the vector dimension.

The syntax is as follows:

vector_dims(vector v1)

The parameters are described as follows:

Apart from the vector type, other types that can be forcibly converted to the vector type are accepted, including single-level array types (such as [1,2,3,..]) and string types (such as '[1,2,3]').

The return values are described as follows:

The return value is a dims(int64) dimension value.
If the parameter is NULL, an error is returned.

Here is an example:

CREATE TABLE t11(c1 vector(3));
INSERT INTO t11 VALUES('[1,2,3]');
INSERT INTO t11 VALUES('[1,1,1]');
SELECT vector_dims(c1), vector_dims('[1,2,3]') FROM t11;

The return result is as follows:

+-----------------+------------------------+
| vector_dims(c1) | vector_dims('[1,2,3]') |
+-----------------+------------------------+
|               3 |                      3 |
|               3 |                      3 |
+-----------------+------------------------+
2 rows in set

Considerations​

Distance functions​

L2_distance​

L2_squared​

L1_distance​

Cosine_distance​

Inner_product​

Vector_distance​

Arithmetic functions​

Comparison functions​

Aggregate functions​

Sum​

Avg​

Other common vector functions​

Vector_norm​

Vector_dims​

Contents

Considerations

Distance functions

L2_distance

L2_squared

L1_distance

Cosine_distance

Inner_product

Vector_distance

Arithmetic functions

Comparison functions

Aggregate functions

Sum

Avg

Other common vector functions

Vector_norm

Vector_dims