Version: V1.0.0

Overview of vector search

This topic describes the core concepts of vector databases and vector search.

seekdb supports dense vectors of up to 16,000 dimensions as well as sparse vectors. It provides several vector distance calculations, including Manhattan distance, Euclidean distance, inner product, and cosine distance, and it can build vector indexes based on HNSW and IVF. Incremental updates and deletions are supported, and these operations do not affect the recall rate.
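The four distance calculations named above can be illustrated outside the database. The following is a minimal Python sketch, not seekdb's implementation: the function names and sample vectors are hypothetical, cosine distance is assumed to mean 1 minus cosine similarity (a common convention), and both inputs are assumed to be non-zero vectors of equal length.

```python
import math

def manhattan_distance(a, b):
    # Sum of absolute per-dimension differences (L1 distance).
    return sum(abs(x - y) for x, y in zip(a, b))

def euclidean_distance(a, b):
    # Square root of the sum of squared differences (L2 distance).
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def inner_product(a, b):
    # Sum of per-dimension products; larger means more similar.
    return sum(x * y for x, y in zip(a, b))

def cosine_distance(a, b):
    # Assumed convention: 1 - cosine similarity. Requires non-zero vectors.
    norm_a = math.sqrt(inner_product(a, a))
    norm_b = math.sqrt(inner_product(b, b))
    return 1.0 - inner_product(a, b) / (norm_a * norm_b)

# Hypothetical sample vectors, chosen only for easy arithmetic.
a = [1.0, 2.0, 3.0]
b = [4.0, 6.0, 3.0]

print(manhattan_distance(a, b))  # 7.0
print(euclidean_distance(a, b))  # 5.0
print(inner_product(a, b))       # 25.0
print(cosine_distance(a, b))
```

Note that the first, second, and fourth functions return smaller values for more similar vectors, while a larger inner product indicates greater similarity.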

Core features

seekdb provides the capabilities to store, index, and search embedding vector data. Specifically, it includes the following features:

Core feature	Description
Vector data types	Supports the storage of dense vectors with a maximum of 16,000 dimensions. Supports the storage of sparse vectors.
Vector indexes	Supports exact search and approximate nearest neighbor (ANN) search. Supports L2 distance, inner product, and cosine similarity calculations. Supports HNSW, HNSW_SQ, and HNSW_BQ indexes (HNSW_BQ from version V4.4.0 onwards), as well as IVF and IVF_PQ indexes. For both index families, the maximum dimension of the indexed column is 4,096.
Vector search SQL operators	Supports basic vector operations such as addition, subtraction, multiplication, comparison, and aggregation.

The following note applies:

seekdb uses the NULL-first comparison mode by default, so NULL values are placed at the beginning of sorted results. To keep them out of search results, add an IS NOT NULL condition to queries.
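The effect of NULL-first ordering can be emulated in a few lines of Python. This is only a sketch of the behavior described above, not seekdb code: the `rows` list, the `distance` field, and the `nulls_first_key` helper are hypothetical, and `None` stands in for SQL NULL.

```python
# Hypothetical result rows; None plays the role of a NULL distance.
rows = [
    {"id": 1, "distance": 0.42},
    {"id": 2, "distance": None},
    {"id": 3, "distance": 0.07},
]

def nulls_first_key(row):
    # False sorts before True, so rows with a missing distance come first.
    d = row["distance"]
    return (d is not None, d if d is not None else 0.0)

# NULL-first ordering: the row without a distance leads the result.
ids_all = [r["id"] for r in sorted(rows, key=nulls_first_key)]
print(ids_all)  # [2, 3, 1]

# Equivalent of filtering out NULL values before sorting.
non_null = [r for r in rows if r["distance"] is not None]
ids_filtered = [r["id"] for r in sorted(non_null, key=lambda r: r["distance"])]
print(ids_filtered)  # [3, 1]
```

Without the filter, a top-1 query would return the row that has no distance at all, which is rarely the intended nearest neighbor.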

Key concepts

Unstructured data

Unstructured data refers to data that does not have a predefined format or organization. It typically includes text, images, audio, video, and other forms of data, as well as social media content, emails, and log files. Due to the complexity and diversity of unstructured data, specific tools and techniques are required for processing, such as natural language processing, image recognition, and machine learning.

Vectors

A vector is essentially the projection of an object in a high-dimensional space. Mathematically, a vector is a floating-point array with the following characteristics:

Each element in the array represents a dimension of the vector, and each element is a floating-point number.
The size of the vector array (number of elements) indicates the dimensionality of the entire vector space.
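As a small illustration of these two characteristics, the sketch below represents a dense vector as a Python list of floats and reads its dimensionality from the list length. The `make_dense_vector` helper and the sample values are hypothetical; the 16,000 limit is the one quoted in this topic.

```python
MAX_DENSE_DIMENSIONS = 16_000  # dense vector limit quoted in this topic

def make_dense_vector(values):
    # Hypothetical helper: coerce every element to float and check the size.
    vector = [float(v) for v in values]
    if not 1 <= len(vector) <= MAX_DENSE_DIMENSIONS:
        raise ValueError(f"unsupported dimension: {len(vector)}")
    return vector

embedding = make_dense_vector([0.12, -0.5, 0.33, 0.9])
dimension = len(embedding)  # number of elements = dimensionality
print(dimension)  # 4
```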

Vector embeddings

Vector embedding is the process of using deep learning neural networks to extract content and semantic information from unstructured data and to convert images, videos, and other data into feature vectors. Embedding technology maps raw data from a high-dimensional space to a lower-dimensional space, transforming feature-rich multimodal data into multidimensional vector data.

Vector similarity search

In the era of information overload, users often need to quickly find the information they need in vast amounts of data. For example, online literature databases, e-commerce product catalogs, and growing multimedia content libraries all require efficient search systems to quickly locate content of interest. As data volumes continue to grow, traditional keyword-based search methods are no longer sufficient to meet users' needs for search accuracy and speed. This is where vector search technology comes into play.

Vector similarity search uses feature extraction and vectorization techniques to convert various types of unstructured data, such as text, images, and audio, into vectors. It then uses similarity measurement methods to compare these vectors, capturing the underlying semantic information of the data to provide more accurate and efficient search results.
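The comparison step can be sketched as an exact (brute-force) search: score every stored vector against the query and keep the closest ones. The Python below is an illustration under stated assumptions, not how seekdb executes a search. The item names, the three-dimensional vectors, and the `top_k` helper are made up for the example, and cosine distance is taken as 1 minus cosine similarity.

```python
import math

def cosine_distance(a, b):
    # Assumed convention: 1 - cosine similarity. Requires non-zero vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def top_k(query, items, k):
    # Exact search: compare the query with every stored vector.
    scored = [(cosine_distance(query, vec), name) for name, vec in items.items()]
    scored.sort()  # smallest distance first
    return [name for _, name in scored[:k]]

# Made-up embeddings; a real system would obtain them from an embedding model.
items = {
    "cat photo": [0.9, 0.1, 0.0],
    "dog photo": [0.8, 0.3, 0.1],
    "tax form": [0.0, 0.2, 0.9],
}
query = [1.0, 0.2, 0.0]

result = top_k(query, items, 2)
print(result)  # ['cat photo', 'dog photo']
```

Exact search visits every vector, so its cost grows linearly with the data set. Approximate indexes such as HNSW and IVF trade a small amount of recall for much lower search cost.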
