# Overview of vector embedding
This topic provides an overview of the concepts and usage of vector embedding.
## What is vector embedding?
Vector embedding is a technique that converts unstructured data into numerical vectors. These vectors capture the semantic information of the data, allowing computers to "understand" and process its meaning. Specifically:
- Vector embedding maps unstructured data such as text, images, and audio/video into points in a high-dimensional vector space.
- In this vector space, semantically similar unstructured data are mapped to nearby positions.
- Vectors are typically composed of hundreds or thousands of numbers (for example, 512 or 1024 dimensions).
- The similarity between vectors can be calculated using mathematical methods such as cosine similarity.
- Common vector embedding models include Word2Vec, BERT, and BGE.

For example, when developing a RAG (retrieval-augmented generation) application, we usually need to embed text data into vectors and store them in a vector database, while other structured data is stored in a relational database.
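To make the similarity calculation mentioned above concrete, here is a minimal pure-Python sketch of cosine similarity. The three-dimensional "embeddings" are toy values chosen for illustration; real embeddings have hundreds or thousands of dimensions produced by a model.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity: dot(a, b) / (|a| * |b|), ranging from -1 to 1."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 3-dimensional "embeddings" (hypothetical values, for illustration only).
cat = [0.9, 0.1, 0.2]
kitten = [0.85, 0.15, 0.25]
car = [0.1, 0.9, 0.4]

print(cosine_similarity(cat, kitten))  # close to 1: semantically similar
print(cosine_similarity(cat, car))     # noticeably smaller: less similar
```

Semantically similar items map to nearby points, so their cosine similarity is close to 1, while unrelated items score lower.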
seekdb allows you to store vector data as a data type in a relational table, enabling efficient and organized storage of both vector and traditional scalar data side by side.
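As a rough sketch of what mixing scalar and vector columns in one table could look like, the snippet below builds the SQL statements as plain strings. The `VECTOR(4)` column type and the bracketed `'[...]'` vector literal are assumptions about the syntax, not confirmed seekdb API; consult the seekdb reference for the actual type name and literal format.

```python
def vector_literal(vec: list[float]) -> str:
    """Format a Python list as a bracketed vector literal, e.g. '[0.1,0.2]'."""
    return "[" + ",".join(repr(float(x)) for x in vec) + "]"

# Hypothetical DDL: a table mixing scalar columns and a vector column.
# VECTOR(4) is an assumed type name used for illustration.
create_sql = (
    "CREATE TABLE documents ("
    "id INT PRIMARY KEY, "
    "title VARCHAR(255), "
    "embedding VECTOR(4))"
)

embedding = [0.12, -0.53, 0.88, 0.07]
insert_sql = (
    "INSERT INTO documents (id, title, embedding) "
    f"VALUES (1, 'intro', '{vector_literal(embedding)}')"
)
print(create_sql)
print(insert_sql)
```

Keeping the embedding next to the row's scalar attributes means a single query can filter on scalars and rank by vector similarity without joining a separate store.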
## How to generate vector embeddings
In seekdb, you can generate vector embeddings in the following two ways:
| Method | Description | Advantages | Supported data |
|---|---|---|---|
| Use the AI_EMBED function | Use the database's built-in AI function service, with no additional dependencies to install. | Simple and convenient; called directly from SQL. | Currently supports text embeddings only. |
| Use an external embedding model | Use external models such as Sentence Transformers, Ollama, or online APIs. | More flexible; supports a wider range of models. | Supports various data sources, including text, images, and more. |
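The two methods in the table can be sketched as follows. The `AI_EMBED('...')` call signature is an assumption (the exact arguments, such as a model name, may differ; check the seekdb docs), and `embed_with_external_model` is a hypothetical stand-in for a real external model such as `SentenceTransformer.encode()` or an HTTP embedding API; its hash-based output is a deterministic toy value so the sketch stays self-contained, not a semantically meaningful embedding.

```python
import hashlib

text = "What is vector embedding?"

# Method 1: in-database embedding via the built-in AI_EMBED function.
# Argument list is an assumption made for illustration.
ai_embed_sql = f"SELECT AI_EMBED('{text}')"

# Method 2: an external embedding model (hypothetical stand-in).
def embed_with_external_model(text: str, dim: int = 4) -> list[float]:
    """Toy deterministic 'embedding'; a real model returns learned values."""
    digest = hashlib.sha256(text.encode()).digest()
    return [b / 255.0 for b in digest[:dim]]

vec = embed_with_external_model(text)
print(ai_embed_sql)
print(vec)
```

With the external-model route, the application computes the vector itself and then inserts it into the table; with AI_EMBED, the database computes it inside the SQL statement.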