Database schema design overview
This topic describes the database schema in seekdb.
Database
In seekdb, a database can be considered a collection of tables and indexes.
After deploying seekdb, a database named test is automatically created for testing or experience. However, for actual use, it is recommended to create your own databases.
Table
In seekdb, a table is the most basic data storage unit. Tables contain all the data accessible to users, with each table consisting of multiple rows of records, and each record containing multiple columns. The design and use of each table must be reasonably planned based on business needs to ensure the efficiency and scalability of the system.
Table types
seekdb supports various table types, including partitioned tables, replicated tables, primary key tables, non-primary key tables, and external tables, among others.
- Partitioned tables: seekdb allows you to divide the data of a regular table into different partitions based on specific rules, storing the data of each partition together. This type of table is called a partitioned table. seekdb supports basic partitioning strategies such as range, list, and hash partitioning.
- Primary key tables and non-primary key tables: A table with a primary key is called a primary key table; a table without a specified primary key is called a non-primary key table.
- External tables: The data of a table in a database is stored in the database's storage space, while the data of an external table is stored in an external storage service.
- Heap-organized tables: In heap-organized tables, the primary key is used for uniqueness constraints, and queries rely on the primary table. When user data is sorted by time, the Skip Index can be more effectively utilized to improve query efficiency.
Additionally, based on the data storage method, seekdb introduces new table types: columnar tables and hybrid row-column tables.
-
Columnar tables: In columnar tables, data is stored by columns rather than by rows. This significantly improves the performance of analytical queries, especially in scenarios involving large amounts of data and frequent aggregation analysis. For more information, see Columnar storage.
-
Hybrid row-column tables: In hybrid row-column tables, data is stored both by rows and by columns. The system automatically determines whether to use row-based or column-based queries based on the query statement, making it suitable for scenarios that require both transactional and analytical operations.
Table update modes
seekdb allows you to specify the data write and query modes when creating a table. When creating a table, you can use the merge_engine parameter in the CREATE TABLE statement to choose between the delete_insert update mode and the partial_update update mode, which are data update strategies designed for different business scenarios.
-
delete_insert (Full-column update mode)
This mode prioritizes query performance. It uses the "Merge-On-Write" mechanism to convert
UPDATEoperations into full-columnDELETEandINSERTrecords, ensuring that each row contains complete column values. This mode significantly improves the efficiency of complex queries and batch processing (such as analytical tasks), but it requires additional storage space for incremental data, making it suitable for scenarios with frequent incremental data and the need for rapid analysis. -
partial_update (Partial update mode)
This mode only records the values of modified columns, avoiding redundant storage. During queries, multiple data sets must be merged to obtain the latest values, resulting in relatively lower performance. However, it is more suitable for scenarios with frequent updates but low query requirements (such as OLTP operations) or environments sensitive to storage costs.
| Feature category | delete_insert update mode | partial_update update mode |
|---|---|---|
| Storage method | Each update writes two rows (DELETE and INSERT) to the SSTable, containing all column data. | Each update only records the values of modified columns, saving storage space. |
| Query efficiency | Filters are precomputed in the memtable/sstable during queries. If the filtering results of incremental data do not include updates to baseline data, baseline and incremental data can be batch-processed separately; if the filtering results include updates to baseline data, incremental and baseline records are merged to obtain the latest values for projection. | Multiple Memtable/SSTable records must be merged during queries to obtain the latest primary key values, which may affect performance. Suitable for scenarios sensitive to storage costs with frequent updates. |
| Applicable scenarios | Scenarios with a high proportion of incremental data requiring frequent execution of complex queries or batch processing. | Scenarios with frequent updates but low query requirements. |
For more information, see Create a table.
Index
Indexes are key components for improving query performance. seekdb supports various types of indexes, providing flexible and efficient data retrieval methods. Here are some common types of indexes and their applications:
- Local indexes: Local indexes are indexes established for individual partitions. They are suitable for scenarios requiring localized data queries. When data is strictly divided into multiple partitions, local indexes can significantly reduce the data scanning range during queries, thus improving query efficiency.
- Unique indexes: Unique indexes ensure that each value in a data column is unique. By creating unique indexes on certain columns, the database can effectively prevent data duplication and accelerate query processes.
- Non-unique indexes: Unlike unique indexes, non-unique indexes do not require all values in a column to be unique. They are mainly used to accelerate data queries, especially when searching for multiple records that meet specific conditions, significantly improving retrieval efficiency.
- Full-text indexes: seekdb supports full-text search capabilities compatible with MySQL. By preprocessing text content and establishing keyword indexes, it effectively enhances full-text search efficiency. For more information about full-text indexes, see Full-text index.
- JSON multi-value indexes: seekdb supports multi-value indexing, which is particularly useful for JSON documents and other collection data types. This feature allows you to create indexes on arrays or collections, improving query efficiency when searching based on JSON array elements. For more information about multi-value indexes, see Multi-value index.
- Spatial indexes: Spatial indexes are used to optimize the efficiency of geographic space data queries. They are widely applied in geographic information systems (GIS) and location-based services. In these applications, spatial indexes can quickly retrieve data information within a geographic coordinate range, providing strong support for location services.
- Columnar indexes: seekdb allows you to specify the storage format of a table as columnar when creating a table. Since indexes and data tables are both tables, you can also set the data in the index table to be stored in a columnar format. Columnar indexes store data by columns rather than by rows, significantly improving the performance of analytical queries, especially when processing large-scale data, greatly reducing the time required for data retrieval and analysis, thus enabling real-time analysis. For more information about columnar storage, see Columnar storage.
For more information about indexes, see Index overview.
Data type
Before creating and using tables, you need to plan the table structure and data types based on business requirements. To ensure efficient data storage and query optimization, follow these principles:
- Normalize the table structure: Design the table structure reasonably to minimize data redundancy and improve query efficiency.
- Choose appropriate SQL data types: Select the most suitable SQL data type for each column to reduce storage space and improve query speed.
Common SQL data types include:
- Basic data types: Such as
INT,VARCHAR, andDATE. - Complex data types: Such as
JSON,ARRAY, andBITMAP, suitable for storing more complex data structures.
For detailed descriptions of SQL data types, see Data types
View
seekdb supports standard views and materialized views.
- Standard views: Standard views are also known as non-materialized views. They are the most common type of view. They store only the SQL query that defines the view, not the query results.
- Materialized views: Unlike standard views, materialized views store the results of the query in physical storage. Seekdb supports asynchronous materialized views, which means that the materialized view is not immediately updated when the data in the base table changes. This ensures the performance of DML operations on the base table. For more information, see Materialized views.