Skip to main content

Data import

seekdb allows you to import data from external files into database tables. It provides two methods for data import: server-side import and client-side import. This topic describes how to use these two methods.

Overview

Depending on the location of the data file, seekdb supports the following two main data import methods:

  1. Server-side import (recommended): The data file is located in a position accessible to the server (such as the local file system or object storage). The database server directly reads the file for import. This method is suitable for production environments and large-scale imports. Server-side import is based on the FILES() function, and it provides two usage methods: LOAD DATA FROM FILES() and INSERT INTO ... SELECT FROM FILES().

  2. Client-side import: The data file is located on the client side. The client tool reads the file and transmits it to the server. This method is suitable for scenarios where the server and the file location are not connected through a network.

Note

For seekdb, it is recommended to prioritize server-side import. Server-side import offers better performance and does not require client-side configuration. Client-side import is suitable for scenarios where the server cannot directly access the file location.

This topic provides a simple example of how to use the LOAD DATA statement for data import. For more detailed information about the LOAD DATA statement, please refer to the LOAD DATA documentation.

Limitations

Tables with triggers are not supported for LOAD DATA statements.

Considerations

seekdb optimizes data import speed using parallel processing technology. This operation divides data into multiple subtasks for parallel execution, with each subtask treated as an independent transaction. The execution order is not fixed. Therefore:

  • Global atomicity of the data import process cannot be guaranteed.
  • For tables without a primary key, the data write order may differ from the original file order.

Obtain import permissions

Before performing a data import operation, you need to obtain the necessary permissions. Here are the steps to grant the required permissions:

  1. Grant the FILE permission to the user.

    GRANT FILE ON *.* TO user_name;

    Here, user_name is the user who needs to execute the import command.

  2. Grant the INSERT permission on the table.

    GRANT INSERT ON database_name.tbl_name TO user_name;

    Here, database_name is the database name, tbl_name is the table name, and user_name is the user who needs to execute the import command.

  3. Set the secure_file_priv system variable to control the path accessible for importing or exporting to a file (only required for server-side import).

    Notice

    For security reasons, when setting the secure_file_priv system variable, you can only modify the global variable by executing an SQL statement through a local Socket connection to the database.

    Example:

    Setting the file directory to / means no restrictions, and any path can be accessed. For more information, please refer to secure_file_priv.

    SET GLOBAL SECURE_FILE_PRIV = "/";

Server-side import is the recommended data import method in seekdb. It allows you to directly import data from a server-accessible location, such as a local file system or object storage.

seekdb provides two ways to use the FILES() function for data import:

  1. LOAD DATA FROM FILES(): A dedicated statement for data import, suitable for direct import scenarios.
  2. INSERT INTO ... SELECT FROM FILES(): Uses the standard INSERT SELECT syntax in combination with the FILES() function, offering more flexible data processing capabilities.

This topic provides simple examples of how to use both methods for data import.

Method 1: Use LOAD DATA FROM FILES()

  1. Prepare the data file and place it in a server-accessible location.

    1. Create a directory for the data file and prepare the test data on the server:

      mkdir -p /home/admin/test_data
    2. Create a file using an editor:

      vi /home/admin/test_data/student.csv
    3. Add the following content to the file:

      1,lin,98
      2,hei,90
      3,ali,95
  2. Create the target table.

    CREATE TABLE student (id INT, name VARCHAR(50), score INT);
  3. Use the LOAD DATA FROM FILES() statement to import the data.

    LOAD DATA FROM FILES(
    LOCATION = '/home/admin/test_data/',
    FORMAT = (
    TYPE = 'csv',
    FIELD_DELIMITER = ','),
    PATTERN = 'student.csv')
    INTO TABLE student;

    Parameter description:

    • LOCATION: Specifies the directory path where the file is located.
    • FORMAT: Specifies the file format, TYPE = 'csv' indicates CSV format, and FIELD_DELIMITER specifies the column delimiter as a comma.
    • PATTERN: Specifies the file matching pattern, 'student.csv' indicates importing only the file named student.csv.
  4. View the import result.

    SELECT * FROM student;

    The returned result is as follows:

    +------+------+-------+
    | id | name | score |
    +------+------+-------+
    | 1 | lin | 98 |
    | 2 | hei | 90 |
    | 3 | ali | 95 |
    +------+------+-------+
    3 rows in set

Exception handling

The LOAD DATA FROM FILES() statement supports error diagnostics mode, allowing failed rows to be recorded without terminating the entire operation due to the first error. You can add the LOG ERRORS clause to the normal import statement to enable error diagnostics.

Example:

LOAD DATA FROM FILES(
LOCATION = '/home/admin/test_data/',
FORMAT = (
TYPE = 'csv',
FIELD_DELIMITER = ','),
PATTERN = 'student.csv')
INTO TABLE student
LOG ERRORS
INTO '/home/admin/error_logs/'
REJECT LIMIT 10
BADFILE '/home/admin/bad_files/';

Description:

  • LOG ERRORS: Enables error diagnostics mode.
  • INTO: Specifies the directory for storing error logs.
  • REJECT LIMIT: The maximum number of error rows allowed, 10 indicates that up to 10 rows of data can be imported, and the entire import operation will be terminated if the limit is exceeded.
  • BADFILE: Specifies the directory for saving failed data rows.

You can view the error log using the READ_ERROR_LOG() function:

SELECT * FROM READ_ERROR_LOG('/home/admin/error_logs/');

Method 2: Use INSERT INTO ... SELECT FROM FILES()

  1. Prepare the data file and place it in a server-accessible location.

    1. Create a directory for the data file and prepare the test data on the server (skip this step if already created):

      mkdir -p /home/admin/test_data
    2. Create the student.csv file (skip this step if the file already exists):

      vi /home/admin/test_data/student.csv
    3. Add the following content to the file:

      1,lin,98
      2,hei,90
      3,ali,95
  2. Create the target table.

    CREATE TABLE student (id INT, name VARCHAR(50), score INT);
  3. Use the INSERT INTO ... SELECT FROM FILES() statement to import the data.

    INSERT INTO student
    SELECT * FROM FILES(
    LOCATION = '/home/admin/test_data/',
    FORMAT = (
    TYPE = 'csv',
    FIELD_DELIMITER = ','),
    PATTERN = 'student.csv');
  4. View the import result.

    SELECT * FROM student;

    The returned result is as follows:

    +------+------+-------+
    | id | name | score |
    +------+------+-------+
    | 1 | lin | 98 |
    | 2 | hei | 90 |
    | 3 | ali | 95 |
    +------+------+-------+
    3 rows in set

Note

INSERT INTO ... SELECT FROM FILES() uses the same FILES() function syntax but provides more flexible data processing capabilities. You can use SQL features such as WHERE, JOIN, and aggregate functions in the SELECT statement to process the data before inserting it.

Client import

The client import method is suitable for scenarios where the server cannot directly access the file location. The data file is located on the client side, and the client tool reads the file and transfers it to the server.

Procedure:

  1. Create a test data file locally.

    1. Create a CSV file named test_tbl1.csv in the /home/admin/test_data directory on the client.

      vi /home/admin/test_data/test_tbl1.csv
    2. Add the following content to the file:

      1,11
      2,22
      3,33
  2. Start the client and enable the local file import feature.

    Execute the following statement to connect to the seekdb database using the client tool. You must add the --local-infile parameter to enable the data loading feature from local files.

    mysql --local-infile -hxxx.xxx.xxx.xxx -P2881 -uroot -p****** -Dtest

    Notice

    When using the LOAD DATA LOCAL INFILE feature, you must add the --local-infile parameter when connecting to the client. This is a behavior of MySQL clients and cannot be modified in any other way. If you do not add this parameter, executing the LOAD DATA LOCAL INFILE statement will result in an error.

    To use the LOAD DATA LOCAL INFILE feature, you must use a client tool that supports this feature.

  3. Create a test table.

    CREATE TABLE test_tbl1(col1 INT, col2 INT);
  4. On the client, execute the LOAD DATA LOCAL INFILE statement to load the local data file.

    LOAD DATA LOCAL INFILE '/home/admin/test_data/test_tbl1.csv'
    INTO TABLE test_tbl1
    FIELDS TERMINATED BY ','
    LINES TERMINATED BY '\n';
  5. View the table information.

    SELECT * FROM test_tbl1;

    The returned result is as follows:

    +------+------+
    | col1 | col2 |
    +------+------+
    | 1 | 11 |
    | 2 | 22 |
    | 3 | 33 |
    +------+------+
    3 rows in set

References

  • For more information about the LOAD DATA syntax, see LOAD DATA.
  • For more information about secure_file_priv, see secure_file_priv.