LOAD DATA
Description
This statement is used to import data from external sources.
The LOAD DATA statement supports loading the following input files:

- Server-side (seekdb node) files: These files are located on the seekdb node. You can use the LOAD DATA INFILE or LOAD DATA FROM URL statement to load data from server-side files into a database table.
- Client-side (local) files: These files are located on the client's local file system. You can use the LOAD DATA LOCAL INFILE or LOAD DATA FROM URL statement to load data from client-side files into a database table.
  Note: When you execute the LOAD DATA LOCAL INFILE statement, the system automatically adds the IGNORE option.
- OSS files: These files are located in the OSS file system. You can use the LOAD DATA REMOTE_OSS INFILE statement to load data from OSS files into a database table.
Currently, the LOAD DATA statement can import data from CSV text files. The import process involves the following steps:

- Parse the file: seekdb reads the data from the specified file and parses it in parallel or serially, based on the specified parallelism.
- Insert the data: After receiving the data, seekdb executes the INSERT operation locally to insert the data into the corresponding partition.
Considerations
- You cannot use the LOAD DATA statement on tables that have triggers.
- To import data from an external file, you must have the FILE privilege and the following settings (see the sketch after this list):
  - When loading a server-side file, you must set the system variable secure_file_priv to specify the path from which files can be imported or exported.
  - When loading a local client-side file, you must add the --local-infile[=1] option when starting the MySQL client to enable loading data from the local file system.
- When using partitioned table direct load, the target table cannot be a replica table, and it cannot contain auto-increment columns, identity columns, or global indexes.
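For reference, here is a minimal sketch of these two prerequisites. The directory path and connection options below are illustrative placeholders, not values from this document.

-- Server-side files: allow imports from a specific directory.
-- This is a global variable; reconnect after changing it.
SET GLOBAL secure_file_priv = "/home/admin/import_dir";

For client-side files, start the MySQL client with the local-infile option enabled, for example:

mysql --local-infile=1 -h127.0.0.1 -P2881 -uroot -p -A -Dtest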
Syntax
-- Load data from a file.
LOAD DATA
[/*+ PARALLEL(N) [load_batch_size(M)] [APPEND | direct(bool, int, [load_mode])] | NO_DIRECT */]
[REMOTE_OSS | LOCAL] INFILE 'file_name'
[REPLACE | IGNORE]
INTO TABLE table_name [PARTITION(PARTITION_OPTION)]
[COMPRESSION [=] {AUTO|NONE|GZIP|DEFLATE|ZSTD}]
[{FIELDS | COLUMNS}
[TERMINATED BY 'string']
[[OPTIONALLY] ENCLOSED BY 'char']
[ESCAPED BY 'char']
]
[LINES
[STARTING BY 'string']
[TERMINATED BY 'string']
]
[IGNORE number {LINES | ROWS}]
[(column_name_var
[, column_name_var] ...)]
load_mode:
'full'
| 'inc_replace'
PARTITION_OPTION:
partition_option_list
| subpartition_option_list
-- Load data from a URL.
LOAD DATA
[/*+ PARALLEL(N) [load_batch_size(M)] [APPEND | direct(bool, int, [load_mode])] | NO_DIRECT */]
[REPLACE | IGNORE]
FROM { url_table_function_expr |
( SELECT expression_list FROM url_table_function_expr ) }
INTO TABLE table_name
[PARTITION(PARTITION_OPTION)]
[(column_name_var [, column_name_var] ...)]
[LOG ERRORS [REJECT LIMIT {integer | UNLIMITED}]]
load_mode:
'full'
| 'inc_replace'
url_table_function_expr:
FILES (
LOCATION = '<string>',
{
FORMAT = (
TYPE = 'CSV',
LINE_DELIMITER = '<string>' | <expr>,
FIELD_DELIMITER = '<string>' | <expr>,
PARSE_HEADER = { TRUE | FALSE },
ESCAPE = '<character>' | <expr>,
FIELD_OPTIONALLY_ENCLOSED_BY = '<character>' | <expr>,
ENCODING = 'charset',
NULL_IF = ('<string>' | <expr>, '<string>' | <expr> ...),
SKIP_HEADER = <int>,
SKIP_BLANK_LINES = { TRUE | FALSE },
TRIM_SPACE = { TRUE | FALSE },
EMPTY_FIELD_AS_NULL = { TRUE | FALSE }
)
| FORMAT = ( TYPE = 'PARQUET' | 'ORC' )
},
[PATTERN = '<regex_pattern>']
)
| SOURCE (
TYPE = 'ODPS',
ACCESSID = '<string>',
ACCESSKEY = '<string>',
ENDPOINT = '<string>',
TUNNEL_ENDPOINT = '<string>',
PROJECT_NAME = '<string>',
SCHEMA_NAME = '<string>',
TABLE_NAME = '<string>',
QUOTA_NAME = '<string>',
COMPRESSION_CODE = '<string>'
)
PARTITION_OPTION:
partition_option_list
| subpartition_option_list
Parameter description
| Parameter | Description |
|---|---|
| parallel(N) | The degree of parallelism for data loading. Default value: 4. |
| load_batch_size(M) | The size of each batch to insert. The default value of M is 100. We recommend that you set M to a value in the range [100, 1000]. |
| APPEND \| direct() \| NO_DIRECT | A hint that enables or disables direct load. Notice: Do not upgrade seekdb during a direct load task. Otherwise, the direct load task may fail. |
| REMOTE_OSS \| LOCAL | Optional. REMOTE_OSS specifies that the input file is stored in OSS, and LOCAL specifies that the input file is on the client's local file system. If neither is specified, the input file is a server-side file. |
| file_name | The path and name of the input file. For a server-side file, specify a path on the seekdb node; for an OSS file, specify a URL such as 'oss://bucket/path?host=xxx&access_id=xxx&access_key=xxx'. Notice: When you import a file from an object storage service, make sure that the required access conditions are met. |
| REPLACE \| IGNORE | Specifies how to handle unique key conflicts: REPLACE overwrites the conflicting row, and IGNORE skips it. LOAD DATA determines whether data is duplicated based on the primary key of the table. If the table does not have a primary key, the REPLACE and IGNORE options are equivalent. By default, when duplicate data is encountered, LOAD DATA records it in the log file. |
| url_table_function_expr | Optional. Specifies the FILES or SOURCE keyword to read data from a file system or data source. |
| table_name | The name of the table into which the data is imported. |
| PARTITION_OPTION | Specifies the partition names when using direct load. Note: Only direct load supports specifying partitions; ordinary LOAD DATA does not. In other words, if you do not add a direct load hint or set a direct load parameter, specifying partitions in the LOAD DATA statement does not take effect. |
| COMPRESSION | Specifies the compression format of the file. Valid values: AUTO, NONE, GZIP, DEFLATE, and ZSTD. |
| FIELDS \| COLUMNS | Specifies the format of the fields, such as the field delimiter (TERMINATED BY), enclosing character (ENCLOSED BY), and escape character (ESCAPED BY). |
| LINES STARTING BY | Specifies the line starting character. |
| LINES TERMINATED BY | Specifies the line ending character. |
| IGNORE number { LINES \| ROWS } | Specifies the number of lines or rows to skip at the beginning of the file. LINES counts lines by line terminator, and ROWS counts rows based on the field delimiter. By default, the system maps the fields in the input file to the columns of the table in order. If the input file does not contain all of the columns, the missing columns are filled with default values. Note: The IGNORE clause behaves the same whether you import data from a single file or from multiple files. |
| column_name_var | Optional. Specifies the names of the imported columns. |
| LOG ERRORS [REJECT LIMIT {integer \| UNLIMITED}] | Optional. Specifies whether to enable error diagnostics when importing data from a URL external table. For more information, see log_errors below. |
FILES
The FILES keyword consists of the LOCATION, FORMAT, and PATTERN clauses.
- The LOCATION clause specifies the path where the external table files are stored. Typically, the data files of an external table are stored in a separate directory, which can contain subdirectories. When a table is created, the external table automatically collects all files in the directory.
  - The format of a local LOCATION is LOCATION = '[file://]local_file_path', where local_file_path can be a relative or absolute path. If a relative path is specified, the current directory must be the installation directory of SeekDB. The secure_file_priv parameter specifies the file paths that the SeekDB node has permission to access; local_file_path must be a subpath of the secure_file_priv path.
  - The format of a remote LOCATION is as follows:
    - LOCATION = '{oss|cos|s3}://$ACCESS_ID:$ACCESS_KEY@$HOST:s3_region/remote_file_path', where $ACCESS_ID, $ACCESS_KEY, and $HOST are the access information required to access OSS, COS, or S3, and s3_region is the region selected when using S3. This sensitive access information is stored encrypted in a system table of the database.
    - LOCATION = 'hdfs://${hdfs_namenode_address}:${port}/PATH', where port is the port number of HDFS and PATH is the directory path in HDFS.
    - With Kerberos authentication: LOCATION = 'hdfs://localhost:port/user?principal=xxx&keytab=xxx&krb5conf=xxx&configs=xxx', where:
      - principal: specifies the user for login authentication.
      - keytab: specifies the path of the user authentication key file.
      - krb5conf: specifies the path of the Kerberos environment description file.
      - configs: specifies additional HDFS configuration items. The default value is empty. However, in a Kerberos environment, this item usually needs to be configured, for example: dfs.data.transfer.protection=authentication,privacy, which sets the data transmission protection level to authentication and privacy.
  Tip: When using an object storage path, the parameters in the path are separated by the & symbol. Make sure that the parameter values contain only uppercase and lowercase letters, digits, the characters \/-_$+=, and wildcards. Other characters may cause the settings to fail.
- The FORMAT clause specifies attributes related to reading files. It supports three file formats: CSV, PARQUET, and ORC.
  - When TYPE = 'CSV', the following fields are supported:
    - LINE_DELIMITER: specifies the line delimiter of the CSV file. The default value is LINE_DELIMITER = '\n'.
    - FIELD_DELIMITER: optional. Specifies the column delimiter of the CSV file. The default value is FIELD_DELIMITER = '\t'.
    - PARSE_HEADER: optional. Specifies whether the first line of the CSV file contains the column names. The default value is FALSE, indicating that the first line is not treated as column names.
    - ESCAPE: specifies the escape character of the CSV file. It can only be one byte. The default value is ESCAPE = '\'.
    - FIELD_OPTIONALLY_ENCLOSED_BY: optional. Specifies the symbol used to enclose field values in the CSV file. The default value is empty. Using this option indicates that only fields of certain types (such as CHAR, VARCHAR, TEXT, and JSON) are enclosed.
    - ENCODING: specifies the character set encoding of the file. If not specified, the default value is UTF8MB4.
    - NULL_IF: specifies the strings that are treated as NULL. The default value is empty.
    - SKIP_HEADER: skips the file header; specifies the number of lines to skip.
    - SKIP_BLANK_LINES: specifies whether to skip blank lines. The default value is FALSE, indicating that blank lines are not skipped.
    - TRIM_SPACE: specifies whether to remove leading and trailing spaces from fields in the file. The default value is FALSE, indicating that leading and trailing spaces are not removed.
    - EMPTY_FIELD_AS_NULL: specifies whether to treat empty strings as NULL. The default value is FALSE, indicating that empty strings are not treated as NULL.
  - When TYPE = 'PARQUET' or TYPE = 'ORC', there are no additional fields.
- The PATTERN clause specifies a regular expression that filters files in the LOCATION directory. For each file path in the LOCATION directory, if the path matches the pattern, the external table accesses the file; otherwise, the file is skipped. If this parameter is not specified, the external table accesses all files in the LOCATION directory by default.
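Taken together, a FILES() expression might combine these clauses as follows. This is a minimal sketch; the location, table name, and option values are illustrative placeholders, not values from this document.

LOAD DATA /*+ PARALLEL(2) */
FROM FILES(
    LOCATION = 'file:///home/admin/test_csv',  -- local path; must be a subpath of secure_file_priv
    FORMAT = (
        TYPE = 'CSV',
        FIELD_DELIMITER = ',',
        SKIP_HEADER = 1,                       -- skip one header line
        NULL_IF = ('\N'),                      -- treat the string \N as NULL
        TRIM_SPACE = TRUE                      -- strip leading and trailing spaces from fields
    ),
    PATTERN = '.*\.csv')                       -- regular expression: read only files ending in .csv
INTO TABLE t1;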
SOURCE
The SOURCE keyword does not contain other clauses. In this case, TYPE = 'ODPS' and the following fields are included:
- ACCESSID: specifies the ID of the ODPS user.
- ACCESSKEY: specifies the password of the ODPS user.
- ENDPOINT: specifies the connection address of the ODPS service.
- TUNNEL_ENDPOINT: specifies the connection address of the Tunnel data transmission service.
- PROJECT_NAME: specifies the project where the table to be queried is located.
- SCHEMA_NAME: optional. Specifies the schema of the table to be queried.
- TABLE_NAME: specifies the name of the table to be queried.
- QUOTA_NAME: optional. Specifies the quota to use.
- COMPRESSION_CODE: optional. Specifies the compression format of the data source. Supported compression formats include ZLIB, ZSTD, LZ4, and ODPS_LZ4. If not specified, compression is not enabled.
log_errors
- LOG ERRORS: enables error diagnostics during the import. Failed rows are recorded (currently in the warning buffer, which you can view with SHOW WARNINGS; at most 64 rows are kept) instead of the entire operation terminating at the first error. Used together with the REJECT LIMIT clause, it controls the maximum number of allowed error rows.
- REJECT LIMIT: optional. Specifies the maximum number of allowed error rows:
  - The default value is 0: no error rows are allowed, and the operation fails at the first error.
  - integer: the maximum number of error rows allowed on a single machine. For example, 10 indicates that up to 10 error rows can be tolerated on a single machine.
  - UNLIMITED: allows an unlimited number of error rows.
- If the LOG ERRORS clause is not specified, the default behavior is to report an error immediately upon encountering the first error.
- If the LOG ERRORS clause is specified but the REJECT LIMIT clause is not, it is equivalent to specifying a limit of 0: the operation fails at the first error, but that error is recorded, and the error code is a diagnostic-related error such as "reject limit reached".
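As a sketch, the following statement tolerates up to 10 error rows per machine instead of failing on the first one; the location, file name, and table name are placeholders.

LOAD DATA
FROM FILES(
    LOCATION = '/home/admin/test_csv',
    FORMAT = (TYPE = 'CSV', FIELD_DELIMITER = ','),
    PATTERN = 'data.csv')
INTO TABLE t1
LOG ERRORS REJECT LIMIT 10;   -- fail only after more than 10 bad rows on a machine

You can then run SHOW WARNINGS to inspect the recorded error rows.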
Wildcard rules for multi-file direct load
To facilitate multi-file direct load, the wildcard feature is supported for server-side and OSS file direct load, but not for client-side file direct load.
- Server-side wildcard usage
  - Matching rules:
    - Match a file:
      load data /*+ parallel(20) direct(true, 0) */ infile '/xxx/test.*.csv' replace into table t1 fields terminated by '|';
    - Match a directory:
      load data /*+ parallel(20) direct(true, 0) */ infile '/aaa*bb/test.1.csv' replace into table t1 fields terminated by '|';
    - Match both a directory and a file:
      load data /*+ parallel(20) direct(true, 0) */ infile '/aaa*bb/test.*.csv' replace into table t1 fields terminated by '|';
  - Considerations:
    - At least one matching file must exist. Otherwise, error code 4027 is returned.
    - For the input load data /*+ parallel(20) direct(true, 0) */ infile '/xxx/test.1*.csv,/xxx/test.6*.csv' replace into table t1 fields terminated by '|';, the string /xxx/test.1*.csv,/xxx/test.6*.csv is matched as a whole. If no matching files are found, error code 4027 is returned.
    - Only the wildcards supported by the POSIX glob function are supported. Patterns such as test.6*(6|0).csv and test.6*({0.csv,6.csv}|.csv) can be expanded by the ls command but cannot be matched by the glob function, so error code 4027 is returned.
- Wildcard usage in the cloud object storage service (OSS)
  - Matching rules:
    - Match a file:
      load data /*+ parallel(20) direct(true, 0) */ remote_oss infile 'oss://xxx/test.*.csv?host=xxx&access_id=xxx&access_key=xxx' replace into table t1 fields terminated by '|';
  - Considerations:
    - Directory matching is not supported. For example, load data /*+ parallel(20) direct(true, 0) */ remote_oss infile 'oss://aa*bb/test.*.csv?host=xxx&access_id=xxx&access_key=xxx' replace into table t1 fields terminated by '|'; returns OB_NOT_SUPPORTED.
    - Only the * and ? wildcards are supported in filenames. Other wildcards are not supported, even if they are specified.
Examples
When you execute the LOAD DATA statement, you can use \N to represent NULL.
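For example, in a hypothetical comma-delimited input file (illustrative data, not part of this document's samples):

1,\N,short
2,abc,\N

loading it with FIELDS TERMINATED BY ',' imports each \N field as NULL.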
Import data from a server-side file
Example 1: Import data from a server-side file.
- Set the global secure file path.
  SET GLOBAL secure_file_priv = "/";
  Query OK, 0 rows affected

  obclient> \q
  Bye
  Note: secure_file_priv is a GLOBAL variable, so you need to execute \q to exit and reconnect for the setting to take effect.
- After reconnecting to the database, import data from an external file.
  LOAD DATA INFILE 'test.sql' INTO TABLE t1;
  Query OK, 0 rows affected
Example 2: Enable direct load by using the APPEND hint.
LOAD DATA /*+ PARALLEL(4) APPEND */
INFILE '/home/admin/a.csv'
INTO TABLE t;
Example 3: Import a CSV file.
- Import all columns from the test1.csv file.
  load data /*+ direct(true,0) parallel(2)*/
  from files(
      location = "data/csv",
      format = (
          type = 'csv',
          field_delimiter = ',',
          parse_header = true,
          skip_blank_lines = true
      ),
      pattern = 'test1.csv')
  into table t1;
- Read the c1 and c2 columns from the test1.csv file in the data/csv path and import them into the col1 and col2 columns of the t1 table.
  load data /*+ direct(true,0) parallel(2)*/
  from (
      select c1, c2 from files(
          location = 'data/csv',
          format = (
              type = 'csv',
              field_delimiter = ',',
              parse_header = true,
              skip_blank_lines = true
          ),
          pattern = 'test1.csv'))
  into table t1 (col1, col2);
Example 4: Import a PARQUET file.
load data /*+ direct(true,0) parallel(2)*/
from files(
location = "data/parquet",
format = ( type = 'PARQUET'),
pattern = 'test1.parquet')
into table t1;
Example 5: Import an ORC file.
load data /*+ direct(true,0) parallel(2)*/
from files(
location = "data/orc",
format = ( type = 'ORC'),
pattern = 'test1.orc')
into table t1;
Example 6: Import data from an ODPS table.
load data /*+ direct(true,0) parallel(2)*/
from source (
type = 'ODPS',
accessid = '$ODPS_ACCESSID',
accesskey = '******',
endpoint= '$ODPS_ENDPOINT',
project_name = 'example_project',
schema_name = '',
table_name = 'example_table',
quota_name = '',
compression_code = '')
into table t1;
Import data from a client-side (local) file
Example 1: Import data from a local file to a seekdb table.
- Open a terminal or command prompt window and run the following command to start the client. The --local-infile option enables loading from the local file system.
  mysql --local-infile=1 -h127.0.0.1 -uroot -P2881 -p****** -A -Dtest
  The return result is as follows:
  Welcome to the MariaDB monitor. Commands end with ; or \g.
  Your MySQL connection id is 3221504802
  Server version: 5.7.25 OceanBase 4.3.5.3 SeekDB (r1.0.0.0) (Built 100000262025111218-5343637512e28c346f938516af53b7879d4d5974 Nov 12 2025)
  Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.
  Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
  MySQL [(none)]>
- In the client, execute the LOAD DATA LOCAL INFILE statement to load the local data file.
  LOAD DATA LOCAL INFILE '/home/admin/test_data/tbl1.csv' INTO TABLE tbl1 FIELDS TERMINATED BY ',';
  The return result is as follows:
  Query OK, 3 rows affected
  Records: 3 Deleted: 0 Skipped: 0 Warnings: 0
Example 2: Directly import a compressed file by setting COMPRESSION.
LOAD DATA LOCAL INFILE '/your/file/lineitem.tbl.gz'
INTO TABLE lineitem
COMPRESSION GZIP
FIELDS TERMINATED BY '|';
Example 3: Specify a partition for direct load by using PARTITION.
- Specify a partition for direct load
load data /*+ direct(true,0) parallel(2) load_batch_size(100) */
infile "$FILE_PATH"
into table t1 partition(p0, p1)
fields terminated by '|' enclosed by '' lines starting by '' terminated by '\n';
- Specify a subpartition for direct load
load data /*+ direct(true,0) parallel(2) load_batch_size(100) */
infile "$FILE_PATH"
into table t1 partition(p0sp0, p1sp1)
fields terminated by '|' enclosed by '' lines starting by '' terminated by '\n';
Import data from an OSS file
Example 1: Enable direct load by using the direct(bool, int) hint. The direct load file is stored in OSS.
load data /*+ parallel(1) direct(false,0)*/ remote_oss infile 'oss://antsys-oceanbasebackup/backup_rd/xiaotao.ht/lineitem2.tbl?host=***.oss-cdn.***&access_id=***&access_key=***' into table lineitem fields terminated by '|' enclosed by '' lines starting by '' terminated by '\n';
Import data from a server-side file in URL external table format
The IP addresses in the example commands are masked. When you run the examples, replace them with the actual IP address of your machine.
The following example shows how to import data from an external file located in the local file system of the seekdb node by using the URL external table syntax.
- Prepare an external file.
  Execute the following command to create a file named column_conv.csv in the /home/admin/test_csv directory of the node where you log in to seekdb.
  vi column_conv.csv
  The content of the file is as follows:
  1,short,short
  2,long_text,long_text
  3,long_text,long_text
- Set the import file path.
  Tip: For security reasons, you can set the secure_file_priv system variable only by connecting to the database through a local Unix socket and executing an SQL statement to modify the global variable. For more information, see secure_file_priv.
  - Execute the following command to log in to the node where seekdb is located.
    ssh admin@10.10.10.1
  - Execute the following command to connect to the mysql001 tenant by using a local Unix socket.
    mysql -S /home/admin/oceanbase/run/sql.sock -uroot -p******
  - Execute the following SQL statement to set the import path to /home/admin/test_csv.
    SET GLOBAL secure_file_priv = "/home/admin/test_csv";
- Reconnect to seekdb. Here is an example:
  mysql -h127.0.0.1 -uroot -P2881 -p****** -A -Ddb_test
- Create a table named test_tbl1.
  CREATE TABLE test_tbl1(col1 VARCHAR(5), col2 VARCHAR(5), col3 VARCHAR(5));
- Use the LOAD DATA statement to import data into the test_tbl1 table by using the URL external table syntax, and specify error diagnostics.
  LOAD DATA FROM FILES(
      LOCATION = '/home/admin/test_csv',
      FORMAT = (
          TYPE = 'csv',
          FIELD_DELIMITER = ','),
      PATTERN = 'column_conv.csv')
  INTO TABLE test_tbl1
  LOG ERRORS REJECT LIMIT UNLIMITED;
  The return result is as follows:
  Query OK, 1 row affected, 2 warnings
  Records: 1 Deleted: 0 Skipped: 0 Warnings: 2
- View the warnings.
  SHOW WARNINGS;
  The return result is as follows:
+---------+------+----------------------------------------------------------------------------------------------------------------------+
| Level | Code | Message |
+---------+------+----------------------------------------------------------------------------------------------------------------------+
| Warning | 1406 | fail to scan file column_conv.csv at line 2 for column "db_test"."test_tbl1"."col2", error: Data too long for column |
| Warning | 1406 | fail to scan file column_conv.csv at line 3 for column "db_test"."test_tbl1"."col2", error: Data too long for column |
+---------+------+----------------------------------------------------------------------------------------------------------------------+
  2 rows in set (0.001 sec)
- View the data in the test_tbl1 table.
  SELECT * FROM test_tbl1;
  The return result is as follows:
+------+-------+-------+
| col1 | col2 | col3 |
+------+-------+-------+
| 1 | short | short |
+------+-------+-------+
1 row in set (0.001 sec)
References
- For more information about how to connect to seekdb, see Connection methods overview.
- For more information about how to use the LOAD DATA statement, see Import data by using the LOAD DATA statement.
- For more information about how to bypass import by using the LOAD DATA statement, see Bypass import by using the LOAD DATA statement.