Skip to main content

SELECT INTO

Description

This statement is used to store the query results in a variable or a file. Specifically:

  • SELECT ... INTO OUTFILE stores the result set in an external file and allows you to specify the output format.

    info

    When using SELECT ... INTO OUTFILE, you can use \N to represent NULL.

  • SELECT ... INTO DUMPFILE writes an unformatted single line to an external file.

  • SELECT ... INTO var_list stores the result set in a variable.

Privilege requirements

To execute the SELECT INTO statement, you must have the FILE privilege and the SELECT privilege on the corresponding table. For more information about the privileges of seekdb, see Privilege classification of seekdb.

Here is an example:

To grant the FILE privilege to a user, you can use the following command:

GRANT FILE ON *.* TO user_name;

where user_name is the user who needs to execute the SELECT INTO command.

Syntax

select_stmt
INTO {OUTFILE 'file_name' [PARTITION BY part_expr] [{CHARSET | CHARACTER SET} charset_name] [field_opt] [line_opt] [file_opt]
| OUTFILE 'file_name' [PARTITION BY part_expr] [FORMAT = (external_file_format_list)] [file_opt]
| DUMPFILE 'file_name'
| into_var_list};

field_opt:
{COLUMNS | FIELDS} field_term_list

field_term_list:
field_term [, field_term ...]

field_term:
{[OPTIONALLY] ENCLOSED | TERMINATED | ESCAPED} BY string

line_opt:
LINES line_term_list

line_term_list:
line_term [line_term]

line_term:
{STARTING | TERMINATED} BY string

file_opt:
file_option [, file_option ...]

file_option:
SINGLE [=] {TRUE | FALSE}
| MAX_FILE_SIZE [=] {int | string}
| BUFFER_SIZE [=] {int | string}

external_file_format_list:
type_csv_option
| type_parquet_option
| type_orc_option

type_csv_option:
TYPE = 'CSV'
LINE_DELIMITER = '<string>' | <expr>
FIELD_DELIMITER = '<string>' | <expr>
ESCAPE = '<character>' | <expr>
FIELD_OPTIONALLY_ENCLOSED_BY = '<character>' | <expr>
FIELD_ENCLOSED_BY = '<character>' | <expr>
ENCODING = 'charset'
COMPRESSION = [NONE, GZIP, DEFLATE, ZSTD] | '<string>'
FILE_EXTENSION = ['<string>']

type_parquet_option:
TYPE = 'PARQUET',
COMPRESSION = '<string>'
ROW_GROUP_SIZE = '<string>' | <int>

type_orc_option:
TYPE = 'ORC'
COMPRESSION = '<string>'
COMPRESSION_BLOCK_SIZE = '<string>' | <int>
STRIPE_SIZE = '<string>' | <int>
ROW_INDEX_STRIDE = <int>

Parameters

ParameterDescription
select_stmtSpecifies the query statement to be executed. select_stmt must return a result set. For more information about the syntax and options of the query statement, see SELECT.
NoteThe INTO clause can also be placed before the FROM clause in the query statement. For example, SELECT * INTO OUTFILE '/home/admin/test_data/test_tbl1.csv' FROM test_tbl1 WHERE id > 5;.
file_nameSpecifies the path and name of the exported file. For more information, see file_name.
PARTITION BY part_expr
Note In seekdb, you can specify the partitioning method for exported data. This is an optional parameter. The value of part_expr is used as part of the export path. For each row of data, the value of part_expr is calculated. Rows with the same value of part_expr are placed in the same partition and exported to the same directory.
Notice
  • When you export data by partition, set SINGLE = FALSE to allow exporting to multiple files.
  • Currently, you can import data exported by partition only to OSS.
FORMAT = (external_file_format_list)Optional. The FORMAT clause specifies the file format of the exported file. You can use the TYPE clause to specify the file format as CSV, PARQUET, or ORC. The TYPE clause cannot be empty. For more information, see external_file_format_list.
CHARSET | CHARACTER SET charset_nameOptional. Specifies the character set for the exported file. charset_name specifies the name of the character set.
field_optOptional. Specifies the format of the exported fields. You can use the FIELDS or COLUMNS clause to specify the format. For more information, see field_term.
line_optOptional. Specifies the start and end delimiters for data rows. You can use the LINES clause to specify the start and end characters. For more information, see line_term.
file_optOptional. Specifies whether to export data to multiple files and the size of each file when exporting to multiple files. For more information, see file_option.

file_name

file_name has the following format:

  • When you save the exported file on an OBServer node, the format is: /\$PATH/\$FILENAME, where the parameters are described as follows:

    • \$PATH: specifies the path to save the exported file on the OBServer node.
    • \$FILENAME: specifies the name of the exported file. When SINGLE = FALSE, it specifies the prefix of the exported file. If you do not specify the prefix, the default prefix data is generated. The system generates the suffix.
  • When you save the exported file on Alibaba Cloud Object Storage Service (OSS), the format is: oss://\$PATH/\$FILENAME/?host=\$HOST&access_id=\$ACCESS_ID&access_key=\$ACCESSKEY, where the parameters are described as follows:

    • \$PATH: specifies the path to save the exported file on the OSS bucket.
    • \$FILENAME: specifies the name of the exported file. When SINGLE = FALSE, it specifies the prefix of the exported file. If you do not specify the prefix, the default prefix data is generated. The system generates the suffix.
    • \$HOST: specifies the hostname or the domain name of the CDN-accelerated endpoint of the OSS service.
    • \$ACCESS_ID: specifies the Access Key ID for accessing the OSS service, which is used for authentication.
    • \$ACCESSKEY: specifies the Access Key secret for accessing the OSS service, which is used for authentication.
    info
    • In seekdb, SELECT INTO supports S3 and object storage services based on the S3 protocol as the destination for data export.
    • Alibaba Cloud OSS has a file size limit. Therefore, files larger than 5 GB are split into multiple files, each smaller than 5 GB, when exported to OSS.

field_term

  • [OPTIONALLY] ENCLOSED BY string: specifies the symbol for wrapping field values. By default, no quotation marks are used. For example, ENCLOSED BY '"' indicates that character values are enclosed in double quotation marks. If you use the OPTIONALLY keyword, only string values are wrapped with the specified character.
  • TERMINATED BY string: specifies the symbol between field values. For example, TERMINATED BY ',' specifies a comma as the delimiter between two field values.
  • ESCAPED BY string: specifies the escape character for handling special characters or parsing data in special formats. By default, the backslash (\) is used as the escape character.

line_term

  • STARTING BY string: specifies the starting character for each line.
  • TERMINATED BY string: specifies the ending character for each line. By default, a newline character is used. For example, ... LINES TERMINATED BY '\n' ... indicates that a newline character is used as the delimiter for the end of a line.

file_option

  • SINGLE [=] {TRUE | FALSE}: specifies whether to export data to a single file or multiple files.

    • SINGLE [=] TRUE: the default value, indicating that data can be exported to only one file.

    • SINGLE [=] FALSE: indicating that data can be exported to multiple files.

      tip

      When the parallelism is greater than 1 and SINGLE = FALSE, data can be exported to multiple files, achieving parallel read and write and improving the export speed.

  • MAX_FILE_SIZE [=] {int | string}: specifies the size of a single file during export. This option is effective only when SINGLE = FALSE.

  • BUFFER_SIZE [=] {int | string}: specifies the amount of memory allocated to each thread for each partition (or a single partition if no partitioning is specified) during export. The default value is 1 MB.

    info
    • BUFFER_SIZE is used for export performance tuning. If your machine has sufficient memory and you want to improve the export efficiency, you can set a larger value (such as 4 MB). If your machine has insufficient memory, you can set a smaller value (such as 4 KB). If you set the value to 0, all partitions in a single thread share a common memory block.
    • seekdb supports the BUFFER_SIZE parameter.

external_file_format_list

  • When TYPE = 'CSV', the following fields are included:

    • LINE_DELIMITER: specifies the line delimiter for CSV files. The default value is LINE_DELIMITER='\n'.
    • FIELD_DELIMITER: specifies the column delimiter for CSV files. This is an optional parameter. The default value is FIELD_DELIMITER='\t'.
    • ESCAPE: specifies the escape character for CSV files. It must be a single byte. The default value is ESCAPE ='\'.
    • FIELD_OPTIONALLY_ENCLOSED_BY: specifies the symbol for wrapping field values in CSV files. This is an optional parameter. The default value is an empty string. If you use this option, only field values of specific types (such as CHAR, VARCHAR, TEXT, and JSON) are wrapped with the specified symbol.
    • FIELD_ENCLOSED_BY: specifies the symbol for wrapping field values in CSV files. This is an optional parameter. The default value is an empty string, indicating that no wrapping is applied. If you use this option, all field values are wrapped with the specified symbol.
    • ENCODING: specifies the character set encoding format of the file. If you do not specify this parameter, the default value is UTF8MB4.
    • COMPRESSION: specifies the compression format of the file. This is an optional parameter. Supported compression formats include NONE, GZIP, DEFLATE, and ZSTD.
      • GZIP/DEFLATE: specifies GZIP compression.
      • ZSTD: specifies ZSTD compression.
      • NONE: specifies no compression (default value).
    • FILE_EXTENSION: specifies the custom file extension for CSV files. This parameter is only applicable for multi-file exports and is supported only for CSV files. If you do not specify this parameter, the file extension is determined by the format type. The default file extension for CSV files is .csv.
  • When TYPE = 'PARQUET', the following fields are included:

    • COMPRESSION: specifies the compression format for PARQUET files. The default compression suffix is .parquet. Supported compression formats include UNCOMPRESSED (indicating no compression), SNAPPY, GZIP, BROTLI, ZSTD, LZ4, and LZ4_HADOOP. If you do not specify this parameter, the default value is UNCOMPRESSED.
    • ROW_GROUP_SIZE: specifies the size of a row group in PARQUET files, in bytes. You can specify a number or a string in the format '64MB'. If you do not specify this parameter, the default value is 256MB. We recommend that you use the default value.
  • When TYPE = 'ORC', the following fields are included:

    • COMPRESSION: specifies the compression format for ORC files. The default compression suffix is .orc. Supported compression formats include UNCOMPRESSED (indicating no compression), SNAPPY, ZLIB, LZ4, and ZSTD. If you do not specify this parameter, the default value is UNCOMPRESSED.
    • COMPRESSION_BLOCK_SIZE: specifies the size of data blocks during compression, in bytes. You can specify a number or a string in the format '64KB'. If you do not specify this parameter, the default value is 256KB. We recommend that you use the default value.
    • STRIPE_SIZE: specifies the size of a stripe in ORC files, in bytes. You can specify a number or a string in the format '64MB'. If you do not specify this parameter, the default value is 64MB. We recommend that you use the default value.
    • ROW_INDEX_STRIDE: specifies the frequency of index records. It defines how often an index is recorded. If you do not specify this parameter, the default value is 10000. We recommend that you use the default value.

Example

Export data to a local file

  1. Set the file path for export.

    To export files, you must set the secure_file_priv system variable to specify the path where the exported files can be accessed.

    tip

    For security reasons, you can only set the secure_file_priv system variable by connecting to the database through a local Unix socket. For more information, see secure_file_priv.

    1. Log in to the node where you want to connect to seekdb.

      ssh admin@xxx.xxx.xxx.xxx
    2. Execute the following command to connect to the mysql001 tenant through a local Unix socket.

      mysql -S /home/admin/oceanbase/run/sql.sock -uroot -p******
    3. Set the export path to /home/admin/test_data.

      SET GLOBAL secure_file_priv = "/home/admin/test_data";
    4. Log off.

  2. After reconnecting to the database, use the SELECT INTO OUTFILE statement to export data. Specify a comma as the delimiter between two field values; wrap string values with "; and use a newline character as the end marker.

    • To write a single file serially, specify the file name as test_tbl1.csv.

      SELECT /*+parallel(2)*/ * FROM test_tbl1
      INTO OUTFILE '/home/admin/test_data/test_tbl1.csv'
      FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
      LINES TERMINATED BY '\n';

      The return result is as follows:

      Query OK, 9 rows affected (0.087 sec)
    • To write multiple files in parallel, do not specify the file name (use the default prefix data), and ensure that each file does not exceed 4 MB in size.

      SELECT /*+parallel(2)*/ * FROM test_tbl1
      INTO OUTFILE '/home/admin/test_data/'
      FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
      LINES TERMINATED BY '\n'
      SINGLE = FALSE MAX_FILE_SIZE = '4MB';

      The return result is as follows:

      Query OK, 9 rows affected (0.093 sec)
    • To write multiple files in parallel, specify the file name prefix as dd2024, and ensure that each file does not exceed 4 MB in size.

      SELECT /*+parallel(2)*/ * FROM test_tbl1
      INTO OUTFILE '/home/admin/test_data/dd2024'
      FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
      LINES TERMINATED BY '\n'
      SINGLE = FALSE MAX_FILE_SIZE = '4MB';

      The return result is as follows:

      Query OK, 9 rows affected
    info
    • If multiple export tasks are performed to the same path at the same time, errors or partial data exports may occur. You can avoid these issues by setting the export path appropriately. For example, if you execute SELECT /*+parallel(2)*/ * FROM t1 INTO OUTFILE 'test/data' SINGLE = FALSE; and SELECT /*+parallel(2)*/ * FROM t2 INTO OUTFILE 'test/data' SINGLE = FALSE; at the same time, an error may occur because the exported files have the same name. In this case, we recommend that you set the export paths to test/data1 and test/data2.
    • If SINGLE = FALSE and the export fails for reasons such as the file already existing, you can delete all files with the same prefix as the export target in the export directory or delete and recreate the export directory, and then retry the export. For example, if SELECT /*+parallel(2)*/ * FROM t1 INTO OUTFILE 'test/data' SINGLE = FALSE; fails, you can delete all files with the data prefix in the test directory, or directly delete and recreate the test directory, and then retry the export.
  3. Log in to the machine and check the exported file information in the /home/admin/test_data directory of the seekdb node.

ls

The return result is as follows:

data_0_0_0  data_0_1_0  dd2024_0_0_0  dd2024_0_1_0  test_tbl1.csv

In this example, test_tbl1.csv is the file name of the single-file serial export; data_0_0_0 and data_0_1_0 are the file names of the multiple-file parallel export without specified file names; and dd2024_0_0_0 and dd2024_0_1_0 are the file names of the multiple-file parallel export with the file name prefix dd2024.

Export data to an OSS bucket

Use the SELECT INTO OUTFILE statement to export data from the test_tbl2 table to the specified OSS storage location by partition. The partitioning is based on the combination of the col1 and col2 columns. Rows with the same values in these columns belong to the same partition and are exported to the same directory.

SELECT /*+parallel(3)*/ * FROM test_tbl2
INTO OUTFILE 'oss://$DATA_FOLDER_NAME/?host=$OSS_HOST&access_id=$OSS_ACCESS_ID&access_key=$OSS_ACCESS_KEY'
PARTITION BY CONCAT(col1,'/',col2)
SINGLE = FALSE BUFFER_SIZE = '2MB';

The storage location is specified by the $DATA_FOLDER_NAME variable. You also need to provide the OSS host address, Access ID, and Access Key.

References

Export data by using the OUTFILE statement