CREATE EXTERNAL TABLE
Description
This statement is used to create a new external table in the database.
An external table is a key feature of a database management system. In a database, tables are stored in the database's storage space, while the data of an external table is stored in an external storage service.
When creating an external table, you need to define the file path and file format of the data. After that, users can read the data from the external storage service through the external table. External tables are read-only and can be used in query statements, but DML operations cannot be performed on them. External tables do not support defining constraints or creating indexes.
Syntax
CREATE EXTERNAL TABLE table_name (column_definition_list)
LOCATION = 'file_name'
{FORMAT = (format_type_options)
| PROPERTIES = (properties_type_options)}
[PARTITION BY (column_name [, column_name ...])]
[PARTITION_TYPE = USER_SPECIFIED]
[PATTERN = 'regex_pattern']
[AUTO_REFRESH = 'xxx'];
column_definition_list:
column_definition [, column_definition ...]
column_definition:
column_name column_type [AS expr]
format_type_options:
type_csv_option
| type_parquet_option
| type_orc_option
type_csv_option:
TYPE = 'CSV'
LINE_DELIMITER = '<string>' | <expr>
FIELD_DELIMITER = '<string>' | <expr>
ESCAPE = '<character>' | <expr>
FIELD_OPTIONALLY_ENCLOSED_BY = '<character>' | <expr>
ENCODING = 'charset'
NULL_IF = ('<string>' | <expr>, '<string>' | <expr> ...)
SKIP_HEADER = <int>
SKIP_BLANK_LINES = {TRUE | FALSE}
TRIM_SPACE = {TRUE | FALSE}
EMPTY_FIELD_AS_NULL = {TRUE | FALSE}
IGNORE_LAST_EMPTY_COLUMN = {TRUE | FALSE}
type_parquet_option:
TYPE = 'PARQUET'
type_orc_option:
TYPE = 'ORC'
properties_type_options:
type_odps_option
type_odps_option:
TYPE = 'ODPS'
ACCESSID = '<string>'
ACCESSKEY = '<string>'
ENDPOINT = '<string>',
PROJECT_NAME = '<string>',
SCHEMA_NAME = '<string>',
TABLE_NAME = '<string>',
QUOTA_NAME = '<string>',
COMPRESSION_CODE = '<string>',
API_MODE = {"tunnel_api" | "storage_api"},
SPLIT = {"byte" | "row"}
Parameters
| Parameter | Description |
|---|---|
| table_name | The name of the external table to be created. |
| column_name | The name of the column of the external table. By default, the data columns in the file are automatically mapped to the columns defined in the external table in sequence. |
| column_type | The type of the column of the external table. Constraints such as DEFAULT, NOT NULL, UNIQUE, CHECK, PRIMARY KEY, and FOREIGN KEY cannot be defined. |
| AS expr | Manually specifies the column mapping. If the order of the columns in the file does not match the order of the columns defined in the external table, you can use the pseudo column metadata$filecol{N} to specify the mapping between the columns of the external table and the Nth column in the file. For example, c2 INT AS (metadata$filecol4) indicates that the c2 column of the external table corresponds to the 4th column in the file. Note that if you manually specify column mappings, the automatic mapping relationships will be disabled, and all columns must be manually defined with mapping relationships. |
| LOCATION = 'file_name' | The path where the external table files are stored. Usually, the data files of the external table are stored in a separate directory, which can contain subdirectories. When you create an external table, it automatically collects all files in this directory. For more information, see file_name below. |
| FORMAT = (format_type_options) | The attributes of the external file format. Use TYPE to specify the file format for export, such as CSV, PARQUET, or ORC. The TYPE parameter cannot be empty. For more information, see format_type_options below. |
| PROPERTIES = (properties_type_options) | The attributes of the external file format. Use TYPE to specify the file format for export, such as ODPS. The TYPE parameter cannot be empty. For more information, see properties_type_options below. |
| PATTERN | The regular expression pattern for filtering files in the LOCATION directory. For each file path in the LOCATION directory, if the path matches the pattern, the external table accesses the file; otherwise, it skips the file. If you do not specify this parameter, the external table accesses all files in the LOCATION directory by default. The external table saves the list of files in the LOCATION directory that match the PATTERN in the database system table. When the external table scans files, it accesses the external files based on this list. |
| PARTITION_TYPE = USER_SPECIFIED | If you want to manually add and delete partitions instead of letting the external table automatically manage partitions, specify the PARTITION_TYPE = USER_SPECIFIED parameter. |
| AUTO_REFRESH = 'xxx' | Specifies whether to enable automatic refresh for the external table. Valid values:
|
file_name
The path of the external file is in the following format:
-
If the file is stored locally, the
LOCATIONparameter is in theLOCATION = '[file://] local_file_path'format. In this case,local_file_pathcan be a relative path or an absolute path. If you specify a relative path, the current directory must be the installation directory of seekdb. Thesecure_file_privparameter specifies the file path that seekdb can access.local_file_pathmust be a subpath ofsecure_file_priv. -
If the file is stored remotely, the
LOCATIONparameter is in the following format:tipWhen you use an object storage path, the parameters of the object storage path are separated by
&. Make sure that the parameter values contain only uppercase and lowercase letters, digits, and the characters/-_$+=, and wildcards. Otherwise, the setting may fail.- If the file is stored in OSS or S3, the
LOCATIONparameter is in theLOCATION = '{oss\|s3}://$ACCESS_ID:$ACCESS_KEY@$HOST:s3_region/remote_file_path'format. In this case,$ACCESS_ID,$ACCESS_KEY, and$HOSTare the access information required for accessing Alibaba Cloud OSS, AWS S3, and object storage services that are compatible with the S3 protocol.s3_regionis the region information selected when you use S3. These sensitive access information are stored in the system table of the database in an encrypted manner.
- If the file is stored in OSS or S3, the
format_type_options
-
TYPE = 'CSV': specifies the format of the external file as CSV. It also contains the following fields:-
LINE_DELIMITER: specifies the line delimiter of the CSV file. The default value isLINE_DELIMITER='\n'. -
FIELD_DELIMITER: specifies the column delimiter of the CSV file. The default value isFIELD_DELIMITER='\t'. -
ESCAPE: specifies the escape character of the CSV file. It can be only one byte. The default value isESCAPE ='\'. -
FIELD_OPTIONALLY_ENCLOSED_BY: specifies the character that wraps the field values in the CSV file. The default value is empty.tipIf the external file contains
NULLvalues (non-stringNULLvalues, that is, values other than "NULL"), you must explicitly configure theFIELD_OPTIONALLY_ENCLOSED_BYparameter and the value of the parameter cannot be empty. -
ENCODING: specifies the character set encoding format of the file. For information about all character sets supported in MySQL mode, see Character sets. If you do not specify this parameter, the default value is UTF8MB4. -
NULL_IF: specifies the string that is treated asNULL. The default value is empty. -
SKIP_HEADER: specifies the number of header rows to skip. -
SKIP_BLANK_LINES: specifies whether to skip blank lines. The default value isFALSE, which indicates that blank lines are not skipped. -
TRIM_SPACE: specifies whether to remove leading and trailing spaces in the fields of the file. The default value isFALSE, which indicates that leading and trailing spaces in the fields of the file are not removed. -
EMPTY_FIELD_AS_NULL: specifies whether to treat empty strings asNULL. The default value isFALSE, which indicates that empty strings are not treated asNULL. -
IGNORE_LAST_EMPTY_COLUMN: specifies whether to ignore the last empty field at the end of a line. The default value isTRUE, which indicates that the last empty field at the end of a line is ignored.
-
-
TYPE = 'PARQUET': specifies the format of the external file asPARQUET. -
TYPE = 'ORC': specifies the format of the external file asORC.
properties_type_options
-
TYPE = 'ODPS': specifies the format of the external file asODPS. It also contains the following fields:-
ACCESSID: specifies the AccessKey ID of the Alibaba Cloud account for identity authentication. -
ACCESSKEY: specifies the AccessKey secret corresponding to the AccessKey ID for identity verification. -
ENDPOINT: specifies the connection address of the ODPS service. -
PROJECT_NAME: specifies the name of the ODPS project to be accessed. -
SCHEMA_NAME: specifies the schema name in ODPS. This parameter is optional. -
TABLE_NAME: specifies the name of the target table in ODPS. -
QUOTA_NAME: specifies the quota to be used. This parameter is optional. -
COMPRESSION_CODE: specifies the compression format of the data source. Valid values:ZLIB,ZSTD,LZ4, andODPS_LZ4. If you do not specify this parameter, compression is not enabled. -
API_MODE: specifies the API mode for accessing ODPS. Valid values:-
tunnel_api(default):-
No special network configuration is required. This mode is applicable to all deployment scenarios and does not require seekdb and MaxCompute to be deployed in the same VPC (Virtual Private Cloud).
-
No additional MaxCompute permissions are required. You only need to provide the AccessID and AccessKey for authentication. You do not need to enable the MaxCompute Storage API permission.
-
Applicable scenarios:
- seekdb and MaxCompute are not deployed in the same VPC.
- The MaxCompute Storage API is not enabled.
- The data transmission has low latency requirements.
-
-
storage_api:-
Network dependency: seekdb and MaxCompute must be deployed in the same VPC to achieve low-latency, high-throughput data transmission.
-
Permission dependency: You must enable the Storage API permission in MaxCompute and ensure that the AccessKey has the corresponding permissions.
-
Applicable scenarios:
- seekdb and MaxCompute are deployed in the same VPC.
- The MaxCompute Storage API is enabled.
- The data volume is extremely large or the real-time requirement is high.
-
-
-
SPLIT: specifies whether to split the task bybyteorrowwhen you use thestorage_apimode. If the data size of each row in a table varies significantly, setSPLITtobyte. Otherwise, setSPLITtorow.
-
Usage notes
- When an external file is deleted, the file no longer exists in the file list of the external table. In this case, the external table ignores the file.
- When an external file is modified, the external table accesses the latest data of the external file. If the modification of the external file and the query of the external table are performed concurrently, unexpected results may occur. Therefore, avoid modifying the external file while querying the external table.
- When a new file is added to the external directory, the external table accesses only the files in the file list. To add the new file to the file list of the external table, you must update the external table files.
Example
-
Prepare data. First, set the
secure_file_privpath to/home/admin/and place the CSV fileextdata.csvcorresponding to the external table data in the/home/admin/testdirectory of the local connection to seekdb.Here is an example of setting the global secure file path.
SET GLOBAL secure_file_priv = ""
Query OK, 0 rows affected
obclinet> \q
ByeinfoSince
secure_file_privis aGLOBALvariable, you need to execute\qto exit and make the setting take effect.The content of the CSV file is as follows:
1,'Dave','Smith','dsmith@outlook.com','friend',32
2,'Xena','Johnson','xjonson@outlook.com','contact',45
3,'Fred','Jackon','fjackson@outlook.com','co-worker',19
4,'Alma','Tyler','atyler@outlook.com','friend',53 -
Log in to the database and create the external table
contacts.CREATE EXTERNAL TABLE contacts (
id INT,
firstname VARCHAR(100),
lastname VARCHAR(100),
email VARCHAR(255),
category CHAR(30),
age NUMBER
)
LOCATION = '/home/admin/test'
FORMAT = (
TYPE = 'CSV'
FIELD_DELIMITER = ','
FIELD_OPTIONALLY_ENCLOSED_BY ='\''
)
PATTERN = 'extdata.csv'; -
Query the data in the external table
contacts.SELECT * FROM contacts;
+----+-----------+----------+----------------------+-----------+------+
| id | firstname | lastname | email | category | age |
+----+-----------+----------+----------------------+-----------+------+
| 1 | Dave | Smith | dsmith@outlook.com | friend | 32 |
| 2 | Xena | Johnson | xjonson@outlook.com | contact | 45 |
| 3 | Fred | Jackon | fjackson@outlook.com | co-worker | 19 |
| 4 | Alma | Tyler | atyler@outlook.com | friend | 53 |
+----+-----------+----------+----------------------+-----------+------+
4 rows in set (0.003 sec)