Export data to an external table
Applies to: ✅ Microsoft Fabric ✅ Azure Data Explorer
You can export data by defining an external table and exporting data to it. The table properties are specified when creating the external table. The export command references the external table by name.
Permissions
You must have at least Table Admin permissions to run this command.
Syntax
.export
[async
] to
table
externalTableName
[with
(
propertyName =
propertyValue [,
...])
] <|
query
Learn more about syntax conventions.
Parameters
Name | Type | Required | Description |
---|---|---|---|
externalTableName | string |
✔️ | The name of the external table to which to export. |
propertyName, propertyValue | string |
A comma-separated list of optional properties. | |
query | string |
✔️ | The export query. |
Supported properties
The following properties are supported as part of the export to external table command.
Property | Type | Description | Default |
---|---|---|---|
sizeLimit |
long |
The size limit in bytes of a single storage artifact written before compression. A full row group of size parquetRowGroupSize is written before checking whether this row group reaches the size limit and should start a new artifact. Valid range: 100 MB (default) to 1 GB. |
|
distributed |
bool |
Disable or enable distributed export. Setting to false is equivalent to single distribution hint. |
Default is true . |
distribution |
string |
Distribution hint (single , per_node , per_shard ). See more details in Distribution settings |
Default is per_node . |
distributionKind |
string |
Optionally switches to uniform distribution when the external table is partitioned by string partition. Valid values are uniform or default . See more details in Distribution settings |
|
concurrency |
Number | Hints the system how many partitions to run in parallel. See more details in Distribution settings | The default value is 16. |
spread |
Number | Hints the system how to distribute the partitions among nodes. See more details in Distribution settings | The default value is Min(64, number-of-nodes) . |
parquetRowGroupSize |
int |
Relevant only when data format is Parquet. Controls the row group size in the exported files. This value takes precedence over sizeLimit , meaning a full row group will be exported before checking whether this row group reaches the size limit and should start a new artifact. |
Default row group size is 100,000 records. |
Distribution settings
The distribution of an export to external table operation indicates the number of nodes and threads that are writing to storage concurrently. The default distribution depends on the external table partitioning:
External table partitioning | Default distribution |
---|---|
External table isn't partitioned, or partitioned by datetime column only |
Export is distributed per_node - all nodes are exporting concurrently. Each node writes the data assigned to that node. The number of files exported by a node is greater than one, only if the size of the data from that node exceeds sizeLimit . |
External table is partitioned by a string column | The data to export is moved between the nodes, such that each node writes a subset of the partition values. A single partition is always written by a single node. The number of files written per partition should be greater than one only if the data exceeds sizeLimit . If the external table includes several string partitions, then data is partitioned between the node based on the first partition. Therefore, the recommendation is to define the partition with most uniform distribution as the first one. |
Change the default distribution settings
Changing the default distribution settings can be useful in the following cases:
Use case | Description | Recommendation |
---|---|---|
Reduce the number of exported files | Export is creating too many small files, and you would like it to create a smaller number of larger files. | Set distribution =single or distributed =false (both are equivalent) in the command properties. Only a single thread performs the export. The downside of this is that the export operation can be slower, as concurrency is much reduced. |
Reduce the export duration | Increasing the concurrency of the export operation, to reduce its duration. | Set distribution =per_shard in the command properties. Doing so means concurrency of the write operations is per data shard, instead of per node. This is only relevant when exporting to an external table that isn't partitioned by string partition. This might create too much load on storage, potentially resulting in throttling. See Storage failures. |
Reduce the export duration for external tables that are partitioned by a string partition | If the partitions aren't uniformly distributed between the nodes, export might take a longer time to run. If one partition is much larger than the others, the node assigned to that partition does most of the export work, while the other nodes remain mostly idle. For more information, see Distribution settings. | There are several settings you can change: * If there's more than one string partition, define the one with best distribution first. * Set distributionKind =uniform in the command properties. This setting disables the default distribution settings for string-partitioned external tables. Export runs with per-node distribution and each node exports the data assigned to the node. A single partition might be written by several nodes, and the number of files increases accordingly. To increase concurrency even further, set distributionKind =uniform along with distribution =per_shard for highest concurrency (at the cost of potentially many more files written)* If the cause for slow export isn't outliers in the data, reduce duration by increasing concurrency, without changing partitioning settings. Use the hint.spread and hint.concurrency properties, which determine the concurrency of the partitioning. See partition operator. By default, the number of nodes exporting concurrently (the spread ) is the minimum value between 64 and the number of nodes. Setting spread to a higher number than number of nodes increases the concurrency on each node (max value for spread is 64). |
Authentication and authorization
In order to export to an external table, you must set up write permissions. For more information, see the Write permissions for Azure Storage external table or SQL Server external table.
Output
Output parameter | Type | Description |
---|---|---|
ExternalTableName | string |
The name of the external table. |
Path | string |
Output path. |
NumRecords | string |
Number of records exported to path. |
Notes
The export query output schema must match the schema of the external table, including all columns defined by the partitions. For example, if the table is partitioned by DateTime, the query output schema must have a Timestamp column matching the TimestampColumnName. This column name is defined in the external table partitioning definition.
It isn't possible to override the external table properties using the export command. For example, you can't export data in Parquet format to an external table whose data format is CSV.
If the external table is partitioned, exported artifacts are written to their respective directories according to the partition definitions. For an example, see partitioned external table example.
- If a partition value is null/empty or is an invalid directory value, per the definitions of the target storage, the partition value is replaced with a default value of
__DEFAULT_PARTITION__
.
- If a partition value is null/empty or is an invalid directory value, per the definitions of the target storage, the partition value is replaced with a default value of
For suggestions to overcome storage errors during export commands, see failures during export commands.
External table columns are mapped to suitable target format data types, according to data types mapping rules.
Parquet native export is a more performant, resource light export mechanism. An exported
datetime
column is currently unsupported by Synapse SQLCOPY
.
Number of files
The number of files written per partition depends on the distribution settings of the export operation:
If the external table includes
datetime
partitions only, or no partitions at all, the number of files written for each partition that exists, should be similar to the number of nodes (or more, ifsizeLimit
is reached). When the export operation is distributed, all nodes export concurrently. To disable distribution, so that only a single node does the writes, setdistributed
to false. This process creates fewer files, but reduces the export performance.If the external table includes a partition by a string column, the number of exported files should be a single file per partition (or more, if
sizeLimit
is reached). All nodes still participate in the export (operation is distributed), but each partition is assigned to a specific node. Settingdistributed
to false, causes only a single node to do the export, but behavior remains the same (a single file written per partition).
Examples
Non-partitioned external table example
The following example exports data from table T
to the ExternalBlob
table. ExternalBlob
is a non-partitioned external table.
.export to table ExternalBlob <| T
Output
ExternalTableName | Path | NumRecords |
---|---|---|
ExternalBlob | http://storage1.blob.core.windows.net/externaltable1cont1/1_58017c550b384c0db0fea61a8661333e.csv | 10 |
Partitioned external table example
The following example first creates a partitioned external table, PartitionedExternalBlob
with a specified blob storage location. The data is stored in CSV format with a path format which organizes the data by customer name and date.
.create external table PartitionedExternalBlob (Timestamp:datetime, CustomerName:string)
kind=blob
partition by (CustomerName:string=CustomerName, Date:datetime=startofday(Timestamp))
pathformat = ("CustomerName=" CustomerName "/" datetime_pattern("yyyy/MM/dd", Date))
dataformat=csv
(
h@'http://storageaccount.blob.core.windows.net/container1;secretKey'
)
It then exports data from table T
to the PartitionedExternalBlob
external table.
.export to table PartitionedExternalBlob <| T
Output
ExternalTableName | Path | NumRecords |
---|---|---|
ExternalBlob | http://storageaccount.blob.core.windows.net/container1/CustomerName=customer1/2019/01/01/fa36f35c-c064-414d-b8e2-e75cf157ec35_1_58017c550b384c0db0fea61a8661333e.csv | 10 |
ExternalBlob | http://storageaccount.blob.core.windows.net/container1/CustomerName=customer2/2019/01/01/fa36f35c-c064-414d-b8e2-e75cf157ec35_2_b785beec2c004d93b7cd531208424dc9.csv | 10 |
If the command is executed asynchronously by using the async
keyword, the output is available using the show operation details command.