Open mirroring landing zone requirements and format
This article details the landing zone and table/column operation requirements for open mirroring in Microsoft Fabric.
Important
This feature is in preview.
Once you have created your open mirrored database via the Fabric portal or public API in your Fabric workspace, you get a landing zone URL in OneLake in the Home page of your mirrored database item. This landing zone is where your application to create a metadata file and land data in Parquet format (uncompressed, Snappy, GZIP, ZSTD).
Landing zone
For every mirrored database, there is a unique storage location in OneLake for metadata and delta tables. Open mirroring provides a landing zone folder for application to create a metadata file and push data into OneLake. Mirroring monitors these files in the landing zone and read the folder for new tables and data added.
For example, if you have tables (Table A
, Table B
, Table C
) to be created in the landing zone, create folders like the following URLs:
https://onelake.dfs.fabric.microsoft.com/<workspace id>/<mirrored database id>/LandingZone/TableA
https://onelake.dfs.fabric.microsoft.com/<workspace id>/<mirrored database id>/LandingZone/TableB
https://onelake.dfs.fabric.microsoft.com/<workspace id>/<mirrored database id>/LandingZone/TableC
Metadata file in the landing zone
Every table folder must contain a _metadata.json
file.
This table metadata file contains a JSON record to currently specify only the unique key columns as keyColumns
.
For example, to declare columns C1
and C2
as a compound unique key for the table:
{
"keyColumns": ["C1", "C2"],
}
If keyColumns
or _metadata.json
is not specified, then update/deletes are not possible. This file can be added anytime, but once added keyColumns
can't be changed.
Data file and format in the landing zone
Open mirroring supports Parquet as the landing zone file format with or without compression. Supported compression formats include Snappy, GZIP, and ZSTD.
All the Parquet files written to the landing zone have the following format:
<RowMarker><DataColumns>
RowMarker
: column name is__rowMarker__
(including two underscores before and afterrowMarker
).RowMaker
values:0
for INSERT1
for UPDATE2
for DELETE4
for UPSERT
Row order: All the logs in the file should be in natural order as applied in transaction. This is important for the same row being updated multiple times. Open mirroring applies the changes using the order in the files.
File order: Files should be added in monotonically increasing numbers.
File name: File name is 20 digits, like
00000000000000000001.parquet
for the first file, and00000000000000000002.parquet
for the second. File names should be in continuous numbers. Files will be deleted by the mirroring service automatically, but the last file will be left so that the publisher system can reference it to add the next file in sequence.
Initial load
For the initial load of data into an open mirrored database, all rows should have INSERT as row marker. Without RowMarker
data in a file, mirroring treats the entire file as an INSERT.
Incremental changes
Open mirroring reads incremental changes in order and applies them to the target Delta table. Order is implicit in the change log and in the order of the files.
Updated rows must contain the full row data, with all columns.
Here is some sample parquet data of the row history to change the EmployeeLocation
for EmployeeID
E0001 from Redmond to Bellevue. In this scenario, the EmployeeID
column has been marked as a key column in the metadata file in the landing zone.
__rowMarker__,EmployeeID,EmployeeLocation
0,E0001,Redmond
0,E0002,Redmond
0,E0003,Redmond
1,E0001,Bellevue
If key columns are updated, then it should be presented by a DELETE on previous key columns and an INSERT rows with new key and data. For example, the row history to change the RowMarker
unique identifier for EmployeeID
E0001 to E0002. You don't need to provide all column data for a DELETE row, only the key columns.
__rowMarker__,EmployeeID,EmployeeLocation
0,E0001,Bellevue
2,E0001,NULL
0,E0002,Bellevue
Table operations
Open mirroring supports table operations such as add, drop, and rename tables.
Add table
Open mirroring picks up any table added to landing zone by the application. Open mirroring scans for new tables in every iteration.
Drop table
Open mirroring keeps track of the folder name. If a table folder is deleted, open mirroring drops the table in the mirrored database.
If a folder is recreated, open mirroring drops the table and recreates it with the new data in the folder, accomplished by tracking the ETag for the folder.
When attempting to drop a table, you can try deleting the folder, but there is a chance that open mirroring is still using the data from the folder, causing a delete failure for publisher.
Rename table
To rename a table, drop and recreate the folder with initial and incremental data. Data will need to be repopulated to the renamed table.
Schema
A table path can be specified within a schema folder. A schema landing zone should have a <schemaname>.schema
folder name. There can be multiple schemas and there can be multiple tables in a schema.
For example, if you have schemas (Schema1
, Schema2
) and tables (Table A
, Table B
, Table C
) to be created in the landing zone, create folders like the following paths in OneLake:
https://onelake.dfs.fabric.microsoft.com/<workspace id>/<mirrored database id>/LandingZone/Schema1.schema/TableA
https://onelake.dfs.fabric.microsoft.com/<workspace id>/<mirrored database id>/LandingZone/Schema1.schema/TableB
https://onelake.dfs.fabric.microsoft.com/<workspace id>/<mirrored database id>/LandingZone/Schema2.schema/TableC
Table columns and column operations
Column types
- Simple parquet types are supported in the landing zone.
- Complex types should be written as a JSON string.
- Binary complex types like geography, images, etc. can be stored as binary type in the landing zone.
Add column
If new columns are added to the parquet files, open mirroring adds the columns to the delta tables.
Delete column
If a column is dropped from the new log files, open mirroring stores NULL
for those columns in new rows, and old rows have the columns present in the data. To delete the column, drop the table and create the table folder in the landing zone again, which will result into recreation of the Delta table with new schema and data.
Open mirroring always unions all the columns from previous version of added data. To remove a column, recreate the table/folder.
Change column type
To change a column type, drop and recreate the folder with initial and incremental data with the new column type. Providing a new column type without recreating the table results an error, and replication for that table will stop. Once the table folder is recreated, replication resumes with new data and schema.
Rename column
To rename a column, delete the table folder and recreate the folder with all the data and with the new column name.