Migrate data and pipelines from Azure Synapse Analytics to Microsoft Fabric
The first step in data and pipeline migration is to identify the data that you want to make available in OneLake, and the pipelines you intend to move.
You have two options for data migration:
- Option 1: Azure Data Lake Storage (ADLS) Gen2 as default storage. If you’re currently using ADLS Gen2 and want to avoid data copying, consider using OneLake shortcuts.
- Option 2: OneLake as default storage. If you want to move from ADLS Gen2 to OneLake as a storage layer, consider reading from and writing to OneLake in your notebooks and Spark job definitions.
Data migration
Option 1: ADLS Gen2 as storage (shortcuts)
If you’re interacting with ADLS Gen2 and want to avoid data duplication, you can create a shortcut to the ADLS Gen2 source path in OneLake. You can create shortcuts within the Files and Tables sections of the lakehouse in Fabric with the following considerations:
- The Files section is the unmanaged area of the lake. If your data is in CSV, JSON, or Parquet format, we recommend creating a shortcut to this area.
- The Tables section is the managed area of the lake. All tables, both Spark-managed and unmanaged, are registered here. If your data is in Delta format, you can create a shortcut in this area, and the automatic discovery process registers those Delta tables in the lakehouse’s metastore.
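The Files-versus-Tables guidance above can be turned into a quick heuristic for reviewing many source folders before you create shortcuts. The following is a minimal sketch with several assumptions:

- It takes a list of relative file paths that you have already listed from an ADLS Gen2 folder.
- It treats a folder as Delta when it contains a `_delta_log` directory.
- It flags any format other than CSV, JSON, or Parquet for manual review.

The function names are hypothetical.

```python
from pathlib import PurePosixPath

# Formats the guidance recommends placing under the unmanaged Files section.
FILES_FORMATS = {".csv", ".json", ".parquet"}


def is_delta_folder(paths: list[str]) -> bool:
    """Return True if any path sits under a _delta_log directory (Delta transaction log)."""
    return any("_delta_log" in PurePosixPath(p).parts for p in paths)


def target_section(paths: list[str]) -> str:
    """Suggest 'Tables' or 'Files' for a shortcut to one source folder (heuristic only)."""
    if is_delta_folder(paths):
        return "Tables"
    # Ignore marker and hidden files such as Spark's _SUCCESS or .crc checksums.
    data_files = [p for p in paths if not PurePosixPath(p).name.startswith(("_", "."))]
    extensions = {PurePosixPath(p).suffix.lower() for p in data_files}
    if extensions and extensions <= FILES_FORMATS:
        return "Files"
    raise ValueError(f"Review manually; unrecognized formats: {sorted(extensions)}")
```

A Delta folder also contains Parquet data files, so the sketch checks for `_delta_log` before it looks at extensions.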
Learn more about creating an ADLS Gen2 shortcut.
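Shortcuts can also be created programmatically, which helps when many folders are involved. The sketch below only builds the request. It assumes the Fabric REST API's Create Shortcut operation (`POST .../v1/workspaces/{workspaceId}/items/{itemId}/shortcuts`) and an `adlsGen2` target with `location`, `subpath`, and `connectionId` fields.

Verify these names against the current Fabric REST API reference before you use them. The workspace ID, lakehouse ID, and connection ID are placeholders, and `build_adls_shortcut_request` is a hypothetical helper.

```python
def build_adls_shortcut_request(
    workspace_id: str,
    lakehouse_id: str,
    section: str,
    shortcut_name: str,
    account: str,
    container: str,
    folder: str,
    connection_id: str,
) -> tuple[str, dict]:
    """Return (url, json_body) for creating an ADLS Gen2 shortcut (assumed API shape)."""
    if section not in ("Files", "Tables"):
        raise ValueError("section must be 'Files' or 'Tables'")
    url = (
        "https://api.fabric.microsoft.com/v1/"
        f"workspaces/{workspace_id}/items/{lakehouse_id}/shortcuts"
    )
    body = {
        "path": section,  # where the shortcut appears in the lakehouse
        "name": shortcut_name,
        "target": {
            "adlsGen2": {
                "location": f"https://{account}.dfs.core.windows.net",
                "subpath": f"/{container}/{folder.strip('/')}",
                "connectionId": connection_id,  # existing Fabric connection to the account
            }
        },
    }
    return url, body
```

Send the body as JSON with a Microsoft Entra ID bearer token, using any HTTP client.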
Option 2: OneLake as storage
To use OneLake as a storage layer and move data from ADLS Gen2, first point your Azure Synapse Spark-related items to OneLake, and then transfer the existing data to OneLake. For the first step, see integrate OneLake with Azure Synapse Spark.
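Pointing Spark items at OneLake mostly means replacing ADLS Gen2 `abfss://` URIs with OneLake ones. A minimal sketch follows.

It assumes:

- The OneLake URI form is `abfss://<workspace>@onelake.dfs.fabric.microsoft.com/<lakehouse>.Lakehouse/<Files|Tables>/...`.
- The ADLS Gen2 URI form is `abfss://<container>@<account>.dfs.core.windows.net/...`.
- Workspace and lakehouse names contain no spaces.

The helper names are hypothetical.

```python
ONELAKE_HOST = "onelake.dfs.fabric.microsoft.com"


def _join(base: str, relative: str) -> str:
    """Append a relative path to a base URI, normalizing slashes."""
    rel = relative.strip("/")
    return f"{base}/{rel}" if rel else base


def onelake_path(workspace: str, lakehouse: str, section: str, relative: str = "") -> str:
    """Build an ABFSS URI for a lakehouse's Files or Tables section in OneLake."""
    if section not in ("Files", "Tables"):
        raise ValueError("section must be 'Files' or 'Tables'")
    return _join(f"abfss://{workspace}@{ONELAKE_HOST}/{lakehouse}.Lakehouse/{section}", relative)


def adls_path(account: str, container: str, relative: str = "") -> str:
    """Build an ABFSS URI for an ADLS Gen2 container path."""
    return _join(f"abfss://{container}@{account}.dfs.core.windows.net", relative)
```

In a Synapse notebook you might then call `spark.read.parquet(adls_path("contosoacct", "raw", "orders"))` and write the result with `.write.format("delta").save(onelake_path("SalesWS", "Bronze", "Tables", "orders"))`. All names in that call are illustrative.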
To move the existing data to OneLake, you have several options:
- mssparkutils fastcp: The mssparkutils library provides a fastcp API that copies data from ADLS Gen2 to OneLake.
- AzCopy: You can use the AzCopy command-line utility to copy data from ADLS Gen2 to OneLake.
- Azure Data Factory, Azure Synapse Analytics, and Data Factory in Fabric: Use copy activity to copy data to the lakehouse.
- Use shortcuts: You can make historical ADLS Gen2 data available in OneLake through shortcuts, with no data copy needed.
- Azure Storage Explorer: You can move files from an ADLS Gen2 location to OneLake using Azure Storage Explorer. See how to integrate OneLake with Azure Storage Explorer.
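The first two copy options can be scripted. This sketch has three parts:

- `fastcp_to_onelake` calls `mssparkutils.fs.fastcp(source, destination, recurse)`. That call is available only inside a Synapse or Fabric notebook, so the function also accepts a stand-in copier for testing.
- `azcopy_args` builds an `azcopy copy` command line.
- `run_azcopy` runs that command.

The `--trusted-microsoft-suffixes=onelake.blob.fabric.microsoft.com` flag reflects our understanding that AzCopy must be told to trust the OneLake domain before it sends credentials there. Confirm it against the AzCopy and OneLake documentation. The function names are hypothetical, and AzCopy is assumed to be installed and signed in (for example with `azcopy login`).

```python
import shutil
import subprocess


def fastcp_to_onelake(source: str, destination: str, copier=None):
    """Recursively copy a folder with mssparkutils.fs.fastcp (notebook-only by default)."""
    if copier is None:
        # Importable only inside a Synapse/Fabric notebook session.
        from notebookutils import mssparkutils
        copier = mssparkutils.fs.fastcp
    return copier(source, destination, True)  # True = recurse into subfolders


def azcopy_args(source_url: str, dest_url: str) -> list[str]:
    """Build an AzCopy command that recursively copies a folder into OneLake."""
    return [
        "azcopy", "copy", source_url, dest_url,
        "--recursive",
        # Assumed requirement: allow AzCopy to authenticate against the OneLake domain.
        "--trusted-microsoft-suffixes=onelake.blob.fabric.microsoft.com",
    ]


def run_azcopy(source_url: str, dest_url: str) -> None:
    """Run AzCopy, failing loudly if it is missing or exits with an error."""
    if shutil.which("azcopy") is None:
        raise FileNotFoundError("azcopy is not on PATH")
    subprocess.run(azcopy_args(source_url, dest_url), check=True)
```

For example, `run_azcopy("https://contosoacct.blob.core.windows.net/raw/orders", "https://onelake.blob.fabric.microsoft.com/SalesWS/Bronze.Lakehouse/Files/orders")` copies one folder. The account, workspace, and lakehouse names are illustrative.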
Pipelines migration (Spark-related activities)
If your Azure Synapse data pipelines include notebook or Spark job definition activities, you need to move those pipelines from Azure Synapse to Data Factory data pipelines in Fabric and reference the target notebooks. The notebook activity is available in Data Factory data pipelines. For the full list, see supported data pipeline activities in Fabric.
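Before you move pipelines, it helps to list which ones contain Spark activities and which notebooks or Spark job definitions they reference. The sketch below scans an exported Synapse pipeline JSON file, including activities nested inside ForEach, Until, If Condition, and Switch.

It assumes:

- The Spark activity types are `SynapseNotebook` and `SparkJob`.
- Their references sit under `typeProperties.notebook` and `typeProperties.sparkJob`.

Check both assumptions against your own exported JSON. The function names are hypothetical.

```python
import json
from typing import Iterator

# Assumed activity type names for Synapse notebook and Spark job definition activities.
SPARK_ACTIVITY_TYPES = {"SynapseNotebook", "SparkJob"}
# typeProperties keys that hold nested activities in control-flow activities.
NESTED_KEYS = ("activities", "ifTrueActivities", "ifFalseActivities", "defaultActivities")


def iter_activities(activities: list[dict]) -> Iterator[dict]:
    """Yield every activity, depth-first, including nested ones."""
    for activity in activities:
        yield activity
        props = activity.get("typeProperties", {})
        for key in NESTED_KEYS:
            yield from iter_activities(props.get(key, []))
        # Switch activities keep nested activities inside each case.
        for case in props.get("cases", []):
            yield from iter_activities(case.get("activities", []))


def spark_references(pipeline_json: str) -> list[tuple[str, str, str]]:
    """Return (activity name, activity type, referenced item) for each Spark activity."""
    pipeline = json.loads(pipeline_json)
    activities = pipeline.get("properties", {}).get("activities", [])
    found = []
    for activity in iter_activities(activities):
        if activity.get("type") in SPARK_ACTIVITY_TYPES:
            props = activity.get("typeProperties", {})
            ref = props.get("notebook") or props.get("sparkJob") or {}
            found.append((activity.get("name", ""), activity["type"], ref.get("referenceName", "")))
    return found
```

Running `spark_references` over each exported pipeline file gives a checklist of the notebooks and Spark job definitions that must exist in Fabric before the migrated pipelines can reference them.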
- For Spark-related data pipeline activity considerations, refer to differences between Azure Synapse Spark and Fabric.
- For notebook migration, refer to migrate notebooks from Azure Synapse to Fabric.
- For data pipeline migration, see migrate to Data Factory in Fabric.