Hello @WeirdMan, I will try to summarize as much as I can, and please keep in mind that you can also rely on the other answers!
For data ingestion, you can use the Copy Data activity to ingest data from the on-premises SQL Server databases into Azure Blob Storage or ADLS as an intermediate storage layer. This allows you to decouple the ingestion process from the transformation process, improving fault tolerance and scalability.
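As a rough illustration, here is a minimal sketch of that Copy activity using the `azure-mgmt-datafactory` Python SDK. The subscription, resource group, factory, and dataset names are all placeholders, and it assumes the two datasets already exist (with the SQL one reaching the on-premises server through a self-hosted integration runtime):

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import (
    BlobSink,
    CopyActivity,
    DatasetReference,
    PipelineResource,
    SqlServerSource,
)

# Placeholder names -- substitute your own resources.
client = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")

copy_activity = CopyActivity(
    name="CopySqlToStaging",
    # Both datasets are assumed to exist already; the SQL one points at the
    # on-premises server through a self-hosted integration runtime.
    inputs=[DatasetReference(reference_name="OnPremSqlTable", type="DatasetReference")],
    outputs=[DatasetReference(reference_name="BlobStagingFolder", type="DatasetReference")],
    source=SqlServerSource(),
    sink=BlobSink(),
)

client.pipelines.create_or_update(
    "<resource-group>", "<factory-name>", "IngestToStaging",
    PipelineResource(activities=[copy_activity]),
)
```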
When it comes to data transformations, you can use Mapping Data Flows in ADF to perform complex transformations such as pivoting, unpivoting, and aggregations.
Alternatively, if the transformations are too complex for Mapping Data Flows, you can use Azure Databricks and invoke a Databricks notebook from ADF using the Databricks Notebook activity.
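To give a rough idea of what such a notebook might contain, here is a hypothetical PySpark cell that pivots staged sales data and aggregates it; the storage paths and column names (`product_id`, `month`, `amount`) are made up for illustration:

```python
# Hypothetical Databricks notebook cell -- `spark` is provided by the runtime.
from pyspark.sql import functions as F

sales = spark.read.parquet("abfss://staging@<account>.dfs.core.windows.net/sales/")

# Pivot: one column per distinct month, holding the summed amount per product.
pivoted = (
    sales.groupBy("product_id")
         .pivot("month")
         .agg(F.sum("amount"))
)

pivoted.write.mode("overwrite").parquet(
    "abfss://curated@<account>.dfs.core.windows.net/sales_by_month/"
)
```

On the ADF side, the Databricks Notebook activity then only needs the workspace linked service and the path of this notebook.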
For fault tolerance, you can configure retry policies on the pipeline activities to handle transient failures. Mapping Data Flows also supports checkpointing, which lets the transformation process resume from where it left off after a failure.
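For example, a retry policy can be attached to the Copy activity from the earlier sketch like this (the timeout and retry values are just illustrative):

```python
from azure.mgmt.datafactory.models import ActivityPolicy

# Illustrative values: up to 3 retries, 60 seconds apart, 2-hour timeout.
copy_activity.policy = ActivityPolicy(
    timeout="0.02:00:00",
    retry=3,
    retry_interval_in_seconds=60,
)
```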
You can enable logging and monitoring in ADF to track pipeline execution and identify failures quickly.
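The same monitoring data is also exposed programmatically; a sketch that lists failed runs from the last 24 hours (reusing the `client` from the ingestion example, with the same placeholder names) could look like:

```python
from datetime import datetime, timedelta, timezone
from azure.mgmt.datafactory.models import RunFilterParameters

now = datetime.now(timezone.utc)
runs = client.pipeline_runs.query_by_factory(
    "<resource-group>", "<factory-name>",
    RunFilterParameters(last_updated_after=now - timedelta(days=1),
                        last_updated_before=now),
)
for run in runs.value:
    if run.status == "Failed":
        # run.message carries the failure details for a failed run.
        print(run.pipeline_name, run.run_id, run.message)
```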
Tips for scalability:
- Use parallel execution in the Copy Data activity to ingest data from multiple SQL Server databases simultaneously (see the sketch after this list).
- Configure the integration runtime (IR) to scale out automatically based on data volume and transformation complexity.
- Use partitioning in Mapping Data Flows to process large datasets in parallel.
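Here is the sketch mentioned in the first tip: a hypothetical ForEach fan-out that runs the staging pipeline for up to 10 databases in parallel, driven by a `databases` array parameter on the outer pipeline (again, all names are placeholders):

```python
from azure.mgmt.datafactory.models import (
    ExecutePipelineActivity,
    Expression,
    ForEachActivity,
    PipelineReference,
)

# Fan out over a hypothetical `databases` array parameter on the pipeline.
fan_out = ForEachActivity(
    name="IngestAllDatabases",
    items=Expression(value="@pipeline().parameters.databases"),
    is_sequential=False,  # run iterations in parallel ...
    batch_count=10,       # ... but cap the concurrency at 10
    activities=[
        ExecutePipelineActivity(
            name="IngestOneDatabase",
            pipeline=PipelineReference(reference_name="IngestToStaging",
                                       type="PipelineReference"),
            # Hypothetical child-pipeline parameter fed by the current item.
            parameters={"database": "@item()"},
        )
    ],
)
```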