Manage integration runtimes
In Data Factory, an activity defines the action to be performed. A linked service defines a target data store or a compute service. An integration runtime provides the infrastructure for the activity and linked services.
Integration Runtime is referenced by the linked service or activity, and provides the compute environment where the activity either runs on or gets dispatched from. This way, the activity can be performed in the region closest possible to the target data store or compute service in the most performant way while meeting security and compliance needs.
In short, the Integration Runtime (IR) is the compute infrastructure used by Azure Data Factory. It provides the following data integration capabilities across different network environments, including:
- Data Flow: Execute a Data Flow in managed Azure compute environment.
- Data movement: Copy data across data stores in public network and data stores in private network (on-premises or virtual private network). It provides support for built-in connectors, format conversion, column mapping, and performant and scalable data transfer.
- Activity dispatch: Dispatch and monitor transformation activities running on a variety of compute services such as Azure Databricks, Azure HDInsight, Azure Machine Learning, Azure SQL Database, SQL Server, and more.
- SSIS package execution: Natively execute SQL Server Integration Services (SSIS) packages in a managed Azure compute environment.
Whenever an Azure Data Factory instance is created, a default Integration Runtime environment is created that supports operations on cloud data stores and compute services in public network. This can be viewed when the integration runtime is set to Auto-Resolve
Integration runtime types
Data Factory offers three types of Integration Runtime, and you should choose the type that best serve the data integration capabilities and network environment needs you are looking for. These three types are:
- Azure
- Self-hosted
- Azure-SSIS
You can explicitly define the Integration Runtime setting in the connectVia property, if this is not defined, then the default Integration Runtime is used with the property set to Auto-Resolve.
The following table describes the capabilities and network support for each of the integration runtime types:
IR type | Public network | Private network |
---|---|---|
Azure | Data Flow | Data Flow |
Data movement | Data movement | |
Activity Dispatch | Activity Dispatch | |
Self-hosted | Data movement | Data movement |
Activity dispatch | Activity dispatch | |
Azure-SSIS | SSIS package execution | SSIS package execution |
Determining which integration runtime to use
There are a range of factors that affect the Integration Runtime that you will use. The following is a guide that will help you select the right IR
Copy activity
For the Copy activity, it requires source and sink linked services to define the direction of data flow. The following logic is used to determine which integration runtime instance is used to perform the copy:
Copying between two cloud data sources: when both source and sink linked services are using Azure IR, ADF will use the regional Azure IR if you specified, or auto determine a location of Azure IR if you choose the auto-resolve IR (default) as described in Integration runtime location section.
Copying between a cloud data source and a data source in private network: if either source or sink linked service points to a self-hosted IR, the copy activity is executed on that self-hosted Integration Runtime.
Copying between two data sources in private network: both the source and sink linked Service must point to the same instance of integration runtime, and that integration runtime is used to execute the copy Activity.
Lookup and GetMetadata activity
The Lookup and GetMetadata activity is executed on the integration runtime associated to the data store linked service.
Transformation activity
Each transformation activity has a target compute Linked Service, which points to an integration runtime. This integration runtime instance is where the transformation activity is dispatched from.
Data Flow activity
Data Flow activity is executed on the integration runtime associated to it.