What's new and planned for Data Factory in Microsoft Fabric
Important
The release plans describe functionality that may or may not have been released yet. The delivery timelines and projected functionality may change or may not ship. Refer to Microsoft policy for more information.
Data Factory in Microsoft Fabric combines citizen data integration and pro data integration capabilities into a single, modern data integration experience. It provides connectivity to more than 100 relational and nonrelational databases, lakehouses, data warehouses, generic interfaces like REST APIs, OData, and more.
Dataflows: Dataflow Gen2 enables you to perform large-scale data transformations, and supports various output destinations that write to Azure SQL Database, Lakehouse, Data Warehouse, and more. The dataflows editor offers more than 300 transformations, including AI-based options, and lets you transform data easily and with great flexibility. Whether you're extracting data from an unstructured data source such as a web page or reshaping an existing table in the Power Query editor, you can easily apply Power Query's Data Extraction By Example, which uses artificial intelligence (AI) to simplify the process.
Data pipelines: Data pipelines offer the capability to create versatile data orchestration workflows that bring together tasks like data extraction, loading into preferred data stores, notebook execution, SQL script execution, and more. You can quickly build powerful metadata-driven data pipelines that automate repetitive tasks. For example, loading and extracting data from different tables in a database, iterating through multiple containers in Azure Blob Storage, and more. Furthermore, with data pipelines, you can access the data from Microsoft 365, using the Microsoft Graph Data Connection (MGDC) connector.
Copy Job: Copy job simplifies the data ingestion experience with a streamlined and user-friendly process, moving data at petabyte scale from any source to any destination. You can copy data with various data delivery styles, including batch copy, incremental copy, and more.
Apache Airflow Job: Apache Airflow job is the next generation of Azure Data Factory's Workflow Orchestration Manager. It is a simple and efficient way to create and manage Apache Airflow orchestration jobs, enabling you to run Directed Acyclic Graphs (DAGs) at scale with ease. Apache Airflow job empowers you with a modern data integration experience to ingest, prepare, transform and orchestrate data from a rich set of data sources using code.
Database Mirroring: Database Mirroring in Fabric is a low-cost, low-latency solution designed with open standards (for example, the Delta Lake table format). It enables you to replicate data and metadata from various systems quickly. Using Database Mirroring, you can continuously replicate your data estate into Microsoft Fabric OneLake for analytics. With a highly integrated, easy-to-use experience, it's now simpler to get started with your analytics needs.
To learn more, see the documentation.
Investment areas
Over the next few months, Data Factory in Microsoft Fabric will expand its connectivity options and continue to add to the rich library of transformations and data pipeline activities. Moreover, it will enable you to perform real-time, high-performance data replication from operational databases, and bring this data into the lake for analytics.
Dataflow Gen2 CI/CD and Public APIs support
Estimated release timeline: Q4 2024
Release Type: Public preview
Dataflow Gen2 capabilities will be enhanced to support the following features in Fabric, including:
- Being able to include Dataflow Gen2 items in ALM deployment pipelines.
- Being able to leverage Dataflow Gen2 items with source control (Git integration) capabilities.
- Public CRUDLE APIs for Dataflow Gen2 items.
These are highly requested capabilities from many customers, and we're excited to make them available as preview features.
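To illustrate what CRUDLE (Create, Read, Update, Delete, List, Execute) operations over workspace items typically look like, here's a minimal sketch of building Fabric REST API URLs. The endpoint paths and IDs are assumptions for illustration; the final API surface for Dataflow Gen2 items may differ.

```python
# Hypothetical sketch of CRUDLE URL construction against the Fabric REST API.
# Endpoint paths and IDs below are assumptions, not the final API contract.
BASE = "https://api.fabric.microsoft.com/v1"

def dataflow_url(workspace_id, item_id=None):
    """Collection URL for Create/List; single-item URL for Read/Update/Delete."""
    url = f"{BASE}/workspaces/{workspace_id}/items"
    return f"{url}/{item_id}" if item_id else url

# Create (POST) and List (GET) target the collection; Read (GET),
# Update (PATCH), and Delete (DELETE) target a single item; Execute
# would POST to a run/job endpoint on the item.
collection = dataflow_url("ws-123")
item = dataflow_url("ws-123", "df-456")
```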
Semantic Model Refresh Tables and Partitions
Estimated release timeline: Q4 2024
Release Type: Public preview
Pipeline users have made the Semantic Model Refresh pipeline activity one of our most popular activities. A common ask has been to improve ELT processing pipelines by refreshing specific tables and partitions in their models. We've now enabled this feature, making the pipeline activity the most effective way to refresh your Fabric semantic models!
Fabric Data Factory Pipeline Import and Export
Estimated release timeline: Q4 2024
Release Type: General availability
As a Data Factory pipeline developer, you will often want to export your pipeline definition to share it with other developers or to reuse it in other workspaces. We've now added the capability to export and import your Data Factory pipelines from your Fabric workspace. This powerful feature will enable even more collaborative capabilities and will be invaluable when troubleshooting your pipelines with our support teams.
Copilot for Data Factory (Data pipeline)
Estimated release timeline: Q4 2024
Release Type: Public preview
Copilot for Data Factory (Data pipeline) empowers customers to build data pipelines using natural language and provides troubleshooting guidance.
Mirroring for Azure SQL DB
Estimated release timeline: Q4 2024
Release Type: General availability
Mirroring provides a seamless no-ETL experience to integrate your existing Azure SQL DB data with the rest of your data in Microsoft Fabric. You can continuously replicate your Azure SQL DB data directly into Fabric OneLake in near real-time, without any effect on the performance of your transactional workloads.
Learn more about Mirroring in Microsoft Fabric
Open Mirroring
Estimated release timeline: Q4 2024
Release Type: Public preview
Open Mirroring is a powerful feature that enhances Fabric’s extensibility by allowing any application or data provider to bring their data estate directly into OneLake with minimal effort. By enabling data providers and applications to write change data directly into a mirrored database within Fabric, Open Mirroring simplifies the handling of complex data changes, ensuring that all mirrored data is continuously up-to-date and ready for analysis.
Data Pipelines Public APIs SPN support
Estimated release timeline: Q4 2024
Release Type: Public preview
To make the use of pipeline REST APIs in Fabric much easier and more secure, we will enable SPN (service principal) support for public APIs.
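With SPN support, a pipeline API call is preceded by a standard OAuth2 client-credentials token request against Microsoft Entra ID. The sketch below builds that request; the scope value follows the usual `{resource}/.default` pattern and is an assumption here.

```python
# Sketch: building a client-credentials token request for a service principal.
# The scope value is an assumption based on the {resource}/.default convention.
from urllib.parse import urlencode

def token_request(tenant_id, client_id, client_secret):
    """Return the token endpoint URL and form-encoded body for the grant."""
    url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"
    body = urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        "scope": "https://api.fabric.microsoft.com/.default",
    })
    return url, body

url, body = token_request("my-tenant-id", "my-app-id", "my-secret")
# POST `body` to `url` (Content-Type: application/x-www-form-urlencoded),
# then pass the returned access token as a Bearer header on pipeline API calls.
```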
Data Pipeline support for Fabric Workspace variables
Estimated release timeline: Q4 2024
Release Type: Public preview
When implementing CI/CD across your Fabric Data Factory pipeline environments, it's important to update values as you move from dev to test to prod, and so on. By using variables inside of Fabric, you can replace values between environments and also share values across pipelines, similar to ADF's global parameters.
On-premises data gateway auto-update
Estimated release timeline: Q1 2025
Release Type: Public preview
The on-premises data gateway auto-update feature ensures that the gateway always runs the latest version, providing improved functionality, security updates, and new features without manual intervention. This feature simplifies the management of the gateway by automatically downloading and installing updates as they become available.
Data Pipeline support for VNET gateways
Estimated release timeline: Q1 2025
Release Type: Public preview
VNET data gateway will support Fabric Data Pipeline including pipeline copy activity and other pipeline activities. Customers will be able to securely connect to their data sources in pipeline via VNET data gateway.
Dataflow Gen2 Output Destination to SharePoint Files
Estimated release timeline: Q1 2025
Release Type: Public preview
After cleaning and preparing data with Dataflow Gen2, this feature allows you to select SharePoint files as the data destination. It makes it easy to export transformed data into a CSV file and store it in Microsoft SharePoint, making it available to everyone with permission to the site.
Data Pipeline support for Tumbling window triggers
Estimated release timeline: Q1 2025
Release Type: Public preview
Scheduling pipeline runs using time windows that are non-overlapping and can be "replayed" is an important pipeline feature that many ADF users have enjoyed. We're excited to bring tumbling window scheduling to Fabric Data Factory pipelines.
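The tumbling-window model can be sketched in a few lines: fixed-size, contiguous, non-overlapping windows generated from a start time, where any single window can be replayed without affecting its neighbors.

```python
# Illustrative sketch of tumbling windows: fixed-size, back-to-back,
# non-overlapping time intervals that can each be replayed individually.
from datetime import datetime, timedelta

def tumbling_windows(start, interval, count):
    """Yield (window_start, window_end) pairs, each ending where the next begins."""
    for i in range(count):
        window_start = start + i * interval
        yield window_start, window_start + interval

windows = list(tumbling_windows(datetime(2025, 1, 1), timedelta(hours=1), 3))
# Because windows never overlap, a failed run can be replayed for exactly
# its own window without reprocessing data from adjacent windows.
```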
Azure Data Factory item
Estimated release timeline: Q1 2025
Release Type: General availability
We are super excited to announce the general availability of the Azure Data Factory item in Fabric. With this new capability, existing ADF users can quickly and easily make their data factories from Azure available to their Fabric workspace. Now you can manage, edit, and invoke your ADF pipelines directly from Fabric!
Data Pipeline Copy Activity support for additional sources
Estimated release timeline: Q1 2025
Release Type: General availability
We are expanding support for more source connectors in Copy activity, enabling customers to seamlessly copy data from a wide range of sources, including Teradata, Spark, Azure Databricks Delta Lake, HubSpot, Cassandra, Salesforce Service Cloud, Oracle (bundled), and more.
Dataflows Gen 2 Parallelized Execution
Estimated release timeline: Q1 2025
Release Type: Public preview
Users want a flexible way to define the logic of their Dataflow Gen2 transformations and parallelize the execution with different arguments. Today they need to create multiple dataflows, or multiple queries within a single dataflow, in order to have logic that can be reused with different arguments.
As part of this enhancement, we will enable ways for users to set a "foreach" loop for their entire dataflow item, driven from a standalone query that acts as the list of parameter values to iterate over and drive this containerized approach for parallelized and dynamic execution.
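Conceptually, the "foreach" approach above is one reusable transformation executed in parallel over a list of parameter values. A minimal sketch, with illustrative names standing in for the dataflow logic and the driving query:

```python
# Sketch of the containerized "foreach" idea: one reusable piece of logic
# run in parallel over a parameter list. Names here are illustrative only.
from concurrent.futures import ThreadPoolExecutor

def run_dataflow(region):
    # Stand-in for executing the dataflow's transformation logic
    # with a single argument value.
    return f"refreshed:{region}"

# Stand-in for the standalone query that supplies the parameter values.
parameter_values = ["east", "west", "north"]

with ThreadPoolExecutor(max_workers=3) as pool:
    results = list(pool.map(run_dataflow, parameter_values))
# One logical dataflow definition, three parallel executions.
```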
Data source identity management (Azure Key Vault)
Estimated release timeline: Q1 2025
Release Type: Public preview
Support for Azure Key Vault - You can store your keys and secrets in Azure Key Vault and connect to it. This way, you can manage your keys in a single place.
Mirroring for CosmosDB
Estimated release timeline: Q1 2025
Release Type: General availability
Mirroring provides a seamless no-ETL experience to integrate your existing Azure Cosmos DB data with the rest of your data in Microsoft Fabric. You can continuously replicate your Azure Cosmos DB data directly into Fabric OneLake in near real-time, without any effect on the performance of your transactional workloads.
Dataflow Gen2 CI/CD and Public APIs support
Estimated release timeline: Q1 2025
Release Type: General availability
Dataflow Gen2 items will support CI/CD capabilities in Fabric, including source control (Git integration) as well as ALM Deployment Pipelines. Additionally, customers will be able to programmatically interact with Dataflow Gen2 items in Fabric via the Fabric REST APIs, providing support for CRUDLE operations over Dataflow Gen2 items.
Dataflow Gen2 Public APIs SPN support
Estimated release timeline: Q1 2025
Release Type: Public preview
Dataflow Gen2 items will be supported via Fabric REST APIs with Service Principal authentication support.
Dataflow Gen2 Incremental Refresh
Estimated release timeline: Q1 2025
Release Type: General availability
At the end of September 2024, we released Dataflow Gen2 Incremental Refresh as a Public Preview feature. We will continue monitoring customer feedback and enhancing this feature leading up to its General Availability, planned for the end of Q1 CY2025.
Dataflow Gen2 Incremental Refresh support for Lakehouse destination
Estimated release timeline: Q1 2025
Release Type: Public preview
Dataflow Gen2 Incremental Refresh optimizes dataflow execution by retrieving only the latest changed data from your dataflow's data sources, based on a datetime partition column. This ensures that data can be incrementally loaded into OneLake for downstream transformations or output to a dataflow output destination.
As part of this enhancement, we will provide direct support for Incremental Refresh to output data directly into Fabric Lakehouse tables.
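The core of incremental refresh described above can be sketched as a watermark filter over the datetime partition column: only rows changed since the previous refresh are retrieved and loaded into the destination. The data below is illustrative.

```python
# Minimal sketch of incremental refresh: only rows whose datetime partition
# column is newer than the last-refresh watermark are retrieved.
from datetime import datetime

rows = [
    {"id": 1, "modified": datetime(2025, 1, 1)},
    {"id": 2, "modified": datetime(2025, 2, 1)},
    {"id": 3, "modified": datetime(2025, 3, 1)},
]
last_refresh = datetime(2025, 1, 15)  # watermark from the previous run

# Retrieve only the delta since the previous refresh, then load it into
# the destination (for example, a Fabric Lakehouse table).
changed = [r for r in rows if r["modified"] > last_refresh]
```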
Dataflow Gen2 Parameterization
Estimated release timeline: Q1 2025
Release Type: Public preview
Users are accustomed to running metadata-driven pipelines where they can inject variables or parameters into different activities of a pipeline, executing things in a more dynamic way: create once, reuse multiple times.
As part of this enhancement, we will make it such that dataflows executed via a Data Pipeline in Fabric can be provided with parameter values for their existing dataflow parameters.
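As a sketch of what passing parameter values from a pipeline to a dataflow could look like, the snippet below serializes name/value pairs into a refresh request body. The payload shape is an assumption for illustration, not the final Fabric API contract.

```python
# Hypothetical sketch of a pipeline supplying parameter values to a dataflow
# refresh. The payload structure below is an assumption, not the final API.
import json

def build_refresh_payload(parameters):
    """Serialize parameter name/value pairs for a dataflow refresh request."""
    return json.dumps({
        "executionData": {
            "parameters": [
                {"name": name, "value": value}
                for name, value in parameters.items()
            ]
        }
    })

payload = build_refresh_payload({"Country": "US", "Year": 2025})
```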
Dataflow Gen2 support for Save As new item
Estimated release timeline: Q1 2025
Release Type: Public preview
Customers often would like to recreate an existing dataflow as a new dataflow. Today, in order to accomplish this, they need to create the new Dataflow Gen2 item from scratch and copy-paste their existing queries, or leverage the Export/Import Power Query template capabilities. This, however, is not only inconvenient due to unnecessary steps, but it also does not carry over additional dataflow settings, such as Scheduled Refresh and other item properties (name, description, sensitivity label, etc.).
As part of this enhancement, we will provide a quick "Save As" gesture within the Dataflow Gen2 editing experience, allowing users to save their existing dataflow as a new dataflow.
Dataflow Gen1 support for Save As Dataflow Gen2 new item
Estimated release timeline: Q1 2025
Release Type: Public preview
Customers often would like to recreate an existing Dataflow Gen1 item as a new Dataflow Gen2 item. Today, in order to accomplish this, they need to create the new Dataflow Gen2 item from scratch and copy-paste their existing queries, or leverage the Export/Import Power Query template capabilities. This, however, is not only inconvenient due to unnecessary steps, but it also does not carry over additional dataflow settings, such as Scheduled Refresh and other item properties (name, description, sensitivity label, etc.).
As part of this enhancement, we will provide a quick "Save As" gesture within the Dataflow Gen1 editing experience, allowing users to save their existing Dataflow Gen1 item as a new Dataflow Gen2 item.
Copy Job - Incremental copy without users having to specify watermark columns
Estimated release timeline: Q1 2025
Release Type: Public preview
We will introduce native CDC (Change Data Capture) capability in Copy Job for key connectors. This means incremental copy will automatically detect changes—no need for customers to specify incremental columns.
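For contrast, the manual watermark pattern that native CDC removes looks like the sketch below: the user tracks a high-water mark and filters the source by a chosen incremental column on every run. Table and column names are illustrative.

```python
# The manual watermark pattern that native CDC makes unnecessary: the user
# picks an incremental column and filters by the last copied value each run.
def incremental_query(table, watermark_column, last_value):
    """Build the filtered source query for one incremental copy run."""
    return (
        f"SELECT * FROM {table} "
        f"WHERE {watermark_column} > '{last_value}'"
    )

q = incremental_query("dbo.Orders", "LastModified", "2025-01-01T00:00:00")
# With native CDC, inserts/updates/deletes are detected from the source's
# change log instead, so no watermark column needs to be specified.
```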
Copy Job
Estimated release timeline: Q1 2025
Release Type: General availability
Copy Job in Data Factory elevates the data ingestion experience to a more streamlined and user-friendly process from any source to any destination. Now, copying your data is easier than ever before. Copy job supports various data delivery styles, including both batch copy and incremental copy, offering the flexibility to meet your specific needs.
Copy Job CI/CD support
Estimated release timeline: Q1 2025
Release Type: Public preview
Copy Job items will support CI/CD capabilities in Fabric, including source control (Git integration) as well as ALM Deployment Pipelines.
Copy Job Public APIs support
Estimated release timeline: Q1 2025
Release Type: Public preview
Customers will be able to programmatically interact with Copy Job items in Fabric via the Fabric Public APIs, providing support for CRUDLE operations over Copy Job items.
Dataflow Gen2 support for additional Fast Copy sources
Estimated release timeline: Q1 2025
Release Type: Public preview
We are expanding Fast Copy in Dataflow Gen2 to support more source connectors, allowing customers to load data with higher performance. New connectors will include Fabric Lakehouse files, Google BigQuery, Amazon Redshift, and more—enabling faster and more efficient data integration.
Copy Job support for additional sources
Estimated release timeline: Q1 2025
Release Type: Public preview
We are expanding support for more source connectors in Copy Job, enabling customers to seamlessly copy data from a wide range of sources. At the same time, we will keep the simplified experience while offering diverse copy patterns, including both full copy and incremental copy.
Data Pipeline support for OneLake storage event triggers
Estimated release timeline: Q1 2025
Release Type: Public preview
A popular mechanism used to invoke pipelines in Fabric Data Factory is the file trigger. When file events (for example, file arrival or file deletion) are detected against Blob storage or ADLS Gen2, your Fabric Data Factory pipeline is invoked. Now we have added OneLake file events to the trigger event types in Fabric.
Enabling customers to parameterize their connections
Estimated release timeline: Q1 2025
Release Type: Public preview
Connections provide a common framework for defining connectivity and authentication for your data stores. These connections can be shared across different items. With parameterization support, you'll be able to build complex and reusable pipelines, notebooks, dataflows, and other item types.
Data pipeline support for DBT
Estimated release timeline: Q1 2025
Release Type: Public preview
dbt CLI orchestration: incorporates the data build tool (dbt) for data transformation workflows.
User-assigned Managed Identities support in Connections
Estimated release timeline: Q2 2025
Release Type: Public preview
This enhancement to support user-assigned managed identities in Connections provides significant value by offering a more secure and flexible authentication method for accessing data resources. It avoids hardcoding credentials, simplifies management by eliminating the need to rotate secrets, ensures compliance with security policies, integrates seamlessly with Azure services, and supports scalability in connections by allowing multiple instances to share the same identity.
Shipped feature(s)
Azure Data Factory in Fabric
Shipped (Q3 2024)
Release Type: Public preview
Bring your existing Azure Data Factory (ADF) to your Fabric workspace! This is a new preview capability that allows you to connect to your existing ADF factories from your Fabric workspace.
You will now be able to fully manage your ADF factories directly from the Fabric workspace UI! Once your ADF is linked to your Fabric workspace, you’ll be able to trigger, execute, and monitor your pipelines as you do in ADF but directly inside of Fabric.
Support for invoking cross-workspace data pipelines
Shipped (Q3 2024)
Release Type: Public preview
Invoke Pipelines activity update: We are enabling some new and exciting updates to the Invoke Pipeline activity. In response to overwhelming customer and community requests, we are enabling running data pipelines across workspaces. You will now be able to invoke pipelines from other workspaces that you have access to execute. This will enable very exciting data workflow patterns that can utilize collaboration from your data engineering and integration teams across workspaces and across functional teams.
On-premises data gateway (OPDG) support added to data pipelines
Shipped (Q3 2024)
Release Type: General availability
This feature enables data pipelines to use Fabric data gateways to access data that is on-premises and behind a virtual network. For users using self-hosted integration runtimes (SHIR), they'll be able to move to on-premises data gateways in Fabric.
Copy Job
Shipped (Q3 2024)
Release Type: Public preview
Copy Job simplifies the experience for customers who need to ingest data, without having to create a Dataflow or Data pipeline. Copy Job supports full and incremental copy from any data source to any data destination.
Mirroring for Snowflake
Shipped (Q3 2024)
Release Type: General availability
Mirroring provides a seamless no-ETL experience to integrate your existing Snowflake data with the rest of your data in Microsoft Fabric. You can continuously replicate your Snowflake data directly into Fabric OneLake in near real-time, without any effect on the performance of your transactional workloads.
Improved email notifications for Refresh failures
Shipped (Q3 2024)
Release Type: Public preview
Email notifications allow Dataflow Gen2 creators to monitor the results (success/failure) of a dataflow’s refresh operation.
Fast Copy support in Dataflow Gen2
Shipped (Q3 2024)
Release Type: General availability
We're adding support for large-scale data ingestion directly within the Dataflow Gen2 experience, utilizing the pipelines Copy Activity capability. This enhancement significantly scales up the data processing capacity of Dataflow Gen2 providing high-scale ELT (Extract-Load-Transform) capabilities.
Incremental refresh support in Dataflow Gen2
Shipped (Q3 2024)
Release Type: Public preview
We're adding incremental refresh support in Dataflow Gen2. This feature enables you to incrementally extract data from data sources, apply Power Query transformations, and load into various output destinations.
Data source identity management (Managed Identity)
Shipped (Q3 2024)
Release Type: Public preview
This enables Managed identity to be configured at a workspace level. You can use the Fabric managed identities to connect to your data source securely.
Data pipeline support for Azure Databricks Jobs
Shipped (Q3 2024)
Release Type: Public preview
We are updating the Data Factory data pipelines Azure Databricks activities to now use the latest jobs API enabling exciting workflow capabilities like executing DLT jobs.
Copilot for Data Factory (Dataflow)
Shipped (Q3 2024)
Release Type: General availability
Copilot for Data Factory (Dataflow) empowers customers to express their requirements using natural language when creating data integration solutions with Dataflows Gen2.
Data pipeline support for SparkJobDefinition
Shipped (Q2 2024)
Release Type: General availability
Now you can execute your Spark code, including JAR files, directly from a pipeline activity. Just point to your Spark code and the pipeline will execute the job on your Spark cluster in Fabric. This new activity enables exciting data workflow patterns that leverages the power of Fabric's Spark engine while including the Data Factory control flow and data flow capabilities in the same pipeline as your Spark Jobs.
Data pipeline support for Event-Driven Triggers
Shipped (Q2 2024)
Release Type: Public preview
A common use case for invoking Data Factory data pipelines is to trigger the pipeline upon file events like file arrival and file delete. For customers coming from ADF or Synapse to Fabric, using ADLS/Blob storage events is a very common way to either signal a new pipeline execution or to capture the names of the files created. Triggers in Fabric Data Factory leverage Fabric platform capabilities including EventStreams and Reflex triggers. Inside the Fabric Data Factory pipeline design canvas, you'll have a Trigger button that you can press to create a Reflex trigger for your pipeline, or you can create the trigger directly from the Data Activator experience.
Staging defaults for Dataflow Gen 2 Output destination
Shipped (Q2 2024)
Release Type: Public preview
Dataflow Gen2 provides capabilities to ingest data from a wide range of data sources into the Fabric OneLake. Upon staging this data, it can be transformed at high-scale leveraging the High-Scale Dataflows Gen2 engine (based on Fabric Lakehouse/Warehouse SQL compute).
The default behavior for Dataflows Gen2 is to stage data in OneLake to enable high-scale data transformations. While this works great for high-scale scenarios, it does not work as well for scenarios involving small amounts of data being ingested given that it introduces an extra hop (staging) for data before it is ultimately loaded into the dataflow output destination.
With the planned enhancements, we're fine-tuning the default staging behavior to be disabled for queries with an output destination that doesn't require staging (namely, Fabric Lakehouse and Azure SQL Database).
Staging behavior can be manually configured on a per-query basis via the Query Settings pane or the query contextual menu in the Queries pane.
Data pipeline support for Azure HDInsight
Shipped (Q2 2024)
Release Type: General availability
HDInsight is the Azure PaaS service for Hadoop that enables developers to build powerful big data solutions in the cloud. The new HDInsight pipeline activity enables HDInsight job activities inside your Data Factory data pipelines, similar to the functionality that you've enjoyed for years in ADF and Synapse pipelines. We've now brought this capability directly into Fabric data pipelines.
New connectors for Copy Activity
Shipped (Q2 2024)
Release Type: Public preview
New connectors will be added for Copy activity to empower customers to ingest from the following sources, while leveraging data pipelines: Oracle, MySQL, Azure AI Search, Azure Files, Dynamics AX, and Google BigQuery.
Apache Airflow job: Build data pipelines powered by Apache Airflow
Shipped (Q2 2024)
Release Type: Public preview
Apache Airflow job (earlier referred to as Data workflows) is powered by Apache Airflow and offers an integrated Apache Airflow runtime environment, enabling you to author, execute, and schedule Python DAGs with ease.
Data source identity management (SPN)
Shipped (Q2 2024)
Release Type: General availability
Service principal - To access resources secured by an Azure AD tenant, the entity that requires access must be represented by a security principal. You'll be able to connect to your data sources with a service principal.
Data Factory Git integration for data pipelines
Shipped (Q1 2024)
Release Type: Public preview
You can connect to your Git repository to develop data pipelines in a collaborative way. The integration of data pipelines with the Fabric platform's Application Lifecycle Management (ALM) capability enables version control, branching, commits, and pull requests.
Enhancements to output destinations in Dataflow Gen2 (query schema)
Shipped (Q1 2024)
Release Type: Public preview
We're enhancing the output destinations in Dataflow Gen2 with the following highly requested capabilities:
- Ability to handle query schema changes after configuring an output destination.
- Default destination settings to accelerate dataflows creation.
To learn more, see Dataflow Gen2 data destinations and managed settings
Get data experience improvements (Browse Azure Resources)
Shipped (Q1 2024)
Release Type: Public preview
This feature provides seamless navigation to browse Azure resources. You can easily navigate your Azure subscriptions and connect to your data sources through an intuitive user interface. It helps you quickly find and connect to the data you need.
On-premises data gateway (OPDG) support added to data pipelines
Shipped (Q1 2024)
Release Type: Public preview
This feature enables data pipelines to use Fabric data gateways to access data that is on-premises and behind a virtual network. For users using self-hosted integration runtimes (SHIR), they'll be able to move to on-premises data gateways in Fabric.
Fast Copy support in Dataflow Gen2
Shipped (Q1 2024)
Release Type: Public preview
We're adding support for large-scale data ingestion directly within the Dataflow Gen2 experience, utilizing the pipelines Copy Activity capability. This supports sources such as Azure SQL Database, and CSV and Parquet files in Azure Data Lake Storage and Blob Storage.
This enhancement significantly scales up the data processing capacity of Dataflow Gen2 providing high-scale ELT (Extract-Load-Transform) capabilities.
Cancel refresh support in Dataflow Gen2
Shipped (Q4 2023)
Release Type: Public preview
We're adding support to cancel ongoing Dataflow Gen2 refreshes from the workspace items view.