What's new and planned for Fabric Data Engineering in Microsoft Fabric
Important
The release plans describe functionality that may or may not have been released yet. The delivery timelines and projected functionality may change or may not ship. Refer to Microsoft policy for more information.
Fabric Data Engineering empowers data engineers to transform their data at scale using Spark and build out their lakehouse architecture.
Lakehouse for all your organizational data: The lakehouse combines the best of the data lake and the data warehouse in a single experience. It enables users to ingest, prepare, and share organizational data in an open format in the lake. Later you can access it through multiple engines such as Spark, T-SQL, and Power BI. It provides various data integration options such as dataflows and pipelines, shortcuts to external data sources, and data product sharing capabilities.
Performant Spark engine & runtime: Fabric Data Engineering provides customers with an optimized Spark runtime with the latest versions of Spark, Delta, and Python. It uses Delta Lake as the common table format for all engines, enabling easy data sharing and reporting with no data movement. The runtime comes with Spark optimizations, enhancing your query performance without any configurations. It also offers starter pools and high-concurrency mode to speed up and reuse your Spark sessions, saving you time and cost.
Spark Admin & configurations: Workspace admins with appropriate permissions can create and configure custom pools to optimize the performance and cost of their Spark workloads. Creators can configure environments to install libraries, select the runtime version, and set Spark properties for their notebooks and Spark jobs.
Developer Experience: Developers can use notebooks, Spark jobs, or their preferred IDE to author and execute Spark code in Fabric. They can natively access the lakehouse data, collaborate with others, install libraries, track history, do in-line monitoring, and get recommendations from the Spark advisor. They can also use Data Wrangler to easily prepare data with a low-code UI.
Platform Integration: All Fabric Data Engineering items, including notebooks, Spark jobs, environments, and lakehouses, are integrated deeply into the Fabric platform (enterprise information management capabilities, lineage, sensitivity labels, and endorsements).
Investment areas
Python notebook
Estimated release timeline: Q4 2024
Release Type: Public preview
Fabric notebooks support a pure Python experience. This new solution targets BI developers and data scientists who work with smaller datasets (up to a few GB) and use Pandas and Python as their primary language. Through this new experience, they can use the native Python language and its features and libraries out of the box, switch from one Python version to another (initially two versions will be supported), and benefit from better resource utilization by using a smaller 2 vCore machine.
ArcGIS GeoAnalytics for Microsoft Fabric Spark
Estimated release timeline: Q4 2024
Release Type: Public preview
Microsoft and Esri have partnered to bring spatial analytics into Microsoft Fabric. This collaboration introduces a new library, ArcGIS GeoAnalytics for Microsoft Fabric, enabling an extensive set of spatial analytics right within Microsoft Fabric Spark notebooks and Spark job definitions (across both Data Engineering and Data Science experiences / workloads).
This integrated product experience empowers Spark developers or data scientists to natively use Esri capabilities to run ArcGIS GeoAnalytics functions and tools within Fabric Spark for spatial transformation, enrichment, and pattern / trend analysis of data – even big data – across different use cases without any need for separate installation and configuration.
Installing libraries from ADLS Gen2 Storage account
Estimated release timeline: Q4 2024
Release Type: Public preview
This feature supports a new source from which users can install libraries. By creating a custom conda/PyPI channel hosted on their storage account, users can install libraries from that storage account in their Fabric environments.
Notebook live versioning
Estimated release timeline: Q1 2025
Release Type: Public preview
With live versioning, Fabric Notebook developers can track the history of changes made to their notebooks, compare different versions, and restore previous versions if needed.
VSCode Satellite Extension for User Data Functions in Fabric
Estimated release timeline: Q1 2025
Release Type: Public preview
The VSCode Satellite extension for User Data Functions will provide developer support (editing, building, debugging, publishing) for User Data Functions in Fabric.
User Data Functions in Fabric
Estimated release timeline: Q1 2025
Release Type: Public preview
User Data Functions will provide a powerful mechanism for implementing and reusing custom, specialized business logic in Fabric data science and data engineering workflows, increasing efficiency and flexibility.
Public monitoring APIs
Estimated release timeline: Q1 2025
Release Type: Public preview
The Public Monitoring API feature for Fabric Spark aims to expose Spark monitoring APIs, allowing users to monitor Spark job progress, view execution tasks, and access logs programmatically. This feature is aligned with the public API standards, providing a seamless monitoring experience for Spark applications.
Lakehouse Shortcuts metadata on git and deployment pipelines
Estimated release timeline: Q1 2025
Release Type: Public preview
To deliver a compelling application lifecycle management story, tracking object metadata in git and supporting deployment pipelines is imperative. This applies to the Data Engineering modules as workspaces are integrated with git.
In this first iteration, OneLake Shortcuts will automatically be deployed across pipeline stages and workspaces. Shortcut connections can be remapped across stages using a new Microsoft Fabric item named variable library, assuring proper isolation and environment segmentation customers expect.
Delta Lake improvements in Spark experiences
Estimated release timeline: Q1 2025
Release Type: General availability
Having proper defaults and aligning with the latest standards are of the utmost importance to Delta Lake in Microsoft Fabric. INT64 will be the new default encoding type for all timestamp values. This moves away from INT96 encodings, which Apache Parquet deprecated years ago. The change doesn't affect any reading capabilities. It's transparent and compatible by default, and it ensures that all new Parquet files in your Delta Lake table are written in a more efficient and future-proof way.
We're also releasing a faster implementation of the OPTIMIZE command, making it skip already V-Ordered files.
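To make the difference between the two timestamp encodings mentioned above concrete, here is a minimal sketch in plain Python. It doesn't use Spark or any Fabric API. It assumes the commonly used legacy INT96 layout (8 bytes of nanoseconds within the day followed by a 4-byte Julian day number, little-endian) and an INT64 value holding microseconds since the Unix epoch. The function names are hypothetical and exist only for this illustration.

```python
import struct
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
JULIAN_DAY_OF_UNIX_EPOCH = 2_440_588          # Julian day number of 1970-01-01
NANOS_PER_DAY = 86_400 * 1_000_000_000


def epoch_micros(ts: datetime) -> int:
    """Whole microseconds between the Unix epoch and ts (integer arithmetic only)."""
    delta = ts - EPOCH
    return (delta.days * 86_400 + delta.seconds) * 1_000_000 + delta.microseconds


def encode_int64_micros(ts: datetime) -> bytes:
    """INT64 style: one signed 64-bit count of microseconds since the epoch."""
    return struct.pack("<q", epoch_micros(ts))


def decode_int64_micros(raw: bytes) -> datetime:
    (micros,) = struct.unpack("<q", raw)
    return EPOCH + timedelta(microseconds=micros)


def encode_int96(ts: datetime) -> bytes:
    """Legacy INT96 style (assumed layout): nanoseconds of day + Julian day."""
    days, nanos_of_day = divmod(epoch_micros(ts) * 1_000, NANOS_PER_DAY)
    return struct.pack("<qi", nanos_of_day, JULIAN_DAY_OF_UNIX_EPOCH + days)


def decode_int96(raw: bytes) -> datetime:
    nanos_of_day, julian_day = struct.unpack("<qi", raw)
    days = julian_day - JULIAN_DAY_OF_UNIX_EPOCH
    return EPOCH + timedelta(days=days, microseconds=nanos_of_day // 1_000)


sample = datetime(2025, 1, 15, 12, 30, 45, 123456, tzinfo=timezone.utc)
int64_bytes = encode_int64_micros(sample)   # 8 bytes per value
int96_bytes = encode_int96(sample)          # 12 bytes per value
```

Both encodings round-trip the sample value, but the INT64 form takes 8 bytes instead of 12 and is a single ordered integer, which is one reason it's considered the more efficient choice.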
Support for snapshots of in-progress Notebook jobs
Estimated release timeline: Q1 2025
Release Type: Public preview
This feature allows users to view a Notebook snapshot while it is still running, which is essential for monitoring progress and troubleshooting performance issues. Users can see the original source code, input parameters, and cell outputs to better understand the Spark job, and they can track the Spark execution progress at the cell level. Users can also review the output of completed cells to validate the accuracy of the Spark application and estimate the remaining work. Additionally, any errors or exceptions from already executed cells are displayed, helping users identify and address issues early.
RLS/CLS Support for Spark and Lakehouse
Estimated release timeline: Q1 2025
Release Type: Public preview
The feature allows users to implement security policies for data access within the Spark engine. Users may define object-, row-, or column-level security, ensuring that data accessed through Fabric Spark is secured as defined by these policies. The feature is aligned with the OneSecurity initiative being enabled across Microsoft Fabric.
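As a conceptual illustration only, the following plain-Python sketch shows what row-level and column-level policies mean for a reader: some rows are filtered out and some columns are hidden. It isn't the Fabric or Spark API. The `secure_read` helper, the sample data, and the policy shown are all hypothetical.

```python
def secure_read(rows, row_filter, visible_columns):
    """Return only the permitted rows, each trimmed to the permitted columns."""
    return [
        {col: row[col] for col in visible_columns}   # column-level policy
        for row in rows
        if row_filter(row)                           # row-level policy
    ]


# Hypothetical table contents.
orders = [
    {"order_id": 1, "region": "EU", "amount": 120.0, "card_number": "4111-xxxx"},
    {"order_id": 2, "region": "US", "amount": 75.5, "card_number": "5500-xxxx"},
    {"order_id": 3, "region": "EU", "amount": 310.0, "card_number": "3400-xxxx"},
]

# Hypothetical policy: this reader sees EU rows only, without card numbers.
eu_analyst_view = secure_read(
    orders,
    row_filter=lambda row: row["region"] == "EU",
    visible_columns=["order_id", "region", "amount"],
)
```

In the feature described above, policies like these would be enforced by the engine itself rather than by user code, so a query can't bypass them.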
Spark Connector for Fabric Data Warehouse - General Availability
Estimated release timeline: Q1 2025
Release Type: General availability
The Spark connector for Microsoft Fabric Data Warehouse enables Spark developers and data scientists to access and work with data from a warehouse and the SQL analytics endpoint of a lakehouse. It offers a simplified Spark API, abstracts underlying complexity, and operates with just one line of code, while upholding security models like object-level security (OLS), row-level security (RLS), and column-level security (CLS).
Shipped feature(s)
Ability to sort and filter tables and folders in Lakehouse
Shipped (Q4 2024)
Release Type: General availability
This feature allows customers to sort and filter their tables and folders in the Lakehouse by several different methods, including alphabetically, created date, and more.
Notebooks in an app
Shipped (Q4 2024)
Release Type: Public preview
Org apps are available as a new item in Fabric, and you can include Notebooks alongside Power BI reports and dashboards in Fabric apps and distribute them to business users. App consumers can interact with widgets and visuals in the notebook as an alternative reporting and data exploration mechanism. This enables you to create and share rich and engaging stories with your data.
VSCode Core Extension for Fabric
Shipped (Q3 2024)
Release Type: Public preview
The Core VSCode Extension for Fabric provides common developer support for Fabric services.
T-SQL notebook
Shipped (Q3 2024)
Release Type: Public preview
Fabric notebooks support the T-SQL language for querying data in a Data Warehouse. By adding a Data Warehouse or SQL analytics endpoint to a notebook, T-SQL developers can run queries directly on the connected endpoint. BI analysts can also perform cross-database queries to gather insights from multiple warehouses and SQL analytics endpoints. T-SQL notebooks offer SQL users a great authoring alternative to the existing tools and include Fabric native features such as sharing, Git integration, and collaboration.
VS Code for the Web - debugging support
Shipped (Q3 2024)
Release Type: Public preview
Visual Studio Code for the Web is currently supported in preview for authoring and execution scenarios. This release adds the ability to debug notebook code using this extension.
High concurrency in pipelines
Shipped (Q3 2024)
Release Type: General availability
In addition to high concurrency in notebooks, high concurrency is also enabled in pipelines. This capability allows you to run multiple notebooks in a pipeline with a single session.
Schema support and workspace in namespace in Lakehouse
Shipped (Q3 2024)
Release Type: Public preview
This feature allows you to organize tables using schemas and query data across workspaces.
Spark Native Execution Engine
Shipped (Q2 2024)
Release Type: Public preview
The native execution engine is a groundbreaking enhancement for Apache Spark job executions in Microsoft Fabric. This vectorized engine optimizes the performance and efficiency of your Spark queries by running them directly on your lakehouse infrastructure. The engine's seamless integration means it requires no code modifications and avoids vendor lock-in. It supports Apache Spark APIs, is compatible with Runtime 1.2 (Spark 3.4), and works with both Parquet and Delta formats. Regardless of your data's location within OneLake, or if you access data via shortcuts, the native execution engine maximizes efficiency and performance.
Spark Connector for Fabric Data Warehouse
Shipped (Q2 2024)
Release Type: Public preview
The Spark Connector for Fabric DW (Data Warehouse) empowers a Spark developer or a data scientist to access and work on data from Fabric Data Warehouse with a simplified Spark API that works with just one line of code. It can query data from Fabric Data Warehouse in parallel, so it scales with increasing data volume, and it honors the security model (OLS/RLS/CLS) defined at the data warehouse level when accessing a table or view. This first release supports reading data only. Support for writing data back is coming soon.
Microsoft Fabric API for GraphQL
Shipped (Q2 2024)
Release Type: Public preview
API for GraphQL will allow Fabric data engineers, data scientists, and data solution architects to effortlessly expose and integrate Fabric data for more responsive, performant, and rich analytical applications, leveraging the power and flexibility of GraphQL.
Create and attach environments
Shipped (Q2 2024)
Release Type: General availability
To customize your Spark experiences at a more granular level, you can create and attach environments to your notebooks and Spark jobs. In an environment, you can install libraries, configure a new pool, set Spark properties, and upload scripts to a file system. This gives you more flexibility and control over your Spark workloads, without affecting the default settings of the workspace. As part of GA, we're making various improvements to environments including API support and CI/CD integration.
Job Queueing for Notebook Jobs
Shipped (Q2 2024)
Release Type: General availability
This feature allows scheduled Spark notebooks to be queued when Spark usage has reached the maximum number of jobs that can execute in parallel, and then to run once usage drops back below that maximum.
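The queueing behavior described above can be sketched with a toy model in plain Python. This isn't how Fabric implements it: the `NotebookJobQueue` class is hypothetical, and the first-in, first-out ordering of queued jobs is an assumption made for the example.

```python
from collections import deque


class NotebookJobQueue:
    """Toy model: run up to max_parallel jobs and queue the rest (FIFO assumed)."""

    def __init__(self, max_parallel):
        self.max_parallel = max_parallel
        self.running = set()
        self.queued = deque()

    def submit(self, job):
        """Start the job if there is room, otherwise put it in the queue."""
        if len(self.running) < self.max_parallel:
            self.running.add(job)
            return "running"
        self.queued.append(job)
        return "queued"

    def finish(self, job):
        """Mark a job as done and start queued jobs while there is room."""
        self.running.remove(job)
        started = []
        while self.queued and len(self.running) < self.max_parallel:
            next_job = self.queued.popleft()
            self.running.add(next_job)
            started.append(next_job)
        return started


jobs = NotebookJobQueue(max_parallel=2)
statuses = [jobs.submit(name) for name in ["nightly_load", "hourly_agg", "report", "cleanup"]]
started_after_finish = jobs.finish("nightly_load")
```

With a limit of two parallel jobs, the third and fourth submissions wait in the queue, and the first waiting job starts as soon as a running job finishes.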
Optimistic Job Admission for Fabric Spark
Shipped (Q2 2024)
Release Type: General availability
With Optimistic Job Admission, Fabric Spark only reserves the minimum number of cores that a job needs to start, based on the minimum number of nodes that the job can scale down to. This allows more jobs to be admitted if there are enough resources to meet the minimum requirements. If a job needs to scale up later, the scale-up request is approved or rejected based on the available cores in the capacity.
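The arithmetic behind this admission policy can be sketched in plain Python. The `SparkJob` and `Capacity` classes, the 32-core capacity, and the node sizes below are hypothetical values chosen for illustration, not Fabric defaults or APIs.

```python
from dataclasses import dataclass


@dataclass
class SparkJob:
    name: str
    min_nodes: int
    max_nodes: int
    cores_per_node: int
    nodes: int = 0          # nodes currently allocated to the job


class Capacity:
    """Toy model of a capacity with a fixed number of Spark cores."""

    def __init__(self, total_cores):
        self.total_cores = total_cores
        self.used_cores = 0

    @property
    def available_cores(self):
        return self.total_cores - self.used_cores

    def admit(self, job):
        """Reserve only the cores for the job's minimum node count."""
        needed = job.min_nodes * job.cores_per_node
        if needed > self.available_cores:
            return False
        self.used_cores += needed
        job.nodes = job.min_nodes
        return True

    def scale_up(self, job, extra_nodes):
        """Approve a scale-up only if the capacity still has enough free cores."""
        if job.nodes + extra_nodes > job.max_nodes:
            return False
        needed = extra_nodes * job.cores_per_node
        if needed > self.available_cores:
            return False
        self.used_cores += needed
        job.nodes += extra_nodes
        return True


capacity = Capacity(total_cores=32)
jobs = [SparkJob(name, min_nodes=1, max_nodes=3, cores_per_node=8) for name in "abc"]

admitted = [capacity.admit(job) for job in jobs]      # each job reserves 8 cores
first_scale_up = capacity.scale_up(jobs[0], 1)        # uses the last 8 free cores
second_scale_up = capacity.scale_up(jobs[1], 1)       # no free cores remain
```

In this example, reserving each job's maximum (3 nodes x 8 cores = 24 cores) would let only one job start on a 32-core capacity. Reserving the minimum admits all three, and later scale-up requests succeed only while free cores remain.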
Spark autotune
Shipped (Q1 2024)
Release Type: Public preview
Autotune uses machine learning to automatically analyze previous runs of your Spark jobs and tune the configurations to optimize performance. It configures how your data is partitioned, joined, and read by Spark, which can significantly improve performance. We have seen customer jobs run 2x faster with this capability.