Reference Architecture for Sustainability data solutions in Microsoft Fabric
Overview
With the global emphasis on environmental changes, sustainable economies, and ESG regulations, organizations must comprehensively manage their sustainability practices. Businesses are focusing on tracking the progress of key ESG KPIs against their goals. Data-driven insights into reduction opportunities help minimize overall environmental impact and enable informed decisions for more sustainable operations. This approach leads to improved sustainability KPIs and greater compliance with increasingly mandatory ESG disclosure regulations. Effective sustainability management, enhanced business sustainability, and better regulatory compliance require rich ESG data from diverse sources to be unified for improved efficiency and value.
To effectively manage and enhance overall sustainability, organizations must collect, standardize, process, and analyze ESG data. This task involves gathering data from various sources and transforming it into a standardized ESG schema, ensuring all data is stored in a consistent format. This preparation is crucial for both disclosure and analytics purposes. Next, data needs to be processed further to generate KPIs required for regulatory disclosures such as Corporate Sustainability Reporting Directive (CSRD).
In some instances, analytics necessitates enriching standardized ESG data with additional information, such as revenue and production quantities from various business functions. To ensure continuous progress on sustainability KPIs, organizations need to identify actionable business levers and understand their impact on these KPIs. Every step of the sustainability journey relies on high-quality data synthesized from different functional areas within the organization.
Sustainability data solutions in Microsoft Fabric is a set of workloads built on Microsoft Fabric. It ingests, harmonizes, and processes data from disparate enterprise systems, such as environmental, social, governance, and finance systems, including Microsoft Sustainability Manager. It standardizes this data in a comprehensive data estate in Microsoft Fabric OneLake to prepare ESG metrics, datasets, and functions that support advanced analytics and AI for specific sustainability scenarios.
Refer to the Product documentation to explore the functional capabilities offered in Sustainability data solutions in Microsoft Fabric.
Solution Architecture
Organizations that track and report environmental, social, and governance (ESG) measures need to ingest, process, and analyze vast amounts of complex, heterogeneous data from various sources. They often face challenges such as inaccessible siloed data, inconsistent data quality, and limited insights.
To overcome these challenges, Sustainability data solutions in Microsoft Fabric offers seamless integration, data engineering, real-time analytics, and business intelligence capabilities, while ensuring data privacy and security. It enables efficient transformation of ESG data for analysis by breaking down data silos, harmonizing disparate data, and providing intuitive tools such as data pipelines and notebooks. Organizations can use Fabric autoscaling capabilities for optimized performance to unlock actionable insights, drive innovation, and improve sustainability outcomes.
The following reference architecture shows how different personas interact with the data in Microsoft Fabric as it flows through and transforms within the system:
The foundation of Sustainability data solutions lies in the innovative medallion lakehouse architecture. This framework organizes and processes data in a systematic, multi-layered manner. It continually enhances the structure and quality of data as it traverses through each layer. At its core, the medallion architecture has three fundamental layers:
- RAW: Also called the BRONZE Lakehouse, this first Lakehouse stores the source data in its original format. The data in this layer is typically append-only and immutable.
- REFINED or ENRICHED: Also called the SILVER Lakehouse, this Lakehouse stores data sourced from the bronze layer. Here, data undergoes refinement processes, including validation checks and enrichment techniques, to enhance its accuracy and utility for downstream analytics.
- AGGREGATED or CURATED: Also called the GOLD Lakehouse, this final Lakehouse stores data sourced from the silver layer. The data is refined to meet specific downstream business and analytics requirements. This layer serves as the primary source for high-quality, aggregated data sets ready for comprehensive analysis and insights extraction.
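The three layers above can be illustrated with a minimal, library-free Python sketch. In Fabric, each layer is a lakehouse and the transformations run in notebooks or pipelines; here, plain lists stand in for the tables, and the record fields (`facility`, `scope`, `co2e_tonnes`) are hypothetical.

```python
from collections import defaultdict

# Bronze: hypothetical raw records kept exactly as received (all strings).
bronze = [
    {"facility": "Plant A", "scope": "1", "co2e_tonnes": "12.5"},
    {"facility": "Plant A", "scope": "2", "co2e_tonnes": "7.0"},
    {"facility": "Plant B", "scope": "1", "co2e_tonnes": "not-a-number"},
    {"facility": "Plant B", "scope": "2", "co2e_tonnes": "3.5"},
]


def to_silver(rows):
    """Validate and type raw rows; rows that fail validation are skipped."""
    silver = []
    for row in rows:
        try:
            value = float(row["co2e_tonnes"])
            scope = int(row["scope"])
        except ValueError:
            continue  # the bronze copy still holds the rejected record
        silver.append(
            {"facility": row["facility"], "scope": scope, "co2e_tonnes": value}
        )
    return silver


def to_gold(rows):
    """Aggregate validated rows into total emissions per scope."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["scope"]] += row["co2e_tonnes"]
    return dict(totals)


silver = to_silver(bronze)
gold = to_gold(silver)
```

The bronze list is never modified, which mirrors the append-only, immutable nature of the raw layer; each later layer is derived from the one before it.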
After the data is transformed and unified across layers, you can use traditional SQL tooling for exploratory analysis on the ESG datasets. Sustainability data solutions follow the medallion architecture principles to provide a robust framework for managing and harnessing ESG data effectively. This approach helps you derive actionable insights and make informed decisions in the dynamic Sustainability landscape.
Note
To deploy Sustainability data solutions in Fabric, follow the Deployment Instructions.
Integration between Microsoft Sustainability Manager and Sustainability data solutions in Fabric
The environmental data and insights capability within Sustainability data solutions in Microsoft Fabric helps unify and harmonize your organization's environmental data from Microsoft Sustainability Manager. This data is then transformed into the environmental, social, and governance (ESG) data model schema. The transformed environmental data can be used to generate ESG metrics for reporting and dashboard visualization, meeting ESG disclosure requirements. The native integration options between Microsoft Sustainability Manager and Microsoft Fabric make this possible.
Integration between Microsoft Sustainability Manager and Sustainability data solutions in Fabric follows the standard integration patterns supported by the underlying Dataverse platform and Microsoft Fabric. These options help make Sustainability Manager data available as raw data in the bronze lakehouse (also known as the IngestedRawData lakehouse). The Dataverse platform offers two standard options to establish a one-way sync between Dataverse and the Fabric workspace. Additionally, Microsoft Fabric provides a third option, known as "Dataverse shortcuts," to bring Sustainability Manager data into Fabric through shortcuts.
1. Azure Synapse Link for Dataverse: This approach may be ideal for existing customers who already have ADLS Gen2 configured to store Sustainability Manager data. It can be further extended to support data flows within Microsoft Fabric.
Note
This option replicates the data in CSV format to ADLS Gen2 storage. Notebooks then transform the data into Delta Lake format.
2. Fabric Link for Dataverse: This option is beneficial for customers who have already set up "Link to Microsoft Fabric," which creates a direct and secure link between your Sustainability Manager data in Dataverse and a Fabric workspace. There's no need to provide a storage account or Synapse workspaces. When you link to Fabric from Power Apps, the system creates an optimized replica of your data in delta parquet format, the native format of Fabric and OneLake, using Dataverse storage so that your operational workloads aren't impacted. Dataverse governs and secures this replica, ensuring it stays within the same region as your Dataverse environment while enabling Fabric workloads to operate on this data.
3. Dataverse Shortcut to Microsoft Fabric: Fabric provides this option to reference the Sustainability Manager data in Dataverse through a shortcut in Delta Lake format. Technically, options 2 and 3 follow the same pattern: no data is copied directly into Fabric.
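For the Azure Synapse Link option, the CSV-to-Delta step might look like the following minimal PySpark sketch. It assumes it runs in a Fabric notebook where a `spark` session already exists, and that the exported CSV files have no header row, so column names are supplied explicitly. The function name, paths, and column names are hypothetical; the notebooks shipped with the solution may work differently.

```python
def csv_to_delta(spark, csv_path, delta_path, column_names):
    """Read exported CSV files and rewrite them as a Delta table.

    `spark` is the SparkSession that a Fabric notebook provides.
    All paths and column names are placeholders supplied by the caller.
    """
    df = (
        spark.read
        .option("header", "false")      # assumption: files carry no header row
        .option("inferSchema", "true")  # let Spark guess the column types
        .csv(csv_path)
        .toDF(*column_names)            # apply the caller-supplied column names
    )
    # Overwrite the target so that reruns leave a single consistent table.
    df.write.format("delta").mode("overwrite").save(delta_path)
    return df
```

Overwriting is the simplest choice for a sketch; an incremental load would need a merge or append strategy instead.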
Comparing Fabric Link for Dataverse with Azure Synapse Link for Dataverse
Azure Synapse Link for Dataverse enables IT admins to export data to their own storage and build data integration pipelines. Azure Synapse Link assists with provisioning and configuring Azure resources within an integrated experience. The Link to Microsoft Fabric feature enables direct connectivity between your data in Dataverse and Microsoft Fabric without the need to bring your own storage or Synapse workspaces. Link to Fabric uses storage built into Dataverse, which removes the need to provision and manage your own storage.
This table provides a comparison between the options.
Link to Fabric (Options 2&3) | Azure Synapse Link for Dataverse (Option 1) |
---|---|
No copy, no ETL; Direct integration with Microsoft Fabric. | Export data to your own storage account and integrate with Synapse, Microsoft Fabric, and other tools. |
Data stays in Dataverse - users get secure access in Microsoft Fabric. | Data stays in your own storage. You manage user access. |
All tables chosen by default. | System administrators can choose required tables. |
Consumes extra Dataverse storage. | Consumes your own storage and other compute and integration tools. |
FAQs
How can I build custom metrics?
The capability is designed to be flexible and extensible, allowing you to compute custom metrics using the same computation pipelines. To build custom metrics, follow these steps:
- Generate Aggregate Datasets: Prepare the data required for the custom metrics by creating aggregate datasets as inputs.
- Define a Power BI (PBI) Measure: Create a DAX measure to capture the metric computation logic in a reusable format.
- Define the Metric Definition: Use prebuilt utility functions to formalize the metric definition, ensuring consistency across computations.
Is there a change in the tech stack used to compute metrics in the capability?
Yes. Previously, we used SQL to compute metrics from aggregate tables. We now use DAX measures with the Fabric Semantic Link library. This change not only enhances computation efficiency but also allows us to store metric computation logic in a standardized format, rather than spreading it across multiple notebooks. It also simplifies the process of adding custom metrics as needed.
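Evaluating a DAX measure from a notebook through Semantic Link might look like the sketch below. `sempy.fabric.evaluate_measure` is part of the Semantic Link library; the wrapper function, the semantic model name, the measure name, and the group-by column are hypothetical placeholders, and the capability's own prebuilt utility functions aren't shown here.

```python
def compute_metric(dataset, measure, groupby_columns, evaluate=None):
    """Evaluate one DAX measure of a semantic model, grouped by the given columns.

    `evaluate` can be injected for local testing. By default the Semantic Link
    function `sempy.fabric.evaluate_measure` is used (available in Fabric notebooks).
    """
    if evaluate is None:
        import sempy.fabric as fabric  # imported lazily: only present in Fabric
        evaluate = fabric.evaluate_measure
    return evaluate(dataset, measure=measure, groupby_columns=groupby_columns)


# Hypothetical usage inside a Fabric notebook:
# df = compute_metric(
#     "MySustainabilityModel",             # placeholder semantic model name
#     "Total GHG Emissions",               # placeholder DAX measure name
#     ["EmissionsByYear[ReportingYear]"],  # placeholder Table[Column] reference
# )
```

Keeping the computation logic in the DAX measure means the notebook only names the measure and the grouping, which is the standardization benefit described above.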
Do I require any additional privileges to deploy and access the Azure emissions data using the Microsoft Azure Emissions Insights capability?
Users with an active Microsoft Fabric license and a valid Azure billing enrollment ID for the supported account types can deploy and access the Azure emissions data of the provided billing account. They must have administration or reader access to the billing ID to use the Microsoft Azure Emissions Insights capability.
When can I see my account’s Azure emissions data after deploying the capability?
A user with appropriate admin access to Azure billing/enrollment accounts will be able to see the data in a few minutes after clicking the update data button.
How can I see the breakdown of emissions by Scope 1, 2 and 3 in the emissions dashboard?
The prebuilt dashboard provides views and trends of the total Azure cloud emissions of the chosen billing IDs. You can customize this dashboard or build your own dashboard using the emissions data, broken down by scope, from the aggregated data Lakehouse.
Can I access and download the data in the aggregated Lakehouse to external databases and applications?
Yes, the data in the aggregated Lakehouse can be exported to external applications and databases using the Fabric OneLake APIs. You can find more information on this here.
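As a rough illustration, OneLake exposes an ADLS Gen2-compatible endpoint, so a lakehouse path can be addressed by URL and listed with the Azure Data Lake Storage SDK. The sketch below assumes the `azure-identity` and `azure-storage-file-datalake` packages are installed and that the caller has access to the workspace. The helper names and the workspace, lakehouse, and folder names are placeholders.

```python
from urllib.parse import quote

ONELAKE_ACCOUNT_URL = "https://onelake.dfs.fabric.microsoft.com"


def onelake_path(lakehouse, relative_path):
    """Path of a file or folder inside a workspace, relative to the workspace root."""
    return f"{lakehouse}.Lakehouse/{relative_path.strip('/')}"


def onelake_url(workspace, lakehouse, relative_path):
    """Full DFS URL for a file or folder in a lakehouse."""
    path = f"{workspace}/{onelake_path(lakehouse, relative_path)}"
    return f"{ONELAKE_ACCOUNT_URL}/{quote(path)}"


def list_lakehouse_paths(workspace, lakehouse, relative_path="Tables"):
    """List the names under a lakehouse folder using the ADLS Gen2 SDK."""
    # Imported lazily so the URL helpers work without the Azure packages.
    from azure.identity import DefaultAzureCredential
    from azure.storage.filedatalake import DataLakeServiceClient

    service = DataLakeServiceClient(
        ONELAKE_ACCOUNT_URL, credential=DefaultAzureCredential()
    )
    # In OneLake, the workspace plays the role of the ADLS file system.
    file_system = service.get_file_system_client(workspace)
    paths = file_system.get_paths(path=onelake_path(lakehouse, relative_path))
    return [p.name for p in paths]
```

From there, an external application could download the listed files with the same SDK and load them into its own database.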
We have secured our Azure Data Lake Gen2 Storage by restricting network access to public Azure resources. How do we then enable the data from Microsoft Sustainability Manager to be synced to our data lake?
Yes, it's possible to sync Microsoft Sustainability Manager data to Azure Data Lake Gen2 storage configured in restricted network mode. This sync is feasible because Sustainability Manager is provisioned on Dataverse. The Azure Synapse Link for Dataverse service supports managed identities for Azure and ensures that access to your storage account is restricted to requests originating from the Dataverse environments associated with your tenant. To set this up, refer to Use managed identities for Azure with your Azure data lake storage.