Self-serve analytics and insights (preview)
Self-serve analytics and insights refer to data, tools, and platforms that enable business users to access, analyze, and generate insights from data independently. The Microsoft Purview data governance application publishes domain model of metadata into Fabric OneLake and AdlsG2 (Azure Data Lake storage), allowing customers to analyze and generate insights by bringing their own tools and compute. Self-service analytics of data governance metadata is valuable in driving continuous improvement of customers' data estate health management and fostering a data-driven culture across the organization by democratizing access to data estate insights and health management.
Key components
- Data Model: 3NF Model with domains and dimensions details
- Metadata: Data governance metadata that includes:
- Governance domains
- data products
- data assets
- glossary terms
- subscription request
- data quality rules
- dimensions
- data quality facts (pass and fail counts)
Benefits
- Empowerment: Empowers data practitioners, data product owners, data stewards, and analysts to explore data governance metadata and link metadata from various sources to derive insights.
- Flexibility and Efficiency: Customer will able to create custom reports in addition to out-of-the-box reports in health management.
- Agility: Allows customers organizations to respond more swiftly to health management issues and remediation.
- Cost-Effective: Reduces the need for setting up platforms and build tools. All data is available in OneLake and customer will able to use available tools (Fabric semantic model, PBI reporting, Data flow, and notebook) in OneLake.
Currently available reports (out-of-the-box)
Here are the available out-of-the-box reports. These reports aren't customizable.
- Classic assets - an overview of assets by type and collection, and their curation status.
- Classic catalog adoption - to understand at a glance how Unified Catalog is being used. your glossary, providing a snapshot of terms and their status.
- Classic classifications - an overview of assets classified and the types of classifications.
- Classic data stewardship - an overview of assets classified and the types of classifications.
- Classic glossary - health and use of glossary terms.
- Classic sensitivity labels -an overview of assets that have sensitivity labels applied and the types of labels applied.
- Data governance - The data governance health report allows your team to track your health progress at a glance, and identify areas that need more work.
- Data quality health - Data quality dimensions and data quality rules performance reporting.
Data model for self-serve analytics metadata
The 3NF domain model is part of the normalization process in relational database design, which ensures that the database is free of redundancy and update anomalies. A database schema is in the Third Normal Form if it meets the requirements of the First and Second Normal Forms and all its attributes are functionally dependent only on the primary key. The purpose of 3NF domain model is used to structure data in a way that minimizes duplication and ensures data integrity. It focuses on breaking down data into smaller, related tables where each piece of information is stored only once.
Characteristics:
- Elimination of Transitive Dependencies: Nonkey attributes shouldn't depend on other nonkey attributes.
- Logical Grouping: Data is logically grouped into domains based on its function or meaning.
- Entity-Relationship Diagrams (ERDs): Commonly used to represent 3NF domain models, showing how entities relate to each other.
Table name | Description | Relationship keys |
---|---|---|
Access Policy Provisioning State | Information about Provisioning state is stored in this table. | ProvisioningStateId |
Access Policy Resource Type | Access porlich resource information is stored in this table. | ResourceTypeId |
Access Policy Set | Overall information about access policy, policy use case details, and where the policy applied one, etc. are stored in this table. | AccessPolicySetId (UniqueId), ResourceTypeId (FK), ProvisioningStateId (FK) |
Business Domain | Business domain name, description, status, and ownership details are published in the Business domain table | Parent Business Domain ID (FK), Created By User ID (FK), Last Modified By User ID (FK) |
Classification | Data asset classification information is stored in this table. | ClassificationId |
Custom Access Use Case | Access use case information is stored in this table. | AccessPolicySetId |
Data Asset | Data asset name, description, and source information is stored in this table. | DataAssetId (UniqueId), DataAssetTypeId (FK), CreatedByUserId (FK), LastModifiedByUserId (FK) |
Data Asset Column | Data asset column name, column description, and references are stored in this table. | DataAssetId (FK), ColumnId (Unique), DataAssetTypeId (FK), DataTypeId (FK), Created By User Id (FK), Last Modified By User Id (FK) |
Data Asset Column Classification Assignment | Data classification assignment related reference keys are stored in this table. | DataAssetId (FK), ColumnId (FK), ClassificationId (FK) |
Data Asset Domain Assignment | Data asset governance domain assignment related information are available in this table. | DataAssetId (FK), BusinessDomainId (FK), AssignedByUserId (FK) |
Data Asset Owner | Data asset owner information stored in this table. | DataAssetOwnerId |
Data Asset Owner Assignment | Data asset owner assignment information is stored in this table. | DataAssetId, DataAssetOwnerId |
Data Asset Type Data Type | Data asset type information is stored in this table. | DataTypeId (UniqueId), DataAssetTypeId (FK) |
Data Product | Data product name, description, use cases, status, and other relevant information stored in this table. | DataProductId (UniqueId), DataProductTypeId (FK), DataProductStatusId (FK), UpdateFrequencyId (FK), CreatedByUserId (FK), LastUpdatedByUserId (FK) |
Data Product Asset Assignment | Data product and data asset assignment information are stored in this table. | DataProductId, DataAssetId |
Data Product Business Domain Assignment | Data product and governance domain assignment information are stored in this table. | DataProductId (FK), BusinessDomainId (FK), AssignedByUserId (FK) |
Data Product Documentation | Data product documentation reference information is stored in this table. | DataProductId, DocumentationId |
Data Product Owner | Data product owner information is stored in this table. | DataProductId, DataProductOwnerId |
Data Product Status | Data product status (like published or draft) related information stored in this table. | DataProductStatusId |
Data Product Terms Of Use | Data product usage terms information is stored in this table. | DataProductId, TermOfUsedId, DataAssetId |
Data Product Type | The information about data product types - Master, Reference, Operational, etc. are stored in this table. | DataProductTypeId |
Data Product Update Frequency | The information about how often this data product's data are updated is stored in this table. | UpdateFrequencyId |
Data Quality Asset Rule Execution | Data quality scanning results stored in this table | RuleId (FK), DataAssetId (FK), JobExecutionId (FK) |
Data Quality Job Execution | Data quality job execution status is stored in this table. | JobExecutionId (UniqueId) |
Data Quality Rule | Information about data quality rules is stored in this table. | RuleId (UniqueId), RuleTypeId (FK), BusinessDomainId (FK), DataProductId (FK), DataAssetId (FK), JobTypeDisplayName (FK), RuleOriginDisplayName (FK), RuleTargetObjectType (FK), CreatedByUserId (FK), LastUpdatedByUserId (FK) |
Data Quality Rule Column Execution | Information about data quality rules pass and fail count, data quality score in columns level, and including data quality job execution details are stored in this table. | RuleId (FK), DataAssetId (FK), ColumnId (FK), JobExecutionId (FK) |
Data Quality Rule Type | Data quality rule type and associated dimensions are stored in this table. | RuleTypeId (UniqueId), DimensionDisplayName (FK) |
Data Subscription Request | Information about data subscribers, policies applied, subscription request status, and other relevant information stored in this table. | SubscriberRequestId (UniqueId), SubscriberIdentityTypeDisplayName (FK), RequestorIdentityTypeDisplayName (FK), RequestorStatusDisplayName (FK) |
Glossary Term | Information about glossary term, description, and overall status of the glossary term are stored in this table. | GlossaryTermId (UniqueId), ParentGlossaryTermId (FK), CreatedByUserId (FK), LastModifiedByUserId (FK) |
Glossary Term Business Domain Assignment | Information about glossary term governance domain assignment and statuses are stored in this table. | GlossaryTermId (FK), BusinessDomainId (FK), AssignedByUserId (FK), GlossaryTermStatusId (FK), CreatedByUserId (FK), LastUpdatedByUserId (FK) |
Glossary Term Data Product Assignment | Information about glossary term data product assignment is stored in this table. | GlossaryTermId (FK), DataProductId (FK), AssignedByUserId (FK), GlossaryTermStatusId (FK), CreatedByUserId (FK), LastUpdatedByUserId (FK) |
Policy Set Approver | Policy set and the approver information are stored in this table. | SubscriberRequestId (FK), AccessPolicySetId (FK), ApproverUserId (FK) |
Relationship | Information about source type and target information are stored in this table. | AccountId, SourceId, TargetId |
Subscribe Microsoft Purview catalog metadata to Fabric OneLake
You can subscribe to Microsoft Purview's data governance metadata for analytics and derive insights by following these steps:
Select Settings from the left pane, select Unified Catalog, then select solution integrations.
Select Edit.
Add Storage type and Enabled the setup.
Add Location URL (example:
https://onelake.dfs.fabric.microsoft.com/workspace name/lakehouse name/Files/purviewmetadata
)- Select Properties to copy URL.
- Copy URL from the Properties page.
Add folder name at the end of the URL - example: /DEH (see the screenshot)
Grant contributor accesses to Microsoft Purview Manage Service Identity (MSI) to your fabric workspace.
Test connection.
Select Save to save the configuration to publish Purview metadata to your OneLake workspace.
Create a semantic model in OneLake
A semantic model in the context of data and analytics refers to a structured representation of data that defines the meaning, relationships, and rules within a specific domain. It provides a layer of abstraction that helps users understand and interact with complex data by making it more intuitive and accessible, especially in the context of business intelligence (BI) and analytics platforms. A semantic model is always required before any reports can be built. Within the warehouse, a user can add warehouse objects - tables or views to their default Power BI semantic model. They can also add other semantic modeling properties, such as hierarchies and descriptions. These properties are then used to create the Power BI semantic model's tables. Users can also remove objects from the default Power BI semantic model.
To create a semantic model from the Microsoft Purview Data Governance metadata domain model:
Open the Lakehouse in your Fabric workspace.
Publish the domains model files to delta tables
- Select the ellipsis button (...)
- Select **Load to Tables > New table
- New table name prepopulated
- Select File type parquet
You can also use shortcut to shortcut the domain model from OneLake and within OneLake
- Select the ellipsis button (...) of Tables
- Select New Shortcut and select Microsoft OneLake from New Shortcut sources page.
- Select domains model table to shortcut
After you publish all files to delta tables either manually or via shortcut, you're able to add the delta tables to semantic model.
- Switch to SQL analytics endpoint page from the Lakehouse page.
- Select Reporting from top left corner of the SQL analytics endpoint page.
- Select Manage default semantic model.
- Select the tables from dbo > Tables that you want to add to the semantic model for reporting.
To add objects such as tables or views to the default Power BI semantic model, select Automatically update semantic model.
Note
You need to draw semantic model relationship manually using the relationship keys.
Subscribe to Microsoft Purview Catalog metadata to AdlsG2 storage
You can subscribe to Microsoft Purview's data governance metadata to publish and store in your AdlsG2 storage for analytics and derive insights by following these steps:
Select Settings from the left pane, select Unified Catalog, then select solution integrations.
Select Edit.
Select Storage type. and Enabled the setup.
Add Location URL, this need to be AdlsG2 path + "/(container name)"
- Go to portal.azure.com
- Select adlsg2 storage (Home > adlsg2)
- Go to Settings > Endpoints and select Primary endpoint of your data lake storage.
Grant Storage Blob Data Contributor access to Microsoft Purview Manage Service Identity (MSI) to your AdlsG2 container
Test connection.
Select Save tab to save the configuration to publish the domain model to your adlsg2 storage.
Review published model and data
Open portal.azure.com
Select your adlsg2 storage
select the container which you added with the adlsg2 endpoint in purview
Browse list of delta parquet files published in the container.
Browse published model and metadata (see the images below).
Create a Power BI report
Power BI is natively integrated in the whole Fabric experience. This native integration includes a unique mode, called DirectLake, for accessing the data from the lakehouse to provide the most performant query and reporting experience. DirectLake is a groundbreaking new capability that allows you to analyze very large semantic models in Power BI. With DirectLake, you load parquet-formatted files directly from a data lake without needing to query a data warehouse or lakehouse endpoint, and without needing to import or duplicate data into a Power BI semantic model. DirectLake is a fast path to load the data from the data lake straight into the Power BI engine, ready for analysis.
In traditional DirectQuery mode, the Power BI engine directly queries the data from the source for each query execution, and the query performance depends on the data retrieval speed. DirectQuery eliminates the need to copy data, ensuring that any changes in the source are immediately reflected in query results.
For more details follow the guideline: how to create a Power BI report in Microsoft Fabric.
Important
- Default refreshing cycle is every 24 hours.
- Purview MSI need contributor access to your Fabric workspace if you are subscribing Microsoft Purview metadata to publish into your fabric worksapce.
- Purview MSI need Storage Blob Data Contributor access to your Azure Data Lake Storage Gen2 if you are subscrbing purview metadata to publish into your adlsg2 container.
Note
- Scheduling the data refreshing job is not yet supported.
- vNet is not supported yet.