How to get lineage from Power BI into Microsoft Purview
This article elaborates on the data lineage for Power BI sources in Microsoft Purview.
Prerequisites
To see data lineage in Microsoft Purview for Power BI, you must first register and scan your Power BI source.
Common scenarios
After a Power BI source has been scanned, lineage information for your current data assets, and data assets referenced by Power BI, will automatically be added in the Microsoft Purview Unified Catalog.
Data consumers can perform root cause analysis of a report or dashboard from Microsoft Purview. For any data discrepancy in a report, users can easily identify the upstream datasets and contact their owners if necessary.
Data producers can see the downstream reports or dashboards consuming their dataset. Before making any changes to their datasets, the data owners can make informed decisions.
Users can search by name, endorsement status, sensitivity label, owner, description, and other business facets to return the relevant Power BI artifacts.
Power BI artifacts in Microsoft Purview
Once the scan of your Power BI is complete, following Power BI artifacts will be inventoried in Microsoft Purview:
- Workspaces
- Dashboards
- Reports
- Datasets
- Dataflows
- Datamarts
Lineage of Power BI artifacts in Microsoft Purview
Users can search for a Power BI artifact by name, description, or other details to see relevant results. Under the asset overview and properties tabs, the basic details such as description, classification are shown. Under the lineage tab, asset relationships are shown with the upstream and downstream dependencies.
Microsoft Purview captures lineage among Power BI artifacts (for example: Dataflow -> Dataset -> Report -> Dashboard) and external data assets.
Note
For lineage between Power BI artifacts and external data assets, currently the supported source types are:
- Azure SQL Database
- Azure Blob Storage
- Azure Data Lake Store Gen1
- Azure Data Lake Store Gen2
In addition, column level lineage (Power BI subartifact lineage) and transformation inside of Power BI datasets (except for Dataflows) are captured when using Azure SQL Database as source. For measures, you can further select column -> Properties -> expression to see the transformation details.
Note
Column level lineage and transformations (except for Dataflows) are supported when using Azure SQL Database as source. Other sources are currently not supported.
Known limitations
- Microsoft Purview leverages the scanner API to retrieve the metadata and lineage. Learn about some API limitations from Metadata scanning - Considerations and limitations.
- In case you have the dataset table connected to another dataset table, when the middle dataset disables the "Enable load" option inside the Power BI desktop, and the lineage can't be extracted.
- For lineage between Power BI artifacts and external data assets:
- Currently the supported source types are Azure SQL Database, Azure Blob Storage, Azure Data Lake Store Gen1 and Azure Data Lake Store Gen2.
- Column level lineage and transformations (except for Power BI Dataflows) are only supported when using Azure SQL Database as source. Other sources are currently not supported.
- Lineage is not captured when you use dynamic M query parameters in Power BI, e.g. pass server/database names as parameter values.
- Limited information is currently shown for data sources where the Power BI Dataflow is created. For example, for a SQL server source of Power BI dataset, only server/database name is captured.
- Note due to the following limitations, if you have such scenarios and scan both Power BI and the data sources that Power BI artifacts connect to, currently you may see duplicate assets in the catalog.
- The source object names in assets and fully qualified names follow the case used in Power BI settings/queries, which may not align with the object case in original data source.
- When Power BI references SQL views, they are currently captured as SQL table assets.
- When Power BI references Azure Dedicated SQL pool (formerly SQL DW) sources, it's currently listed as Azure SQL Database assets. (Lineage between Power BI artifacts and external Azure Dedicated SQL pools isn't currently supported.)
- For Power BI subartifact lineage:
- Some measures aren't shown in the subartifact lineage, for example,
COUNTROWS
. - In the lineage graph, when selecting a measure that is derived by columns using the COUNT function, the underlying column isn't selected automatically. Check the measure expression in the column properties tab to identify the underlying column.
- If you scanned your Power BI source before subartifact lineage was supported, you may see a database asset along with the new table assets in the lineage graph, which isn't removed.
- Some measures aren't shown in the subartifact lineage, for example,
- Currently for the same data asset in Azure SQL Database which has UTF-8 characters in name, the full qualified name generated from scanning Power BI may be different with the fully qualified name generated from scanning Azure SQL Database as the former will go through additional encoding, and it may introduce duplicate Azure SQL data assets.