Data quality for Microsoft Fabric shortcut databases

Note

The Microsoft Purview Data Catalog is changing its name to Microsoft Purview Unified Catalog. All the features will stay the same. You'll see the name change when the new Microsoft Purview Data Governance experience is generally available in your region. Check the name in your region.

Shortcuts are objects in OneLake that point to other storage locations. The location can be internal or external to OneLake. The location that a shortcut points to is known as the target path of the shortcut. The location where the shortcut appears is known as the shortcut path. Shortcuts appear as folders in OneLake and any workload or service that has access to OneLake can use them.
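To make the difference between the shortcut path and the target path concrete, the following minimal sketch models a shortcut as a pair of paths and maps a path requested under the shortcut path to the matching location under the target path. The `Shortcut` class, its field names, and the example paths are hypothetical and only illustrate the concept. This isn't how OneLake implements resolution.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class Shortcut:
    """Hypothetical model of a shortcut: where it appears and what it points to."""
    shortcut_path: str  # location where the shortcut appears in OneLake
    target_path: str    # location the shortcut points to

    def resolve(self, requested: str) -> str:
        """Map a path under the shortcut path to the matching path under the target."""
        # relative_to raises ValueError if the request is outside the shortcut.
        relative = PurePosixPath(requested).relative_to(self.shortcut_path)
        return str(PurePosixPath(self.target_path) / relative)


# Illustrative paths only.
sales = Shortcut(
    shortcut_path="Tables/sales",
    target_path="container/delta/sales",
)
print(sales.resolve("Tables/sales/part-0001.parquet"))
```

A caller only ever sees the shortcut path. The data itself stays at the target path.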

Shortcuts in Microsoft OneLake allow you to unify your data across domains, clouds, and accounts by creating a single virtual data lake for your entire enterprise. All Microsoft Fabric experiences and analytical engines can directly connect to your existing data sources such as Azure, Amazon Web Services (AWS), and OneLake through a unified namespace. OneLake manages all permissions and credentials so you don't need to separately configure each Fabric workload to connect to each data source.

For more details about Microsoft Fabric shortcuts, review the Fabric documentation.

Configure data quality for Fabric shortcut databases

Sign in to your Microsoft Fabric workspace. Select the ellipsis button under Tables, and then select New shortcut. From here you can create any of the following shortcut types:

Screenshot of the Fabric workspace, with the new shortcut button highlighted.

Azure Data Lake Storage Gen2 shortcut

  1. Select the Azure Data Lake Storage Gen2 shortcut on the New shortcut page in your Fabric workspace.

    Screenshot of the Fabric new shortcut page with ADLS Gen2 highlighted.

  2. Select ADLS Gen2 SAS authentication.

    Screenshot of the new shortcut window with the SAS token authentication selected.

  3. Generate a SAS token and connection string for your ADLS Gen2 resource in the Azure portal.

  4. Copy the endpoint of the data lake.

    Screenshot of copying the data lake endpoint in the Azure portal.

  5. Add storage details for the shortcut storage.

    Screenshot of adding storage details to the Fabric shortcut in the new shortcut window.

  6. Navigate to and choose the correct delta folder.

    Screenshot of choosing the correct delta folder in the new shortcut window.

  7. Preview the shortcut delta table in your Fabric workspace.

    Screenshot of the OneLake delta table preview.

  8. Start a scan of your Azure Data Lake Storage Gen2 resource in the Microsoft Purview Data Map using service principal authentication.

    Screenshot of the data map scan for ADLS Gen2.

  9. Once the scan is finished, your data asset should appear in the data catalog as a lakehouse table.

  10. Associate the asset with a data product for curation and data quality assessment.

    Screenshot of the shortcut data asset in the catalog.

  11. Open the Microsoft Purview Data Quality solution and run a data quality scan or profile your data as usual.
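Steps 1 through 6 above can also be scripted instead of completed in the portal. The sketch below only builds the request that would be sent and makes no network call. It assumes the Fabric REST API's Create Shortcut operation (`POST /v1/workspaces/{workspaceId}/items/{itemId}/shortcuts`) with an `adlsGen2` target, and it assumes a Fabric connection that holds the SAS credential already exists. The function name, IDs, storage account, and folder are placeholders. Check the field names against the current Fabric REST API reference before use.

```python
import json

FABRIC_API = "https://api.fabric.microsoft.com/v1"  # assumed base URL


def adls_gen2_shortcut_request(workspace_id, lakehouse_id, name,
                               endpoint, subpath, connection_id):
    """Return (url, body) for creating an ADLS Gen2 shortcut under Tables."""
    url = f"{FABRIC_API}/workspaces/{workspace_id}/items/{lakehouse_id}/shortcuts"
    body = {
        "path": "Tables",  # where the shortcut appears in the lakehouse
        "name": name,
        "target": {
            "adlsGen2": {  # assumed target key for ADLS Gen2
                "location": endpoint.rstrip("/"),     # data lake endpoint (step 4)
                "subpath": "/" + subpath.strip("/"),  # delta folder (step 6)
                "connectionId": connection_id,        # existing Fabric connection
            }
        },
    }
    return url, body


# Placeholder values for illustration only.
url, body = adls_gen2_shortcut_request(
    workspace_id="<workspace-id>",
    lakehouse_id="<lakehouse-id>",
    name="sales",
    endpoint="https://<account>.dfs.core.windows.net/",
    subpath="container/delta/sales/",
    connection_id="<connection-id>",
)
print(url)
print(json.dumps(body, indent=2))
```

Sending the request would additionally require a Microsoft Entra access token for the Fabric API, which this sketch leaves out.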

Amazon S3 shortcut

  1. Select New shortcut in the Microsoft Fabric workspace.

  2. Select Amazon S3 and add the URL, access key ID, and secret access key for the shortcut.

    Screenshot of the Amazon S3 new shortcut page with added details.

  3. Add the connection URL and storage details.

    Screenshot of the Amazon S3 new shortcut page with added connection URL and storage details.

  4. Preview the shortcut in your Fabric workspace.

  5. Start a scan of your Amazon S3 resource in the Microsoft Purview Data Map using service principal authentication.

  6. Once the scan is finished, your data asset should appear in the data catalog.

  7. Associate the asset with a data product for curation and data quality assessment.

  8. Open the Microsoft Purview Data Quality solution and run a data quality scan or profile your data as usual.

Google Cloud Storage (GCS) shortcut

  1. Select New shortcut in the Microsoft Fabric workspace.

  2. Select Google Cloud Storage and add the URL, HMAC access key ID, and secret for the shortcut.

    Screenshot of GCS shortcut HMAC key.

  3. Add the connection URL and storage details.

    Screenshot of the GCS connection URL.

  4. Preview the shortcut in your Fabric workspace.

  5. Start a scan of your Google Cloud Storage resource in the Microsoft Purview Data Map using service principal authentication.

  6. Once the scan is finished, your data asset should appear in the data catalog.

  7. Associate the asset with a data product for curation and data quality assessment.

  8. Open the Microsoft Purview Data Quality solution and run a data quality scan or profile your data as usual.
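The Amazon S3 and Google Cloud Storage shortcuts above take the same three inputs: a URL, a key ID, and a secret. As a rough sketch, the request body for either one could be built as follows. It assumes the Fabric REST API's Create Shortcut request accepts `amazonS3` and `googleCloudStorage` target keys, and that the key ID and secret are stored in an existing Fabric connection referenced by ID. The helper name and the bucket URLs are hypothetical, so verify the field names against the Fabric REST API reference before relying on them.

```python
import json

# Assumed target keys in the Create Shortcut request body.
TARGET_KEYS = {"s3": "amazonS3", "gcs": "googleCloudStorage"}


def cloud_shortcut_body(source, name, bucket_url, subpath, connection_id):
    """Build a Create Shortcut body for an Amazon S3 or GCS target."""
    if source not in TARGET_KEYS:
        raise ValueError(f"unsupported source: {source!r}")
    return {
        "path": "Tables",
        "name": name,
        "target": {
            TARGET_KEYS[source]: {
                "location": bucket_url.rstrip("/"),   # connection URL (step 3)
                "subpath": "/" + subpath.strip("/"),  # folder inside the bucket
                "connectionId": connection_id,        # holds key ID and secret
            }
        },
    }


# Illustrative bucket URLs and IDs only.
s3_body = cloud_shortcut_body(
    "s3", "orders", "https://example-bucket.s3.us-west-2.amazonaws.com",
    "delta/orders", "<s3-connection-id>")
gcs_body = cloud_shortcut_body(
    "gcs", "events", "https://example-bucket.storage.googleapis.com",
    "delta/events", "<gcs-connection-id>")
print(json.dumps(s3_body, indent=2))
print(json.dumps(gcs_body, indent=2))
```

Keeping the credentials in a connection rather than in the request body means the key ID and secret never appear in scripts or logs.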

Important

  • Use a service principal for data map scans and managed identity for data quality scans.
  • Any data sourced through a shortcut will be processed in the same region.
  • The metadata harvest in Microsoft Purview for Fabric Lakehouse subartifacts is an enhancement to the metadata harvest for Fabric, which was released in December 2023. This feature is in private preview.
  • There is a dependency on the Fabric team to differentiate shortcut items from native items in the OneLake SDK for Lakehouse subartifacts. For now, all shortcut items (tables and files) are treated as native items during scanning. To enable data quality assessment for Fabric lakehouse data, your tenant must be added to the allowlist.