Automate lineage into Azure Purview from Azure Databricks Unity Catalog

Sri Lakshman Velugubantla 20 Reputation points
2025-02-13T08:15:02.3066667+00:00

Hi Team,

In my last question, someone from the Microsoft team suggested creating the lineage tables separately and then scanning the Databricks Unity Catalog source, after which the lineage would appear in Azure Purview automatically.

But when I tried this solution, I could not get the lineage. Purview treated those tables as normal tables instead of metadata tables, so the lineage was not picked up automatically.

As far as I know, Purview connects to the Databricks API and pulls this metadata, so how can I include these two tables in the metadata?

Is there any other way to get lineage automatically from Unity Catalog, or any way for Purview to scan those tables as metadata tables?

Can you please help me with this issue?

If there is a solution, please include the steps needed to achieve it.

Thanks in advance.

Azure Databricks
An Apache Spark-based analytics platform optimized for Azure.
Microsoft Purview
A Microsoft data governance service that helps manage and govern on-premises, multicloud, and software-as-a-service data. Previously known as Azure Purview.

1 answer

  1. Sina Salam 17,571 Reputation points
    2025-02-13T18:32:11.1633333+00:00

    Hello Sri Lakshman Velugubantla,

    Welcome to the Microsoft Q&A and thank you for posting your questions here.

    I understand that you are having trouble integrating Databricks Unity Catalog with Azure Purview: lineage from Unity Catalog is not being captured automatically in Purview.

    The steps below should help Azure Purview capture lineage from Unity Catalog:

    1. Ensure lineage is being captured in Unity Catalog. Databricks tracks lineage automatically for notebooks, jobs, and queries, provided the workspace is attached to a Unity Catalog metastore and the tables are registered in it. Run the following SQL command in Databricks to confirm that the table is visible in Unity Catalog and to inspect its metadata:
     SHOW TABLE EXTENDED IN catalog_name.schema_name LIKE 'table_name';

    If the table or its metadata is missing, Unity Catalog may not be tracking it properly.
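Unity Catalog records lineage events in the `system.access.table_lineage` system table, so one way to see whether anything has been captured for a table is to query it directly. The sketch below is a minimal helper that builds such a query. It assumes system tables are enabled for your account, and `table_lineage_query` is a hypothetical helper name, not part of any Databricks library.

```python
import re

# Three dot-separated identifiers: catalog.schema.table
_FULL_NAME = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*){2}$")


def table_lineage_query(full_name: str, limit: int = 50) -> str:
    """Build a query listing the most recent upstream sources recorded for one table."""
    # Reject anything that is not a plain three-part name, since the value
    # is interpolated into the SQL text.
    if not _FULL_NAME.match(full_name):
        raise ValueError(f"expected catalog.schema.table, got {full_name!r}")
    return (
        "SELECT source_table_full_name, target_table_full_name, event_time\n"
        "FROM system.access.table_lineage\n"
        f"WHERE target_table_full_name = '{full_name}'\n"
        "ORDER BY event_time DESC\n"
        f"LIMIT {int(limit)}"
    )
```

In a Databricks notebook the returned string could be passed to `spark.sql(...)`. An empty result suggests that no lineage has been recorded for that table yet, in which case Purview has nothing to import.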

    1. Check the Unity Catalog permissions. Azure Purview fetches metadata through the Databricks REST API, so the service principal or managed identity assigned to Purview must have the correct permissions on the tables. Run the following commands (this assumes the current Databricks CLI, where Unity Catalog privileges are managed through the grants command group):
         databricks grants get table "<catalog>.<schema>.<your_table>"
    
         # If the Purview service principal does not have access, grant it.
         # It also needs USE CATALOG and USE SCHEMA on the parent catalog and schema.
         databricks grants update table "<catalog>.<schema>.<your_table>" --json '{
           "changes": [
             {
               "principal": "<purview-service-principal>",
               "add": ["SELECT"]
             }
           ]
         }'
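If you prefer to script the grant, a minimal Python sketch is shown here. It prepares a request against the Unity Catalog permissions REST endpoint (`/api/2.1/unity-catalog/permissions/table/<full_name>`). The function name `build_grant_request`, the `SELECT` privilege, and the host and principal values are assumptions for illustration; adjust them to what your Purview scan actually requires.

```python
import json
import urllib.parse
import urllib.request


def build_grant_request(host, token, table_full_name, principal, privileges):
    """Prepare (but do not send) a PATCH that adds privileges on one table."""
    path = "/api/2.1/unity-catalog/permissions/table/" + urllib.parse.quote(
        table_full_name, safe=""
    )
    # The permissions endpoint takes a list of changes, one per principal.
    body = {"changes": [{"principal": principal, "add": list(privileges)}]}
    return urllib.request.Request(
        url=f"https://{host}{path}",
        data=json.dumps(body).encode("utf-8"),
        method="PATCH",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )
```

The prepared request can be sent with `urllib.request.urlopen(req)`. For a service principal, the principal value is usually its application ID, but verify this against your workspace.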
    
    1. Then modify the table properties to help with metadata recognition. If Purview treats the tables as normal tables instead of metadata tables, explicitly mark them in Unity Catalog:
      ALTER TABLE catalog_name.schema_name.table_name SET TBLPROPERTIES ('metadata' = 'true');
    

    Retry scanning after setting this property.

    1. Next, make sure the Azure Purview scan is configured properly. When setting up the scan:
    • Choose Databricks Unity Catalog as the data source (not a standard Databricks connection).
    • Select Managed Identity authentication and verify it has Metadata Reader permissions.
    • Ensure Lineage Extraction is explicitly enabled in scan settings.
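    1. Now, if automatic lineage retrieval does not work, use the Azure Purview REST API to register lineage manually. In the Atlas model that Purview exposes, lineage is recorded as a Process entity whose inputs and outputs reference existing assets, not as an attribute on the table itself. For example (the names, type names, and qualified names here are placeholders; use the values Purview shows for your scanned assets):
        {
           "entity": {
             "typeName": "Process",
             "attributes": {
               "name": "load_my_table",
               "qualifiedName": "databricks://catalog_name/schema_name/load_my_table",
               "inputs": [
                 {"typeName": "DataSet", "uniqueAttributes": {"qualifiedName": "databricks://catalog_name/schema_name/source_table_1"}},
                 {"typeName": "DataSet", "uniqueAttributes": {"qualifiedName": "databricks://catalog_name/schema_name/source_table_2"}}
               ],
               "outputs": [
                 {"typeName": "DataSet", "uniqueAttributes": {"qualifiedName": "databricks://catalog_name/schema_name/my_table"}}
               ]
             }
           }
         }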
    

    Use POST https://{purview-endpoint}/catalog/api/atlas/v2/entity to register it.
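As a rough illustration of that POST, the Python sketch below prepares the request and, separately, sends it. It assumes you already have a Microsoft Entra ID bearer token that is valid for your Purview account. `build_entity_request` and `send` are hypothetical helper names, and the payload is whatever entity JSON you want to register.

```python
import json
import urllib.request


def build_entity_request(purview_endpoint: str, token: str, payload: dict) -> urllib.request.Request:
    """Prepare (but do not send) the POST that registers an entity payload."""
    return urllib.request.Request(
        url=f"https://{purview_endpoint}/catalog/api/atlas/v2/entity",
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


def send(request: urllib.request.Request) -> dict:
    """Send the prepared request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))
```

Keeping the request construction separate from the network call makes it easy to print and inspect the URL and body before anything is written to the catalog.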

    1. After running the scan, go to Azure Purview > Data Map > Lineage and check if the tables appear with correct lineage information.

    My final recommendation: If the problem persists, manually verify API responses from Databricks:

    curl -X GET -H "Authorization: Bearer <token>" \
           "https://<databricks-instance>/api/2.1/unity-catalog/tables?catalog_name=<catalog>&schema_name=<schema>"
    

    • If metadata is missing, the issue is on the Databricks side.
    • If metadata is present but not appearing in Purview, the issue is with Purview ingestion.
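That comparison can be made mechanical. The sketch below assumes the Databricks list-tables response is a JSON object with a `tables` array whose items carry a `full_name` field, and that you have separately collected the table names Purview shows after a scan. Both helper names are hypothetical, and the case-insensitive matching is an assumption you may want to drop.

```python
def full_names(list_tables_response: dict) -> list:
    """Extract catalog.schema.table names from a list-tables response body."""
    return [t["full_name"] for t in list_tables_response.get("tables", [])]


def compare_catalogs(databricks_names, purview_names) -> dict:
    """Report which tables each side has that the other does not."""
    dbx = {name.lower() for name in databricks_names}
    pv = {name.lower() for name in purview_names}
    return {
        # Present in Databricks but not ingested: look at the Purview scan.
        "missing_in_purview": sorted(dbx - pv),
        # Known to Purview but not returned by Databricks: look at
        # Databricks permissions or at stale assets in Purview.
        "unknown_to_databricks": sorted(pv - dbx),
    }
```

For example, `compare_catalogs(full_names(response), purview_names)` with an empty `missing_in_purview` list means the scan is ingesting every table Databricks exposes to the token you used.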

    I hope this is helpful! Do not hesitate to let me know if you have any other questions.


    Please don't forget to close the thread by upvoting and accepting this as an answer if it is helpful.

