Hello Sri Lakshman Velugubantla,
Welcome to the Microsoft Q&A and thank you for posting your questions here.
I understand that you are integrating Databricks Unity Catalog with Azure Purview and that lineage from Unity Catalog is not being captured automatically in Purview.
The steps below should help Azure Purview capture lineage from Unity Catalog:
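- Confirm that Unity Catalog is recording lineage. Databricks captures lineage automatically for notebooks, jobs, and queries that run against Unity Catalog tables, so there is no separate setting to turn on. Note that SHOW TABLE EXTENDED returns table metadata rather than lineage. To check lineage data, query the lineage system table (this assumes the system.access schema is enabled and you can read it), or open the table in Catalog Explorer and look at its Lineage tab:
SELECT source_table_full_name, target_table_full_name, event_time FROM system.access.table_lineage
WHERE target_table_full_name = 'catalog_name.schema_name.table_name';
If no rows are returned, Unity Catalog has not recorded lineage for that table, and Purview will have nothing to import.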
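- Check the Unity Catalog permissions of the identity Purview uses. Purview fetches metadata through the Databricks REST API, so the service principal or managed identity assigned to Purview must be able to see the tables. Unity Catalog privileges are managed with SQL GRANT statements; as far as I know, READ_METADATA belongs to legacy table ACLs rather than Unity Catalog, and the Databricks CLI permissions command has no --table-name option. Run the following in Databricks:
SHOW GRANTS ON TABLE catalog_name.schema_name.table_name;
-- If the Purview service principal does not have access, grant it:
GRANT USE CATALOG ON CATALOG catalog_name TO `<purview-service-principal>`;
GRANT USE SCHEMA ON SCHEMA catalog_name.schema_name TO `<purview-service-principal>`;
GRANT SELECT ON TABLE catalog_name.schema_name.table_name TO `<purview-service-principal>`;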
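If many tables need the same access, the statements can be generated rather than typed by hand. Below is a minimal Python sketch, assuming Unity Catalog privileges are granted with SQL GRANT statements (USE CATALOG, USE SCHEMA, SELECT). The function name build_grants and the sample values are hypothetical and only for illustration; run the output in a Databricks SQL editor or notebook.

```python
from typing import List


def build_grants(catalog: str, schema: str, table: str, principal: str) -> List[str]:
    """Return GRANT statements giving `principal` read access to one table.

    Assumption: `principal` is the identity Purview authenticates as,
    for example a service principal's application ID.
    """
    # Backticks are required because principal names contain '-' or '@'.
    quoted = f"`{principal}`"
    return [
        f"GRANT USE CATALOG ON CATALOG {catalog} TO {quoted};",
        f"GRANT USE SCHEMA ON SCHEMA {catalog}.{schema} TO {quoted};",
        f"GRANT SELECT ON TABLE {catalog}.{schema}.{table} TO {quoted};",
    ]


if __name__ == "__main__":
    # Placeholder values; replace with your own names.
    for statement in build_grants("catalog_name", "schema_name", "table_name",
                                  "<purview-service-principal>"):
        print(statement)
```

The three levels matter: without USE CATALOG and USE SCHEMA on the parents, a grant on the table alone is not enough for the identity to reach it.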
- If Purview still treats the tables as ordinary tables and does not pick up their metadata, you can try marking them explicitly in Unity Catalog with a table property. Note that 'metadata' is a custom property key; I am not aware of documentation confirming that Purview reads it, so treat this step as optional:
ALTER TABLE catalog_name.schema_name.table_name SET TBLPROPERTIES ('metadata' = 'true');
Re-run the scan after setting this property.
- Next, check the Azure Purview scan configuration. When setting up the scan:
- Choose Databricks Unity Catalog as the data source (not a standard Databricks connection).
- Select the authentication method your setup supports (for example, Managed Identity) and verify that the identity can read the metadata in Unity Catalog, as covered in the permissions step above.
- Make sure lineage extraction is explicitly enabled in the scan settings, if that option is available in your Purview account.
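- If automatic lineage retrieval still does not work, you can register lineage manually through the Azure Purview REST API. Purview exposes the Apache Atlas model here, where lineage is not an attribute on a DataSet; it is a Process entity whose inputs and outputs reference the data assets. For example (replace the typeName and qualifiedName values with the ones your assets actually have in Purview):
{
"entity": {
"typeName": "Process",
"attributes": {
"name": "load_my_table",
"qualifiedName": "databricks://catalog_name/schema_name/load_my_table",
"inputs": [
{"typeName": "DataSet", "uniqueAttributes": {"qualifiedName": "<source_table_1_qualified_name>"}},
{"typeName": "DataSet", "uniqueAttributes": {"qualifiedName": "<source_table_2_qualified_name>"}}
],
"outputs": [
{"typeName": "DataSet", "uniqueAttributes": {"qualifiedName": "databricks://catalog_name/schema_name/my_table"}}
]
}
}
}
Use POST https://{purview-endpoint}/catalog/api/atlas/v2/entity to register it.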
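A request body for that endpoint can also be built in code, which helps when several lineage links need registering. This is a minimal Python sketch, assuming the Apache Atlas convention that lineage is a Process entity with inputs and outputs referencing existing assets by qualifiedName. The helper names, the "DataSet" default type, and the sample qualified names are assumptions for illustration; use the values your scanned assets show in Purview.

```python
import json
from typing import Any, Dict, List


def asset_ref(type_name: str, qualified_name: str) -> Dict[str, Any]:
    """Reference an existing asset by its unique qualifiedName."""
    return {"typeName": type_name,
            "uniqueAttributes": {"qualifiedName": qualified_name}}


def build_process_entity(name: str, qualified_name: str,
                         inputs: List[str], outputs: List[str],
                         asset_type: str = "DataSet") -> Dict[str, Any]:
    """Build the JSON body for POST .../catalog/api/atlas/v2/entity."""
    return {
        "entity": {
            "typeName": "Process",
            "attributes": {
                "name": name,
                "qualifiedName": qualified_name,
                # Each input/output points at an asset already in the catalog.
                "inputs": [asset_ref(asset_type, q) for q in inputs],
                "outputs": [asset_ref(asset_type, q) for q in outputs],
            },
        }
    }


if __name__ == "__main__":
    body = build_process_entity(
        name="load_my_table",
        qualified_name="databricks://catalog_name/schema_name/load_my_table",
        inputs=["<source_table_1_qualified_name>",
                "<source_table_2_qualified_name>"],
        outputs=["databricks://catalog_name/schema_name/my_table"],
    )
    print(json.dumps(body, indent=2))
```

The sketch only builds the payload; sending it still needs an Azure AD bearer token for the Purview account, which is left out here.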
- After the scan completes, open the table asset in the Azure Purview catalog and check its Lineage tab to confirm that the tables appear with the correct lineage information.
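My final recommendation: if the problem persists, manually verify the API responses from Databricks. The list-tables endpoint expects the catalog and schema as query parameters:
curl -X GET -H "Authorization: Bearer <token>" \
"https://<databricks-instance>/api/2.1/unity-catalog/tables?catalog_name=<catalog>&schema_name=<schema>"
- If metadata is missing from the response, the issue is on the Databricks side (permissions or Unity Catalog setup).
- If metadata is present but not appearing in Purview, the issue is with Purview ingestion (scan configuration or connector).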
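The same check can be scripted so it is easy to repeat after each change. This is a minimal Python sketch using only the standard library, assuming the Unity Catalog list-tables endpoint at /api/2.1/unity-catalog/tables with catalog_name and schema_name query parameters and a response containing a "tables" array. The function names and the visible_in_purview flag are hypothetical, and pagination (next_page_token) is ignored for brevity.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Dict


def tables_url(host: str, catalog: str, schema: str) -> str:
    """Build the list-tables URL for one catalog and schema."""
    query = urllib.parse.urlencode({"catalog_name": catalog,
                                    "schema_name": schema})
    return f"https://{host}/api/2.1/unity-catalog/tables?{query}"


def fetch_tables(host: str, token: str, catalog: str, schema: str) -> Dict[str, Any]:
    """Call Databricks with the same token Purview would use."""
    request = urllib.request.Request(
        tables_url(host, catalog, schema),
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def diagnose(response: Dict[str, Any], table_name: str,
             visible_in_purview: bool) -> str:
    """Say which side to investigate, following the two rules above."""
    names = {t.get("name") for t in response.get("tables", [])}
    if table_name not in names:
        return "databricks"  # the identity cannot see the table at all
    return "ok" if visible_in_purview else "purview"


if __name__ == "__main__":
    # Offline example with a hand-written response; no network call is made.
    sample = {"tables": [{"name": "my_table"}]}
    print(diagnose(sample, "my_table", visible_in_purview=False))
```

Running fetch_tables with the token of the identity Purview uses, rather than your own, is what makes the result meaningful, since a personal token may see tables that the Purview identity cannot.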
I hope this is helpful! Do not hesitate to let me know if you have any other questions.
Please don't forget to close the thread by upvoting and accepting this as the answer if it is helpful.