Azure Data Factory and Databricks end-to-end project

paul sudarshan 20 Reputation points
2025-03-06T12:19:52.7433333+00:00

My goal is to copy data from ADLS using an ADF Copy activity, process and analyse that data in Databricks, and then save the processed data back to the storage account.

My dataset was on my local system, so I created a storage account in Azure and, inside it, five containers: source, raw-data, sink, processed-data, and final-sink.

In ADF, I created linked services for the storage account and for Databricks. I then created a new pipeline and added a Copy activity, using the source container as its source and the sink container as its sink. Next, I added a Notebook activity and configured it with the Databricks linked service and the notebook path. After all this setup, I run into an issue when triggering the pipeline.

It runs sometimes, but most of the time it fails, and only in the Notebook activity. Please help me resolve the issue so the pipeline runs successfully. Thank you.

Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.

Accepted answer
  1. Chandra Boorla 9,835 Reputation points Microsoft External Staff
    2025-03-06T18:30:31.5633333+00:00

    Hi @paul sudarshan

    Thank you for posting your query!

    It looks like you have set up an Azure Data Factory pipeline to copy data from Azure Data Lake Storage (ADLS) and process it in Databricks, but you are facing intermittent failures in the Notebook Activity. Here are some potential causes and solutions to help you resolve this issue:

    Possible Causes of Notebook Failure

    The issue is likely happening due to one of the following reasons:

    Databricks Cluster Issues - If you're using an interactive cluster, it might have auto-terminated due to inactivity. Try switching to a job cluster, which spins up when needed and terminates after execution, for more predictable runs. If you keep the interactive cluster, ensure it is running before the notebook execution starts.

    Databricks Linked Service Authentication Issues - If your Linked Service authentication (PAT token, Key Vault, etc.) has expired, the connection may fail. Try revalidating the Databricks Linked Service in ADF.

    Errors in the Notebook Code - The notebook might fail due to incorrect data paths, missing libraries, or schema mismatches. Try running the notebook manually in Databricks to check for errors before executing it from ADF.
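
One way to make path problems visible is to check the input location at the top of the notebook and fail with a descriptive message. The following is a minimal sketch, not a definitive implementation: `abfss_uri` and `require_non_empty` are hypothetical helper names, and the storage account name is a placeholder you would replace with your own.

```python
from typing import Callable, Iterable


def abfss_uri(container: str, account: str, path: str = "") -> str:
    """Build an ADLS Gen2 URI: abfss://<container>@<account>.dfs.core.windows.net/<path>."""
    return f"abfss://{container}@{account}.dfs.core.windows.net/{path.lstrip('/')}"


def require_non_empty(uri: str, list_dir: Callable[[str], Iterable]) -> int:
    """Return the number of entries under `uri`, or raise a descriptive error.

    `list_dir` is injected so the check can be exercised outside Databricks;
    inside a notebook you would pass `dbutils.fs.ls`.
    """
    try:
        entries = list(list_dir(uri))
    except Exception as exc:
        # Covers a wrong container or path as well as missing permissions.
        raise FileNotFoundError(f"Cannot list input path {uri}: {exc}") from exc
    if not entries:
        raise FileNotFoundError(f"Input path {uri} exists but is empty")
    return len(entries)
```

In a Databricks notebook, a call might look like `require_non_empty(abfss_uri("sink", "<storage-account>"), dbutils.fs.ls)`. If the Copy activity wrote nothing, or the notebook points at a different container than the one the copy wrote to, the run then fails with a message that names the path.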

    Network or Firewall Restrictions - If Databricks is inside a private VNet, ADF might not be able to reach it. Check NSG (Network Security Group) rules and firewall settings to ensure ADF has access to Databricks.

    Cluster Performance & Resource Limitations - If the cluster runs out of memory, the notebook might fail. Try increasing the worker node count or using a more powerful instance type. Enable auto-scaling in Databricks to allocate more resources dynamically.
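
To illustrate the job cluster and auto-scaling suggestions above, here is a hedged sketch that builds a Databricks linked service definition as a Python dictionary and prints it as JSON. The property names (`domain`, `accessToken`, `newClusterVersion`, `newClusterNodeType`, `newClusterNumOfWorker`) follow the ADF linked service JSON as I understand it; please verify them against the current documentation. The function name, the runtime version, the node type, and all resource names are example values, not taken from your setup.

```python
import json


def databricks_job_cluster_linked_service(
    name: str,
    workspace_url: str,
    key_vault_linked_service: str,
    secret_name: str,
    min_workers: int = 2,
    max_workers: int = 8,
) -> dict:
    """Linked service that starts a new job cluster for each notebook run."""
    return {
        "name": name,
        "properties": {
            "type": "AzureDatabricks",
            "typeProperties": {
                "domain": workspace_url,
                # Read the access token from Key Vault instead of storing it inline.
                "accessToken": {
                    "type": "AzureKeyVaultSecret",
                    "store": {
                        "referenceName": key_vault_linked_service,
                        "type": "LinkedServiceReference",
                    },
                    "secretName": secret_name,
                },
                # No existingClusterId: the newCluster* settings request a job cluster.
                "newClusterVersion": "13.3.x-scala2.12",   # example runtime
                "newClusterNodeType": "Standard_DS3_v2",   # example node type
                # "min:max" asks for autoscaling between the two worker counts.
                "newClusterNumOfWorker": f"{min_workers}:{max_workers}",
            },
        },
    }


if __name__ == "__main__":
    definition = databricks_job_cluster_linked_service(
        name="AzureDatabricksJobCluster",                      # hypothetical
        workspace_url="https://<workspace>.azuredatabricks.net",  # placeholder
        key_vault_linked_service="AzureKeyVaultLS",            # hypothetical
        secret_name="databricks-pat",                          # hypothetical
    )
    print(json.dumps(definition, indent=2))
```

Because the definition omits `existingClusterId`, the notebook run does not depend on an interactive cluster still being up, at the cost of cluster start-up time on every run.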

    How to Debug & Fix

    • Check ADF Monitor Logs → Go to Monitor > Pipeline Runs > Activity Runs and review the Notebook Activity logs for error messages.
    • Run the Notebook Manually in Databricks → This will help determine if the issue is within the notebook or ADF.
    • Check the Databricks Jobs UI → Logs from failed job runs in Databricks can provide additional error details.
    • Use a Job Cluster Instead of an Interactive Cluster → This prevents failures due to inactive clusters.
    • Enable Retry Logic in ADF → Configure Retry Settings in the Notebook Activity to automatically retry execution in case of temporary failures.
    • Validate Data Paths in the Notebook → Ensure the file paths inside the notebook are correct, especially if you’re using DBFS mounts or direct storage access.
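
The retry suggestion above can be sketched as a Notebook activity definition with a `policy` block. This is an illustration under assumptions rather than your actual pipeline: the activity names, notebook path, and linked service name are hypothetical, and the retry count, interval, and timeout are example values to tune.

```python
import json


def notebook_activity(
    name: str,
    notebook_path: str,
    linked_service: str,
    depends_on: str,
    retries: int = 2,
    retry_interval_s: int = 120,
) -> dict:
    """Notebook activity that runs after `depends_on` succeeds and retries on failure."""
    return {
        "name": name,
        "type": "DatabricksNotebook",
        # Start only once the upstream Copy activity has succeeded.
        "dependsOn": [
            {"activity": depends_on, "dependencyConditions": ["Succeeded"]}
        ],
        "policy": {
            "timeout": "0.01:00:00",  # D.HH:MM:SS, here one hour per attempt
            "retry": retries,
            "retryIntervalInSeconds": retry_interval_s,
        },
        "typeProperties": {"notebookPath": notebook_path},
        "linkedServiceName": {
            "referenceName": linked_service,
            "type": "LinkedServiceReference",
        },
    }


if __name__ == "__main__":
    activity = notebook_activity(
        name="ProcessData",              # hypothetical
        notebook_path="/Shared/process",  # hypothetical
        linked_service="AzureDatabricksLS",  # hypothetical
        depends_on="CopySourceToSink",   # hypothetical Copy activity name
    )
    print(json.dumps(activity, indent=2))
```

Retries help with transient failures such as a cluster that is still starting. They will not fix a notebook that fails deterministically, so it is still worth reading the error message from the failed run first.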

    For more details, please refer to the following Microsoft documentation:

    I hope this information helps. Please do let us know if you have any further queries.


    If this answers your query, please click Accept Answer and select Yes for "Was this answer helpful". If you have any further queries, do let us know.

