Unable to run Spark activity through data factory pipeline

Aleksandr Maxermis 0 Reputation points
2024-07-31T10:23:06.4366667+00:00

After configuring the network and confirming that the test connection between Data Factory and the Spark cluster succeeds, I am unable to submit or view any Spark jobs submitted through the data pipeline using the Spark activity. When I call the Spark cluster's Livy endpoint with curl, the script runs successfully. Note that the script is stored in the associated storage account, and the same script at that location was referenced in the curl command. The script runs fine through Livy but fails to run through the pipeline. Hope to get some help on how I could resolve this!
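(For reference, a Livy batch submission of this sort looks roughly like the following; the cluster host, credentials, and storage path are placeholders rather than my actual values.)

    curl -k --user "admin:<password>" \
      -H "Content-Type: application/json" \
      -H "X-Requested-By: admin" \
      -X POST \
      -d '{ "file": "wasbs://<container>@<storageaccount>.blob.core.windows.net/scripts/myscript.py" }' \
      "https://<clustername>.azurehdinsight.net/livy/batches"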

 

Tags: Azure HDInsight, Azure Data Factory

1 answer

  1. Smaran Thoomu 17,520 Reputation points Microsoft Vendor
    2024-08-02T11:25:26.2633333+00:00

    Hi @Aleksandr Maxermis,
    Thanks for the question and for using the MS Q&A platform.
    As I understand it, you have already verified that the network is configured correctly and that the test connection between Data Factory and the Spark cluster succeeds. Since you can run the script successfully through Livy, it is likely that the issue lies in the Spark activity configuration in your pipeline.

    Here are some things you can check to troubleshoot the issue:

    • Verify that the Spark activity is configured correctly in your pipeline. Make sure the HDInsight linked service and the script path (root path and entry file path) are specified correctly in the activity settings; see the sketch after this list.
    • Check the activity run logs in Data Factory. The activity output and the Spark driver logs may provide more detail about the error.
    • Check the permissions on the storage account where the script is located. Make sure the credentials used by the activity's linked services (storage key, SAS, or managed identity) can access the script.
    • Check the Spark version on the cluster and make sure the script and its dependencies are compatible with it.
    • Try running a different script through the Spark activity to see whether the issue is specific to the script you are using.
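    For reference, a minimal HDInsightSpark activity definition following the schema in the documentation linked below looks roughly like this; the activity, linked service, and path names are placeholders:

        {
            "name": "SparkActivitySample",
            "type": "HDInsightSpark",
            "linkedServiceName": {
                "referenceName": "HDInsightLinkedService",
                "type": "LinkedServiceReference"
            },
            "typeProperties": {
                "rootPath": "<container>/<folder>",
                "entryFilePath": "myscript.py",
                "getDebugInfo": "Always"
            }
        }

    Two details that often make the pipeline run fail even though a direct Livy call works: entryFilePath must be relative to rootPath (the blob container/folder that holds the script), and if sparkJobLinkedService is not specified, the storage account associated with the HDInsight cluster is used. Setting getDebugInfo to "Always" copies the Spark log files to that storage, so you can see the actual error for the failed activity run.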

    If none of these steps resolve the issue, you may need to contact Microsoft support for further assistance.
    Reference: https://learn.microsoft.com/en-us/azure/data-factory/transform-data-using-spark

    I hope this helps! Let me know if you have any further questions.

