ADF/Synapse Activity times out after 12 hours

Marc 0 Reputation points
2025-01-13T23:02:58.05+00:00

Hello,

I have a Notebook activity in one of my pipelines that is expected to run longer than 12 hours.

We configured the activity timeout to the maximum value, 7 days.

But the activity still times out after 12 hours.

Is there a way for the activity timeout to reflect the value that I set (7:00:00:00)? Or is there anything that would override the Timeout parameter and default it back to the 12-hour timeout?

The need for a longer timeout comes from this function in my notebook:

mssparkutils.fs.ls(file_path)

mssparkutils.fs.ls() lists all contents of the file path. The number of files I'm looking at is in the millions. Alternatively, is there a way to read through only a limited number of files?

Tags: Azure Data Lake Storage, Azure Storage Accounts, Azure Synapse Analytics, Azure Data Factory

1 answer

  1. Vinodh247 27,016 Reputation points MVP
    2025-01-14T06:24:29.47+00:00

    Hi,

    Thanks for reaching out to Microsoft Q&A.

    Potential Causes and Solutions:

    1. Integration Runtime Timeout Settings:
      • Cause: The integration runtime (IR) used by your pipeline may have its own timeout settings that override individual activity settings.
      • Solution: Review the timeout configurations for the IR associated with your pipeline. Ensure that its timeout is set to accommodate longer-running activities.
    2. Service Limitations:
      • Cause: Azure services sometimes enforce maximum execution durations for activities to maintain system reliability.
      • Solution: Consult the official ADF and Synapse documentation or contact Azure support to confirm whether there is a hard limit on activity durations.
    3. Activity-Specific Settings:
      • Cause: Certain activities might have inherent timeout settings that need adjustment.
      • Solution: Double-check the timeout property within the Notebook activity configuration to ensure it's correctly set and not being overridden by other settings.
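
One way to double-check the activity-level setting is to read it from the pipeline's JSON definition rather than from the UI. The snippet below is a minimal sketch, assuming the timeout is stored under each activity's `policy.timeout` as a `d.hh:mm:ss` string. `SAMPLE_PIPELINE` and its activity names are hypothetical, and only top-level activities are inspected (activities nested inside containers such as ForEach are not walked).

```python
import json
import re
from datetime import timedelta

# Assumed timespan forms: "7.00:00:00" (d.hh:mm:ss) or "12:00:00" (hh:mm:ss).
_TIMESPAN = re.compile(r"^(?:(\d+)\.)?(\d{1,2}):(\d{2}):(\d{2})$")


def parse_timespan(value: str) -> timedelta:
    """Convert a d.hh:mm:ss or hh:mm:ss string into a timedelta."""
    match = _TIMESPAN.match(value.strip())
    if match is None:
        raise ValueError(f"unrecognized timespan: {value!r}")
    days, hours, minutes, seconds = (int(g or 0) for g in match.groups())
    return timedelta(days=days, hours=hours, minutes=minutes, seconds=seconds)


def activity_timeouts(pipeline_json: str) -> dict:
    """Map each top-level activity name to its policy timeout (None if unset)."""
    pipeline = json.loads(pipeline_json)
    activities = pipeline.get("properties", {}).get("activities", [])
    result = {}
    for activity in activities:
        raw = activity.get("policy", {}).get("timeout")
        result[activity["name"]] = parse_timespan(raw) if raw else None
    return result


# Hypothetical pipeline definition, for illustration only.
SAMPLE_PIPELINE = """
{
  "name": "ExamplePipeline",
  "properties": {
    "activities": [
      {"name": "RunNotebook", "type": "SynapseNotebook",
       "policy": {"timeout": "7.00:00:00", "retry": 0}},
      {"name": "NoPolicy", "type": "Wait"}
    ]
  }
}
"""

for name, timeout in activity_timeouts(SAMPLE_PIPELINE).items():
    print(name, timeout)
```

If the value in the published JSON is not what was entered in the UI, that points to a configuration or publishing problem and not to a service-side limit.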

    Optimizing mssparkutils.fs.ls(file_path) Usage:

    The mssparkutils.fs.ls(file_path) function lists all contents in the specified directory. When dealing with millions of files, this operation can become resource-intensive and time-consuming. To enhance performance:

    • Limit the Number of Files Processed:
      • Approach: Instead of processing all files simultaneously, consider implementing logic to process files in batches or based on specific criteria (date ranges, file name patterns).
      • Implementation: Use filtering functions or parameters within your notebook to target a subset of files. For example, if files are named with date stamps, process only files from a particular date range.
    • Parallel Processing:
      • Approach: Distribute the workload across multiple nodes to handle large datasets more efficiently.
      • Implementation: Utilize Spark's parallel processing capabilities to process multiple files concurrently, reducing overall execution time.
    • Optimize Storage Access:
      • Approach: Ensure that your storage access patterns are efficient to minimize latency.
      • Implementation: Consider partitioning your data in Azure Data Lake Storage to enable faster access and processing of specific data segments.
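
The batching and filtering idea above could look roughly like the sketch below. It assumes the files are already split across subdirectories (here a hypothetical `/data/yyyy/mm/dd` layout), so that each listing call covers one partition and never the whole root. If all the files sit in a single flat directory, one listing call would still return everything and this approach would not help. `iter_files`, `batches`, and `fake_ls` are illustrative names. `fake_ls` is a stand-in so the example runs outside Synapse, and it mimics the `name`, `path`, `isFile`, and `isDir` attributes that the entries returned by `mssparkutils.fs.ls` are expected to expose.

```python
from itertools import islice
from types import SimpleNamespace
from typing import Callable, Iterable, Iterator, List


def iter_files(
    list_dir: Callable[[str], Iterable],
    partition_paths: Iterable[str],
    name_filter: Callable[[str], bool] = lambda name: True,
) -> Iterator:
    """Lazily yield matching files, listing one partition at a time."""
    for partition in partition_paths:
        for entry in list_dir(partition):
            if entry.isFile and name_filter(entry.name):
                yield entry


def batches(items: Iterable, size: int) -> Iterator[List]:
    """Group any iterable into lists of at most `size` items."""
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def fake_ls(path: str) -> list:
    """Stand-in lister returning five fake file entries per directory."""
    return [
        SimpleNamespace(
            name=f"part-{i}.csv",
            path=f"{path}/part-{i}.csv",
            isFile=True,
            isDir=False,
        )
        for i in range(5)
    ]


# Hypothetical date partitions; only these directories are ever listed.
partitions = [f"/data/2025/01/{day:02d}" for day in (13, 14)]
files = iter_files(fake_ls, partitions, lambda name: name.endswith(".csv"))

# Take only the first two batches of four files and stop there.
first_batches = list(islice(batches(files, 4), 2))
for batch in first_batches:
    print([entry.path for entry in batch])
```

In a Synapse notebook, `mssparkutils.fs.ls` would be passed in place of `fake_ls`. Because `iter_files` is a generator, a partition is listed only when the consumer reaches it, so stopping after a fixed number of batches also stops further listing calls.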
        

    By addressing the potential causes of the timeout and optimizing the file listing operation, you can enhance the performance and reliability of your pipeline activities.

    Please feel free to click the 'Upvote' (Thumbs-up) button and 'Accept as Answer'. This helps the community by allowing others with similar queries to easily find the solution.
