ADF/Synapse activity times out after 12 hours

Marc 0 Reputation points
2025-01-13T23:02:58.05+00:00

Hello,

I have a Notebook activity in one of my pipelines that is expected to run longer than 12 hours.

Below, we configured the timeout to the maximum value, 7 days.

But the activity would still time out after 12 hours.

Is there a way for the activity timeout to reflect the value that I set (7:00:00:00)? Or is there anything that would override the Timeout parameter and default it back to the 12-hour timeout?

The need for a longer timeout is due to this call in my notebook:

mssparkutils.fs.ls(file_path)

mssparkutils.fs.ls() lists all the contents of the file path. The number of files I'm looking at is in the millions. Alternatively, is there a way to read through only a limited number of files?


2 answers

Sort by: Most helpful
  1. Vinodh247 27,201 Reputation points MVP
    2025-01-14T06:24:29.47+00:00

    Hi,

    Thanks for reaching out to Microsoft Q&A.

    Potential Causes and Solutions:

    1. Integration Runtime Timeout Settings:
      • Cause: The IR used by your pipeline may have its own timeout settings that override individual activity settings.
      • Solution: Review the timeout configuration for the IR associated with your pipeline and ensure it accommodates longer-running activities.
    2. Service Limitations:
      • Cause: Azure services sometimes enforce maximum execution durations for activities to maintain system reliability.
      • Solution: Consult the official ADF and Synapse documentation, or contact Azure support, to confirm whether there is a hard limit on activity durations.
    3. Activity-Specific Settings:
      • Cause: Certain activities might have inherent timeout settings that need adjustment.
      • Solution: Double-check the timeout property within the Notebook activity configuration to ensure it's correctly set and not being overridden by other settings.

    Optimizing mssparkutils.fs.ls(file_path) Usage:

    The mssparkutils.fs.ls(file_path) function lists all contents of the specified directory. When dealing with millions of files, this operation can become resource-intensive and time-consuming. To enhance performance:

    • Limit the Number of Files Processed:
      • Approach: Instead of processing all files at once, implement logic to process files in batches or based on specific criteria (date ranges, file-name patterns).
      • Implementation: Use filtering functions or parameters within your notebook to target a subset of files. For example, if file names include date stamps, process only files from a particular date range.
    • Parallel Processing:
      • Approach: Distribute the workload across multiple nodes to handle large datasets more efficiently.
      • Implementation: Utilize Spark's parallel processing capabilities to process multiple files concurrently, reducing overall execution time.
    • Optimize Storage Access:
      • Approach: Ensure that your storage access patterns are efficient to minimize latency.
      • Implementation: Consider partitioning your data in Azure Data Lake Storage to enable faster access and processing of specific data segments.

    By addressing the potential causes of the timeout and optimizing the file listing operation, you can enhance the performance and reliability of your pipeline activities.

    Please feel free to click the 'Upvote' (Thumbs-up) button and 'Accept as Answer'. This helps the community by allowing others with similar queries to easily find the solution.


  2. phemanth 13,060 Reputation points Microsoft Vendor
    2025-01-15T11:12:50.3433333+00:00

    @Marc

    Thanks for reaching out to Microsoft Q&A

    It seems like you're dealing with a challenging issue. To address this:

    Timeout Configuration: If you've set the timeout to 7 days but the activity still times out after 12 hours, there might be other settings or limitations in your environment that override the timeout parameter. Here are a few things to check:

    • Pipeline Settings: Ensure there are no other timeout settings at the pipeline level that might be affecting your activity.
    • Resource Limits: Sometimes, resource limits or quotas set by your cloud provider can cause timeouts. Check if there are any such limits in place.
    • Service Limits: Verify if there are any service-specific limits that might be causing the timeout.
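As a sketch of where the activity-level timeout lives: in the pipeline JSON, the Notebook activity's policy.timeout uses the d.hh:mm:ss format, so seven days is "7.00:00:00". The activity name below is a placeholder.

```json
{
    "name": "LongRunningNotebook",
    "type": "SynapseNotebook",
    "policy": {
        "timeout": "7.00:00:00",
        "retry": 0,
        "retryIntervalInSeconds": 30
    }
}
```

If this value is already set correctly, the 12-hour limit is likely coming from one of the environment-level settings above rather than the activity itself.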

    Optimizing mssparkutils.fs.ls: Since mssparkutils.fs.ls(file_path) lists all contents in the file path and you have millions of files, this can be very time-consuming. Here are a few alternatives:

    • Pagination: If possible, implement pagination to process a limited number of files at a time.
    • Filtering: Use filters to narrow down the list of files to only those you need to process.
    • Parallel Processing: If your environment supports it, consider parallel processing to speed up the listing and processing of files.
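A rough sketch of the filtering and parallel-listing ideas, with the caveats that list_dir is a hypothetical stand-in for mssparkutils.fs.ls, the partition paths are invented, and the real fs.ls returns FileInfo objects rather than plain strings:

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for mssparkutils.fs.ls; in the notebook, replace
# list_dir with a wrapper around mssparkutils.fs.ls(path).
def list_dir(path):
    return [f"{path}/file_{i}.csv" for i in range(3)]

def filtered(entries, suffix=".csv", prefix=None):
    """Keep only the entries we actually need to process."""
    keep = [e for e in entries if e.endswith(suffix)]
    if prefix is not None:
        keep = [e for e in keep if e.rsplit("/", 1)[-1].startswith(prefix)]
    return keep

# List several date-partitioned subfolders concurrently instead of
# enumerating one huge flat directory in a single call.
partitions = ["root/2025-01-13", "root/2025-01-14"]
with ThreadPoolExecutor(max_workers=4) as pool:
    listings = list(pool.map(list_dir, partitions))

files = [f for listing in listings for f in filtered(listing)]
print(len(files))  # 6 entries across both partitions
```

This approach only helps if the data is already laid out in subfolders (e.g., date partitions); a single flat folder with millions of files still forces one expensive listing call.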

    Hope this helps. Please do let us know if you have any further queries.

