Hi,
Thanks for reaching out to Microsoft Q&A.
Potential Causes and Solutions:
- Integration Runtime Timeout Settings:
  - Cause: The IR used by your pipeline may have its own timeout settings that override individual activity settings.
  - Solution: Review the timeout configuration for the IR associated with your pipeline and ensure it accommodates longer-running activities.
- Service Limitations:
  - Cause: Azure services sometimes enforce maximum execution durations for activities to maintain system reliability.
  - Solution: Consult the official ADF and Synapse documentation, or contact Azure support, to confirm whether there is a hard limit on activity duration.
- Activity-Specific Settings:
  - Cause: Certain activities have their own timeout settings that may need adjustment.
  - Solution: Double-check the timeout property in the Notebook activity configuration to ensure it's correctly set and not being overridden by other settings.
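As an illustration, the activity timeout lives under the activity's `policy` block in the pipeline JSON. The sketch below shows the shape of that block for a Synapse Notebook activity; the activity name and the values are placeholders (the timeout format is `d.hh:mm:ss`):

```json
{
    "name": "RunNotebook",
    "type": "SynapseNotebook",
    "policy": {
        "timeout": "0.02:00:00",
        "retry": 1,
        "retryIntervalInSeconds": 60
    }
}
```

If this value looks correct but the activity still stops early, that points back to the IR or service-level limits described above.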
Optimizing mssparkutils.fs.ls(file_path) Usage:
The mssparkutils.fs.ls(file_path) function lists all contents of the specified directory. When dealing with millions of files, this operation can become resource-intensive and time-consuming. To improve performance:
- Limit the Number of Files Processed:
  - Approach: Instead of processing all files at once, implement logic to process files in batches or based on specific criteria (date ranges, file name patterns).
  - Implementation: Use filtering functions or parameters within your notebook to target a subset of files. For example, if file names include date stamps, process only the files from a particular date range.
- Leverage Parallel Processing:
  - Approach: Distribute the workload across multiple nodes to handle large datasets more efficiently.
  - Implementation: Utilize Spark's parallel processing capabilities to process multiple files concurrently, reducing overall execution time.
- Optimize Storage Access Patterns:
  - Approach: Ensure that your storage access patterns are efficient to minimize latency.
  - Implementation: Consider partitioning your data in Azure Data Lake Storage to enable faster access and processing of specific data segments.
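A minimal sketch of the filtering-and-batching idea in plain Python. Since mssparkutils.fs.ls returns file info objects with a `name` attribute, the filtering can be done on plain name strings, so the helpers below work on any list of names. The `YYYYMMDD` stamp pattern and the batch size are illustrative assumptions about your naming convention:

```python
import re
from datetime import date

def filter_by_date_range(names, start, end):
    """Keep only file names whose embedded date stamp falls within [start, end]."""
    pattern = re.compile(r"(\d{8})")  # assumes a YYYYMMDD stamp in the file name
    selected = []
    for name in names:
        m = pattern.search(name)
        if not m:
            continue  # skip files without a recognizable date stamp
        stamp = m.group(1)
        d = date(int(stamp[:4]), int(stamp[4:6]), int(stamp[6:8]))
        if start <= d <= end:
            selected.append(name)
    return selected

def batches(items, size):
    """Yield successive fixed-size batches so each run touches a bounded number of files."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

# Example: in a Synapse notebook the name list would come from
# [f.name for f in mssparkutils.fs.ls(file_path)].
names = ["sales_20240101.csv", "sales_20240215.csv", "sales_20231230.csv"]
in_range = filter_by_date_range(names, date(2024, 1, 1), date(2024, 1, 31))
for batch in batches(in_range, 100):
    pass  # process the batch, e.g. hand the paths to spark.read
```

Processing the filtered list in batches keeps each pipeline run well under the activity timeout instead of attempting all files in a single pass.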
By addressing the potential causes of the timeout and optimizing the file listing operation, you can enhance the performance and reliability of your pipeline activities.
Please feel free to click the 'Upvote' (Thumbs-up) button and 'Accept as Answer'. This helps the community by allowing others with similar queries to easily find the solution.