Help with Sporadic Internal Errors in Azure Automation Runbooks

Ivaylo Milkov
2024-12-04T11:15:16.5733333+00:00

Hello everyone,

I’m hoping someone in the community has faced a similar issue and can share their insights. Since August, we’ve been experiencing problems with our Azure Automation Runbooks, originally running on the PowerShell 5.1 runtime. The main issue is that jobs sporadically fail with an "Internal Error," as indicated in the exception block of the affected jobs.

This makes troubleshooting quite challenging:

  • No clear pattern: The errors occur irregularly, sometimes at the start of a job and sometimes near the end of its execution.
  • Lack of detailed error messages: We only see the "Job was suspended due to an internal error. Please retry after sometime." message without additional context to pinpoint the root cause.
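Since the error text itself says "Please retry after sometime," one stopgap is a watchdog that resubmits a job when it ends in the Suspended state. The sketch below is in Python and is only an illustration under assumptions: `start_job` and `get_job_status` are hypothetical callables standing in for whatever SDK or REST call you use to start and poll a runbook job, and the status strings are assumed to match the job states shown in the portal (Completed, Failed, Stopped, Suspended, Running, and so on).

```python
import time
from typing import Callable, Tuple

# States after which polling stops (assumed to mirror the portal's job statuses).
TERMINAL = {"Completed", "Failed", "Stopped", "Suspended"}


def run_with_resubmit(
    start_job: Callable[[], str],        # hypothetical: starts a job, returns its ID
    get_job_status: Callable[[str], str],  # hypothetical: returns the job's status
    max_attempts: int = 3,
    poll_seconds: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[str, int]:
    """Start a job and resubmit it if the platform suspends it.

    Returns (final_status, attempts_used). Genuine failures (Failed,
    Stopped) are returned immediately and are not retried.
    """
    status = "NotStarted"
    for attempt in range(1, max_attempts + 1):
        job_id = start_job()
        status = get_job_status(job_id)
        while status not in TERMINAL:
            sleep(poll_seconds)
            status = get_job_status(job_id)
        if status != "Suspended":
            return status, attempt  # success or a real error: stop here
        # Suspended by the platform: wait, then resubmit as the message advises.
        sleep(poll_seconds)
    return status, max_attempts
```

Resubmitting is only safe if the runbook tolerates running twice, and this sketch does not clean up the suspended job it leaves behind.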

Here’s what we’ve tried so far:

  1. Avoiding parallel executions: We adjusted the scheduling of nightly jobs to eliminate parallel runs, but this hasn’t resolved the issue.
  2. Throttling as a suspect: Based on Azure diagnostics, we suspected a throttling issue and added Sleep commands in several parts of the Runbooks, but this also didn’t help.
  3. Upgrading the runtime: We migrated the Runbooks from PowerShell 5.1 to PowerShell 7.4, but the errors persist even in the updated environment.
  4. Analysis with Azure metrics: We tried investigating the throttling hypothesis using Azure’s built-in metrics but couldn’t find any relevant data to confirm this.
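Fixed Sleep calls (item 2) slow down every call equally, whether or not the service is actually throttling. A common alternative is exponential backoff with jitter: retry only when a call is rejected as throttled (Azure Resource Manager signals this with HTTP 429), and wait longer after each consecutive rejection. A minimal Python sketch of the pattern, assuming your own API wrapper raises a hypothetical `ThrottledError` on a 429 response; in a PowerShell runbook the same loop would wrap the relevant Az cmdlet call.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class ThrottledError(Exception):
    """Hypothetical: raised by your own API wrapper on an HTTP 429 response."""


def call_with_backoff(
    op: Callable[[], T],
    max_retries: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
    rng: Optional[random.Random] = None,
) -> T:
    """Call op, retrying on ThrottledError with exponential backoff and full jitter."""
    rng = rng or random.Random()
    attempt = 0
    while True:
        try:
            return op()
        except ThrottledError:
            if attempt >= max_retries:
                raise  # give up and surface the throttling instead of hiding it
            # The cap doubles each attempt (1s, 2s, 4s, ... with the defaults)
            # up to max_delay; full jitter sleeps a random time in [0, cap]
            # so concurrent callers do not retry in lockstep.
            cap = min(max_delay, base_delay * (2 ** attempt))
            sleep(rng.uniform(0, cap))
            attempt += 1
```

Unlike blanket sleeps, this costs nothing when calls succeed, and the number of retries logged per run gives direct evidence for or against the throttling hypothesis.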

Has anyone else encountered similar problems or have suggestions on how to proceed? Specifically:

  • Is there a way to retrieve more detailed error messages for these jobs?
  • Are there tools or best practices for better analyzing potential throttling issues?
  • Could there be an alternative approach to improving the stability of our Runbooks?

Thank you in advance for your support! I’m happy to provide more details if needed.

Best regards

Azure Automation

2 answers

  1. Pranay Reddy Madireddy (Microsoft Vendor)
    2024-12-05T21:20:28.0066667+00:00

    Hi Ivaylo Milkov

    Welcome to the Microsoft Q&A Platform! Thank you for asking your question here.

    Run the script on your local machine first to check for issues like missing modules, syntax errors, or logic mistakes before deploying it to Azure.

    Check that every module your runbook depends on is imported into your Automation account, and make sure those modules are up to date and properly installed to avoid unexpected errors.

    Add more output statements to your runbook to track its execution flow. This will help you determine what occurs just before the runbook is suspended or fails.
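    Taking that suggestion one step further, each unit of work can be wrapped in START/END markers so the job output shows exactly which step was in progress when the job was suspended. A minimal Python sketch follows; in a PowerShell runbook the equivalent would be Write-Output lines around each block. The " | " marker format and the helper names are illustrative choices, not anything Azure Automation defines.

```python
import time
from contextlib import contextmanager
from datetime import datetime, timezone
from typing import Callable, Iterator, List, Optional


def _now() -> str:
    # UTC timestamp so log lines from different runs line up.
    return datetime.now(timezone.utc).isoformat()


@contextmanager
def traced_step(name: str, emit: Callable[[str], None] = print) -> Iterator[None]:
    """Emit START before the block and END (or FAIL) after it."""
    started = time.monotonic()
    emit(f"{_now()} | START | {name}")
    try:
        yield
    except Exception as exc:
        # Record the failure, then let the exception propagate unchanged.
        emit(f"{_now()} | FAIL | {name} | {exc!r}")
        raise
    emit(f"{_now()} | END | {name} | {time.monotonic() - started:.1f}s")


def last_unfinished_step(lines: List[str]) -> Optional[str]:
    """Return the most recent step that has a START but no END or FAIL."""
    open_steps: List[str] = []
    for line in lines:
        parts = line.split(" | ")
        if len(parts) < 3:
            continue  # ignore ordinary output lines
        marker, name = parts[1], parts[2]
        if marker == "START":
            open_steps.append(name)
        elif marker in ("END", "FAIL") and name in open_steps:
            open_steps.remove(name)
    return open_steps[-1] if open_steps else None
```

    After a suspension, feeding the job's output lines to last_unfinished_step points at the step that never finished, which is the context the "internal error" message itself does not give.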

    Since you've upgraded to PowerShell 7.4, make sure all scripts and modules are compatible with that version; the newer runtime may offer better stability than previous versions.

    If relevant, deploying Hybrid Runbook Workers can help resolve issues by running jobs closer to the resources they manage.

    For long-running tasks, using checkpoints can help control the execution flow and recover from failures without losing progress.
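    Note that the built-in Checkpoint-Workflow cmdlet only works in PowerShell Workflow runbooks, which the PowerShell 7.x runtimes do not support, so a 7.4 runbook would have to track its own progress: record each completed step in durable storage and skip it when the job is started again. A minimal Python sketch of that idea, assuming a local JSON file as a stand-in for durable state; in a real runbook the state would need to live somewhere that survives the sandbox, such as an Automation variable or a storage blob. The function and step names are hypothetical.

```python
import json
from pathlib import Path
from typing import Callable, List, Set, Tuple


def run_steps_with_checkpoints(
    steps: List[Tuple[str, Callable[[], None]]],
    state_path: Path,
) -> List[str]:
    """Run named steps in order, skipping any already recorded as done.

    state_path is a local stand-in for durable storage (assumption).
    Returns the names of the steps executed in this run.
    """
    done: Set[str] = set()
    if state_path.exists():
        done = set(json.loads(state_path.read_text()))
    executed: List[str] = []
    for name, action in steps:
        if name in done:
            continue  # completed by an earlier (suspended) attempt
        action()
        done.add(name)
        # Persist after every step so a suspension loses at most one step.
        state_path.write_text(json.dumps(sorted(done)))
        executed.append(name)
    return executed
```

    Each step still has to be safe to rerun: if the job is suspended after a step finishes but before the state is written, that step will run again on the next attempt.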

    For reference, please review this documentation:
    https://learn.microsoft.com/en-us/azure/automation/troubleshoot/runbooks
    https://github.com/MicrosoftDocs/azure-docs/blob/main/articles/automation/troubleshoot/runbooks.md?plain=1
    https://learn.microsoft.com/en-us/azure/automation/troubleshoot/extension-based-hybrid-runbook-worker
    https://github.com/uglide/azure-content/blob/master/articles/automation/automation-troubleshooting-automation-errors.md

    If you have any further queries, do let us know.

  2. Ellis
    2024-12-20T10:20:05.66+00:00

    We have had similar issues and have gone through the support route. Long story short: the Azure Automation platform itself is broken, and it probably won't be fixed for at least a quarter. We have been advised to use hybrid workers in the meantime, and we are escalating the issue.
