How to fix and troubleshoot 503 errors in Azure Cosmos DB with substatus code of 20003

Katherine Bayona 0 Reputation points
2025-01-20T09:40:50.08+00:00

Hi,

We are currently experiencing intermittent 503 errors in our Cosmos DB account. Below is the error message:

ServiceUnavailable (503); Substatus: 20003; ActivityId: 0b3f26d8-f0ea-4d67-9260-c984144c7f7f; Reason: (GatewayStoreClient Request Timeout. Start Time UTC:01/20/2025 06:23:33; Total Duration:71507.1309 Ms; Request Timeout 65000 Ms; Http Client Timeout:65000 Ms; Activity id: 0b3f26d8-f0ea-4d67-9260-c984144c7f7f;)

The exception message also shows "A task has been cancelled". When this happens, RU consumption is below 100%.

What are the possible causes of this and what is the best way to troubleshoot?

Azure Cosmos DB

1 answer

  1. Sai Raghunadh M 2,230 Reputation points Microsoft Vendor
    2025-01-20T11:37:06.4266667+00:00

    Hi @Katherine Bayona

    Thanks for the question and for using Microsoft Q&A.

    It seems like you are encountering issues with Azure Cosmos DB. The 503 Service Unavailable error with substatus code 20003 signifies underlying I/O errors related to the operating system.


    The "A task has been cancelled" message, together with RU consumption staying below 100%, suggests that the issue is more likely related to client-side timeouts or resource starvation than to RU throttling.
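
    The timings in the error message point the same way: the request ran for about 71.5 seconds against a 65-second request timeout. If you want to check this across many logged errors, the sketch below pulls those fields out of the message text. It is written in Python purely as an illustration. `parse_gateway_timeout` is a hypothetical helper (not part of any Cosmos DB SDK), and the regular expression assumes every message follows the exact format of the one in the question.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Assumption: messages follow the format of the single example in the question.
_PATTERN = re.compile(
    r"\((?P<status>\d{3})\); Substatus: (?P<substatus>\d+);"
    r".*?Total Duration:(?P<total>[\d.]+) Ms;"
    r" Request Timeout (?P<timeout>\d+) Ms;"
)


@dataclass
class GatewayTimeoutInfo:
    status: int
    substatus: int
    total_ms: float
    timeout_ms: int

    @property
    def exceeded_timeout(self) -> bool:
        # True when the request ran longer than its configured request timeout.
        return self.total_ms > self.timeout_ms


def parse_gateway_timeout(message: str) -> Optional[GatewayTimeoutInfo]:
    """Hypothetical helper: extract status and timing fields from a 503 message."""
    match = _PATTERN.search(message)
    if match is None:
        return None  # message does not follow the assumed format
    return GatewayTimeoutInfo(
        status=int(match["status"]),
        substatus=int(match["substatus"]),
        total_ms=float(match["total"]),
        timeout_ms=int(match["timeout"]),
    )


msg = ("ServiceUnavailable (503); Substatus: 20003; ActivityId: "
       "0b3f26d8-f0ea-4d67-9260-c984144c7f7f; Reason: (GatewayStoreClient "
       "Request Timeout. Start Time UTC:01/20/2025 06:23:33; "
       "Total Duration:71507.1309 Ms; Request Timeout 65000 Ms; "
       "Http Client Timeout:65000 Ms; Activity id: "
       "0b3f26d8-f0ea-4d67-9260-c984144c7f7f;)")
info = parse_gateway_timeout(msg)
print(info.substatus, info.total_ms, info.exceeded_timeout)  # prints: 20003 71507.1309 True
```

    Counting how often `exceeded_timeout` is true, and at what times, can help correlate the errors with CPU spikes or network events on the client.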

    Here are some steps to troubleshoot and potentially resolve this problem:

    Please verify that your client machine is not experiencing high CPU or memory usage, as this can lead to timeouts.

    https://learn.microsoft.com/en-us/azure/cosmos-db/nosql/troubleshoot-dotnet-sdk-request-timeout?tabs=cpu-new#high-cpu-utilization
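
    To see whether the client host is starved of CPU, one simple signal is scheduling lag. Request a short sleep, then measure how much later than requested the process actually wakes up. The Python sketch below does this with the standard library only. The interval, the sample count, and the `measure_scheduling_lag` name are illustrative assumptions, and this is not the mechanism the Cosmos DB SDK uses internally.

```python
import time
from statistics import mean
from typing import List


def measure_scheduling_lag(interval_s: float = 0.05, samples: int = 20) -> List[float]:
    """Return how many milliseconds late each sleep() woke up.

    Hypothetical helper: lag that stays high or spikes into the hundreds of
    milliseconds suggests the process is not getting CPU time promptly.
    """
    lags_ms = []
    for _ in range(samples):
        start = time.perf_counter()
        time.sleep(interval_s)
        elapsed = time.perf_counter() - start
        # Anything beyond the requested interval is scheduling delay.
        lags_ms.append(max(0.0, (elapsed - interval_s) * 1000))
    return lags_ms


if __name__ == "__main__":
    lags = measure_scheduling_lag()
    print(f"mean lag {mean(lags):.2f} ms, max lag {max(lags):.2f} ms")
```

    Running this on the affected machine while the errors occur, and comparing it with the CPU metrics described in the linked article, can show whether the timeouts line up with periods of starvation.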

    Make sure your application includes a robust retry mechanism for transient errors so that it can recover from temporary issues.

    https://learn.microsoft.com/en-us/azure/cosmos-db/nosql/conceptual-resilient-sdk-applications#should-my-application-retry-on-errors
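
    As a rough illustration of such a retry mechanism, here is a language-neutral sketch in Python. It retries an operation on transient status codes, using capped exponential backoff with jitter. `TransientServiceError`, the set of retryable codes, and the delay values are assumptions for illustration. The Cosmos DB SDKs already retry some errors on their own, so check the linked article to see which errors are covered before adding another layer.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")

# Assumption: status codes treated as transient; adjust to your own policy.
RETRYABLE_STATUS_CODES = {408, 429, 503}


class TransientServiceError(Exception):
    """Hypothetical error type carrying an HTTP-style status and substatus code."""

    def __init__(self, status_code: int, sub_status: int = 0):
        super().__init__(f"status {status_code}, substatus {sub_status}")
        self.status_code = status_code
        self.sub_status = sub_status


def call_with_retries(
    operation: Callable[[], T],
    max_attempts: int = 5,
    base_delay_s: float = 0.5,
    max_delay_s: float = 8.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `operation`, retrying transient failures with capped exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientServiceError as err:
            if err.status_code not in RETRYABLE_STATUS_CODES or attempt == max_attempts:
                raise  # non-transient error, or no attempts left
            # Full jitter: wait a random time up to the capped exponential delay.
            cap = min(max_delay_s, base_delay_s * 2 ** (attempt - 1))
            sleep(random.uniform(0, cap))
    raise AssertionError("unreachable when max_attempts >= 1")


if __name__ == "__main__":
    calls = {"n": 0}

    def flaky() -> str:
        calls["n"] += 1
        if calls["n"] < 3:
            raise TransientServiceError(503, 20003)
        return "ok"

    print(call_with_retries(flaky, sleep=lambda s: None))  # prints: ok
```

    Jitter spreads retries from many clients over time, so they do not all hit the service at the same moment after a transient failure.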

    Consider adjusting the timeout settings for your client requests to give operations more time to complete. This similar thread may be useful for reference:

    https://github.com/Azure/azure-cosmos-db-emulator-docker/issues/63
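
    The "A task has been cancelled" message is what a .NET client typically surfaces when a request is abandoned because its timeout elapsed. The Python sketch below reproduces the same pattern with `asyncio.wait_for`. The slow operation is cancelled once the timeout is reached, and a larger timeout lets it finish. `fake_request` and the timeout values are hypothetical stand-ins for a real Cosmos DB call and your client's configured timeout.

```python
import asyncio


async def fake_request(duration_s: float) -> str:
    """Hypothetical stand-in for a network call that takes `duration_s` seconds."""
    await asyncio.sleep(duration_s)
    return "response"


async def request_with_timeout(duration_s: float, timeout_s: float) -> str:
    try:
        # wait_for cancels the inner task if it has not finished within timeout_s.
        return await asyncio.wait_for(fake_request(duration_s), timeout=timeout_s)
    except asyncio.TimeoutError:
        return "timed out (task cancelled)"


async def main() -> None:
    # Same operation, two timeout settings: too tight, then with headroom.
    print(await request_with_timeout(duration_s=0.2, timeout_s=0.1))
    print(await request_with_timeout(duration_s=0.2, timeout_s=0.5))


if __name__ == "__main__":
    asyncio.run(main())
```

    A longer timeout only helps if the operation would eventually succeed. If requests are slow because of CPU starvation or network problems, addressing those causes is likely to matter more.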

    Check for network latency or connectivity issues between your client and Azure Cosmos DB:

    https://learn.microsoft.com/en-us/azure/cosmos-db/nosql/troubleshoot-dotnet-sdk#high-network-latency
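
    A quick way to check raw network latency from the client host is to time TCP connections to the account endpoint. The Python sketch below uses only the standard library, and `measure_connect_latency` is a hypothetical helper. The measured time covers name resolution and the TCP handshake only. It does not include TLS or the time Cosmos DB spends processing a request.

```python
import socket
import time
from typing import List


def measure_connect_latency(host: str, port: int = 443, samples: int = 5,
                            timeout_s: float = 5.0) -> List[float]:
    """Return TCP connect times in milliseconds for `samples` attempts.

    Covers name resolution and the TCP handshake only; TLS and the
    request itself are not measured.
    """
    results = []
    for _ in range(samples):
        start = time.perf_counter()
        with socket.create_connection((host, port), timeout=timeout_s):
            pass  # connection established; close it immediately
        results.append((time.perf_counter() - start) * 1000)
    return results

# Example usage (placeholder host: replace with your own account endpoint):
# times = measure_connect_latency("<your-account>.documents.azure.com")
# print(f"min {min(times):.1f} ms, max {max(times):.1f} ms")
```

    If connect times from the client host are consistently high or vary widely, the network path is a more likely culprit than Cosmos DB itself.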

    This similar thread might also help: https://learn.microsoft.com/en-us/answers/questions/1414611/microsoft-azure-cosmos-cosmosexception-response-st

    Hope this helps. If this answers your query, please click Accept Answer and Yes for "Was this answer helpful". If you have any further queries, do let us know.

