The problem was isolated to an internal application issue and was not a failure in Azure infrastructure.
Azure Front Door 499 responses from pop_s EWR
Front Door is reporting a lot of 499 responses from the EWR POP location, significantly more than I'm seeing from other locations.
The back-end of this Front Door instance is an API Management instance. The back-end of that is an app service.
Below is a screenshot showing the results I'm seeing with KQL where you can see that EWR is the noisiest of all the locations.
What are some things that could cause this behavior?
Azure Front Door
-
GitaraniSharma-MSFT • 49,691 Reputation points • Microsoft Employee
2023-08-23T12:58:05.1333333+00:00 Hello @EJ Marmonti ,
Welcome to Microsoft Q&A Platform. Thank you for reaching out & hope you are doing well.
I understand that your Azure Front Door is reporting a lot of 499 responses from the EWR POP location, and you would like to know what could cause this behavior.
As per the Azure Front Door metrics and logs document,
The HTTP status code returned from Azure Front Door. If the request to the origin timed out, the value for the HttpStatusCode field is 0. If the client closed the connection, the value for the HttpStatusCode field is 499.
I checked internally and found that this new HTTP status code of 499 could be due to the ongoing rollout of new software for POPs. The new software returns a 499 for client disconnections, whereas the previous software returned a 50x error.
You can try to increase the Azure Front Door timeout setting and check if that helps fix the issue.
You can increase the default timeout to up to 4 minutes (240 seconds). To configure the setting, go to the Overview page of the Front Door profile. Select Origin response timeout and enter a value between 16 and 240 seconds.
Refer: https://learn.microsoft.com/en-us/azure/frontdoor/troubleshoot-issues#troubleshooting-steps
Regards,
Gita
-
EJ Marmonti • 146 Reputation points
2023-08-23T13:50:22.0833333+00:00 Hello,
I understand that a 499 translates to a client disconnect. The timeout is already set to 4 minutes / 240 seconds. The thing is, it looks like most of these are disconnecting after 0 seconds from EWR.
If I look over the past 7 days of traffic, I can see that EWR responded with a 499 on almost 6k requests where timeTaken_d was 0. This number is drastically different from the other POP locations:
Further, if I query the same thing, except where timeTaken_d > 0, the results look more evenly distributed across the 8 POP locations:
Additionally, if I generate a timechart with data since January with the above query (where timeTaken_d == 0), it tells me that these 499's at EWR started around 7/10/2023, and as you can see it has significantly more 499 responses than the other pop locations:
Lastly, and just for reference, here is a chart where I'm not filtering for a specific httpStatusCode or a timeTaken_d value. It's just to prove that we do have all of this data in the logs going back to January 2023, and it also shows that the 499's where timeTaken_d == 0 began showing up in July:
So it still feels like there is maybe something going on with EWR when you consider that timeTaken_d == 0. Could it be that maybe there is some other setting I'm missing that is causing clients to terminate the moment it connects? I'm trying to understand what would cause timeTaken_d to be 0.
Thank you
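For readers who have exported their Front Door access logs and want to reproduce the comparison described above outside of Log Analytics, the following is a minimal Python sketch of the same split: count 499 responses per POP, once for timeTaken_d == 0 and once for timeTaken_d > 0. The field names pop_s and timeTaken_d come from this thread. The httpStatusCode key (written without a type suffix), the sample rows, the placeholder POP names POP_B and POP_C, and the function name count_499_by_pop are assumptions made for illustration, not the original query.

```python
from collections import Counter

# Hypothetical sample rows shaped like exported Front Door access-log records.
# pop_s and timeTaken_d are the field names used in the thread; the
# httpStatusCode key and all values below are illustrative assumptions.
rows = [
    {"pop_s": "EWR", "httpStatusCode": 499, "timeTaken_d": 0.0},
    {"pop_s": "EWR", "httpStatusCode": 499, "timeTaken_d": 0.0},
    {"pop_s": "EWR", "httpStatusCode": 499, "timeTaken_d": 1.2},
    {"pop_s": "EWR", "httpStatusCode": 200, "timeTaken_d": 0.3},
    {"pop_s": "POP_B", "httpStatusCode": 499, "timeTaken_d": 0.0},
    {"pop_s": "POP_B", "httpStatusCode": 499, "timeTaken_d": 2.5},
    {"pop_s": "POP_C", "httpStatusCode": 0, "timeTaken_d": 240.0},
]


def count_499_by_pop(rows, instant):
    """Count 499 responses per POP.

    instant=True keeps only rows where timeTaken_d == 0.
    instant=False keeps only rows where timeTaken_d > 0.
    """
    counts = Counter()
    for row in rows:
        if row["httpStatusCode"] != 499:
            continue  # only client-closed-connection responses are of interest
        if instant:
            keep = row["timeTaken_d"] == 0
        else:
            keep = row["timeTaken_d"] > 0
        if keep:
            counts[row["pop_s"]] += 1
    return counts


instant_499 = count_499_by_pop(rows, instant=True)
delayed_499 = count_499_by_pop(rows, instant=False)

# Print each breakdown, busiest POP first.
print("499 with timeTaken_d == 0:", instant_499.most_common())
print("499 with timeTaken_d > 0: ", delayed_499.most_common())
```

If one POP dominates the first breakdown but not the second, that matches the pattern reported here for EWR and points toward clients that disconnect immediately rather than requests that run into the origin response timeout.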
-
GitaraniSharma-MSFT • 49,691 Reputation points • Microsoft Employee
2023-08-23T15:23:07.56+00:00 Thank you for the details, @EJ Marmonti .
To find the exact root cause of this issue, we may need to look into the backend logs and would need access to your resources for further investigation. If you have a support plan, please file a support ticket. Otherwise, please let us know and we will try to help you get one-time free technical support.
If you need help with one-time free technical support, please send an email with the subject line "ATTN gishar | Azure Front Door 499 responses from pop_s EWR" to AzCommunity[at]Microsoft[dot]com with the following details, and I will follow up with you:
- Reference this Q&A thread
- Your Azure Subscription ID
Note: Do not share any PII data as a public comment.
We will post a summarized answer once the issue is resolved.
Regards,
Gita
-
GitaraniSharma-MSFT • 49,691 Reputation points • Microsoft Employee
2023-08-28T13:07:11.4333333+00:00 @EJ Marmonti , Could you please provide an update on this post?
-
EJ Marmonti • 146 Reputation points
2023-08-28T13:43:38.13+00:00 Hello,
I will do some additional troubleshooting in our application with development to see if I can narrow this down to a potential timeout misconfiguration in the application. Once I have that information / if that doesn't solve it, I'll open a ticket through our support plan.
Thanks
-
GitaraniSharma-MSFT • 49,691 Reputation points • Microsoft Employee
2023-08-29T08:39:52.6533333+00:00 Sure @EJ Marmonti , thanks for the update. Please keep me posted on the progress. If you need help with one-time free technical support, please send us an email as requested in my previous comment.
-
GitaraniSharma-MSFT • 49,691 Reputation points • Microsoft Employee
2023-09-07T15:35:31.7833333+00:00 Hello @EJ Marmonti , could you please provide an update on this issue? Is there any further progress?
-
EJ Marmonti • 146 Reputation points
2023-09-07T15:58:15.2833333+00:00 Hello, I have no updates on my side. There is a work item assigned to a developer on my end related to this, which should be completed in the next week or so. But I'm 90% certain these 499's are happening due to a misconfigured timeout in our front end application.
-
GitaraniSharma-MSFT • 49,691 Reputation points • Microsoft Employee
2023-09-07T16:14:28.45+00:00 Thank you for the update, @EJ Marmonti .
Please do post the solution once the issue is fixed, as this can be beneficial to other community members facing similar issues.
-
Balakrishnan, Vidhyasagar • 1 Reputation point
2023-12-27T18:35:05.5933333+00:00 Hi @EJ Marmonti / @GitaraniSharma-MSFT - do you have any further update on this? In our application as well, we are noticing a lot of 499's recorded on Front Door and subsequent HTTP status code 0's on backend communication. Thanks in advance for clarifying.
-
EJ Marmonti • 146 Reputation points
2023-12-27T20:39:00.5266667+00:00 @Balakrishnan, Vidhyasagar Actually this issue has been back-burnered for me, but I think it's due to a timeout we're hitting in our application for certain http calls, and probably has less to do with Azure infrastructure.
-
Thavi Ellary • 0 Reputation points
2024-02-06T19:22:18.9033333+00:00 @EJ Marmonti Hello, trust you are well. Did you manage to find a root cause of your 499 issues? If so, please share. Thank you so much.
1 answer
-
EJ Marmonti • 146 Reputation points
2024-02-06T19:49:30.76+00:00
-
Vinay Bhoj • 0 Reputation points
2025-01-09T14:49:11.4533333+00:00 We have a similar setup and are facing a similar issue on AFD at the PAR POP. Can you please let us know what the issue with your internal application was and how it was resolved?