Azure 502 Bad Gateway Issue
I use an Application Gateway (App GW) with WAF in front of our web application, which is deployed on a single Azure VM.
When I access the application through App GW from the browser, I sometimes get a 502 Bad Gateway error. The App GW health probe responds with, "Cannot connect to backend server. Check whether any NSG/UDR/Firewall is blocking access to the server. Check if application is running on correct port."
The issue does not always occur. It appears when I hit the server multiple times, and it resolves itself after some time or if I clear the browser cache.
In the App GW log, I sometimes see "error_info_s: ERRORINFO_UPSTREAM_NO_LIVE" or "ERRORINFO_UPSTREAM_CLOSED_CONNECTION".
Is this expected behaviour for App GW, or is there a solution to fix the issue? I would appreciate your suggestions or experiences.
Azure Application Gateway
-
Sai Prasanna Sinde • 3,585 Reputation points • Microsoft Vendor
2025-01-15T22:32:06.86+00:00 Welcome to the Microsoft Q&A Platform! Thank you for asking your question here.
Please go through the below points:
- Your VM might be intermittently overloaded (like high CPU, memory, or network usage), making it unable to respond to requests from the application gateway health probes or user traffic in a timely manner.
- Please use Azure Monitor to track CPU, memory, network, and disk I/O on your VM. Look for spikes or sustained high usage around the times when the 502 errors occur. For your reference: https://learn.microsoft.com/en-us/azure/virtual-machines/monitor-vm
- The web application running on your VM might be restarting intermittently, causing temporary unavailability. Please try to review your web application's logs for errors or exceptions that might indicate crashes.
- If the backend server closes idle connections while the application gateway is still sending traffic on them, requests can fail. Make sure that the backend server's keep-alive timeout is greater than the application gateway's idle timeout.
- Try to review the NSG rules associated with both your application gateway subnet and your VM's subnet. Ensure there are rules allowing inbound traffic on the ports used by your application (like 80 or 443) from the application gateway IP range.
- Verify that your UDRs are not directing traffic away from your VM. Ensure that traffic destined for your VM's IP address is routed correctly.
- If you're using the VM's firewall, ensure that there's a rule allowing inbound traffic on the required ports from the application gateway IP address range.
- Ensure the health probe interval is not too short (30 sec is common). A very short interval can overwhelm a busy server.
- Make sure the probe timeout is appropriate (like 30 sec). It should be long enough for the server to respond even under load.
- Set an appropriate unhealthy threshold to prevent flapping between healthy and unhealthy states. Also verify that the port used in your backend settings matches the port your application is listening on within the VM.
- If you have enabled connection draining, be sure to configure an appropriate timeout for it.
- Analyze your application gateway WAF logs to see if any requests are being blocked around the time of the 502 errors.
- Make sure that your DNS records are correctly configured and that the DNS servers you're using are reliable.
- As you mentioned that the issue resolves itself or after a browser cache clear, temporary network glitches or routing problems within Azure's infrastructure could be causing the connection failures.
- Enable diagnostic logging for your application gateway and send the logs to a Log Analytics workspace. This will give you detailed insights into health probe failures, request routing, and WAF activity.
- Use Network Watcher's connection troubleshoot feature to diagnose network connectivity issues between your application gateway and VM. For your reference: https://learn.microsoft.com/en-us/azure/network-watcher/connection-troubleshoot-portal
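The timing points in the list above (probe interval, unhealthy threshold, keep-alive versus idle timeout) can be sanity-checked with a small calculation. The following is a minimal Python sketch, not an Azure API: `ProbeSettings`, `approx_detection_s`, and `keepalive_is_safe` are hypothetical names, the default values are only example settings, and the estimate assumes a backend is marked unhealthy after `unhealthy_threshold` consecutive failed probes.

```python
from dataclasses import dataclass


@dataclass
class ProbeSettings:
    """Hypothetical container for health probe settings (not an Azure SDK type)."""
    interval_s: int = 30          # seconds between probes (example value)
    timeout_s: int = 30           # seconds to wait for a probe response (example value)
    unhealthy_threshold: int = 3  # consecutive failures before the backend is marked unhealthy


def approx_detection_s(probe: ProbeSettings) -> int:
    """Rough time until a failing backend is marked unhealthy.

    Assumption: one failed probe per interval, so detection takes about
    interval * threshold seconds. Treat this as an estimate only.
    """
    return probe.interval_s * probe.unhealthy_threshold


def keepalive_is_safe(backend_keepalive_s: int, gateway_idle_s: int) -> bool:
    """True when the backend keeps idle connections open longer than the gateway does,
    so the backend is never the first side to close a connection the gateway may reuse."""
    return backend_keepalive_s > gateway_idle_s


if __name__ == "__main__":
    probe = ProbeSettings()
    print("approximate detection time (s):", approx_detection_s(probe))
    # Example numbers only: backend keep-alive of 120 s against a gateway idle timeout of 180 s.
    print("keep-alive safe:", keepalive_is_safe(120, 180))
```

If the second check prints `False` for your real values, raising the keep-alive timeout on the backend web server is a reasonable first thing to try.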
Kindly let us know if the above helps or you need further assistance on this issue.
Thanks,
Sai.
-
Sai Prasanna Sinde • 3,585 Reputation points • Microsoft Vendor
2025-01-16T18:54:39.1933333+00:00 Hi @Mohammed Shafi,
Greetings of the day!
Just checking in to see if you had a chance to see my response to your question. Please tell us if it was helpful and feel free to reach out to us if you have any queries.
Thanks,
Sai.
-
Sai Prasanna Sinde • 3,585 Reputation points • Microsoft Vendor
2025-01-17T17:55:41.6566667+00:00 Hi @Mohammed Shafi,
Hope you are having a great day.
Just checking in to see if you have got a chance to see my response to your question in resolving the issue.
If you are still facing any further issues, please don't hesitate to reach out to us. We are happy to assist you.
Looking forward to your response and appreciate your time on this.
Cheers,
Sai.
-
Sai Prasanna Sinde • 3,585 Reputation points • Microsoft Vendor
2025-01-27T01:47:11.85+00:00 Hi @Mohammed Shafi,
Hope you are having a great day.
I wanted to check if you have had the chance to review the answer posted above and if it is helpful in resolving your issue.
If it was helpful, please click "Upvote and Accept Answer" on this post to let us know.
We're here to help, so if you have any further questions, don't hesitate to ask.
Thanks,
Sai.
-
Mohammed Shafi • 20 Reputation points
2025-01-27T15:52:49.7066667+00:00 Hi Sai.
Sorry for my late response. I still could not find a solution.
Please see my answer below each point, and please let me know your thoughts and suggestions to fix the issue.
Thanks in advance.
- Your VM might be intermittently overloaded (like high CPU, memory, or network usage), making it unable to respond to requests from the application gateway health probes or user traffic in a timely manner.
  Answer: I checked VM Insights; there is no high CPU, memory, or network usage. But when client requests come frequently through the gateway, the VM becomes unreachable.
- Please use Azure Monitor to track CPU, memory, network, and disk I/O on your VM. Look for spikes or sustained high usage around the times when the 502 errors occur. For your reference: https://learn.microsoft.com/en-us/azure/virtual-machines/monitor-vm
  Answer: Enabled already, but it did not lead to a solution for our issue.
- The web application running on your VM might be restarting intermittently, causing temporary unavailability. Please try to review your web application's logs for errors or exceptions that might indicate crashes.
  Answer: No chance. We run the same application on another Azure VM without the gateway setup and have no issue when we access the app directly. The issue occurs only when the app is accessed through the gateway.
- If the backend server closes idle connections while the application gateway is still sending traffic on them, requests can fail. Make sure that the backend server's keep-alive timeout is greater than the application gateway's idle timeout.
  Answer: In the application gateway backend settings I extended the request timeout to 220 seconds. We are using HTTP/2 connections to the frontend IP address on the Application Gateway v2 SKU, where the idle timeout is set to 180 seconds and is not configurable.
- Try to review the NSG rules associated with both your application gateway subnet and your VM's subnet. Ensure there are rules allowing inbound traffic on the ports used by your application (like 80 or 443) from the application gateway IP range.
  Answer: No NSG is configured on the application gateway subnet. For the VM we are using an NSG, and ports 80 and 443 are allowed in the NSG inbound rules.
- Verify that your UDRs are not directing traffic away from your VM. Ensure that traffic destined for your VM's IP address is routed correctly.
  Answer: We are not using any UDR; we use the system default routing.
- If you're using the VM's firewall, ensure that there's a rule allowing inbound traffic on the required ports from the application gateway IP address range.
  Answer: The issue occurs even when we disable the VM OS firewall.
- Ensure the health probe interval is not too short (30 sec is common). A very short interval can overwhelm a busy server.
  Answer: I tested the default 30 sec health probe and a custom probe (60 sec), but no luck.
- Make sure the probe timeout is appropriate (like 30 sec). It should be long enough for the server to respond even under load.
  Answer: Tested with 60 sec also. But what is the maximum we can set here?
- Set an appropriate unhealthy threshold to prevent flapping between healthy and unhealthy states. Also verify that the port used in your backend settings matches the port your application is listening on within the VM.
  Answer: In the app gateway backend settings we enabled port 443, and the listener we created is for 443 only.
- If you have enabled connection draining, be sure to configure an appropriate timeout for it.
  Answer: Connection draining is not enabled.
- Analyze your application gateway WAF logs to see if any requests are being blocked around the time of the 502 errors.
  Answer: Enabled already. In the log we get errors like "ERRORINFO_UPSTREAM_NO_LIVE" or "ERRORINFO_UPSTREAM_CLOSED_CONNECTION". But the 502 error occurs even when we disable the WAF.
- Make sure that your DNS records are correctly configured and that the DNS servers you're using are reliable.
  Answer: DNS is configured correctly, and we use the same DNS server for our other application servers as well.
- As you mentioned that the issue resolves itself or after a browser cache clear, temporary network glitches or routing problems within Azure's infrastructure could be causing the connection failures.
  Answer: Sorry, it is not like that. Even if I close and reopen the browser, it does not get fixed. It fixes itself automatically after a while.
- Enable diagnostic logging for your application gateway and send the logs to a Log Analytics workspace. This will give you detailed insights into health probe failures, request routing, and WAF activity.
  Answer: Enabled already, but it did not lead to a solution for our issue.
- Use Network Watcher's connection troubleshoot feature to diagnose network connectivity issues between your application gateway and VM. For your reference: https://learn.microsoft.com/en-us/azure/network-watcher/connection-troubleshoot-portal
  Answer: We are using the application gateway connection troubleshoot to get the connection details. During normal testing I get the troubleshoot details shown in attached image 1, and during the 502 error I get the details shown in image 2.
  Attachments: Image 1 - No Error.png, Image 2 - 502 Error Troubleshoot 1.png, Image 2 - 502 Error Troubleshoot 2.png
-
BolliVinayKumar • 0 Reputation points • Microsoft Vendor
2025-01-28T10:06:27.55+00:00 Hi Mohammed Shafi,
Thank you for reaching out & I hope you are doing well.
It seems you have gone through a lot of troubleshooting and are still not able to identify the issue.
Can you share more information on the point where you mentioned "we are using the same application in another azure VM without gateway setup"? Do you mean that you access the same application from another VM, or that you access a different application with the same configuration? Also, please confirm whether you access it from a corporate network.
-
Mohammed Shafi • 20 Reputation points
2025-01-28T10:25:53.49+00:00 Hi @BolliVinayKumar, thank you for your enquiry.
Yes, we are using a different VM with the same specification (VM size, disk, etc.). We set up that environment without an application gateway and deployed the same web application, and there is no issue at all accessing the application.
But the 502 Bad Gateway error occurs when we access the application through the application gateway setup.
-
BolliVinayKumar • 0 Reputation points • Microsoft Vendor
2025-01-29T17:42:50.0766667+00:00 Thank you for reaching out.
As per your description, you face the issue only when the application is accessed through the gateway.
Along with the steps you have already tried, please try the steps below:
- Check the application logs on the VM for patterns or errors that correlate with the 502 errors. Sometimes, application-level issues can manifest as gateway errors.
- Use static IPs for the backend pool instead of relying on DNS resolution. If DNS is required, ensure the DNS server is highly available and responsive.
- Check the SSL/TLS configuration, because mismatched or outdated SSL/TLS cipher suites between the Application Gateway and the backend server can sometimes cause connection failures.
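One way to look at the SSL/TLS point is to connect to the backend directly from a machine in the same network and print what gets negotiated. The following is a minimal Python sketch using only the standard library; `make_context`, `negotiated_tls`, and the hostname in the usage comment are hypothetical, and the TLS 1.2 floor is an assumption for illustration rather than a documented gateway requirement.

```python
import socket
import ssl


def make_context(min_version: ssl.TLSVersion = ssl.TLSVersion.TLSv1_2) -> ssl.SSLContext:
    """Client context with certificate and hostname verification enabled.

    The minimum version is an assumed floor for this sketch; adjust it to your policy.
    """
    ctx = ssl.create_default_context()
    ctx.minimum_version = min_version
    return ctx


def negotiated_tls(host: str, port: int = 443, timeout_s: float = 5.0) -> dict:
    """Connect to host:port and report the negotiated TLS version and cipher suite."""
    ctx = make_context()
    with socket.create_connection((host, port), timeout=timeout_s) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            cipher_name, _protocol, secret_bits = tls.cipher()
            return {"version": tls.version(), "cipher": cipher_name, "bits": secret_bits}


# Usage (hypothetical hostname, run from a machine that can reach the backend):
#   print(negotiated_tls("backend.example.internal"))
```

A handshake or certificate verification error raised here is itself a useful clue, since it suggests the backend's TLS setup may also be rejected by other strict clients.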
-
KapilAnanth-MSFT • 48,576 Reputation points • Microsoft Employee
2025-01-31T08:19:21.06+00:00 Welcome to the Microsoft Q&A Platform. Thank you for reaching out & I hope you are doing well.
Per your discussion with Sai Prasanna Sinde and BolliVinayKumar, I take it that
- You have a VM acting as the backend of the Application Gateway
- You are receiving 502 Bad Gateway intermittently
- You checked the App Gateway access logs and see the errors "ERRORINFO_UPSTREAM_NO_LIVE" and "ERRORINFO_UPSTREAM_CLOSED_CONNECTION" (which belong to App Gw 5XX errors)
- You confirmed that the NSG rules allow the traffic and no UDR is configured, and the issue remains even with the local OS firewall disabled
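To see which of the two error codes dominates and when, exported access log records can be grouped with a few lines of Python. This is a minimal sketch under assumptions: the records are dictionaries exported from Log Analytics, the field names `httpStatus_d` and `error_info_s` follow the `error_info_s` naming quoted in this thread, and `summarize_502s` plus the sample rows are made up for illustration and are not real log data.

```python
from collections import Counter
from typing import Iterable, Mapping


def summarize_502s(records: Iterable[Mapping]) -> Counter:
    """Count 502 responses per error_info_s value.

    Assumes each record has 'httpStatus_d' (number or numeric string)
    and optionally 'error_info_s'.
    """
    counts: Counter = Counter()
    for rec in records:
        status = int(float(rec.get("httpStatus_d", 0)))
        if status == 502:
            counts[rec.get("error_info_s") or "UNKNOWN"] += 1
    return counts


if __name__ == "__main__":
    # Illustrative rows only, not taken from any real log.
    sample = [
        {"httpStatus_d": 200, "error_info_s": ""},
        {"httpStatus_d": 502, "error_info_s": "ERRORINFO_UPSTREAM_NO_LIVE"},
        {"httpStatus_d": 502.0, "error_info_s": "ERRORINFO_UPSTREAM_CLOSED_CONNECTION"},
        {"httpStatus_d": "502", "error_info_s": "ERRORINFO_UPSTREAM_NO_LIVE"},
    ]
    for error, count in summarize_502s(sample).most_common():
        print(error, count)
```

Going by the names alone, a majority of `ERRORINFO_UPSTREAM_NO_LIVE` entries would point toward the health probe marking the backend down, while `ERRORINFO_UPSTREAM_CLOSED_CONNECTION` would point toward the backend closing connections, for example because of a keep-alive mismatch.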
Next steps:
- Did you check the application logs from the VM's application at the time of the issue?
- Were you able to see the HTTP health probe requests coming from App Gateway at the time of the issue?
- If not, I suggest you enable those logs and check whether the application responds to incoming App Gw health probe requests at the time of the issue
- You can also run a circular packet capture on the backend VM and check the capture file for the timestamp of the issue
- Can you please share the entire error log (JSON) from the App Gateway access logs where the response is 502?
- Intermittent issues mostly indicate a backend problem (either at VM or OS level)
- Note: The maximum request timeout to a private backend is 24 hours; however, I would recommend that you fine-tune it according to your application
- Verify whether some requests need 5 minutes to return a response. In that case you can set the timeout to 7 minutes (with 2 minutes as buffer)
- Similarly, you can arrive at a timeout that matches your use case
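The timeout rule of thumb above (slowest expected response plus a buffer, capped at the 24-hour maximum) can be written down as a small helper. This is a minimal Python sketch; `suggested_request_timeout_s` is a hypothetical name, and the 2-minute default buffer simply mirrors the example in the list.

```python
from typing import Sequence

MAX_REQUEST_TIMEOUT_S = 24 * 60 * 60  # 24-hour ceiling for private backends, as stated above


def suggested_request_timeout_s(observed_durations_s: Sequence[float], buffer_s: float = 120) -> float:
    """Slowest observed response time plus a buffer, capped at the 24-hour maximum."""
    if not observed_durations_s:
        raise ValueError("need at least one observed request duration")
    return min(max(observed_durations_s) + buffer_s, MAX_REQUEST_TIMEOUT_S)


if __name__ == "__main__":
    # Slowest request takes 5 minutes (300 s); with a 2-minute buffer the result is 7 minutes.
    print(suggested_request_timeout_s([12, 45, 300]))
```

Feeding it the request durations from the application logs, rather than guesses, keeps the timeout tied to real behaviour.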
Kindly let us know what the backend application logs say at the time of the issue
Thanks,
Kapil