Hi All,
We have experienced issues where FunctionApps hosted in Azure can be quite unreliable and sometimes go offline after running without problems for a long time, in some cases years.
On the 18th of February at 21:20, after a standard 'warmup' operation, one of ours went offline and started returning a 500 status to all further requests (whether the keys used were valid or invalid). The end-to-end transaction logs show a standard warmup, followed by a load of null-binding ('NullListener') stop events:
18/02/2025, 21:40:51
-Trace
Stopped the listener 'Microsoft.Azure.WebJobs.Extensions.Http.HttpTriggerAttributeBindingProvider+HttpTriggerBinding+NullListener' for function '<somename1>'
Severity level: Information
18/02/2025, 21:40:51
-Trace
Stopping the listener 'Microsoft.Azure.WebJobs.Extensions.Http.HttpTriggerAttributeBindingProvider+HttpTriggerBinding+NullListener' for function '<somename1>'
Severity level: Information
18/02/2025, 21:40:51
-Trace
Stopped the listener 'Microsoft.Azure.WebJobs.Extensions.Http.HttpTriggerAttributeBindingProvider+HttpTriggerBinding+NullListener' for function '<somename2>'
Severity level: Information
18/02/2025, 21:40:51
-Trace
Stopping the listener 'Microsoft.Azure.WebJobs.Extensions.Http.HttpTriggerAttributeBindingProvider+HttpTriggerBinding+NullListener' for function '<somename2>'
Severity level: Information
From this point forward, all requests return a 500 status error, regardless of whether the keys used to access the function are correct.
No access attempt ever gets logged beyond the housekeeping that takes place automatically (i.e. no attempt to access the FunctionApp appears in the end-to-end transaction logs or in any of the other logs in Application Insights).
In fact, my belief is that no requests reach the FunctionApp at all, and that Azure itself throws the 500 because it is unable to delegate the request onwards after handling the initial incoming one, though I can't yet prove this.
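For reference, this is roughly how the absence of logged requests can be verified in Application Insights Logs (the datetime is the approximate failure time; requests, timestamp and resultCode are the standard App Insights schema):

requests
| where timestamp > datetime(2025-02-18T21:20:00Z)
| project timestamp, name, resultCode, success
| order by timestamp asc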
Further investigation shows the worker process is still online and working, and that the functions in the code are correctly discovered on startup.
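The startup discovery can be seen in the traces table with a query along these lines (the exact wording of the host's discovery message can vary between host versions, so treat the filter as illustrative):

traces
| where timestamp > datetime(2025-02-18T00:00:00Z)
| where message startswith "Found the following functions"
| project timestamp, message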
The FunctionApp runs on an S1 App Service Plan tier with plenty of resource, has "Always On" enabled, and uses Managed Identity to access the storage account.
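For context, by "Managed Identity access to the storage account" I mean the identity-based connection style rather than a connection string; the relevant app settings look roughly like this (placeholder values, and the credential/clientId entries only apply when a user-assigned identity is used):

AzureWebJobsStorage__accountName = <storage-account-name>
AzureWebJobsStorage__credential = managedidentity
AzureWebJobsStorage__clientId = <user-assigned-identity-client-id>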
No changes have been made to this app since the last deployment on 2024-07-30T09:11:07.5666953Z, and we have made no configuration changes in that time either.
This isn't the first time this has happened, but it is the first time I've been able to watch the automatic failure take place in detail at Azure's end.
To this day, the app does not respond to any external requests, but I believe this is down to the binding issue - in theory everything else is working fine.
We deployed the same code to a new app and it is working fine, but I've kept the collapsed one for investigation. I need to understand why these apps suddenly go offline so that I can stop it happening to the current version (this is actually the third time it has happened to this particular project, and I have seen it happen to others).
Anyone have any suggestions?