WebSocket connections through Application Gateway Ingress Controller are reset on horizontal pod scaling events
We run WebSocket services on an AKS cluster. Clients need to stay connected to these services, and we use the Azure Application Gateway Ingress Controller for load balancing. However, we have observed that any horizontal pod scaling operation (scale-in or scale-out) drops all current WebSocket connections. Our understanding is that any change to the backend pool triggers this behaviour. (There is a GitHub issue referring to the same problem.) We expect a large number of active clients, growing to several hundred thousand connections. While the scaling capabilities of Azure Application Gateway meet our needs, this undocumented behaviour makes it difficult to guarantee high availability.
- Is there any plan to address this issue within Azure Application Gateway?
- We have considered using Azure Load Balancer instances instead. Would a change to the Load Balancer's backend pool leave existing connections unaffected? Also, what are the scaling limits or capacity of Azure Load Balancer when it is connected to an AKS cluster?
- Is there any other architecture that we should be considering?