Volume Snapshots Fail on AKS with 502 and/or timeout or HTML error page

Jeff Mealo 0 Reputation points
2025-02-13T20:17:49.34+00:00

Volume snapshots are failing repeatedly, sometimes with 502 responses or HTML error pages from Azure, and sometimes with timeouts. Here's an example:

  Warning  Error           6m24s  cloudnative-pg-backup  snapshot backup failed: Failed to create snapshot: failed to take snapshot of the volume /subscriptions/19ea338e-67f7-4a2f-a165-6ccdb9c1aecb/resourceGroups/mc_application_production-app-cluster_eastus/providers/Microsoft.Compute/disks/pvc-37f06647-f7f6-4838-bf05-2a2f1e054a63: "rpc error: code = DeadlineExceeded desc = context deadline exceeded"
Azure Kubernetes Service (AKS)

1 answer

  1. Markapuram Sudheer Reddy 990 Reputation points Microsoft Vendor
    2025-02-24T21:30:15.34+00:00

    Hi Jeff Mealo,

    Thank you for the response.

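    Please check the API server logs for errors, in case there are issues with the CSI driver's interaction with the API server. On AKS the control plane is managed by Azure, so the API server may not appear as a pod in kube-system; in that case its logs are available through the cluster's diagnostic settings (the kube-apiserver log category). Where the pod is visible, use:

    kubectl logs -f <kube-apiserver-pod-name> -n kube-system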

    Also inspect the snapshot controller logs for any errors related to snapshot operations:

    kubectl get pods -n kube-system | grep snapshot-controller
    kubectl logs -f <snapshot-controller-pod-name> -n kube-system
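
The controller logs can be long, so it may help to narrow them to lines that resemble the failures described in the question. The following is a minimal sketch, not an official tool: the function name filter_snapshot_errors is hypothetical, and the patterns (deadline/timeout messages, a standalone 502, HTML markup) are assumptions based on the error shown above, so they will likely need adjusting.

```shell
#!/bin/sh
# filter_snapshot_errors (hypothetical helper): reads log lines on stdin and
# keeps only those that look like the failures reported in this thread.
# Assumed patterns:
#   - gRPC timeout messages (DeadlineExceeded / context deadline exceeded)
#   - a standalone 502 status code (not part of a longer number)
#   - HTML error pages returned in place of an API response
filter_snapshot_errors() {
  grep -Ei 'deadlineexceeded|context deadline exceeded|(^|[^0-9])502([^0-9]|$)|<html|<!doctype'
}

# Example usage (not run here; needs cluster access):
#   kubectl logs <snapshot-controller-pod-name> -n kube-system | filter_snapshot_errors
```

Dropping -f from kubectl logs in the usage line makes the command exit once the existing log has been read, which is convenient when saving the filtered output to share.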
    
    

    For more information, please see this related external-snapshotter GitHub issue: https://github.com/kubernetes-csi/external-snapshotter/issues/346

    If you have any questions, please let us know and we will help. If you find any relevant logs, please share them here; they will help us investigate further.

