Troubleshoot Azure Key Vault Secrets Provider add-on in AKS

This article discusses how to troubleshoot issues that you might experience when using the Azure Key Vault Secrets Provider add-on in Azure Kubernetes Service (AKS).

Note

This article applies to the AKS managed add-on version of the Azure Key Vault Secrets Provider. If you use the Helm-installed (self-managed) version, go to the Azure Key Vault Provider for Secrets Store CSI Driver GitHub documentation.

Prerequisites

Troubleshooting checklist

Step 1: Confirm that Azure Key Vault Secrets Provider add-on is enabled on your cluster

Run the az aks show command to confirm that the add-on is enabled on your cluster:

az aks show -g <aks-resource-group-name> -n <aks-name> --query 'addonProfiles.azureKeyvaultSecretsProvider'

The command output should be similar to the following text:

{
  "config": null,
  "enabled": true,
  "identity": {
    "clientId": "<client-id>",
    "objectId": "<object-id>",
    "resourceId": "/subscriptions/<subscription-id>/resourcegroups/<resource-group-name>/providers/Microsoft.ManagedIdentity/userAssignedIdentities/<azure-key-vault-secrets-provider-identity-name>"
  }
}

If the enabled flag is shown as false in the preceding output, the Azure Key Vault Secrets Provider add-on isn't enabled on your cluster. In this case, refer to the Azure Key Vault Provider for Secrets Store CSI Driver GitHub documentation for further troubleshooting.

If the enabled flag is shown as true in the preceding output, the Azure Key Vault Secrets Provider add-on is enabled on your cluster. In this case, continue with the next steps in this article.
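For scripted checks, the same query can be narrowed to the enabled flag alone. The following is a minimal sketch rather than an official tool: the function name check_kv_addon is hypothetical, and it assumes that Azure CLI is installed and signed in. It requests JSON output so that the flag prints as true or false.

```shell
# Hypothetical helper: reports whether the add-on is enabled on a cluster.
# Usage: check_kv_addon <aks-resource-group-name> <aks-name>
check_kv_addon() {
  local enabled
  # JSON output prints the boolean as "true" or "false".
  # Any other value is treated as "not enabled".
  enabled="$(az aks show --resource-group "$1" --name "$2" \
    --query 'addonProfiles.azureKeyvaultSecretsProvider.enabled' \
    --output json)" || return 2
  if [ "$enabled" = "true" ]; then
    echo "Azure Key Vault Secrets Provider add-on: enabled"
  else
    echo "Azure Key Vault Secrets Provider add-on: not enabled"
    return 1
  fi
}
```

The function returns 0 when the add-on is enabled, 1 when it isn't, and 2 when the az command itself fails, so it can gate later steps in a script.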

Step 2: Check the Secrets Store Provider and CSI Driver pod logs

Azure Key Vault Secrets Provider add-on logs are generated by both provider and driver pods. To troubleshoot issues that affect the provider or driver, examine the logs from the pod that's running on the same node as your application pod.

  1. Run the kubectl get command to find the Secrets Store Provider and CSI Driver pods that run on the same node that your application pod runs on:

    kubectl get pod -l 'app in (secrets-store-provider-azure, secrets-store-csi-driver)' -n kube-system -o wide
    
  2. Run the kubectl logs command to view logs from the Secrets Store Provider pod:

    kubectl logs -n kube-system <provider-pod-name> --since=1h | grep ^E
    
  3. Run the kubectl logs command to view logs from the Secrets Store CSI Driver pod:

    kubectl logs -n kube-system <csi-driver-pod-name> -c secrets-store --since=1h | grep ^E
    

Once you collect the Secrets Store Provider and CSI Driver pod logs, analyze these logs against the causes mentioned in the following sections to identify the issue and corresponding solution.
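On clusters that have many nodes, it can help to list only the add-on pods that share a node with the application pod. The following sketch assumes a hypothetical function name, pods_on_app_node. It first reads the node name of the application pod and then filters the add-on pods by using a field selector on spec.nodeName.

```shell
# Hypothetical helper: lists the provider and driver pods on the node that
# hosts a given application pod.
# Usage: pods_on_app_node <app-pod-name> <app-namespace>
pods_on_app_node() {
  local node
  # Look up the node that the application pod is scheduled on.
  node="$(kubectl get pod "$1" --namespace "$2" \
    --output jsonpath='{.spec.nodeName}')" || return 1
  if [ -z "$node" ]; then
    echo "pod $1 isn't scheduled on a node" >&2
    return 1
  fi
  # Keep only the add-on pods that run on that node.
  kubectl get pod --namespace kube-system \
    --selector 'app in (secrets-store-provider-azure, secrets-store-csi-driver)' \
    --field-selector "spec.nodeName=$node" --output wide
}
```

The pod names in the output can then be passed to the kubectl logs commands from steps 2 and 3.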

Note

If you open a support request, it's a good idea to include the relevant logs from the Azure Key Vault Provider and the Secrets Store CSI Driver.
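To match collected log or event lines against the causes in the following sections, a small filter can help. The following sketch uses a hypothetical function name, classify_kv_errors, and matches only the error strings that are quoted in this article for Causes 1 through 6. Causes 7 and 8 are omitted because this article doesn't show sample log text for them.

```shell
# Hypothetical helper: reads log or event lines on standard input and prints
# the matching cause from this article for each recognized line.
classify_kv_errors() {
  local line
  while IFS= read -r line; do
    case "$line" in
      *"nmi response failed"*)
        echo "Cause 1: couldn't retrieve the key vault token" ;;
      *"context deadline exceeded"*)
        echo "Cause 2: the provider pod can't access the key vault instance" ;;
      *"Identity not found"*)
        echo "Cause 3: incorrect user-assigned managed identity" ;;
      *"ForbiddenByConnection"*)
        echo "Cause 4: private endpoint is on a different virtual network" ;;
      *"not found in the list of registered CSI drivers"*)
        echo "Cause 5: CSI driver isn't registered on the node" ;;
      *"failed to get secretproviderclass"*)
        echo "Cause 6: SecretProviderClass not found" ;;
    esac
  done
}

# Example usage (placeholder pod name):
#   kubectl logs -n kube-system <provider-pod-name> --since=1h | classify_kv_errors
```

Lines that don't match any of these strings produce no output, so an empty result means only that none of the quoted errors was found.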

Cause 1: Couldn't retrieve the key vault token

You might see the following error entry in the logs or event messages:

Warning FailedMount 74s kubelet MountVolume.SetUp failed for volume "secrets-store-inline" : kubernetes.io/csi: mounter.SetupAt failed: rpc error: code = Unknown desc = failed to mount secrets store objects for pod default/test, err: rpc error: code = Unknown desc = failed to mount objects, error: failed to get keyvault client: failed to get key vault token: nmi response failed with status code: 404, err: <nil>

This error occurs because a Node Managed Identity (NMI) component in aad-pod-identity returned an error message about a token request.

Solution 1: Check the NMI pod logs

For more information about this error and how to resolve it, check the NMI pod logs, and refer to the Microsoft Entra pod identity troubleshooting guide.

Cause 2: The provider pod can't access the key vault instance

You might see the following error entry in the logs or event messages:

E1029 17:37:42.461313 1 server.go:54] failed to process mount request, error: keyvault.BaseClient#GetSecret: Failure sending request: StatusCode=0 -- Original Error: context deadline exceeded

This error occurs because the provider pod can't access the key vault instance. Access might be prevented for any of the following reasons:

  • A firewall rule is blocking egress traffic from the provider.

  • Network policies that are configured in the AKS cluster are blocking egress traffic.

  • The provider pods run on the host network. A failure might occur if a policy is blocking this traffic or if network jitter occurs on the node.

Solution 2: Check network policies, allowlist, and node connection

To fix the issue, take the following actions:

  • Put the provider pods on the allowlist.

  • Check for policies that are configured to block traffic.

  • Make sure that the node has connectivity to Microsoft Entra ID and your key vault.

To test the connectivity to your Azure key vault from the pod that's running on the host network, follow these steps:

  1. Create the pod:

    cat <<EOF | kubectl apply --filename -
    apiVersion: v1
    kind: Pod
    metadata:
      name: curl
    spec:
      hostNetwork: true
      containers:
      - args:
        - tail
        - -f
        - /dev/null
        image: curlimages/curl:7.75.0
        name: curl
      dnsPolicy: ClusterFirst
      restartPolicy: Always
    EOF
    
  2. Run kubectl exec to run a command in the pod that you created:

    kubectl exec --stdin --tty curl -- sh
    
  3. Request an access token for Azure Key Vault from Microsoft Entra ID:

    curl -X POST 'https://login.microsoftonline.com/<aad-tenant-id>/oauth2/v2.0/token' \
         -d 'grant_type=client_credentials&client_id=<azure-client-id>&client_secret=<azure-client-secret>&scope=https://vault.azure.net/.default'
    
  4. Try to get a secret that's already created in your Azure key vault:

    curl -X GET 'https://<key-vault-name>.vault.azure.net/secrets/<secret-name>?api-version=7.2' \
         -H "Authorization: Bearer <access-token-acquired-above>"
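To avoid copying the token by hand between steps 3 and 4, the access_token field can be pulled out of the JSON token response. The following sketch uses only sed, because the curl image isn't assumed to include a JSON parser. The function name extract_access_token is hypothetical, and the commented usage lines reuse the placeholders from the preceding steps.

```shell
# Hypothetical helper: prints the access_token value from a JSON token
# response that's read on standard input. Prints nothing if the field is
# missing (for example, when the response is an error).
extract_access_token() {
  sed -n 's/.*"access_token"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Example usage inside the curl pod (placeholders as in steps 3 and 4):
#   TOKEN="$(curl -s -X POST 'https://login.microsoftonline.com/<aad-tenant-id>/oauth2/v2.0/token' \
#     -d 'grant_type=client_credentials&client_id=<azure-client-id>&client_secret=<azure-client-secret>&scope=https://vault.azure.net/.default' \
#     | extract_access_token)"
#   curl -s -X GET 'https://<key-vault-name>.vault.azure.net/secrets/<secret-name>?api-version=7.2' \
#     -H "Authorization: Bearer $TOKEN"
```

An empty TOKEN value suggests that the token request failed, which points to connectivity or credentials for Microsoft Entra ID rather than to the key vault.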
    

Cause 3: The user-assigned managed identity is incorrect in the SecretProviderClass custom resource

If you encounter HTTP error code "400" together with an "Identity not found" error description, the user-assigned managed identity in your SecretProviderClass custom resource is incorrect. The full response resembles the following text:

MountVolume.SetUp failed for volume "<volume-name>" :  
  rpc error:  
    code = Unknown desc = failed to mount secrets store objects for pod <namespace>/<pod>,  
    err: rpc error: code = Unknown desc = failed to mount objects,  
    error: failed to get objectType:secret, objectName:<key-vault-secret-name>, objectVersion:: azure.BearerAuthorizer#WithAuthorization:  
      Failed to refresh the Token for request to https://<key-vault-name>.vault.azure.net/secrets/<key-vault-secret-name>/?api-version=2016-10-01:  
        StatusCode=400 -- Original Error: adal: Refresh request failed.  
        Status Code = '400'.  
        Response body: {"error":"invalid_request","error_description":"Identity not found"}  
        Endpoint http://169.254.169.254/metadata/identity/oauth2/token?api-version=2018-02-01&client_id=<userAssignedIdentityID>&resource=https%!!(MISSING)A(MISSING)%!!(MISSING)F(MISSING)%!!(MISSING)F(MISSING)vault.azure.net

Solution 3: Update SecretProviderClass by using the correct userAssignedIdentityID value

Find the correct user-assigned managed identity, and then update the SecretProviderClass custom resource to specify the correct value in the userAssignedIdentityID parameter. To find the correct user-assigned managed identity, run the following az aks show command in Azure CLI:

az aks show --resource-group <resource-group-name> \
    --name <cluster-name> \
    --query addonProfiles.azureKeyvaultSecretsProvider.identity.clientId \
    --output tsv

For information about how to set up a SecretProviderClass custom resource in YAML format, see the Use a user-assigned managed identity section of the Provide an identity to access the Azure Key Vault Provider for Secrets Store CSI Driver article.
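The comparison between the two values can also be scripted. The following sketch assumes that the add-on's own identity is the one that you intend to use, and it reads the userAssignedIdentityID parameter from the spec.parameters section of the SecretProviderClass. The function name check_spc_identity is hypothetical.

```shell
# Hypothetical helper: compares the identity in a SecretProviderClass with
# the add-on identity that az aks show reports.
# Usage: check_spc_identity <resource-group-name> <cluster-name> <spc-name> <namespace>
check_spc_identity() {
  local expected actual
  # Client ID of the add-on's user-assigned managed identity.
  expected="$(az aks show --resource-group "$1" --name "$2" \
    --query addonProfiles.azureKeyvaultSecretsProvider.identity.clientId \
    --output tsv)" || return 2
  # Value that's currently set in the SecretProviderClass custom resource.
  actual="$(kubectl get secretproviderclass "$3" --namespace "$4" \
    --output jsonpath='{.spec.parameters.userAssignedIdentityID}')" || return 2
  if [ "$expected" = "$actual" ]; then
    echo "userAssignedIdentityID matches the add-on identity"
  else
    echo "mismatch: SecretProviderClass has '$actual', add-on identity is '$expected'"
    return 1
  fi
}
```

If you deliberately use a different user-assigned managed identity, a mismatch here is expected, and the check doesn't apply.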

Cause 4: The Key Vault private endpoint is on a different virtual network than the AKS nodes

Public network access isn't allowed at the Azure Key Vault level, and the connectivity between AKS and Key Vault is made through a private link. However, the AKS nodes and the private endpoint of the Key Vault are on different virtual networks. This scenario generates a message that resembles the following text:

MountVolume.SetUp failed for volume "<volume>" :  
  rpc error:  
    code = Unknown desc = failed to mount secrets store objects for pod <namespace>/<pod>,  
    err: rpc error: code = Unknown desc = failed to mount objects,  
    error: failed to get objectType:secret, objectName:<key-vault-secret-name>, objectVersion:: keyvault.BaseClient#GetSecret:  
      Failure responding to request:  
        StatusCode=403 -- Original Error: autorest/azure: Service returned an error.  
        Status=403 Code="Forbidden"  
        Message="Public network access is disabled and request is not from a trusted service nor via an approved private link.\r\n  
        Caller: appid=<application-id>;oid=<object-id>;iss=https://sts.windows.net/<id>/;xms_mirid=/subscriptions/<subscription-id>/resourcegroups/<aks-infrastructure-resource-group>/providers/Microsoft.Compute/virtualMachineScaleSets/aks-<nodepool-name>-<nodepool-id>-vmss;xms_az_rid=/subscriptions/<subscription-id>/resourcegroups/<aks-infrastructure-resource-group>/providers/Microsoft.Compute/virtualMachineScaleSets/aks-<nodepool-name>-<nodepool-id>-vmss \r\n  
        Vault: <keyvaultname>;location=<location>" InnerError={"code":"ForbiddenByConnection"}

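Solution 4a: Set up a virtual network link and virtual network peering to connect the virtual networks

Fixing the connectivity issue is generally a two-step process:

  1. Create a virtual network link for the virtual network of the AKS cluster at the private DNS zone level.

  2. Add virtual network peering between the virtual network of the AKS cluster and the virtual network of the Key Vault private endpoint.

These steps are described in more detail in the following sections.

Step 1: Create the virtual network link

Connect to the AKS cluster nodes to determine whether the fully qualified domain name (FQDN) of the Key Vault is resolved through a public IP address or a private IP address. If you receive the "Public network access is disabled and request is not from a trusted service nor via an approved private link" error message, the Key Vault endpoint is probably resolved through a public IP address. To check for this scenario, run the nslookup command: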

nslookup <key-vault-name>.vault.azure.net

If the FQDN is resolved through a public IP address, the command output resembles the following text:

root@aks-<nodepool-name>-<nodepool-id>-vmss<scale-set-instance>:/# nslookup <key-vault-name>.vault.azure.net
Server:         168.63.129.16
Address:        168.63.129.16#53

Non-authoritative answer:
<key-vault-name>.vault.azure.net  canonical name = <key-vault-name>.privatelink.vaultcore.azure.net.
<key-vault-name>.privatelink.vaultcore.azure.net  canonical name = data-prod-weu.vaultcore.azure.net.
data-prod-weu.vaultcore.azure.net  canonical name = data-prod-weu-region.vaultcore.azure.net.
data-prod-weu-region.vaultcore.azure.net  canonical name = azkms-prod-weu-b.westeurope.cloudapp.azure.com.
Name:   azkms-prod-weu-b.westeurope.cloudapp.azure.com
Address: 20.1.2.3

In this case, create a virtual network link for the virtual network of the AKS cluster at the private DNS zone level. (A virtual network link is already created automatically for the virtual network of the Key Vault private endpoint.)

To create the virtual network link, follow these steps:

  1. In the Azure portal, search for and select Private DNS zones.

  2. In the list of private DNS zones, select the name of your private DNS zone. In this example, the private DNS zone is privatelink.vaultcore.azure.net.

  3. In the navigation pane of the private DNS zone, locate the Settings heading, and then select Virtual network links.

  4. In the list of virtual network links, select Add.

  5. In the Add virtual network link page, complete the following fields.

    Field name       Action
    Link name        Enter a name to use for the virtual network link.
    Subscription     Select the name of the subscription that you want to contain the virtual network link.
    Virtual network  Select the name of the virtual network of the AKS cluster.

  6. Select the OK button.
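As an alternative to the portal steps, the link can likely be created by using Azure CLI. The following sketch wraps the az network private-dns link vnet create command in a hypothetical function, link_aks_vnet. It assumes the privatelink.vaultcore.azure.net zone name from this example, and it turns off auto-registration because the zone is used only for private endpoint records.

```shell
# Hypothetical helper: links the AKS virtual network to the private DNS zone.
# Usage: link_aks_vnet <dns-zone-resource-group> <link-name> <aks-vnet-name-or-resource-id>
link_aks_vnet() {
  az network private-dns link vnet create \
    --resource-group "$1" \
    --zone-name privatelink.vaultcore.azure.net \
    --name "$2" \
    --virtual-network "$3" \
    --registration-enabled false
}
```

If the AKS virtual network is in a different resource group than the private DNS zone, pass the full resource ID of the virtual network as the third argument.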

After you finish the link creation procedure, run the nslookup command. The output should now resemble the following text that shows a more direct DNS resolution:

root@aks-<nodepool-name>-<nodepool-id>-vmss<scale-set-instance>:/# nslookup <key-vault-name>.vault.azure.net
Server:         168.63.129.16
Address:        168.63.129.16#53

Non-authoritative answer:
<key-vault-name>.vault.azure.net  canonical name = <key-vault-name>.privatelink.vaultcore.azure.net.
Name:   <key-vault-name>.privatelink.vaultcore.azure.net
Address: 172.20.0.4

After the virtual network link is added, the FQDN should be resolvable through a private IP address.
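When you check many nodes or vaults, a quick test of whether a resolved address is private can replace reading each nslookup result by eye. The following sketch assumes that the private endpoint uses an RFC 1918 address range (10.0.0.0/8, 172.16.0.0/12, or 192.168.0.0/16), which is typical but not guaranteed. The function name is_private_ipv4 is hypothetical.

```shell
# Hypothetical helper: succeeds if the IPv4 address is in an RFC 1918
# private range (10.0.0.0/8, 172.16.0.0/12, or 192.168.0.0/16).
is_private_ipv4() {
  case "$1" in
    10.*|192.168.*) return 0 ;;
    172.1[6-9].*|172.2[0-9].*|172.3[01].*) return 0 ;;
    *) return 1 ;;
  esac
}

# Example usage with the addresses from the sample outputs:
#   is_private_ipv4 172.20.0.4 && echo "private"   # address after the link is added
#   is_private_ipv4 20.1.2.3   || echo "public"    # address before the link is added
```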

Step 2: Add virtual network peering between virtual networks

If you're using a private endpoint, you've probably disabled public access at the Key Vault level. In that case, AKS can reach the Key Vault only through the private endpoint, and no connectivity exists until the two virtual networks are peered. You can test connectivity by using the following Netcat (nc) command:

nc -v -w 2 <key-vault-name>.vault.azure.net 443

If connectivity isn't available between AKS and the Key Vault, you see output that resembles the following text:

nc: connect to <key-vault-name>.vault.azure.net port 443 (tcp) timed out: Operation now in progress

To establish connectivity between AKS and the Key Vault, add virtual network peering between the virtual networks by following these steps:

  1. Go to the Azure portal.

  2. Follow the instructions in the Create virtual network peer section of the Tutorial: Connect virtual networks with virtual network peering using the Azure portal article to peer the virtual networks and verify that they're connected (from one end). Start from either of the following options:

    • Go to your AKS virtual network, and peer it to the virtual network of the Key Vault private endpoint.

    • Go to the virtual network of the Key Vault private endpoint, and peer it to the AKS virtual network.

  3. In the Azure portal, search for and select the name of the other virtual network (the virtual network that you peered to in the previous step).

  4. In the virtual network navigation pane, locate the Settings heading, and then select Peerings.

  5. In the virtual network peering page, verify that the Name column contains the Peering link name of the Remote virtual network that you specified in step 2. Also, make sure that the Peering status column for that peering link has a value of Connected.

After you complete this procedure, run the Netcat command again. The DNS resolution and connectivity between AKS and the Key Vault should now succeed, and the Key Vault secrets should mount and work as expected. A successful connection produces the following output:

Connection to <key-vault-name>.vault.azure.net 443 port [tcp/https] succeeded!
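The peering status that step 5 verifies in the portal can also be read by using Azure CLI. The following sketch uses a hypothetical function name, peering_connected, and reads the peeringState property that the az network vnet peering show command returns.

```shell
# Hypothetical helper: succeeds if a virtual network peering is connected.
# Usage: peering_connected <resource-group> <vnet-name> <peering-link-name>
peering_connected() {
  local state
  state="$(az network vnet peering show --resource-group "$1" \
    --vnet-name "$2" --name "$3" \
    --query peeringState --output tsv)" || return 2
  echo "peering state: $state"
  # The function's exit status reflects whether the state is Connected.
  [ "$state" = "Connected" ]
}
```

Because a peering has one link on each virtual network, running the check against both virtual networks gives a fuller picture than checking one end only.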

Solution 4b: Troubleshoot error code 403

Troubleshoot error code "403" by reviewing the HTTP 403: Insufficient Permissions section of the Azure Key Vault REST API Error Codes reference article.

Cause 5: The secrets-store.csi.k8s.io driver is missing from the list of registered CSI drivers

If you receive the following error message about a missing secrets-store.csi.k8s.io driver in the pod events, the Secrets Store CSI Driver pods aren't running on the node on which the application pod is running:

Warning FailedMount 42s (x12 over 8m56s) kubelet, akswin000000 MountVolume.SetUp failed for volume "secrets-store01-inline" : kubernetes.io/csi: mounter.SetUpAt failed to get CSI client: driver name secrets-store.csi.k8s.io not found in the list of registered CSI drivers

Solution 5: Troubleshoot the Secrets Store CSI Driver pod running on the same node

Retrieve the status of the Secrets Store CSI Driver pod running on the same node by running the following command:

kubectl get pod -l app=secrets-store-csi-driver -n kube-system -o wide

If the pod status isn't Running, or if any container in the pod isn't in the Ready state, check the logs for this pod by following the steps in Check the Secrets Store Provider and CSI Driver pod logs.
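To spot unhealthy pods quickly in a long listing, the command output can be filtered. The following sketch uses a hypothetical function name, unhealthy_pods, and assumes the default kubectl column order (NAME, READY, STATUS) with a header row.

```shell
# Hypothetical helper: reads "kubectl get pod" output on standard input and
# prints the name of each pod that isn't Running or isn't fully ready.
unhealthy_pods() {
  awk 'NR > 1 {
    split($2, ready, "/")   # READY column, for example "3/3"
    if ($3 != "Running" || ready[1] != ready[2]) print $1
  }'
}

# Example usage:
#   kubectl get pod -l app=secrets-store-csi-driver -n kube-system -o wide | unhealthy_pods
```

An empty result means that every listed pod is Running and has all its containers ready. It doesn't confirm that a driver pod exists on the application's node, so also check the NODE column.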

Cause 6: SecretProviderClass not found

You might see the following event when describing your application pod:

Events:
  Type     Reason       Age               From               Message
  ----     ------       ----              ----               -------
  Warning  FailedMount  2s (x5 over 10s)  kubelet            MountVolume.SetUp failed for volume "xxxxxxx" : rpc error: code = Unknown desc = failed to get secretproviderclass xxxxxxx/xxxxxxx, error: SecretProviderClass.secrets-store.csi.x-k8s.io "xxxxxxxxxxxxx" not found

This event indicates that the SecretProviderClass referenced in your pod's volume specification doesn't exist in the same namespace as your application pod.

Solution 6a: Create the missing SecretProviderClass resource

Make sure that the SecretProviderClass resource referenced in your pod's volume specification exists in the same namespace where your application pod is running.

Solution 6b: Modify your application pod's volume specification to reference the correct SecretProviderClass resource name

Edit your application pod's volume specification to reference the correct SecretProviderClass resource name:

...
spec:
  containers:
  ...
  volumes:
    - name: my-volume
      csi:
        driver: secrets-store.csi.k8s.io
        readOnly: true
        volumeAttributes:
          secretProviderClass: "xxxxxxxxx"
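To check both conditions at the same time, a script can read the SecretProviderClass names that the pod references and then look for each one in the pod's namespace. The following sketch uses a hypothetical function name, check_pod_spc, and assumes that the pod object already exists (for example, in a pending state because of the failed mount).

```shell
# Hypothetical helper: checks that each SecretProviderClass that a pod's
# volumes reference exists in the pod's namespace.
# Usage: check_pod_spc <app-pod-name> <app-namespace>
check_pod_spc() {
  local names name rc=0
  # Collect the secretProviderClass value from every CSI volume in the pod.
  names="$(kubectl get pod "$1" --namespace "$2" \
    --output jsonpath='{.spec.volumes[*].csi.volumeAttributes.secretProviderClass}')" || return 2
  for name in $names; do
    if kubectl get secretproviderclass "$name" --namespace "$2" >/dev/null 2>&1; then
      echo "found: $name"
    else
      echo "missing: $name"
      rc=1
    fi
  done
  return "$rc"
}
```

A "missing" line points to either Solution 6a (create the resource in that namespace) or Solution 6b (correct the name in the volume specification).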

Cause 7: The request is unauthenticated

The request is unauthenticated for Key Vault, as indicated by a "401" error code.

Solution 7: Troubleshoot error code 401

Troubleshoot error code "401" by reviewing the "HTTP 401: Unauthenticated Request" section of the Azure Key Vault REST API Error Codes reference article.

Cause 8: The number of requests exceeds the stated maximum

The number of requests exceeds the stated maximum for the timeframe, as indicated by a "429" error code.

Solution 8: Troubleshoot error code 429

Troubleshoot error code "429" by reviewing the "HTTP 429: Too Many Requests" section of the Azure Key Vault REST API Error Codes reference article.

Third-party information disclaimer

The third-party products that this article discusses are manufactured by companies that are independent of Microsoft. Microsoft makes no warranty, implied or otherwise, about the performance or reliability of these products.

Contact us for help

If you have questions or need help, create a support request, or ask Azure community support. You can also submit product feedback to the Azure feedback community.