Error occurred when using ODBC driver with Workload Identity to connect to Azure Databricks Workspace from an AKS Pod deployment

Dragos Ionita 0 Reputation points
2024-11-04T15:55:37.3633333+00:00

Hello everybody.

As per the title, I am trying to achieve the following: connect to an Azure Databricks Workspace and execute a SQL query from an Azure Function running on an AKS Pod. The Pod is deployed through CI/CD pipelines with Terraform and Helm charts.

Steps I have taken so far and the tutorials I used:

  1. Constructed a DSN-less connection string as described in these docs: https://learn.microsoft.com/en-us/azure/databricks/integrations/odbc/authentication#authentication-azure-mi, where Auth_Client_ID is the Client ID of the Managed Identity and Azure_workspace_resource_id is the value provided from the Databricks Admin Workspace.
  2. Completed Steps 1 through 4 from these docs: https://learn.microsoft.com/en-us/azure/databricks/dev-tools/azure-mi-auth to create a User Managed Identity, assign it to the Databricks Account and Workspace, and grant the Managed Identity the roles needed to access the Databricks Workspace.
  3. Following the first part of this YouTube video from the Microsoft AKS Team: https://www.youtube.com/watch?v=i2GobU0Wg48, I enabled Workload Identity, created a Service Account for the AKS Pod, and created a Federated Credential that pairs the Managed Identity and the Service Account with the Kubernetes Cluster OIDC Issuer URL (needed to enable and use Workload Identity on the AKS Pod).
  4. Deployed to the TEST environment with the configuration above. The resources were created successfully and the Azure Function deployed correctly. I then tested the ODBC connection to Databricks with the connection string from step 1, using the OdbcConnection and OdbcCommand classes from the System.Data.Odbc namespace:
     ```
     using System.Data.Odbc;

     // DSN-less connection string from step 1, injected at runtime from Key Vault secrets.
     var connString = "<dsn-less-conn-string>";

     using var connection = new OdbcConnection(connString);
     connection.Open();

     using var command = new OdbcCommand("select * from my_namespace.my_table", connection);
     using var reader = await command.ExecuteReaderAsync();
     ```
  5. Received **ERROR**: _**[Simba][DriverOAuthSupport] (8720) Can't get AAD token for managed identity: invalid_request: Identity not found.**_

I tried several ways of constructing the DSN-less connection string from step 1 above: with the Client ID of the Managed Identity, with its principal ID, and with its resource ID. All of them gave the same error.

I also consulted ChatGPT several times. Each time it said that my configuration is correct and should work, but the error stayed the same every time I tested.

I suspect that the ODBC Driver for Databricks may NOT be compatible with Workload Identity for AKS Pods. I also saw a discussion thread about this on the Microsoft Q&A forum, where someone said that the ODBC driver perhaps can't connect to Databricks from Docker containers using Managed Identities. I am not sure of this, though, because the resources and discussions about it on the Internet are very scarce or outdated.

Tags: Azure Functions, ASP.NET Core, Azure Databricks, Azure Kubernetes Service (AKS), Microsoft Entra ID

1 answer

  1. Sina Salam 12,011 Reputation points
    2024-11-04T17:57:40.5933333+00:00

    Hello Dragos Ionita,

    Welcome to the Microsoft Q&A and thank you for posting your questions here.

    I understand that you are getting an error when using the ODBC driver with Workload Identity to connect to an Azure Databricks Workspace from an AKS Pod deployment.

    The error (`Can't get AAD token for managed identity: invalid_request: Identity not found`) typically arises when the Azure Managed Identity cannot be authenticated in the context of your Azure Kubernetes Service (AKS) setup, particularly when using Workload Identity.

    1. Start by isolating the issue: confirm that the managed identity is accessible and that tokens can be issued for it from within the AKS pod, for example by running the az account get-access-token command inside the pod to request an access token for the managed identity. If this command fails, the issue is likely with the identity's assignment or permissions. Then check the service account and pod identity logs for any federated identity issues, especially around the Identity not found error - https://learn.microsoft.com/en-us/azure/aks/troubleshoot-managed-identity
    2. Verify that the federated credential on the managed identity has the correct issuer (the AKS cluster's OIDC issuer URL), audience (typically api://AzureADTokenExchange), and subject (which should match the Kubernetes service account you created for the Azure Function, in the form system:serviceaccount:<namespace>:<service-account-name>). Also check that the pod template carries the label azure.workload.identity/use: "true", and that the Kubernetes service account is annotated with the managed identity client ID, such as:
              metadata:
                annotations:
                  azure.workload.identity/client-id: "<your-managed-identity-client-id>"
      
    3. Your DSN-less connection string is critical. Double-check the format and the fields you are passing against the Databricks documentation. For managed identity authentication, the documented format looks like this:
         var connString = "Driver={Simba Spark ODBC Driver};Host=<databricks-server-hostname>;Port=443;HTTPPath=<http-path>;SSL=1;ThriftTransport=2;AuthMech=11;Auth_Flow=3;Auth_Client_ID=<client-id>;Azure_workspace_resource_id=<workspace-resource-id>";
      
    4. Also, the managed identity must have adequate permissions in Databricks and Azure, such as the Contributor or Reader role on the Databricks workspace and the Reader role (or any other necessary role) on the resource group that contains your Databricks instance - https://learn.microsoft.com/en-us/azure/databricks/administration-guide/users-groups/users
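
    To make checks 1 and 2 more concrete, here is a minimal C# sketch that could be run inside the pod. It assumes the Azure Workload Identity webhook has injected its usual environment variables (AZURE_CLIENT_ID, AZURE_TENANT_ID, AZURE_FEDERATED_TOKEN_FILE, AZURE_AUTHORITY_HOST) and that the projected service account token is a standard JWT. The class and method names are hypothetical and not part of any SDK. It prints the iss, sub, and aud claims so that they can be compared by eye with the issuer, subject, and audience configured on the federated credential.

    ```csharp
    #nullable enable
    using System;
    using System.Collections.Generic;
    using System.IO;
    using System.Linq;
    using System.Text;
    using System.Text.Json;

    // Hypothetical helper for illustration; not part of any Azure SDK.
    public static class WorkloadIdentityDiagnostics
    {
        // Environment variables injected by the Azure Workload Identity webhook.
        private static readonly string[] ExpectedVariables =
        {
            "AZURE_CLIENT_ID", "AZURE_TENANT_ID",
            "AZURE_FEDERATED_TOKEN_FILE", "AZURE_AUTHORITY_HOST"
        };

        // Returns the names of the expected variables that are missing or empty.
        public static List<string> MissingVariables(Func<string, string?> getEnv) =>
            ExpectedVariables.Where(name => string.IsNullOrEmpty(getEnv(name))).ToList();

        // Subject that the federated credential has to match for a service account.
        public static string ExpectedSubject(string ns, string serviceAccount) =>
            $"system:serviceaccount:{ns}:{serviceAccount}";

        // Decodes the payload (second segment) of a JWT. The signature is NOT validated.
        public static JsonElement DecodePayload(string jwt)
        {
            var parts = jwt.Split('.');
            if (parts.Length < 2) throw new FormatException("Not a JWT.");
            // Convert base64url to standard base64 and restore the padding.
            var b64 = parts[1].Replace('-', '+').Replace('_', '/');
            b64 = b64.PadRight(b64.Length + (4 - b64.Length % 4) % 4, '=');
            var json = Encoding.UTF8.GetString(Convert.FromBase64String(b64));
            using var doc = JsonDocument.Parse(json);
            return doc.RootElement.Clone();
        }

        // Prints missing variables and the iss/sub/aud claims of the projected token.
        public static void PrintReport()
        {
            foreach (var name in MissingVariables(Environment.GetEnvironmentVariable))
                Console.WriteLine($"Missing: {name}");

            var tokenFile = Environment.GetEnvironmentVariable("AZURE_FEDERATED_TOKEN_FILE");
            if (string.IsNullOrEmpty(tokenFile) || !File.Exists(tokenFile))
            {
                Console.WriteLine("No projected token file found.");
                return;
            }

            var claims = DecodePayload(File.ReadAllText(tokenFile).Trim());
            foreach (var claim in new[] { "iss", "sub", "aud" })
                if (claims.TryGetProperty(claim, out var value))
                    Console.WriteLine($"{claim}: {value}");
        }
    }
    ```

    Calling WorkloadIdentityDiagnostics.PrintReport() once at startup should be enough. If variables are reported as missing, the pod is probably not being mutated by the webhook, which usually points at the pod label or the service account rather than at the ODBC driver.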

    NOTE:

    • Some users have reported challenges with the Databricks ODBC driver and managed identities in containerized environments. There is no official documentation confirming compatibility issues, so consider testing a similar setup without Workload Identity (such as directly in Azure Functions, or using a token-based connection) to see whether managed identity access succeeds outside AKS.
    • If the issue persists, consider switching temporarily to an Azure service principal for authentication. This can help rule out potential limitations with the managed identity + workload identity configuration in AKS.
    • If these troubleshooting steps still don't resolve the issue, this may indeed indicate a limitation of using managed identity + workload identity in AKS specifically with the Databricks ODBC driver. In that case, contact Azure Support through the Azure portal.
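
    As a sketch of the token-based connection mentioned in the first note: the application acquires the Microsoft Entra ID token itself and passes it to the driver, so the driver does not have to resolve the managed identity on its own. This is an assumption-laden example, not a confirmed fix. It assumes the Azure.Identity and System.Data.Odbc NuGet packages, an Azure.Identity version whose DefaultAzureCredential supports workload identity, and the driver settings AuthMech=11, Auth_Flow=0, and Auth_AccessToken for Entra ID tokens (please verify these keys against the driver documentation for your version). The class and method names are hypothetical. The GUID in the scope is the well-known application ID of the Azure Databricks resource.

    ```csharp
    using System;
    using System.Data.Odbc;      // NuGet: System.Data.Odbc
    using System.Threading;
    using System.Threading.Tasks;
    using Azure.Core;            // Comes with the Azure.Identity NuGet package
    using Azure.Identity;

    // Hypothetical helper for illustration; not part of any SDK or driver.
    public static class DatabricksTokenConnection
    {
        // Scope for the Azure Databricks resource in Microsoft Entra ID.
        public const string DatabricksScope = "2ff814a6-3304-4ab8-85cb-cd0e6f879c1d/.default";

        // Builds a DSN-less connection string that carries a ready-made access token.
        // The key names are assumptions taken from the driver's Entra ID token mode.
        public static string BuildConnectionString(string host, string httpPath, string accessToken) =>
            "Driver={Simba Spark ODBC Driver};" +
            $"Host={host};Port=443;HTTPPath={httpPath};SSL=1;ThriftTransport=2;" +
            $"AuthMech=11;Auth_Flow=0;Auth_AccessToken={accessToken}";

        // Acquires a token with the pod's workload identity and opens the connection.
        public static async Task<OdbcConnection> OpenAsync(string host, string httpPath)
        {
            var credential = new DefaultAzureCredential();
            AccessToken token = await credential.GetTokenAsync(
                new TokenRequestContext(new[] { DatabricksScope }), CancellationToken.None);

            var connection = new OdbcConnection(BuildConnectionString(host, httpPath, token.Token));
            await connection.OpenAsync();
            return connection;
        }
    }
    ```

    If this connects while the Auth_Flow=3 connection string still fails, that would suggest the problem lies in how the driver obtains the managed identity token rather than in the identity or its permissions. Note that the token expires, so a long-running process would need to open a new connection with a fresh token.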

    I hope this is helpful! Do not hesitate to let me know if you have any other questions.


    Please don't forget to close the thread by upvoting this answer and accepting it if it is helpful.

