How to access Key Vault secrets from a Synapse Spark Job Definition PySpark file

manigandan 0 Reputation points
2025-03-04T07:32:09.6066667+00:00

Hello Everyone

I'm trying to read data source credentials as secrets from Key Vault using the Python SDK in a PySpark job. I've gone through many articles and tried different ways to read the secrets from Key Vault, but nothing has worked. I get an error when calling get_secret("secret_name") and reading its value.

Code used to retrieve secrets:

import os
from azure.keyvault.secrets import SecretClient
from azure.identity import DefaultAzureCredential

keyVaultName = os.environ["KEY_VAULT_NAME"]
KVUri = f"https://{keyVaultName}.vault.azure.net"

credential = DefaultAzureCredential()
client = SecretClient(vault_url=KVUri, credential=credential)

secretName = "my_secret_name"

print(f"Retrieving your secret from {keyVaultName}.")

retrieved_secret = client.get_secret(secretName)

print(f"Your secret is '{retrieved_secret.value}'.")

print(" done.")

Received Error:

DefaultAzureCredential failed to retrieve a token from the included credentials.
Attempted credentials:
	EnvironmentCredential: EnvironmentCredential authentication unavailable. Environment variables are not fully configured.
Visit https://aka.ms/azsdk/python/identity/environmentcredential/troubleshoot to troubleshoot this issue.
	ManagedIdentityCredential: ManagedIdentityCredential authentication unavailable, no response from the IMDS endpoint.
	SharedTokenCacheCredential: SharedTokenCacheCredential authentication unavailable. No accounts were found in the cache.
	AzureCliCredential: Azure CLI not found on path
	AzurePowerShellCredential: PowerShell is not installed
	AzureDeveloperCliCredential: Azure Developer CLI could not be found. Please visit https://aka.ms/azure-dev for installation instructions and then,once installed, authenticate to your Azure account using 'azd auth login'.
To mitigate this issue, please refer to the troubleshooting guidelines here at https://aka.ms/azsdk/python/identity/defaultazurecredential/troubleshoot.
An error occurred: DefaultAzureCredential failed to retrieve a token from the included credentials.
Attempted credentials:
	EnvironmentCredential: EnvironmentCredential authentication unavailable. Environment variables are not fully configured.
Visit https://aka.ms/azsdk/python/identity/environmentcredential/troubleshoot to troubleshoot this issue.
	ManagedIdentityCredential: ManagedIdentityCredential authentication unavailable, no response from the IMDS endpoint.
	SharedTokenCacheCredential: SharedTokenCacheCredential authentication unavailable. No accounts were found in the cache.
	AzureCliCredential: Azure CLI not found on path
	AzurePowerShellCredential: PowerShell is not installed
	AzureDeveloperCliCredential: Azure Developer CLI could not be found. Please visit https://aka.ms/azure-dev for installation instructions and then,once installed, authenticate to your Azure account using 'azd auth login'.
To mitigate this issue, please refer to the troubleshooting guidelines here at https://aka.ms/azsdk/python/identity/defaultazurecredential/troubleshoot.
---------------------------------------------------------------------------

Access configuration:

  • The Azure Synapse Managed Identity has been granted the Key Vault Administrator, Key Vault Secrets User, and Key Vault Secrets Officer roles on the Key Vault.
  • A linked service has been created in Synapse for the Azure Key Vault where the secrets are stored.

Python: 3.10+

Spark Version: 3.4+

Appreciate your help on this.

Azure Synapse Analytics

1 answer

  1. Chandra Boorla 9,985 Reputation points Microsoft External Staff
    2025-03-04T17:33:43.78+00:00

    Hi @manigandan

    Thank you for posting your query!

    As I understand it, you are having trouble reading Azure Key Vault secrets from a Synapse Spark job using PySpark. The error indicates an authentication failure: DefaultAzureCredential() tried each credential in its chain and none could obtain a token, most likely because the Managed Identity is not being picked up or lacks permissions on the Key Vault.
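To see in more detail why each credential in the chain is rejected, one option is to turn on logging for the identity library before creating the credential. The following is a minimal sketch that uses only the standard `logging` module; it assumes the Azure SDK emits its messages under the `azure.identity` logger name, and the helper name `enable_identity_logging` is hypothetical.

```python
import logging
import sys


def enable_identity_logging(level=logging.DEBUG, stream=sys.stdout):
    """Attach a stream handler to the 'azure.identity' logger and return it.

    Hypothetical helper: the logger name is an assumption about where the
    Azure identity library writes its diagnostic messages.
    """
    logger = logging.getLogger("azure.identity")
    logger.setLevel(level)
    handler = logging.StreamHandler(stream)
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s")
    )
    logger.addHandler(handler)
    return logger


# Call this once at the top of the PySpark file, before building the credential.
identity_logger = enable_identity_logging()
```

With this in place, the driver log of the Spark job should contain one entry per credential attempt, which may help confirm whether the IMDS endpoint is reachable at all from the pool.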

    Here are the steps to troubleshoot and resolve this issue:

    Verify Managed Identity Configuration - Ensure that the Azure Synapse workspace's Managed Identity has the necessary permissions on the Key Vault. You mentioned that it has been granted the Key Vault Administrator, Key Vault Secrets User, and Key Vault Secrets Officer roles, which should generally suffice. Also confirm that the Managed Identity is enabled for your Synapse workspace.

    Use Managed Identity for Authentication - Since you are using Synapse, leveraging Managed Identity is usually the simplest and most secure way to authenticate. Ensure that your Synapse Spark environment is configured to allow this. You might need to explicitly specify the use of ManagedIdentityCredential if DefaultAzureCredential does not automatically detect it.

    from azure.identity import ManagedIdentityCredential
    credential = ManagedIdentityCredential()
    

    Here’s a modified version of your code that explicitly uses ManagedIdentityCredential:

    import os
    from azure.keyvault.secrets import SecretClient
    from azure.identity import ManagedIdentityCredential
    
    # Get the Key Vault name from environment variables
    keyVaultName = os.environ["KEY_VAULT_NAME"]
    KVUri = f"https://{keyVaultName}.vault.azure.net"
    
    # Use Managed Identity for authentication
    credential = ManagedIdentityCredential()
    client = SecretClient(vault_url=KVUri, credential=credential)
    
    secretName = "my_secret_name"
    
    print(f"Retrieving your secret from {keyVaultName}.")
    
    try:
        retrieved_secret = client.get_secret(secretName)
        print(f"Your secret is '{retrieved_secret.value}'.")
    except Exception as e:
        print(f"An error occurred: {e}")
    
    print(" done.")
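Note that the error output in the question already shows ManagedIdentityCredential being attempted with no response from the IMDS endpoint, so the explicit credential may fail the same way inside a Synapse Spark pool. Since a linked service to the Key Vault already exists, an alternative worth trying is the Synapse utility call `mssparkutils.credentials.getSecret`, which reads the secret through that linked service. The sketch below assumes the job runs in a Synapse Spark session where `notebookutils` is importable; the wrapper name `get_secret_via_linked_service` and its `utils` parameter are hypothetical and exist only so the function can be exercised outside Synapse.

```python
def get_secret_via_linked_service(vault_name, secret_name, linked_service_name, utils=None):
    """Fetch a secret through a Synapse linked service.

    `utils` is a hypothetical injection point for local testing. When it is
    omitted, the Synapse-provided mssparkutils module is imported, which is
    assumed to be available only inside a Synapse Spark session.
    """
    if utils is None:
        from notebookutils import mssparkutils as utils
    # Arguments: Key Vault name, secret name, linked service name.
    return utils.credentials.getSecret(vault_name, secret_name, linked_service_name)
```

In the job file this could be called as `get_secret_via_linked_service(keyVaultName, secretName, "<linked service name>")`, where the linked service name is whatever was configured in the workspace. Avoid printing the returned value in job logs.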
    
    

    Environment Variables Issue - The EnvironmentCredential message refers to variables such as AZURE_CLIENT_ID, AZURE_TENANT_ID, and AZURE_CLIENT_SECRET, which are not needed for Managed Identity. Your code does read KEY_VAULT_NAME with os.environ, however, so ensure that variable is correctly set in your Synapse Spark environment.
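Because `os.environ["KEY_VAULT_NAME"]` raises a bare KeyError when the variable is missing, it can help to resolve the vault name defensively. The following is a minimal sketch, assuming the name may arrive either as the first command-line argument of the Spark job definition or as the KEY_VAULT_NAME environment variable; the function name `resolve_vault_url` and its parameters are hypothetical.

```python
import os


def resolve_vault_url(env=None, cli_args=None, var_name="KEY_VAULT_NAME"):
    """Build the Key Vault URL from a CLI argument or an environment variable.

    The first command-line argument wins if present; otherwise the
    environment variable is used. A descriptive error is raised if neither
    source provides a name.
    """
    env = os.environ if env is None else env
    name = (cli_args[0] if cli_args else None) or env.get(var_name)
    if not name:
        raise RuntimeError(
            f"Key Vault name not provided: pass it as the first argument "
            f"or set the {var_name} environment variable."
        )
    return f"https://{name}.vault.azure.net"
```

In the job file this might be called as `resolve_vault_url(cli_args=sys.argv[1:])` (with `import sys`), so a missing value fails early with a clear message rather than partway through authentication.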

    For more details, please refer to the following similar threads that may provide useful insights.

    I hope this information helps. Please do let us know if you have any further queries.


    If this answers your query, do click Accept Answer and Yes for was this answer helpful. And, if you have any further query do let us know.

