Getting TokenNotFoundInConfigurationException error when uploading a Python wheel as workspace package in Synapse

manigandan 0 Reputation points
2024-12-25T05:25:33.01+00:00

Hello Everyone

I'm using Azure Synapse to run my Spark pipelines. I've built my pipeline to use custom library code from the utils package of the same project. To access the utils.pipeline module, I have packaged my entire project as a Python wheel and uploaded it as a workspace package to the serverless Spark pool. When I try to import it with the statement from utils.pipeline import *, I get the error No module named 'utils'.

Any help would be really appreciated. Below are the details of the configuration I used.

Project hierarchy:

-code_repo
--src
---xyz
----main.py (Entry file)
---utils
----pipeline.py (Reusable functions are available)
-setup.py
-requirements.txt

setup.py (used to package the project files):

from setuptools import setup, find_packages

setup(
    name="Data_Project",
    version="0.1.0",
    author="Maniganda",
    packages=find_packages(),
    include_package_data=True,
    description='Data Engineering Project'
)
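A likely cause worth checking first: setuptools' find_packages() only returns directories that contain an __init__.py file, and it does not descend into a directory that is not itself a package. Called with no arguments from the directory holding setup.py, it therefore skips src/utils unless every directory on the way down has an __init__.py (and even then the package would be stored under a prefixed name such as src.utils rather than utils). The sketch below is a stdlib-only helper, with the hypothetical name dirs_missing_init, that lists directories holding .py files but no __init__.py; it assumes you run it against the src directory of the layout above.

```python
from pathlib import Path


def dirs_missing_init(src_root: str) -> list[str]:
    """List directories under src_root that contain .py files but no __init__.py.

    setuptools.find_packages() does not treat such directories as packages,
    so with a setup.py like the one above their modules are not packaged.
    """
    root = Path(src_root)
    missing = []
    # rglob("*") yields every file and directory below root; keep directories only.
    for directory in sorted(p for p in root.rglob("*") if p.is_dir()):
        has_python_files = any(
            f.suffix == ".py" for f in directory.iterdir() if f.is_file()
        )
        if has_python_files and not (directory / "__init__.py").is_file():
            # Report paths relative to src_root, with forward slashes.
            missing.append(directory.relative_to(root).as_posix())
    return missing
```

For the hierarchy in the question, calling dirs_missing_init("src") from the project root would report utils and xyz if neither has an __init__.py yet; an empty list means find_packages(where="src") can see both.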

Spark pool:

Spark version: 3.4
Python: 3.10

Error on executing the Spark pipeline:


2024-12-24 20:55:40,730 INFO SignalUtils [main]: Registering signal handler for TERM
2024-12-24 20:55:41,249 INFO SignalUtils [main]: Registering signal handler for HUP
2024-12-24 20:55:41,250 INFO SignalUtils [main]: Registering signal handler for INT
2024-12-24 20:55:42,200 WARN NativeCodeLoader [AsyncAppender-Dispatcher-Thread-3]: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2024-12-24 20:55:42,438 INFO ApplicationMaster [main]: ApplicationAttemptId: appattempt_1735073652210_0001_000001
2024-12-24 20:55:43,858 WARN MetricsConfig [main]: Cannot locate configuration: tried hadoop-metrics2-azure-file-system.properties,hadoop-metrics2.properties
2024-12-24 20:55:43,873 INFO MetricsSystemImpl [main]: Scheduled Metric snapshot period at 10 second(s).
2024-12-24 20:55:43,873 INFO MetricsSystemImpl [main]: azure-file-system metrics system started
2024-12-24 20:55:44,218 INFO ApplicationMaster [main]: Starting the user application in a separate Thread
2024-12-24 20:55:44,233 INFO ApplicationMaster [main]: Waiting for spark context initialization...
2024-12-24 20:55:44,381 INFO PythonRunner$ [Driver]: Initialized PythonRunnerOutputStream plugin org.apache.spark.microsoft.tools.api.plugin.MSToolsPythonRunnerOutputStreamPlugin.
2024-12-24 20:55:52,826 ERROR ApplicationMaster [Driver]: User application exited with status 1, error msg: Traceback (most recent call last):
  File "/mnt/var/hadoop/tmp/nm-local-dir/usercache/trusted-service-user/appcache/application_1735073652210_0001/container_1735073652210_0001_01_000001/annotation.py", line 18, in <module>
    from utils.pipeline import *
ModuleNotFoundError: No module named 'utils'
2024-12-24 20:55:52,831 INFO ApplicationMaster [Driver]: Final app status: FAILED, exitCode: 13, (reason: User application exited with status 1, error msg: Traceback (most recent call last):
  File "/mnt/var/hadoop/tmp/nm-local-dir/usercache/trusted-service-user/appcache/application_1735073652210_0001/container_1735073652210_0001_01_000001/annotation.py", line 18, in <module>
    from utils.pipeline import *
ModuleNotFoundError: No module named 'utils'
)
2024-12-24 20:55:52,839 ERROR ApplicationMaster [main]: Uncaught exception: 
org.apache.spark.SparkException: Exception thrown in awaitResult: 
	at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:322)
	at org.apache.spark.deploy.yarn.ApplicationMaster.runDriver(ApplicationMaster.scala:525)
	at org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:284)
	at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$3.run(ApplicationMaster.scala:967)
	at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$3.run(ApplicationMaster.scala:966)
	at java.base/java.security.AccessController.doPrivileged(Native Method)
	at java.base/javax.security.auth.Subject.doAs(Subject.java:423)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1907)
	at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:966)
	at org.apache.spark.deploy.yarn.ApplicationMaster.main(ApplicationMaster.scala)
Caused by: org.apache.spark.PySparkUserAppException: User application exited with 1 : Traceback (most recent call last):
  File "/mnt/var/hadoop/tmp/nm-local-dir/usercache/trusted-service-user/appcache/application_1735073652210_0001/container_1735073652210_0001_01_000001/annotation.py", line 18, in <module>
    from utils.pipeline import *
ModuleNotFoundError: No module named 'utils'
	at org.apache.spark.deploy.PythonRunner$.runPythonProcess(PythonRunner.scala:124)
	at org.apache.spark.deploy.PythonRunner$.main(PythonRunner.scala:103)
	at org.apache.spark.deploy.PythonRunner.main(PythonRunner.scala)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:566)
	at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:757)
2024-12-24 20:55:52,885 INFO ShutdownHookManager [shutdown-hook-0]: Shutdown hook called
2024-12-24 20:55:53,051 INFO MetricsSystemImpl [shutdown-hook-0]: Stopping azure-file-system metrics system...
2024-12-24 20:55:53,052 INFO MetricsSystemImpl [shutdown-hook-0]: azure-file-system metrics system stopped.
2024-12-24 20:55:53,052 INFO MetricsSystemImpl [shutdown-hook-0]: azure-file-system metrics system shutdown complete.
End of LogType:stderr
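To narrow this ModuleNotFoundError down on the pool itself, you could ask the driver's interpreter whether, and from where, it finds utils, using only importlib.util.find_spec from the standard library. With a name as generic as utils, this also shows whether some other package called utils is shadowing yours. The helper names locate_module and show_import_diagnostics are hypothetical; this is a diagnostic sketch, not a fix.

```python
import importlib.util
import sys
from typing import Optional


def locate_module(name: str) -> Optional[str]:
    """Return where the top-level module `name` would be imported from, or None.

    Pass only top-level names such as "utils": for a dotted name like
    "utils.pipeline", find_spec imports the parent first and raises
    ModuleNotFoundError if the parent is missing.
    """
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None  # not importable from any entry on sys.path
    if spec.origin is not None:
        return spec.origin  # usually a file path
    # Namespace packages (directories without __init__.py) have no origin,
    # so report the directories they were assembled from instead.
    return ", ".join(spec.submodule_search_locations or [])


def show_import_diagnostics(name: str) -> None:
    """Print where `name` resolves and which sys.path entries were searched."""
    print(f"{name!r} resolves to: {locate_module(name)}")
    print("sys.path entries searched:")
    for entry in sys.path:
        print("  ", entry)
```

Running show_import_diagnostics("utils") at the top of the entry script (before the failing import) would print None when the wheel's packages are not on sys.path at all, which points back at packaging or at the package not being applied to the pool.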

***********************************************************************

Additionally, I found another error, TokenNotFoundInConfigurationException, when uploading the workspace package and applying it to the serverless Spark pool. I've created a linked service with SAS authentication to the Azure ADLS account that the Synapse pools use as storage.

Error:


2024-12-24 20:48:28,430 ERROR TokenLibrary$ [Thread-38]: No SasToken found in Configuration for conf: fs.azure.sas.python.gqxzenxanqd46y8etk1vhpob.blob.core.windows.net
java.lang.RuntimeException: No SasToken found in Configuration for conf: fs.azure.sas.python.gqxzenxanqd46y8etk1vhpob.blob.core.windows.net
	at com.microsoft.azure.synapse.tokenlibrary.TokenLibrary$.getSystemSasToken(TokenLibrary.scala:120)
	at com.microsoft.azure.synapse.tokenlibrary.TokenLibrary$.getSystemSasToken(TokenLibrary.scala:87)
	at com.microsoft.azure.synapse.tokenlibrary.TokenLibrary.getSystemSasToken(TokenLibrary.scala)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:566)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:374)
	at py4j.Gateway.invoke(Gateway.java:282)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
	at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
	at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: com.microsoft.azure.synapse.tokenlibrary.util.TokenNotFoundInConfigurationException: No SasToken found in conf for Key: fs.azure.sas.python.gqxzenxanqd46y8etk1vhpob.blob.core.windows.net
	at com.microsoft.azure.synapse.tokenlibrary.TokenLibraryInternal.getSasTokenOnlyFromConfiguration(TokenLibraryInternal.scala:593)
	at com.microsoft.azure.synapse.tokenlibrary.TokenLibraryInternal.getSasTokenFromCacheOrConfiguration(TokenLibraryInternal.scala:561)
	at com.microsoft.azure.synapse.tokenlibrary.TokenLibrary$.getSystemSasToken(TokenLibrary.scala:105)
	... 14 more
2024-12-24 21:00:06,862 INFO TokenLibrary$ [Thread-38]: Getting SasToken for confKey: fs.azure.sas.library.gqxzenxanqd46y8etk1vhpob.blob.core.windows.net
	... 14 more

2 answers

  1. Vinodh247 26,936 Reputation points MVP
    2024-12-25T14:22:36.56+00:00

    The TokenNotFoundInConfigurationException error in Azure Synapse typically arises when the system cannot locate the necessary authentication token in the configuration settings. To resolve this issue, consider the following steps:

    1. Ensure that all required environment variables for authentication are correctly set and accessible. Missing or misconfigured variables can lead to this exception.
    2. If you're using a managed identity, confirm that it is properly assigned to the resource and has the necessary permissions. An unassigned or incorrectly configured managed identity can cause authentication failures.
    3. If relying on Azure CLI for authentication, make sure you are logged in (az login) and that the CLI is correctly configured. An unauthenticated or misconfigured CLI can result in token retrieval errors.
    4. Examine your application's code to ensure that the authentication methods are correctly implemented and that the configuration settings align with your Azure environment.

    By systematically checking these areas, you can identify and rectify the configuration issues leading to the TokenNotFoundInConfigurationException error.


  2. Sina Salam 15,011 Reputation points
    2024-12-25T16:25:55.37+00:00

    Hello manigandan,

    Welcome to Microsoft Q&A, and thank you for posting your question here.

    I understand that you are getting a TokenNotFoundInConfigurationException error when uploading a Python wheel as a workspace package in Synapse.

    There are two distinct errors here, ModuleNotFoundError and TokenNotFoundInConfigurationException, and fixing authentication alone might not solve your issue. The ModuleNotFoundError means Python cannot locate the utils module inside the uploaded wheel, while the TokenNotFoundInConfigurationException relates to a missing SAS token configuration for Azure Synapse storage.

    STEP 1:

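    1. For the ModuleNotFoundError:
      1. Ensure the setup.py file packages the utils module correctly. Because the code lives under src/, point find_packages() and package_dir at src, and make sure src/utils (and src/xyz, if you need it) contains an __init__.py file, since find_packages() only discovers directories that have one. You can modify setup.py like this: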
                   from setuptools import setup, find_packages

                   setup(
                       name="Data_Project",  # distribution name only, not an importable package
                       version="0.1.0",
                       author="Maniganda",
                       # Discover packages under src/ (each needs an __init__.py)
                       packages=find_packages(where="src"),
                       package_dir={"": "src"},  # map the package root to the src directory
                       include_package_data=True,
                       description="Data Engineering Project",
                   )
        
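        Build the wheel by running this command from the directory that contains setup.py: python setup.py bdist_wheel (invoking setup.py directly is deprecated; python -m build --wheel, from the build package, is the current equivalent).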
      2. Open Azure Synapse Studio and navigate to Manage > Workspace packages.
      3. Upload the .whl file under the workspace package section.
    2. Check the Spark pool environment to make sure the workspace package is attached to the serverless Spark pool: go to Manage > Apache Spark pools > select the pool > Packages > add the uploaded package.
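    3. Verify the import statement. The name passed to setup() ("Data_Project") is only the distribution name, not an importable package; with package_dir={"": "src"}, utils is installed as a top-level package, so the import in your code stays:
              from utils.pipeline import *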
    4. Restart the Spark session after applying the package, then resubmit the Spark pipeline and check whether the utils module is recognized.
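    Before uploading, it may be worth confirming that the built wheel really contains utils/pipeline.py at its top level. A wheel is an ordinary zip archive, so the standard-library zipfile module is enough. The function names below are hypothetical, and the exact file name under dist/ is an assumption that depends on your setuptools version.

```python
import zipfile
from pathlib import PurePosixPath


def wheel_top_level_packages(wheel_path: str) -> set[str]:
    """Return the top-level package directories in a wheel that contain .py files."""
    packages = set()
    with zipfile.ZipFile(wheel_path) as whl:
        for name in whl.namelist():
            parts = PurePosixPath(name).parts
            # Skip top-level single-file modules and the wheel's metadata
            # directories (*.dist-info/, *.data/).
            if len(parts) < 2 or parts[0].endswith((".dist-info", ".data")):
                continue
            if name.endswith(".py"):
                packages.add(parts[0])
    return packages


def wheel_contains(wheel_path: str, member: str) -> bool:
    """True if the archive stores `member`, a POSIX-style path like "utils/pipeline.py"."""
    with zipfile.ZipFile(wheel_path) as whl:
        return member in whl.namelist()
```

    For example, wheel_top_level_packages("dist/Data_Project-0.1.0-py3-none-any.whl") should include "utils" for from utils.pipeline import * to work; an empty set, or "src" instead of "utils", means the pool will not find the module either.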

    STEP 2: For the TokenNotFoundInConfigurationException:

    1. Verify the SAS token configuration: ensure that a valid SAS token is added to the linked service configuration for Azure Data Lake Storage (ADLS). The SAS token key in the configuration follows the pattern fs.azure.sas.<container-name>.<account-name>.blob.core.windows.net; in the log above, the missing key refers to the container python in the account gqxzenxanqd46y8etk1vhpob.
    2. If using managed identity, verify that:
      1. The managed identity is assigned the Storage Blob Data Contributor role.
      2. The linked service in Synapse uses Managed Identity Authentication.
    3. Validate access with the Azure CLI (bash):
         az storage blob list --container-name <container-name> --account-name <account-name> --sas-token <sas-token>
      
    4. Check the token library against your configuration: if SAS tokens are used, confirm from a Synapse notebook that the linked service actually returns credentials:
              from notebookutils import mssparkutils
              sas_token = mssparkutils.credentials.getConnectionStringOrCreds("<linked-service-name>")
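    The key named in the error follows the fs.azure.sas.<container-name>.<account-name>.blob.core.windows.net pattern from step 1, so splitting it shows exactly which container and account the token library expected a SAS token for. A small sketch using only the standard-library re module; parse_sas_conf_key and build_sas_conf_key are hypothetical helper names.

```python
import re

# Pattern from step 1: fs.azure.sas.<container-name>.<account-name>.blob.core.windows.net
_SAS_CONF_KEY = re.compile(
    r"^fs\.azure\.sas\.(?P<container>[^.]+)\.(?P<account>[^.]+)\.blob\.core\.windows\.net$"
)


def parse_sas_conf_key(key: str) -> tuple[str, str]:
    """Split a Blob SAS configuration key into (container, account)."""
    match = _SAS_CONF_KEY.match(key)
    if match is None:
        raise ValueError(f"not a blob SAS configuration key: {key!r}")
    return match.group("container"), match.group("account")


def build_sas_conf_key(container: str, account: str) -> str:
    """Inverse of parse_sas_conf_key: build the key for a container/account pair."""
    return f"fs.azure.sas.{container}.{account}.blob.core.windows.net"


# The two keys that appear in the question's log
for key in (
    "fs.azure.sas.python.gqxzenxanqd46y8etk1vhpob.blob.core.windows.net",
    "fs.azure.sas.library.gqxzenxanqd46y8etk1vhpob.blob.core.windows.net",
):
    container, account = parse_sas_conf_key(key)
    print(f"container={container} account={account}")
# prints:
# container=python account=gqxzenxanqd46y8etk1vhpob
# container=library account=gqxzenxanqd46y8etk1vhpob
```

    If you decide to set such a key yourself in a notebook, the Synapse linked-service documentation shows a pattern along the lines of spark.conf.set(key, mssparkutils.credentials.getConnectionStringOrCreds("<linked-service-name>")); verify it against the current docs before relying on it.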
      

    I hope this is helpful! Do not hesitate to let me know if you have any other questions.


    Please don't forget to close the thread by upvoting and accepting this as an answer if it is helpful.

