Retrieving files created by Databricks Jobs

Gabriel-2005 425 Reputation points
2025-01-21T13:46:50.5133333+00:00

I am currently working with a Databricks workspace through Azure and using the Databricks Jobs API to execute Python scripts. These scripts create files during their execution.

I would like to understand how to retrieve these files after the job completes.

Here’s the scenario:

In Azure, I see a storage account associated with the Databricks workspace. Within the containers, there is a job directory.

When I attempt to access this job directory, I encounter the following error:

DenyAssignmentAuthorizationFailed

 

I am an organization admin, so assigning the right permissions should not be a problem. However, I am unsure why I lack access to this directory by default. Is this a specific Databricks configuration, or something I need to change in Azure?

Additionally, I noticed that documentation on Databricks Jobs and their data storage is limited. Any pointers or guidance would be appreciated.

Below is the code for the job, which is executed via the Jobs API:

 

import json

# Parse job info
info = dbutils.notebook.entry_point.getDbutils().notebook().getContext().toJson()
info = json.loads(info)
RUN_ID = info.get("tags", {}).get("multitaskParentRunId", "no_run_id")

# Create directory
run_directory = f"/databricks/driver/training_runs/{RUN_ID}"
dbutils.fs.mkdirs(f"file:{run_directory}")

with open(f"{run_directory}/file.txt", "w") as file_:
    file_.write("Hello world :)")

 

 

Questions:

  1. How can I retrieve files created by Databricks Jobs after the job completes?
  2. Is there a specific reason for the DenyAssignmentAuthorizationFailed error when accessing the job directory in the associated storage account?
  3. Are there any best practices or documentation you can recommend for managing Databricks Job outputs in Azure?

Thanks

Azure Databricks

Accepted answer
  1. Smaran Thoomu 19,310 Reputation points Microsoft Vendor
    2025-01-21T16:15:23.0566667+00:00

    Hi @Gabriel-2005
    Welcome to Microsoft Q&A platform.

    Thank you for sharing your query! Based on the code you provided, the directory you are writing to (file:/databricks/driver/training_runs) is not part of the Databricks File System (DBFS). It is a general storage path on the driver node's local file system.

    Accessing Files in General Storage Path

    Databricks does not provide UI support for viewing files in general storage paths like this. However, you can programmatically list the contents of the path using one of the following approaches:

    # Option 1: Using dbutils
    dbutils.fs.ls("file:/databricks/driver/training_runs/<your_run_id>/")
    # Option 2: Using os module
    import os
    print(os.listdir("/databricks/driver/training_runs/<your_run_id>/"))
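
If the run directory contains subfolders, a recursive listing can be more informative than a single `os.listdir` call. The following is a minimal sketch that uses only the standard library; the helper name `list_run_files` is hypothetical and not part of any Databricks API.

```python
import os


def list_run_files(run_directory):
    """Return sorted (relative_path, size_in_bytes) pairs for every file under run_directory."""
    results = []
    for root, _dirs, files in os.walk(run_directory):
        for name in files:
            full_path = os.path.join(root, name)
            rel_path = os.path.relpath(full_path, run_directory)
            results.append((rel_path, os.path.getsize(full_path)))
    return sorted(results)
```

In a notebook or job task this could be called as `list_run_files("/databricks/driver/training_runs/<your_run_id>/")`, keeping in mind that it only sees files that are present on the driver at the time it runs.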
    
    
    


    Viewing Files in the Databricks UI

    If you prefer to browse and manage these files through the Databricks UI, consider writing them to the DBFS instead.

    Step 1: Enable the DBFS Browser

    1. Navigate to Admin Console > Workspace Settings > DBFS File Browser and enable it.
    2. Refresh your workspace to apply the changes.


    Step 2: Update Your Code to Use DBFS Path

    Replace your current file path with a DBFS-compatible path, as shown below:

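    # Write under /FileStore in DBFS instead of the driver's local disk
    run_directory = "/FileStore/training_runs/demo"
    dbutils.fs.mkdirs(f"dbfs:{run_directory}")
    # Local file APIs reach DBFS through the /dbfs mount
    with open(f"/dbfs{run_directory}/file.txt", "w") as file_:
        file_.write("Hello world :)")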
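
If the job code cannot easily be changed to write to DBFS directly, one option is to copy the driver-local run directory to DBFS as the last step of the job. The sketch below assumes the `/dbfs` mount is available on the cluster; the helper name `persist_run_outputs` and its default destination are hypothetical. It copies file contents only, without file metadata.

```python
import os
import shutil


def persist_run_outputs(local_run_dir, dest_root="/dbfs/FileStore/training_runs"):
    """Copy every file under local_run_dir to dest_root/<run dir name>; return the copied paths."""
    run_name = os.path.basename(os.path.normpath(local_run_dir))
    destination = os.path.join(dest_root, run_name)
    copied = []
    for root, _dirs, files in os.walk(local_run_dir):
        rel_root = os.path.relpath(root, local_run_dir)
        target_dir = os.path.normpath(os.path.join(destination, rel_root))
        os.makedirs(target_dir, exist_ok=True)
        for name in files:
            target = os.path.join(target_dir, name)
            # copyfile copies contents only, so no permission or timestamp calls are made
            shutil.copyfile(os.path.join(root, name), target)
            copied.append(target)
    return sorted(copied)
```

For example, `persist_run_outputs(f"/databricks/driver/training_runs/{RUN_ID}")` at the end of the job script. Inside a notebook, `dbutils.fs.cp("file:/databricks/driver/training_runs/<your_run_id>", "dbfs:/FileStore/training_runs/<your_run_id>", recurse=True)` should achieve a similar result.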
    
    
    


    Once the file is written, navigate to Data > Browse DBFS > FileStore, and you should see the folder and its contents.
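
To retrieve a file from outside the workspace once it is in DBFS, the DBFS REST API (`GET /api/2.0/dbfs/read`) can be used. It returns base64-encoded data in chunks of at most 1 MB. The sketch below is one possible approach, not an official client: the function names are hypothetical, and the workspace URL and personal access token are placeholders you would supply yourself.

```python
import base64
import json
import urllib.parse
import urllib.request

CHUNK_SIZE = 1024 * 1024  # the read endpoint accepts at most 1 MB per call


def _http_read(host, token, path, offset, length):
    """Call GET /api/2.0/dbfs/read and return the parsed JSON reply."""
    query = urllib.parse.urlencode({"path": path, "offset": offset, "length": length})
    request = urllib.request.Request(
        f"{host}/api/2.0/dbfs/read?{query}",
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def download_dbfs_file(host, token, dbfs_path, local_path, read_chunk=_http_read):
    """Download dbfs_path to local_path chunk by chunk; return the number of bytes written."""
    offset = 0
    with open(local_path, "wb") as out:
        while True:
            reply = read_chunk(host, token, dbfs_path, offset, CHUNK_SIZE)
            bytes_read = reply["bytes_read"]
            if bytes_read > 0:
                out.write(base64.b64decode(reply["data"]))
                offset += bytes_read
            if bytes_read < CHUNK_SIZE:
                break  # a short chunk means the end of the file was reached
    return offset
```

A call might look like `download_dbfs_file("https://<your-workspace-url>", "<your-token>", "/FileStore/training_runs/demo/file.txt", "file.txt")`. The `read_chunk` parameter exists only so the HTTP call can be swapped out, for example in tests.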


    Hope this helps. Do let us know if you have any further queries.


