Python script task for jobs
Use the Python script task to run a Python file.
Configure a Python script task
Before you begin, you must upload your Python script to a location accessible to the user configuring the job. Databricks recommends using workspace files for Python scripts. See What are workspace files?.
Note
The jobs UI displays options dynamically based on other configured settings.
Databricks recommends against storing code or data using DBFS root or mounts. Instead, you can migrate Python scripts to workspace files or volumes or use URIs to access cloud object storage.
To begin the flow to configure a Python script task:
- Navigate to the Tasks tab in the Jobs UI.
- In the Type drop-down menu, select Python script.
Configure the source
In the Source drop-down menu, select a location for the Python script using one of the following options.
Workspace
Use Workspace to configure a Python script stored using workspace files.
- Click the Path field. The Select Python File dialog appears.
- Browse to the Python script, click to highlight the file, and click Confirm.
Note
You can use this option to configure a task on a Python script stored in a Databricks Git folder. Databricks recommends using the Git provider option and a remote Git repository to version assets scheduled with jobs.
DBFS/ADLS
Use DBFS/ADLS to configure a Python script stored in a volume, cloud object storage location, or the DBFS root.
Databricks recommends storing Python scripts in Unity Catalog volumes or cloud object storage.
In the Path field, enter the URI to your Python script. For example, /Volumes/path/to/script.py or abfss://container-name@storage-account-name.dfs.core.windows.net/path/to/script.py.
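The path forms mentioned above can be told apart by their prefix or URI scheme. The following is a minimal sketch, not part of Databricks: the function name classify_script_path and the labels it returns are hypothetical, and it assumes that DBFS paths are written with the dbfs:/ scheme.

```python
from urllib.parse import urlparse


def classify_script_path(path: str) -> str:
    """Return a rough label for where a script path points (illustrative only)."""
    # Unity Catalog volume paths start with /Volumes/.
    if path.startswith("/Volumes/"):
        return "volume"
    scheme = urlparse(path).scheme
    # abfss:// URIs point to Azure Data Lake Storage (cloud object storage).
    if scheme == "abfss":
        return "cloud object storage"
    # Assumption: DBFS root and mount paths use the dbfs:/ scheme.
    if scheme == "dbfs":
        return "dbfs root or mount"
    return "unknown"


print(classify_script_path("/Volumes/path/to/script.py"))
```

A check like this could run before a job definition is submitted, for example to flag scripts that still live on the DBFS root.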
Git provider
Use Git provider to configure a Python script stored in a remote Git repository.
The options displayed by the UI depend on whether or not you have already configured a Git provider elsewhere. Only one remote Git repository can be used for all tasks in a job. See Use Git with jobs.
The Path field appears after you have configured a Git reference.
Enter the relative path for your Python script, such as etl/bronze/ingest.py.
Important
When you enter the relative path, don’t begin with / or ./. For example, if the absolute path for the Python code you want to access is /etl/bronze/ingest.py, enter etl/bronze/ingest.py in the Path field.
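If you generate job definitions from code, the rule above can be applied with a small helper. This is a sketch only: to_git_relative_path is a hypothetical name, not a Databricks function, and it assumes that stripping the leading / or ./ is all that is needed to make the path relative to the repository root.

```python
def to_git_relative_path(path: str) -> str:
    """Strip any leading "/" or "./" so the path is relative to the repository root."""
    while path.startswith(("./", "/")):
        # Drop two characters for "./", one for "/".
        path = path[2:] if path.startswith("./") else path[1:]
    return path


print(to_git_relative_path("/etl/bronze/ingest.py"))
```

A path that is already relative, such as etl/bronze/ingest.py, passes through unchanged.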
Configure compute and dependent libraries
- Use Compute to select or configure a cluster that supports the logic in your script.
- If you use Serverless compute, use the Environment and Libraries field to select, edit, or add a new environment. See Install notebook dependencies.
- For all other compute configurations, click + Add under Dependent libraries. The Add dependent library dialog appears.
- You can select an existing library or upload a new library.
- You can only use libraries stored in a location supported by your compute configurations. See Python library support.
- Each Library Source has a different flow for selecting or uploading a library. See Libraries.
Finalize job configuration
- (Optional) Configure Parameters as a list of strings passed as CLI arguments to the Python script. See Configure task parameters.
- Click Save task.
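Because Parameters reach the script as CLI arguments, the script can read them with the standard library argparse module. The following is a minimal sketch: the flag names --input-path and --run-date, the function parse_job_args, and the sample values are all hypothetical, chosen only to illustrate how a list of strings maps to parsed arguments.

```python
import argparse


def parse_job_args(argv=None):
    """Parse the strings configured in the task's Parameters field."""
    parser = argparse.ArgumentParser(description="Example Python script task")
    parser.add_argument("--input-path", required=True, help="Location of the input data")
    parser.add_argument("--run-date", default=None, help="Optional run date as a string")
    # With argv=None, argparse reads sys.argv[1:], which is what a job run supplies.
    return parser.parse_args(argv)


# Simulates Parameters configured as a list of strings (hypothetical values).
example_parameters = ["--input-path", "/Volumes/path/to/data", "--run-date", "2024-01-01"]
args = parse_job_args(example_parameters)
print(f"Reading from {args.input_path} for run date {args.run_date}")
```

In the script that the task runs, calling parse_job_args() with no argument picks up the arguments passed on the command line. All values arrive as strings, so convert them explicitly if the script needs numbers or dates.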