

Notebook task for jobs

Use the notebook task to deploy Databricks notebooks.

Configure a notebook task

Before you begin, make sure your notebook is stored in a location that the user configuring the job can access.

Note

The jobs UI displays options dynamically based on other configured settings.

To start configuring a Notebook task:

  1. Navigate to the Tasks tab in the Jobs UI.
  2. In the Type drop-down menu, select Notebook.

Configure the source

In the Source drop-down menu, select a location for the notebook using one of the following options.

Workspace

Use Workspace to configure a notebook stored in the workspace by completing the following steps:

  1. Click the Path field. The Select Notebook dialog appears.
  2. Browse to the notebook, click to highlight the file, and click Confirm.

Note

You can use this option to configure a task for a notebook stored in a Databricks Git folder. Databricks recommends using the Git provider option and a remote Git repository for versioning assets scheduled with jobs.

Git provider

Use Git provider to configure a notebook in a remote Git repository.

The options the UI displays depend on whether you have already configured a Git provider elsewhere. A job can use only one remote Git repository for all of its tasks. See Use Git with jobs.

Important

Notebooks created by Azure Databricks jobs that run from remote Git repositories are ephemeral and cannot be relied upon to track MLflow runs, experiments, or models. When creating a notebook from a job, use a workspace MLflow experiment (instead of a notebook MLflow experiment) and call mlflow.set_experiment("/path/to/experiment") in the workspace notebook before running any MLflow tracking code. For more details, see Prevent data loss in MLflow experiments.

The Path field appears after you have configured a Git reference.

Enter the relative path for your notebook, such as etl/bronze/ingest.py.

Important

When you enter the relative path, don’t begin with / or ./. For example, if the absolute path for the notebook you want to access is /etl/bronze/ingest.py, enter etl/bronze/ingest.py in the Path field.
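The rule above amounts to stripping any leading / or ./ from the path before entering it. The following Python sketch shows that conversion. The function name to_git_relative_path is hypothetical and is not part of any Databricks API; it only illustrates the expected form of the Path value.

```python
def to_git_relative_path(path: str) -> str:
    """Convert a repository path into the relative form expected by the Path field.

    Hypothetical helper: strips any leading "/" or "./" segments so that
    "/etl/bronze/ingest.py" becomes "etl/bronze/ingest.py".
    """
    relative = path
    # Repeatedly drop a leading "./" or "/" until neither prefix remains.
    while relative.startswith("./") or relative.startswith("/"):
        relative = relative[2:] if relative.startswith("./") else relative[1:]
    if not relative:
        # Nothing left after stripping, so the input did not name a file.
        raise ValueError(f"{path!r} does not name a file")
    return relative


print(to_git_relative_path("/etl/bronze/ingest.py"))   # etl/bronze/ingest.py
print(to_git_relative_path("./etl/bronze/ingest.py"))  # etl/bronze/ingest.py
print(to_git_relative_path("etl/bronze/ingest.py"))    # etl/bronze/ingest.py
```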

Configure compute and dependent libraries

  1. Use Compute to select or configure a cluster that supports the logic in your notebook.
  2. If you use Serverless compute, use the Environment and Libraries field to select, edit, or add a new environment. See Install notebook dependencies.
  3. For all other compute configurations, click + Add under Dependent libraries. The Add dependent library dialog appears.
    • You can select an existing library or upload a new library.
    • You can only use libraries stored in a location supported by your compute configuration. See Python library support.
    • Each Library Source has a different flow for selecting or uploading a library. See Libraries.

Finalize job configuration

  1. (Optional) Configure Parameters as key-value pairs that can be accessed in the notebook using dbutils.widgets. See Configure task parameters.
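Inside the notebook, each task parameter is read by its key through dbutils.widgets. The Python sketch below declares one text widget per expected key with a default, so the notebook also runs interactively, and then reads each value. When the job passes a parameter with the same key, that value is expected to take precedence over the declared default. The helper read_task_parameters and the keys env and run_date are hypothetical; dbutils.widgets.text and dbutils.widgets.get are the widget calls the sketch relies on.

```python
from typing import Dict, Protocol


class Widgets(Protocol):
    """The subset of dbutils.widgets this sketch uses."""

    def text(self, name: str, defaultValue: str) -> None: ...

    def get(self, name: str) -> str: ...


def read_task_parameters(widgets: Widgets, defaults: Dict[str, str]) -> Dict[str, str]:
    """Declare a text widget for every expected key, then read all values.

    Hypothetical helper. Declaring the widget first means the notebook still
    runs interactively; when it runs as a job task, values supplied as task
    parameters are expected to replace the defaults declared here.
    """
    for name, default in defaults.items():
        widgets.text(name, default)
    return {name: widgets.get(name) for name in defaults}


# In a Databricks notebook, pass the real utilities object, for example:
# params = read_task_parameters(dbutils.widgets, {"env": "dev", "run_date": ""})
```

Accepting the widgets object as an argument, rather than referencing dbutils directly, lets you exercise the helper outside Databricks with a stand-in object that implements text and get.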
  2. Click Save task.

Limitations

Total notebook cell output (the combined output of all notebook cells) is limited to 20 MB, and the output of any individual cell is limited to 8 MB. If either limit is exceeded, the run is canceled and marked as failed.

If you need help finding cells near or beyond the limit, run the notebook against an all-purpose cluster and use the notebook autosave technique.
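The two limits are independent: a run fails if any single cell output exceeds 8 MB or if the sum of all cell outputs exceeds 20 MB. The Python sketch below applies both checks to a list of per-cell output sizes in bytes and also flags cells approaching the per-cell limit. How you measure those sizes is up to you, the 80% warning threshold is arbitrary, and treating 1 MB as 1,048,576 bytes is an assumption; the exact byte accounting Databricks applies is not specified here.

```python
from typing import List

# Assumption: 1 MB = 1024 * 1024 bytes.
MB = 1024 * 1024
TOTAL_LIMIT = 20 * MB  # combined output of all cells
CELL_LIMIT = 8 * MB    # output of any single cell


def check_output_limits(cell_sizes: List[int], warn_ratio: float = 0.8) -> List[str]:
    """Return findings for cells that exceed or approach the limits.

    cell_sizes[i] is the output size of cell i in bytes. An empty result means
    neither limit is exceeded and no cell is above warn_ratio of the per-cell limit.
    """
    findings = []
    for index, size in enumerate(cell_sizes):
        if size > CELL_LIMIT:
            findings.append(f"cell {index}: {size} bytes exceeds the 8 MB per-cell limit")
        elif size > warn_ratio * CELL_LIMIT:
            findings.append(f"cell {index}: {size} bytes is near the 8 MB per-cell limit")
    total = sum(cell_sizes)
    if total > TOTAL_LIMIT:
        findings.append(f"total: {total} bytes exceeds the 20 MB combined limit")
    return findings


def run_would_fail(cell_sizes: List[int]) -> bool:
    """True if either the per-cell or the combined limit is exceeded."""
    return sum(cell_sizes) > TOTAL_LIMIT or any(size > CELL_LIMIT for size in cell_sizes)
```

For example, three cells of 7 MB each stay under the per-cell limit but together exceed the 20 MB total, so run_would_fail([7 * MB] * 3) returns True.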