Run a file on a cluster or a file or notebook as a job in Azure Databricks using the Databricks extension for Visual Studio Code

The Databricks extension for Visual Studio Code lets you run your Python code on an Azure Databricks cluster, or run your Python, R, Scala, or SQL code or notebooks as Azure Databricks jobs.

This information assumes that you have already installed and set up the Databricks extension for Visual Studio Code. See Install the Databricks extension for Visual Studio Code.

Run a Python file on a cluster

To run a Python file on an Azure Databricks cluster using the Databricks extension for Visual Studio Code, with the extension and your project opened:

  1. Open the Python file that you want to run on the cluster.
  2. Do one of the following:
    • In the file editor’s title bar, click the Run on Databricks icon, and then click Upload and Run File.

      Upload and run file from icon

    • In Explorer view (View > Explorer), right-click the file, and then select Run on Databricks > Upload and Run File from the context menu.

      Upload and run file from context menu

The file runs on the cluster, and output is available in the Debug Console (View > Debug Console).
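
As a quick check, a file like the following minimal sketch produces visible output there. It assumes only that the cluster provides an active Spark session; the DataFrame logic is illustrative.

from pyspark.sql import SparkSession

# The file executes on the cluster, where a Spark session already
# exists; getOrCreate() attaches to it rather than starting a new one.
spark = SparkSession.builder.getOrCreate()

# A trivial DataFrame, just to produce output in the Debug Console.
df = spark.range(10).selectExpr("id", "id * 2 AS doubled")
df.show()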

Run a Python file as a job

To run a Python file as an Azure Databricks job using the Databricks extension for Visual Studio Code, with the extension and your project opened:

  1. Open the Python file that you want to run as a job.
  2. Do one of the following:
    • In the file editor’s title bar, click the Run on Databricks icon, and then click Run File as Workflow.

      Run file as workflow from icon

    • In Explorer view (View > Explorer), right-click the file, and then select Run on Databricks > Run File as Workflow from the context menu.

      Run file as workflow from context menu

A new editor tab appears, titled Databricks Job Run. The file runs as a job in the workspace, and any output is printed to the new editor tab’s Output area.

To view information about the job run, click the Task run ID link in the new Databricks Job Run editor tab. Your workspace opens and the job run’s details are displayed in the workspace.
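
For example, a minimal sketch of a file suited to a job run follows; anything it prints to stdout lands in the Output area described above, and the aggregation is illustrative only.

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# stdout from the job is captured and shown in the Databricks Job Run tab.
count = spark.range(100).filter("id % 2 = 0").count()
print(f"Even numbers found: {count}")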

Run a Python, R, Scala, or SQL notebook as a job

To run a notebook as an Azure Databricks job using the Databricks extension for Visual Studio Code, with the extension and your project opened:

  1. Open the notebook that you want to run as a job.

    Tip

    To turn a Python, R, Scala, or SQL file into an Azure Databricks notebook, add the comment # Databricks notebook source to the beginning of the file, and add the comment # COMMAND ---------- before each cell. Use the comment marker for the file's language (# for Python and R, // for Scala, and -- for SQL). For more information, see Import a file and convert it to a notebook. A sketch of this format appears at the end of this section.

    A Python code file formatted as a Databricks notebook

  2. Do one of the following:

    • In the notebook file editor’s title bar, click the Run on Databricks icon, and then click Run File as Workflow.

    Note

    If Run File as Workflow is not available, see Create a custom run configuration.

    • In Explorer view (View > Explorer), right-click the notebook file, and then select Run on Databricks > Run File as Workflow from the context menu.

A new editor tab appears, titled Databricks Job Run. The notebook runs as a job in the workspace. The notebook and its output are displayed in the new editor tab’s Output area.

To view information about the job run, click the Task run ID link in the Databricks Job Run editor tab. Your workspace opens and the job run’s details are displayed in the workspace.
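
For reference, the following sketch shows a Python file marked up in the notebook source format described in the earlier tip; the cell contents are illustrative.

# Databricks notebook source
# The marker above makes Azure Databricks treat this file as a notebook.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# COMMAND ----------

# Each "COMMAND" comment starts a new notebook cell.
df = spark.range(5)

# COMMAND ----------

df.show()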

Create a custom run configuration

A custom run configuration for the Databricks extension for Visual Studio Code allows you to pass custom arguments to a job or a notebook, or create different run settings for different files.

To create a custom run configuration, click Run > Add Configuration from the main menu in Visual Studio Code. Then select either Databricks for a cluster-based run configuration or Databricks: Workflow for a job-based run configuration.

For example, the following custom run configuration modifies the Run File as Workflow launch command to pass the --prod argument to the job:

{
  "version": "0.2.0",
  "configurations": [
    {
      "type": "databricks-workflow",
      "request": "launch",
      "name": "Run on Databricks as Workflow",
      "program": "${file}",
      "parameters": {},
      "args": ["--prod"]
    }
  ]
}
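
On the code side, the job's Python file can read that argument in the usual way. The following sketch uses argparse; the --prod flag matches the "args" entry above, and the environment logic is illustrative.

import argparse

parser = argparse.ArgumentParser()
# --prod corresponds to the "args" entry in the run configuration above.
parser.add_argument("--prod", action="store_true")
args = parser.parse_args()

target = "production" if args.prod else "development"
print(f"Running against the {target} environment")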

Tip

Add "databricks": true to your "type": "python" configuration if you want to use Python configuration but take advantage of the Databricks Connect authentication that is part of the extension setup.

Using custom run configurations, you can also pass in command-line arguments and run your code just by pressing F5. For more information, see Launch configurations in the Visual Studio Code documentation.