Databricks SDK for Java

Note

Databricks recommends Databricks Asset Bundles for creating, developing, deploying, and testing jobs and other Databricks resources as source code. See What are Databricks Asset Bundles?

In this article, you learn how to automate Azure Databricks operations and accelerate development with the Databricks SDK for Java. This article supplements the Databricks SDK for Java README, API reference, and examples.

Note

This feature is in Beta and is okay to use in production.

During the Beta period, Databricks recommends that you pin a dependency on the specific minor version of the Databricks SDK for Java that your code depends on. For example, you can pin dependencies in files such as pom.xml for Maven. For more information about pinning dependencies, see Introduction to the Dependency Mechanism.

Before you begin

Before you begin to use the Databricks SDK for Java, your development machine must have:

  • Azure Databricks authentication configured.
  • A Java Development Kit (JDK) that is compatible with Java 8 or above. Continuous integration (CI) testing with the Databricks SDK for Java is compatible with Java versions 8, 11, 17, and 20.
  • A Java-compatible integrated development environment (IDE). This is optional but recommended. Databricks recommends IntelliJ IDEA.
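
To confirm that the local JDK meets the Java 8 minimum listed above, a small check like the following can help. This is a minimal sketch and not part of the SDK: the JavaVersionCheck class and its majorVersion method are hypothetical names used only for illustration, and the parsing assumes the usual "1.8.0_292" and "17.0.2" styles of the java.version system property.

```java
// Minimal sketch: derive the major Java version from a "java.version" string.
// JavaVersionCheck and majorVersion are hypothetical names, not SDK APIs.
final class JavaVersionCheck {
  // Handles legacy "1.8.0_292"-style strings and modern "17.0.2"-style strings.
  static int majorVersion(String version) {
    String[] parts = version.split("[._+-]");
    int first = Integer.parseInt(parts[0]);
    // Releases before Java 9 report themselves as 1.x, so the major version is the second field.
    return (first == 1 && parts.length > 1) ? Integer.parseInt(parts[1]) : first;
  }

  public static void main(String[] args) {
    int major = majorVersion(System.getProperty("java.version"));
    if (major < 8) {
      throw new IllegalStateException("Java 8 or above is required, found " + major);
    }
    System.out.println("Java major version: " + major);
  }
}
```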

Get started with the Databricks SDK for Java

  1. In your project’s pom.xml file, instruct your build system to take a dependency on the Databricks SDK for Java. To do this, add the following <dependency> to the pom.xml file’s existing <dependencies> section. If the <dependencies> section does not already exist within the pom.xml file, you must also add the <dependencies> parent element to the pom.xml file.

    For example, to open your project’s pom.xml file in IntelliJ IDEA, click View > Tool Windows > Project, and then double-click to open your-project-name > src > pom.xml.

    <dependencies>
      <dependency>
        <groupId>com.databricks</groupId>
        <artifactId>databricks-sdk-java</artifactId>
        <version>0.0.1</version>
      </dependency>
    </dependencies>
    

    Note

    Be sure to replace 0.0.1 with the latest version of the Databricks SDK for Java. You can find the latest version in the Maven central repository.

  2. Instruct your project to take the declared dependency on the Databricks SDK for Java. For example, in IntelliJ IDEA, in your project’s Project tool window, right-click your project’s root node, and then click Reload Project.

  3. Add code to import the Databricks SDK for Java and to list all of the clusters in your Azure Databricks workspace. For example, in a project’s Main.java file, the code might be as follows:

    import com.databricks.sdk.WorkspaceClient;
    import com.databricks.sdk.service.compute.ClusterInfo;
    import com.databricks.sdk.service.compute.ListClustersRequest;
    
    public class Main {
      public static void main(String[] args) {
        WorkspaceClient w = new WorkspaceClient();
    
        for (ClusterInfo c : w.clusters().list(new ListClustersRequest())) {
          System.out.println(c.getClusterName());
        }
      }
    }
    

    Note

    By not setting any arguments in the preceding call to WorkspaceClient w = new WorkspaceClient(), the Databricks SDK for Java uses its default process for trying to perform Azure Databricks authentication. To override this default behavior, see the following authentication section.

  4. Build your project. For example, to do this in IntelliJ IDEA, from the main menu, click Build > Build Project.

  5. Run your main file. For example, to do this in IntelliJ IDEA for a project’s Main.java file, from the main menu, click Run > Run ‘Main’.

  6. The list of clusters appears. For example, in IntelliJ IDEA, this is in the Run tool window. To display this tool window, from the main menu, click View > Tool Windows > Run.

Authenticate the Databricks SDK for Java with your Azure Databricks account or workspace

The Databricks SDK for Java implements the Databricks client unified authentication standard, a consolidated and consistent architectural and programmatic approach to authentication. This approach helps make setting up and automating authentication with Azure Databricks more centralized and predictable. It enables you to configure Databricks authentication once and then use that configuration across multiple Databricks tools and SDKs without further authentication configuration changes. For more information, including more complete code examples in Java, see Databricks client unified authentication.

Note

The Databricks SDK for Java has not yet implemented Azure managed identities authentication.

Some of the available coding patterns to initialize Databricks authentication with the Databricks SDK for Java include:

  • Use Databricks default authentication by doing one of the following:

    • Create or identify a custom Databricks configuration profile with the required fields for the target Databricks authentication type. Then set the DATABRICKS_CONFIG_PROFILE environment variable to the name of the custom configuration profile.
    • Set the required environment variables for the target Databricks authentication type.

    Then instantiate, for example, a WorkspaceClient object with Databricks default authentication as follows:

    import com.databricks.sdk.WorkspaceClient;
    // ...
    WorkspaceClient w = new WorkspaceClient();
    // ...
    
  • Hard-coding the required fields is supported but not recommended, as it risks exposing sensitive information in your code, such as Azure Databricks personal access tokens. The following example hard-codes Azure Databricks host and access token values for Databricks token authentication:

    import com.databricks.sdk.WorkspaceClient;
    import com.databricks.sdk.core.DatabricksConfig;
    // ...
    DatabricksConfig cfg = new DatabricksConfig()
      .setHost("https://...")
      .setToken("...");
    WorkspaceClient w = new WorkspaceClient(cfg);
    // ...
    

See also Authentication in the Databricks SDK for Java README.
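
If you need explicit configuration but want to avoid hard-coded values, one option is to read the host and token from environment variables and fail fast when they are missing. The following is a minimal sketch under stated assumptions: EnvCredentials is a hypothetical helper class, not part of the SDK, and it reads the DATABRICKS_HOST and DATABRICKS_TOKEN environment variables. The resolved values could then be passed to the DatabricksConfig setters shown in the hard-coded example.

```java
import java.util.Map;

// Minimal sketch: resolve credentials from environment variables instead of hard-coding them.
// EnvCredentials is a hypothetical helper, not an SDK class.
final class EnvCredentials {
  final String host;
  final String token;

  private EnvCredentials(String host, String token) {
    this.host = host;
    this.token = token;
  }

  // Takes the environment as a map so that the lookup can be tested without real variables.
  static EnvCredentials fromEnv(Map<String, String> env) {
    return new EnvCredentials(require(env, "DATABRICKS_HOST"), require(env, "DATABRICKS_TOKEN"));
  }

  private static String require(Map<String, String> env, String name) {
    String value = env.get(name);
    if (value == null || value.trim().isEmpty()) {
      throw new IllegalStateException("Environment variable " + name + " is not set");
    }
    return value.trim();
  }

  public static void main(String[] args) {
    try {
      EnvCredentials creds = fromEnv(System.getenv());
      // The values could be handed to the SDK, for example:
      //   new DatabricksConfig().setHost(creds.host).setToken(creds.token)
      System.out.println("Resolved host: " + creds.host);
    } catch (IllegalStateException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

Passing the environment in as a map keeps the token out of source control and makes the missing-variable case easy to exercise in a unit test.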

Use Databricks Utilities and Java with the Databricks SDK for Java

Databricks Utilities provides several helper functions to make it easy to work with object storage efficiently, chain and parameterize notebooks, and work with secrets. Databricks provides a Databricks Utilities for Scala library, which you can call with Java code, to enable you to programmatically access Databricks Utilities.

To use Java code to call the Databricks Utilities for Scala, do the following:

  1. In your Java project, declare a dependency on the Databricks SDK for Java, as described in Get started with the Databricks SDK for Java.

  2. Declare a dependency on the Databricks Utilities for Scala library. To do this, add the following <dependency> to the pom.xml file’s existing <dependencies> section:

    <dependency>
      <groupId>com.databricks</groupId>
      <artifactId>databricks-dbutils-scala_2.12</artifactId>
      <version>0.1.4</version>
    </dependency>
    

    Note

    Be sure to replace 0.1.4 with the latest version of the Databricks Utilities for Scala library. You can find the latest version in the Maven central repository.

  3. Instruct your project to take the declared dependency on the Databricks Utilities for Scala. For example, in IntelliJ IDEA, in your project’s Project tool window, click your project’s root node, and then click Maven > Reload Project.

  4. Add code to import and then call the Databricks Utilities for Scala. For example, the following code automates a Unity Catalog volume. This example creates a file named zzz_hello.txt in the volume’s path within the workspace, reads the data from the file, and then deletes the file:

    import com.databricks.sdk.core.DatabricksConfig;
    import com.databricks.sdk.scala.dbutils.DBUtils;
    
    public class Main {
      public static void main(String[] args) {
        String filePath = "/Volumes/main/default/my-volume/zzz_hello.txt";
        String fileData = "Hello, Databricks!";
        DBUtils dbutils = DBUtils.getDBUtils(new DatabricksConfig().setProfile("DEFAULT"));
    
        dbutils.fs().put(filePath, fileData, true);
    
        System.out.println(dbutils.fs().head(filePath, 18));
    
        dbutils.fs().rm(filePath, false);
      }
    }
    
  5. Build your project and run your main file.

Code examples

The following code examples demonstrate how to use the Databricks SDK for Java to create and delete clusters, create jobs, and list account-level groups. These code examples use the Databricks SDK for Java’s default Azure Databricks authentication process.

For additional code examples, see the examples folder in the Databricks SDK for Java repository in GitHub.

Create a cluster

This code example creates a cluster with the specified Databricks Runtime version and cluster node type. This cluster has one worker, and the cluster will automatically terminate after 15 minutes of idle time.

import com.databricks.sdk.WorkspaceClient;
import com.databricks.sdk.service.compute.CreateCluster;
import com.databricks.sdk.service.compute.CreateClusterResponse;

public class Main {
  public static void main(String[] args) {
    WorkspaceClient w = new WorkspaceClient();

    CreateClusterResponse c = w.clusters().create(
      new CreateCluster()
        .setClusterName("my-cluster")
        .setSparkVersion("12.2.x-scala2.12")
        .setNodeTypeId("Standard_DS3_v2")
        .setAutoterminationMinutes(15L)
        .setNumWorkers(1L)
    ).getResponse();

    System.out.println("View the cluster at " +
      w.config().getHost() +
      "#setting/clusters/" +
      c.getClusterId() +
      "/configuration\n");
  }
}
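
The preceding example concatenates the workspace host and a URL fragment directly, so the result depends on whether the configured host ends with a slash. The following minimal sketch normalizes that. ClusterUrls and clusterConfigUrl are hypothetical names, and the "#setting/clusters/<id>/configuration" fragment is taken from the example above rather than from a documented URL contract.

```java
// Minimal sketch: build the cluster configuration URL regardless of a trailing slash on the host.
// ClusterUrls is a hypothetical helper, not an SDK class.
final class ClusterUrls {
  static String clusterConfigUrl(String host, String clusterId) {
    // Remove any trailing slashes so that exactly one "/" precedes the fragment.
    String base = host.replaceAll("/+$", "");
    return base + "/#setting/clusters/" + clusterId + "/configuration";
  }

  public static void main(String[] args) {
    // The host and cluster ID below are placeholder values.
    System.out.println(clusterConfigUrl("https://example.test/", "1234-567890-ab123cd4"));
  }
}
```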

Create a cluster that uses JDK 17

Note

For Databricks Runtime 16.0 or above, JDK 17 is generally available and the default. For Databricks Runtime versions 13.1 to 15.4, JDK 8 is the default, and JDK 17 is in Public Preview.

This section describes how to create a cluster that runs on Java Development Kit (JDK) 17, so that your notebooks and jobs use Java 17.

When you create a cluster, specify that the cluster uses JDK 17 for both the driver and executor by adding the following environment variable to Advanced Options > Spark > Environment Variables:

JNAME=zulu17-ca-amd64
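
If you create the cluster from code instead of the UI, the same variable would have to travel with the create request. The following minimal sketch only builds the environment variable map. It assumes, without confirmation from this article, that the SDK's CreateCluster request accepts such a map through a Spark environment variables setter (for example, setSparkEnvVars). Jdk17EnvVars is a hypothetical helper name.

```java
import java.util.Collections;
import java.util.Map;

// Minimal sketch: the environment variable that selects JDK 17, expressed as a map.
// Jdk17EnvVars is a hypothetical helper, not an SDK class.
final class Jdk17EnvVars {
  static Map<String, String> sparkEnvVars() {
    // Same key and value as the Advanced Options setting described above.
    return Collections.singletonMap("JNAME", "zulu17-ca-amd64");
  }

  public static void main(String[] args) {
    // Assumption: this map could be passed to the create request, for example:
    //   new CreateCluster().setSparkEnvVars(Jdk17EnvVars.sparkEnvVars())
    System.out.println(sparkEnvVars());
  }
}
```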

Permanently delete a cluster

This code example permanently deletes the cluster with the specified cluster ID from the workspace.

import com.databricks.sdk.WorkspaceClient;
import java.util.Scanner;

public class Main {
  public static void main(String[] args) {
    System.out.println("ID of cluster to delete (for example, 1234-567890-ab123cd4):");

    Scanner in = new Scanner(System.in);
    String clusterId = in.nextLine();
    WorkspaceClient w = new WorkspaceClient();

    w.clusters().permanentDelete(clusterId);
  }
}

Create a job

This code example creates an Azure Databricks job that can be used to run the specified notebook on the specified cluster. As this code runs, it gets the existing notebook’s path, the existing cluster ID, and related job settings from the user at the terminal.

import com.databricks.sdk.WorkspaceClient;
import com.databricks.sdk.service.jobs.JobTaskSettings;
import com.databricks.sdk.service.jobs.NotebookTask;
import com.databricks.sdk.service.jobs.NotebookTaskSource;
import com.databricks.sdk.service.jobs.CreateResponse;
import com.databricks.sdk.service.jobs.CreateJob;

import java.util.Scanner;
import java.util.Map;
import java.util.Collection;
import java.util.Arrays;

public class Main {
  public static void main(String[] args) {
    System.out.println("Some short name for the job (for example, my-job):");
    Scanner in = new Scanner(System.in);
    String jobName = in.nextLine();

    System.out.println("Some short description for the job (for example, My job):");
    String description = in.nextLine();

    System.out.println("ID of the existing cluster in the workspace to run the job on (for example, 1234-567890-ab123cd4):");
    String existingClusterId = in.nextLine();

    System.out.println("Workspace path of the notebook to run (for example, /Users/someone@example.com/my-notebook):");
    String notebookPath = in.nextLine();

    System.out.println("Some key to apply to the job's tasks (for example, my-key): ");
    String taskKey = in.nextLine();

    System.out.println("Attempting to create the job. Please wait...");

    WorkspaceClient w = new WorkspaceClient();

    // Base parameters to pass to the notebook task. Note that Map.of requires Java 9 or above.
    Map<String, String> map = Map.of("", "");

    Collection<JobTaskSettings> tasks = Arrays.asList(new JobTaskSettings()
      .setDescription(description)
      .setExistingClusterId(existingClusterId)
      .setNotebookTask(new NotebookTask()
        .setBaseParameters(map)
        .setNotebookPath(notebookPath)
        .setSource(NotebookTaskSource.WORKSPACE))
      .setTaskKey(taskKey)
    );

    CreateResponse j = w.jobs().create(new CreateJob()
      .setName(jobName)
      .setTasks(tasks)
    );

    System.out.println("View the job at " +
      w.config().getHost() +
      "/#job/" +
      j.getJobId()
    );
  }
}
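
The job example passes a placeholder map as the notebook's base parameters. If you want to collect real parameters, for example from command-line arguments, a helper like the following could build the map. This is a minimal sketch: NotebookParams and its parse method are hypothetical names, and the "key=value" input format is an assumption made for illustration. The result is a Map<String, String>, the same type that the example passes to setBaseParameters, and the code uses only Java 8 APIs.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Minimal sketch: turn "key=value" strings into a parameter map for a notebook task.
// NotebookParams is a hypothetical helper, not an SDK class.
final class NotebookParams {
  static Map<String, String> parse(String... pairs) {
    // LinkedHashMap keeps the parameters in the order they were supplied.
    Map<String, String> params = new LinkedHashMap<>();
    for (String pair : pairs) {
      int eq = pair.indexOf('=');
      String key = eq < 0 ? "" : pair.substring(0, eq).trim();
      if (key.isEmpty()) {
        throw new IllegalArgumentException("Expected key=value but got: " + pair);
      }
      // Everything after the first "=" is the value, so values may themselves contain "=".
      params.put(key, pair.substring(eq + 1).trim());
    }
    return params;
  }

  public static void main(String[] args) {
    System.out.println(parse("env=dev", "run_date=2024-01-01"));
  }
}
```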

Manage files in Unity Catalog volumes

This code example demonstrates various calls to files functionality within WorkspaceClient to access a Unity Catalog volume.

import com.databricks.sdk.WorkspaceClient;
import com.databricks.sdk.service.files.DirectoryEntry;
import com.databricks.sdk.service.files.DownloadResponse;
import java.io.*;
import java.nio.file.Files;
import java.nio.file.Paths;

public class Main {
  public static void main(String[] args) throws IOException {
    String catalog          = "main";
    String schema           = "default";
    String volume           = "my-volume";
    String volumePath       = "/Volumes/" + catalog + "/" + schema + "/" + volume; // /Volumes/main/default/my-volume
    String volumeFolder     = "my-folder";
    String volumeFolderPath = volumePath + "/" + volumeFolder; // /Volumes/main/default/my-volume/my-folder
    String volumeFile       = "data.csv";
    String volumeFilePath   = volumeFolderPath + "/" + volumeFile; // /Volumes/main/default/my-volume/my-folder/data.csv
    String uploadFilePath   = "./data.csv";

    WorkspaceClient w = new WorkspaceClient();

    // Create an empty folder in a volume.
    w.files().createDirectory(volumeFolderPath);

    // Upload a file to a volume.
    // Open the local file as a stream. The stream is closed automatically after the upload.
    try (InputStream uploadInputStream = Files.newInputStream(Paths.get(uploadFilePath))) {
      w.files().upload(volumeFilePath, uploadInputStream);
    } catch (java.io.IOException e) {
      System.out.println(e.getMessage());
      System.exit(-1);
    }

    // List the contents of a volume.
    Iterable<DirectoryEntry> volumeItems = w.files().listDirectoryContents(volumePath);
    for (DirectoryEntry volumeItem: volumeItems) {
      System.out.println(volumeItem.getPath());
    }

    // List the contents of a folder in a volume.
    Iterable<DirectoryEntry> volumeFolderItems = w.files().listDirectoryContents(volumeFolderPath);
    for (DirectoryEntry volumeFolderItem: volumeFolderItems) {
      System.out.println(volumeFolderItem.getPath());
    }

    // Print the contents of a file in a volume.
    DownloadResponse resp = w.files().download(volumeFilePath);
    InputStream downloadedFile = resp.getContents();

    try (BufferedReader reader = new BufferedReader(new InputStreamReader(downloadedFile))) {
      String line;
      while ((line = reader.readLine()) != null) {
        System.out.println(line);
      }
    } catch (java.io.IOException e) {
      System.out.println(e.getMessage());
      System.exit(-1);
    }

    // Delete a file from a volume.
    w.files().delete(volumeFilePath);

    // Delete a folder from a volume.
    w.files().deleteDirectory(volumeFolderPath);
  }
}
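
The volume example assembles its paths by string concatenation. A small helper can make the /Volumes/<catalog>/<schema>/<volume> layout explicit and guard against doubled or missing slashes. This is a minimal sketch: VolumePaths and volumePath are hypothetical names, not SDK APIs.

```java
// Minimal sketch: build Unity Catalog volume paths from their parts.
// VolumePaths is a hypothetical helper, not an SDK class.
final class VolumePaths {
  static String volumePath(String catalog, String schema, String volume, String... segments) {
    StringBuilder path = new StringBuilder("/Volumes/")
      .append(catalog).append('/')
      .append(schema).append('/')
      .append(volume);
    for (String segment : segments) {
      // Strip leading and trailing slashes so that segments join with exactly one "/".
      String cleaned = segment.replaceAll("^/+|/+$", "");
      if (!cleaned.isEmpty()) {
        path.append('/').append(cleaned);
      }
    }
    return path.toString();
  }

  public static void main(String[] args) {
    System.out.println(volumePath("main", "default", "my-volume", "my-folder", "data.csv"));
  }
}
```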

List account-level groups

This code example lists the display names for all of the available groups within the Azure Databricks account.

import com.databricks.sdk.AccountClient;
import com.databricks.sdk.service.iam.Group;
import com.databricks.sdk.service.iam.ListAccountGroupsRequest;

public class Main {
  public static void main(String[] args) {
    AccountClient a = new AccountClient();

    for (Group g : a.groups().list(new ListAccountGroupsRequest())) {
      System.out.println(g.getDisplayName());
    }
  }
}

Use Scala with the Databricks SDK for Java

You can use Scala projects with the Databricks SDK for Java. Before you begin, your development machine must have:

  • Azure Databricks authentication configured.
  • A Scala-compatible integrated development environment (IDE). This is optional but recommended. Databricks recommends IntelliJ IDEA with the Scala plugin. These instructions were tested with IntelliJ IDEA Community Edition 2023.3.6. If you use a different version or edition of IntelliJ IDEA, the following instructions might vary.
  • A Java Development Kit (JDK) that is compatible with Java 8 or above. If you want to run your applications or use your libraries on an Azure Databricks cluster, Databricks recommends that you use a version of JDK that matches the JDK version on the cluster. To find the JDK version that is included with a specific Databricks Runtime, see Databricks Runtime release notes versions and compatibility. If you use IntelliJ IDEA, you can choose an existing local JDK installation or install a new JDK locally during Scala project creation.
  • A Scala build tool. Databricks recommends sbt. If you use IntelliJ IDEA, you can choose the sbt version to use during Scala project creation.
  • Scala. If you want to run your applications or use your libraries on an Azure Databricks cluster, Databricks recommends that you use a version of Scala that matches the Scala version on the cluster. To find the Scala version that is included with a specific Databricks Runtime, see Databricks Runtime release notes versions and compatibility. If you use IntelliJ IDEA, you can choose the Scala version to use during Scala project creation.

To configure, build, and run your Scala project:

  1. In your project’s build.sbt file, take a dependency on the Databricks SDK for Java library by adding the following line to the end of the file, and then save the file:

    libraryDependencies += "com.databricks" % "databricks-sdk-java" % "0.2.0"
    

    Note

    Be sure to replace 0.2.0 with the latest version of the Databricks SDK for Java library. You can find the latest version in the Maven central repository.

  2. Instruct your project to take the declared dependency on the Databricks SDK for Java. For example, in IntelliJ IDEA, click the Load sbt changes notification icon.

  3. Add code to import the Databricks SDK for Java and to list all of the clusters in your Azure Databricks workspace. For example, in a project’s Main.scala file, the code might be as follows:

    import com.databricks.sdk.WorkspaceClient
    import com.databricks.sdk.service.compute.ListClustersRequest
    
    object Main {
      def main(args: Array[String]): Unit = {
        val w = new WorkspaceClient()
    
        w.clusters().list(new ListClustersRequest()).forEach{
          elem => println(elem.getClusterName)
        }
      }
    }
    

    Note

    By not setting any arguments in the preceding call to val w = new WorkspaceClient(), the Databricks SDK for Java uses its default process for trying to perform Azure Databricks authentication. To override this default behavior, see the preceding authentication section.

  4. Build your project. For example, to do this in IntelliJ IDEA, from the main menu, click Build > Build Project.

  5. Run your main file. For example, to do this in IntelliJ IDEA for a project’s Main.scala file, from the main menu, click Run > Run ‘Main.scala’.

  6. The list of clusters appears. For example, in IntelliJ IDEA, this is in the Run tool window. To display this tool window, from the main menu, click View > Tool Windows > Run.

Use Databricks Utilities and Scala with the Databricks SDK for Java

Databricks Utilities provides several helper functions to make it easy to work with object storage efficiently, chain and parameterize notebooks, and work with secrets. Databricks provides a Databricks Utilities for Scala library to enable you to programmatically access Databricks Utilities with Scala.

To call the Databricks Utilities for Scala, do the following:

  1. In your Scala project, declare a dependency on the Databricks SDK for Java, as described in the previous section.

  2. Declare a dependency on the Databricks Utilities for Scala library. For example, in your project’s build.sbt file, add the following line to the end of the file, and then save the file:

    libraryDependencies += "com.databricks" % "databricks-dbutils-scala_2.12" % "0.1.4"
    

    Note

    Be sure to replace 0.1.4 with the latest version of the Databricks Utilities for Scala library. You can find the latest version in the Maven central repository.

  3. Instruct your project to take the declared dependency on the Databricks Utilities for Scala. For example, in IntelliJ IDEA, click the Load sbt changes notification icon.

  4. Add code to import and then call the Databricks Utilities for Scala. For example, the following code automates a Unity Catalog volume. This example creates a file named zzz_hello.txt in the volume’s path within the workspace, reads the data from the file, and then deletes the file:

    import com.databricks.sdk.scala.dbutils.DBUtils
    
    object Main {
      def main(args: Array[String]): Unit = {
        val filePath = "/Volumes/main/default/my-volume/zzz_hello.txt"
        val fileData = "Hello, Databricks!"
        val dbutils = DBUtils.getDBUtils()
    
        dbutils.fs.put(
          file = filePath,
          contents = fileData,
          overwrite = true
        )
    
        println(dbutils.fs.head(filePath))
    
        dbutils.fs.rm(filePath)
      }
    }
    

    Note

    By not setting any arguments in the preceding call to val dbutils = DBUtils.getDBUtils(), Databricks Utilities for Scala uses its default process for trying to perform Azure Databricks authentication.

    To override this default behavior, pass an instantiated DatabricksConfig object as an argument to getDBUtils. For more information, see the preceding authentication section.

    Note, however, that if your code is running inside of the Databricks Runtime, this DatabricksConfig object is ignored. This is because the Databricks Utilities for Scala delegates to the built-in Databricks Utilities when running inside of the Databricks Runtime.

  5. Build your project and run your main file.

To access Unity Catalog volumes, use files within WorkspaceClient. See Manage files in Unity Catalog volumes. You cannot use DBUtils.getDBUtils() to access volumes.

Testing

To test your code, use Java test frameworks such as JUnit. To test your code under simulated conditions without calling Azure Databricks REST API endpoints or changing the state of your Azure Databricks accounts or workspaces, use Java mocking libraries such as Mockito.

For example, given the following file named Helpers.java containing a createCluster function that returns information about the new cluster:

// Helpers.java

import com.databricks.sdk.WorkspaceClient;
import com.databricks.sdk.service.compute.CreateCluster;
import com.databricks.sdk.service.compute.CreateClusterResponse;

public class Helpers {
  static CreateClusterResponse createCluster(
    WorkspaceClient w,
    CreateCluster   createCluster,
    String          clusterName,
    String          sparkVersion,
    String          nodeTypeId,
    Long            autoTerminationMinutes,
    Long            numWorkers
  ) {
    return w.clusters().create(
      createCluster
        .setClusterName(clusterName)
        .setSparkVersion(sparkVersion)
        .setNodeTypeId(nodeTypeId)
        .setAutoterminationMinutes(autoTerminationMinutes)
        .setNumWorkers(numWorkers)
    ).getResponse();
  }
}

And given the following file named Main.java that calls the createCluster function:

// Main.java

import com.databricks.sdk.WorkspaceClient;
import com.databricks.sdk.service.compute.CreateCluster;
import com.databricks.sdk.service.compute.CreateClusterResponse;

public class Main {
  public static void main(String[] args) {
    WorkspaceClient w = new WorkspaceClient();
    // Replace <spark-version> with the target Spark version string.
    // Replace <node-type-id> with the target node type string.
    CreateClusterResponse c = Helpers.createCluster(
      w,
      new CreateCluster(),
      "My Test Cluster",
      "<spark-version>",
      "<node-type-id>",
      15L,
      1L
    );
    System.out.println(c.getClusterId());
  }
}

The following file named HelpersTest.java tests whether the createCluster function returns the expected response. Rather than creating a cluster in the target workspace, this test mocks a WorkspaceClient object, defines the mocked object’s settings, and then passes the mocked object to the createCluster function. The test then checks whether the function returns the mocked response.

// HelpersTest.java

import com.databricks.sdk.WorkspaceClient;
import com.databricks.sdk.mixin.ClustersExt;
import com.databricks.sdk.service.compute.ClusterDetails;
import com.databricks.sdk.service.compute.CreateCluster;
import com.databricks.sdk.support.Wait;
import com.databricks.sdk.service.compute.CreateClusterResponse;
import org.junit.jupiter.api.Test;
import org.mockito.Mockito;
import static org.junit.jupiter.api.Assertions.assertEquals;

public class HelpersTest {
  @Test
  public void testCreateCluster() {
    WorkspaceClient mockWorkspaceClient = Mockito.mock(WorkspaceClient.class);
    ClustersExt mockClustersExt = Mockito.mock(ClustersExt.class);
    CreateCluster mockCreateCluster = new CreateCluster();
    Wait<ClusterDetails, CreateClusterResponse> mockWait = Mockito.mock(Wait.class);
    CreateClusterResponse mockResponse = Mockito.mock(CreateClusterResponse.class);

    Mockito.when(mockWorkspaceClient.clusters()).thenReturn(mockClustersExt);
    Mockito.when(mockClustersExt.create(Mockito.any(CreateCluster.class))).thenReturn(mockWait);
    Mockito.when(mockWait.getResponse()).thenReturn(mockResponse);

    // Replace <spark-version> with the target Spark version string.
    // Replace <node-type-id> with the target node type string.
    CreateClusterResponse response = Helpers.createCluster(
      mockWorkspaceClient,
      mockCreateCluster,
      "My Test Cluster",
      "<spark-version>",
      "<node-type-id>",
      15L,
      1L
    );
    assertEquals(mockResponse, response);
  }
}
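
If you prefer not to depend on a mocking library, a hand-written fake behind a small interface gives a similar effect. Note that this is a different technique from the Mockito test above. It is a minimal sketch: the ClusterCreator interface, the FakeClusterCreator class, and the createTestCluster method are all hypothetical names, and in real code a thin adapter would implement ClusterCreator by calling the SDK.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal sketch: test a helper against a hand-written fake instead of a mocked SDK client.
// Every type in this block is hypothetical and exists only for illustration.
final class FakeClientDemo {
  // The single operation that the helper needs from the workspace client.
  interface ClusterCreator {
    String create(String clusterName, long numWorkers); // returns the new cluster's ID
  }

  // Records each request and returns a predictable ID without calling any REST endpoint.
  static final class FakeClusterCreator implements ClusterCreator {
    final List<String> createdNames = new ArrayList<>();

    @Override
    public String create(String clusterName, long numWorkers) {
      createdNames.add(clusterName);
      return "fake-" + createdNames.size();
    }
  }

  // The code under test depends only on the interface, so a fake can stand in for the SDK.
  static String createTestCluster(ClusterCreator creator) {
    return creator.create("My Test Cluster", 1L);
  }

  public static void main(String[] args) {
    FakeClusterCreator fake = new FakeClusterCreator();
    String id = createTestCluster(fake);
    System.out.println(id + " " + fake.createdNames);
  }
}
```

The trade-off is one extra interface in exchange for tests that need no mocking dependency and that can also inspect what the helper asked the client to do.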

Additional resources

For more information, see: