Use Java to manage directories and files in Azure Data Lake Storage

This article shows you how to use Java to create and manage directories and files in storage accounts that have a hierarchical namespace.

To learn about how to get, set, and update the access control lists (ACL) of directories and files, see Use .Java to manage ACLs in Azure Data Lake Storage.

Package (Maven) | Samples | API reference | Gen1 to Gen2 mapping | Give Feedback

Prerequisites

Set up your project

To get started, open this page and find the latest version of the Java library. Then, open the pom.xml file in your text editor. Add a dependency element that references that version.

If you plan to authenticate your client application by using Microsoft Entra ID, then add a dependency to the Azure Identity library. For more information, see Azure Identity client library for Java.

Next, add these imports statements to your code file.

import com.azure.identity.*;
import com.azure.storage.common.StorageSharedKeyCredential;
import com.azure.core.http.rest.PagedIterable;
import com.azure.core.util.BinaryData;
import com.azure.storage.file.datalake.*;
import com.azure.storage.file.datalake.models.*;
import com.azure.storage.file.datalake.options.*;

Note

Multi-protocol access on Data Lake Storage enables applications to use both Blob APIs and Data Lake Storage Gen2 APIs to work with data in storage accounts with hierarchical namespace (HNS) enabled. When working with capabilities unique to Data Lake Storage Gen2, such as directory operations and ACLs, use the Data Lake Storage Gen2 APIs, as shown in this article.

When choosing which APIs to use in a given scenario, consider the workload and the needs of your application, along with the known issues and impact of HNS on workloads and applications.

Authorize access and connect to data resources

To work with the code examples in this article, you need to create an authorized DataLakeServiceClient instance that represents the storage account. You can authorize a DataLakeServiceClient object using Microsoft Entra ID, an account access key, or a shared access signature (SAS).

You can use the Azure identity client library for Java to authenticate your application with Microsoft Entra ID.

Create a DataLakeServiceClient instance and pass in a new instance of the DefaultAzureCredential class.

static public DataLakeServiceClient GetDataLakeServiceClient(String accountName){
    DefaultAzureCredential defaultCredential = new DefaultAzureCredentialBuilder().build();

    DataLakeServiceClient dataLakeServiceClient = new DataLakeServiceClientBuilder()
        .endpoint("https://" + accountName + ".dfs.core.windows.net")
        .credential(defaultCredential)
        .buildClient();

    return dataLakeServiceClient;
}

To learn more about using DefaultAzureCredential to authorize access to data, see Azure Identity client library for Java.

Create a container

A container acts as a file system for your files. You can create a container by using the following method:

The following code example creates a container and returns a DataLakeFileSystemClient object for later use:

public DataLakeFileSystemClient CreateFileSystem(
        DataLakeServiceClient serviceClient,
        String fileSystemName) {

    DataLakeFileSystemClient fileSystemClient = serviceClient.createFileSystem(fileSystemName);

    return fileSystemClient;
}

Create a directory

You can create a directory reference in the container by using the following method:

The following code example adds a directory to a container, then adds a subdirectory and returns a DataLakeDirectoryClient object for later use:

public DataLakeDirectoryClient CreateDirectory(
        DataLakeFileSystemClient fileSystemClient,
        String directoryName,
        String subDirectoryName) {

    DataLakeDirectoryClient directoryClient = fileSystemClient.createDirectory(directoryName);

    return directoryClient.createSubdirectory(subDirectoryName);
}

Rename or move a directory

You can rename or move a directory by using the following method:

Pass the path of the desired directory as a parameter. The following code example shows how to rename a subdirectory:

public DataLakeDirectoryClient RenameDirectory(
        DataLakeFileSystemClient fileSystemClient,
        String directoryPath,
        String subdirectoryName,
        String subdirectoryNameNew) {

    DataLakeDirectoryClient directoryClient = fileSystemClient
            .getDirectoryClient(String.join("/", directoryPath, subdirectoryName));

    return directoryClient.rename(
            fileSystemClient.getFileSystemName(),
            String.join("/", directoryPath, subdirectoryNameNew));
}

The following code example shows how to move a subdirectory from one directory to a different directory:

public DataLakeDirectoryClient MoveDirectory(
        DataLakeFileSystemClient fileSystemClient,
        String directoryPathFrom,
        String directoryPathTo,
        String subdirectoryName) {

    DataLakeDirectoryClient directoryClient = fileSystemClient
            .getDirectoryClient(String.join("/", directoryPathFrom, subdirectoryName));

    return directoryClient.rename(
            fileSystemClient.getFileSystemName(),
            String.join("/", directoryPathTo, subdirectoryName));
}

Upload a file to a directory

You can upload content to a new or existing file by using the following method:

The following code example shows how to upload a local file to a directory using the uploadFromFile method:

public void UploadFile(
        DataLakeDirectoryClient directoryClient,
        String fileName) {

    DataLakeFileClient fileClient = directoryClient.getFileClient(fileName);

    fileClient.uploadFromFile("filePath/sample-file.txt");
}

You can use this method to create and upload content to a new file, or you can set the overwrite parameter to true to overwrite an existing file.

Append data to a file

You can upload data to be appended to a file by using the following method:

The following code example shows how to append data to the end of a file using these steps:

  • Create a DataLakeFileClient object to represent the file resource you're working with.
  • Upload data to the file using the DataLakeFileClient.append method.
  • Complete the upload by calling the DataLakeFileClient.flush method to write the previously uploaded data to the file.
public void AppendDataToFile(
        DataLakeDirectoryClient directoryClient) {

    DataLakeFileClient fileClient = directoryClient.getFileClient("sample-file.txt");
    long fileSize = fileClient.getProperties().getFileSize();

    String sampleData = "Data to append to end of file";
    fileClient.append(BinaryData.fromString(sampleData), fileSize);

    fileClient.flush(fileSize + sampleData.length(), true);
}

Download from a directory

The following code example shows how to download a file from a directory to a local file using these steps:

  • Create a DataLakeFileClient object to represent the file that you want to download.
  • Use the DataLakeFileClient.readToFile method to read the file. This example sets the overwrite parameter to true, which overwrites an existing file.
public void DownloadFile(
        DataLakeDirectoryClient directoryClient,
        String fileName) {

    DataLakeFileClient fileClient = directoryClient.getFileClient(fileName);

    fileClient.readToFile("filePath/sample-file.txt", true);
}

List directory contents

You can list directory contents by using the following method and enumerating the result:

Enumerating the paths in the result may make multiple requests to the service while fetching the values.

The following code example prints the names of each file that is located in a directory:

public void ListFilesInDirectory(
        DataLakeFileSystemClient fileSystemClient,
        String directoryName) {

    ListPathsOptions options = new ListPathsOptions();
    options.setPath(directoryName);

    PagedIterable<PathItem> pagedIterable = fileSystemClient.listPaths(options, null);

    java.util.Iterator<PathItem> iterator = pagedIterable.iterator();
    PathItem item = iterator.next();

    while (item != null) {
        System.out.println(item.getName());

        if (!iterator.hasNext()) {
            break;
        }
        item = iterator.next();
    }

}

Delete a directory

You can delete a directory by using one of the following methods:

The following code example uses deleteWithResponse to delete a nonempty directory and all paths beneath the directory:

public void DeleteDirectory(
        DataLakeFileSystemClient fileSystemClient,
        String directoryName) {

    DataLakeDirectoryClient directoryClient = fileSystemClient.getDirectoryClient(directoryName);

    // Set to true to delete all paths beneath the directory
    boolean recursive = true;

    directoryClient.deleteWithResponse(recursive, null, null, null);
}

See also