Azure Storage Blob NIO FileSystemProvider

This package allows you to interact with Azure Blob Storage through the standard Java NIO Filesystem APIs.

Source code | API reference documentation | REST API documentation | Product documentation | Samples

Getting started

Prerequisites

Include the package

<dependency>
    <groupId>com.azure</groupId>
    <artifactId>azure-storage-blob-nio</artifactId>
    <version>12.0.0-beta.28</version>
</dependency>

Create a Storage Account

To create a Storage Account you can use the Azure Portal or Azure CLI.

az storage account create \
    --resource-group <resource-group-name> \
    --name <storage-account-name> \
    --location <location>

Authenticate the client

The simplest way to interact with the Storage Service is to create an instance of the FileSystem class using the FileSystems API. To make this possible you'll need either the Account SAS (shared access signature) string of the Storage Account or a Shared Key. Learn more at SAS Token and Shared Key.

Get credentials

SAS Token

a. Use the Azure CLI snippet below to get the SAS token from the Storage Account.

az storage blob generate-sas \
    --account-name {Storage Account name} \
    --container-name {container name} \
    --name {blob name} \
    --permissions {permissions to grant} \
    --expiry {datetime to expire the SAS token}

Example:

CONNECTION_STRING=<connection-string>

az storage blob generate-sas \
    --account-name MyStorageAccount \
    --container-name MyContainer \
    --name MyBlob \
    --permissions racdw \
    --expiry 2020-06-15

b. Alternatively, get the Account SAS Token from the Azure Portal.

  1. Go to your Storage Account
  2. Select Shared access signature from the menu on the left
  3. Click on Generate SAS and connection string (after setup)

Shared Key Credential

Use Account name and Account key. Account name is your Storage Account name.

  1. Go to your Storage Account
  2. Select Access keys from the menu on the left
  3. Under key1/key2 copy the contents of the Key field

Key concepts

NIO on top of Blob Storage is designed for:

  • Working with Blob Storage as though it were a local file system
  • Random access reads on large blobs without downloading the entire blob
  • Uploading full files as blobs
  • Creating and navigating a directory structure within an account
  • Reading and setting attributes on blobs

Design Notes

It is important to recognize that Azure Blob Storage is not a true FileSystem, nor is it the goal of this project to force Azure Blob Storage to act like a full-fledged FileSystem. While providing FileSystem APIs on top of Azure Blob Storage can offer convenience and ease of access in certain cases, trying to force the Storage service to work in scenarios it is not designed for is bound to introduce performance and stability problems.

To that end, this project will only offer APIs that can be sensibly and cleanly built on top of Azure Blob Storage APIs. We recognize that this will leave some scenarios unsupported indefinitely, but we would rather offer a product that works predictably and reliably in its well defined scenarios than eagerly support all possible scenarios at the expense of quality. Even still, supporting some fundamentally required use cases, such as directories, can result in unexpected behavior due to the difference between blob storage and a file system. The javadocs on each type and method should therefore be read and understood for ways in which they may diverge from the standard specified by the JDK.

Moreover, even from within a given application, it should be remembered that using a remote FileSystem introduces higher latency. Because of this, particular care must be taken when managing concurrency. Race conditions are more likely to manifest, network failures occur more frequently than disk failures, and other such distributed application scenarios must be considered when working with this FileSystem. While the AzureFileSystem will ensure it takes appropriate steps towards robustness and reliability, the application developer must also design around these failure scenarios and have fallback and retry options available.
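As one way to picture the "fallback and retry options" mentioned above, the following is a minimal sketch of a retry wrapper with exponential backoff around an I/O action. `RetrySketch`, `IoAction`, and `withRetry` are hypothetical names introduced for illustration; they are not part of this package, and the attempt count and delays are arbitrary assumptions.

```java
import java.io.IOException;

/** Hypothetical helper for illustration; not part of azure-storage-blob-nio. */
final class RetrySketch {

    /** An I/O action that may fail with an IOException. */
    @FunctionalInterface
    interface IoAction<T> {
        T run() throws IOException;
    }

    /**
     * Runs the action up to maxAttempts times, doubling the pause after each failure.
     * Rethrows the last IOException if every attempt fails.
     */
    static <T> T withRetry(IoAction<T> action, int maxAttempts, long initialDelayMillis)
            throws IOException, InterruptedException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        long delay = initialDelayMillis;
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.run();
            } catch (IOException e) {
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(delay); // back off before the next attempt
                    delay *= 2;
                }
            }
        }
        throw last; // non-null: at least one attempt ran and failed
    }
}
```

A call such as `RetrySketch.withRetry(() -> Files.size(filePath), 3, 200)` would retry a read-only lookup. Retrying is only safe for operations that can be repeated without side effects, so writes and deletes need more careful handling.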

The view of the FileSystem from within an instance of the JVM will be consistent, but the AzureFileSystem makes no guarantees on behavior or state should other processes operate on the same data. The AzureFileSystem will assume that it has exclusive access to the resources stored in Azure Blob Storage and will behave without regard for potential interfering applications.

Finally, this implementation has currently chosen to always read/write directly to/from Azure Storage without a local cache. Our team has determined that with the tradeoffs of complexity, correctness, safety, performance, debuggability, etc. one option is not inherently better than the other and that this choice most directly addresses the current known use cases for this project. While this has consequences for every API, of particular note are the limitations on writing data. Data may only be written as an entire file (i.e., random IO or appends are not supported), and data is not committed or available to be read until the write stream is closed.
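Because appends are not supported, an application that needs append-like behavior has to rewrite the whole file. The sketch below shows one possible workaround using only standard NIO calls: read the existing content, then write it back together with the new bytes in a single stream. `AppendByRewriteSketch` is a hypothetical helper, not something this package provides. It buffers the whole file in memory, is not atomic, and assumes no other writer touches the file in between.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper for illustration; not part of azure-storage-blob-nio. */
final class AppendByRewriteSketch {

    /** Emulates an append by rewriting the entire file with the old and new bytes. */
    static void appendByRewrite(Path file, byte[] extra) throws IOException {
        // Step 1: buffer the current content, if the file exists.
        ByteArrayOutputStream existing = new ByteArrayOutputStream();
        if (Files.exists(file)) {
            try (InputStream is = Files.newInputStream(file)) {
                byte[] buf = new byte[8192];
                int n;
                while ((n = is.read(buf)) != -1) {
                    existing.write(buf, 0, n);
                }
            }
        }
        // Step 2: write the whole file again; nothing is readable until close().
        try (OutputStream os = Files.newOutputStream(file)) {
            existing.writeTo(os);
            os.write(extra);
        }
    }
}
```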

Examples

The following sections provide code snippets covering some of the most common Azure Storage Blob NIO tasks.

URI format

URIs are the fundamental way of identifying a resource. This package defines its URI format as follows:

The scheme for this provider is "azb", and the format of the URI to identify an AzureFileSystem is "azb://?endpoint=<endpoint>". The endpoint of the Storage account is used to uniquely identify the filesystem.
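To make the URI shape concrete, here is a small sketch that builds an "azb" URI from an endpoint and reads the endpoint back using only java.net.URI. `AzbUriSketch` and its methods are hypothetical helpers for illustration, and the endpoint value in the usage note is a placeholder; the provider performs its own parsing.

```java
import java.net.URI;
import java.net.URISyntaxException;

/** Hypothetical helper for illustration; not part of azure-storage-blob-nio. */
final class AzbUriSketch {

    private static final String SCHEME = "azb";
    private static final String ENDPOINT_PREFIX = "endpoint=";

    /** Builds a URI of the form azb://?endpoint=<endpoint>. */
    static URI fileSystemUri(String endpoint) throws URISyntaxException {
        return new URI(SCHEME + "://?" + ENDPOINT_PREFIX + endpoint);
    }

    /** Extracts the endpoint from the query of an azb URI. */
    static String endpointOf(URI uri) {
        if (!SCHEME.equals(uri.getScheme())) {
            throw new IllegalArgumentException("Expected scheme '" + SCHEME + "': " + uri);
        }
        String query = uri.getQuery();
        if (query == null || !query.startsWith(ENDPOINT_PREFIX)) {
            throw new IllegalArgumentException("Missing endpoint query parameter: " + uri);
        }
        return query.substring(ENDPOINT_PREFIX.length());
    }
}
```

For example, `AzbUriSketch.fileSystemUri("https://myaccount.blob.core.windows.net")` yields a URI whose scheme is "azb" and whose query carries the endpoint unchanged.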

The root component, if it is present, is the first element of the path and is denoted by a ':' as the last character. Hence, only one instance of ':' may appear in a path string, and it may only be the last character of the first element in the path. The root component is used to identify which container a path belongs to.

All other path elements, including separators, are considered as the blob name. AzurePath#fromBlobUrl may be used to convert a typical http url pointing to a blob into an AzurePath object pointing to the same resource.
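The path rules above can be illustrated with a plain string check. The sketch below splits a path string into a container name (the root without its trailing ':') and a blob name, and rejects strings that break the single-':' rule. It assumes '/' is the path separator, and `AzurePathFormatSketch` is a hypothetical helper that only mirrors the description here; it is not the library's parser.

```java
/** Hypothetical helper for illustration; not part of azure-storage-blob-nio. */
final class AzurePathFormatSketch {

    /**
     * Splits a path string into { containerName, blobName }.
     * containerName is null when the path has no root component.
     */
    static String[] split(String path) {
        int colon = path.indexOf(':');
        if (colon == -1) {
            return new String[] {null, path}; // no root: the whole string is the blob name
        }
        if (path.indexOf(':', colon + 1) != -1) {
            throw new IllegalArgumentException("Only one ':' may appear in a path: " + path);
        }
        int firstSlash = path.indexOf('/'); // assumption: '/' separates path elements
        int endOfFirstElement = firstSlash == -1 ? path.length() : firstSlash;
        if (colon != endOfFirstElement - 1) {
            throw new IllegalArgumentException(
                "':' must be the last character of the first element: " + path);
        }
        String container = path.substring(0, colon);
        String blobName = firstSlash == -1 ? "" : path.substring(firstSlash + 1);
        return new String[] {container, blobName};
    }
}
```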

Create a FileSystem

Create a FileSystem using the shared key retrieved above.

Note that you can further configure the file system using constants available in AzureFileSystem. Please see the docs for AzureFileSystemProvider for a full explanation of initializing and configuring a filesystem.

Map<String, Object> config = new HashMap<>();
String stores = "<container_name>,<another_container_name>"; // A comma separated list of container names
StorageSharedKeyCredential credential = new StorageSharedKeyCredential("<account_name>", "<account_key>");
config.put(AzureFileSystem.AZURE_STORAGE_SHARED_KEY_CREDENTIAL, credential);
config.put(AzureFileSystem.AZURE_STORAGE_FILE_STORES, stores);
FileSystem myFs = FileSystems.newFileSystem(new URI("azb://?endpoint=<account_endpoint>"), config);

Create a directory

Create a directory using the Files API.

Path dirPath = myFs.getPath("dir");
Files.createDirectory(dirPath);

Iterate over directory contents

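Iterate over a directory using a DirectoryStream. Open the stream in a try-with-resources block so that it is closed when iteration finishes.

try (DirectoryStream<Path> stream = Files.newDirectoryStream(dirPath)) {
    for (Path p : stream) {
        System.out.println(p.toString());
    }
}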

Read a file

Read the contents of a file using an InputStream. Skipping, marking, and resetting are all supported.

Path filePath = myFs.getPath("file");
try (InputStream is = Files.newInputStream(filePath)) {
    is.read();
}
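Since skipping is supported, a range of bytes can be read without consuming the start of the stream in application code. The following is a minimal sketch using only standard java.io and java.nio calls; `RangeReadSketch` and `readRange` are hypothetical names for illustration, and the loops account for `skip` and `read` returning fewer bytes than requested.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

/** Hypothetical helper for illustration; not part of azure-storage-blob-nio. */
final class RangeReadSketch {

    /** Reads up to length bytes starting at offset; returns fewer bytes at end of file. */
    static byte[] readRange(Path file, long offset, int length) throws IOException {
        try (InputStream is = Files.newInputStream(file)) {
            long toSkip = offset;
            while (toSkip > 0) {
                long skipped = is.skip(toSkip);
                if (skipped <= 0) {
                    // skip() made no progress: read one byte to tell "stalled" from end of stream
                    if (is.read() < 0) {
                        break;
                    }
                    skipped = 1;
                }
                toSkip -= skipped;
            }
            byte[] buffer = new byte[length];
            int filled = 0;
            while (filled < length) {
                int n = is.read(buffer, filled, length - filled);
                if (n < 0) {
                    break; // end of stream before the requested length
                }
                filled += n;
            }
            return Arrays.copyOf(buffer, filled);
        }
    }
}
```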

Write to a file

Write to a file. Only writing whole files is supported. Random IO is not supported. The stream must be closed in order to guarantee that the data is available to be read.

try (OutputStream os = Files.newOutputStream(filePath)) {
    os.write(0);
}
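A common whole-file write is uploading a local file. The sketch below streams the bytes of one Path into another using only standard NIO calls, so the source can live on the default file system and the target on another provider. `UploadSketch` and `upload` are hypothetical names for illustration, and the buffer size is an arbitrary choice.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper for illustration; not part of azure-storage-blob-nio. */
final class UploadSketch {

    /** Copies all bytes from source to target and returns the number of bytes written. */
    static long upload(Path source, Path target) throws IOException {
        try (InputStream in = Files.newInputStream(source);
             OutputStream out = Files.newOutputStream(target)) {
            byte[] buffer = new byte[8192];
            long total = 0;
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
                total += n;
            }
            // Both streams are closed when the try block exits, before the caller sees the result.
            return total;
        }
    }
}
```

Because the output stream is closed by the try-with-resources block before `upload` returns, the target can be read as soon as the call completes.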

Copy a file

Path destinationPath = myFs.getPath("destinationFile");
Files.copy(filePath, destinationPath, StandardCopyOption.COPY_ATTRIBUTES);

Delete a file

Files.delete(filePath);

Read attributes on a file

Read attributes of a file through the AzureBlobFileAttributes.

AzureBlobFileAttributes attr = Files.readAttributes(filePath, AzureBlobFileAttributes.class);
BlobHttpHeaders headers = attr.blobHttpHeaders();

Or read attributes dynamically by specifying a string of desired attributes. This will not improve performance as a call to retrieve any attribute will always retrieve all of them as an atomic bulk operation. You may specify "*" instead of a list of specific attributes to have all attributes returned in the map.

Map<String, Object> attributes = Files.readAttributes(filePath, "azureBlob:metadata,headers");
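The attribute string follows the standard NIO form "view:name1,name2", with "*" selecting everything in the view. As a rough sketch, a small helper can assemble that string and pass it to Files.readAttributes. `AttributeSpecSketch` is a hypothetical name for illustration; the view and attribute names a caller passes in must be ones the file system actually supports.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;

/** Hypothetical helper for illustration; not part of azure-storage-blob-nio. */
final class AttributeSpecSketch {

    /** Builds "view:name1,name2", or "view:*" when no names are given. */
    static String spec(String view, String... names) {
        return view + ":" + (names.length == 0 ? "*" : String.join(",", names));
    }

    /** Reads the named attributes of a view into a map keyed by attribute name. */
    static Map<String, Object> readNamed(Path file, String view, String... names)
            throws IOException {
        return Files.readAttributes(file, spec(view, names));
    }
}
```

For instance, `AttributeSpecSketch.spec("azureBlob", "metadata", "headers")` produces the same string used in the snippet above, and `spec("azureBlob")` produces "azureBlob:*".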

Write attributes to a file

Set attributes of a file through the AzureBlobFileAttributeView.

AzureBlobFileAttributeView view = Files.getFileAttributeView(filePath, AzureBlobFileAttributeView.class);
view.setMetadata(Collections.emptyMap());

Or set an attribute dynamically by specifying the attribute as a string.

Files.setAttribute(filePath, "azureBlob:blobHttpHeaders", new BlobHttpHeaders());

Troubleshooting

When using the NIO implementation for Azure Blob Storage, errors returned by the service are manifested as an IOException which wraps a BlobStorageException having the same HTTP status codes returned for REST API requests. For example, if you try to read a file that doesn't exist in your Storage Account, a 404 error is returned, indicating Not Found.
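Since the service error arrives wrapped inside an IOException, application code usually has to walk the cause chain to reach it. The following generic sketch looks for the first cause of a given type; `CauseLookupSketch` and `findCause` are hypothetical names for illustration. With the library on the classpath, the type to look for would be BlobStorageException, whose status code can then be inspected.

```java
import java.util.Optional;

/** Hypothetical helper for illustration; not part of azure-storage-blob-nio. */
final class CauseLookupSketch {

    /** Returns the first throwable in the cause chain (including error itself) of the given type. */
    static <T extends Throwable> Optional<T> findCause(Throwable error, Class<T> type) {
        Throwable current = error;
        while (current != null) {
            if (type.isInstance(current)) {
                return Optional.of(type.cast(current));
            }
            current = current.getCause(); // null once the end of the chain is reached
        }
        return Optional.empty();
    }
}
```

In a catch block for IOException, a call such as `CauseLookupSketch.findCause(e, BlobStorageException.class)` would give access to the wrapped exception, so that a 404 can be handled differently from other failures.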

Default HTTP Client

All client libraries by default use the Netty HTTP client. Adding the above dependency will automatically configure the client library to use the Netty HTTP client. Configuring or changing the HTTP client is detailed in the HTTP clients wiki.

Default SSL library

All client libraries, by default, use the Tomcat-native Boring SSL library to enable native-level performance for SSL operations. The Boring SSL library is an uber jar containing native libraries for Linux / macOS / Windows, and provides better performance compared to the default SSL implementation within the JDK. For more information, including how to reduce the dependency size, refer to the performance tuning section of the wiki.

Continued development

This project is still actively being developed in an effort to move from preview to GA. Below is a list of features that are not currently supported but are under consideration and may be added before GA. We welcome feedback and input on which of these may be most useful and are open to suggestions for items not included in this list. While all of these items are being considered, they have not been investigated and designed, and therefore we cannot confirm their feasibility within Azure Blob Storage. Further investigation may reveal that a feature is not possible or conflicts with established design goals, in which case it will not ultimately be supported.

  • Symbolic links
  • Hard links
  • Hidden files
  • Random writes
  • File locks
  • Read only files or file stores
  • Watches on directory events
  • Support for other Azure Storage services such as ADLS Gen 2 (Datalake) and Azure Files (shares)
  • Token authentication
  • Multi-account filesystems
  • Delegating access to single files
  • Normalizing directory structure of data upon loading a FileSystem
  • Local caching
  • Other OpenOptions such as append or dsync
  • Flags to toggle certain behaviors such as FileStore (container) creation, etc.

Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution.

When you submit a pull request, a CLA-bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., label, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.
