Copy a blob with asynchronous scheduling using Python

This article shows how to copy a blob with asynchronous scheduling using the Azure Storage client library for Python. You can copy a blob from a source within the same storage account, from a source in a different storage account, or from any accessible object retrieved via HTTP GET request on a given URL. You can also abort a pending copy operation.

The client library methods covered in this article use the Copy Blob REST API operation, and can be used when you want to perform a copy with asynchronous scheduling. For most copy scenarios where you want to move data into a storage account and have a URL for the source object, see Copy a blob from a source object URL with Python.

Prerequisites

Set up your environment

If you don't have an existing project, this section shows you how to set up a project to work with the Azure Blob Storage client library for Python. For more details, see Get started with Azure Blob Storage and Python.

To work with the code examples in this article, follow these steps to set up your project.

Install packages

Install the following packages using pip install:

pip install azure-storage-blob azure-identity

Add import statements

Add the following import statements:

import datetime
from azure.identity import DefaultAzureCredential
from azure.storage.blob import (
    BlobServiceClient,
    BlobClient,
    BlobLeaseClient,
    BlobSasPermissions,
    generate_blob_sas
)

Authorization

The authorization mechanism must have the necessary permissions to perform a copy operation, or to abort a pending copy. For authorization with Microsoft Entra ID (recommended), the least privileged Azure RBAC built-in role varies based on several factors. To learn more, see the authorization guidance for Copy Blob (REST API) or Abort Copy Blob (REST API).

Create a client object

To connect an app to Blob Storage, create an instance of BlobServiceClient. The following example shows how to create a client object using DefaultAzureCredential for authorization:

# TODO: Replace <storage-account-name> with your actual storage account name
account_url = "https://<storage-account-name>.blob.core.windows.net"
credential = DefaultAzureCredential()

# Create the BlobServiceClient object
blob_service_client = BlobServiceClient(account_url, credential=credential)

You can also create client objects for specific containers or blobs, either directly or from the BlobServiceClient object. To learn more about creating and managing client objects, see Create and manage client objects that interact with data resources.
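For example, here's a minimal sketch, assuming the blob_service_client object from the previous example and placeholder container and blob names, that shows how to get container and blob client objects:

# Assumes `blob_service_client` from the previous example
# Get a client object for a specific container (name is a placeholder)
container_client = blob_service_client.get_container_client(container="sample-container")

# Get a client object for a specific blob, either from the service client...
blob_client = blob_service_client.get_blob_client(container="sample-container", blob="sample-blob.txt")

# ...or from the container client
blob_client = container_client.get_blob_client("sample-blob.txt")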

About copying blobs with asynchronous scheduling

The Copy Blob operation can finish asynchronously and is performed on a best-effort basis, which means that the operation isn't guaranteed to start immediately or complete within a specified time frame. The copy operation is scheduled in the background and performed as the server has available resources. The operation can complete synchronously if the copy occurs within the same storage account.

A Copy Blob operation can perform any of the following actions:

  • Copy a source blob to a destination blob with a different name. The destination blob can be an existing blob of the same blob type (block, append, or page), or it can be a new blob created by the copy operation.
  • Copy a source blob to a destination blob with the same name, which replaces the destination blob. This type of copy operation removes any uncommitted blocks and overwrites the destination blob's metadata.
  • Copy a source file in the Azure File service to a destination blob. The destination blob can be an existing block blob, or can be a new block blob created by the copy operation. Copying from files to page blobs or append blobs isn't supported.
  • Copy a snapshot over its base blob. By promoting a snapshot to the position of the base blob, you can restore an earlier version of a blob.
  • Copy a snapshot to a destination blob with a different name. The resulting destination blob is a writeable blob and not a snapshot.

The source blob for a copy operation may be one of the following types: block blob, append blob, page blob, blob snapshot, or blob version. The copy operation always copies the entire source blob or file. Copying a range of bytes or set of blocks isn't supported.

If the destination blob already exists, it must be of the same blob type as the source blob, and the existing destination blob is overwritten. The destination blob can't be modified while a copy operation is in progress, and a destination blob can only have one outstanding copy operation.

To learn more about the Copy Blob operation, including information about properties, index tags, metadata, and billing, see Copy Blob remarks.

Copy a blob with asynchronous scheduling

This section gives an overview of methods provided by the Azure Storage client library for Python to perform a copy operation with asynchronous scheduling.

The following method wraps the Copy Blob REST API operation and begins an asynchronous copy of data from the source blob:

  • start_copy_from_url

The start_copy_from_url method returns a dictionary containing copy_status and copy_id. The copy_status property is success if the copy completed synchronously, or pending if the copy was started asynchronously.
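For illustration, the following sketch assumes an existing destination BlobClient and an accessible source object URL, and shows how to read the values returned by start_copy_from_url:

# Assumes `destination_blob` is a BlobClient and `source_url` is an accessible source object URL
copy_operation = destination_blob.start_copy_from_url(source_url=source_url, requires_sync=False)

# copy_status is 'success' if the copy completed synchronously, or 'pending' if it was scheduled asynchronously
print(f"Copy status: {copy_operation['copy_status']}")
print(f"Copy ID: {copy_operation['copy_id']}")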

Copy a blob from a source within Azure

If you're copying a blob within the same storage account, the operation can complete synchronously. Access to the source blob can be authorized via Microsoft Entra ID, a shared access signature (SAS), or an account key. For an alternative synchronous copy operation, see Copy a blob from a source object URL with Python.

If the copy source is a blob in a different storage account, the operation can complete asynchronously. The source blob must either be public or authorized via a SAS token. The SAS token needs to include the Read ('r') permission. To learn more about SAS tokens, see Delegate access with shared access signatures.

The following example shows a scenario for copying a source blob from a different storage account with asynchronous scheduling. In this example, we create a source blob URL with an appended user delegation SAS token. The example shows how to generate the SAS token using the client library, but you can also provide your own. The example also shows how to lease the source blob during the copy operation to prevent changes to the blob from a different client. The Copy Blob operation saves the ETag value of the source blob when the copy operation starts. If the ETag value is changed before the copy operation finishes, the operation fails.

def copy_from_source_in_azure_async(self, source_blob: BlobClient, destination_blob: BlobClient, blob_service_client: BlobServiceClient):
    # Lease the source blob during copy to prevent other clients from modifying it
    lease = BlobLeaseClient(client=source_blob)

    sas_token = self.generate_user_delegation_sas(blob_service_client=blob_service_client, source_blob=source_blob)
    source_blob_sas_url = source_blob.url + "?" + sas_token

    # Create an infinite lease by passing -1 as the lease duration
    lease.acquire(lease_duration=-1)

    # Start the copy operation - specify False for the requires_sync parameter
    copy_operation = destination_blob.start_copy_from_url(source_url=source_blob_sas_url, requires_sync=False)
    
    # If start_copy_from_url returns copy_status of 'pending', the operation has been started asynchronously
    # You can optionally add logic here to wait for the copy operation to complete

    # Break the lease on the source blob
    lease.break_lease()

def generate_user_delegation_sas(self, blob_service_client: BlobServiceClient, source_blob: BlobClient):
    # Get a user delegation key
    delegation_key_start_time = datetime.datetime.now(datetime.timezone.utc)
    delegation_key_expiry_time = delegation_key_start_time + datetime.timedelta(hours=1)
    key = blob_service_client.get_user_delegation_key(
        key_start_time=delegation_key_start_time,
        key_expiry_time=delegation_key_expiry_time
    )

    # Create a SAS token that's valid for one hour, as an example
    sas_token = generate_blob_sas(
        account_name=blob_service_client.account_name,
        container_name=source_blob.container_name,
        blob_name=source_blob.blob_name,
        account_key=None,
        user_delegation_key=key,
        permission=BlobSasPermissions(read=True),
        expiry=datetime.datetime.now(datetime.timezone.utc) + datetime.timedelta(hours=1),
        start=datetime.datetime.now(datetime.timezone.utc)
    )

    return sas_token
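
The following sketch shows one way to call this method. It assumes the methods above are defined on a sample class (instantiated here as sample), and uses placeholder account, container, and blob names:

# Placeholders - replace with your own account, container, and blob names
source_account_url = "https://<source-storage-account-name>.blob.core.windows.net"
dest_account_url = "https://<destination-storage-account-name>.blob.core.windows.net"
credential = DefaultAzureCredential()

# The user delegation SAS is generated for the source blob, so pass the source account's service client
source_service_client = BlobServiceClient(source_account_url, credential=credential)
dest_service_client = BlobServiceClient(dest_account_url, credential=credential)

source_blob = source_service_client.get_blob_client(container="source-container", blob="sample-blob.txt")
destination_blob = dest_service_client.get_blob_client(container="dest-container", blob="sample-blob.txt")

sample.copy_from_source_in_azure_async(
    source_blob=source_blob,
    destination_blob=destination_blob,
    blob_service_client=source_service_client
)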

Note

User delegation SAS tokens offer greater security, as they're signed with Microsoft Entra credentials instead of an account key. To create a user delegation SAS token, the Microsoft Entra security principal needs appropriate permissions. For authorization requirements, see Get User Delegation Key.

Copy a blob from a source outside of Azure

You can perform a copy operation on any source object that can be retrieved via HTTP GET request on a given URL, including accessible objects outside of Azure. The following example shows a scenario for copying a blob from an accessible source object URL.

def copy_from_external_source_async(self, source_url: str, destination_blob: BlobClient):
    # Start the copy operation - specify False for the requires_sync parameter
    copy_operation = destination_blob.start_copy_from_url(source_url=source_url, requires_sync=False)
    
    # If start_copy_from_url returns copy_status of 'pending', the operation has been started asynchronously
    # You can optionally add logic here to wait for the copy operation to complete

Check the status of a copy operation

To check the status of an asynchronous Copy Blob operation, you can poll the get_blob_properties method and check the copy status.

The following code example shows how to check the status of a pending copy operation:

def check_copy_status(self, destination_blob: BlobClient):
    # Get the copy status from the destination blob properties
    copy_status = destination_blob.get_blob_properties().copy.status

    return copy_status
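
If you need to block until the copy finishes, one possible approach is to poll the blob properties until the copy status is no longer pending. The following sketch uses the standard-library time module and an arbitrary polling interval:

import time

def wait_for_copy(self, destination_blob: BlobClient, polling_interval_seconds: int = 5):
    # Poll the destination blob properties until the copy operation is no longer pending
    copy_status = destination_blob.get_blob_properties().copy.status
    while copy_status == 'pending':
        time.sleep(polling_interval_seconds)
        copy_status = destination_blob.get_blob_properties().copy.status

    # The final status is 'success', 'aborted', or 'failed'
    return copy_status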

Abort a copy operation

Aborting a pending Copy Blob operation results in a destination blob of zero length. However, the metadata for the destination blob has the new values copied from the source blob or set explicitly during the copy operation. To keep the original metadata from before the copy, make a snapshot of the destination blob before calling one of the copy methods.
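
For example, the following sketch takes a snapshot of the destination blob before starting the copy, so the pre-copy state is preserved if the copy is later aborted. It assumes an existing destination BlobClient and an accessible source object URL:

# Assumes `destination_blob` is a BlobClient and `source_url` is an accessible source object URL

# Take a snapshot to preserve the destination blob's current content and metadata
snapshot_info = destination_blob.create_snapshot()
print(f"Created snapshot: {snapshot_info['snapshot']}")

# Start the copy operation - if it's later aborted, the snapshot preserves the original state
copy_operation = destination_blob.start_copy_from_url(source_url=source_url, requires_sync=False)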

To abort a pending copy operation, call the following method:

  • abort_copy

This method wraps the Abort Copy Blob REST API operation, which cancels a pending Copy Blob operation. The following code example shows how to abort a pending Copy Blob operation:

def abort_copy(self, destination_blob: BlobClient):
    # Get the copy operation details from the destination blob properties
    copy_props = destination_blob.get_blob_properties().copy
    copy_status = copy_props.status
    copy_id = copy_props.id

    # Check the copy status and abort if pending
    if copy_status == 'pending':
        destination_blob.abort_copy(copy_id)
        print(f"Copy operation {copy_id} has been aborted")

Resources

To learn more about copying blobs with asynchronous scheduling using the Azure Blob Storage client library for Python, see the following resources.

Code samples

REST API operations

The Azure SDK for Python contains libraries that build on top of the Azure REST API, allowing you to interact with REST API operations through familiar Python paradigms. The client library methods covered in this article use the following REST API operations:

  • Copy Blob (REST API)
  • Abort Copy Blob (REST API)

Client library resources

  • This article is part of the Blob Storage developer guide for Python. To learn more, see the full list of developer guide articles at Build your Python app.