We want to recursively loop through all folders and files of Blob Storage for a given container using Python in an Azure Data Factory custom activity, and write the data to a blob as a .CSV file

manish verma 501 Reputation points
2025-01-02T07:49:43.5366667+00:00

Hi All,

We want to recursively loop through all folders and files of Blob Storage for a given container using Python in an Azure Data Factory custom activity, and write the data to a blob as a .CSV file.

Also, how can we connect to Blob Storage from the ADF custom activity using a SAS key instead of the blob access key?


1 answer

  1. AnnuKumari-MSFT 33,986 Reputation points Microsoft Employee
    2025-01-02T18:01:53.44+00:00

    Hi manish verma,

    Thank you for using the Microsoft Q&A platform and for posting your query here.

    To connect to Blob Storage with a SAS key, generate a SAS token for your container with list and write permissions, then include the SAS token in the connection string (or pass it as the credential) for authentication.
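    As a minimal illustration (the account name and token below are hypothetical placeholders), the SAS token can either be passed separately as the credential or appended to the account URL as a query string:

    ```python
    # Hypothetical placeholder values - substitute your own account name and SAS token.
    account_name = "mystorageacct"                       # assumed storage account name
    sas_token = "sv=2024-05-04&ss=b&sp=rlw&sig=abc123"   # assumed SAS token (no leading '?')

    account_url = f"https://{account_name}.blob.core.windows.net"

    # Option 1: pass the token separately, e.g.
    #   BlobServiceClient(account_url=account_url, credential=sas_token)
    # Option 2: embed the token directly in the URL:
    sas_url = f"{account_url}?{sas_token.lstrip('?')}"
    print(sas_url)
    ```
    
    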

    Here is sample code you can try to iterate through all the blobs in the container:

    from azure.storage.blob import BlobServiceClient
    import csv
    import io
    
    # Parameters
    sas_token = "<YOUR_SAS_TOKEN>"  # Replace with your SAS token (needs list and write permissions)
    storage_account_url = "https://<YOUR_STORAGE_ACCOUNT_NAME>.blob.core.windows.net/"
    container_name = "<YOUR_CONTAINER_NAME>"
    output_blob_name = "output.csv"
    
    # Initialize the clients, using the SAS token as the credential
    blob_service_client = BlobServiceClient(account_url=storage_account_url, credential=sas_token)
    container_client = blob_service_client.get_container_client(container_name)
    
    # List every blob in the container. list_blobs() returns a flat listing,
    # so blobs in all (virtual) subfolders are included automatically.
    def list_blobs_recursively(container_client):
        blob_data = []
        for blob in container_client.list_blobs():
            blob_data.append({"name": blob.name, "size": blob.size, "last_modified": blob.last_modified})
        return blob_data
    
    # Fetch all blobs
    blobs = list_blobs_recursively(container_client)
    
    # Write the blob details to an in-memory CSV
    output = io.StringIO()
    csv_writer = csv.DictWriter(output, fieldnames=["name", "size", "last_modified"])
    csv_writer.writeheader()
    csv_writer.writerows(blobs)
    
    # Upload the CSV back to Blob Storage (overwriting any existing blob of that name)
    blob_client = container_client.get_blob_client(output_blob_name)
    blob_client.upload_blob(output.getvalue(), overwrite=True)
    
    print(f"CSV file '{output_blob_name}' created successfully and uploaded to Blob Storage.")
    
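    Since the listing is flat, the folder hierarchy is only implied by the `/` separators in the blob names. If you also want the virtual folder paths themselves (for example, as an extra CSV column), a small hypothetical helper like this can derive them with the standard library alone:

    ```python
    import posixpath

    def folders_from_blob_names(blob_names):
        """Collect every virtual 'folder' prefix implied by flat blob names."""
        folders = set()
        for name in blob_names:
            # Walk up the path, recording each intermediate prefix
            prefix = posixpath.dirname(name)
            while prefix:
                folders.add(prefix)
                prefix = posixpath.dirname(prefix)
        return sorted(folders)

    # Example with assumed blob names
    names = ["raw/2025/01/data.csv", "raw/2025/02/data.csv", "curated/report.csv"]
    print(folders_from_blob_names(names))
    # → ['curated', 'raw', 'raw/2025', 'raw/2025/01', 'raw/2025/02']
    ```
    
    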

    Hope this helps. Thank you.

