Fastest way to rename/copy 4 million files from one Azure Blob Storage account to another (or within the same one), with destination folders and file names that differ from the source (based on a mapping file or table record for each file)

Keyur Kachhadia 0 Reputation points
2025-02-21T20:06:22.6466667+00:00

We need to copy/rename more than 4 million small files in Azure Blob Storage.

We have the source folder and file name, and the mapping to the target folder and file name, in a table; we could have it in a mapping text file as well.

We already have the source files in Azure Blob Storage.

We have a data migration activity for an application, after which this mapping info will be generated. Thus, before the actual go-live, if copying within the same blob storage is faster, we can pre-copy the source files as-is to the target blob (see the sketch below). However, the actual mapping, and hence the renaming or copying activity, can happen only after the migration is done, and for that we have a limited go-live time window.
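
For illustration, a minimal sketch of that pre-copy step - a single server-side azcopy job that clones the source container as-is before go-live. The account names, container names, and SAS tokens below are placeholders, not real values:

```bash
# Hypothetical pre-copy: one server-side job; data does not flow through a VM.
# The URLs and SAS tokens are placeholders.
azcopy copy \
  "https://srcaccount.blob.core.windows.net/srccontainer?<source-sas>" \
  "https://dstaccount.blob.core.windows.net/dstcontainer?<dest-sas>" \
  --recursive
```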

What could be the fastest way to do this copy/rename?

Currently I am evaluating creating a shell script with 4 million azcopy commands, splitting it into hundreds of files, and running them in parallel (sketched below). I haven't tested it, but I have a feeling that issuing 4 million commands, each involving connection and authentication strings, will definitely be slow.
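
To make that concrete, here is a minimal sketch of the approach I'm evaluating, assuming a tab-separated mapping.tsv with one source-path/target-path pair per line; the account URLs and SAS token are placeholders:

```bash
#!/bin/bash
# Sketch only: one azcopy invocation per file, driven by a mapping file.
SRC="https://srcaccount.blob.core.windows.net/srccontainer"   # placeholder
DST="https://dstaccount.blob.core.windows.net/dstcontainer"   # placeholder
SAS="?<sas-token>"                                            # placeholder

# Split the 4-million-line mapping into 100 chunks to run in parallel.
split -n l/100 mapping.tsv chunk_

for chunk in chunk_*; do
  (
    while IFS=$'\t' read -r src dst; do
      # Each call is a separate job with its own auth handshake, job plan,
      # and log files - exactly the per-command overhead I'm worried about.
      azcopy copy "${SRC}/${src}${SAS}" "${DST}/${dst}${SAS}" --log-level ERROR
    done < "$chunk"
  ) &   # one background worker per chunk
done
wait
```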

Is there a better and faster way, like Data Factory, etc.?

Azure Blob Storage
An Azure service that stores unstructured data in the cloud as blobs.

3 answers

  1. Syed Aaqid Ali 180 Reputation points Microsoft Vendor
    2025-02-22T00:33:07.1+00:00

    Hi Keyur Kachhadia,

    The approach you're considering - using shell scripts with azcopy commands - can work, but you should review "Optimize the performance of AzCopy with Azure Storage" and tune AzCopy before go-live; a sketch of the main knobs follows.
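
    As a rough sketch of the kind of tuning that article describes (values and paths below are illustrative, assuming AzCopy v10 on a Linux VM - benchmark before settling on numbers):

    ```bash
    # Illustrative tuning only; the right values depend on your workload.
    export AZCOPY_CONCURRENCY_VALUE=256                      # raise concurrent requests beyond the CPU-based default
    export AZCOPY_LOG_LOCATION=/mnt/fast/azcopy/logs         # keep logs off the OS disk
    export AZCOPY_JOB_PLAN_LOCATION=/mnt/fast/azcopy/plans   # and the job-plan files too

    # Prefer a few large jobs over millions of one-file jobs: a single
    # authenticated job can carry an entire directory tree.
    azcopy copy \
      "https://srcaccount.blob.core.windows.net/srccontainer?<sas>" \
      "https://dstaccount.blob.core.windows.net/dstcontainer?<sas>" \
      --recursive --log-level ERROR
    ```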

    See these alternative options as well:

    https://learn.microsoft.com/en-us/azure/data-factory/connector-azure-blob-storage?tabs=data-factory

    https://learn.microsoft.com/en-us/azure/data-factory/quickstart-hello-world-copy-data-tool


    Please do not forget to "Accept the answer" and "up-vote" wherever the information provided helps you, as this can be beneficial to other community members.

    If you have any other questions or are still running into more issues, let me know in the "comments" and I would be happy to help you.


  2. Keyur Kachhadia 0 Reputation points
    2025-02-26T13:59:22.7733333+00:00

    For the benefit of others who have a similar case, I'm sharing my observations in case they're useful.

    1. Data Factory method:
      a. Extremely costly; see one of my other comments.
    2. Creating hundreds of shell scripts with a total of 4 million commands and running them in parallel from a VM in Azure:
      a. Tested for a small batch of files using az commands in a shell script. It generated a lot of overhead in terms of logging, generating the job plan, etc. Overall approx. 5-7 sec per transfer (average file size was between a few KB and a MB).
    3. Download all data to a P80 disk on a VM in Azure, rename, and upload to blob storage again (sketched below):
      a. Downloaded the full directories with azcopy (approx. 8 commands for 8 directories); it downloaded 40 million files, 7 TB, in approx. 4 hrs.
      b. Renamed the required 4 million files into a separate directory; approx. 1 hr with 20 parallel rename scripts.
      c. Uploaded the renamed files to the target blob storage - hr.
      d. It seems we will use this method during our go-live window; at least step "a" can be pre-done, which saves further time.

    I know it's lame, but downloading to a disk and uploading it again seemed to be the only feasible solution that is both economical and fast - the complete opposite of what I had assumed when I started on this task.
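
    For anyone reproducing this, a minimal sketch of the three steps, assuming AzCopy v10 on the VM, the P80 disk mounted at /mnt/data, and a tab-separated mapping.tsv of source-relative-path/target-relative-path pairs (accounts, containers, directory names, and SAS tokens are placeholders):

    ```bash
    SRC="https://srcaccount.blob.core.windows.net/srccontainer"   # placeholder
    DST="https://dstaccount.blob.core.windows.net/dstcontainer"   # placeholder
    SAS="?<sas-token>"                                            # placeholder

    # a. Pull everything down - one azcopy job per top-level directory.
    for d in dir1 dir2 dir3; do          # placeholder names; we had ~8 directories
      azcopy copy "${SRC}/${d}${SAS}" /mnt/data/ --recursive --log-level ERROR
    done

    # b. Rename locally with 20 parallel workers. On the same filesystem a
    #    hardlink (or mv) is metadata-only, so 4 million of them go quickly.
    mkdir -p /mnt/data/renamed
    tr '\t\n' '\0\0' < mapping.tsv |
      xargs -0 -n 2 -P 20 sh -c \
        'mkdir -p "/mnt/data/renamed/$(dirname "$1")" &&
         ln "/mnt/data/$0" "/mnt/data/renamed/$1"'

    # c. Push the renamed tree back up in one job.
    azcopy copy "/mnt/data/renamed/*" "${DST}${SAS}" --recursive --log-level ERROR
    ```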


  3. Venkatesan S 420 Reputation points Microsoft Vendor
    2025-02-26T14:12:02.7166667+00:00

    @Keyur Kachhadia

    I'm glad that you were able to resolve your issue and thank you for posting your solution so that others experiencing the same thing can easily reference this! Since the Microsoft Q&A community has a policy that "The question author cannot accept their own answer. They can only accept answers by others", I'll repost your solution in case you'd like to "Accept" the answer.

    Issue:

    Currently you are evaluating creating and running a shell script with 4 million azcopy commands, splitting it into hundreds of files, and running them in parallel. You haven't tested it, but you have a feeling that issuing 4 million commands, each involving connection and authentication strings, will definitely be slow. Is there a better and faster way, like Data Factory, etc.?

    Solution:
    For the benefit of others who have a similar case, I'm sharing my observations in case they're useful.

    1. Data Factory method:
      a. Extremely costly; see one of my other comments.
    2. Creating hundreds of shell scripts with a total of 4 million commands and running them in parallel from a VM in Azure:
      a. Tested for a small batch of files using az commands in a shell script. It generated a lot of overhead in terms of logging, generating the job plan, etc. Overall approx. 5-7 sec per transfer (average file size was between a few KB and a MB).
    3. Download all data to a P80 disk on a VM in Azure, rename, and upload to blob storage again:
      a. Downloaded the full directories with azcopy (approx. 8 commands for 8 directories); it downloaded 40 million files, 7 TB, in approx. 4 hrs.
      b. Renamed the required 4 million files into a separate directory; approx. 1 hr with 20 parallel rename scripts.
      c. Uploaded the renamed files to the target blob storage - hr.
      d. It seems we will use this method during our go-live window; at least step "a" can be pre-done, which saves further time.

    I know it's lame, but downloading to a disk and uploading it again seemed to be the only feasible solution that is both economical and fast - the complete opposite of what I had assumed when I started on this task.

    Thank you again for your time and patience throughout this issue.

