How to effectively unzip files in a Blob container and use them in Data Flow directly?

muntazir abbas 65 Reputation points
2024-02-06T13:15:32.6833333+00:00

Problem statement: I have session logs in session.zip stored in Azure Blob Storage. The logs are structured JSON and contain critical information that must be filtered out before we ingest them into another Blob Storage account in a different Azure subscription. I have implemented Data Flow logic that filters the critical information, but my challenge is to unzip the archive first, then read the "dd-mm-yyy-sessionID.json" files inside it and process them as soon as they arrive in the blob container. It seems that unzipping is only possible in the Copy activity, not in Data Flow directly, or am I missing something? Is it possible to achieve this using Data Flow only, or is unzipping with the Copy activity the only solution? Or is there any other way to achieve the desired result? Many regards,

Azure Blob Storage
An Azure service that stores unstructured data in the cloud as blobs.
Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.

Accepted answer
  1. Anand Prakash Yadav 7,815 Reputation points Microsoft Vendor
    2024-02-07T10:46:42.9333333+00:00

    Hello muntazir abbas,

    Thank you for posting your query here!

    Adding on to the previous response, since Azure Data Factory Data Flow does not directly support processing binary data, you would need to use a Copy Data Activity first to unzip the files and then call the Data Flow activity to process the desired JSON files.

    Create two folders in the storage container: a source folder that holds the compressed zip files, and a sink folder that stores the extracted data.

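    1. Create a Linked Service (using a connection string) to connect to the source and target storage.
    2. Create one dataset that points to the source and one that points to the target. This example uses CSV datasets; for a zip source, the other answer in this thread suggests a Binary dataset.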
    [Screenshot: source dataset]

    Note: Set the source dataset's compression type to match the compressed file (for example, ZipDeflate for .zip files).

    [Screenshot: target dataset]

    3. Add a Copy activity to copy data from the source to the destination.
    [Screenshot: Copy activity source settings]

    [Screenshot: Copy activity sink settings]

    [Screenshot: Copy activity]

    [Screenshot: copied data in the sink folder]
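
As a rough illustration of the source dataset and Copy activity source settings from the steps above, here is a minimal sketch that builds the JSON definitions as Python dictionaries. It assumes a Binary dataset (as the other answer in this thread suggests) rather than the CSV dataset in the screenshots. The names `ZippedSessionLogs`, `BlobStorageLinkedService`, `session-logs`, and `incoming` are hypothetical placeholders, and the property names should be checked against the Binary format documentation before use.

```python
import json

# Hypothetical placeholders: dataset name, linked service name, container,
# and folder are NOT from the original answer; replace them with your own.
source_dataset = {
    "name": "ZippedSessionLogs",
    "properties": {
        "type": "Binary",
        "linkedServiceName": {
            "referenceName": "BlobStorageLinkedService",
            "type": "LinkedServiceReference",
        },
        "typeProperties": {
            "location": {
                "type": "AzureBlobStorageLocation",
                "container": "session-logs",
                "folderPath": "incoming",
                "fileName": "session.zip",
            },
            # Marks the source as a zip archive so the Copy activity
            # decompresses it while reading.
            "compression": {"type": "ZipDeflate"},
        },
    },
}

# Source section of the Copy activity (assumed shape, see lead-in).
copy_source = {
    "type": "BinarySource",
    "storeSettings": {"type": "AzureBlobStorageReadSettings", "recursive": True},
    "formatSettings": {
        "type": "BinaryReadSettings",
        "compressionProperties": {
            "type": "ZipDeflateReadSettings",
            # False is intended to write the extracted files directly into
            # the sink folder instead of a subfolder named after the zip.
            "preserveZipFileNameAsFolder": False,
        },
    },
}

if __name__ == "__main__":
    print(json.dumps(source_dataset, indent=2))
    print(json.dumps(copy_source, indent=2))
```

Writing the extracted files directly into the sink folder keeps the path that the Data Flow source has to read from predictable.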

    Alternatively, you can create an Azure Function with a blob trigger and implement the unzip logic in the function code. How well this works depends on your own code.
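
For the Azure Function option, the unzip logic itself could look roughly like the sketch below. It uses only the Python standard library and leaves out the blob trigger binding and the upload of the extracted files, which depend on how the function is set up. The helper name `extract_json_members` is hypothetical, and the sketch assumes the archive is small enough to hold in memory.

```python
import io
import zipfile


def extract_json_members(zip_bytes: bytes) -> dict[str, bytes]:
    """Return {member name: file content} for every .json file in the archive."""
    extracted = {}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for info in archive.infolist():
            # Skip directory entries and anything that is not a JSON log.
            if info.is_dir() or not info.filename.lower().endswith(".json"):
                continue
            extracted[info.filename] = archive.read(info)
    return extracted


if __name__ == "__main__":
    # Build a small archive in memory to stand in for session.zip.
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("01-02-2024-abc123.json", '{"event": "login"}')
        archive.writestr("readme.txt", "not a log")

    files = extract_json_members(buffer.getvalue())
    print(sorted(files))
```

In a blob-triggered function, the bytes of the incoming zip would be passed to this helper, and each returned entry would be written to the folder that the Data Flow reads from.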

    Or you can create an Azure Logic App with a blob trigger and use connectors to unzip the files.

    Reference: https://www.frankysnotes.com/2019/02/how-to-unzip-automatically-your-files.html

    Finally, configure the Data Flow activity to process the unzipped JSON files from the folder where the preceding Copy Data activity extracted them.

    Please let us know if you have any further queries. I’m happy to assist you further.  

    Please do not forget to "Accept the answer" and "up-vote" wherever the information provided helps you; this can be beneficial to other community members.


1 additional answer

  1. Harun Raseed Basheer 160 Reputation points MVP
    2024-02-06T17:06:24.01+00:00

    Hi muntazir abbas,
    To unzip in the Copy Data activity, choose a Binary dataset and use session.zip as the input file. Data Flow, however, is meant for data transformation, and the Binary format is not supported there.

    As per the Microsoft documentation (https://learn.microsoft.com/en-us/azure/data-factory/format-binary), you can use a Binary dataset in the Copy activity, GetMetadata activity, or Delete activity. The attached image shows that the Binary format is not available for Data Flow datasets.

    [Screenshot]

    One way to achieve this is to use a Copy Data activity first to unzip the files into a folder, and then call the Data Flow activity to process the JSON files you need. Hope this helps. Kindly consider hitting the Accept Answer button.

    1 person found this answer helpful.
