How to read multiple files from different folders in Azure Data Flow?

Sulagno Roy 40 Reputation points
2023-05-26T05:30:51.0533333+00:00

I have multiple folders from different source systems in my landing area of ADLS Gen2. Let's say I have folders named X, Y, Z. Inside X, I again have subfolders based on the tables from which the data was extracted. For example, X has 3 subfolders named 1A, 2A, 3A (based on the table names). Likewise, I have the same subfolders for Y and Z.

Now I want to apply the same sort of transformation in Azure Data Flow to 1A, 2A, 3A and the subfolders of Y and Z before finally merging all of them.

Is there any way I can have a generic activity that performs the required transformations, instead of creating different sources for the individual subfolders?

For example, 1A, 2A, 3A and the subfolders of Y and Z will all get the same sort of transformation. I don't want individual activities for each transformation; perhaps we can pass some parameterized values from the ADF pipeline, or something like that.

Please let me know.

Azure Data Lake Storage · Azure Data Factory

Accepted answer
  1. AnnuKumari-MSFT 33,976 Reputation points Microsoft Employee
    2023-05-29T11:41:40.4366667+00:00

    Hi Sulagno Roy,

    Welcome to Microsoft Q&A platform and thanks for posting your question here.

    As per my understanding, you want to perform data transformation on multiple files present in different folders using the same data flow. Please let me know if that is not the correct understanding.

    You can use a single data flow activity to perform the same transformation on multiple files from different folders. To do that, use wildcard paths in the source transformation to specify the pattern of the folder path you want to process. For example, the pattern X/*A/* picks up the files in all subfolders of X whose names end with A (1A, 2A, 3A). Similarly, the patterns Y/*A/* and Z/*A/* process the corresponding subfolders of Y and Z.
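
    For illustration, a minimal sketch of such a source in data flow script could look like the following (the stream name sourceX is an assumption, and the pattern simply reflects the folder layout described in the question):

        source(allowSchemaDrift: true,
            validateSchema: false,
            wildcardPaths:['X/*A/*']) ~> sourceX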

    You can pass the folder names as parameters to the pipeline and use them to build the wildcard paths. For example, define a parameter folderName on the data flow and build the wildcard path as concat($folderName, '/*A/*'), with the pipeline supplying the folder name (for instance from a ForEach activity looping over X, Y, Z). This way, you can reuse the same data flow activity for multiple folders.
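
    As a rough sketch only (the parameter name folderName, its default value, and the stream name are illustrative), the parameterized version in data flow script might look like this, with the pipeline passing the folder name, for example @item() from a ForEach over X, Y and Z:

        parameters{
            folderName as string ('X')
        }
        source(allowSchemaDrift: true,
            validateSchema: false,
            wildcardPaths:[(concat($folderName, '/*A/*'))]) ~> sourceFolder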

    After processing the subfolders, you can use the union transformation within the data flow to merge the resulting streams before writing them to the sink.
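
    If you instead read each landing folder through its own source inside one data flow, the merge step could be sketched in data flow script roughly as follows (stream names assumed):

        sourceX, sourceY, sourceZ union(byName: true) ~> unionAll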

    Hope it helps. Please accept the answer by clicking on the Accept answer button, or else revert back with a follow-up query. Thank you.

