Azure Data Factory: I'm trying to understand how to get MetaData information from CSV Files used in a Data Flow to get only the latest file created?

Marty Scherr 20 Reputation points
2025-03-05T18:20:17.9666667+00:00

I am using the Filter by last modified criteria in the Copy Data activity in Azure Data Factory to determine the latest data file (one file is created per day). The challenge I'm facing is that file timestamps in Azure are recorded in UTC, but the files are created in EST (Eastern Time). The process works as long as the latest file is uploaded to Azure at a consistent time each day.

What I would prefer is to be able to read the Azure folder, order the files by date in descending order, and just import the latest file that way. I haven't been able to figure out how to accomplish this preferred method. Has anyone determined a way to do that? Any advice would be greatly appreciated.

Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.

Accepted answer
  1. Suwarna S Kale 1,186 Reputation points
    2025-03-05T19:59:07.9466667+00:00

    Hello Marty Scherr,

    Thank you for posting your question in the Microsoft Q&A forum.

    To address your challenge of importing the latest file from an Azure folder based on the file's date in Eastern Time (EST), you can use Azure Data Factory (ADF) to list files in the folder, sort them by their last modified timestamp, and then process the latest file. While ADF does not natively support sorting files by last modified timestamp in a descending order, you can achieve this by combining ADF with Azure Functions or a custom logic in a pipeline.

    Note that the Get Metadata activity's Child Items field returns file names and types but not each file's last modified timestamp, so you typically need a second, per-file Get Metadata call (or an Azure Function) to compare timestamps. This approach addresses the time zone difference and ensures that the correct file is processed regardless of the upload time. If you prefer a serverless solution, Azure Functions is the recommended approach; otherwise, you can implement the comparison logic within the pipeline using ADF activities.
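    As a rough sketch of the Azure Functions approach (the container name, connection string, and function names below are hypothetical, and the SDK call is shown only in a comment), the core logic is just "pick the blob with the greatest last-modified time":

    ```python
    # Sketch of the latest-file selection an Azure Function could perform.
    # The azure-storage-blob usage is an untested illustration; the helper
    # itself is plain Python.
    from datetime import datetime
    from typing import Iterable, Tuple

    def pick_latest(blobs: Iterable[Tuple[str, datetime]]) -> str:
        """Return the name of the blob with the most recent last-modified time."""
        return max(blobs, key=lambda b: b[1])[0]

    # With the real SDK this might look like (hypothetical container name):
    # from azure.storage.blob import ContainerClient
    # client = ContainerClient.from_connection_string(conn_str, "daily-csv")
    # latest = pick_latest((b.name, b.last_modified) for b in client.list_blobs())

    if __name__ == "__main__":
        files = [
            ("data_2025-03-03.csv", datetime(2025, 3, 3, 23, 5)),
            ("data_2025-03-05.csv", datetime(2025, 3, 5, 23, 5)),
            ("data_2025-03-04.csv", datetime(2025, 3, 4, 23, 5)),
        ]
        print(pick_latest(files))  # data_2025-03-05.csv
    ```

    Because the comparison is done on the blobs' actual UTC timestamps rather than on a filter window, the EST/UTC offset no longer matters.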

    Alternatively, if you prefer not to use Azure Functions, you can implement custom logic within the pipeline using ADF's ForEach and Set Variable activities: initialize a variable to store the latest file name, iterate through the file list, and update the variable whenever a newer file is found.
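    A sketch of that Set Variable expression (all activity and variable names here are hypothetical): inside a ForEach over the Get Metadata child items, a per-file Get Metadata activity fetches lastModified, and the comparison uses ADF's ticks() function. Note that a Set Variable activity cannot reference the variable it is assigning, so the current values are assumed to be copied into temporary variables (tempFileName, tempModified) first, and the ForEach must run sequentially, not in parallel.

    ```
    @if(
        greater(
            ticks(activity('GetFileModified').output.lastModified),
            ticks(variables('tempModified'))
        ),
        item().name,
        variables('tempFileName')
    )
    ```

    After the loop completes, the variable holds the name of the newest file, which the Copy Data activity can consume via a parameterized dataset.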

    You may refer to the related documentation below:

    https://learn.microsoft.com/en-us/azure/data-factory/control-flow-get-metadata-activity

    https://learn.microsoft.com/en-us/azure/data-factory/control-flow-for-each-activity

    If the above answer helped, please do not forget to "Accept Answer", as this may help other community members refer to the info if facing a similar issue. Your contribution to the Microsoft Q&A community is highly appreciated.


1 additional answer

Sort by: Most helpful
  1. Marty Scherr 20 Reputation points
    2025-03-06T16:17:25.7066667+00:00

    I figured out a solution using the Filter by Date criteria of the Source Dataset. What I hadn't accounted for was the correct conversion from Eastern Time to UTC. I adjusted the trigger executing my pipeline to a UTC time that accommodates when the files get uploaded to Azure.
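    The Eastern-to-UTC conversion this relies on can be sketched as follows (the 18:00 upload hour is a hypothetical example). One caveat worth noting: because of daylight saving time, Eastern Time is UTC-5 in winter but UTC-4 in summer, so a fixed UTC trigger time shifts relative to a fixed local upload time twice a year.

    ```python
    # Sketch: convert a fixed Eastern Time upload hour to UTC for the trigger.
    # zoneinfo applies daylight saving automatically (EST = UTC-5, EDT = UTC-4).
    from datetime import datetime
    from zoneinfo import ZoneInfo

    def upload_time_utc(year: int, month: int, day: int,
                        hour: int = 18, minute: int = 0) -> datetime:
        """Return the UTC instant corresponding to the given Eastern local time."""
        eastern = datetime(year, month, day, hour, minute,
                           tzinfo=ZoneInfo("America/New_York"))
        return eastern.astimezone(ZoneInfo("UTC"))

    if __name__ == "__main__":
        print(upload_time_utc(2025, 1, 15))  # winter (EST): 18:00 ET -> 23:00 UTC
        print(upload_time_utc(2025, 7, 15))  # summer (EDT): 18:00 ET -> 22:00 UTC
    ```

    If the files always land at the same Eastern time, scheduling the trigger for the later of the two UTC instants (23:00 in this example) keeps it safely after the upload year-round.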

    Having said that, I did research the other methods suggested in the answer provided, but I came to the conclusion that I just needed to get the date criteria correct, knowing that the files are uploaded daily at the same time.

