How to read the last ingested date partitioned file in ADLS

Sulagno Roy 40 Reputation points
2023-05-26T05:37:49.1633333+00:00

Hi All,

How can I read the last ingested file in ADLS?

ADLS folder structure: XXX/year=yyyy/month=MM/day=dd

In the day=dd folder I will have two different files on a given day. I want to read the last ingested file, based on its timestamp, as the source in my data flow. Is there a way to achieve this, for example by passing a parameter from an ADF pipeline with a ForEach activity, or is there an easier way?

Thank you!

Azure Data Lake Storage
Azure Data Factory

Accepted answer
  1. QuantumCache 20,346 Reputation points
    2023-06-01T00:40:23.6733333+00:00

    Hello @Sulagno Roy, just checking whether we are still connected on this discussion. Please let us know if you need to add more information so that we can better assist you.

    Prerequisite: create two source datasets, with different names, for the same Gen2 folder:

    ds_Gen2Folder_FolderScan (used in Step 1)

    ds_Gen2Folder_FileScan (used in Step 3). Create a parameter named 'FileName' on this dataset; Step 3 passes each file name to it.

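    Create two pipeline variables: one to hold the name of the latest file, and varPreviousModifiedDateTime to hold the most recent lastModified timestamp seen so far.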

    Step 1: Use a Get Metadata activity (named GetMetadataOfAllFiles in the expressions below) to fetch all files inside the folder. In the Field list, add Child items and Last modified.
    I have used the Gen2 folder path mentioned in your question.

    Source Gen2 folder connection, following the ADLS folder structure: XXX/year=yyyy/month=MM/day=dd
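If the folder path has to be built for a particular ingestion date, the partition layout above can be sketched as follows. This is a minimal illustration, not part of the original answer: the helper name `partition_folder` is hypothetical, and zero-padded month and day values are assumed from the `MM` and `dd` placeholders.

```python
from datetime import date


def partition_folder(root: str, day: date) -> str:
    """Build the year=/month=/day= folder path for one ingestion date.

    Assumes zero-padded month and day, as the MM and dd placeholders suggest.
    """
    return f"{root}/year={day:%Y}/month={day:%m}/day={day:%d}"


# "XXX" is the placeholder root folder used in the question.
print(partition_folder("XXX", date(2023, 5, 26)))
```

The same string could be produced in the pipeline and passed to the dataset as a parameter; the sketch only shows the shape of the path.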

    Step 2: Use a ForEach activity to loop through each file and get its metadata again with a second Get Metadata activity. Use the expression below for 'Items':

    @activity('GetMetadataOfAllFiles').output.childItems
    

    Make sure the ForEach is set to sequential processing, because each iteration compares against a variable that the previous iteration may have updated.
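For readers who work in the pipeline's code view, the ForEach settings from Step 2 correspond roughly to the fragment below, written as a Python dict that mirrors the pipeline JSON. Treat it as a sketch: the activity name `ForEachFile` is hypothetical, and the property names reflect my understanding of the ADF pipeline JSON, so compare them with what your own pipeline's code view shows.

```python
import json

# Rough shape of the ForEach activity from Step 2.
for_each_activity = {
    "name": "ForEachFile",  # hypothetical activity name
    "type": "ForEach",
    "typeProperties": {
        # Sequential processing, as required by the answer.
        "isSequential": True,
        "items": {
            "value": "@activity('GetMetadataOfAllFiles').output.childItems",
            "type": "Expression",
        },
        # The Step 3 activities (Get Metadata, If Condition) would be listed here.
        "activities": [],
    },
}

print(json.dumps(for_each_activity, indent=2))
```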

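    Step 3: Inside the ForEach, add a Get Metadata activity and an If Condition activity.

    Inside the ForEach, Get Metadata (named GetMetadataOfEachFile in the expressions below): use the ds_Gen2Folder_FileScan dataset, pass the file name dynamically to its 'FileName' parameter with the expression below, and add Item name and Last modified to the Field list.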

    @item().name
    


    Inside the If Condition: use the condition below to check whether the current file's timestamp is greater than the latest timestamp captured so far (our requirement is to capture the latest timestamp).

    @greater(activity('GetMetadataOfEachFile').output.lastModified,variables('varPreviousModifiedDateTime') )
    

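To make the comparison concrete, here is a small Python sketch of the same "is the current file newer?" check. It is an illustration under assumptions, not the pipeline's own evaluation: the timestamps are assumed to be ISO 8601 UTC strings, the sample values are made up, and the helper names `parse_utc` and `is_newer` are hypothetical.

```python
from datetime import datetime


def parse_utc(ts: str) -> datetime:
    """Parse an ISO 8601 timestamp; a trailing 'Z' is rewritten as '+00:00'
    because older Python versions of fromisoformat() reject 'Z'."""
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def is_newer(current_last_modified: str, previous_last_modified: str) -> bool:
    """Mirror of the If Condition: True when the current file is newer."""
    return parse_utc(current_last_modified) > parse_utc(previous_last_modified)


# Made-up sample values for two files in the same day=dd folder.
print(is_newer("2023-05-26T09:15:00Z", "2023-05-26T05:37:49Z"))
print(is_newer("2023-05-26T05:37:49Z", "2023-05-26T09:15:00Z"))
```

For the first iteration to succeed, the variable needs a starting value that is older than any real file; which default to use is left to the pipeline author.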

    Inside the If Condition, add two Set Variable activities to the True branch. The first stores the file name and the second stores the timestamp in varPreviousModifiedDateTime:

    @activity('GetMetadataOfEachFile').output.itemName
    
    @activity('GetMetadataOfEachFile').output.lastModified
    


    Finally, the variables hold the name and the timestamp of the file with the latest timestamp.
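The whole pattern (sequential loop, per-file metadata, compare, set two variables) can be simulated outside ADF as below. This is a sketch under assumptions rather than the pipeline itself: the file names and timestamps are invented, the function name `find_latest_file` is hypothetical, and the seed value `1900-01-01T00:00:00Z` is an assumed starting value for the timestamp variable.

```python
from datetime import datetime


def parse_utc(ts: str) -> datetime:
    # Rewrite a trailing 'Z' so older Python versions can parse the value.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def find_latest_file(files, seed="1900-01-01T00:00:00Z"):
    """Walk the files one at a time, like a sequential ForEach.

    Each item stands in for the output of the per-file Get Metadata
    activity and carries 'itemName' and 'lastModified'.
    """
    latest_file_name = None               # plays the role of the file name variable
    previous_modified = seed              # plays the role of varPreviousModifiedDateTime
    for item in files:
        # If Condition: is this file newer than the newest one seen so far?
        if parse_utc(item["lastModified"]) > parse_utc(previous_modified):
            # True branch: the two Set Variable activities.
            latest_file_name = item["itemName"]
            previous_modified = item["lastModified"]
    return latest_file_name, previous_modified


# Invented sample data: two files landed in the same day=dd folder.
sample_files = [
    {"itemName": "file_morning.csv", "lastModified": "2023-05-26T05:37:49Z"},
    {"itemName": "file_evening.csv", "lastModified": "2023-05-26T18:02:10Z"},
]
print(find_latest_file(sample_files))
```

Because the loop carries state from one iteration to the next, the order of evaluation matters, which is why the ForEach has to run sequentially.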


    If the response is helpful, please click "Accept Answer" and upvote it so that we can close this thread.
