Data flow gives different output in debug and trigger.

Bansal, Nimish 60 Reputation points
2025-03-11T13:16:38.3566667+00:00

I have a few data flows that use Sort and Aggregate transformations. I sort my data on a column, then drop duplicates using an Aggregate transformation, with last($$) selecting the last occurrence of each primary key. In data preview, or when I run the pipeline in debug mode, I get the expected output. However, when I trigger the pipeline for the same dataset, I do not. The data flow does not return the last row; it appears to return a random row. I am not using any joins or anything else. My data is read from ADLS, sorted, and deduplicated, and some columns are modified using date functions. Nothing else is being done.

Any suggestions on what I might be doing wrong?

Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.

1 answer

  1. Alex Burlachenko 1,755 Reputation points
    2025-03-11T14:19:51.73+00:00

    Dear Nimish,

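    The issue you’re facing is likely due to inconsistent sorting in distributed processing: when a triggered run spreads the data across multiple partitions, the sorted order may not be preserved by the time last($$) is evaluated.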

    Add a Sort transformation that explicitly sorts your data by the primary key before the Aggregate transformation that uses last($$). Then, on the Optimize tab, set partitioning to Single partition to ensure consistent results.
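
To illustrate why row order matters here, the following is a minimal sketch in plain Python, not data flow script. The sample rows and the helper names `last_per_key` and `latest_per_key` are hypothetical. It assumes that a "last" aggregate simply keeps whichever row for a key arrives last, and it compares that with an order-independent alternative (keeping the row with the largest sort value), which is a different technique from the single-partition setting suggested above.

```python
from operator import itemgetter

# Hypothetical sample rows: (primary_key, sort_column, payload).
rows = [
    ("A", 1, "old"),
    ("A", 2, "mid"),
    ("A", 3, "new"),
    ("B", 1, "old"),
    ("B", 2, "new"),
]


def last_per_key(ordered_rows):
    """Emulate a 'last' aggregate: keep whichever row for each key arrives last."""
    result = {}
    for key, sort_value, payload in ordered_rows:
        result[key] = (sort_value, payload)
    return result


def latest_per_key(unordered_rows):
    """Order-independent alternative: keep the row with the largest sort value."""
    result = {}
    for key, sort_value, payload in unordered_rows:
        if key not in result or sort_value > result[key][0]:
            result[key] = (sort_value, payload)
    return result


# Single partition: the sorted order survives until the aggregate runs.
sorted_rows = sorted(rows, key=itemgetter(0, 1))
single_partition = last_per_key(sorted_rows)

# Several partitions (assumed behavior): rows may reach the aggregate in another order.
shuffled_rows = [rows[2], rows[4], rows[0], rows[3], rows[1]]
multi_partition = last_per_key(shuffled_rows)
order_independent = latest_per_key(shuffled_rows)

print(single_partition)
print(multi_partition)
print(order_independent)
```

In this sketch the "last" result changes when the arrival order changes, while the maximum-based result does not. Whether an equivalent order-independent expression suits a given data flow depends on the columns available, so treat it as an option to test rather than a confirmed fix.
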

    Just in case, see the Azure Data Factory documentation.

    Best regards,
    Alex

    p.s. If you found the answer helpful, please click on Upvote and Accept Answer. This will help other community members.

