

How to determine if Spark is rewriting data

First, open the SQL DAG for your write stage. Scroll to the top of the job’s page and click the Associated SQL Query link:

Stage to SQL

You should now see the DAG. If it isn’t visible right away, scroll down the page until you find it:

SQL DAG

If you’re doing a Delete or Update operation, compare the amount of data the writer reports against the amount you expect the operation to change. If far more data is being written than you expect, Spark is probably rewriting data:

Write Stats
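
The comparison above can be made concrete with a little arithmetic. The following is a minimal sketch, not part of Spark: the function names and the 2x threshold are hypothetical, and both byte counts are numbers you supply yourself (the written size read from the writer node in the DAG, the expected size from your own estimate).

```python
def write_amplification(bytes_written: int, bytes_expected: int) -> float:
    """Ratio of bytes the writer reported to the bytes you expected to change."""
    if bytes_expected <= 0:
        raise ValueError("bytes_expected must be positive")
    return bytes_written / bytes_expected


def looks_like_rewrite(bytes_written: int, bytes_expected: int,
                       threshold: float = 2.0) -> bool:
    """Flag the write as a probable rewrite.

    The threshold is an arbitrary illustrative cutoff, not a Spark setting.
    """
    return write_amplification(bytes_written, bytes_expected) >= threshold


# Example: you expected to touch about 50 MiB, but the writer reports 10 GiB.
expected = 50 * 1024**2
written = 10 * 1024**3
print(f"amplification: {write_amplification(written, expected):.1f}x")
print("probably rewriting data:", looks_like_rewrite(written, expected))
```

A ratio close to 1 suggests the write matches your expectation; a ratio in the tens or hundreds suggests whole files are being rewritten to change a small number of rows.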

If you’re doing a merge, the merge node in the DAG reports explicit statistics on how much data it is rewriting.
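
To put the merge statistics in proportion, you can compare the rows that were only copied (rewritten unchanged) against the rows the merge actually changed. This is a rough sketch under assumptions: the dictionary keys below are modeled on the metric names Delta Lake records for merge operations and may not match the labels shown on your merge node, and the function name is hypothetical. Values are parsed with `int()` because such metrics are often exposed as strings.

```python
def merge_rewrite_fraction(metrics: dict) -> float:
    """Fraction of written rows that were copied unchanged rather than modified.

    Assumed keys (hypothetical, modeled on Delta Lake merge metrics):
      numTargetRowsCopied, numTargetRowsUpdated, numTargetRowsInserted
    Missing keys are treated as 0.
    """
    copied = int(metrics.get("numTargetRowsCopied", 0))
    updated = int(metrics.get("numTargetRowsUpdated", 0))
    inserted = int(metrics.get("numTargetRowsInserted", 0))
    total = copied + updated + inserted
    if total == 0:
        return 0.0
    return copied / total


# Example values typed in by hand from a merge node (illustrative, not real output).
stats = {
    "numTargetRowsCopied": "990000",
    "numTargetRowsUpdated": "8000",
    "numTargetRowsInserted": "2000",
}
print(f"{merge_rewrite_fraction(stats):.0%} of written rows were unchanged copies")
```

A high fraction means most of the write was spent copying rows that did not change, which is the rewriting this section is trying to detect.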