How can we perform data quality checks for null values across multiple files?

vasa chakradhar
2025-01-15T10:18:18.8566667+00:00

I have 69 files in my ADLS Gen2 account, and some of them contain null values. I want to send the rows with null values to one folder and the qualified data to another folder in Gen2.
The problem is that we cannot specify the column names manually for each file, since there are 69 files. How can we do this?

Azure Blob Storage: An Azure service that stores unstructured data in the cloud as blobs.
Azure Data Factory: An Azure service for ingesting, preparing, and transforming data at scale.

1 answer
  1. Amira Bedhiafi
    2025-01-15T12:36:20.98+00:00

    Since you can't specify column names manually for all 69 files, you can use a mapping data flow in Azure Data Factory (ADF), which can process files without a fixed, predefined schema.

    Create a pipeline that uses the Get Metadata activity to retrieve the list of files in the source folder of your ADLS Gen2 account.

    Add Child Items to the activity's Field List property; the activity output then contains the list of all files in the directory.

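    Then add a ForEach activity that iterates over the files returned by Get Metadata. Set its Items to an expression such as @activity('Get Metadata1').output.childItems, where Get Metadata1 is the name of your Get Metadata activity.

    Within the loop, use a dynamic expression such as @item().name to pass the current file's name (and path) to the subsequent activities.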
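    As a rough sketch of the pipeline shape described above, the snippet below builds the Get Metadata + ForEach skeleton as a Python dict and prints it as JSON. The names "SourceFolder" (dataset), "NullCheckFlow" (data flow), "NullCheckPipeline", and the activity names are hypothetical placeholders; compare the output against a pipeline exported from your own factory rather than treating it as a complete definition.

```python
import json

# Hypothetical names: replace with the dataset / data flow in your own factory.
GET_METADATA = "Get Metadata1"


def build_pipeline(source_dataset: str = "SourceFolder",
                   data_flow: str = "NullCheckFlow") -> dict:
    """Return a skeleton ADF pipeline definition as a plain dict."""
    get_metadata = {
        "name": GET_METADATA,
        "type": "GetMetadata",
        "typeProperties": {
            "dataset": {"referenceName": source_dataset, "type": "DatasetReference"},
            # "childItems" lists every file/folder (name + type) in the directory
            "fieldList": ["childItems"],
        },
    }
    run_data_flow = {
        "name": "Run null check",
        "type": "ExecuteDataFlow",
        "typeProperties": {
            "dataflow": {"referenceName": data_flow, "type": "DataFlowReference"},
        },
    }
    for_each = {
        "name": "ForEach file",
        "type": "ForEach",
        # Only start the loop once the file list is available
        "dependsOn": [{"activity": GET_METADATA, "dependencyConditions": ["Succeeded"]}],
        "typeProperties": {
            # Inside the loop, @item().name is the current file name
            "items": {"value": f"@activity('{GET_METADATA}').output.childItems",
                      "type": "Expression"},
            "activities": [run_data_flow],
        },
    }
    return {"name": "NullCheckPipeline",
            "properties": {"activities": [get_metadata, for_each]}}


if __name__ == "__main__":
    print(json.dumps(build_pipeline(), indent=2))
```

    Passing the file name into the data flow (for example as a data flow parameter) is left out here because its exact settings depend on how your source dataset is parameterized.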

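    Inside the loop, add a Data Flow activity with the following transformations:

    1. Source Transformation:
      • Point the source dataset at the current file dynamically (for example through a dataset or data flow parameter).
      • Enable Allow schema drift so the columns are read without a predefined schema (no manual column specification).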
    2. Derived Column (Optional):
      • Add a column that flags rows with null values in any field.
    3. Filter Transformation:
      • Create two filters (or use a single Conditional Split transformation that produces both streams):
        • One for rows with a null value in any column.
        • One for rows without null values.
    4. Sink Transformation:
      • Route the two outputs to different sinks:
        • One sink for rows that contain null values.
        • Another sink for qualified rows.
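    To illustrate the schema-agnostic check itself (outside ADF), here is a minimal Python sketch that splits one CSV file's rows into "null" and "qualified" sets without naming any column: it simply inspects every value in each row. The assumption that an empty or whitespace-only cell counts as null is mine; adjust is_null to match how your source actually represents missing values.

```python
import csv
import io


def is_null(value) -> bool:
    # Assumption: a missing cell (None) or an empty/whitespace-only string is null
    return value is None or (isinstance(value, str) and value.strip() == "")


def split_rows(csv_text: str):
    """Return (header, null_rows, qualified_rows) for one CSV file's text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    null_rows, qualified_rows = [], []
    for row in reader:
        # row.values() covers every column in the header, whatever it is called
        if any(is_null(v) for v in row.values()):
            null_rows.append(row)
        else:
            qualified_rows.append(row)
    return reader.fieldnames, null_rows, qualified_rows


header, bad, good = split_rows("id,name,city\n1,Ann,Oslo\n2,,Rome\n3,Bo,\n")
```

    In this usage example, rows 2 and 3 land in bad (each has an empty cell) and row 1 lands in good; the same logic works unchanged for any of the 69 files because no column name is hard-coded.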

    In each sink configuration you can use expressions to build the output file path dynamically, so rows that fail the null check and rows that pass it are written to different folders under the original file name:

    Qualified Data Path: <destination-folder>/qualified/<file-name>

    Null Data Path: <destination-folder>/null/<file-name>
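    For a local end-to-end picture of the same routing, the sketch below loops over every *.csv file in a source folder (standing in for Get Metadata + ForEach) and writes null rows to <destination-folder>/null/<file-name> and the rest to <destination-folder>/qualified/<file-name>. It assumes well-formed CSV files on a local disk; it is not an ADLS Gen2 or ADF implementation, and the function names are hypothetical.

```python
import csv
from pathlib import Path


def row_has_null(row: dict) -> bool:
    # A row is "null" if any column value is missing or an empty string
    return any(v is None or v == "" for v in row.values())


def route_folder(source: Path, destination: Path) -> None:
    """Split each CSV in `source` into destination/null/ and destination/qualified/."""
    for src in sorted(source.glob("*.csv")):  # like childItems + ForEach
        with src.open(newline="") as f:
            reader = csv.DictReader(f)
            rows = list(reader)
            header = reader.fieldnames or []
        outputs = {
            "null": [r for r in rows if row_has_null(r)],
            "qualified": [r for r in rows if not row_has_null(r)],
        }
        for folder, subset in outputs.items():
            if not subset:
                continue  # design choice: skip empty outputs instead of header-only files
            target = destination / folder / src.name  # <destination-folder>/<folder>/<file-name>
            target.parent.mkdir(parents=True, exist_ok=True)
            with target.open("w", newline="") as out:
                writer = csv.DictWriter(out, fieldnames=header)
                writer.writeheader()
                writer.writerows(subset)
```

    Skipping empty subsets means a file with no null values produces only a qualified copy; if you prefer one output per folder for every input, remove the continue.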

