pre-data validation in Azure

azure_learner 340 Reputation points
2024-08-09T06:57:26.85+00:00

When the source is a CSV file or a database, we can do pre-data load validation such as checking the number of rows, the size of the tables, the number of tables, etc. But when the source is a report exposed as data and consumed via a RaaS API, how should pre-data load validation be done to ensure the data loaded into ADLS is consistent with the source data? Please suggest.

Azure Data Lake Storage
An Azure service that provides an enterprise-wide hyper-scale repository for big data analytic workloads and is integrated with Azure Blob Storage.
Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.

Accepted answer
  1. Pinaki Ghatak 4,610 Reputation points Microsoft Employee
    2024-08-09T08:21:37.7433333+00:00

    Hello @azure_learner

    Based on the information you provided, it seems you are looking for a way to validate the data being loaded into ADLS from a report-as-data source that is consumed through a RaaS API.

    One way to ensure the data loaded in ADLS is consistent with the source data is to perform data validation checks after the data has been loaded into ADLS.

    You can use tools like Azure Data Factory or Azure Databricks to perform data validation checks.

    For example, you can use Azure Data Factory to create a pipeline that loads the data from the report as data source into ADLS, and then performs data validation checks using activities like Data Flow or Databricks Notebook. These activities can be used to compare the data in ADLS with the source data and check for any discrepancies.

    Alternatively, you can use Azure Databricks to perform data validation checks. You can write a script in Python or Scala that reads the data from ADLS and compares it with the source data.
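
    The comparison script described above can be sketched in plain Python (Spark-free for brevity; the function names and the assumption that both sides can be materialized as lists of dictionaries are illustrative, not part of any specific API):

    ```python
    import hashlib
    import json

    def record_fingerprint(records):
        """Return (row_count, content_hash) for a list of dict records.

        Each record is serialized with sorted keys so key order does not
        affect the hash, and the serialized rows are sorted so row order
        does not matter either.
        """
        canonical = sorted(
            json.dumps(r, sort_keys=True, default=str) for r in records
        )
        digest = hashlib.sha256("\n".join(canonical).encode("utf-8")).hexdigest()
        return len(canonical), digest

    def validate_load(source_records, loaded_records):
        """Compare source records (pulled from the RaaS API) with records
        read back from ADLS and report any discrepancy."""
        src_count, src_hash = record_fingerprint(source_records)
        dst_count, dst_hash = record_fingerprint(loaded_records)
        if src_count != dst_count:
            return False, f"row count mismatch: source={src_count}, loaded={dst_count}"
        if src_hash != dst_hash:
            return False, "content mismatch: same row count but different data"
        return True, "source and loaded data are consistent"
    ```

    In a Databricks notebook the same idea scales to large datasets by computing the count and hash with Spark aggregations instead of collecting rows into memory.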

    You can also use Databricks Delta Lake to perform data validation checks, which provides features like schema enforcement and data integrity checks.

    To summarize, you can perform data validation checks after the data has been loaded into ADLS using tools like Azure Data Factory or Azure Databricks.


    I hope that this response has addressed your query and helped you overcome your challenges. If so, please mark this response as Answered. This will not only acknowledge our efforts, but also assist other community members who may be looking for similar solutions.


1 additional answer

  1. AnnuKumari-MSFT 33,401 Reputation points Microsoft Employee
    2024-08-13T08:05:23.2433333+00:00

    Hi azure_learner,

    Thank you for posting your query in the Microsoft Q&A platform.

    I understand that you want to know how pre-data load validation should be done to ensure that the data loaded into ADLS is consistent with the source data.

    When the source data is a report consumed through an API, it can be challenging to perform pre-data load validation to ensure that the data loaded into ADLS is consistent with the source. However, there are several approaches you can take to perform some level of validation:

    Before loading the data into ADLS, validate the API response to ensure that it contains the expected data. You can use tools like Postman or Fiddler to inspect the API response and verify its contents.

    Additionally, validate the schema of the API response to ensure that it matches the expected schema.
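
    The two checks above (response validation and schema validation) can be automated before the load step. A minimal pure-Python sketch, where the expected field names and types are made-up examples rather than the actual report schema:

    ```python
    import json

    # Hypothetical expected schema for the report rows: field name -> type.
    EXPECTED_SCHEMA = {"order_id": int, "customer": str, "amount": float}

    def validate_api_response(response_text):
        """Validate a raw API response body before loading it into ADLS.

        Checks that the body is valid JSON, is a non-empty list of rows,
        and that every row matches the expected field names and types.
        Returns a list of problems; an empty list means the response passed.
        """
        problems = []
        try:
            rows = json.loads(response_text)
        except json.JSONDecodeError as exc:
            return [f"response is not valid JSON: {exc}"]
        if not isinstance(rows, list) or not rows:
            return ["response is not a non-empty list of rows"]
        for i, row in enumerate(rows):
            if set(row) != set(EXPECTED_SCHEMA):
                problems.append(
                    f"row {i}: fields {sorted(row)} != expected {sorted(EXPECTED_SCHEMA)}"
                )
                continue
            for field, expected_type in EXPECTED_SCHEMA.items():
                if not isinstance(row[field], expected_type):
                    problems.append(
                        f"row {i}: field '{field}' has type "
                        f"{type(row[field]).__name__}, expected {expected_type.__name__}"
                    )
        return problems
    ```

    Running this against each API response and failing the pipeline when the problem list is non-empty gives you a pre-load gate even though the source is a report rather than a table.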

    Hope it helps. Kindly accept the answer by clicking on the Accept answer button. Thank you.

