Azure Data Factory skips column when reading data from Azure Table Storage (Copy Activity)

Lim, Alwyn John 0 Reputation points
2025-01-29T21:09:36.32+00:00

Hi,

I have a use case wherein I want to export Azure Table Storage to CSV/Txt file using ADF. (Copy Activity)
However, I have an issue where it skips some of the columns if the first 10 rows of the table are null in that specific column.

For Example:
I have a table with 100 rows and a ColumnA that has data in some of those rows, but the first 10 rows are null in ColumnA.
What happens is that ColumnA gets skipped when ADF reads the table, even though ColumnA has data in the remaining rows.

Is this expected because Table Storage is schema-less? It seems like ADF only checks the first 10 rows of the table to determine which columns are included, which can cause some columns to be skipped.

Is there a workaround for this?

I want to make sure all data is exported in full and no column is skipped if at least one row has data in that column.

Regards,
Alwyn


1 answer

  1. Chandra Boorla 7,810 Reputation points Microsoft Vendor
    2025-01-30T00:04:10.3733333+00:00

    @Lim, Alwyn John

    Greetings & Welcome to Microsoft Q&A forum! Thanks for posting your query!

    The behavior you are experiencing with Azure Data Factory when reading from Azure Table Storage is indeed related to the schema-less nature of Azure Table Storage. In Azure Table Storage, each entity (row) can have a different set of properties (columns), and ADF's Copy Activity may infer the schema based on the first few rows it reads. If the first 10 rows of a specific column are null, ADF may determine that the column does not exist and subsequently skip it during the export process.

    Explanation of the Issue

    Schema Inference - ADF uses a sample of the data (typically the first 10 rows) to infer the schema. If it finds that a column is null in those rows, it may decide that the column is not relevant and exclude it from the output schema.

    Schema-less Nature - Azure Table Storage is designed to be schema-less, meaning that different rows can have different sets of properties. This flexibility can lead to situations where some columns are not recognized if they don't appear in the initial sample.
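    To make the schema-less point concrete, here is a sketch of two hypothetical entities in the same table (property names and values are illustrative, not from your data). An inference pass that only samples entities like the first one would never see `ColumnA`:

    ```json
    [
      {
        "PartitionKey": "p1",
        "RowKey": "001",
        "ColumnB": "value-b"
      },
      {
        "PartitionKey": "p1",
        "RowKey": "042",
        "ColumnA": "value-a",
        "ColumnB": "value-b"
      }
    ]
    ```

    Because the first entity simply has no `ColumnA` property (rather than an explicit null), a sample limited to rows like it produces a schema without that column.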

    Is this expected because Table Storage is Schema less?

    This is a known limitation when dealing with sparse data in schema-less data stores like Table Storage. ADF's current schema inference mechanism isn't robust enough to handle cases where a column has predominantly null values at the beginning of the data. Unfortunately, there isn't a direct setting in the Copy Activity to force the inclusion of a column regardless of initial nulls.

    Is there a workaround for this?

    Data Transformation (Recommended) - The most reliable solution is to pre-process your data to ensure the column is present, even if initially filled with nulls. You can achieve this using a Data Flow activity before your Copy Activity.

    Lookup Activity (Optional, for Efficiency) - Before transforming, you can use a Lookup Activity to check whether the column already has values in the first few rows; if it does, the extra processing may be unnecessary. For example, look up a row where the column is expected to have a value.

    Data Flow Activity - Use a Data Flow activity to transform your Table Storage data. Within the data flow, you can use a Derived Column transformation. Create a new column (or use the existing one if the Lookup Activity confirmed its existence) and use a conditional expression. For example, if your column is called ColumnA, your expression could be: iif(isNull(ColumnA), null(), ColumnA)

    This will ensure that ColumnA is always part of the schema, even if it initially contains nulls. The Data Flow will effectively "materialize" the column.
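    As a complementary option, you could also declare the schema explicitly on the source dataset instead of relying on inference. This is a sketch only, with hypothetical dataset, linked service, table, and column names, and you would need to verify that the Azure Table connector honors the declared schema in your mapping:

    ```json
    {
      "name": "TableStorageSource",
      "properties": {
        "type": "AzureTable",
        "linkedServiceName": {
          "referenceName": "AzureTableStorageLS",
          "type": "LinkedServiceReference"
        },
        "typeProperties": {
          "tableName": "MyTable"
        },
        "schema": [
          { "name": "PartitionKey", "type": "String" },
          { "name": "RowKey", "type": "String" },
          { "name": "ColumnA", "type": "String" }
        ]
      }
    }
    ```

    Listing ColumnA in the dataset schema (and in the Copy Activity's column mapping) is intended to keep it in the output even when the sampled rows are null; if the connector still drops it, the Data Flow approach above remains the reliable path.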

    I hope this information helps. Please do let us know if you have any further queries.

    Kindly consider upvoting the comment if the information provided is helpful. This can assist other community members in resolving similar issues.

    Thank you.

