Integer typed column with NULL values in Analytical Store getting converted to 0 in Azure Synapse

Nandu S Raj 80 Reputation points
2024-12-09T17:02:57.8666667+00:00

I have a container A- container A has a column colA with integer as it's datatype in the AnalyticalStore , how do I know that? When I import the projection for the source it get's imported as integer. But I have overwritten the schema and set colA to string. The value for colA is null in the document. Now when I run Azure Synapse pipeline to load the table , value for colA is loaded as 0, even though it is null.

I tried executing the following script in serverlesspool with hopes of recreating this but it returned null , which is what the expected behavior is I assume.

select *
FROM OPENROWSET( 
                PROVIDER = 'CosmosDB', 
                CONNECTION = 'connection',
                OBJECT = 'A', 
                CREDENTIAL = 'credential' 
)WITH (colA varchar(10))  AS ContainerAWithcolA
Azure Synapse Analytics
Azure Synapse Analytics
An Azure analytics service that brings together data integration, enterprise data warehousing, and big data analytics. Previously known as Azure SQL Data Warehouse.
5,123 questions
Azure Cosmos DB
Azure Cosmos DB
An Azure NoSQL database service for app development.
1,720 questions
{count} votes

1 answer

Sort by: Most helpful
  1. Amira Bedhiafi 27,531 Reputation points
    2024-12-09T18:43:09.9966667+00:00

    It is either a data type mismatch or default value behavior in Synapse.

    If the Analytical Store in Cosmos DB has colA as an integer, any NULL value in the document might be treated as 0 when mapped to an integer column in Synapse.

    This happens because the default value for integers in Synapse is 0.

    When you overwrite the schema in Synapse to varchar (string) but the original datatype in Analytical Store is integer, inconsistencies may arise if Synapse still interprets the underlying data as integer.

    You can modify your pipeline to explicitly cast colA as varchar in your SQL or pipeline transformation logic before it gets written to Synapse.

    When reading the data, apply a SQL transformation to handle NULL values :

    SELECT 
        CASE WHEN colA IS NULL THEN NULL ELSE CAST(colA AS VARCHAR(10)) END AS colA
    FROM OPENROWSET(
        PROVIDER = 'CosmosDB',
        CONNECTION = 'connection',
        OBJECT = 'A',
        CREDENTIAL = 'credential'
    ) AS SourceData
    

    Or use a Synapse Dataflow or SQL script to explicitly handle NULL values for integer columns :

    SELECT 
        CASE WHEN colA = 0 THEN NULL ELSE colA END AS colA
    FROM TargetTable
    

Your answer

Answers can be marked as Accepted Answers by the question author, which helps users to know the answer solved the author's problem.