Synapse Spark shared metadata not syncing to serverless SQL

Finn Schmidt 86 Reputation points
2024-08-02T08:20:12.4366667+00:00

I have created a Spark Delta table using:


create table if not exists db_name.table_name(
    id int,
    propertyA string
)
using delta


and then later renamed a column:


alter table db_name.table_name 
RENAME COLUMN propertyA TO propertyB

This change is reflected immediately when I query the Delta table within the Spark context.

However, when I try to query the table using the serverless SQL pool, the new column name has not been synced, and the old column name is still displayed. Obviously, this leads to the

Invalid column name 'propertyB'

error when explicitly including the new column name in the select statement.

Based on the documentation at https://learn.microsoft.com/en-us/azure/synapse-analytics/metadata/overview, in particular the section stating:

If a metadata object is deleted or changed with Spark, the changes are picked up and propagated to the serverless SQL pool. Synchronization is asynchronous and changes are reflected in the SQL engine after a short delay.

I would have expected the change to have been propagated by now (it's been more than 12 hours). Is my understanding of the documentation correct, i.e. should the external table in the serverless SQL pool reflect the name change? Are some additional steps necessary to make this happen?

Azure Synapse Analytics
An Azure analytics service that brings together data integration, enterprise data warehousing, and big data analytics. Previously known as Azure SQL Data Warehouse.

Accepted answer
  1. Vinodh247 27,281 Reputation points MVP
    2024-08-05T00:58:31.4266667+00:00

    Hi Finn Schmidt,

    Thanks for reaching out to Microsoft Q&A.

    Did you try refreshing the table metadata?

    REFRESH TABLE db_name.table_name;

    According to the documentation you referenced, changes made to Delta tables using Spark should be propagated to the serverless SQL pool after a short delay. However, this does not appear to be the case for column renames or additions made with the ALTER TABLE statement.

    https://learn.microsoft.com/en-us/azure/synapse-analytics/sql/resources-self-help-sql-on-demand?wt.mc_id=knwlserapi_inproduct_azportal&tabs=x80070002#lake-database

    The workaround is to avoid ALTER TABLE statements on Delta tables that are accessed through the serverless SQL pool. Instead, use the mergeSchema option when writing a DataFrame to the Delta table to evolve the schema, or, if possible, drop and re-create the Delta table in Spark.
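
    The drop-and-re-create workaround can be sketched in Spark SQL roughly as follows (a sketch only, using the table and column names from the question; the staging table name `table_name_new` is made up for illustration, and you should verify the table location and data before dropping anything):

    -- 1. Preserve the data under the new column name in a staging table.
    CREATE TABLE db_name.table_name_new
    USING delta
    AS SELECT id, propertyA AS propertyB
    FROM db_name.table_name;

    -- 2. Drop the original table so the synced external table is removed.
    DROP TABLE db_name.table_name;

    -- 3. Re-create the table under its original name; the serverless SQL
    --    pool should pick up the new table and schema on the next sync.
    CREATE TABLE db_name.table_name
    USING delta
    AS SELECT * FROM db_name.table_name_new;

    -- 4. Clean up the staging table.
    DROP TABLE db_name.table_name_new;

    Since this rewrites the data twice, it is only practical for smaller tables; for large tables the mergeSchema write-option approach avoids the full copy.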

    Please 'Upvote' (Thumbs-up) and 'Accept' as an answer if the reply was helpful. This will benefit other community members who face the same issue.

