Apache Spark SQL Query Returns No Results for a Column in Azure Synapse Notebook.

madmax 0 Reputation points
2025-02-20T21:18:01.15+00:00

I'm running an Apache Spark SQL query in an Azure Synapse Notebook to retrieve data from an Azure Synapse table.

%%pyspark
df = spark.sql(""" 
    SELECT scheduledend 
    FROM `database`.`email` 
    WHERE scheduledend IS NOT NULL
""")
df.show(truncate = False)

The PySpark query returns zero rows.

The datatype of the column in the database is datetime2; I have tried casting the column to different datatypes within the PySpark code.

However, the same query works fine when run directly against the database (in SSMS, connected to Synapse) and returns the expected results (4569 rows).

SELECT scheduledend FROM database.email WHERE scheduledend IS NOT NULL

1 answer

  1. Chandra Boorla 8,795 Reputation points Microsoft Vendor
    2025-02-21T04:00:46.19+00:00

    Hi @madmax

    Thank you for posting your query!

    As I understand it, your Apache Spark SQL query in an Azure Synapse Notebook is returning zero rows, even though the same query works fine in SSMS.

    Here are some considerations that might help you.

    The table is in a Dedicated SQL Pool - The Spark catalog in Synapse only sees Spark (lake) databases; if the email table lives in a Dedicated SQL Pool, Spark cannot query it directly. Instead, read it over a JDBC connection.

    # Read the table over JDBC from the dedicated SQL pool (placeholders in angle brackets)
    df = spark.read \
        .format("jdbc") \
        .option("url", "jdbc:sqlserver://<your-synapse-workspace>.database.windows.net:1433;database=<your_database>") \
        .option("dbtable", "dbo.email") \
        .option("user", "<your_username>") \
        .option("password", "<your_password>") \
        .load()
    
    df.show()
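
    As an alternative to raw JDBC, Synapse Spark 3 pools also include a built-in Dedicated SQL Pool connector that authenticates with your workspace identity, so no SQL username or password is needed. A minimal sketch, assuming the table lives in a dedicated SQL pool database (the server, database, schema, and table names are placeholders):

    # Requires a Synapse Spark 3 pool; names below are placeholders.
    import com.microsoft.spark.sqlanalytics
    from com.microsoft.spark.sqlanalytics.Constants import Constants

    df = (spark.read
          # Optional: set the dedicated SQL endpoint explicitly; otherwise it is
          # inferred from the database part of the three-part table name.
          .option(Constants.SERVER, "<your-synapse-workspace>.sql.azuresynapse.net")
          # Three-part name: <database>.<schema>.<table>
          .synapsesql("<your_database>.dbo.email"))

    df.show(truncate=False)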
    

    Database Context Mismatch - Your Synapse Notebook may not be connected to the correct database. Try explicitly setting the database before running the query.

    spark.sql("USE your_database")
    

    Also, confirm you are pointing at the correct database and use the fully qualified table name.

    # Spark's default catalog uses two-part names (database.table);
    # the dbo schema from the SQL side is not part of the Spark-visible name.
    df = spark.sql("""
        SELECT scheduledend
        FROM `your_database`.`email`
        WHERE scheduledend IS NOT NULL
    """)
    df.show(truncate=False)
    

    Check for Data Partitioning or Filters - If the table is partitioned, or if filters are applied elsewhere in the Spark session (for example in an upstream view or cached DataFrame), the query can return zero rows even though the data exists in the source. Verify that no additional filters or partition pruning are affecting the results.
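
    A quick way to narrow this down is to compare the row count Spark sees with and without the filter, and to check how the column is typed on the Spark side (the table name below is a placeholder):

    # Distinguish "Spark sees no rows at all" from "the IS NOT NULL filter removes everything"
    tbl = spark.table("your_database.email")
    tbl.printSchema()                                      # how scheduledend is typed in Spark
    print(tbl.count())                                     # total rows visible to Spark
    print(tbl.filter("scheduledend IS NOT NULL").count())  # rows that survive the filter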

    I hope this information helps.

    If this answers your query, please click Accept Answer and Yes for "Was this answer helpful". If you have any further queries, do let us know.

