Synapse notebook script runs fine on its own but gets stuck in QUEUED state and times out in pipelines

Ricker Silva 0 Reputation points
2025-02-12T12:27:57.79+00:00

I have a Python script in a notebook that performs a schema correction on parquet files. It works fine and runs in less than 10 seconds, depending on the number of files to process.

Now I need to run it in a pipeline that corrects the schemas and then sends the data to another place. So I used a Notebook activity, linked it to the notebook I already have, and configured the pool in the same way.

Running it this way takes more than 30 minutes, and the Spark application appears to be stuck in the QUEUED state. I assume it would eventually time out; I didn't wait for that, since 30 minutes for a script that runs in under 10 seconds is a clear sign that something is wrong.

I ran the notebook directly again, and it moves through the states just fine. However, even after the script finishes, it still shows as running on the Apache Spark applications page, and it keeps running until it is stopped by a timeout: "This application failed due to the total number of errors: 1."

Error details: This application failed due to the total number of errors: 1.
Error code: 1 (LIVY_JOB_TIMED_OUT)
Message: Job failed during run time with state=[dead].
Source: Unknown

I don't know what needs to be done for the job to finish when the script completes and to prevent the application from timing out. Maybe this has something to do with the behaviour of the pipeline? How can I move forward and make the notebook run correctly both on its own and in the pipeline?

Azure Synapse Analytics

1 answer

  1. Sina Salam 17,571 Reputation points
    2025-02-12T17:45:25.83+00:00

    Hello Ricker Silva,

    Welcome to the Microsoft Q&A and thank you for posting your questions here.

    I understand that your Synapse notebook script runs fine on its own but times out and gets stuck in QUEUED state in pipelines.

    Since the notebook does not explicitly stop the Spark session, it remains active in the Spark UI until a timeout occurs. I suggest ensuring that spark.stop() is called at the end of the script and configuring the Notebook activity to handle session termination properly. For example, mssparkutils.session.stop() will end the session / Spark application and release its resources.

    from pyspark.sql import SparkSession
    # Get the active Spark session
    spark = SparkSession.builder.getOrCreate()
    # Stop the session explicitly
    spark.stop()
    

    This will ensure that the Spark session is closed when the notebook completes execution.
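
    Alternatively, since this runs in Synapse, you can stop the session with the mssparkutils helper mentioned above. A minimal sketch, assuming mssparkutils is available in your Synapse runtime:

    from notebookutils import mssparkutils

    # ... schema correction logic runs here ...

    # Stop the Synapse session so the Livy job completes instead of timing out
    mssparkutils.session.stop()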

    Also, review the Livy session settings, timeout configurations, and resource allocation in the Spark pool. The link below has more details: https://faizchachiya.medium.com/how-to-handle-azure-databricks-and-synapse-session-timeout-issues-bce25ef719a4
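
    For example, session resource allocation can be adjusted with the %%configure magic in the first cell of the notebook; the values here are purely illustrative and should be tuned to your pool's capacity:

    %%configure -f
    {
        "driverMemory": "8g",
        "driverCores": 4,
        "executorMemory": "8g",
        "executorCores": 4,
        "numExecutors": 2
    }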

    I hope this is helpful! Do not hesitate to let me know if you have any other questions.


    Please don't forget to close out the thread here by upvoting and accepting this as an answer if it is helpful.

