Long-running Synapse Spark notebook failed with HTTP status code: 400 error

heta desai 357 Reputation points
2022-09-30T07:21:37.717+00:00

Hi,

I have created a Synapse notebook in which I use PySpark to join multiple Delta Lake tables and write the result to an Azure SQL table. The Delta Lake tables hold about 142 million records. I am executing the notebook from a Synapse pipeline, and it fails with the error below:

{
"errorCode": "BadRequest",
"message": "Operation on target Write_AzureSQL failed: InvalidHttpRequestToLivy: Submission failed due to error content =[\"requirement failed: Session isn't active.\"] HTTP status code: 400. Trace ID: 9f6d18b6-3af0-432c-a5a2-1339fa4c55c7.",
"failureType": "UserError",
"target": "Phase 3",
"details": ""
}

Azure Synapse Analytics
An Azure analytics service that brings together data integration, enterprise data warehousing, and big data analytics. Previously known as Azure SQL Data Warehouse.

2 answers

Sort by: Most helpful
  1. Vinodh247 26,451 Reputation points MVP
    2022-09-30T07:31:32.56+00:00

    Hi

    Thanks for reaching out to Microsoft Q&A.

    This is a Spark error. Please see the link below; it explains the settings that have to be configured to avoid this error.

    "livy.server.session.timeout":"10h" -- addresses errors from long-running Spark tasks in a Jupyter/EMR notebook that die after an hour of execution:
    An error was encountered:
    Invalid status code '400' from https://xxx.xx.x.xxx:18888/sessions/0/statements/20 with error payload: "requirement failed: Session isn't active." [reference]

    https://towardsdatascience.com/how-to-set-up-a-cost-effective-aws-emr-cluster-and-jupyter-notebooks-for-sparksql-552360ffd4bc
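    Note that the linked article targets AWS EMR, where `livy.server.session.timeout` is set through an EMR cluster classification. In Azure Synapse there is no classification mechanism; the closest equivalents are the notebook's "Configure session" panel (which exposes a session timeout) and the `%%configure` magic for session-level Spark settings. A sketch of the latter, with illustrative values (verify the supported keys against your Synapse environment):

    ```
    %%configure -f
    {
        "driverMemory": "28g",
        "driverCores": 4,
        "executorMemory": "28g",
        "executorCores": 4,
        "numExecutors": 8,
        "conf": {
            "spark.network.timeout": "1500s"
        }
    }
    ```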

    Please upvote and accept as answer if the reply was helpful; this will help other community members.

    1 person found this answer helpful.

  2. Daniel Wang 0 Reputation points
    2024-12-30T05:18:26.16+00:00

    From the article:

    [
      {
        "classification": "spark",
        "properties": { "maximizeResourceAllocation": "true" }
      },
      {
        "classification": "spark-defaults",
        "properties": { "spark.network.timeout": "1500" }
      },
      {
        "classification": "hdfs-site",
        "properties": { "dfs.replication": "2" }
      },
      {
        "classification": "livy-conf",
        "properties": { "livy.server.session.timeout": "10h" }
      },
      {
        "classification": "emrfs-site",
        "properties": { "fs.s3.maxConnections": "100" }
      }
    ]

    explanation:

    "maximizeResourceAllocation":"true" -- Configures your executors to utilize the maximum resources possible on each node in a cluster. This EMR-specific option calculates the maximum compute and memory resources available for an executor on an instance in the core instance group. It then sets the corresponding spark-defaults settings based on this information. [reference]

    "livy.server.session.timeout":"10h" -- addresses errors from long-running Spark tasks in a Jupyter/EMR notebook that die after an hour of execution:
    An error was encountered:
    Invalid status code '400' from https://xxx.xx.x.xxx:18888/sessions/0/statements/20 with error payload: "requirement failed: Session isn't active." [reference]

    "fs.s3.maxConnections":"100" -- addresses the "Timeout waiting for connection from pool" error [reference]
    
