Failure starting repl. Try detaching and re-attaching the notebook.

Mark Lui 0 Reputation points
2024-11-20T13:22:33.03+00:00

I have been using Azure Databricks for a couple of months. I haven't changed any code or configuration recently, but all notebooks fail to run. The error is:

Failure starting repl. Try detaching and re-attaching the notebook.

It fails even for simple commands like 1+1 or print('hello'), with the same error message.

I have tried restarting the cluster and recreating it, but the error is still the same. I also created a new cluster with the latest driver and more memory; still the same.

What else can we do to troubleshoot? Why did it suddenly fail without any settings being changed?

Azure Databricks
An Apache Spark-based analytics platform optimized for Azure.

1 answer

  1. Sina Salam 12,816 Reputation points
    2024-11-20T16:59:34.18+00:00

    Hello Mark Lui,

    Welcome to the Microsoft Q&A and thank you for posting your questions here.

    I understand that you are getting the error: Failure starting repl. Try detaching and re-attaching the notebook.

    Since you said you haven't changed any settings, automatic updates or changes in the Azure Databricks infrastructure might have introduced this issue; this is not uncommon in managed services. As a best practice, regularly test and validate configurations with every Databricks Runtime update or when scaling workloads, and maintain documentation of your cluster configurations to enable faster issue resolution.

    The generic steps below can guide you in resolving the error:

    • If the notebook's state is corrupted, detaching and re-attaching the notebook to the cluster often resolves the issue.
    • Confirm that the cluster has adequate CPU, memory, and disk resources for running the workload. If possible, temporarily upgrade to a larger instance.
    • Review the logs in the cluster UI under "Driver Logs" and "Event Logs" for errors or warnings and look for library dependency errors, memory allocation failures, or REPL-specific issues.
    • Use the %pip list command to verify installed library versions and make sure there are no conflicts, particularly with critical libraries like pandas, NumPy, or PySpark. Reinstall problematic libraries if necessary using %pip uninstall and %pip install.
    • If initialization scripts are configured, disable them temporarily to rule out conflicts: go to the cluster configuration under "Advanced Options > Init Scripts" and remove or disable any configured scripts.
    • Make sure your cluster is using the latest Databricks runtime version, which may include bug fixes or compatibility updates.
    • Create a new notebook and run simple commands like 1+1 or print("hello") to determine if the issue is specific to the original notebook.
    • Make sure there are no firewall or connectivity issues between your Databricks workspace and the underlying infrastructure. For instance, verify that the workspace can access Azure Storage if required.
    • Temporarily create a minimal cluster configuration (e.g., a small, single-node cluster with no custom libraries) and test if the problem persists.
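    As a minimal sketch of the library check above, the following can be run in a single notebook cell as an alternative to %pip list; it uses only the Python standard library, and the package names are examples to adjust for your cluster:

    ```python
    import importlib.metadata

    def library_versions(packages):
        """Return a mapping of package name -> installed version, or None if absent."""
        versions = {}
        for pkg in packages:
            try:
                versions[pkg] = importlib.metadata.version(pkg)
            except importlib.metadata.PackageNotFoundError:
                versions[pkg] = None
        return versions

    # Example: inspect libraries often involved in dependency conflicts.
    print(library_versions(["pandas", "numpy", "pyspark"]))
    ```

    Comparing this output between a working and a failing cluster can quickly reveal a version mismatch introduced by an init script or a runtime upgrade.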

    If none of the above resolves the issue, submit a support request through the Azure portal, providing logs and steps to reproduce the issue for faster assistance.

    I hope this is helpful! Do not hesitate to let me know if you have any other questions.


    Please don't forget to close the thread by upvoting and accepting this as an answer if it is helpful.

