Library management in Spark

Note

We will retire Azure HDInsight on AKS on January 31, 2025. Before January 31, 2025, you will need to migrate your workloads to Microsoft Fabric or an equivalent Azure product to avoid abrupt termination of your workloads. The remaining clusters on your subscription will be stopped and removed from the host.

Only basic support will be available until the retirement date.

Important

This feature is currently in preview. The Supplemental Terms of Use for Microsoft Azure Previews include more legal terms that apply to Azure features that are in beta, in preview, or otherwise not yet released into general availability. For information about this specific preview, see Azure HDInsight on AKS preview information. For questions or feature suggestions, please submit a request on AskHDInsight with the details and follow us for more updates on Azure HDInsight Community.

The purpose of Library Management is to make open-source or custom code available to notebooks and jobs running on your clusters. You can upload Python libraries from PyPI repositories. This article focuses on managing libraries in the cluster UI. Azure HDInsight on AKS already includes many common libraries in the cluster. To see which libraries are included in HDI on AKS cluster, review the library management page.

Install libraries

You can install libraries in two modes:

  • Cluster-installed
  • Notebook-scoped

Cluster Installed

All notebooks running on a cluster can use cluster libraries. You can install a cluster library directly from a public repository such as PyPi. Upload from Maven repositories, upload custom libraries from cloud storage are in the roadmap.

Screenshot showing cluster installed library manager page.

Notebook-scoped

Notebook-scoped libraries, available for Python and Scala, which allow you to install libraries and create an environment scoped to a notebook session. These libraries don't affect other notebooks running on the same cluster. Notebook-scoped libraries don't persist and must be reinstalled for each session.

Note

Use notebook-scoped libraries when you need a custom environment for a specific notebook.

Modes of Library Installation

PyPI: Fetch libraries from open source PyPI repository by mentioning the library name and version in the installation UI.

View the installed libraries

  1. From Overview page, navigate to Library Manager.

    Screenshot showing library manager page.

  2. From Spark Cluster Manager, click on Library Manager.

  3. You can view the list of installed libraries from here.

    Screenshot showing how to view installed libraries.

Add library widget

PyPI

  1. From the PyPI tab, enter the Package Name and Package Version..

  2. Click Install.

    Screenshot showing how to install PyPI.

Uninstalling Libraries

If you decide not to use the libraries anymore, then you can easily delete the libraries packages through the uninstall button in the library management page.

  1. Select and click on the library name

    Screenshot showing how to select library.

  2. Click on Uninstall in the widget

    Screenshot showing how to uninstall library.

    Note

    • Packages installed from Jupyter notebook can only be deleted from Jupyter Notebook.
    • Packages installed from library manager can only be uninstalled from library manager.
    • For upgrading a library/package, uninstall the current version of the library and reinstall the required version of the library.
    • Installation of libraries from Jupyter notebook is particular to the session. It is not persistent.
    • Installing heavy packages may take some time due to their size and complexity.