Use the Livy API to submit and execute Spark jobs

Note

The Livy API for Fabric Data Engineering is in preview.

Applies to: ✅ Data Engineering and Data Science in Microsoft Fabric

Get started with the Livy API for Fabric Data Engineering by creating a Lakehouse, authenticating with a Microsoft Entra app token, and submitting either batch or session jobs from a remote client to Fabric Spark compute. You'll discover the Livy API endpoint, submit jobs, and monitor the results.

Prerequisites

Choosing a REST API client

You can use various programming languages or GUI clients to interact with REST API endpoints. This article uses Visual Studio Code, configured with Jupyter Notebooks, PySpark, and the Microsoft Authentication Library (MSAL) for Python.
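A quick way to confirm the client environment is ready is to import those libraries in a notebook cell. The following is a minimal sketch; it assumes the packages were installed beforehand (for example, with pip install msal requests):

# Minimal environment check for the remote client.
# Assumes the packages were already installed, for example: pip install msal requests
import msal        # Microsoft Authentication Library (MSAL) for Python
import requests    # Used later to call the Livy API endpoints

print("msal version:", msal.__version__)
print("requests version:", requests.__version__)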

How to authorize the Livy API requests

To work with Fabric APIs, including the Livy API, you first need to create a Microsoft Entra application and obtain a token. Your application must be registered and configured to perform API calls against Fabric. For more information, see Register an application with the Microsoft identity platform.

Several Microsoft Entra scope permissions are required to execute Livy jobs. This example covers simple Spark code plus storage access and SQL:

  • Code.AccessAzureDataExplorer.All
  • Code.AccessAzureDataLake.All
  • Code.AccessAzureKeyvault.All
  • Code.AccessFabric.All
  • Code.AccessStorage.All
  • Item.ReadWrite.All
  • Lakehouse.Execute.All
  • Lakehouse.Read.All
  • Workspace.ReadWrite.All

Screenshot showing Livy API permissions in the Microsoft Entra admin center.

Note

During public preview, we will add a few more granular scopes. If you use this approach, your Livy app will break when those additional scopes are added. Check this list, as it will be updated with the additional scopes.

Some customers want more granular permissions than the prior list. You can remove Item.ReadWrite.All and replace it with these more granular scope permissions:

  • Code.AccessAzureDataExplorer.All
  • Code.AccessAzureDataLake.All
  • Code.AccessAzureKeyvault.All
  • Code.AccessFabric.All
  • Code.AccessStorage.All
  • Lakehouse.Execute.All
  • Lakehouse.ReadWrite.All
  • Workspace.ReadWrite.All
  • Notebook.ReadWrite.All
  • SparkJobDefinition.ReadWrite.All
  • MLModel.ReadWrite.All
  • MLExperiment.ReadWrite.All
  • Dataset.ReadWrite.All

When you've registered your application, you'll need both the Application (client) ID and the Directory (tenant) ID.

Screenshot showing Livy API app overview in the Microsoft Entra admin center.

The authenticated user calling the Livy API needs Contributor access to the workspace that contains both the API and data source items. For more information, see Give users access to workspaces.
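The following is a minimal sketch of acquiring a token with MSAL for Python. The interactive sign-in flow, the placeholder client and tenant IDs, and the assumption that each scope is prefixed with the Fabric API resource URI are all illustrative; adjust them to your app registration:

# Minimal sketch: acquire a Microsoft Entra token for the Livy API with MSAL for Python.
# The IDs below are placeholders, and the scope prefix is an assumption.
import msal

tenant_id = "<Entra_TenantID>"   # Directory (tenant) ID from the app registration
client_id = "<Entra_ClientID>"   # Application (client) ID from the app registration

app = msal.PublicClientApplication(
    client_id,
    authority=f"https://login.microsoftonline.com/{tenant_id}",
)

# Assumed scope format: Fabric API resource URI followed by the scope name.
scopes = [
    "https://api.fabric.microsoft.com/Lakehouse.Execute.All",
    "https://api.fabric.microsoft.com/Lakehouse.Read.All",
    "https://api.fabric.microsoft.com/Item.ReadWrite.All",
]

result = app.acquire_token_interactive(scopes=scopes)
access_token = result["access_token"]   # Bearer token sent on every Livy API call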

How to discover the Fabric Livy API endpoint

A Lakehouse artifact is required to access the Livy endpoint. Once the Lakehouse is created, the Livy API endpoint can be located within the settings panel.

Screenshot showing Livy API endpoints in Lakehouse settings.

The Livy API endpoint follows this pattern:

https://api.fabric.microsoft.com/v1/workspaces/<ws_id>/lakehouses/<lakehouse_id>/livyapi/versions/2023-12-01/

The URL is appended with either <sessions> or <batches>, depending on whether you submit a session job or a batch job.
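For example, a minimal sketch that builds both URLs from placeholder workspace and Lakehouse IDs:

# Build the Livy API endpoints from placeholder IDs (replace with your own values).
workspace_id = "<Fabric_WorkspaceID>"   # Placeholder workspace ID
lakehouse_id = "<Fabric_LakehouseID>"   # Placeholder Lakehouse ID

livy_base_url = (
    "https://api.fabric.microsoft.com/v1/workspaces/"
    f"{workspace_id}/lakehouses/{lakehouse_id}/livyapi/versions/2023-12-01/"
)

sessions_url = livy_base_url + "sessions"   # Endpoint for session jobs
batches_url = livy_base_url + "batches"     # Endpoint for batch jobs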

Integration with Fabric Environments

A default starter pool is provisioned for each Fabric workspace, and all Spark code uses this starter pool by default. You can use Fabric Environments to customize the Spark jobs submitted through the Livy API.
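As a rough illustration, Spark configuration in the request body is one way an environment can be attached when a session is created. The spark.fabric.environmentDetails key and its JSON value below are assumptions, not confirmed property names, so verify them against the Livy API Swagger files:

# Hypothetical sketch: attach a custom Fabric Environment to a Livy session.
# The configuration key and payload shape are assumptions - confirm them against
# the Livy API Swagger files. Reuses access_token and sessions_url from earlier steps.
import json
import requests

environment_id = "<Fabric_EnvironmentID>"   # Placeholder environment ID

session_payload = {
    "conf": {
        "spark.fabric.environmentDetails": json.dumps({"id": environment_id})
    }
}

headers = {"Authorization": f"Bearer {access_token}"}
response = requests.post(sessions_url, headers=headers, json=session_payload)
print(response.status_code, response.json())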

Download the Livy API Swagger files

The full Swagger files for the Livy API are available here.

Submit Livy API jobs

Now that the Livy API setup is complete, you can choose to submit either batch or session jobs.
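As an illustration, the following is a minimal sketch of submitting a batch job with the requests library. It reuses the token and batches URL from the earlier sketches, and the payload follows the standard Livy batch format with a placeholder ABFSS path to a PySpark script in the Lakehouse; adjust both to your own items:

# Minimal sketch: submit a batch job to the Livy API (placeholder path and reused values).
import requests

headers = {"Authorization": f"Bearer {access_token}"}   # Token from the MSAL step

# Standard Livy batch payload: "file" points at a PySpark script stored in the Lakehouse.
batch_payload = {
    "name": "livy-batch-demo",
    "file": "abfss://<Fabric_WorkspaceID>@onelake.dfs.fabric.microsoft.com/"
            "<Fabric_LakehouseID>/Files/demo.py",
}

response = requests.post(batches_url, headers=headers, json=batch_payload)
print(response.status_code, response.json())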

How to monitor the request history

You can use the Monitoring hub to see your prior Livy API submissions and debug any submission errors.

Screenshot showing previous Livy API submissions in the Monitoring hub.
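If you prefer to check a submission from code instead of the Monitoring hub, a minimal sketch that polls a batch job by its ID (assuming the standard Livy pattern of a GET on the batch resource) looks like this:

# Minimal sketch: poll a submitted batch job by ID (assumes the standard Livy GET pattern).
# Reuses batches_url and access_token from the earlier sketches.
import time
import requests

headers = {"Authorization": f"Bearer {access_token}"}
batch_id = "<Livy_BatchID>"   # ID returned in the response when the batch was submitted

while True:
    status = requests.get(f"{batches_url}/{batch_id}", headers=headers).json()
    print("Batch state:", status.get("state"))
    if status.get("state") in ("success", "dead", "killed"):
        break
    time.sleep(10)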