Create a parameterized notebook by using Papermill

In Azure Data Studio, parameterization means running the same notebook with different sets of parameter values.

This article shows you how to create and run a parameterized notebook in Azure Data Studio by using the Python kernel.
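
As a rough mental model only, the idea can be sketched in plain Python without Papermill. The cell contents and the `run_cells` helper below are illustrative assumptions, not part of Azure Data Studio or Papermill: a cell of default values runs first, an optional cell of overriding values runs next, and the remaining cells then use whichever values were assigned last.

```python
# Hypothetical cell sources, used only to illustrate the idea.
PARAMETERS_CELL = "x = 2.0\ny = 5.0"
COMPUTE_CELL = "addition = x + y\nmultiply = x * y"


def run_cells(overrides=None):
    """Run the default cell, then an optional override cell, then the compute cell."""
    namespace = {}
    exec(PARAMETERS_CELL, namespace)  # default parameter values
    if overrides:
        # Stand-in for an injected cell: one assignment per overridden parameter.
        injected_cell = "\n".join(
            f"{name} = {value!r}" for name, value in overrides.items()
        )
        exec(injected_cell, namespace)
    exec(COMPUTE_CELL, namespace)  # uses whichever values were assigned last
    return namespace["addition"], namespace["multiply"]


print(run_cells())                     # defaults
print(run_cells({"x": 10, "y": 20}))   # overridden values
```

With the defaults, the helper returns `(7.0, 10.0)`; with the overrides, it returns `(30, 200)`. The notebook itself never changes, only the values it receives.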

Note

Currently, you can use parameterization with Python, PySpark, PowerShell, and .NET Interactive kernels.

Prerequisites

Install and set up Papermill in Azure Data Studio

All the steps in this section run inside an Azure Data Studio notebook.

  1. Create a new notebook. Change Kernel to Python 3:

    Screenshot that shows the New notebook menu option and setting the Kernel value to Python 3.

  2. If you're prompted to upgrade your Python packages, select Yes:

    Screenshot that shows the dialog prompt to update Python packages.

  3. Install Papermill:

    import sys
    !{sys.executable} -m pip install papermill --no-cache-dir --upgrade
    

    Verify that Papermill is installed:

    import sys
    !{sys.executable} -m pip list
    

    Screenshot that shows selecting Papermill in a list of application names.

  4. To verify that Papermill installed correctly, check the version of Papermill:

    import papermill
    papermill.__version__
    

    Screenshot that shows installation validation for Papermill.
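
If you prefer a check that doesn't fail when the package is missing, the following sketch uses the standard library's `importlib.metadata` instead of importing Papermill directly. The `installed_version` helper is a hypothetical name introduced here for illustration.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


version = installed_version("papermill")
if version is None:
    print("Papermill is not installed in this Python environment.")
else:
    print(f"Papermill {version} is installed.")
```

Run the check in the same notebook kernel that you used for the installation, because each Python environment has its own set of packages.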

Parameterization example

You can use an example notebook file to go through the steps in this article:

  1. Go to the notebook file in GitHub. Select Raw.
  2. Select Ctrl+S or right-click, and then save the file with the .ipynb extension.
  3. Open the file in Azure Data Studio.

Set up a parameterized notebook

You can begin with the example notebook open in Azure Data Studio or complete the following steps to create a notebook. Then, try using different parameters. All the steps run inside an Azure Data Studio notebook.

  1. Verify that Kernel is set to Python 3:

    Screenshot that shows the Kernel value set to Python 3.

  2. Create a new code cell that contains the default parameter values. Select Parameters to tag the cell as a parameters cell:

    x = 2.0
    y = 5.0
    

    Screenshot that shows creating a new parameters cell with Parameters selected.

  3. Add other cells to test different parameters:

    addition = x + y
    multiply = x * y
    
    print("Addition: " + str(addition))
    print("Multiplication: " + str(multiply))
    

    After you run all the cells, the output looks similar to this example:

    Screenshot that shows the output of cells added to test new parameters.

  4. Save the notebook as Input.ipynb:

    Screenshot that shows saving the notebook file.
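
For reference, Papermill finds the parameters cell by looking for a cell whose metadata carries the `parameters` tag. The sketch below builds an equivalent notebook as plain JSON with the standard library, assuming the nbformat 4 layout. The helper names `make_code_cell` and `build_input_notebook` are hypothetical, and the metadata that Azure Data Studio writes when you save may contain more fields than shown here.

```python
import json


def make_code_cell(source, tags=None):
    """Build a minimal nbformat 4 code cell with optional tags."""
    return {
        "cell_type": "code",
        "execution_count": None,
        "metadata": {"tags": list(tags or [])},
        "outputs": [],
        "source": source,
    }


def build_input_notebook():
    """Build a notebook dict that mirrors the steps above."""
    compute_source = "\n".join([
        "addition = x + y",
        "multiply = x * y",
        "",
        'print("Addition: " + str(addition))',
        'print("Multiplication: " + str(multiply))',
    ])
    return {
        "cells": [
            # The "parameters" tag marks the cell that holds the default values.
            make_code_cell("x = 2.0\ny = 5.0", tags=["parameters"]),
            make_code_cell(compute_source),
        ],
        "metadata": {
            "kernelspec": {
                "name": "python3",
                "display_name": "Python 3",
                "language": "python",
            }
        },
        "nbformat": 4,
        "nbformat_minor": 4,
    }


notebook = build_input_notebook()
notebook_json = json.dumps(notebook, indent=1)
print(notebook["cells"][0]["metadata"]["tags"])
```

To save the result, write `notebook_json` to a file such as Input.ipynb. Opening a saved notebook in a text editor and checking for the `parameters` tag is also a quick way to confirm that the cell was tagged.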

Execute a Papermill notebook

You can execute Papermill in two ways:

  • Command-line interface (CLI)
  • Python API

Parameterized CLI execution

To execute a notebook by using the CLI, in the terminal, enter the papermill command with the input notebook, the location for the output notebook, and options.

Note

To learn more, see the Papermill CLI documentation.

  1. Execute the input notebook with new parameters:

    papermill Input.ipynb Output.ipynb -p x 10 -p y 20
    

    This command executes the input notebook with new values for parameters x and y.

  2. Open the output notebook. A new cell labeled # Injected-Parameters contains the parameter values that were passed in through the CLI. The cells that follow use these injected values, so the output in the last cell reflects them:

    Screenshot that shows the output for new parameters.
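
When the parameter values come from a script rather than from typing, it can help to assemble the command programmatically. The following sketch builds the same command from a dictionary by using one `-p name value` pair per parameter. The `build_papermill_command` helper is a hypothetical name, and the file names are the ones used in this article.

```python
import shlex


def build_papermill_command(input_path, output_path, parameters):
    """Return the papermill CLI invocation as a list of arguments."""
    command = ["papermill", input_path, output_path]
    for name, value in parameters.items():
        command += ["-p", name, str(value)]  # one -p pair per parameter
    return command


command = build_papermill_command("Input.ipynb", "Output.ipynb", {"x": 10, "y": 20})
print(shlex.join(command))
# prints: papermill Input.ipynb Output.ipynb -p x 10 -p y 20
```

To run the command from Python, you could pass the list to `subprocess.run(command, check=True)`, assuming that the `papermill` executable is on your PATH.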

Parameterized Python API execution

Note

To learn more, see the Papermill Python documentation.

  1. Create a new notebook. Change Kernel to Python 3:

    Screenshot that shows the New notebook menu option and setting the Kernel value to Python 3.

  2. Add a new code cell. Then, use the Papermill Python API to execute and generate the output parameterized notebook:

    import papermill as pm
    
    # Replace these paths with the locations of your own input and output notebooks.
    pm.execute_notebook(
        '/Users/vasubhog/GitProjects/AzureDataStudio-Notebooks/Demo_Parameterization/Input.ipynb',
        '/Users/vasubhog/GitProjects/AzureDataStudio-Notebooks/Demo_Parameterization/Output.ipynb',
        parameters=dict(x=10, y=20)
    )
    

    Screenshot that shows the Python API execution.

  3. Open the output notebook. A new cell labeled # Injected-Parameters contains the parameter values that were passed in through the Python API. The cells that follow use these injected values, so the output in the last cell reflects them:

    Screenshot that shows the output for new parameters.
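
One common reason to use the Python API is to run the same notebook over several parameter sets. The following sketch shows one way to do that, writing a separate output notebook for each set. The helper names `output_path_for` and `run_sweep`, the output file naming scheme, and the second parameter set are assumptions made for illustration.

```python
from pathlib import Path


def output_path_for(output_dir, parameters):
    """Derive a distinct output file name from the parameter values."""
    suffix = "_".join(
        f"{name}{value}" for name, value in sorted(parameters.items())
    )
    return Path(output_dir) / f"Output_{suffix}.ipynb"


def run_sweep(input_path, output_dir, parameter_sets):
    """Execute the input notebook once per parameter set."""
    # Imported here so that the path helper above works without Papermill.
    import papermill as pm

    outputs = []
    for parameters in parameter_sets:
        output_path = output_path_for(output_dir, parameters)
        pm.execute_notebook(str(input_path), str(output_path), parameters=parameters)
        outputs.append(output_path)
    return outputs


parameter_sets = [dict(x=10, y=20), dict(x=1.5, y=4)]
planned = [output_path_for(".", p) for p in parameter_sets]
print([path.name for path in planned])
```

Calling `run_sweep("Input.ipynb", ".", parameter_sets)` would then execute the notebook once per set. Keeping one output notebook per run makes it easy to compare results afterward.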

Next steps

Learn more about notebooks and parameterization: