Επεξεργασία

Κοινή χρήση μέσω


Use a Jupyter Notebook and kqlmagic extension to analyze data in Azure Data Explorer

Jupyter Notebook is an open-source web application that allows you to create and share documents containing live code, equations, visualizations, and narrative text. It's useful for a wide range of tasks, such as data cleaning and transformation, numerical simulation, statistical modeling, data visualization, and machine learning.

Kqlmagic extends the capabilities of the Python kernel in Jupyter Notebook so you can run Kusto Query Language (KQL) queries natively. You can combine Python and KQL to query and visualize data using the rich Plot.ly library integrated with the render operator. The kqlmagic extension is compatible with Jupyter Lab, Visual Studio Code Jupyter extension, and Azure Data Studio, and supported data sources include Azure Data Explorer, Azure Monitor logs, and Application Insights.

In this article, you'll learn how to use kqlmagic in a Jupyter Notebook to connect to and query data stored in Azure Data Explorer.

Prerequisites

  • A Microsoft account or a Microsoft Entra user identity. An Azure subscription isn't required.
  • Jupyter Notebook installed on your local machine. Otherwise, use Azure Data Studio.
  • Python 3.6. To change the Jupyter Notebook kernel version to Python 3.6, select Kernel > Change Kernel > Python 3.6.

Install kqlmagic

Once you install and load the kqlmagic extension, you can write KQL queries in your notebook. If the kernel stops or the results aren't as expected, reload the kqlmagic extension.

  1. To install kqlmagic, run the following command:

    !pip install Kqlmagic --no-cache-dir  --upgrade
    
  2. To load the kqlmagic extension, run the following command:

    %reload_ext Kqlmagic
    

Connect to a cluster

Select the tab for your preferred method to connect to your cluster.

Note

We recommend using the Certificate method of authentication when possible.

The Microsoft Entra certificate should be stored in a file accessible from the notebook. This file can be referenced in the connection string.

%kql AzureDataExplorer://tenant='<tenant-id>';certificate='<certificate>';certificate_thumbprint='<thumbprint>';cluster='<cluster-name>';database='<database-name>'

Tip

  • To parameterize the connection string, use unquoted values as they are interpreted as Python expressions.
  • To simplify the process of getting credentials, see Connection options.

Example of cluster connection

The following command uses the Microsoft Entra code method to authenticate to the Samples database hosted on the help cluster. For non-Microsoft Entra users, replace the tenant name Microsoft.com with your Microsoft Entra tenant.

%kql AzureDataExplorer://tenant="Microsoft.com";code;cluster='help';database='Samples'

Connection options

To simplify the process of getting credentials, you can add one of the following option flags after the connection string.

Option Description Example syntax
try_azcli_login Attempt to get authentication credentials from Azure CLI. -try_azcli_login
try_azcli_login_subscription Attempt to get authentication credentials from Azure CLI based on the specified subscription. -try_azcli_login_subscription=<subscription_id>
try_vscode_login Attempt to get authentication credentials from Visual Studio Code Azure account sign-in. -try_vscode_login
try_msi Attempt to get authentication credentials from the MSI local endpoint. Expects a dictionary with the optional MSI parameters: resource, client_id/object_id/mis_res_id, cloud_environment, timeout. -try_msi={"client_id":<id>}
try_token Authenticate with a specified token. Expects a dictionary with Azure AD v1 or v2 token properties. -try_token={"tokenType":"bearer","accessToken":"<token>"}

Example of connection option

Any of the options described in the previous table can be added after a connection string. The following example uses the Azure CLI sign-in option:

%kql azureDataExplorer://code;cluster='help';database='Samples' -try_azcli_login

Display connection information

To see all existing connections, run the following command:

%kql --conn

To check the details of a specific connection, run the following command:

%kql --conn <database-name>@<cluster-name>

Query and visualize

Query data using the render operator and visualize data using the ploy.ly library. This query and visualization supplies an integrated experience that uses native KQL. Kqlmagic supports most charts except timepivot, pivotchart, and ladderchart. Render is supported with all attributes except kind, ysplit, and accumulate.

Query and render piechart

%%kql
StormEvents
| summarize statecount=count() by State
| sort by statecount 
| take 10
| render piechart title="My Pie Chart by State"

Query and render timechart

%%kql
StormEvents
| summarize count() by bin(StartTime,7d)
| render timechart

Note

These charts are interactive. Select a time range to zoom into a specific time.

Customize the chart colors

If you don't like the default color palette, customize the charts using palette options. The available palettes can be found here: Choose colors palette for your kqlmagic query chart result

  1. For a list of palettes:

    %kql --palettes -popup_window
    
  2. Select the cool color palette and render the query again:

    %%kql -palette_name "cool"
    StormEvents
    | summarize statecount=count() by State
    | sort by statecount
    | take 10
    | render piechart title="My Pie Chart by State"
    

Parameterize a query with Python

Kqlmagic allows for simple interchange between Kusto Query Language and Python. To learn more: Parameterize your kqlmagic query with Python

Use a Python variable in your KQL query

You can use the value of a Python variable in your query to filter the data:

statefilter = ["TEXAS", "KANSAS"]
%%kql
let _state = statefilter;
StormEvents 
| where State in (_state) 
| summarize statecount=count() by bin(StartTime,1d), State
| render timechart title = "Trend"

Convert query results to Pandas DataFrame

You can access the results of a KQL query in Pandas DataFrame. Access the last executed query results by variable _kql_raw_result_ and easily convert the results into Pandas DataFrame as follows:

df = _kql_raw_result_.to_dataframe()
df.head(10)

Example

In many analytics scenarios, you may want to create reusable notebooks that contain many queries and feed the results from one query into subsequent queries. The example below uses the Python variable statefilter to filter the data.

  1. Run a query to view the top 10 states with maximum DamageProperty:

    %%kql
    StormEvents
    | summarize max(DamageProperty) by State
    | order by max_DamageProperty desc
    | take 10
    
  2. Run a query to extract the top state and set it into a Python variable:

    df = _kql_raw_result_.to_dataframe()
    statefilter =df.loc[0].State
    statefilter
    
  3. Run a query using the let statement and the Python variable:

    %%kql
    let _state = statefilter;
    StormEvents 
    | where State in (_state)
    | summarize statecount=count() by bin(StartTime,1d), State
    | render timechart title = "Trend"
    
  4. Run the help command:

    %kql --help "help"
    

Tip

To receive information about all available configurations use %config Kqlmagic. To troubleshoot and capture Kusto errors, such as connection issues and incorrect queries, use %config Kqlmagic.short_errors=False

Sample notebooks