Read from semantic models and write data consumable by Power BI using Python

In this article, you learn how to read data and metadata and evaluate measures in semantic models by using the SemPy Python library in Microsoft Fabric. You also learn how to write data that semantic models can consume.

Prerequisites

  • Visit the Data Science experience in Microsoft Fabric.
  • Create a new notebook to copy and paste code into its cells.
  • For Spark 3.4 and above, semantic link is available in the default Fabric runtime, and there's no need to install it. If you're using Spark 3.3 or below, or if you want to update to the most recent version of semantic link, run this command: %pip install -U semantic-link
  • Add a lakehouse to your notebook.
  • Download the Customer Profitability Sample.pbix semantic model from the datasets folder of the fabric-samples repository, and save it locally.

Upload the semantic model into your workspace

This article uses the Customer Profitability Sample.pbix semantic model. The semantic model describes a company that manufactures marketing materials, and it contains product, customer, and corresponding revenue data for various business units.

  1. Open your workspace in Fabric Data Science.
  2. Select Upload > Browse, and select the Customer Profitability Sample.pbix semantic model.

Screenshot showing the interface for uploading a semantic model into the workspace.

When the upload is complete, your workspace has three new artifacts: a Power BI report, a dashboard, and a semantic model named Customer Profitability Sample. The steps in this article rely on that semantic model.

Screenshot showing the items from the Power BI file uploaded into the workspace.

Use Python to read data from semantic models

The SemPy Python API can retrieve data and metadata from semantic models located in a Microsoft Fabric workspace. The API can also execute queries on them.

Your notebook, semantic model, and lakehouse can be located in the same workspace or in different workspaces. By default, SemPy tries to access your semantic model from:

  • The workspace of your lakehouse, if you attached a lakehouse to your notebook.
  • The workspace of your notebook, if there's no lakehouse attached.

If your semantic model isn't located in either of these workspaces, you must specify the workspace of your semantic model when you call a SemPy method.
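
For example, here's a minimal sketch that targets a specific workspace explicitly; Your Workspace is a placeholder for your own workspace name:

import sempy.fabric as fabric

# Pass the workspace name (or ID) explicitly when the semantic model
# lives in a different workspace than the notebook or lakehouse.
df_datasets = fabric.list_datasets(workspace="Your Workspace")
df_datasets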

To read data from semantic models:

  1. List the available semantic models in your workspace.

    import sempy.fabric as fabric
    
    df_datasets = fabric.list_datasets()
    df_datasets
    
  2. List the tables available in the Customer Profitability Sample semantic model.

    df_tables = fabric.list_tables("Customer Profitability Sample", include_columns=True)
    df_tables
    
  3. List the measures defined in the Customer Profitability Sample semantic model.

    Tip

    In the following code sample, we specify the workspace for SemPy to use when accessing the semantic model. Replace Your Workspace with the name of the workspace where you uploaded the semantic model (from the Upload the semantic model into your workspace section).

    df_measures = fabric.list_measures("Customer Profitability Sample", workspace="Your Workspace")
    df_measures
    

    Here, we determined that the Customer table is the table of interest.

  4. Read the Customer table from the Customer Profitability Sample semantic model.

    df_table = fabric.read_table("Customer Profitability Sample", "Customer")
    df_table
    

    Note

    • Data is retrieved by using XMLA, which requires at least the XMLA read-only setting to be enabled.
    • The amount of retrievable data is limited by:
      - the maximum memory per query of the capacity SKU that hosts the semantic model
      - the Spark driver node that runs the notebook (visit node sizes for more information)
    • All requests use low priority to minimize the impact on Microsoft Azure Analysis Services performance, and are billed as interactive requests.
  5. Evaluate the Total Revenue measure for the state and date of each customer.

    df_measure = fabric.evaluate_measure(
        "Customer Profitability Sample",
        "Total Revenue",
        ["'Customer'[State]", "Calendar[Date]"])
    df_measure
    

    Note

    • By default, data is not retrieved by using XMLA, and therefore doesn't require the XMLA read-only setting to be enabled.
    • The data is not subject to Power BI backend limitations.
    • The amount of retrievable data is limited by:
      - the maximum memory per query of the capacity SKU that hosts the semantic model
      - the Spark driver node that runs the notebook (visit node sizes for more information)
    • All requests are billed as interactive requests.
  6. To add filters to the measure calculation, specify a list of permissible values for a particular column.

    filters = {
        "State[Region]": ["East", "Central"],
        "State[State]": ["FLORIDA", "NEW YORK"]
    }
    df_measure = fabric.evaluate_measure(
        "Customer Profitability Sample",
        "Total Revenue",
        ["Customer[State]", "Calendar[Date]"],
        filters=filters)
    df_measure
    
  7. You can also evaluate the Total Revenue measure per region, year, and month with a DAX query.

    df_dax = fabric.evaluate_dax(
        "Customer Profitability Sample",
        """
        EVALUATE SUMMARIZECOLUMNS(
            'State'[Region],
            'Calendar'[Date].[Year],
            'Calendar'[Date].[Month],
            "Total Revenue",
            CALCULATE([Total Revenue]))
        """)
    

    Note

    • Data is retrieved by using XMLA, and therefore requires at least the XMLA read-only setting to be enabled.
    • The amount of retrievable data is limited by the available memory in Microsoft Azure Analysis Services and by the Spark driver node (visit node sizes for more information).
    • All requests use low priority to minimize the impact on Analysis Services performance, and are billed as interactive requests.
  8. Use the %%dax cell magic to evaluate the same DAX query without needing to import the library. Run this cell to load the %%dax cell magic:

    %load_ext sempy
    

    The workspace parameter is optional. It follows the same rules as the workspace parameter of the evaluate_dax function.

    The cell magic also supports access of Python variables with the {variable_name} syntax. To use a curly brace in the DAX query, escape it with another curly brace (example: EVALUATE {{1}}). A sketch of this variable substitution appears at the end of this step.

    %%dax "Customer Profitability Sample" -w "Your Workspace"
    EVALUATE SUMMARIZECOLUMNS(
        'State'[Region],
        'Calendar'[Date].[Year],
        'Calendar'[Date].[Month],
        "Total Revenue",
        CALCULATE([Total Revenue]))
    

    The resulting FabricDataFrame is available via the _ variable. That variable captures the output of the last executed cell.

    df_dax = _
    
    df_dax.head()
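
    As a minimal sketch of the {variable_name} substitution, assume a Python variable named region (a hypothetical name) was defined in a previously executed cell:

    region = "East"
    

    Then reference it from the cell magic; its value is substituted into the query text before evaluation:

    %%dax "Customer Profitability Sample"
    EVALUATE
    CALCULATETABLE(
        VALUES('State'[State]),
        'State'[Region] = "{region}")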
    
  9. You can add measures to data retrieved from external sources. This approach combines three tasks:

    • It resolves column names to Power BI dimensions
    • It defines group-by columns
    • It filters the measure

    Any column names that can't be resolved within the given semantic model are ignored (visit the supported DAX syntax resource for more information).

    from sempy.fabric import FabricDataFrame
    
    df = FabricDataFrame({
            "Sales Agent": ["Agent 1", "Agent 1", "Agent 2"],
            "Customer[Country/Region]": ["US", "GB", "US"],
            "Industry[Industry]": ["Services", "CPG", "Manufacturing"],
        }
    )
    
    joined_df = df.add_measure("Total Revenue", dataset="Customer Profitability Sample")
    joined_df
    

Special parameters

The SemPy read_table and evaluate_measure methods have more parameters that are useful for manipulating the output. These parameters include:

  • fully_qualified_columns: For a "True" value, the methods return column names in the form TableName[ColumnName]
  • num_rows: The number of rows to output in the result
  • pandas_convert_dtypes: For a "True" value, pandas casts the resulting DataFrame columns to the best possible dtype by using convert_dtypes. If this parameter is turned off, type incompatibility issues can result between columns of related tables; the Power BI model might not detect those issues because of DAX implicit type conversion

SemPy read_table also supports a parameter that uses the model information that Power BI provides:

  • multiindex_hierarchies: If "True", it converts Power BI Hierarchies to a pandas MultiIndex structure
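
As a brief illustration, here's a sketch that combines several of these parameters with read_table; the values shown are illustrative:

import sempy.fabric as fabric

# Return at most 10 rows, with fully qualified column names such as
# Customer[State], and keep pandas dtype conversion enabled.
df_customer = fabric.read_table(
    "Customer Profitability Sample",
    "Customer",
    fully_qualified_columns=True,
    num_rows=10,
    pandas_convert_dtypes=True)
df_customer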

Write data consumable by semantic models

Spark tables added to a lakehouse are automatically added to the corresponding default semantic model. This example demonstrates how to write data to the attached lakehouse. FabricDataFrame accepts the same input data as pandas DataFrames.

from sempy.fabric import FabricDataFrame

df_forecast = FabricDataFrame({'ForecastedRevenue': [1, 2, 3]})

df_forecast.to_lakehouse_table("ForecastTable")
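
If the target table already exists, a rerun can fail. This sketch assumes to_lakehouse_table accepts a Spark-style mode parameter (for example, "overwrite" or "append"); verify the exact parameter name and values against the SemPy reference:

# Assumption: a Spark-style save mode is supported; "overwrite"
# replaces any existing table contents instead of raising an error.
df_forecast.to_lakehouse_table("ForecastTable", mode="overwrite")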

In Power BI, you can then add the ForecastTable table to a composite semantic model alongside the lakehouse semantic model.