Read from semantic models and write data consumable by Power BI using python
In this article, you'll learn how to read data and metadata and evaluate measures in semantic models using the SemPy python library in Microsoft Fabric. You'll also learn how to write data that semantic models can consume.
Prerequisites
Get a Microsoft Fabric subscription. Or, sign up for a free Microsoft Fabric trial.
Sign in to Microsoft Fabric.
Use the experience switcher on the left side of your home page to switch to the Synapse Data Science experience.
- Go to the Data Science experience in Microsoft Fabric.
- Create a new notebook to copy/paste code into cells.
- For Spark 3.4 and above, Semantic link is available in the default runtime when using Fabric, and there's no need to install it. If you're using Spark 3.3 or below, or if you want to update to the most recent version of Semantic Link, you can run the command:
python %pip install -U semantic-link
- Add a Lakehouse to your notebook.
- Download the Customer Profitability Sample.pbix semantic model from the datasets folder of the fabric-samples repository, and save the semantic model locally.
Upload the semantic model into your workspace
In this article, we use the Customer Profitability Sample.pbix semantic model. This semantic model references a company manufacturing marketing materials and contains data about products, customers, and corresponding revenue for various business units.
- Open your workspace in Fabric Data Science.
- Select Upload > Browse and select the Customer Profitability Sample.pbix semantic model.
Once the upload is done, your workspace will have three new artifacts: a Power BI report, a dashboard, and a semantic model named Customer Profitability Sample. You'll use this semantic model for the steps in this article.
Use Python to read data from semantic models
The SemPy Python API can retrieve data and metadata from semantic models located in a Microsoft Fabric workspace and execute queries on them.
Your notebook, Power BI dataset semantic model, and lakehouse can be located in the same workspace or in different workspaces. By default, SemPy tries to access your semantic model from:
- The workspace of your lakehouse, if you attached a lakehouse to your notebook.
- The workspace of your notebook, if there's no lakehouse attached.
If your semantic model isn't located in either of these workspaces, you must specify the workspace of your semantic model when you call a SemPy method.
To read data from semantic models:
List the available semantic models in your workspace.
import sempy.fabric as fabric df_datasets = fabric.list_datasets() df_datasets
List the tables available in the Customer Profitability Sample semantic model.
df_tables = fabric.list_tables("Customer Profitability Sample", include_columns=True) df_tables
List the measures defined in the Customer Profitability Sample semantic model.
Tip
In the following code, we've specified the workspace for SemPy to use for accessing the semantic model. You can replace
Your Workspace
with the name of the workspace where you uploaded the semantic model (from the Upload the semantic model into your workspace section).df_measures = fabric.list_measures("Customer Profitability Sample", workspace="Your Workspace") df_measures
Now we've determined that the Customer table is the table of interest.
Read the Customer table from the Customer Profitability Sample semantic model.
df_table = fabric.read_table("Customer Profitability Sample", "Customer") df_table
Note
- Data is retrieved using XMLA and therefore requires at least XMLA read-only to be enabled.
- The amount of data that's retrievable is limited by the maximum memory per query of the capacity SKU hosting the semantic model and by the Spark driver node (see node sizes) that's running the notebook.
- All requests use low priority to minimize the impact on Microsoft Azure Analysis Services performance and are billed as interactive requests.
Evaluate the Total Revenue measure per customer's state and date.
df_measure = fabric.evaluate_measure( "Customer Profitability Sample", "Total Revenue", ["'Customer'[State]", "Calendar[Date]"]) df_measure
Note
- By default, data is not retrieved using XMLA and therefore doesn't require XMLA read-only to be enabled.
- Furthermore, the data is not subject to Power BI backend limitations.
- The amount of data that's retrievable is limited by the maximum memory per query of the capacity SKU hosting the semantic model and by the Spark driver node (see node sizes) that's running the notebook.
- All requests are billed as interactive requests.
You can add filters to the measure calculation by specifying a list of values that can be in a particular column.
filters = { "State[Region]": ["East", "Central"], "State[State]": ["FLORIDA", "NEW YORK"] } df_measure = fabric.evaluate_measure( "Customer Profitability Sample", "Total Revenue", ["Customer[State]", "Calendar[Date]"], filters=filters) df_measure
You can also evaluate the Total Revenue measure per customer's state and date by using a DAX query.
df_dax = fabric.evaluate_dax( "Customer Profitability Sample", """ EVALUATE SUMMARIZECOLUMNS( 'State'[Region], 'Calendar'[Date].[Year], 'Calendar'[Date].[Month], "Total Revenue", CALCULATE([Total Revenue])) """)
Note
- Data is retrieved using XMLA and therefore requires at least XMLA read-only to be enabled.
- The amount of data retrievable is limited by the available memory in Microsoft Azure Analysis Services and the Spark driver node (see node sizes).
- All requests use low priority to minimize the impact on Analysis Services performance and are billed as interactive requests.
You can evaluate the same DAX query without the need to import the library, by using the
%%dax
cell magic. Let's run the cell below to load%%dax
cell magic.%load_ext sempy
The workspace parameter is optional and follows the same rules as the workspace parameter of the
evaluate_dax
function. The cell magic also supports accessing Python variables using the{variable_name}
syntax. To use a curly brace in the DAX query, escape it with another curly brace (e.g.EVALUATE {{1}}
).%%dax "Customer Profitability Sample" -w "Your Workspace" EVALUATE SUMMARIZECOLUMNS( 'State'[Region], 'Calendar'[Date].[Year], 'Calendar'[Date].[Month], "Total Revenue", CALCULATE([Total Revenue]))
The resulting FabricDataFrame is available via the
_
variable, which captures the output of the last executed cell.df_dax = _ df_dax.head()
Alternatively, you can add measures to data retrieved from external sources. This approach combines three tasks: it resolves column names to Power BI dimensions, defines group by columns and filters the measure. Any column names that can't be resolved within the given semantic model are ignored (see the supported DAX syntax).
from sempy.fabric import FabricDataFrame df = FabricDataFrame({ "Sales Agent": ["Agent 1", "Agent 1", "Agent 2"], "Customer[Country/Region]": ["US", "GB", "US"], "Industry[Industry]": ["Services", "CPG", "Manufacturing"], } ) joined_df = df.add_measure("Total Revenue", dataset="Customer Profitability Sample") joined_df
Special parameters
The SemPy read_table
and evaluate_measure
methods have more parameters that are useful for manipulating the output. These parameters include:
fully_qualified_columns
: If the value is "True", the methods return columns names in the formTableName[ColumnName]
.num_rows
: Number of rows to output in the result.pandas_convert_dtypes
: If the value is "True", the resulting DataFrame's columns are cast to the best possible dtype, using pandas convert_dtypes. If this parameter is turned off, type incompatibility issues can result between columns of related tables that might not have been detected in the Power BI model due to DAX implicit type conversion.
SemPy read_table
also uses the model information provided by Power BI.
multiindex_hierarchies
: If True, converts Power BI Hierarchies to pandas MultiIndex structure.
Write data consumable by semantic models
Spark tables added to a Lakehouse are automatically added to the corresponding default semantic model. This example demonstrates how to write data to the attached Lakehouse. The FabricDataFrame accepts the same input data as Pandas dataframes.
from sempy.fabric import FabricDataFrame
df_forecast = FabricDataFrame({'ForecastedRevenue': [1, 2, 3]})
df_forecast.to_lakehouse_table("ForecastTable")
By using Power BI, the ForecastTable table can be added to a composite semantic model using the Lakehouse semantic model.