Lakehouse tutorial: Create a lakehouse, ingest sample data, and build a report

In this tutorial, you build a lakehouse, ingest sample data into a Delta table, apply transformations where required, and then create reports. You'll learn how to:

  • Create a lakehouse from the Data Engineering workload
  • Download and ingest sample customer data
  • Add tables to the semantic model
  • Build a report

If you don't have Microsoft Fabric, sign up for a free trial capacity.

Prerequisites

Create a lakehouse

In this section, you create a lakehouse from the Data Engineering experience.

  1. In Power BI, select Workspaces from the left-hand menu.

  2. To open your workspace, enter its name in the search textbox located at the top and select it from the search results.

  3. From the switcher located at the bottom left, select Data Engineering.

    Screenshot showing where to select the switcher and Data Engineering.

  4. In the Data Engineering screen, select Lakehouse to create a lakehouse.

  5. In the New lakehouse dialog box, enter wwilakehouse in the Name field.

    Screenshot of the New lakehouse dialog box.

  6. Select Create to create and open the new lakehouse.

Ingest sample data

In this section, you ingest sample customer data into the lakehouse.

Note

If you don't have OneDrive configured, sign up for the Microsoft 365 free trial: Free Trial - Try Microsoft 365 for a month.

  1. Download the dimension_customer.csv file from the Fabric samples repo.
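
    If you prefer to script the download, the following Python sketch fetches the file with the requests library. The raw file path shown is an assumption based on the fabric-samples repo layout; verify it against the repo and adjust if needed.

    # Hedged sketch: download dimension_customer.csv from the Fabric samples repo.
    # The exact raw path is an assumption; verify it against the repo layout.
    import requests

    url = (
        "https://raw.githubusercontent.com/microsoft/fabric-samples/"
        "main/docs-samples/data-engineering/dimension_customer.csv"
    )
    response = requests.get(url, timeout=30)
    response.raise_for_status()

    with open("dimension_customer.csv", "wb") as f:
        f.write(response.content)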

  2. On the Home tab, under Get data in your lakehouse, you see options to load data into the lakehouse. Select New Dataflow Gen2.

    Screenshot showing where to select New Dataflow Gen2 option to load data into your lakehouse.

  3. On the new dataflow screen, select Import from a Text/CSV file.

  4. On the Connect to data source screen, select the Upload file radio button. Drag and drop the dimension_customer.csv file that you downloaded in step 1. After the file is uploaded, select Next.

    Screenshot showing where to select Upload file and where to drag the previously downloaded file.

  5. On the Preview file data page, preview the data, and then select Create to return to the dataflow canvas.

  6. In the Query settings pane, update the Name field to dimension_customer.

    Note

    By default, Fabric appends a space and a number to the end of the table name. Table names must be lowercase and must not contain spaces. Rename the table accordingly, removing any spaces.

    Screenshot of the query settings pane, showing where to enter the name and select the data destination.
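
    If you ever generate table names programmatically, a small helper like the following (a hypothetical convenience function, not part of the tutorial) keeps names valid by lowercasing them and removing spaces:

    # Hypothetical helper: normalize a display name into a valid
    # lakehouse table name (lowercase, no spaces).
    def to_table_name(name: str) -> str:
        return name.strip().lower().replace(" ", "_")

    print(to_table_name("dimension_customer 1"))  # prints: dimension_customer_1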

  7. In this step, you associate the customer data with the lakehouse. If you have other data items that you want to associate with the lakehouse, you can add them the same way:

    1. From the menu items, select Add data destination and select Lakehouse. From the Connect to data destination screen, sign into your account if necessary and select Next.

    2. Navigate to the wwilakehouse in your workspace.

    3. If the dimension_customer table doesn't exist, select the New table setting and enter the table name dimension_customer. If the table already exists, select the Existing table setting and choose dimension_customer from the list of tables in the object explorer. Select Next.

      Screenshot showing how to choose the destination table.

    4. On the Choose destination settings pane, select Replace as Update method. Select Save settings to return to the dataflow canvas.
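
    The Replace update method overwrites the Delta table on each refresh. For reference, the following PySpark sketch produces an equivalent result from a Fabric notebook attached to wwilakehouse; it assumes the CSV was uploaded to the lakehouse Files area and that the built-in spark session is available.

    # Hedged alternative to the dataflow: read the uploaded CSV and write it
    # as a Delta table, overwriting existing data (equivalent to Replace).
    # Assumes dimension_customer.csv sits under the lakehouse Files/ area.
    df = (
        spark.read.format("csv")
        .option("header", "true")
        .option("inferSchema", "true")
        .load("Files/dimension_customer.csv")
    )
    df.write.format("delta").mode("overwrite").saveAsTable("dimension_customer")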

  8. From the dataflow canvas, you can easily transform the data based on your business requirements. For simplicity, we aren't making any changes in this tutorial. To proceed, select Publish at the bottom right of the screen.

    Screenshot of the Query setting pane that contains the Publish button.

  9. In the item view, a spinning circle next to the dataflow's name indicates that publishing is in progress. When publishing is complete, select the ellipsis (...), and then select Properties. Rename the dataflow to Load Lakehouse Table and select Save.

  10. Select the Refresh now option next to the dataflow name to refresh the dataflow. This option runs the dataflow and moves data from the source file to the lakehouse table. While the refresh is in progress, you see a spinning circle under the Refreshed column in the item view.

    Screenshot showing where to find the Refresh now icon.

  11. Once the dataflow is refreshed, select your new lakehouse in the left navigation bar to view the dimension_customer Delta table.

    Screenshot of navigation panel from which the lakehouse is opened.
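
    To spot-check the same table from a Fabric notebook instead of the lakehouse explorer, a minimal PySpark sketch (assuming the notebook is attached to wwilakehouse and uses the built-in spark session):

    # Load the dimension_customer Delta table and preview a few rows.
    df = spark.read.table("dimension_customer")
    df.show(10)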

  12. Select the table to preview its data. You can also use the SQL analytics endpoint of the lakehouse to query the data with SQL statements. Select SQL analytics endpoint from the Lakehouse drop-down menu at the top right of the screen.

    Screenshot of the Delta table, showing where to select SQL analytics endpoint.

  13. Select the dimension_customer table to preview its data or select New SQL query to write your SQL statements.

    Screenshot of the SQL analytics endpoint screen, showing where to select New SQL query.

  14. The following sample query aggregates the row count based on the BuyingGroup column of the dimension_customer table. SQL query files are saved automatically for future reference, and you can rename or delete them based on your needs.

    To run the script, select the Run icon at the top of the script file.

    SELECT BuyingGroup, Count(*) AS Total
    FROM dimension_customer
    GROUP BY BuyingGroup
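
    The same aggregation also runs unchanged from a Fabric notebook through Spark SQL; a minimal sketch, assuming the notebook's default lakehouse is wwilakehouse:

    # Run the same aggregation with Spark SQL from a notebook.
    result = spark.sql("""
        SELECT BuyingGroup, COUNT(*) AS Total
        FROM dimension_customer
        GROUP BY BuyingGroup
    """)
    result.show()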
    

Build a report

In this section, you'll build a report from the ingested data.

  1. Previously, all lakehouse tables and views were automatically added to the semantic model. With recent updates, for new lakehouses you must add your tables to the semantic model manually. Open your lakehouse and switch to the SQL analytics endpoint view. On the Reporting tab, select Manage default semantic model and select the tables that you want to add to the semantic model. In this case, select the dimension_customer table.

    Screenshot where you can select the tables to add to the semantic model.

  2. To ensure that the tables in the semantic model are always in sync, switch to the SQL analytics endpoint view and open the lakehouse settings pane. Select Default Power BI semantic model and turn on Sync the default Power BI semantic model. For more information, see Default Power BI semantic models.

    Screenshot showing how to turn on data sync to the default semantic model.

  3. After the table is added, Fabric creates a semantic model with the same name as the lakehouse.

    Screenshot showing the default semantic model that was created when the new lakehouse was created.

  4. From the semantic model pane, you can view all the tables. You can create a report from scratch, create a paginated report, or let Power BI automatically create a report based on your data. For this tutorial, under Explore this data, select Auto-create a report. In the next tutorial, we create a report from scratch.

    Screenshot of the semantic model details page, showing where to select Create a report.

  5. Because the table is a dimension with no measures, Power BI creates a measure for the row count, aggregates it across different columns, and creates different charts, as shown in the following image. You can save this report for later by selecting Save from the top ribbon. You can make further changes to the report to meet your requirements by including or excluding other tables or columns.

    Screenshot of a Quick summary page displaying four different bar charts.

Next step