Power Query Introduction

Power Query previously known as 'Data Explorer' is an Excel add-in (compatible with 2010 and 2013 Professional - see my previous post for details) that enables discovery and transformation of data. It is helping to solve the "Big Data" problem... but what is this problem?

Everyone has a different opinion on what this really means, for me it is 3 key things:

  1. The rapid rise in the volume of data (Gartner predicted that by 2020 we would have 50X the amount of data we have today - I've since heard that this figure has gone up)
  2. New data types and formats to deal with
  3. Increasing end user expectations; users have higher demands from IT, business users want quick & easy access to data and insights

Power Query is a huge step in resolving some of these challenges, the most important part is that it is a self-service solution, it is delivered through Excel (very accessible with a familiar look & feel) it builds upon the existing well-used capabilities within Excel and brings them all together into one experience - It enables self-service data discovery, data refinement through easy query development, transformation and mash-up's to join many different datasets together.

For anyone who is yet to see/experience these capabilities here is an introduction to some of the Power Query capabilities:

Data Import: Online search, From Web, From File, From Database, From Other Sources

Online: Power Query enables you to search and import data from public data sources; currently whilst in preview the datasets include Wikipedia & a sub-set of data from Data.gov but this will continue to expand as we move through the preview:

As you can see from the print screen you can mouse-over the results to get a preview of the dataset that has been found before adding.

You can also add datasets from a webpage by simply adding the URL, it then displays any datasets visible for you to import into your sheet (I love this feature! No more awkward copy & paste trying to split data into cells!)

We then have traditional imports from file and database (including SQL Server, Windows Azure SQL, Access, Oracle, MySQL, DB2, PostgreSQL & Teradata) but the most interesting is the Non-traditional sources (& these will continue to evolve over time):

This is exciting and really shows the power that traditional information workers now have, it will be interesting to see what else gets added here over time. I tried out the Facebook connector and was surprised at just how much data is available (I must admit it is a bit of an eye-opener on how much info Facebook has about me!)

You can see how this will enable organizations to track engagement around specific content posted, I've already heard many examples of where Facebook data has been used, one example was where an organization had been able to determine the best time to post to their page to get the most engagement.  

Data Refinement

Once you have found and imported your dataset you now want to refine/ cleanse the data, traditionally this required manual efforts or the use of Excel formulas for trim, splitting etc. Now this can all be done through a single page with easy to find actions in the ribbon. I very easily edited my dataset in just a few seconds including making first row header, removing duplicates, changing types and splitting data based on a comma: 

You will notice on the left hand side that it also tracks the steps that you have performed enabling you to very easily back-track or re-use the query at a later date. You can also merge or append to other queries within the workbook with a single button click.

Publish, Share & Collaborate on Queries

Once you are happy with your queries you can also share with your colleagues / team (or even just post to the portal for future reference by yourself):

This posts your query into the Office 365 Power BI portal, within the portal you can view all your published queries, data sources and also usage statistics on your published queries!

 

Although brief this introduction highlights some of the key capabilities of Power Query & the powerful collaboration capabilities. Watch the this blog for future posts on data analysis and visualization.