Condividi tramite


Using Data Explorer to Combine Netflix and Rotten Tomatoes

I recently reactivated my Netflix subscription after having it on hold for a few months. I had a lot of movies built up in the queue and I was wondering which ones I should watch first. Netflix does offer recommendations based on what I’ve seen before, but I was looking for another opinion. Time to fire up Data Explorer!

Please note that this is going to be a bit more advanced than some of the other samples you’ve seen posted from the other team members to this point. We’re going to skip some of the basic steps that they’ve already covered and dig a little deeper into the formula language.

The first step is to get data in from Netflix and clean it up. You can find your own Netflix queue RSS feed from your account page. I’ll be using a sample queue and blurring out the full URL to protect the innocent. When prompted for security credentials in Data Explorer, use anonymous access to get to the feed.

01-rssfeed

Since this is an XML feed, let’s click on the XML button in the ribbon.

02-xml

Now we have some hierarchical data in a tabular format so we’ll need to drill in and just pull out the bits that we want. Drilling in twice (through the Value column) brings us to this view:
03-drilledin

The data is here but it needs to be cleaned up. We’ll start by removing the first 6 rows.

04-removerows

We need to get the actual title out of each row. If I drill in to Value, I can see that the title is buried in there. To get it out, I’ll insert a new Custom Column. The builder won’t quite let me do something this tricky, so I’ll type in the formula myself: = each [Value][Value]{0}.

05-customcolumn

Bingo! Now I have the title and I need to clean it up. To do that I’ll use Split Column by Delimiter (using a hyphen as the delimiter) to split into Queue Position and Title, use Show Columns to show only those two columns, and then use Change Format to convert the Queue Position column into a number. Here’s the end result:

06-endrss

From here, I want to look up each movie title on RottenTomatoes.com using their JSON API. Note that you need to sign up to get an API key (I’ve blurred mine out in these screenshots.) I’ll let you dig through the API docs because you can do a wide variety of things, but all I need to do is look up the basic movie information. That includes both critics and audience scores. I’ll create a new resource from a Formula and enter the following text:

= (title) => Json.Document(Web.Contents("api.rottentomatoes.com/api/public/v1.0/movies.json?q=" & title &"&page_limit=1&page=1&apikey=*************"))[movies]{0}

This creates a new resource which is a formula that I can use from other resources. It takes a movie title as a parameter, makes the API call, goes into the Movies node of the JSON response and pulls out the first item. Note that this doesn’t always work perfectly especially for movies with short names because other movies might get returned first. For the purposes of this sample, it’s good enough. Let’s rename this resource to RottenTomatoes and here’s what our sample looks like now:

07-apicall

We have all the tools needed to create our final result. We could just start with the RssNetflixCom resource, but since that already has quite a few tasks, I’ll start a new resource that references it. Create a new resource from a Formula and use RssNetflixCom as the formula. Name the new resource QueueWithReviews.

08-queuewithreviews

Let’s insert a custom column. The title is RTMovieInfo and the formula is each RottenTomatoes([Title]). For each row, pass the value in the Title field to the RottenTomatoes function that we created earlier. When you get prompted for security credentials, connect anonymously.

09-addapicolumn

By drilling into the new column, I can see that all the data I want is there, but I need to pull it up to the top level. This is very similar to what we did to get the Title field out of the RSS feed so I’ll go through this quickly.

  • Add a custom column called Critics with the formula: each [RTMovieInfo][ratings][critics_score]
  • Add a custom column called Audience with the formula: each [RTMovieInfo][ratings][audience_score]
  • Hide the RTMovieInfo column

10-addedscorecolumns

This is looking pretty good, but let’s add one more column which is the sum of the critics score and the audience score.

11-totalscore

Now I can scroll through my Netflix queue and quickly check out the review scores for each movie! It’s not hard to think of a lot of other features to add to this such as some IMDB ratings, a link to read more reviews online, etc.

I’ll show you one more cool thing about Data Explorer. When you’re clicking through the UI, you’re actually building up a program that gets evaluated whenever you want to see the latest results. You can see the program that you’ve built by right clicking on Section1 in the bottom left and choosing View Formulas. If you’ve followed along in this sample, you’ll see the following program:

section Section1;

shared QueueWithReviews = let
    Queue = RssNetflixCom,
    InsertedCustom = Table.AddColumn(Queue, "RTMovieInfo", each RottenTomatoes([Title])),
    InsertedCustom1 = Table.AddColumn(InsertedCustom, "Critics", each [RTMovieInfo][ratings][critics_score]),
    InsertedCustom2 = Table.AddColumn(InsertedCustom1, "Audience", each [RTMovieInfo][ratings][audience_score]),
    HiddenColumns = Table.RemoveColumns(InsertedCustom2,{"RTMovieInfo"}),
    InsertedCustom3 = Table.AddColumn(HiddenColumns, "TotalScore", each [Critics]+[Audience])
in
    InsertedCustom3;

shared RottenTomatoes = (title) => Json.Document(Web.Contents("api.rottentomatoes.com/api/public/v1.0/movies.json?q=" & title &"&page_limit=1&page=1&apikey=***********************"))[movies]{0};

shared RssNetflixCom = let
    RssNetflixCom = Web.Contents(https://rss.netflix.com/QueueRSS?id=\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*\*),
    ImportedXml = Xml.Document(RssNetflixCom),
    Value = ImportedXml{0}[Value]{0}[Value],
    RemovedFirstRows = Table.Skip(Value,6),
    InsertedCustom = Table.AddColumn(RemovedFirstRows, "Title", each [Value][Value]{0}),
    SplitColumnDelimiter = Table.SplitColumnByDelimiter(InsertedCustom,"Title","-",{"Queue Position","Title"},Occurrence.First),
    ChangedFormat = Table.TransformColumns(SplitColumnDelimiter,{{"Title", Text.Trim}}),
    ShownColumns = Table.SelectColumns(ChangedFormat,{"Queue Position", "Title"}),
    ChangedFormat1 = Table.TransformColumns(ShownColumns,{{"Queue Position", Number.FromText}})
in
    ChangedFormat1;

We’ll have more documentation around the formula language available soon, but for now I’ll just say this is some very powerful stuff! You could theoretically write out your whole program from this view, but even though I know a lot of the language, I still find it faster and easier to use the UI. Sometimes it’s handy to know this view exists though. You can read another post about the formula language on the Data Explorer team blog.

I could go on and on with this sample but it’s already plenty long. We’ll have a lot more posts like this coming! Stay tuned!

Comments

  • Anonymous
    November 22, 2011
    As your thinking,fully agree with your thoughts. Continue to write <a href="www.mlfhardwoodflooringltd.ca ">hardwood flooring in toronto</a> and tell us a great job