

K-nearest neighbor - Heuristic what?

I was reading an article the other day about heuristics and how historical data lets you forecast a likely result.  The article was specific to search engines and SEO, but I thought I would talk about one of the simpler machine learning algorithms and how it can be used to predict results based on a history of choices.

I'm going to use an example of a video store and how it makes suggestions to a potential customer.

First you need an active database of all the selections made over a period of time, each combined with a rating of how much the user did or didn't like the movie, and indexed by a unique user ID.
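As a rough sketch of what that database might look like in memory, the snippet below groups ratings by user ID. The user IDs, movie titles, the 1-to-5 rating scale and the `build_ratings_index` helper are all made up for illustration; a real store would keep this in an actual database.

```python
from collections import defaultdict

# Hypothetical rental history: (user_id, movie, rating),
# where the rating runs from 1 (disliked) to 5 (loved).
rentals = [
    ("u1", "Alien", 5),
    ("u1", "Blade Runner", 4),
    ("u2", "Alien", 4),
    ("u2", "Notting Hill", 2),
]


def build_ratings_index(records):
    """Group ratings by user ID so each user's history is one lookup away."""
    index = defaultdict(dict)
    for user_id, movie, rating in records:
        # If a user rates the same movie twice, the later rating wins.
        index[user_id][movie] = rating
    return dict(index)


ratings = build_ratings_index(rentals)
print(ratings["u1"])  # {'Alien': 5, 'Blade Runner': 4}
```

Keying on the user ID first makes it cheap to compare one user's whole history against another's, which is what the next step needs.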

Enter K-nearest neighbor.  Essentially, it predicts how much you will like a movie from how it was rated by the K users whose tastes are closest to yours.

Over a period of time you end up with a grouping around each movie of similar movies that were rated highly by the same kind of viewer.  So if you really liked a movie and another user also liked it, the algorithm suggests the other movies that user rated highly, effectively using the group dynamic to work out which movies you're more likely to enjoy.
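Here is a minimal sketch of that idea, not a production recommender. It assumes ratings on a 1-to-5 scale, uses plain Euclidean distance over the movies two users have both rated as the measure of "like-minded", and averages the ratings of the K nearest users. The sample data, the `distance` and `recommend` names, and the choice of distance are all my own assumptions.

```python
import math

# Hypothetical ratings, indexed by user ID (1 = disliked, 5 = loved).
ratings = {
    "you":   {"Alien": 5, "Blade Runner": 4},
    "ann":   {"Alien": 5, "Blade Runner": 5, "The Thing": 5, "Notting Hill": 1},
    "bob":   {"Alien": 4, "Blade Runner": 4, "The Thing": 4},
    "carol": {"Alien": 1, "Blade Runner": 2, "Notting Hill": 5},
}


def distance(a, b):
    """Euclidean distance over movies both users rated; infinite if none shared."""
    shared = set(a) & set(b)
    if not shared:
        return math.inf
    return math.sqrt(sum((a[m] - b[m]) ** 2 for m in shared))


def recommend(ratings, user, k=2):
    """Rank movies the user hasn't seen by their average rating among the k nearest users."""
    mine = ratings[user]
    others = [(distance(mine, r), name) for name, r in ratings.items() if name != user]
    # Keep the k closest users, skipping anyone with no movies in common.
    neighbours = [name for d, name in sorted(others)[:k] if d != math.inf]

    scores = {}
    for name in neighbours:
        for movie, score in ratings[name].items():
            if movie not in mine:
                scores.setdefault(movie, []).append(score)

    averages = {movie: sum(s) / len(s) for movie, s in scores.items()}
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)


print(recommend(ratings, "you", k=2))
# [('The Thing', 4.5), ('Notting Hill', 1.0)]
```

In this made-up data "ann" and "bob" rate the shared movies almost the same way as "you", so they become the two nearest neighbours, and the movie they both loved comes out on top. "carol" has the opposite taste and is ignored.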

Pretty simple, huh?

Anyway, this type of analysis is used all over the place.  I even used something like it when I worked in FMCG (fast-moving consumer goods) to see if people were stealing from the tills.

So why does this pertain to search engines?  Well, guess what you're doing every time you click a link, or several links, on a search results page.