SQL Server for Data Mining
SQL Server 2005/2008 has a number of data mining algorithms built in to discover hidden relationships in your data or to use a set of attributes to predict the value of other attributes:
- Microsoft Association Algorithm. Used to work out which items are most likely to appear together in a transaction, so this is the classic basket analysis you might see on Amazon (customers who bought this all bought..).
- Microsoft Clustering Algorithm. Groups similar data items together. For example in banking there are lemons (poor customers) and cherries (good ones) and this algorithm will do that segmentation.
- Microsoft Decision Trees Algorithm. used to predict the value of a discrete (as opposed to a continuous) item e.g. whether a customer will buy a new car might or not.
- Microsoft Naive Bayes Algorithm Also predicts the value of an item but this works for continuous values such as the probability of a customer making a purchase. The naive indicates that there is no interdependency between the attributes used to predict he outcome e.g. gender is not related to age.
- Microsoft Neural Network Algorithm (SSAS) . This is supposed to mimic the way the brain works and is the scariest algorithm because there is no way to see how the answer was arrived at, and is used complex situations such as stock market prediction.
- Microsoft Sequence Clustering Algorithm This finds similar instance of events where they have similar sequences of steps. A good example is the analysis of navigation through a web site (click-stream analysis)
- Microsoft Time Series Algorithm. Used to predict the future e.g. future sales, profit etc.
- Microsoft Linear Regression Algorithm Plots an ideal line to best represent a series of points to establish the relationship between two continuous variables e.g. age and salary
- Microsoft Logistic Regression Algorithm Works in a similar fashion to linear regression but uses a curved line where the predicted state is one of a discrete set of possibilities.
There is a great section on MSDN on when to use which one here.
One of the best ways to get your head around this is to come to an event like this joint Microsoft / Hitachi TechNet session by Rafal Lukawiecki of Project Botichelli.
Technorati Tags: Data Mining,SQL Server
Comments
Anonymous
January 15, 2008
PingBack from http://0d1.info/stock-market/2008/01/15/sql-server-for-data-mining/Anonymous
January 17, 2008
For some of those unfortunate ones that cannot attend the event in London :-( are you going to provide a recording?Anonymous
January 22, 2008
Kris I am sorry but we're only goping to have the decks available afterwards. Can you drop me an e-mail (afryer@microsoft.com) and we'll discuss.Anonymous
January 30, 2008
Nice job on the sql launch slip. any way to subscribe to a tag? I'm not a techie, but I am a data-miner (see here and here http://blogs.msdn.com/maamktg/archive/2007/12/06/working-in-the-data-mine.aspx http://blogs.msdn.com/maamktg/archive/2007/12/13/book-review-super-crunchers.aspx think you've got some good insights that I don't want to miss....