Solving Business Problems with SQL Server 2005 Data Mining
Data Mining is about exploring your data, finding patterns and performing predictions. Where querying and analysis tells you what, data mining tells you why and how. In SQL Server 2005, data mining is made accessible and easy to use through an integrated user interface, cross-product integration and familiar, standard APIs. From the two algorithms that existed in SQL Server 2000, there are now a total of seven algorithms available out of the box, and it's possible to extend this further using an SDK.
Data Organisation
How can we organise data to provide the most pertinent information? OLAP provides fast queries with aggregations, but the hierarchies are schema-driven. That makes it hard to know what hierarchies are meaningful or provide actionable information. "Intelligent" OLAP provides self-organising cubes with data-driven hierarchies. You can achieve this with a data mining dimension: from a cube, you generate a mining model that is used in turn to create a new dimension.
For example, the clustering algorithm takes the input data and attempts to segment it into groups. You could now rename those groups, before creating a data mining dimension and using them as dimension members for slicing data. Similarly, a decision trees algorithm could be used to generate a parent-child dimension. This enables us to determine (for example) which groups of customers are the most profitable, in a far more intelligent way than if we'd used OLAP alone without integrating data mining algorithms.
Data Cleansing
How do we clean data as we load it into the data warehouse? We can use DTS pipeline integration in combination with a DM clustering algorithm to separate out clean and bad data. This can help us detect spurious data members (for example, a parent younger than 10 years old).
The steps are as follows:
- Create a clustering model for the data you want to cleanse
- Create a DTS project
- Add a data source
- Add a Mining Query Transform using the PredictCaseLikelihood() function
- Do a conditional split based on the output.
Comments
- Anonymous
August 03, 2004
Hi,
I would like to know more about the Data Mining algorithms available in SQL Server 2005. Is there any neural network based prediction algorithm?.
I downloaded SQL Server 2005 Yukon Beta 2 from MSDN, Installed it, but the help pages does not cover Data Mining. In the "Data Mining" page I found a message "Information yet to come"
Pls help me,
Thanks in advance,
Dhanan
dhanan@ushustech.com - Anonymous
August 03, 2004
Dhanan,
That's a good question. We're still working on the data mining documentation as you can see, but I'll push the question around internally and find out whether there are any white papers or similar due for imminent release.
In the meantime, there's a good overview of the data mining capabilities in SQL Server 2005 at the following location:
http://www.microsoft.com/technet/prodtechnol/sql/2005/evaluate/dwsqlsy.mspx#XSLTsection126121120120
(Search for a heading entitled "Data Mining").
Hope this gives you some useful information,
Tim - Anonymous
July 11, 2005
I am a database administrator and an MBA student as well. I would like to know more about how to design a data mining infrastructure from the relational data to the data warehouse. I would like to develop a full solution on data mining and business intelligence by offering businesses a service that utilizes their existing OLTP databases for use in data mining pruposes. Any references?