MMM - Dogfood..... SSIS in Action
We use a really unattractive term here at Microsoft to describe the use of pre-release software internally to get our jobs done. We call it dog food. I think SteveB may have come up with this very descriptive term, as in "Eat your own.... dog food." Anyway, I've found it to be one of the best ways to learn how a product or technology actually works. I've just deployed my first dog food project on SQL Server 2005. It's a SharePoint data mart hosted in SQL Server 2005.
Basically we've got a bunch of data locked up in SharePoint sites relating to this program I am running known as Yukon Ascend. Data in these SharePoint sites are related to data in other sites like BetaPlace, ActiveDirectory, etc. Microsoft makes it so easy to build infrastructure like this, that before you know it you've got several different loosely coupled information stores tracking really critical information, but nowhere to tie it together.
Enter SQL Server 2005 and Visual Studio 2005. In just a couple of weeks, I was able to build a simple data mart that pulled together information from two SharePoint sites, Active Directory and BetaPlace into a single SQL Server 2005 database using SQL Server Integration Services (SSIS). Once I got all the data in one place, I was able to quickly build some relational reporting on top of it using SQL Server Reporting Services, and plan to add some analytics to it using SQL Server Analysis Services.
I learned quite a lot during this exercise so I thought I'd share some tidbits about the experience in my next few posts.
My first challenge was getting data out of SharePoint. Now SharePoint does a great job of exporting data to Excel interactively. But I couldn't use that format because there are some limitations on the kind of data that can be exported, and it needed to be done in an automated way using a batch process. With a little investigation I found that SharePoint supports a List Data Retrieval Web Service. All I needed to do was figure out a way to call this Web Service from within SSIS to extract the data I needed into XML.
Now SSIS 2005 has a Web Services task, and after a few attempts at using this in the SQL Server 2005 December CTP build, I realized it wasn't fully implemented yet, so I needed to find a different approach. The joys of dog food.
What I eventually settled on was to build a generic SharePoint command-line utility I call PortalExtractor using Visual Studio 2005. Basically you point this thing at a web reference for a list data retrieval service, and it dumps all the list data in the SharePoint site into XML files written to the file system. Two files are created for each list: an XML document with the exported list data, and an XSD schema for that document. A new directory is created for each run, allowing you to take distinct snapshots of the data on the SharePoint site.
Getting this utility to run inside SSIS was a snap. I just used the Execute Process Task. I didn't want to hard-wire the parameters for my command-line utility, but to make them configurable so they could be changed to reflect different environments when deploying the SSIS Package. To do this I defined package variables, then used container property expressions to dynamically construct the command-line parameters for my utility at runtime. I also discovered an extremely cool feature in SSIS known as Package Configurations. Basically this is a new feature which allows you to change most configurable properties for objects in an SSIS package at runtime using an XML Configuration file rather than passing in a thousand command line parameters when executing the package. Very very nice feature.
So now I had the raw data I needed in XML files on the file system. In my next post, I'll talk about how my SSIS package consumes these XML files at runtime.