Implementing an Master Data Management Solution With SQL Server Master Data Services and the MDS Utilities (Part 2 Loading the Model)
[This post uses the old staging functionality from MDS 2008 R2. With MDS 2012, you should be using entity-based staging which will provision the staging tables for you. See eg https://msdn.microsoft.com/en-us/sqlserver/Video/hh802433]
In the previous post in this series on using the MDS Utilities CodePlex project (https://mdsutilities.codeplex.com/ ) to jump-start and MDS implementation we created a simple MDS model by importing table and relationship definitions from the AdventureWorks sample database using the MdsModelImport.exe utility. In this post we’ll load data into that model from the same database, using first plain TSQL, and then a simple SSIS package.
To make this easier we’ll use another of the utilities: the mds_staging database. First download the “MDS Staging Database Creation Script” from the current project release. It’s a SqlCmd.exe script created by Visual Studio database tools, and you can run it either on the command line with sqlcmd.exe, or run it through SQL Server Management Studio in a Query Window set to SQLCMD mode. We’ll use SSMS.
Open the .sql file and switch the query window to SQLCMD mode (under the Query menu).
Then connect the query window to the SQL instance that has the MDS database, using an account that has permissions to create databases and administer the MDS database (probably a sysadmin). If you right-click anywhere in the query window you can change the what instance the query window is connected to. If you MDS database is not called ‘mds’ then you need to edit some of the synonym definitions in the script to match. Just change each occurrence of ‘[mds].’ with the appropriate database name.
Also at the top of the script you must supply the file location for the new database. Below I’ve entered the default folder for databases on a default instance of SQL 2008 R2 (you must have the ‘\’ at the end of the path):
Code Snippet
- /*
- Deployment script for mds_staging
- */
- GO
- SET ANSI_NULLS, ANSI_PADDING, ANSI_WARNINGS, ARITHABORT, CONCAT_NULL_YIELDS_NULL, QUOTED_IDENTIFIER ON;
- SET NUMERIC_ROUNDABORT OFF;
- GO
- :setvar DatabaseName "mds_staging"
- :setvar DefaultDataPath "C:\Program Files\Microsoft SQL Server\MSSQL10_50.MSSQLSERVER\MSSQL\DATA\"
- :setvar DefaultLogPath "C:\Program Files\Microsoft SQL Server\MSSQL10_50.MSSQLSERVER\MSSQL\DATA\"
- GO
- USE [master]
Then the script to create the database.
Now that we’ve created the staging database, we need to generate the staging artifacts for our MDS model. We will create one view with an INSTEAD OF INSERT trigger for each entity in the model. Any time the MDS model changes we need to regenerate these artifacts. The artifacts are all generated/regenerated with a call to the GenerateStagingObjects stored procedure:
Code Snippet
- use mds_staging
- go
- exec dbo.GenerateStagingObjectsN'HumanResources'
For this model, running this will generate views for each entity and two model-specific stored procedures to control the staging process:
Let’s take a look at one of these views.
Code Snippet
- create view [HumanResources].[Department]
- as
- with atts as
- (
- select m.ModelName, m.EntityName, m.MemberName, m.MemberCode, ma.AttributeName, ma.AttributeValue
- from mdm.tblStgMember m
- left join mdm.tblStgMemberAttribute ma
- on m.EntityName = ma.EntityName
- and m.ModelName = ma.ModelName
- and m.MemberCode = ma.MemberCode
- where m.EntityName = 'Department'
- and m.ModelName = 'HumanResources'
- and m.Batch_ID is null
- ), pt as
- (
- select ModelName, EntityName, MemberCode, MemberName,
- [Name], [DepartmentID], [GroupName], [ModifiedDate]
- from atts
- PIVOT
- (
- MAX(AttributeValue)
- for AttributeName in
- (
- [Name], [DepartmentID], [GroupName], [ModifiedDate]
- )
- ) as pt
- )
- select MemberCode Code,
- coalesce(MAX([Name]), MAX(MemberName)) [Name], cast(MAX([DepartmentID]) as Decimal(23,4)) [DepartmentID],
- cast(MAX([GroupName]) as nvarchar(100)) [GroupName], cast(MAX([ModifiedDate]) as DateTime) [ModifiedDate]
- from pt
- group by ModelName, EntityName, MemberCode;
This view reads data from the MDS staging tables (tblStgMember and tblStageMemberAttribute) and PIVOT’s the data so you see each entity (here a Department) on a single row. Running this view shows you the current contents of the MDS staging tables for the HumanResources model and the Department entity.
This view also has an INSTEAD OF INSERT trigger that writes to the staging tables when the view is the target of an INSERT (or SSIS bulk load).
Code Snippet
- create trigger [HumanResources].[tg_Department]
- on [HumanResources].[Department]
- instead of insert
- as
- begin
- set nocount on
- insert into mdm.tblStgMember(UserName,ModelName,EntityName,MemberType_ID,MemberName,MemberCode)
- select null UserName,
- 'HumanResources' ModelName,
- 'Department' EntityName,
- 1 MemberType_ID,
- null MemberName,
- i.Code MemberCode
- from inserted i;
- with nv as
- (
- SELECT coalesce(cast([Code] as nvarchar(2000)),N'') [Code], coalesce(cast([DepartmentID] as nvarchar(2000)),N'') [DepartmentID], coalesce(cast([GroupName] as nvarchar(2000)),N'') [GroupName], coalesce(cast([ModifiedDate] as nvarchar(2000)),N'') [ModifiedDate], coalesce(cast([Name] as nvarchar(2000)),N'') [Name]
- from inserted
- ), pvt as
- (
- SELECT Code, AttributeName, AttributeValue
- FROM nv
- UNPIVOT
- (AttributeValue FOR AttributeName IN
- ([DepartmentID],[GroupName],[ModifiedDate],[Name])
- )AS unpvt
- )
- insert into mdm.tblStgMemberAttribute(UserName,ModelName,EntityName,MemberCode,AttributeName,AttributeValue,MemberType_ID)
- select
- cast(null as nvarchar(100)) UserName,
- cast(N'HumanResources' as nvarchar(50)) ModelName,
- cast(N'Department' as nvarchar(50)) EntityName,
- Code,
- AttributeName,
- coalesce(AttributeValue,N''),
- 1
- from pvt
- where coalesce(AttributeValue,N'') != N''
- or AttributeName in
- (N'Code',N'GroupName',N'Name'
- )
- ;
- end
The trigger inserts one row into tblStgMember for each row inserted into the view, and then UNPIVOTs the each row and inserts one row into tblStgMemberAttribute for each column for each row. Since the attributes of the Department entity are hard-coded as columns on this view, we need to regenerate the view whenever attributes are added or removed from the entity or new entities are added to the model.
To load data into the MDS model we just insert data into the view:
Code Snippet
- insert into mds_staging.HumanResources.Department(Code,Name,DepartmentID,GroupName,ModifiedDate)
- select CAST(DepartmentID as nvarchar(250)), Name, DepartmentID, GroupName, ModifiedDate
- from AdventureWorks.HumanResources.Department;
Remember in MDS every entity member is uniquely identified by a Code and every one has a Name, so in the loading query we need to supply values for the Code and Name. The Department table already has a Name column, so that’s easy. For the Code we need to generate a unique nvarchar(250) value based on the primary key of the row. Here the PK is just ID, so we cast that to nvarchar(250). If the table had a multi-column primary key, we would need to concatenate the primary key columns, probably using ‘.’ or ‘-‘ as a delimiter.
Then process the staged records:
Code Snippet
- exec mds_staging.HumanResources.ProcessStagedRecords
After that we will see the loaded entities (or any failures) in the MDS web site:
That’s it. We’ve loaded data into the model.
Ok, but often the master data needs to be loaded from other sources, how to do this in SSIS? It’s very simple, just a two things to remember. 1) Create the ‘Code’ derived column and 2) Set FIRE_TRIGGERS and disable external metadata validation on the Data Flow destination.
So create a new package with two OleDb Connection Managers, one pointing to the source, and one using SQL Native Client pointing to the mds_staging database. The control flow should have three items. First an Execute SQL task to clear out the staging table, then a Data Flow task to load the staging table, and last another Execute SQL task to process the staged records. It should look like this:
The Data Flow just needs an OleDb source, a Derived Column transformation and an OleDb destination. If your source is a database engine like SQL Server or Oracle, you can skip the Derived Column transformation and create the Code column in the source query in SQL. The Data Flow should look like this:
Here’s the OleDb destination properties:
ValidateExternalMetadata must be False or else SSIS will get confused about the updatablility of the view. “OpenRowset Using FastLoad” instructs SSIS to use the bulk load API to transfer the data to the view, and FIRE_TRIGGERS ensures that the INSTEAD OF INSERT trigger on the view actually fires (triggers are optional when performing a bulk load). Then just hit F5 to run the package:
It shows 16 rows loaded. And if you look in the MDS UI you will see the same 16 Departments in the Model. If you load a duplicate entity (ie with a matching Code attribute), MDS will give you a warning when processing the staged entity, and then will apply all the attributes you staged to the existing entity. In otherwords, MDS performs an “upsert” of the model data, so you don’t have to worry in your data loading process whether the entity already exists in the model.
That’s it. We created the mds_staging database, generate the staging artifacts for our model, and loaded an entity in the model both with TSQL and SSIS.
David
dbrowne_at_microsoft
Comments
- Anonymous
April 28, 2013
Hi. Sorry for asking a very obvious question but where exactly is the MDS Staging Create Database Script stored. I'vegot the 3 available downloads from the MDS Utilities page, but can't see any *.sql files that resembles the above ?