SSIS: Delete DQS Projects Created from Running DQS Cleansing Activities
Introduction
When you use an SSIS package to run Data Quality Services Cleansing activities via the DQS Cleansing Component, each run produces a Data Quality Project. The resulting Data Quality project is useful for auditing the cleansed data produced from the SSIS package, and also to export the data if a copy is needed.
However, over time the DQS projects accumulate in the DQS_PROJECTS database, the database may grow large, and the projects may become too numerous to easily delete. If you need to find the largest projects for cleanup, use the script described in KB 2685743 to identify the projects and the size of each.
Manual Approach
In the Data Quality Client, click the button to open the Data Quality Project
Enumerate the list of DQS Cleansing Projects. The SSIS projects will have a unique naming convention with the Package Name, Transform Name, Run Date and Time, GUID.
Right click on a project to unlock it (red text means locked)
Deselect the unlocked project by moving up or down one row (this will refresh the menu), then right click on the unlocked project, and choose Delete.
Automated TSQL Script Approach
In the case where there are too many projects to manually delete, you may use the TSQL script described here to automate the cleanup for a certain date range.
1. The script uses a date and time range to target the project deletion to a scope of time.
2. The script bulk deletes Data Quality projects according to the ‘Type’ flag which is set to 3 by default (1 = KB, 2 = Cleansing project DQS Client, 3 = SSIS Project)
3. The script deletes project in both Locked and Unlocked states.
4. If a project fails to delete, the script continues to delete the remaining projects.
5. Printed text output shows progress of the deletions and any errors which occur.
Instructions
1. The Windows account executing the TSQL script should have a ‘dqs_administrator’ role; we recommend the account to have a sysadmin role on the box.
2. Run the script from SQL Server Management Studio while connected to SQL Server instance running the DQS instance (hosting the DQS_MAIN and DQS_PROJECTS databases)
3. Modify the ‘FromDate’ and ‘ToDate’ dates and times in the script to define the window for cleansing up the projects
TSQL Code
SET NOCOUNT ON
USE DQS_MAIN
DECLARE @FromDate datetime
DECLARE @ToDate datetime
DECLARE @ProjectId bigint
DECLARE @LockClientId bigint
DECLARE @DqProject varbinary(max)
DECLARE @ResultRecords varbinary(max)
,@ErrMessage VARCHAR(max)
,@rowcount INT
,@errCount INT = 0
--Update From date and To date here before execution of script
SELECT @FromDate = CAST('2012-10-19 00:00:01.001' AS datetime)
SELECT @ToDate = CAST('2012-10-19 23:59:59.997' AS datetime)
PRINT '***************************************************************'
PRINT CAST(GETDATE() AS VARCHAR(MAX)) + ' :: ' + 'Executing script for date range ' + CAST(@FromDate AS VARCHAR(MAX)) + ' to ' + CAST(@ToDate AS VARCHAR(MAX))
DECLARE DELETE_PROJECTS_CURSOR CURSOR FOR
SELECT [ID], ISNULL([LOCK_CLIENT_ID],-1)
FROM [DQS_Main].[dbo].[A_KNOWLEDGEBASE]
WHERE [TYPE] = 3 -- BatchDQProject, projects that are generated by SSIS packages
AND [CREATE_DATE] BETWEEN @FromDate AND @ToDate
OPEN DELETE_PROJECTS_CURSOR
FETCH NEXT FROM DELETE_PROJECTS_CURSOR
INTO @ProjectId, @LockClientId
WHILE @@FETCH_STATUS = 0
BEGIN
BEGIN TRY
PRINT CAST(GETDATE() AS VARCHAR(MAX)) + ' :: ' + 'Operating on Project: [' + CAST(@ProjectId AS VARCHAR(MAX)) +']'
EXECUTE [KnowledgebaseManagement].[SetDataQualitySession] @clientId=@LockClientId, @knowledgebaseId=NULL
IF (@LockClientId != -1)
BEGIN
EXECUTE [KnowledgebaseManagement].[DQProjectGetById] @ProjectId,@DqProject OUTPUT
EXECUTE [KnowledgebaseManagement].[DQProjectExit] @DqProject,@ResultRecords OUTPUT
END
-- delete project's activity archive
DELETE FROM [dbo].[A_PROFILING_ACTIVITY_ARCHIVE]
WHERE [ACTIVITY_ID] IN (SELECT ID FROM [dbo].[A_KNOWLEDGEBASE_ACTIVITY] WHERE [KNOWLEDGEBASE_ID] = @ProjectId)
-- refresh the project state
EXECUTE [KnowledgebaseManagement].[DQProjectGetById] @ProjectId,@DqProject OUTPUT
PRINT CAST(GETDATE() AS VARCHAR(MAX)) + ' :: ' + 'Deleting project: [' + CAST(@ProjectId AS VARCHAR(MAX)) +']'
EXECUTE [KnowledgebaseManagement].[DQProjectDelete] @DqProject
PRINT CAST(GETDATE() AS VARCHAR(MAX)) + ' :: ' + 'Deleted project: [' + CAST(@ProjectId AS VARCHAR(MAX)) +']'
END TRY
BEGIN CATCH
PRINT CAST(GETDATE() AS VARCHAR(MAX)) + ' :: ' + 'An error has occurred with the following details'
PRINT CAST(GETDATE() AS VARCHAR(MAX)) + ' :: ' + 'Error ' + CONVERT(varchar(50), ERROR_NUMBER()) +
', Severity ' + CONVERT(varchar(5), ERROR_SEVERITY()) +
', State ' + CONVERT(varchar(5), ERROR_STATE()) +
', Procedure ' + ISNULL(ERROR_PROCEDURE(), '-') +
', Line ' + CONVERT(varchar(5), ERROR_LINE());
PRINT CAST(GETDATE() AS VARCHAR(MAX)) + ' :: ' + 'Error Message: ' + ERROR_MESSAGE();
SELECT @errCount = @errCount + 1
PRINT CAST(GETDATE() AS VARCHAR(MAX)) + ' :: ' + 'Skipping this project because of errors'
END CATCH
FETCH NEXT FROM DELETE_PROJECTS_CURSOR INTO @ProjectId, @LockClientId
END
IF @errCount > 0
PRINT CAST(GETDATE() AS VARCHAR(MAX)) + ' :: ' + 'Script completed with ' + CAST(@errCount AS VARCHAR(MAX)) + ' errors'
ELSE
PRINT CAST(GETDATE() AS VARCHAR(MAX)) + ' :: ' + 'Script completed successfully'
PRINT '***************************************************************'
BEGIN TRY
CLOSE DELETE_PROJECTS_CURSOR
DEALLOCATE DELETE_PROJECTS_CURSOR
END TRY
BEGIN CATCH
--Do nothing
END CATCH