Share via


SSIS: Delete DQS Projects Created from Running DQS Cleansing Activities

Introduction

When you use an SSIS package to run Data Quality Services Cleansing activities via the DQS Cleansing Component, each run produces a Data Quality Project. The resulting Data Quality project is useful for auditing the cleansed data produced from the SSIS package, and also to export the data if a copy is needed.

However, over time the DQS projects accumulate in the DQS_PROJECTS database, the database may grow large, and the projects may become too numerous to easily delete. If you need to find the largest projects for cleanup, use the script described in KB 2685743 to identify the projects and the size of each.

Manual Approach

In the Data Quality Client, click the button to open the Data Quality Project

Enumerate the list of DQS Cleansing Projects. The SSIS projects will have a unique naming convention with the Package Name, Transform Name, Run Date and Time, GUID.

Right click on a project to unlock it (red text means locked)

Deselect the unlocked project by moving up or down one row (this will refresh the menu), then right click on the unlocked project, and choose Delete.

 

Automated TSQL Script Approach

In the case where there are too many projects to manually delete, you may use the TSQL script described here to automate the cleanup for a certain date range.

1. The script uses a date and time range to target the project deletion to a scope of time.
2. The script bulk deletes Data Quality projects according to the ‘Type’ flag which is set to 3 by default (1 = KB, 2 = Cleansing project DQS Client, 3 = SSIS Project)
3. The script deletes project in both Locked and Unlocked states.
4.  If a project fails to delete, the script continues to delete the remaining projects.
5.  Printed text output shows progress of the deletions and any errors which occur.

Instructions

1. The Windows account executing the TSQL script should have a ‘dqs_administrator’ role; we recommend the account to have a sysadmin role on the box.
2. Run the script from SQL Server Management Studio while connected to SQL Server instance running the DQS instance (hosting the DQS_MAIN and DQS_PROJECTS databases)
3. Modify the ‘FromDate’ and ‘ToDate’ dates and times  in the script to define the window for cleansing up the projects

TSQL Code

SET NOCOUNT ON
USE DQS_MAIN
  
DECLARE @FromDate datetime
DECLARE @ToDate datetime
DECLARE @ProjectId bigint
DECLARE @LockClientId bigint
DECLARE @DqProject varbinary(max)
DECLARE @ResultRecords varbinary(max)
        ,@ErrMessage VARCHAR(max)
        ,@rowcount INT
        ,@errCount INT  = 0
  
--Update From date and To date here before execution of script
SELECT @FromDate = CAST('2012-10-19 00:00:01.001' AS  datetime)
SELECT @ToDate = CAST('2012-10-19 23:59:59.997' AS  datetime)
  
PRINT '***************************************************************'
PRINT CAST(GETDATE() AS  VARCHAR(MAX)) + ' :: ' +  'Executing script for date range '  + CAST(@FromDate AS  VARCHAR(MAX)) + ' to ' +   CAST(@ToDate  AS  VARCHAR(MAX))
  
DECLARE DELETE_PROJECTS_CURSOR CURSOR FOR
        SELECT [ID], ISNULL([LOCK_CLIENT_ID],-1)
        FROM [DQS_Main].[dbo].[A_KNOWLEDGEBASE]
        WHERE [TYPE] = 3 -- BatchDQProject, projects that are generated by SSIS packages
        AND [CREATE_DATE] BETWEEN @FromDate AND @ToDate
      
OPEN DELETE_PROJECTS_CURSOR
      
FETCH NEXT  FROM DELETE_PROJECTS_CURSOR
INTO @ProjectId, @LockClientId
  
WHILE @@FETCH_STATUS = 0
BEGIN
    BEGIN TRY
        PRINT CAST(GETDATE() AS  VARCHAR(MAX)) + ' :: ' +  'Operating on Project: ['  + CAST(@ProjectId AS  VARCHAR(MAX)) +']'
        EXECUTE [KnowledgebaseManagement].[SetDataQualitySession] @clientId=@LockClientId, @knowledgebaseId=NULL
  
        IF (@LockClientId != -1)
        BEGIN
            EXECUTE [KnowledgebaseManagement].[DQProjectGetById] @ProjectId,@DqProject OUTPUT
            EXECUTE [KnowledgebaseManagement].[DQProjectExit] @DqProject,@ResultRecords OUTPUT
        END
  
        -- delete project's activity archive
        DELETE FROM  [dbo].[A_PROFILING_ACTIVITY_ARCHIVE]
        WHERE [ACTIVITY_ID] IN (SELECT ID FROM [dbo].[A_KNOWLEDGEBASE_ACTIVITY] WHERE [KNOWLEDGEBASE_ID] = @ProjectId)
  
        -- refresh the project state
        EXECUTE [KnowledgebaseManagement].[DQProjectGetById] @ProjectId,@DqProject OUTPUT
        PRINT CAST(GETDATE() AS  VARCHAR(MAX)) + ' :: ' + 'Deleting project: [' + CAST(@ProjectId AS VARCHAR(MAX)) +']'
        EXECUTE [KnowledgebaseManagement].[DQProjectDelete] @DqProject
        PRINT CAST(GETDATE() AS VARCHAR(MAX)) + ' :: ' + 'Deleted project: [' + CAST(@ProjectId AS VARCHAR(MAX)) +']'
    END TRY
    BEGIN CATCH
        PRINT CAST(GETDATE() AS VARCHAR(MAX)) + ' :: ' + 'An error has occurred with  the following details'
        PRINT CAST(GETDATE() AS VARCHAR(MAX)) + ' :: ' + 'Error ' + CONVERT(varchar(50), ERROR_NUMBER()) +
              ', Severity ' + CONVERT(varchar(5), ERROR_SEVERITY()) +
              ', State ' + CONVERT(varchar(5), ERROR_STATE()) +
              ', Procedure  ' + ISNULL(ERROR_PROCEDURE(), '-') +
              ', Line ' + CONVERT(varchar(5), ERROR_LINE());
        PRINT CAST(GETDATE() AS VARCHAR(MAX)) + ' :: ' + 'Error Message: ' + ERROR_MESSAGE();
        SELECT @errCount = @errCount + 1
        PRINT CAST(GETDATE() AS VARCHAR(MAX)) + ' :: ' + 'Skipping this project because of  errors'
    END CATCH                  
    FETCH NEXT FROM DELETE_PROJECTS_CURSOR INTO @ProjectId, @LockClientId
END
IF @errCount > 0
    PRINT CAST(GETDATE() AS VARCHAR(MAX)) + ' :: ' + 'Script completed with  ' + CAST(@errCount AS VARCHAR(MAX)) + ' errors'
ELSE
    PRINT CAST(GETDATE() AS VARCHAR(MAX)) + ' :: ' + 'Script completed successfully'
PRINT '***************************************************************'
  
BEGIN TRY
    CLOSE DELETE_PROJECTS_CURSOR
    DEALLOCATE DELETE_PROJECTS_CURSOR
END TRY
BEGIN CATCH
    --Do nothing
END CATCH

See Also

DQS Resources on TechNet Wiki