다음을 통해 공유


T-SQL : Finding Fill Rate of Table

Introduction

The Fill Rate is one of the most important metric in many of the business related decisions. So Knowing Fill Rate is one of the essential tasks in Data Analysis and Decision making in business processes. In this article, we're going to explain how to find the Fill Rate of a Table using T-SQL Queries.

Scenario

Observing Fill Rate is the key factor for knowing and improving data quality that creates the fulfillment to both our management as well as our end users. Need to find Fill Rate of a Table, How to do it?

What is meant by Fill Rate?

The Fill Rate is defined as the number of filled entries at each Column level that is divided by the total number of rows in a table. Depends on the total number of rows that is present in a table, we can figure out the Fill Rate easily.

Challenges

The Schema changes like below things makes our Fill Rate approach little difficult than actual.

  • Table name changes
  • Column name changes
  • Data type changes
  • Removing Existing columns
  • Adding New Columns

Due to the above challenges, we cannot simply go for Static Solution to find Fill Rate of a table. Instead, we need something like Dynamic Approach to avoid our future re-works.

Prerequisite

In the below sample, we are going to use one stored procedure named 'Get_FillRate' for demo. If any one have the same object name in database, please make sure to change the below stored procedure name.

Sample Table Creation with Data Loading Script

--dropping temp table if exists
IF OBJECT_ID('TempDb..#TestEmp') IS  NOT NULL
    DROP TABLE  #TestEmp;
  
CREATE TABLE  #TestEmp
(
    [TestEmp_Key] INT  IDENTITY(1,1) NOT  NULL,
    [EmpName] VARCHAR(100) NOT NULL,
    [Age] INT  NULL,
    [Address] VARCHAR(100) NULL,
    [PhoneNo] VARCHAR(11) NULL,
    [Inserted_dte] DATETIME NOT NULL,
    [Updated_dte] DATETIME NULL,
 CONSTRAINT [PK_TestEmp] PRIMARY KEY  CLUSTERED
 (
    TestEmp_Key ASC
 )
);
GO
   
INSERT INTO  #TestEmp
(EmpName,Age,[Address],PhoneNo,Inserted_dte)
VALUES
('Arul',24,'xxxyyy','1234567890',GETDATE()),
('Gokul',22,'zzzyyy',NULL,GETDATE()),
('Krishna',24,'aaa','',GETDATE()),
('Adarsh',25,'bbb','1234567890',GETDATE()),
('Mani',21,'',NULL,GETDATE()),
('Alveena',20,'ddd',NULL,GETDATE()),
('Janani',30,'eee','',GETDATE()),
('Vino',26,NULL,'1234567890',GETDATE()),
('Madhi',25,'ggg',NULL,GETDATE()),
('Ronen',25,'ooo',NULL,GETDATE()),
('Visakh',25,'www',NULL,GETDATE()),
('Jayendran',NULL,NULL,NULL,GETDATE());
GO
   
SELECT [TestEmp_Key],[EmpName],[Age],[Address],[PhoneNo],[Inserted_dte],[Updated_dte] FROM  #TestEmp; 
GO

Temp Table - #TestEmp

SQL Procedure For Finding Fill Rate in a Table - Dynamic Approach

Input Parameters

Both of the Input Parameters are mandatory.

1. @p_TableName - Data type used for this input Parameter is NVARCHAR(128) and Nullability is NOT NULL.

2. @p_Include_BlankAsNotFilled - Data type used for this input Parameter is BIT and Nullability is NOT NULL and either 0 or 1 needs to give. 0 is by Default and 0 means OFF. 1 is ON (when given as 1 - Blank entries will be considered As Not Filled Data).

Output Columns

There are Two output Columns. both of those are Non Nullable Output Columns.

1. [Column Name] - Data type used for this Output Column is sysname and Nullability is NOT NULL. All the Column Names for the user given Table Name would come as row values.

2. [Fill Rate (%)] -  Data type used for this Output Column is DECIMAL(5,2) and Nullability is NOT NULL. Values from 0.00 to 100.00 would come in result with respective Column Names.

Info reg Stored Procedure

  •   Created the store Procedure named - 'Get_FillRate'.
  •   To avoid the number of rows returned, set NOCOUNT as ON.
  •   Try, Catch Blocks are added for error handling's. 
  •   To read Uncommitted Modifications, set TRANSACTION ISOLATION LEVEL as READ UNCOMMITTED.
  •   Parameter Sniffing Concept is also included.
  •   Some handling's done on the Table Name input parameters to support user typing table name formats like '.table_name','..table_name','...table_name','table_name','[table_name]','dbo.table_name','dbo.[table_name]','[dbo].[table_name]' etc.,
  •   Validation is included at the start, when user gives other than 'table name', stored procedure would throw 'Table not exists in this Database' as error message.
  •   System table named SYS.OBJECTS and SYS.COLUMNS and System View named INFORMATION_SCHEMA.COLUMNS are used inside the stored procedure.
  •   ORDINAL_POSITION from INFORMATION_SCHEMA.COLUMNS is used, to return the result set with the same column order that the table structure already has.
  •   COLLATION_NAME from INFORMATION_SCHEMA.COLUMNS is used, to support conditions like blank is either need to consider or not, as not filled entries.
  •   COLUMN_NAME from INFORMATION_SCHEMA.COLUMNS is used, to show the final result set with respective fill rates.
  •   Dynamic Query is used, to support dynamic approach and this would avoid all the challenges that would come in static solutions like schema changes.
  •   Both Method 1(Dynamic Query with WHILE LOOP) and Method 2(Dynamic Query with UNION ALL) produces same result sets and carries same functionality where some metrics like CPU time,Elapsed Time,Logical reads that are better in Method 2.

Method 1 - With the use of WHILE Loop

/******************************************************************************************************/
-- Created By : TECHNET WIKI MEMBER
-- Version    : 1.0
-- Created On : 20/05/2020
-- Description: --To Fetch the List of all columns of a table with their fill rates in %.
/******************************************************************************************************/
CREATE OR  ALTER PROCEDURE  [dbo].[Get_FillRate]   
(
    @p_TableName                  NVARCHAR(128),
    @p_Include_BlankAsNotFilled   BIT  = 0 -- 0-OFF(Default); 1-ON(Blank As Not Filled Data)
)
AS
BEGIN
  
BEGIN TRY
   
SET NOCOUNT ON;
SET TRANSACTION  ISOLATION LEVEL  READ UNCOMMITTED;  
  
--Parameter Sniffing
DECLARE @TableName                  NVARCHAR(128),
        @Include_BlankAsNotFilled   BIT,
        @ColumnName                 NVARCHAR(128),
        @R_NO                       INT,
        @DataType_Field             BIT,
        @i                          INT, --Iteration
        @RESULT                     NVARCHAR(MAX);
  
SELECT  @TableName                  =   @p_TableName,
        @Include_BlankAsNotFilled   =   @p_Include_BlankAsNotFilled,
        @i                          =   1; 
          
--To Support some of the table formats that user typing.
SELECT @TableName   =REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(@TableName,'[',''),']',''),'dbo.',''),'...',''),'..',''),'.','');        
   
--validation
IF NOT EXISTS(SELECT 1 FROM SYS.OBJECTS WHERE [TYPE]='U'  AND [NAME]=@TableName )    
    BEGIN
        SELECT Result = 1 , Reason ='Table not exists in this Database' ;
        RETURN 1;
    END;
   
--dropping temp table if exists - for debugging purpose
IF OBJECT_ID('TempDb..#Temp') IS  NOT NULL
    DROP TABLE  #Temp;
IF OBJECT_ID('TempDb..#Columns') IS  NOT NULL
    DROP TABLE  #Columns;
  
--temp table creations
CREATE TABLE  #Temp
(
[R_NO] INT  NOT NULL,
[ColumnName] NVARCHAR(128) NOT NULL,
[FillRate] DECIMAL(5,2) NOT NULL
PRIMARY KEY  CLUSTERED (ColumnName)
);
  
CREATE TABLE  #Columns
(
[R_NO] INT  NOT NULL,
[Name] [sysname] NOT NULL,
[DataType_Field] BIT  NOT NULL
PRIMARY KEY  CLUSTERED ([Name])
);
   
INSERT INTO  #Columns ([R_NO],[Name],[DataType_Field])
SELECT
    COLUMN_ID,
    [Name],
    IIF(collation_name IS  NULL,0,1)
FROM SYS.COLUMNS WHERE OBJECT_ID = OBJECT_ID(@TableName);
   
WHILE @i <= ( SELECT  MAX(R_NO) FROM #Columns) --Checking of Iteration till total number of columns
    BEGIN
        SELECT @DataType_Field=DataType_Field,@ColumnName=[Name],@R_NO=[R_NO] FROM  #Columns WHERE  R_NO = @i;
   
          SET @RESULT =
            'INSERT INTO #Temp ([R_NO],[ColumnName], [FillRate]) ' +
            'SELECT ' + QUOTENAME(@R_NO,CHAR(39)) + ',
                ''' + @ColumnName + ''',
                CAST((100*(SUM(
                    CASE WHEN ' +
                        CASE
                            WHEN @Include_BlankAsNotFilled = 0
                              THEN '[' + @ColumnName +  '] IS NOT NULL'
                            WHEN @DataType_Field = 0
                              THEN '[' + @ColumnName +  '] IS NOT NULL'
                            ELSE 'ISNULL([' + @ColumnName + '],'''')<>'''' ' END  +
                    ' THEN 1 ELSE 0 END)*1.0 / COUNT(*)))
                AS DECIMAL(5,2))
            FROM ' + @TableName;
   
        --PRINT(@RESULT); --for debug purpose
        EXEC(@RESULT);
   
        SET @i += 1; -- Incrementing Iteration Count
    END;
   
 --Final Result Set
    SELECT
      ColumnName AS  [Column  Name],
      FillRate AS  [Fill Rate (%)]
    FROM #TEMP
    ORDER BY  [R_NO];
   
    RETURN 0;
  
END TRY
BEGIN CATCH  --error handling even it is fetching stored procedure
    SELECT
         ERROR_NUMBER()     AS  ErrorNumber
        ,ERROR_SEVERITY()   AS  ErrorSeverity
        ,ERROR_STATE()      AS  ErrorState
        ,ERROR_PROCEDURE()  AS  ErrorProcedure
        ,ERROR_LINE()       AS  ErrorLine
        ,ERROR_MESSAGE()    AS  ErrorMessage;
    RETURN 1;
END CATCH;
  
END;

Execute this stored procedure - Method 1 by passing the table name like below

Execute like below if we need to consider NULL values alone as not filled

EXEC [Get_FillRate] @p_TableName='#TestEmp',@p_Include_BlankAsNotFilled=0;

Execute like below if we need to consider both NULL values and empty/blank values as not filled

EXEC [Get_FillRate] @p_TableName='#TestEmp',@p_Include_BlankAsNotFilled=1;

Method 1 -Output

Method 2 - With the use of UNION ALL

/******************************************************************************************************/
-- Created By : TECHNET WIKI MEMBER
-- Version    : 2.0
-- Created On : 20/05/2020
-- Description: - Version 1.0 - To Fetch the List of all columns of a table with their fill rates in %.
-- 20/05/2020 - Version 2.0 -  TECHNET WIKI MEMBER - Made Changes by avoiding while loop.
/******************************************************************************************************/
   
CREATE OR  ALTER PROCEDURE  [dbo].[Get_FillRate]   
(
    @p_TableName                  NVARCHAR(128),
    @p_Include_BlankAsNotFilled   BIT  = 0 -- 0-OFF(Default); 1-ON(Blank As Not Filled Data)
)
AS
BEGIN
  
BEGIN TRY
   
SET NOCOUNT ON;
SET TRANSACTION  ISOLATION LEVEL  READ UNCOMMITTED;  
  
--Parameter Sniffing
DECLARE @TableName                  NVARCHAR(128),
        @Include_BlankAsNotFilled   BIT,
        @RESULT                     NVARCHAR(MAX);
  
SELECT  @TableName                  =   @p_TableName,
        @Include_BlankAsNotFilled   =   @p_Include_BlankAsNotFilled,
        @RESULT                     =   '';
          
--To Support some of the table formats that user typing.
SELECT @TableName   =REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(@TableName,'[',''),']',''),'dbo.',''),'...',''),'..',''),'.','');                
   
--validation
IF NOT EXISTS(SELECT 1 FROM SYS.OBJECTS WHERE [TYPE]='U'  AND [NAME]=@TableName )    
    BEGIN
        SELECT Result = 1 , Reason ='Table not exists in this Database' ;
        RETURN 1;
    END;
  
--dropping temp table if exists - for debugging purpose
IF OBJECT_ID('TempDb..#Columns') IS  NOT NULL
    DROP TABLE  #Columns;
      
--temp table creations
CREATE TABLE  #Columns
(
[ORDINAL_POSITION] INT  NOT NULL,
[COLUMN_NAME] [sysname] NOT NULL,
[DataType_Field] BIT  NOT NULL,
[TABLE_NAME] [sysname] NOT NULL
PRIMARY KEY  CLUSTERED ([ORDINAL_POSITION],[COLUMN_NAME])
); 
  
INSERT INTO  #Columns ([ORDINAL_POSITION],[COLUMN_NAME],[DataType_Field],[TABLE_NAME])
SELECT
    [ORDINAL_POSITION],
    [COLUMN_NAME],
    CASE WHEN  COLLATION_NAME IS  NOT NULL  THEN 1 ELSE 0 END,
    [TABLE_NAME]
FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_NAME =@tablename; --Using System_View
   
--Final Result Set
SELECT @RESULT = @RESULT+ N'SELECT '''+C.COLUMN_NAME+''' AS [Column Name],
                            CAST((100*(SUM(
                                    CASE WHEN ' +
                                        CASE
                                            WHEN @include_blankasnotfilled = 0
                                              THEN '[' + C.COLUMN_NAME + '] IS NOT NULL'
                                            WHEN C.[DataType_Field]=0
                                              THEN '[' + C.COLUMN_NAME + '] IS NOT NULL'
                                            ELSE 'ISNULL([' + C.COLUMN_NAME + '],'''')<>'''' ' END  +
                                    ' THEN 1 ELSE 0 END)*1.0 / COUNT(*)))
                            AS DECIMAL(5,2)) AS [Fill Rate (%)]
                        FROM '+C.TABLE_NAME+' UNION ALL '
FROM #Columns C;
  
SET @RESULT=LEFT(@RESULT,LEN(@RESULT)-10); --To Omit 'Last UNION ALL '.
  
--PRINT(@RESULT); --for debug purpose
EXEC(@RESULT);
   
    RETURN 0;
  
END TRY
BEGIN CATCH  --error handling even it is fetching stored procedure
    SELECT
         ERROR_NUMBER()     AS  ErrorNumber
        ,ERROR_SEVERITY()   AS  ErrorSeverity
        ,ERROR_STATE()      AS  ErrorState
        ,ERROR_PROCEDURE()  AS  ErrorProcedure
        ,ERROR_LINE()       AS  ErrorLine
        ,ERROR_MESSAGE()    AS  ErrorMessage;
    RETURN 1;
END CATCH;
  
END;

Execute this stored procedure - Method 2 by passing the table name like below

Execute like below if we need to consider NULL values alone as not filled

EXEC [Get_FillRate] @p_TableName='#TestEmp',@p_Include_BlankAsNotFilled=0;

Execute like below if we need to consider both NULL values and empty/blank values as not filled

EXEC [Get_FillRate] @p_TableName='#TestEmp',@p_Include_BlankAsNotFilled=1;

Method 2 -Output

Metrics Difference between Method 1 Vs Method 2

The below four metrics taken for consideration for knowing the difference between Method 1 Vs Method 2

  • No. of Query Sets in Exec Query Plan

  • Total CPU Time (in ms)

  • Total Elapsed Time (in ms)

  • Total Logical Reads

Applies To

  • SQL Server
  • Data Analysis

Conclusion

In conclusion, we have seen how to find the Fill Rate of a Table Using T-SQL Queries that is applicable to run in both AZURE and On-Premises SQL Databases. Thus, it would helps us to take business decisions effectively as well as immediately.

Future Scope

As a future scope, We can do some more enhancements (some of the ideas mentioned below) with this and can do many more things like supporting Fill Rates for

  • Required column list of a table in a database (using exclude/include option and column_list as input parameter)
  • Required column list in a database (by introducing separate input parameter for column name and supporting null in table name)
  • Required list of tables in a database (by changing the data length in input parameter)
  • All tables in a database (by removing the table name filter)

Reference

See Also