T-SQL : Finding Fill Rate of Table
Introduction
The Fill Rate is one of the most important metric in many of the business related decisions. So Knowing Fill Rate is one of the essential tasks in Data Analysis and Decision making in business processes. In this article, we're going to explain how to find the Fill Rate of a Table using T-SQL Queries.
Scenario
Observing Fill Rate is the key factor for knowing and improving data quality that creates the fulfillment to both our management as well as our end users. Need to find Fill Rate of a Table, How to do it?
What is meant by Fill Rate?
The Fill Rate is defined as the number of filled entries at each Column level that is divided by the total number of rows in a table. Depends on the total number of rows that is present in a table, we can figure out the Fill Rate easily.
Challenges
The Schema changes like below things makes our Fill Rate approach little difficult than actual.
- Table name changes
- Column name changes
- Data type changes
- Removing Existing columns
- Adding New Columns
Due to the above challenges, we cannot simply go for Static Solution to find Fill Rate of a table. Instead, we need something like Dynamic Approach to avoid our future re-works.
Prerequisite
In the below sample, we are going to use one stored procedure named 'Get_FillRate' for demo. If any one have the same object name in database, please make sure to change the below stored procedure name.
Sample Table Creation with Data Loading Script
--dropping temp table if exists
IF OBJECT_ID('TempDb..#TestEmp') IS NOT NULL
DROP TABLE #TestEmp;
CREATE TABLE #TestEmp
(
[TestEmp_Key] INT IDENTITY(1,1) NOT NULL,
[EmpName] VARCHAR(100) NOT NULL,
[Age] INT NULL,
[Address] VARCHAR(100) NULL,
[PhoneNo] VARCHAR(11) NULL,
[Inserted_dte] DATETIME NOT NULL,
[Updated_dte] DATETIME NULL,
CONSTRAINT [PK_TestEmp] PRIMARY KEY CLUSTERED
(
TestEmp_Key ASC
)
);
GO
INSERT INTO #TestEmp
(EmpName,Age,[Address],PhoneNo,Inserted_dte)
VALUES
('Arul',24,'xxxyyy','1234567890',GETDATE()),
('Gokul',22,'zzzyyy',NULL,GETDATE()),
('Krishna',24,'aaa','',GETDATE()),
('Adarsh',25,'bbb','1234567890',GETDATE()),
('Mani',21,'',NULL,GETDATE()),
('Alveena',20,'ddd',NULL,GETDATE()),
('Janani',30,'eee','',GETDATE()),
('Vino',26,NULL,'1234567890',GETDATE()),
('Madhi',25,'ggg',NULL,GETDATE()),
('Ronen',25,'ooo',NULL,GETDATE()),
('Visakh',25,'www',NULL,GETDATE()),
('Jayendran',NULL,NULL,NULL,GETDATE());
GO
SELECT [TestEmp_Key],[EmpName],[Age],[Address],[PhoneNo],[Inserted_dte],[Updated_dte] FROM #TestEmp;
GO
Temp Table - #TestEmp
SQL Procedure For Finding Fill Rate in a Table - Dynamic Approach
Input Parameters
Both of the Input Parameters are mandatory.
1. @p_TableName - Data type used for this input Parameter is NVARCHAR(128) and Nullability is NOT NULL.
2. @p_Include_BlankAsNotFilled - Data type used for this input Parameter is BIT and Nullability is NOT NULL and either 0 or 1 needs to give. 0 is by Default and 0 means OFF. 1 is ON (when given as 1 - Blank entries will be considered As Not Filled Data).
Output Columns
There are Two output Columns. both of those are Non Nullable Output Columns.
1. [Column Name] - Data type used for this Output Column is sysname and Nullability is NOT NULL. All the Column Names for the user given Table Name would come as row values.
2. [Fill Rate (%)] - Data type used for this Output Column is DECIMAL(5,2) and Nullability is NOT NULL. Values from 0.00 to 100.00 would come in result with respective Column Names.
Info reg Stored Procedure
- Created the store Procedure named - 'Get_FillRate'.
- To avoid the number of rows returned, set NOCOUNT as ON.
- Try, Catch Blocks are added for error handling's.
- To read Uncommitted Modifications, set TRANSACTION ISOLATION LEVEL as READ UNCOMMITTED.
- Parameter Sniffing Concept is also included.
- Some handling's done on the Table Name input parameters to support user typing table name formats like '.table_name','..table_name','...table_name','table_name','[table_name]','dbo.table_name','dbo.[table_name]','[dbo].[table_name]' etc.,
- Validation is included at the start, when user gives other than 'table name', stored procedure would throw 'Table not exists in this Database' as error message.
- System table named SYS.OBJECTS and SYS.COLUMNS and System View named INFORMATION_SCHEMA.COLUMNS are used inside the stored procedure.
- ORDINAL_POSITION from INFORMATION_SCHEMA.COLUMNS is used, to return the result set with the same column order that the table structure already has.
- COLLATION_NAME from INFORMATION_SCHEMA.COLUMNS is used, to support conditions like blank is either need to consider or not, as not filled entries.
- COLUMN_NAME from INFORMATION_SCHEMA.COLUMNS is used, to show the final result set with respective fill rates.
- Dynamic Query is used, to support dynamic approach and this would avoid all the challenges that would come in static solutions like schema changes.
- Both Method 1(Dynamic Query with WHILE LOOP) and Method 2(Dynamic Query with UNION ALL) produces same result sets and carries same functionality where some metrics like CPU time,Elapsed Time,Logical reads that are better in Method 2.
Method 1 - With the use of WHILE Loop
/******************************************************************************************************/
-- Created By : TECHNET WIKI MEMBER
-- Version : 1.0
-- Created On : 20/05/2020
-- Description: --To Fetch the List of all columns of a table with their fill rates in %.
/******************************************************************************************************/
CREATE OR ALTER PROCEDURE [dbo].[Get_FillRate]
(
@p_TableName NVARCHAR(128),
@p_Include_BlankAsNotFilled BIT = 0 -- 0-OFF(Default); 1-ON(Blank As Not Filled Data)
)
AS
BEGIN
BEGIN TRY
SET NOCOUNT ON;
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
--Parameter Sniffing
DECLARE @TableName NVARCHAR(128),
@Include_BlankAsNotFilled BIT,
@ColumnName NVARCHAR(128),
@R_NO INT,
@DataType_Field BIT,
@i INT, --Iteration
@RESULT NVARCHAR(MAX);
SELECT @TableName = @p_TableName,
@Include_BlankAsNotFilled = @p_Include_BlankAsNotFilled,
@i = 1;
--To Support some of the table formats that user typing.
SELECT @TableName =REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(@TableName,'[',''),']',''),'dbo.',''),'...',''),'..',''),'.','');
--validation
IF NOT EXISTS(SELECT 1 FROM SYS.OBJECTS WHERE [TYPE]='U' AND [NAME]=@TableName )
BEGIN
SELECT Result = 1 , Reason ='Table not exists in this Database' ;
RETURN 1;
END;
--dropping temp table if exists - for debugging purpose
IF OBJECT_ID('TempDb..#Temp') IS NOT NULL
DROP TABLE #Temp;
IF OBJECT_ID('TempDb..#Columns') IS NOT NULL
DROP TABLE #Columns;
--temp table creations
CREATE TABLE #Temp
(
[R_NO] INT NOT NULL,
[ColumnName] NVARCHAR(128) NOT NULL,
[FillRate] DECIMAL(5,2) NOT NULL
PRIMARY KEY CLUSTERED (ColumnName)
);
CREATE TABLE #Columns
(
[R_NO] INT NOT NULL,
[Name] [sysname] NOT NULL,
[DataType_Field] BIT NOT NULL
PRIMARY KEY CLUSTERED ([Name])
);
INSERT INTO #Columns ([R_NO],[Name],[DataType_Field])
SELECT
COLUMN_ID,
[Name],
IIF(collation_name IS NULL,0,1)
FROM SYS.COLUMNS WHERE OBJECT_ID = OBJECT_ID(@TableName);
WHILE @i <= ( SELECT MAX(R_NO) FROM #Columns) --Checking of Iteration till total number of columns
BEGIN
SELECT @DataType_Field=DataType_Field,@ColumnName=[Name],@R_NO=[R_NO] FROM #Columns WHERE R_NO = @i;
SET @RESULT =
'INSERT INTO #Temp ([R_NO],[ColumnName], [FillRate]) ' +
'SELECT ' + QUOTENAME(@R_NO,CHAR(39)) + ',
''' + @ColumnName + ''',
CAST((100*(SUM(
CASE WHEN ' +
CASE
WHEN @Include_BlankAsNotFilled = 0
THEN '[' + @ColumnName + '] IS NOT NULL'
WHEN @DataType_Field = 0
THEN '[' + @ColumnName + '] IS NOT NULL'
ELSE 'ISNULL([' + @ColumnName + '],'''')<>'''' ' END +
' THEN 1 ELSE 0 END)*1.0 / COUNT(*)))
AS DECIMAL(5,2))
FROM ' + @TableName;
--PRINT(@RESULT); --for debug purpose
EXEC(@RESULT);
SET @i += 1; -- Incrementing Iteration Count
END;
--Final Result Set
SELECT
ColumnName AS [Column Name],
FillRate AS [Fill Rate (%)]
FROM #TEMP
ORDER BY [R_NO];
RETURN 0;
END TRY
BEGIN CATCH --error handling even it is fetching stored procedure
SELECT
ERROR_NUMBER() AS ErrorNumber
,ERROR_SEVERITY() AS ErrorSeverity
,ERROR_STATE() AS ErrorState
,ERROR_PROCEDURE() AS ErrorProcedure
,ERROR_LINE() AS ErrorLine
,ERROR_MESSAGE() AS ErrorMessage;
RETURN 1;
END CATCH;
END;
Execute this stored procedure - Method 1 by passing the table name like below
Execute like below if we need to consider NULL values alone as not filled
EXEC [Get_FillRate] @p_TableName='#TestEmp',@p_Include_BlankAsNotFilled=0;
Execute like below if we need to consider both NULL values and empty/blank values as not filled
EXEC [Get_FillRate] @p_TableName='#TestEmp',@p_Include_BlankAsNotFilled=1;
Method 1 -Output
Method 2 - With the use of UNION ALL
/******************************************************************************************************/
-- Created By : TECHNET WIKI MEMBER
-- Version : 2.0
-- Created On : 20/05/2020
-- Description: - Version 1.0 - To Fetch the List of all columns of a table with their fill rates in %.
-- 20/05/2020 - Version 2.0 - TECHNET WIKI MEMBER - Made Changes by avoiding while loop.
/******************************************************************************************************/
CREATE OR ALTER PROCEDURE [dbo].[Get_FillRate]
(
@p_TableName NVARCHAR(128),
@p_Include_BlankAsNotFilled BIT = 0 -- 0-OFF(Default); 1-ON(Blank As Not Filled Data)
)
AS
BEGIN
BEGIN TRY
SET NOCOUNT ON;
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
--Parameter Sniffing
DECLARE @TableName NVARCHAR(128),
@Include_BlankAsNotFilled BIT,
@RESULT NVARCHAR(MAX);
SELECT @TableName = @p_TableName,
@Include_BlankAsNotFilled = @p_Include_BlankAsNotFilled,
@RESULT = '';
--To Support some of the table formats that user typing.
SELECT @TableName =REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(@TableName,'[',''),']',''),'dbo.',''),'...',''),'..',''),'.','');
--validation
IF NOT EXISTS(SELECT 1 FROM SYS.OBJECTS WHERE [TYPE]='U' AND [NAME]=@TableName )
BEGIN
SELECT Result = 1 , Reason ='Table not exists in this Database' ;
RETURN 1;
END;
--dropping temp table if exists - for debugging purpose
IF OBJECT_ID('TempDb..#Columns') IS NOT NULL
DROP TABLE #Columns;
--temp table creations
CREATE TABLE #Columns
(
[ORDINAL_POSITION] INT NOT NULL,
[COLUMN_NAME] [sysname] NOT NULL,
[DataType_Field] BIT NOT NULL,
[TABLE_NAME] [sysname] NOT NULL
PRIMARY KEY CLUSTERED ([ORDINAL_POSITION],[COLUMN_NAME])
);
INSERT INTO #Columns ([ORDINAL_POSITION],[COLUMN_NAME],[DataType_Field],[TABLE_NAME])
SELECT
[ORDINAL_POSITION],
[COLUMN_NAME],
CASE WHEN COLLATION_NAME IS NOT NULL THEN 1 ELSE 0 END,
[TABLE_NAME]
FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_NAME =@tablename; --Using System_View
--Final Result Set
SELECT @RESULT = @RESULT+ N'SELECT '''+C.COLUMN_NAME+''' AS [Column Name],
CAST((100*(SUM(
CASE WHEN ' +
CASE
WHEN @include_blankasnotfilled = 0
THEN '[' + C.COLUMN_NAME + '] IS NOT NULL'
WHEN C.[DataType_Field]=0
THEN '[' + C.COLUMN_NAME + '] IS NOT NULL'
ELSE 'ISNULL([' + C.COLUMN_NAME + '],'''')<>'''' ' END +
' THEN 1 ELSE 0 END)*1.0 / COUNT(*)))
AS DECIMAL(5,2)) AS [Fill Rate (%)]
FROM '+C.TABLE_NAME+' UNION ALL '
FROM #Columns C;
SET @RESULT=LEFT(@RESULT,LEN(@RESULT)-10); --To Omit 'Last UNION ALL '.
--PRINT(@RESULT); --for debug purpose
EXEC(@RESULT);
RETURN 0;
END TRY
BEGIN CATCH --error handling even it is fetching stored procedure
SELECT
ERROR_NUMBER() AS ErrorNumber
,ERROR_SEVERITY() AS ErrorSeverity
,ERROR_STATE() AS ErrorState
,ERROR_PROCEDURE() AS ErrorProcedure
,ERROR_LINE() AS ErrorLine
,ERROR_MESSAGE() AS ErrorMessage;
RETURN 1;
END CATCH;
END;
Execute this stored procedure - Method 2 by passing the table name like below
Execute like below if we need to consider NULL values alone as not filled
EXEC [Get_FillRate] @p_TableName='#TestEmp',@p_Include_BlankAsNotFilled=0;
Execute like below if we need to consider both NULL values and empty/blank values as not filled
EXEC [Get_FillRate] @p_TableName='#TestEmp',@p_Include_BlankAsNotFilled=1;
Metrics Difference between Method 1 Vs Method 2
The below four metrics taken for consideration for knowing the difference between Method 1 Vs Method 2
No. of Query Sets in Exec Query Plan
Total CPU Time (in ms)
Total Elapsed Time (in ms)
Total Logical Reads
Applies To
- SQL Server
- Data Analysis
Conclusion
In conclusion, we have seen how to find the Fill Rate of a Table Using T-SQL Queries that is applicable to run in both AZURE and On-Premises SQL Databases. Thus, it would helps us to take business decisions effectively as well as immediately.
Future Scope
As a future scope, We can do some more enhancements (some of the ideas mentioned below) with this and can do many more things like supporting Fill Rates for
- Required column list of a table in a database (using exclude/include option and column_list as input parameter)
- Required column list in a database (by introducing separate input parameter for column name and supporting null in table name)
- Required list of tables in a database (by changing the data length in input parameter)
- All tables in a database (by removing the table name filter)