Transact-SQL: Search for Missing Values within a Numerical Sequence
Scope
The purpose of this short article is to show a method for identifying, within a SQL Server table, gaps in a numerical sequence, to quickly locate absences of consequence within given data. This approach is useful when in need, for example, to verify that there are no missing documents in a given list (think of the possibility of having to monitor a flow of invoices, when you must check if in a given range some of them are missing, basing your analysis on the document's registration number).
Basic concept
The script is based on a simple concept: if we could have a table which contains each number among those possible for the management of our case, we could compare our actual list (where some numbers will potentially not be present) with the ideal list, that contains them all, and then verify the deficiencies.
It is highly recommended to have a numbers table in each instance of the SQL Server. There is no reason to create new table each time, in most cases. Moreover, we can use one table for any query in any database in the server instance. There are lot of queries that can be improved by using this numbers table.
* Tip!
You can creates an Accessories database, which include accessories like numbers table, dates tables, CLR functions, T-SQL functions.... and more. This database should be on read only mode. This will make the queries faster with less locks and give more security. All users should have privileges to read from this database.
If you did not create a numbers table yet, then it is time to do it now!
---------------------------------------------------> Numbers table
CREATE TABLE AriNumbersTbl (Number int not null)
GO
INSERT INTO AriNumbersTbl(Number)
SELECT TOP 10000000 row_number() over(order by (select NULL)) as N
FROM master..spt_values t1
CROSS JOIN master..spt_values t2
CROSS JOIN master..spt_values t3
GO
ALTER TABLE AriNumbersTbl ADD CONSTRAINT PK_AriNumbersTbl PRIMARY KEY CLUSTERED (Number)
GO
Application
We'll now create a table to be used for our comparisons. Will create a table named Products, having two fields: the first is a hypothetical unique ID of product, while the second is a description of the product itself. In a table like this are going to insert some test records:
CREATE TABLE Products( ProductID INT PRIMARY KEY, ProductDes VARCHAR (50))
INSERT INTO Products VALUES(1 , 'Product 1' )
INSERT INTO Products VALUES(2 , 'Product 2' )
INSERT INTO Products VALUES(3 , 'Product 3' )
INSERT INTO Products VALUES(4 , 'Product 4' )
INSERT INTO Products VALUES(5 , 'Product 5' )
INSERT INTO Products VALUES(10 , 'Product 10' )
We have 5 consequential records, while the sixth has an ID = 10. Following the above logic, if we wish to spot potential missing of values, the result we should expect will be constituted by record having ID from 6 to 9, missing in the Products table but not in AriNumbersTbl. Lets see how to achieve this:
SELECT number
FROM AriNumbersTbl
WHERE number NOT IN (SELECT ProductID FROM Products) AND
number <= (SELECT MAX(ProductID) FROM Products )
This simple SELECT contains two subqueries in the WHERE clause. What we will do is extract from AriNumbersTbl table those values not contained (NOT IN) in the selection of all ID's values from Products table (SELECT ProductID FROM Products). If we stop here, we will obtain the values 6, 7, 8, 9, but also all those values between 11 and 99999, being them absent from the Products table, but not into AriNumbersTbl. As an upper limit, we can then put the maximum value readable from Products table: since the higher ID is 10 at the moment, probably we are not interested to spot values that will go beyond that limit, being obvious for them to satisfy the basic conditions for our analysis. So, we simply add a second condition to extract, among all the values of AriNumbersTbl, those that won't go beyond the maximum ID present in Products table (SELECT MAX(ProductID) FROM Products).
Executing our script in Management Studio, using the example data, we'll obtain a list like this:
Complete script
--> Don't forget that you need a numbers table which is not part of this script, since it is something that should be in any server instance.
-- Products Table creation and population
CREATE TABLE Products( ProductID INT PRIMARY KEY, ProductDes VARCHAR (50))
INSERT INTO Products VALUES(1 , 'Product 1' )
INSERT INTO Products VALUES(2 , 'Product 2' )
INSERT INTO Products VALUES(3 , 'Product 3' )
INSERT INTO Products VALUES(4 , 'Product 4' )
INSERT INTO Products VALUES(5 , 'Product 5' )
INSERT INTO Products VALUES(10 , 'Product 10' )
-- Identifying numerical holes in products list
SELECT *
FROM AriNumbersTbl
WHERE number NOT IN (SELECT ProductID FROM Products) AND
number <= (SELECT MAX( ProductID) FROM Products )
-- Deleting sample tables
DROP TABLE Products