Share via


TransactSQL: Simulating IGNORE NULLs Functionality On FIRST_VALUE, LAST_VALUE Functions

Introduction

Sequencing of values is a common operation required to perform in many of the common use cases like finding first x values, last n values etc. There are numerous ways of implementing the sequencing logic. 

Since the 2005 days, one of the most popular and easier methods was to use a subquery based on APPLY operator to get associated values in sequence. 

However from  2012, there are direct functions available like FIRST_VALUE and LAST_VALUE which can be used for getting the first or the last values within a group based on a set of sequence criteria. `

Though LAST_VALUE and FIRST_VALUE are quite handy on many occasions, one of the main limitations for the functions have been the absence of **IGNORE_NULLS **support, as found in many other RDBMS. This makes it hard to use the functions in cases where intermixed NULL values are present as it doesn't handle NULLs gracefully.

This article explains a possible workaround which can be applied for the FIRST_VALUE and LAST_VALUE functions to handle NULL values gracefully and use them to return the first of the last non NULL value within a group

Illustration

The illustration given below is taken from a real project scenario with some sample dummy data replacing the actual ones.

The scenario on hand consists of operational data for a series of machines whose opening and closing reading needs to be captured for each execution cycle. 

The catch here is that machines may or may not give reading at the prescribed times based on the availability of input. The readings will not be in any kind of sequence. The requirement was to get the opening (first) and closing (last) reading for each operational cycle  for each of the machines.

This scenario can be simplified using a sample data as below. 

For the sake of this illustration, only the required columns and their values are included

declare @t table(
    ExecutionInstanceID uniqueidentifier,
    MachineID char(1),
    CaptureDate datetime,
    OutputValue Decimal(15,2)
)
  
INSERT @t
VALUES
('934C75E6-FC26-47F6-A18C-806890F2980F','A','20180120 10:20',NULL),
('934C75E6-FC26-47F6-A18C-806890F2980F','A','20180120 11:48',NULL),
('934C75E6-FC26-47F6-A18C-806890F2980F','A','20180120 12:08',30),
('934C75E6-FC26-47F6-A18C-806890F2980F','A','20180120 14:50',115),
('934C75E6-FC26-47F6-A18C-806890F2980F','A','20180120 16:20',76),
('934C75E6-FC26-47F6-A18C-806890F2980F','A','20180120 18:02',68),
('934C75E6-FC26-47F6-A18C-806890F2980F','A','20180120 20:10',NULL),
('934C75E6-FC26-47F6-A18C-806890F2980F','A','20180120 22:04',NULL),
  
 ('CCD50F31-733D-4F02-A520-FF4FBE4BD073','A','20180213 22:20',NULL),
('CCD50F31-733D-4F02-A520-FF4FBE4BD073','A','20180213 23:55',NULL),
('CCD50F31-733D-4F02-A520-FF4FBE4BD073','A','20180214 00:48',150),
('CCD50F31-733D-4F02-A520-FF4FBE4BD073','A','20180214 14:30',NULL),
('CCD50F31-733D-4F02-A520-FF4FBE4BD073','A','20180214 19:55',345),
('CCD50F31-733D-4F02-A520-FF4FBE4BD073','A','20180215 03:12',227),
('CCD50F31-733D-4F02-A520-FF4FBE4BD073','A','20180215 12:08',NULL),
  
('8C7894AD-7D14-44F6-BE99-7720C5A7C638','B','20180311 16:44',NULL),
('8C7894AD-7D14-44F6-BE99-7720C5A7C638','B','20180311 20:19',NULL),
('8C7894AD-7D14-44F6-BE99-7720C5A7C638','B','20180311 22:43',NULL),
('8C7894AD-7D14-44F6-BE99-7720C5A7C638','B','20180312 07:13',380),
('8C7894AD-7D14-44F6-BE99-7720C5A7C638','B','20180312 09:35',446),
('8C7894AD-7D14-44F6-BE99-7720C5A7C638','B','20180312 11:45',135),
('8C7894AD-7D14-44F6-BE99-7720C5A7C638','B','20180313 00:10',324),
('8C7894AD-7D14-44F6-BE99-7720C5A7C638','B','20180313 02:23',NULL),
  
 ('35542D14-2E3A-4CF1-86D9-00CF0433EB85','C','20180125 10:20',NULL),
('35542D14-2E3A-4CF1-86D9-00CF0433EB85','C','20180125 11:32',224),
('35542D14-2E3A-4CF1-86D9-00CF0433EB85','C','20180125 12:43',NULL),
('35542D14-2E3A-4CF1-86D9-00CF0433EB85','C','20180125 13:34',115),
('35542D14-2E3A-4CF1-86D9-00CF0433EB85','C','20180125 14:55',NULL),
('35542D14-2E3A-4CF1-86D9-00CF0433EB85','C','20180125 16:32',213),
('35542D14-2E3A-4CF1-86D9-00CF0433EB85','C','20180125 18:42',304),
('35542D14-2E3A-4CF1-86D9-00CF0433EB85','C','20180125 23:24',NULL)

Now as per the requirement on hand, the solution for the above case would look like this

SELECT *,
    FIRST_VALUE(CaptureDate) 
        OVER (
        PARTITION BY  ExecutionInstanceID,MachineID
        ORDER BY  CaptureDate ROWS  BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
    ) AS  ExecutionInstanceStart,
    LAST_VALUE(CaptureDate) 
        OVER (
        PARTITION BY  ExecutionInstanceID,MachineID
        ORDER BY  CaptureDate ROWS  BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
    ) AS  ExecutionInstanceEnd,
    FIRST_VALUE(OutputValue) 
        OVER (PARTITION BY  ExecutionInstanceID,MachineID
        ORDER BY  CaptureDate  ROWS  BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
    ) AS  FirstValue,
    LAST_VALUE(OutputValue) 
        OVER (PARTITION BY  ExecutionInstanceID,MachineID
        ORDER BY  CaptureDate RANGE BETWEEN  UNBOUNDED PRECEDING AND  UNBOUNDED FOLLOWING
    ) AS  LastValue
FROM @t

The above solution when tried will provide us with the below result

On careful analysis of the above result, we can find that both first and last readings are returned as NULL values for every machine. 

This is because of the presence of NULL values towards beginning and end for each machine's execution cycle. But these are valid business scenarios as they represent instances within execution cycle of the machines where there was no visible output for reading. 

This use case clearly indicates the limitation which makes it not possible to use FIRST_VALUE and LAST_VALUE functions to get first and last readings. Presence of IGNORE_NULLS switch would have helped to get correct result here but unfortunately it's not implemented yet

Solution

Since FIRST_VALUE, LAST_VALUE functions cannot be applied for getting the solution for the scenario on hand, an alternate approach has to be followed for getting the required solution for the use case on hand.

The use case requires identifying the first non NULL value and last non NULL value within each group determined by the PARTITION BY clause for getting the first and last reading.

Taking into consideration the above points, we can rewrite the above query as:

SELECT ExecutionInstanceID,
MachineID,
CaptureDate,
OutputValue,
ExecutionInstanceStart,
ExecutionInstanceEnd,
MIN(CASE WHEN  ValuesBefore = 1 THEN OutputValue END) OVER (PARTITION BY  ExecutionInstanceID,
MachineID) AS  FirstReading,
MAX(CASE WHEN  ValuesAhead = 1 THEN  OutputValue END) OVER (PARTITION BY  ExecutionInstanceID,
MachineID)  AS  LastReading
FROM
(
SELECT *,
FIRST_VALUE(CaptureDate) OVER (PARTITION BY  ExecutionInstanceID,
MachineID
ORDER BY  CaptureDate  ROWS  BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS ExecutionInstanceStart,
LAST_VALUE(CaptureDate) OVER (PARTITION BY  ExecutionInstanceID,
MachineID
ORDER BY  CaptureDate  ROWS  BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS ExecutionInstanceEnd,
FIRST_VALUE(OutputValue) OVER (PARTITION BY  ExecutionInstanceID,
MachineID
ORDER BY  CaptureDate  ROWS  BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS FirstValue,
LAST_VALUE(OutputValue) OVER (PARTITION BY  ExecutionInstanceID,
MachineID
ORDER BY  CaptureDate ROWS  BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS LastValue,
COUNT(CASE WHEN  OutputValue IS  NOT NULL  THEN OutputValue END) OVER (PARTITION BY  ExecutionInstanceID,
MachineID
ORDER BY  CaptureDate  ROWS  BETWEEN CURRENT  ROW AND UNBOUNDED FOLLOWING) AS  ValuesAhead,
COUNT(CASE WHEN  OutputValue IS  NOT NULL  THEN OutputValue END) OVER (PARTITION BY  ExecutionInstanceID,
MachineID
ORDER BY  CaptureDate   ROWS  BETWEEN UNBOUNDED PRECEDING AND CURRENT  ROW ) AS  ValuesBefore
FROM @t
)t

Now the query will return the expected results i.e initial and final reading along with each row. 

The logic applied is to check the count of non NULL values from each row upward and downward over the entire partition window. This is achieved by applying 

 clause to get the sliding window with respect to each row. 

The cases where it returns 1 indicates that's it's the first or the last non NULL value in the group. Hence putting a filter of 1 for the count retrieves the first and last reading.

The result of the query will look like the below

 

Conclusion

From the above illustration we can conclude the below

  • Functions LAST_VALUE and FIRST_VALUE can't be used to get last and first values within a group when data has intermixed NULL values.
  • The implementation of FIRST_VALUE and LAST_VALUE does not support IGNORE NULLs mode in SQLServer.
  • An alternate approach of applying COUNT over partition over current row to end of the group top or bottom will be able to identify the first or the last non null value within the group

 

See Also