Transact-SQL: Get the first and last entry added per user
Introduction
In this article, we will explain two different queries to get the first and last entry added in a specific table per user.
We will also compare the 2 queries from the below perspectives
- Result.
- Disk Usage.
- Execution time.
Scenario
Consider you have a [Transaction] table to store multiple transaction for different users daily!
The [Transaction] table structure is simple like the below one:
Now, you need to
- Get First transaction added per user.
- Get Last transaction added per user.
SQL: Get the first transaction added per user | Inner Join Approach
First, we will get all required field from the main table [Transaction] as shown below:
SELECT [Transactions].UserName,
[Transactions].TransDate,
[Transactions].Amount
FROM [dbo].[Transactions]
The result is
Second, we will get all records aggregated with Minimum function on the [TransDate] grouped by [UserName] field.
SELECT [UserName],
Min([TransDate]) AS MinDate
FROM [dbo].[Transactions]
GROUP BY [UserName]
The result is
Additionally, we will use "WITH" clause to give the above query a block name (MinTransTable)
WITH MinTranstable AS
(SELECT [UserName],
min([TransDate]) AS MinDate
FROM [dbo]. [Transactions]
GROUP BY [UserName])
Finally, we will perform inner join between main [Transaction] table and [MinTransTable] as the following:
WITH MinTranstable AS
(SELECT [UserName],
min([TransDate]) AS MinDate
FROM [dbo]. [Transactions]
GROUP BY [UserName])
SELECT [Transactions].UserName,
[Transactions].TransDate,
[Transactions].Amount
FROM [dbo].[Transactions]
INNER JOIN MinTranstable ON [Transactions].username = MinTranstable.username
AND [Transactions].TransDate = MinTranstable.MinDate
The final result is
SQL: Get the first transaction added per user | row_number() Approach
Before we getting started, let's first know:
What's the row_number() function ?
- It's a function that assigns a sequential integer to each row within a partition of a result set.
- You can use it with "PARTITION BY" to divide the result set into partitions per specific column, actually, it's an optional part, if you don't use it, the function will treat all rows of the query result set as a single group.
For more details, please check ROW_NUMBER (Transact-SQL).
Now, let's go through our query,
Get all rows from [Transaction] table with a sequential row number for each "[UserName]" ordered by "[Transaction Date]" Ascending.
SELECT row_number() over(PARTITION BY UserName; ORDER BY TransDate asc) AS ID,
[UserName],
[TransDate],
[Amount]
FROM [dbo].[Transactions] )
The result is
Use "WITH" clause to give the above query a block name (MinTransTable)
WITH MinTransTable AS
(SELECT row_number() over(PARTITION BY UserName
ORDER BY TransDate ASC) AS ID,
[UserName],
[TransDate],
[Amount]
FROM [dbo]. [Transactions])
Now, get all record filtered by the sequential number ID= 1 and the [Transaction date] is not null
WITH MinTransTable AS
(SELECT row_number() over(PARTITION BY UserName
ORDER BY TransDate ASC) AS ID,
[UserName],
[TransDate],
[Amount]
FROM [dbo]. [Transactions])
SELECT *
FROM MinTransTable
WHERE MinTransTable.ID = 1
AND MinTransTable.[TransDate] IS NOT NULL
SQL: Get the last transaction added per user | Inner Join Approach
Actually, it's the same query with a bit different, instead of using Min() , we will use Max() on the date field!
So the final query should be
WITH MaxTranstable AS
(SELECT [UserName],
MAX([TransDate]) AS MaxDate
FROM [dbo]. [Transactions]
GROUP BY [UserName])
SELECT [Transactions].UserName,
[Transactions].TransDate,
[Transactions].Amount
FROM [dbo].[Transactions]
INNER JOIN MaxTranstable ON [Transactions].username = MaxTranstable .username
AND [Transactions].TransDate = MaxTranstable .MaxDate
SQL: Get the last transaction added per user | row_number() Approach
Again, we will use the same query of getting the first entry, but we will order by the transaction date Descending.
WITH MaxTransTable AS
(SELECT row_number() over(PARTITION BY UserName
ORDER BY TransDate DESC) AS ID,
[UserName],
[TransDate],
[Amount]
FROM [dbo]. [Transactions])
SELECT *
FROM MaxTransTable
WHERE MaxTransTable .ID = 1
AND MaxTransTable .[TransDate] IS NOT NULL
Compare the performance between 2 queries
In this section, we will answer an important question in your head right now, Which query should you use?
Actually, there are different ways to measure the performance and execution time for your query
- The execution plan.
- SQL commands.
You can include actual execution plan during running your query from the tool bar as shown below:
Once you activate the "Actual execution plan", you will get a new tab in the result window that show in details the behavior and performance of your query as shown below:
You can save the execution plan and compare it with another execution plan by right-clicking on a blank area of the execution plan and click "Compare Showplan".
Note
In this article, we don't cover the execution plan in details, but you can learn more about it at Execution Plans
You can also append a SQL commands to your query to display information about the IO and Time as the following:
SET STATISTICS IO ON
It displays information about the amount of disk activity generated by Transact-SQL statements.
SET STATISTICS TIME ON
It displays the number of milliseconds required to parse, compile, and execute each statement.
DBCC DROPCLEANBUFFERS
It cleans all the buffers from the buffer pool to make sure that no pages for this table are loaded in memory.
Let's now append the above commands to our query to measure the execution time as the following:
Query (1)
SET STATISTICS IO ON
SET STATISTICS TIME ON
DBCC DROPCLEANBUFFERS ;
WITH MinTranstable AS
(SELECT [UserName],
min([TransDate]) AS MinDate
FROM [dbo]. [Transactions]
GROUP BY [UserName])
SELECT [Transactions].UserName,
[Transactions].TransDate,
[Transactions].Amount
FROM [dbo].[Transactions]
INNER JOIN MinTranstable ON [Transactions].username = MinTranstable.username
AND [Transactions].TransDate = MinTranstable.MinDate
Query (1) Result
(5 row(s) affected)
Table 'Transactions'. Scan count 2, logical reads 12, physical reads 1, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
(1 row(s) affected)
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 207 ms.
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 0 ms.
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
Query (2)
SET STATISTICS IO ON
SET STATISTICS TIME ON
DBCC DROPCLEANBUFFERS
;
WITH MinTransTable AS
(SELECT row_number() over(PARTITION BY UserName
ORDER BY TransDate ASC) AS ID,
[UserName],
[TransDate],
[Amount]
FROM [dbo]. [Transactions])
SELECT *
FROM MinTransTable
WHERE MinTransTable.ID = 1
AND MinTransTable.[TransDate] IS NOT NULL
Query (2) Result
(4 row(s) affected)
Table 'Transactions'. Scan count 1, logical reads 1, physical reads 1, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
(1 row(s) affected)
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 78 ms.
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 0 ms.
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
As per the above result, the second query is faster than the first query
Conclusion
In this article, we have explained how to
- Get the first and last entry added per each user in SQL.
- Measure the execution time for SQL query,
- Compare two execution plans.