T-SQL: Random Equal Distribution
Introduction
Recent question in MSDN Transact-SQL forum Update value in one table with value from looping through another table prompted writing of this article and exploration of the equal random distribution problem.
Problem Definition
In the aforementioned thread the topic starter wanted to randomly distribute claims among the personnel. Each user needed to receive an equal (or nearly equal) amount of claims from the unassigned pool of claims.
In order to simulate the environment I created the following tables and populated them with this code:
IF OBJECT_ID(N'Claims', N'U') IS NOT NULL DROP TABLE dbo.Claims;
CREATE TABLE [dbo].[Claims] (
[ClaimID] INT identity(1, 1)
,[PaidAmt] [money] NOT NULL
,[PaidDate] [datetime] NULL
,[Notes] [varchar](max) NULL
,[CompleteUser] INT NULL
,[CompleteDate] [datetime] NULL
,[Tracking] [bit] NULL
,[FollowUpUser] INT NULL
,[FollowUpDate] [datetime] NULL
,CONSTRAINT [PK_Claims] PRIMARY KEY CLUSTERED ([ClaimID] ASC) WITH (
PAD_INDEX = OFF
,STATISTICS_NORECOMPUTE = OFF
,IGNORE_DUP_KEY = OFF
,ALLOW_ROW_LOCKS = ON
,ALLOW_PAGE_LOCKS = ON
) ON [PRIMARY]
) ON [PRIMARY]
DECLARE @Loop INT = 1;
WHILE @Loop <= 520
BEGIN
INSERT INTO dbo.Claims (PaidAmt)
VALUES (@Loop * 10)
SET @Loop = @Loop + 1;
END
IF OBJECT_ID(N'Staff', N'U') IS NOT NULL DROP TABLE dbo.Staff;
CREATE TABLE dbo.Staff (
UserID INT identity(1, 1)
,FirstName VARCHAR(25)
,LastName VARCHAR(30)
,CONSTRAINT [PK_Staff] PRIMARY KEY CLUSTERED ([UserID] ASC) WITH (
PAD_INDEX = OFF
,STATISTICS_NORECOMPUTE = OFF
,IGNORE_DUP_KEY = OFF
,ALLOW_ROW_LOCKS = ON
,ALLOW_PAGE_LOCKS = ON
) ON [PRIMARY]
) ON [PRIMARY]
SET @Loop = 1;
WHILE @Loop <= 30
BEGIN
INSERT INTO dbo.Staff (
FirstName
,LastName
)
VALUES (
'FName ' + cast(@Loop AS VARCHAR(10))
,'LName ' + cast(@Loop AS VARCHAR(10))
);
SET @Loop = @Loop + 1;
END
As you see, with the code above we created Staff and Claims tables where Staff has 30 rows and Claims has 520 rows. You may want to use different numbers for Staff and Claims tables, the solution still will be the same.
Now, given the above tables we want to update FollowUpUser and FollowUpDate columns assigning random claims to the users (for users we would order them in the UserID order, so first 10 users will get one extra claim).
Solution
To achieve random distribution of claims we would order them based on NewID() function. This is well-known trick of producing random order. I would also mention that we can not use RAND() function directly to achieve random order as in case of RAND() function every row will get the same number.
DECLARE @TotalUsers INT = (
SELECT count(*)
FROM dbo.Staff
);
WITH cteClaims
AS (
SELECT *
,row_number() OVER (
ORDER BY newID()
) AS Rn
FROM Claims
)
,cteUsersOrdered
AS (
SELECT *
,row_number() OVER (
ORDER BY UserId
) AS Rn
FROM dbo.Staff
)
MERGE cteClaims AS Target
USING cteUsersOrdered AS Source
ON (Target.Rn - 1) % @TotalUsers + 1 = Source.Rn
WHEN MATCHED
THEN
UPDATE
SET FollowUpUser = Source.UserID,
FollowUpDate = CURRENT_TIMESTAMP;
Using this formula: (ClaimsRow - 1) % Number Of Users + 1 we achieve the desired result where all claims are distributed.
In order to verify our results we can use the following select statement:
SELECT COUNT(C.ClaimID) as claimsCount, S.UserID,
S.FirstName, S.LastName from dbo.Claims C
INNER JOIN dbo.Staff S ON C.FollowUpUser = S.UserID
GROUP BY S.UserID, S.FirstName, S.LastName
ORDER by claimsCount DESC;
which produced desired result:
Conclusion
This article demonstrated solution for a common problem of random equal distribution using set-based approach.
See Also
This article participated in the TechNet Guru for May 2014 competition and won the Gold Prize.