How to generate a unique 20 digit identifier in T-SQL

Anonymous 61 Reputation points
2020-12-11T16:57:10.29+00:00

Hi,

I am using the following T-SQL code to generate unique 20 digit number. This Unique 20 digit number will be used as import id when importing records.

SELECT REPLACE(STR(CAST(CAST(NEWID() AS binary(5)) AS bigint),16), ' ', 0) + 'UKWS'

Is there any way I can test that this will always generate a unique import id?

Let's say that I will import 10 million records in next 5 or 10 years. Is there any chance that above T-SQL will generate duplicate import ids?

Thanks,

Transact-SQL
Transact-SQL
A Microsoft extension to the ANSI SQL language that includes procedural programming, local variables, and various support functions.
4,682 questions
{count} votes

Accepted answer
  1. Erland Sommarskog 115.9K Reputation points MVP
    2020-12-11T22:53:00.067+00:00

    You say 20 digits, but it appears that you want 16 digits plus UKWS at the end. And what you generate is a 5-byte random number, so you will only get 12 full digits. The first three will always be a 0, and the fourth will most also be 0.

    Here is a better expression that gives you full 16 random digits:

    SELECT CAST(  abs(CAST(CRYPT_GEN_RANDOM(8) AS bigint) % CAST(1E16 AS bigint)) AS CHAR(16)) + 'UKWS'
    

    You need to do some math to find out what the likelihood for duplicates are with 10 million rows. My gut feeling is that it may be OK, but maybe not. Viorel's test showed that 12 digits was not good enough.

    If you go by a sequence as suggested by Tom, it becomes a non-issue. The data type for the sequence in your case should probably be decimal(16,0).

    1 person found this answer helpful.

3 additional answers

Sort by: Most helpful
  1. Viorel 119.2K Reputation points
    2020-12-11T20:22:59.087+00:00

    It is not clear if you are interested in generating unique values or in validation of your method.

    If you did not check it yet, I made a test that inserts 10,000,000 values to a table according to shown approach. The test found 51 collisions; I hope that it was accurate. Therefore the method does not seem fully reliable.

    1 person found this answer helpful.

  2. Tom Phillips 17,741 Reputation points
    2020-12-11T18:35:25.83+00:00
    0 comments No comments

  3. David Browne - msft 3,851 Reputation points
    2020-12-11T23:05:17.783+00:00

    For generated random 64bit (8 byte) values, the chance of a collision within the first 10,000,000 values is less than 1/100,000.

    See eg https://preshing.com/20110504/hash-collision-probabilities for an overview of the math and this handy table:

    47552-image.png

    This is well below the threshold you really need to worry about, as an occasional collision would generate an isolated Primary Key violation failure.


Your answer

Answers can be marked as Accepted Answers by the question author, which helps users to know the answer solved the author's problem.