CHECKSUM and BINARY_CHECKSUM might not return unique values for different inputs
CHECKSUM and BINARY_CHECKSUM create checksums for text inputs. Checksum values should be different for different input text. However, in some cases, returned values are not always unique for different inputs.
You can find patterns of inputs that return same values. This is similar to modulo operator that returns the same remainder if you find the right pattern of numbers.
As an example, if you run following queries you will find that these functions return 0 for all strings containing only letter 'a' that have lengths 32, 64, 96, n * 32 characters:
select CHECKSUM(REPLICATE('a',32)), CHECKSUM(REPLICATE('a',64)), CHECKSUM(REPLICATE('a',96))
select BINARY_CHECKSUM(REPLICATE('a',32)), BINARY_CHECKSUM(REPLICATE('a',64)), BINARY_CHECKSUM(REPLICATE('a',96))
So if your text contains only letter 'a' and has length 26000 (i.e. 812*32) it will be in the same bucket as other values in this example and you will get value 0.
However, if you use text with 33, 65, 97, n * 32 + 1 characters 'a', CHECKSUM will return 142, and BINARY_CHECKSUM will return 97:
select CHECKSUM(REPLICATE('a',33)), CHECKSUM(REPLICATE('a',65)), CHECKSUM(REPLICATE('a',97))
select BINARY_CHECKSUM(REPLICATE('a',33)), BINARY_CHECKSUM(REPLICATE('a',65)), BINARY_CHECKSUM(REPLICATE('a',97))
Conclusion: Do not assume that checksum will always return unique values for different texts. In some rare cases you can get the same values for text values that follows the same pattern.
Comments
- Anonymous
December 09, 2015
The comment has been removed