다음을 통해 공유


T-SQL: Random String

Introduction

In this article we are going to show several logistics on how to build a random string. This is very useful for maintenance tasks like testing (Populate large tables with random values), generate random password and so on...

If you have any other way of doing it, then you are most welcome to edit this article and give us your insight :-)

Solutions

Let's examine several solutions. Those solutions came from forum's users, and we will try to put them into perspective of advantages & disadvantages. We will close the article with conclusions and recommendations.  If you are just looking for the best solution then you can jump to the end.

1. Using NEWID as base string & NEWID to generate a random length

Basic Idea

  1. Create random string using the function NEWID (), this will give us a random 36 characters string.
  2. Create a random number using the function NEWID, as the string length.
  3. Cut the string using the function LEFT

Code

DECLARE @StringMaxLen int = 12
SELECT TOP (1000)
    LEFT (CAST  (NEWID () AS  NVARCHAR(MAX)) , ABS (CHECKSUM (NEWID ())) % @StringMaxLen + 1)
FROM SYS.OBJECTS A
CROSS JOIN SYS.OBJECTS B

Advantages & Disadvantages

  • A: Very fast executing
  • A: Very fast writing the code
  • A: No need to create UDF (if someone care about this)
  • D: NOT a Random solution!
    • The converting of NEWID to NVARCHAR generate a string with a specific format: XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX.
      There for we are going to get in the string always the same format. For example Ninth chain character will always be a dash...
  • D: NOT the same odds!
    • As explained above, we get more chance of getting a dash sign.
  • D: Limited to a maximum length of 36 characters!
    • We can't chose length more the the NEWID length (36 characters), and we probably want to build 4000 characters sometimes.
    • Theoretically, we can always join chains in order to get the length we need. This will fix the problem of "limited maximum length", but at the same time will create a problem of Inflexible, as we will need to build the query according to the maximum length.
  • D: Limit of characters that can get in results!
    • We can get only characters that might be in NEWID value. We can't get any other character! The only characters that we can get are: uppercase English, numbers, and the sign "-"

2. Using Clean NEWID as base string & NEWID to generate a random length.

Basic Idea

  1. Create random string using the function NEWID, this will give us a random 36 characters string.
  2. Clean the dash character, this will give us a random 32 characters string.
  3. Create a random number using the function NEWID, as the string length.
  4. Cut the string using the function LEFT

Code

DECLARE @StringMaxLen int = 12
SELECT TOP (1000)
    LEFT (REPLACE(CAST (NEWID () AS NVARCHAR(MAX)),'-','') , ABS (CHECKSUM (NEWID ())) % @StringMaxLen + 1)
FROM SYS.OBJECTS A
CROSS JOIN SYS.OBJECTS B
GO

Advantages & Disadvantages

  • advantages same as before.
  • disadvantage same as before basically, except the dash character problem but the max length is now only 32 characters.

3. Performing data manipulation on an existing data

Basic Idea

We can use an existing data, which is not random as a base string. Then we use text manipulation like "data scrambling", "data parsing", "random sorting" and so on, in order to get a "look like random data".

* This idea can be improved significantly scale by using an existing random data table!

Code

;WITH cte_1 AS(
    SELECT ROW_NUMBER() OVER (ORDER BY NEWID() ASC) AS  RN, t.name
    FROM sys.tables AS t
    CROSS JOIN sys.tables  AS tt
),
cte_2 AS(
    SELECT ROW_NUMBER() OVER (ORDER BY NEWID() ASC) AS  RN, t.name
    FROM sys.columns AS t
    CROSS JOIN sys.columns  AS tt
)
SELECT
    cte_1.name + cte_2.name AS RandomString1,
    REPLICATE(cte_1.name + cte_2.name,CASE WHEN  ABS (CHECKSUM (NEWID ())) % 4 = 0 THEN 1 ELSE ABS (CHECKSUM (NEWID ())) % 4 + 1 END) AS  RandomString2
FROM cte_1 
INNER JOIN cte_2 
ON cte_1.RN = cte_2.RN

In the example above we just used the tables name in the system as a base strings for manipulation. This is only an example as this idea (using existing data) can be done in any way and on any tables that we want.

Advantages & Disadvantages

  • D: NOT a Random solution & NOT the same odds!
    • The solution is based on existing data with the more manipulation we do, the more we can make this more "look like" a random data.
  • D: Limited to a maximum length of the existing data!
  • D: Limit of characters that can get in results!
    • We can get only characters that might be in the existing data.
  • D: Slow and inefficient as the number of manipulations on the text exceeds.

4. Using Random CHAR-> Using Loop to build a flexible string length

Basic Idea

We are using a UDF to create a single random string. the function get 2 parameters: (A) the maximum length of the String (B) Do we need to create a string as long as the maximum or randomly length.

Code

/******************************
 * Version("2.0.0.0")
 * FileVersion("2.0.0.0")
 * WrittenBy("Ronen Ariely")
 * WebSite("http://ariely.info/")
 * Blog("http://ariely.info/Blog/tabid/83/language/en-US/Default.aspx")
******************************/
CREATE function  [dbo].[ArielyRandomStringFunc_Ver2.0.0](@NumberOfChar int, @IsFixedLength bit  = true)
returns nvarchar(MAX)
WITH EXECUTE  AS CALLER
AS
begin
    DECLARE @TotalNumberOfCharToReturn int
     
    IF (@IsFixedLength = 1) 
        SET @TotalNumberOfCharToReturn = @NumberOfChar
    ELSE
        -- I am using my own random function
        -- you can read more about the reason here:
        -- Using side-effecting build in functions inside a UDF (your function)
        -- http://ariely.info/Blog/tabid/83/EntryId/121/Using-side-effecting-build-in-functions-inside-a-UDF-your-function.aspx
        SET @TotalNumberOfCharToReturn = CONVERT(int,(AccessoriesDB.dbo.ArielyRandFunc() * @NumberOfChar) +1)
 
    declare @Out as nvarchar(MAX) = ''
 
    declare @QQ01 as int = 0
    while @QQ01 < @TotalNumberOfCharToReturn begin
        set @QQ01 += 1
        -- This is in-secure Function as we chose any unicode character without filtering!
        -- I preferred to choose from secured characters list as I show in preview function (ver1.0.0)
        -- 
        -- 65535: Maximum UNICODE character value
        -- You can limit this value to your own language's values or your needs
        -- Numbers:           48 - 58
        -- English uppercase: 65 - 91
        -- English lowercase: 97 - 123
        -- Hebrew:            1488 - 1515
        select @Out += ISNULL(NCHAR(CAST(65535 * AccessoriesDB.dbo.ArielyRandFunc() AS INT)),'')
    end
    --print @Out
    RETURN @Out
end

Advantages & Disadvantages

  • A: Relatively fast.
  • A: Full Random
  • A: No length limit
  • A: No characters limit
  • D: No filtering option for security

5. Selecting from characters list-> Using Loop to build a flexible string length

Basic Idea

The basic idea is same as above, with the option for filtering characters, as we chose from a list.  We are choosing a random number in order to chose the character in that position on our list. We use a loop to build the entire string.

Code

/******************************
 * Version("1.0.0.0")
 * FileVersion("1.0.0.0")
 * WrittenBy("Ronen Ariely")
 * WebSite("http://ariely.info/")
 * Blog("http://ariely.info/Blog/tabid/83/language/en-US/Default.aspx")
******************************/
CREATE function  [dbo].[ArielyRandomStringFunc_Ver1.0.0](@NumberOfChar int, @IsFixedLength bit  = true)
returns nvarchar(MAX)
WITH EXECUTE  AS CALLER
AS
begin
    DECLARE @TotalNumberOfCharToReturn int
     
    IF (@IsFixedLength = 1) 
        SET @TotalNumberOfCharToReturn = @NumberOfChar
    ELSE
        -- I am using my own random function
        -- you can read more about the reasons here:
        -- Using side-effecting build in functions inside a UDF (your function)
        -- http://ariely.info/Blog/tabid/83/EntryId/121/Using-side-effecting-build-in-functions-inside-a-UDF-your-function.aspx
        SET @TotalNumberOfCharToReturn = CONVERT(int,(AccessoriesDB.dbo.ArielyRandFunc() * @NumberOfChar) +1)
 
    DECLARE @AllChar as nvarchar(MAX) = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890אבגדהוזחטיכלמסעפצקרשת'
    -- it is faster to use a table with ID column and char column,
    -- index by ID column
    -- order the table in random order
    -- get the top @NumberOfChar records
    -- but in this case we will get a select without repeating a value!
    -- if we want "with repeating" then we have to re-chose each character separately
 
    -- We can just use select number and chose char(random number) to get random char
    -- It is faster
    -- But this is in-secure and I preferred to chose from secured characters list here
 
    declare @MyRnd as int
    declare @Out as nvarchar(MAX) = ''
 
    declare @QQ01 as int = 0
    while @QQ01 < @TotalNumberOfCharToReturn begin
       set @QQ01 += 1
       set @MyRnd = (SELECT ((RandNumber * LEN(@AllChar)) +1) FROM ArielyRandView)
       select @Out += SUBSTRING(@AllChar,@MyRnd,1)
       --print SUBSTRING(@AllChar,@MyRnd,1)
    end
    --print @Out
    RETURN @Out
end

Advantages & Disadvantages

  • A: Relatively fast.
  • A: Full Random
  • A: No length limit
  • A: No characters limit
  • A: Filtering option for security

6. Building a fix length random string using NEWID and NCHAR->Cut randomly using LEFT

Basic Idea

We build a String manually by joining a fix number of random characters. 

Code

SELECT top 1000
       LEFT (
             NCHAR(CAST(1000 * RAND( ABS(CHECKSUM(CAST(NEWID() AS  VARCHAR(100)))) ) AS INT))
          + NCHAR(CAST(1000 * RAND( ABS(CHECKSUM(CAST(NEWID() AS  VARCHAR(100)))) ) AS INT))
          + NCHAR(CAST(1000 * RAND( ABS(CHECKSUM(CAST(NEWID() AS  VARCHAR(100)))) ) AS INT))
          + NCHAR(CAST(1000 * RAND( ABS(CHECKSUM(CAST(NEWID() AS  VARCHAR(100)))) ) AS INT))
          + NCHAR(CAST(1000 * RAND( ABS(CHECKSUM(CAST(NEWID() AS  VARCHAR(100)))) ) AS INT))
          + NCHAR(CAST(1000 * RAND( ABS(CHECKSUM(CAST(NEWID() AS  VARCHAR(100)))) ) AS INT))
          + NCHAR(CAST(1000 * RAND( ABS(CHECKSUM(CAST(NEWID() AS  VARCHAR(100)))) ) AS INT))
          + NCHAR(CAST(1000 * RAND( ABS(CHECKSUM(CAST(NEWID() AS  VARCHAR(100)))) ) AS INT))
          + NCHAR(CAST(1000 * RAND( ABS(CHECKSUM(CAST(NEWID() AS  VARCHAR(100)))) ) AS INT))
          + NCHAR(CAST(1000 * RAND( ABS(CHECKSUM(CAST(NEWID() AS  VARCHAR(100)))) ) AS INT))
          + NCHAR(CAST(1000 * RAND( ABS(CHECKSUM(CAST(NEWID() AS  VARCHAR(100)))) ) AS INT))
          + NCHAR(CAST(1000 * RAND( ABS(CHECKSUM(CAST(NEWID() AS  VARCHAR(100)))) ) AS INT))
          + NCHAR(CAST(1000 * RAND( ABS(CHECKSUM(CAST(NEWID() AS  VARCHAR(100)))) ) AS INT))
          + NCHAR(CAST(1000 * RAND( ABS(CHECKSUM(CAST(NEWID() AS  VARCHAR(100)))) ) AS INT))
          + NCHAR(CAST(1000 * RAND( ABS(CHECKSUM(CAST(NEWID() AS  VARCHAR(100)))) ) AS INT))
          + NCHAR(CAST(1000 * RAND( ABS(CHECKSUM(CAST(NEWID() AS  VARCHAR(100)))) ) AS INT))
          + NCHAR(CAST(1000 * RAND( ABS(CHECKSUM(CAST(NEWID() AS  VARCHAR(100)))) ) AS INT))
          + NCHAR(CAST(1000 * RAND( ABS(CHECKSUM(CAST(NEWID() AS  VARCHAR(100)))) ) AS INT))
          + NCHAR(CAST(1000 * RAND( ABS(CHECKSUM(CAST(NEWID() AS  VARCHAR(100)))) ) AS INT))
          + NCHAR(CAST(1000 * RAND( ABS(CHECKSUM(CAST(NEWID() AS  VARCHAR(100)))) ) AS INT))
          + NCHAR(CAST(1000 * RAND( ABS(CHECKSUM(CAST(NEWID() AS  VARCHAR(100)))) ) AS INT))
          + NCHAR(CAST(1000 * RAND( ABS(CHECKSUM(CAST(NEWID() AS  VARCHAR(100)))) ) AS INT))
          + NCHAR(CAST(1000 * RAND( ABS(CHECKSUM(CAST(NEWID() AS  VARCHAR(100)))) ) AS INT))
          + NCHAR(CAST(1000 * RAND( ABS(CHECKSUM(CAST(NEWID() AS  VARCHAR(100)))) ) AS INT))
          + NCHAR(CAST(1000 * RAND( ABS(CHECKSUM(CAST(NEWID() AS  VARCHAR(100)))) ) AS INT))
          + NCHAR(CAST(1000 * RAND( ABS(CHECKSUM(CAST(NEWID() AS  VARCHAR(100)))) ) AS INT))
          + NCHAR(CAST(1000 * RAND( ABS(CHECKSUM(CAST(NEWID() AS  VARCHAR(100)))) ) AS INT))
          + NCHAR(CAST(1000 * RAND( ABS(CHECKSUM(CAST(NEWID() AS  VARCHAR(100)))) ) AS INT))
          + NCHAR(CAST(1000 * RAND( ABS(CHECKSUM(CAST(NEWID() AS  VARCHAR(100)))) ) AS INT))
          + NCHAR(CAST(1000 * RAND( ABS(CHECKSUM(CAST(NEWID() AS  VARCHAR(100)))) ) AS INT))
              
          , 1 + CAST(29 * RAND( ABS(CHECKSUM(CAST(NEWID() AS  VARCHAR(100)))) ) AS INT)
          ) AS  Str
from sys.columns a CROSS JOIN sys.columns b

Advantages & Disadvantages

  • A: Fast execution solution.
  • D: Very hard for coding, if we need long length string.
    * We can build a dynamic query for this solution to get more flexible solution.
  • D: Length limit by the number of char in the query
  • A: No characters limit
  • D: No Filtering option for security

7. Using CLR Function

Basic Idea

The option of using CLR function added in SQL 2005 and yet most DBAs do not use it. DBAs have to internalize the extreme improvement and differences that can be (sometimes) by use CLR! While SQL Server works well with SET of DATA, CLR work much better in manipulating strings (Split, Regular expression...).

Code

http://social.technet.microsoft.com/wiki/contents/articles/21219.sql-server-create-random-string-using-clr.aspx
* Links to other versions can be seen on the resources

Advantages & Disadvantages

  • A: VERY FAST.
  • A: Extremely flexible.
    • No Length limit
    • No characters limit
  • A: Filtering option for security
  • D: needed to enable the use of CLR (hopefully you have done it already!)

Conclusions and Recommendations

  • Without a doubt, CLR function is the best solution! If you can use it, then chose it. Tests have shown that this function can produce in less than 2 seconds what other functions have not been able to produce in more than 20 minutes (The execution was terminated after 20 minutes). This solution meat with any requirement.
  • It is highly recommended not to use solution without filtering mechanism! several UNICODE characters might be harmful in some situation. You can get more information about the problematic CHAR ZERO For example in this link [Hebrew].
  • If you need (A) fast query (B) flexible & unlimited length or (C) Filtering mechanism to chose the characters that can be use, then use solution 5 or change a bit solution 4 to add filtering.
  • If you need (A) fast query and (B)short max length string and (C) you have to use all the characters range, then you can use solution 6.

Resources

This article is based on this forum discussion:
http://social.technet.microsoft.com/Forums/he-IL/6759c653-a716-4602-b184-09f38de0e177/-nvarchar-?forum=sqlhe

Create Random String Using CLR [DOT.NET 4.0+, using LINQ]
http://social.technet.microsoft.com/wiki/contents/articles/21219.sql-server-create-random-string-using-clr.aspx

Create Random String Using CLR, Including (1)version using LINQ, (2)Old version for DOT.NET 2.0+
http://ariely.info/Blog/tabid/83/EntryId/134/SQL-Random-String-using-CLR.aspx 

.

Contributed to the original discussion:

Ivan Radchenko, Tomer Shtrum, Ronen Ariely, Guy GlantserYossi Hakikat

.

======      

======