UNISTR (Transact-SQL)

Applies to: Azure SQL Database

UNISTR provides support for Unicode string literals by letting you specify the Unicode encoding value of characters in the string. UNISTR returns the Unicode characters corresponding to the input expression, as defined by the Unicode standard.

The escape sequence for a Unicode character can be specified in the form of \xxxx or \+xxxxxx, where xxxx is a valid UTF-16 codepoint value, and xxxxxx is a valid Unicode codepoint value. You can look up Unicode codepoint values in the Unicode Code Charts.

Compared to functions like NCHAR, UNISTR provides a more flexible and comprehensive way to handle Unicode characters. For example, while NCHAR can convert a single Unicode value to a character, UNISTR handles multiple Unicode values and escape sequences, making it easier to work with complex strings that include various Unicode characters.

Transact-SQL syntax conventions

Syntax

UNISTR ( 'character_expression' [ , 'unicode_escape_character' ] )

Remarks

Key benefits of UNISTR include:

  • Support for Unicode escape sequences: UNISTR allows you to specify Unicode characters using escape sequences

  • Flexibility with input types: UNISTR supports various character types such as char, nchar, varchar, and nvarchar. For char and varchar data types, the collation should be a valid UTF-8 collation.

  • Custom escape characters: You can define a custom escape character to perform the necessary conversion of Unicode values into a string character set.

Code page support

The UNISTR function isn't compatible with legacy code pages, meaning it doesn't support collations that use non-Unicode character sets. These collations with legacy code pages can be identified using the following query.

SELECT DISTINCT p.language,
                p.codepage
FROM sys.fn_helpcollations() AS c
CROSS APPLY (VALUES (LEFT(c.name, CHARINDEX('_', c.name) - 1),
    COLLATIONPROPERTY(c.name, 'codepage'))) AS p(language, codepage)
WHERE p.codepage NOT IN (
    0 /* Unicode Only collation */,
    65001 /* UTF-8 code page */
);

For more information, see Appendix G DBCS/Unicode Mapping Tables and Appendix H Code Pages.

Arguments

'character_expression'

An expression of any character type, such as char, nchar, varchar, or nvarchar. For char and varchar data types, the collation should be a valid UTF-8 collation. You can specify either string literals or Unicode or UTF-16 code point values or both. character_expression supports a length as large as varchar(max) and nvarchar(max).

N'unicode_escape_character'

A single character representing a user-defined Unicode escape sequence. If not supplied, the default value is \.

Return types

A string value whose length and type depend on the input types.

Examples

A. Use UNISTR vs the NCHAR function

The following examples all use the UNISTR functions to perform the necessary conversion of the Unicode values into string character set, to display the unicode character Smiling Face With Open Mouth. The database collation must be a UTF-8 collation if the input is of char or varchar data types.

Using UNISTR and NCHAR:

SELECT N'Hello! ' + NCHAR(0xd83d) + NCHAR(0xde00);

This sample example can also be written:

SELECT UNISTR(N'Hello! \D83D\DE00');
SELECT UNISTR(N'Hello! \+01F603');

Here's the result set.

-----------
Hello! 😃

B. Use UNISTR function with user defined escape character

The following example uses the UNISTR function with a custom escape character to perform the necessary conversion of the Unicode into a string character set.

SELECT UNISTR(N'ABC#00C0#0181#0187', '#');

Here's the result set.

-----------
ABCÀƁƇ

C. Use UNISTR function by combining string literals and Unicode code points

In the following example, UNISTR is used with a user-defined escape character ($) and a varchar data type with UTF-8 collation. It combines string literals with a Unicode codepoint value:

SELECT UNISTR('I $2665 Azure SQL.' COLLATE Latin1_General_100_CI_AS_KS_SC_UTF8, '$');

Here's the result set.

------------------
I ♥ Azure SQL.

D. Use UNISTR function for characters beyond the UTF-8 limit

If you need to use a character set beyond UTF-8, you must convert the character sequence to UTF-8 using the COLLATE clause. Here's an example:

SELECT UNISTR('\306F\3044' COLLATE Latin1_General_100_BIN2_UTF8) AS Yes_in_Japanese_Hiragana;

Here's the result set.

Yes_in_Japanese_Hiragana
------------------------
はい