REGEXP_SUBSTR (Transact-SQL)

Applies to: Azure SQL Database, SQL database in Microsoft Fabric

Note

As a preview feature, the technology presented in this article is subject to Supplemental Terms of Use for Microsoft Azure Previews.

Returns one occurrence of a substring of a string that matches the regular expression pattern. If no match is found, it returns NULL.

REGEXP_SUBSTR 
     (
      string_expression,
      pattern_expression [, start [, occurrence [, flags [, group ] ] ] ] 
     )
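As a minimal sketch of the syntax above, the following query passes only the two required arguments. The input strings are hypothetical literals chosen for illustration, and the comments describe what the documented behavior implies rather than captured output.

```sql
-- Only string_expression and pattern_expression are supplied;
-- start, occurrence, flags, and group take their defaults.
SELECT
    REGEXP_SUBSTR('Order 4521 shipped', '[0-9]+') AS order_number, -- first run of digits
    REGEXP_SUBSTR('Order shipped', '[0-9]+')      AS no_match;     -- no digits, so NULL is expected
```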

Arguments

string_expression

An expression of a character string.

Can be a constant, variable, or column of character string.

Data types: char, nchar, varchar, or nvarchar.

pattern_expression

Regular expression pattern to match. Usually a text literal.

Data types: char, nchar, varchar, or nvarchar. pattern_expression supports a maximum character length of 8,000 bytes. 

start

Specifies the starting position for the search within string_expression. Optional. Type is int or bigint.

Numbering is 1-based: the first character of string_expression is position 1, and start must be greater than or equal to 1. If start is less than 1, the function returns an error. If start is greater than the length of string_expression, the function returns NULL. The default is 1.
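The effect of start can be sketched with a hypothetical literal in which the second run of digits begins at position 10. The comments state what the rules above imply, not recorded results.

```sql
-- 'abc123def456': positions 4-6 hold '123', positions 10-12 hold '456'.
SELECT
    REGEXP_SUBSTR('abc123def456', '[0-9]+')     AS from_default,  -- start defaults to 1; first match is '123'
    REGEXP_SUBSTR('abc123def456', '[0-9]+', 7)  AS from_seven,    -- search begins at 'd'; first match is '456'
    REGEXP_SUBSTR('abc123def456', '[0-9]+', 20) AS past_the_end;  -- start exceeds the length, so NULL
```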

occurrence

An expression (positive integer) that specifies which occurrence of the pattern expression within the source string to return. The default is 1, which returns the first occurrence. The search for the first occurrence begins at the first character of string_expression. For a positive integer n, the function returns the nth occurrence: the search for the second occurrence begins with the first character following the first occurrence of pattern_expression, and so forth.
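A short sketch of occurrence, using a hypothetical three-word literal. The last column asks for an occurrence that doesn't exist, which is assumed to fall under the "no match is found" rule and return NULL.

```sql
-- Each call searches from position 1 and selects a different occurrence.
SELECT
    REGEXP_SUBSTR('one two three', '[a-z]+', 1, 1) AS first_word,   -- 'one'
    REGEXP_SUBSTR('one two three', '[a-z]+', 1, 2) AS second_word,  -- 'two'
    REGEXP_SUBSTR('one two three', '[a-z]+', 1, 4) AS fourth_word;  -- only three words exist; NULL is expected
```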

flags

One or more characters that specify the modifiers used for searching for matches. Type is varchar or char, with a maximum of 30 characters.

For example, ims. The default is c. If an empty string (' ') is provided, it's treated as the default value ('c'). Supply c or any other supported flag characters. If flags contains multiple contradictory characters, SQL Server uses the last character.

For example, if you specify ic, the regex performs case-sensitive matching.

If the value contains a character other than those listed in Supported flag values, the query returns an error like the following example:

Invalid flag provided. '<invalid character>' are not valid flags. Only {c,i,s,m} flags are valid.

Supported flag values

Flag    Description
i       Case-insensitive (default false)
m       Multi-line mode: ^ and $ match begin/end line in addition to begin/end text (default false)
s       Let . match \n (default false)
c       Case-sensitive (default true)
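The flag rules can be sketched with a hypothetical literal and a lowercase pattern. The comments follow from the descriptions above (c is the default, and the last contradictory flag wins); they aren't captured output.

```sql
-- start and occurrence must be supplied positionally before flags.
SELECT
    REGEXP_SUBSTR('Hello World', 'world')             AS default_flag, -- default 'c' is case-sensitive; NULL is expected
    REGEXP_SUBSTR('Hello World', 'world', 1, 1, 'i')  AS flag_i,       -- case-insensitive; matches 'World'
    REGEXP_SUBSTR('Hello World', 'world', 1, 1, 'ic') AS flag_ic;      -- 'c' comes last, so matching is case-sensitive
```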

group

Specifies which capture group (subexpression) of pattern_expression determines the substring of string_expression to return. A capture group (subexpression) is a fragment of the pattern enclosed in parentheses, and capture groups can be nested. Capture groups are numbered in the order in which their left parentheses appear. The data type of group is an integer. The value must be greater than or equal to 0 and must not be greater than the number of capture groups (subexpressions) in pattern_expression. The default group value is 0, which indicates that the returned substring is the string that matches the entire pattern.
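To illustrate group numbering, the following sketch applies one pattern with two capture groups to a hypothetical address held in a variable (@email is an illustrative name, not part of the function).

```sql
-- Hypothetical input value for illustration.
DECLARE @email varchar(100) = 'user@example.com';

-- Group 0 is the whole match; groups 1 and 2 follow the order of the left parentheses.
SELECT
    REGEXP_SUBSTR(@email, '([^@]+)@(.+)$', 1, 1, 'c', 0) AS whole_match, -- 'user@example.com'
    REGEXP_SUBSTR(@email, '([^@]+)@(.+)$', 1, 1, 'c', 1) AS local_part,  -- 'user'
    REGEXP_SUBSTR(@email, '([^@]+)@(.+)$', 1, 1, 'c', 2) AS domain_part; -- 'example.com'
```

Because the arguments are positional, start, occurrence, and flags have to be supplied in order to reach group, even when their default values are wanted.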

Return value

String.

Examples

Extract the domain name from an email address.

SELECT REGEXP_SUBSTR (EMAIL, '@(.+)$', 1, 1, 'i', 1) AS DOMAIN FROM CUSTOMERS; 

Find the first word in a sentence that starts with a vowel.

SELECT REGEXP_SUBSTR (COMMENT, '\b[aeiou]\w*', 1, 1, 'i') AS WORD FROM FEEDBACK; 

Get the last four digits of a credit card number.

SELECT REGEXP_SUBSTR (CARD_NUMBER, '\d{4}$') AS LAST_FOUR FROM PAYMENTS;