Regular Expression Syntax (Scripting)

A regular expression describes one or more strings to match when you search a body of text. The expression serves as a template for matching a character pattern to the string that is being searched.

A regular expression consists of ordinary characters (for example, letters a through z) and special characters, known as metacharacters.

Special Characters

The following table contains a list of single-character metacharacters and their behavior in regular expressions.

Note

To match one of these special characters, you must first escape the character, that is, precede it with a backslash character (\). For instance, to search for the "+" literal character, you can use the expression "\+".
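As a minimal sketch of the escaping rule in this note, the following JavaScript escapes metacharacters so that text is matched literally. It assumes a current JavaScript engine rather than a JScript host, and the helper name escapeForRegExp is hypothetical, not part of any library.

```javascript
// Hypothetical helper: put a backslash in front of every single-character
// metacharacter so the text is treated as literal characters.
function escapeForRegExp(text) {
  return text.replace(/[\\^$.*+?()[\]{}|\/]/g, "\\$&");
}

// "\+" matches the "+" character itself instead of acting as a quantifier.
var literalPlus = /\+/;
var foundPlus = literalPlus.test("1+1");          // true

// Build a pattern from arbitrary text after escaping it.
var escaped = escapeForRegExp("1+1=2");           // "1\+1=2"
var pattern = new RegExp(escaped);
var matchesLiteral = pattern.test("1+1=2");       // true
var matchesOther = pattern.test("11=2");          // false

// Without escaping, "+" is a quantifier, so "1+1=2" also matches "11=2".
var unescapedMatch = new RegExp("1+1=2").test("11=2");   // true
```

The last line shows why the escape matters: the unescaped pattern reads "+" as "one or more of the preceding character" and so accepts a string that contains no plus sign at all.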

Metacharacter

Behavior

Example

*

Matches the previous character or subexpression zero or more times.

Equivalent to {0,}.

zo* matches "z" and "zoo".

+

Matches the previous character or subexpression one or more times.

Equivalent to {1,}.

zo+ matches "zo" and "zoo", but not "z".

?

Matches the previous character or subexpression zero or one time.

Equivalent to {0,1}.

When ? immediately follows any other quantifier (*, +, ?, {n}, {n,}, or {n,m}), the matching pattern is non-greedy. A non-greedy pattern matches as little of the searched string as possible. The default greedy pattern matches as much of the searched string as possible.

zo? matches "z" and "zo", but not "zoo".

o+? matches a single "o" in "oooo", and o+ matches all "o"s.

do(es)? matches the "do" in "do" or "does".

^

Matches the position at the start of the searched string. If the Multiline property is set, ^ also matches the position following \n or \r.

When used as the first character in a bracket expression, ^ negates the character set.

^\d{3} matches 3 numeric digits at the start of the searched string.

[^abc] matches any character except a, b, and c.

$

Matches the position at the end of the searched string. If the Multiline property is set, $ also matches the position before \n or \r.

\d{3}$ matches 3 numeric digits at the end of the searched string.

.

Matches any single character except the newline character \n. To match any character, including \n, use a pattern like [\s\S].

a.c matches "abc", "a1c", and "a-c".

[]

Marks the start and end of a bracket expression.

[1-4] matches "1", "2", "3", or "4". [^aAeEiIoOuU] matches any non-vowel character.

{}

Marks the start and end of a quantifier expression.

a{2,3} matches "aa" and "aaa".

()

Marks the start and end of a subexpression. Subexpressions can be saved for later use.

A(\d) matches "A0" to "A9". The digit is saved for later use.

|

Indicates a choice between two or more items.

z|food matches "z" or "food". (z|f)ood matches "zood" or "food".

/

Denotes the start or end of a literal regular expression pattern in JScript. After the second "/", single-character flags can be added to specify search behavior.

/abc/gi is a JScript literal regular expression that matches "abc". The g (global) flag specifies to find all occurrences of the pattern, and the i (ignore case) flag makes the search case-insensitive.

\

Marks the next character as a special character, a literal, a backreference, or an octal escape.

\n matches a newline character. \( matches "(". \\ matches "\".

Most special characters lose their meaning and represent ordinary characters when they occur inside a bracket expression. For more information, see "Characters in Bracket Expressions" in Lists of Matching Characters (Scripting).
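The following sketch illustrates three behaviors from the table above: greedy and non-greedy quantifiers, the effect of the multiline flag on ^, and the fact that . does not match \n. It assumes a current JavaScript engine, where the m flag corresponds to the Multiline property. The variable names are illustrative only.

```javascript
// Greedy versus non-greedy: o+ takes as many "o"s as it can, o+? as few.
var greedy = "oooo".match(/o+/)[0];      // "oooo"
var nonGreedy = "oooo".match(/o+?/)[0];  // "o"

// ^ matches only at the start of the string unless the m flag is set.
var text = "first line\nsecond line";
var startOnly = text.match(/^\w+/g);     // ["first"]
var eachLine = text.match(/^\w+/gm);     // ["first", "second"]

// . does not match a newline, but [\s\S] matches any character.
var dotAcrossNewline = /a.c/.test("a\nc");        // false
var anyAcrossNewline = /a[\s\S]c/.test("a\nc");   // true

// ? as a quantifier: the "es" subexpression is optional.
var doOrDoes = /^do(es)?$/.test("do") && /^do(es)?$/.test("does");  // true
```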

Metacharacters

The following table contains a list of multiple-character metacharacters and their behavior in regular expressions.

Metacharacter

Behavior

Example

\b

Matches a word boundary, that is, the position between a word character (\w) and a non-word character (\W) or the start or end of the searched string.

er\b matches the "er" in "never" but not the "er" in "verb".

\B

Matches a word non-boundary.

er\B matches the "er" in "verb" but not the "er" in "never".

\d

Matches a digit character.

Equivalent to [0-9].

In the searched string "12 345", \d{2} matches "12" and "34". \d matches "1", "2", "3", "4", and "5".

\D

Matches a nondigit character.

Equivalent to [^0-9].

\D+ matches "abc" and " def" in "abc123 def".

\w

Matches any of the following characters: A-Z, a-z, 0-9, and underscore.

Equivalent to [A-Za-z0-9_].

In the searched string "The quick brown fox…", \w+ matches "The", "quick", "brown", and "fox".

\W

Matches any character except A-Z, a-z, 0-9, and underscore.

Equivalent to [^A-Za-z0-9_].

In the searched string "The quick brown fox…", \W+ matches "…" and all of the spaces.

[xyz]

A character set. Matches any one of the specified characters.

[abc] matches the "a" in "plain".

[^xyz]

A negative character set. Matches any character that is not specified.

[^abc] matches the "p" in "plain".

[a-z]

A range of characters. Matches any character in the specified range.

[a-z] matches any lowercase alphabetical character in the range "a" through "z".

[^a-z]

A negative range of characters. Matches any character not in the specified range.

[^a-z] matches any character not in the range "a" through "z".

{n}

Matches exactly n times. n is a nonnegative integer.

o{2} does not match the "o" in "Bob", but does match the two "o"s in "food".

{n,}

Matches at least n times. n is a nonnegative integer.

* is equivalent to {0,}.

+ is equivalent to {1,}.

o{2,} does not match the "o" in "Bob" but does match all the "o"s in "foooood".

{n,m}

Matches at least n and at most m times. n and m are nonnegative integers, where n <= m. There cannot be a space between the comma and the numbers.

? is equivalent to {0,1}.

In the searched string "1234567", \d{1,3} matches "123", "456", and "7".

(pattern)

Matches pattern and saves the match. You can retrieve the saved match from the SubMatches collection in Visual Basic Scripting Edition (VBScript) or from array elements returned by the exec Method in JScript. To match parentheses characters ( ), use "\(" or "\)".

(Chapter|Section) [1-9] matches "Chapter 5", and "Chapter" is saved for later use.

(?:pattern)

Matches pattern but does not save the match, that is, the match is not stored for possible later use. This is useful for combining parts of a pattern with the "or" character (|).

industr(?:y|ies) is equivalent to industry|industries.

(?=pattern)

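Positive lookahead. Matches at a position where the text that follows matches pattern, without consuming that text: after the lookahead succeeds, matching continues from the position before the text that pattern matched. The match is not saved for later use.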

^(?=.*\d).{4,8}$ applies a restriction that a password must be 4 to 8 characters long, and must contain at least one digit.

Within the pattern, .*\d finds any number of characters followed by a digit. For the searched string "abc3qr", this matches "abc3".

Starting before instead of after that match, .{4,8} matches a 4-8 character string. This matches "abc3qr".

The ^ and $ specify the positions at the start and end of the searched string. This is to prevent a match if the searched string contains any characters outside of the matched characters.

(?!pattern)

Negative lookahead. Matches at a position where the text that follows does not match pattern. No characters are consumed: matching continues from the position where the lookahead was tested. The match is not saved for later use.

\b(?!th)\w+\b matches words that do not start with "th".

Within the pattern, \b matches a word boundary. For the searched string " quick ", this matches the position between the first space and "q". (?!th) then checks that the text at that position is not "th". Here the text is "qu", so the check succeeds without consuming any characters.

Starting from that same position, \w+ matches a word. This matches "quick".

\cx

Matches the control character indicated by x. The value of x must be in the range of A-Z or a-z. If it is not, c is assumed to be a literal "c" character.

\cM matches a CTRL+M or carriage return character.

\xn

Matches n, where n is a hexadecimal escape value. Hexadecimal escape values must be exactly two digits long. Allows ASCII codes to be used in regular expressions.

\x41 matches "A". \x041 is equivalent to "\x04" followed by "1" (because n must be exactly 2 digits).

\num

Matches num, where num is a positive integer. This is a reference to saved matches.

(.)\1 matches two consecutive identical characters.

\n

Identifies either an octal escape value or a backreference. If \n is preceded by at least n captured subexpressions, n is a backreference. Otherwise, n is an octal escape value if n is an octal digit (0-7).

(\d)\1 matches two consecutive identical digits.

\nm

Identifies either an octal escape value or a backreference. If \nm is preceded by at least nm captured subexpressions, nm is a backreference. If \nm is preceded by at least n captured subexpressions, n is a backreference followed by literal m. If neither of those conditions is met, \nm matches octal escape value nm when n and m are octal digits (0-7).

\11 matches a tab character.

\nml

Matches octal escape value nml when n is an octal digit (0-3) and m and l are octal digits (0-7).

\011 matches a tab character.

\un

Matches n, where n is a Unicode character expressed as four hexadecimal digits.

\u00A9 matches the copyright symbol (©).
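Several examples from the table above can be checked with a short script. This is a sketch that assumes a current JavaScript engine; it uses the exec method mentioned in the (pattern) row to read saved matches, and the variable names and sample strings are illustrative only.

```javascript
// Positive lookahead: 4 to 8 characters with at least one digit somewhere.
var passwordRule = /^(?=.*\d).{4,8}$/;
var okPassword = passwordRule.test("abc3qr");   // true
var noDigit = passwordRule.test("abcdef");      // false: no digit
var tooShort = passwordRule.test("a1");         // false: fewer than 4 characters

// Negative lookahead: words that do not start with "th".
var nonThWords = "the quick brown this fox".match(/\b(?!th)\w+\b/g);
// ["quick", "brown", "fox"]

// Capturing group: exec returns the whole match, then each saved submatch.
var chapter = /(Chapter|Section) [1-9]/.exec("See Chapter 5 first");
// chapter[0] is "Chapter 5", chapter[1] is "Chapter"

// Non-capturing group: groups the alternatives but saves nothing.
var industry = /industr(?:y|ies)/.exec("two industries");
// industry[0] is "industries", and there is no industry[1]

// Backreference: \1 must repeat whatever the first group captured.
var doubled = /(.)\1/.exec("food");
// doubled[0] is "oo", doubled[1] is "o"

// Bounded quantifier: greedy runs of 1 to 3 digits.
var digitRuns = "1234567".match(/\d{1,3}/g);
// ["123", "456", "7"]
```

In VBScript the saved matches would be read from the SubMatches collection instead of array elements, as the (pattern) row notes.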

Nonprinting Characters

The following table contains escape sequences that represent non-printing characters.

Character    Matches                                                                Equivalent to

\f           Form-feed character.                                                   \x0c and \cL
\n           Newline character.                                                     \x0a and \cJ
\r           Carriage-return character.                                             \x0d and \cM
\s           Any white-space character. This includes space, tab, and form feed.    [ \f\n\r\t\v]
\S           Any non–white space character.                                         [^ \f\n\r\t\v]
\t           Tab character.                                                         \x09 and \cI
\v           Vertical tab character.                                                \x0b and \cK
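The equivalences in this table can be spot-checked as follows. The sketch assumes a current JavaScript engine. In such engines \s also matches some additional Unicode white-space characters (for example, the no-break space), so the bracket expression listed for \s may not be an exact equivalent there. The sample string is illustrative only.

```javascript
// Three spellings of the tab character from the table.
var tabPatterns = [/\t/, /\x09/, /\cI/];
var allMatchTab = tabPatterns.every(function (re) {
  return re.test("a\tb");
});                                               // true

// \s and the explicit bracket expression split this sample the same way.
var sample = "one\ttwo\r\nthree four";
var bySpaceClass = sample.split(/\s+/);           // ["one", "two", "three", "four"]
var byBracket = sample.split(/[ \f\n\r\t\v]+/);   // ["one", "two", "three", "four"]

// \S is the complement: it matches runs of non-white-space characters.
var tokens = sample.match(/\S+/g);                // ["one", "two", "three", "four"]
```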

Order of Precedence

A regular expression is evaluated much like an arithmetic expression; that is, it is evaluated from left to right and follows an order of precedence.

The following table contains the order of precedence of the regular expression operators, from highest to lowest.

Operator or operators         Description

\                             Escape
(), (?:), (?=), []            Parentheses and brackets
*, +, ?, {n}, {n,}, {n,m}     Quantifiers
^, $, \anymetacharacter       Anchors and sequences
|                             Alternation

Characters have higher precedence than the alternation operator, which, for example, allows "m|food" to match "m" or "food".
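A short sketch of how precedence changes what a pattern matches, assuming a current JavaScript engine. The ^ and $ anchors and the (?:) group are added here only so that each test covers the entire string. The variable names are illustrative.

```javascript
// Alternation has the lowest precedence: m|food means "m" or "food".
var either = /^(?:m|food)$/;
var matchesM = either.test("m");          // true
var matchesFood = either.test("food");    // true
var matchesMood = either.test("mood");    // false

// Parentheses limit the alternation to the first letter only.
var grouped = /^(m|f)ood$/;
var groupedMood = grouped.test("mood");   // true
var groupedM = grouped.test("m");         // false

// Quantifiers bind more tightly than the characters around them.
var lastCharOnly = "ababab".match(/ab+/)[0];   // "ab": + applies to "b" only
var wholeGroup = "ababab".match(/(ab)+/)[0];   // "ababab": + applies to "ab"
```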

See Also

Concepts

Creating a Regular Expression (Scripting)

Change History

Date            History            Reason

August 2009     Added examples.    Information enhancement.