Alternation and Subexpressions

Alternation in a regular expression enables you to group choices between two or more alternatives. You can essentially specify "this OR that" in a pattern.

Subexpressions enable you to match a pattern in searched text and divide the match into separate submatches. The resulting submatches can be retrieved by the program. Subexpressions also enable you to reformat text, as described in Backreferences in JScript.

For more information about regular expressions, see Creating a Regular Expression and Regular Expression Syntax.

Alternation

You can use the pipe (|) character to specify a choice between two or more alternatives. This is known as alternation. The largest possible expression on either side of the pipe character is matched. You might think that the following JScript expression matches either "Chapter" or "Section" followed by one or two digits.

/Chapter|Section [1-9][0-9]{0,1}/

Instead, the regular expression matches either the word "Chapter" by itself, or the word "Section" followed by a space and one or two digits. The digit pattern belongs only to the second alternative. If the searched string is "Section 22", the expression matches "Section 22". However, if the searched string is "Chapter 22", the expression matches the word "Chapter" instead of matching "Chapter 22".
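The behavior can be checked directly. The following is a minimal sketch, not part of the original sample set; the variable names chapterMatch and sectionMatch are illustrative only.

```javascript
// Without parentheses, the pipe splits the whole pattern into two alternatives:
//   "Chapter"   OR   "Section [1-9][0-9]{0,1}"
var re = /Chapter|Section [1-9][0-9]{0,1}/;

// exec returns an array whose element 0 is the matched text.
// The regular expression has no g flag, so each call starts at the beginning.
var chapterMatch = re.exec("Chapter 22")[0];   // "Chapter"
var sectionMatch = re.exec("Section 22")[0];   // "Section 22"
```

Only the second alternative includes the digits, which is why the first call stops after the word "Chapter".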

Alternation with Parentheses

You can use parentheses to limit the scope of the alternation, that is, to make sure that it applies only to the two words "Chapter" and "Section". By adding parentheses, you can make the regular expression match either "Chapter 1" or "Section 3".

Parentheses, however, are also used to create a subexpression. The resulting submatch can be retrieved by the program.

The following JScript regular expression uses parentheses to group "Chapter" and "Section". Possible matches will then include "Chapter" followed by a number.

/(Chapter|Section) [1-9][0-9]{0,1}/

The parentheses around Chapter|Section also cause either of the two matching words to be saved for future use.

The following example shows how the matches and submatches can be retrieved in code. Because there is only one set of parentheses in the expression, there is only one saved submatch.

var re = /(Chapter|Section) [1-9][0-9]{0,1}/g;
var src = "Chapter 50  Section 85";
ShowMatches(src, re);

// Output:
//  Chapter 50
//  submatch 1: Chapter

//  Section 85
//  submatch 1: Section

// Perform a search on a string by using a regular expression,
// and display the matches and submatches.
function ShowMatches(src, re)
{
    var result;

    // Get the first match.
    result = re.exec(src);
    
    while (result != null)
    {
        // Show the entire match.
        print();
        print(result[0]);

        // Show the submatches.
        for (var index = 1; index < result.length; index++)
        {
            print("submatch " + index + ": " + result[index]);
        }

        // Get the next match.
        result = re.exec(src);
    }
}

Alternation Without a Saved Submatch

Suppose that you want to use the parentheses in the previous example only to group the choice between the words "Chapter" and "Section", and that you do not need the saved submatch.

To prevent the submatch from being saved for later use, you can specify the subexpression (?:pattern). The following example does the same thing as the previous example, but it does not save the submatch.

var re = /(?:Chapter|Section) [1-9][0-9]{0,1}/g;
var src = "Chapter 50  Section 85";
ShowMatches(src, re);
// Output:
//  Chapter 50
//  Section 85
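One way to see the difference is to compare the arrays that exec returns for the two forms. This is a small sketch under the assumption that only the first match is of interest; the names capturing and nonCapturing are illustrative.

```javascript
var src = "Chapter 50";

// Capturing group: element 0 is the whole match, element 1 is the submatch.
var capturing = /(Chapter|Section) [1-9][0-9]{0,1}/.exec(src);

// Non-capturing group: the same text is matched, but no submatch is stored.
var nonCapturing = /(?:Chapter|Section) [1-9][0-9]{0,1}/.exec(src);

var capturingLength = capturing.length;         // 2
var nonCapturingLength = nonCapturing.length;   // 1
```

Both calls match "Chapter 50". Only the length of the result array changes.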

Subexpressions

Placing parentheses in a regular expression creates a subexpression. The resulting submatch can be retrieved by the program.

In the following example, the regular expression contains three subexpressions. The submatch strings display together with each match.

var re = /(\w+)@(\w+)\.(\w+)/g;
var src = "Please send mail to george@contoso.com and someone@example.com. Thanks!";
ShowMatches(src, re);
// The ShowMatches function is provided earlier.

// Output:
//  george@contoso.com
//  submatch 1: george
//  submatch 2: contoso
//  submatch 3: com

//  someone@example.com
//  submatch 1: someone
//  submatch 2: example
//  submatch 3: com

The following example separates a Uniform Resource Identifier (URI) into its component parts.

The first parenthetical subexpression saves the protocol part of the Web address. It matches any word that comes before a colon and two forward slashes. The second parenthetical subexpression saves the domain address part of the address. It matches any sequence of characters that does not include slash mark (/) or colon (:) characters. The third parenthetical subexpression saves a Web site port number, if one is specified. It matches zero or more digits following a colon. The fourth parenthetical subexpression saves the path and/or page information specified by the Web address. It matches zero or more characters other than the number sign character (#) or the space character.

var re = /(\w+):\/\/([^\/:]+)(:\d*)?([^# ]*)/gi;
var src = "https://msdn.microsoft.com:80/scripting/default.htm";
ShowMatches(src, re);

// Output:
//  https://msdn.microsoft.com:80/scripting/default.htm
//  submatch 1: https
//  submatch 2: msdn.microsoft.com
//  submatch 3: :80
//  submatch 4: /scripting/default.htm
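Because the third subexpression is optional, its submatch is empty when the address has no port. The following sketch illustrates this with a hypothetical address (www.example.com is not from the original sample); the variable names are illustrative.

```javascript
// Same pattern as above, without the g flag because only one match is needed.
var re = /(\w+):\/\/([^\/:]+)(:\d*)?([^# ]*)/i;

// Hypothetical address with no port number.
var parts = re.exec("http://www.example.com/docs/index.htm");

var protocol = parts[1];   // "http"
var host = parts[2];       // "www.example.com"
var port = parts[3];       // undefined, because (:\d*)? did not participate
var path = parts[4];       // "/docs/index.htm"
```

Code that reads the port should therefore check for undefined before it uses the value.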

Positive and Negative Lookaheads

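A positive lookahead checks that a pattern matches at the current position without including that text in the match. After the lookahead succeeds, matching continues from the position where the lookahead started, not from the end of the text that it examined. The text matched by the lookahead is not saved for later use. To specify a positive lookahead, use the syntax (?=pattern).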

In the following example, a search is performed to determine whether a password is 4 to 8 characters long and contains at least one digit.

In the regular expression, .*\d finds any number of characters followed by a digit. For the searched string "abc3qr", this matches "abc3". Starting before instead of after that match, .{4,8} matches a 4 to 8 character string. This matches "abc3qr".

The ^ and $ specify the positions at the start and end of the searched string. This is to prevent a match if the searched string contains any characters outside of the matched characters.

var re = /^(?=.*\d).{4,8}$/gi;
var src = "abc3qr";
ShowMatches(src, re);
// The ShowMatches function is provided earlier.
// Output:
//  abc3qr
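The same pattern can be tried against strings that break each rule. This is a minimal sketch; the helper name isValidPassword and the sample strings are assumptions made for illustration. The g flag is omitted so that repeated calls to test always start at the beginning of the string.

```javascript
var passwordPattern = /^(?=.*\d).{4,8}$/;

// Hypothetical helper: returns true if the candidate is 4 to 8 characters
// long and contains at least one digit.
function isValidPassword(candidate)
{
    return passwordPattern.test(candidate);
}

var valid = isValidPassword("abc3qr");        // true
var noDigit = isValidPassword("abcdef");      // false: the lookahead fails
var tooShort = isValidPassword("ab3");        // false: fewer than 4 characters
var tooLong = isValidPassword("abcdefgh9");   // false: more than 8 characters
```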

A negative lookahead checks that a pattern does not match at the current position. If the pattern is not found there, the lookahead succeeds and matching continues from the position where the lookahead started. No text is included in the match by the lookahead, and nothing is saved for later use. To specify a negative lookahead, use the syntax (?!pattern).

The following example matches words that do not start with "th".

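In the regular expression, \b matches a word boundary. For the searched string " quick ", this matches the position between the first space and the letter "q". (?!th) succeeds only if the text at that position is not "th". Here the next two characters are "qu", so the lookahead succeeds. Because the lookahead does not consume any characters, \w+ then matches a word starting at that same position. This matches "quick".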

var re = /\b(?!th)\w+\b/gi;
var src = "The quick brown fox jumps over the lazy dog.";
ShowMatches(src, re);
// Output:
//  quick
//  brown
//  fox
//  jumps
//  over
//  lazy
//  dog
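The i flag matters in this example, because it also applies to the text inside the lookahead. The following sketch compares the two cases by using the match method of the string; the variable names are illustrative.

```javascript
var src = "The quick brown fox jumps over the lazy dog.";

// With the i flag, (?!th) also rejects "Th", so "The" is skipped.
var ignoreCase = src.match(/\b(?!th)\w+\b/gi);

// Without the i flag, "Th" is not the same as "th", so "The" is matched.
// The lowercase word "the" is still skipped.
var caseSensitive = src.match(/\b(?!th)\w+\b/g);

var ignoreCaseWords = ignoreCase.join(" ");
// "quick brown fox jumps over lazy dog"
var caseSensitiveWords = caseSensitive.join(" ");
// "The quick brown fox jumps over lazy dog"
```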

See Also

Concepts

Backreferences in JScript

Other Resources

Introduction to Regular Expressions