

Backreferences (Scripting)

 

Backreferences are used to find repeating groups of characters. They are also used to reformat an input string by rearranging the order and placement of the elements in the input string.
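As a small illustration of the reformatting use, the sketch below rearranges a date written as year-month-day into month/day/year. The input string is hypothetical, and it uses console.log instead of document.write so that it runs in a modern JavaScript engine such as Node.js rather than only in a browser host.

```javascript
// Each parenthesized subexpression is numbered from left to right:
// $1 = year, $2 = month, $3 = day.
var re = /(\d{4})-(\d{2})-(\d{2})/g;
var src = "Released 2010-01-15, revised 2010-02-03.";

// The replacement string refers to the submatches in a new order.
var result = src.replace(re, "$2/$3/$1");
console.log(result);
// Output:
//  Released 01/15/2010, revised 02/03/2010.
```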

You can refer to a subexpression from within a regular expression and from within a replacement string. Each subexpression is identified by a number, and a reference to a subexpression by its number is called a backreference.

Parentheses in a regular expression are used to create a subexpression. The resulting submatch can be retrieved by the program. For more information, see Alternation and Subexpressions (Scripting).

Stored Submatches

You can refer to a subexpression from within a regular expression.

In a regular expression, each submatch is saved as it is encountered, from left to right. The buffers in which the submatches are stored are numbered starting at 1, up to a maximum of 99. Within the regular expression, you can access each buffer by using \n, where n is one or two decimal digits identifying a specific buffer.
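The following sketch shows the numbering in a modern JavaScript engine, which counts groups by the position of each opening parenthesis, so an outer group gets a lower number than the group nested inside it. The patterns are illustrative only, and console.log stands in for document.write.

```javascript
// Buffers are numbered by the position of each opening parenthesis,
// reading left to right, even when groups are nested.
var nested = /((\w)\w)(\w)/;
var m = nested.exec("abc");
// m[1] is the outer group, m[2] the inner group, m[3] the last group.
console.log(m[1], m[2], m[3]); // ab a c

// Inside the pattern, \1 and \2 refer to those same buffers.
// This pattern matches four digits that read the same in reverse.
var mirror = /^(\d)(\d)\2\1$/;
console.log(mirror.test("1221")); // true
console.log(mirror.test("1234")); // false
```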

One application of backreferences provides the ability to locate the occurrence of two identical words together in a text. Take the following sentence: Is is the cost of of gasoline going up up?

This sentence contains several duplicated words. It would be useful to fix the sentence without having to search separately for a duplicate of each word. The following JScript regular expression uses a single subexpression to do that.

/\b([a-z]+) \1\b/gi

Following is the equivalent Visual Basic Scripting Edition (VBScript) expression.

"\b([a-z]+) \1\b"

The subexpression in this case is everything enclosed in parentheses: one or more alphabetic characters, as specified by [a-z]+. After the space, the second part of the regular expression, \1, is a backreference to the first saved submatch; it matches a second occurrence of the word that the parenthesized subexpression just matched.
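The i flag in /gi matters here: it lets [a-z] match the capital "I" in "Is", and in JavaScript it also makes \1 compare the repeated word without regard to case. A minimal sketch, assuming a modern JavaScript engine and using console.log in place of document.write:

```javascript
// Same pattern with and without the i (ignore case) flag.
var caseless = /\b([a-z]+) \1\b/i;
var caseful = /\b([a-z]+) \1\b/;

console.log(caseless.exec("the The end")[1]); // the
console.log(caseful.test("the The end"));     // false: "the" differs from "The"
console.log(caseful.test("the the end"));     // true
```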

The \b word boundary metacharacters make sure that only separate words are detected. Otherwise, a phrase such as "is issued" or "this is" would be incorrectly identified by this expression.
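To see what the \b anchors prevent, the sketch below runs the pattern with and without them on the two phrases mentioned above. It assumes a modern JavaScript engine and prints with console.log.

```javascript
// Compare the pattern with and without the \b word-boundary anchors.
var loose = /([a-z]+) \1/i;
var strict = /\b([a-z]+) \1\b/i;

console.log(loose.exec("this is")[0]);   // is is  (the end of "this" plus "is")
console.log(strict.test("this is"));     // false
console.log(loose.exec("is issued")[0]); // is is  ("is" plus the start of "issued")
console.log(strict.test("is issued"));   // false
```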

The following example lists the duplicated words. It shows how matches and submatches can be retrieved in code.

var newLine = "<br />";
var result;
var s = "";

var re = /\b([a-z]+) \1\b/gi;
var src = "Is is the cost of of gasoline going up up?";

// Get the first match.
result = re.exec(src);

while (result != null)
{
    // Show the entire match.
    s += newLine + result[0] + newLine;

    // Show the submatches.
    // You can also obtain the submatches from RegExp.$1,
    // RegExp.$2, and so on.
    for (var index = 1; index < result.length; index++) {
        s += "submatch " + index + ": ";
        s += result[index];
        s += newLine;
    }

    // Get the next match.
    result = re.exec(src);
}
document.write(s);

// Output:
//  Is is
//  submatch 1: Is

//  of of
//  submatch 1: of

//  up up
//  submatch 1: up

Dim re, src, Match, Matches, NewLine, Index, s
NewLine = "<br />"

' Create the regular expression.
Set re = New RegExp
re.Pattern = "\b([a-z]+) \1\b"
re.Global = True
re.IgnoreCase = True

' Get the Matches collection.
src = "Is is the cost of of gasoline going up up?"
Set Matches = re.Execute(src)

s = ""
For Each Match in Matches
    ' Show the entire match.
    s = s & NewLine & Match.Value & NewLine

    ' Show the submatches.
    For Index = 0 to Match.SubMatches.Count - 1
        s = s & "submatch " & Index & ": "
        s = s & Match.SubMatches(Index)
        s = s & NewLine
    Next
Next

document.write(s)

' Output:
'  Is is
'  submatch 0: Is

'  of of
'  submatch 0: of

'  up up
'  submatch 0: up
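In engines that support ES2020, String.prototype.matchAll offers a shorter alternative to the exec loop. It is not available in classic JScript, so treat this as a sketch for modern JavaScript only; each element it yields is the same kind of array that exec returns, so submatch 1 is at index 1.

```javascript
// matchAll requires the g flag; it yields one match array per match.
var re = /\b([a-z]+) \1\b/gi;
var src = "Is is the cost of of gasoline going up up?";

var words = [];
for (var match of src.matchAll(re)) {
    words.push(match[1]); // submatch 1: the duplicated word
}
console.log(words.join(", ")); // Is, of, up
```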

You can also refer to a subexpression from within a replacement string.

Using the regular expression shown above, the following example replaces an occurrence of two consecutive identical words with a single occurrence of the same word. In the replace method, $1 refers to the first saved submatch. If there is more than one submatch, you refer to them consecutively as $2, $3, and so on.
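The JavaScript replace method also accepts a function instead of a replacement string. The function receives the entire match followed by each submatch, so its second parameter plays the role of $1. A minimal sketch, assuming a modern JavaScript engine and using console.log:

```javascript
var re = /\b([a-z]+) \1\b/gi;
var src = "Is is the cost of of gasoline going up up?";

// match is the whole match ("Is is"); word is submatch 1 ("Is").
var result = src.replace(re, function (match, word) {
    return word; // keep a single copy of the repeated word
});
console.log(result); // Is the cost of gasoline going up?
```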

var re = /\b([a-z]+) \1\b/gi;
var src = "Is is the cost of of gasoline going up up?";
var result = src.replace(re, "$1");
document.write (result);
// Output:
//  Is the cost of gasoline going up?

Dim re, src, result

Set re = New RegExp
re.Pattern = "\b([a-z]+) \1\b"
re.Global = True
re.IgnoreCase = True

src = "Is is the cost of of gasoline going up up?"
result = re.Replace(src, "$1")
document.write (result)
' Output:
'  Is the cost of gasoline going up?

The following example swaps each adjacent pair of words in the string. The whitespace captured by the second subexpression is put back between the swapped words.

var re = /(\S+)(\s+)(\S+)/gi;
var src = "The quick brown fox jumps over the lazy dog.";
var result = src.replace(re, "$3$2$1");
document.write (result);
// Output:
//  quick The fox brown over jumps lazy the dog.

Dim re, src, result

Set re = New RegExp
re.Pattern = "(\S+)(\s+)(\S+)"
re.Global = True
re.IgnoreCase = True

src = "The quick brown fox jumps over the lazy dog."
result = re.Replace(src, "$3$2$1")
document.write (result)

' Output:
'  quick The fox brown over jumps lazy the dog.
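The swap depends on the global setting (the g flag in JScript, re.Global = True in VBScript). The sketch below, written for a modern JavaScript engine with console.log, compares the result with and without it. It also omits the i flag, which has no effect on \S and \s.

```javascript
var src = "The quick brown fox jumps over the lazy dog.";

// With g, every pair is swapped; without it, only the first match is.
var allPairs = src.replace(/(\S+)(\s+)(\S+)/g, "$3$2$1");
var firstPair = src.replace(/(\S+)(\s+)(\S+)/, "$3$2$1");

console.log(allPairs);  // quick The fox brown over jumps lazy the dog.
console.log(firstPair); // quick The brown fox jumps over the lazy dog.
```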

Change History

Date            History            Reason
January 2010    Added examples.    Information enhancement.

See Also

Alternation and Subexpressions (Scripting)
Regular Expression Syntax (Scripting)
replace Method (Windows Scripting - JScript)
Replace Method (VBScript)