Anchors

Article
10/04/2012

Anchors enable you to fix a regular expression to the start or end of a line or input string. They also enable you to create expressions that match the start, end, or interior of a word.

For example, in the expression er\b, the \b matches a word boundary. The expression matches the "er" in "never", but not the "er" in "verb".
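The behavior described above can be checked with a short sketch. It assumes a modern JavaScript engine such as Node.js, and the variable names are illustrative only.

```javascript
// \b after "er" requires a word boundary immediately after the "r".
const endsWithEr = /er\b/;

// In "never", "er" is the last thing in the word, so a boundary follows it.
const matchesNever = endsWithEr.test("never");

// In "verb", "er" is followed by "b", so no boundary follows it.
const matchesVerb = endsWithEr.test("verb");

console.log(matchesNever, matchesVerb); // true false
```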

How Anchors Work

The following table contains the list of regular expression anchors and their meanings:

Character	Description
^	Matches the position at the beginning of the input string. If the m (multiline search) character is included with the flags, ^ also matches the position following \n or \r.
$	Matches the position at the end of the input string. If the m (multiline search) character is included with the flags, $ also matches the position preceding \n or \r.
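\b	Matches a word boundary, that is, the position between a word character and a nonword character (such as a space), or between a word character and the start or end of the input string.
\B	Matches a nonword boundary, that is, any position that is not a word boundary.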
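As a rough illustration of how the m flag changes ^ and $, the following sketch runs the same patterns with and without it. The sample text and variable names are assumptions made for this example.

```javascript
const text = "Chapter 1\nSee Chapter 2";

// Without the m flag, ^ matches only at the start of the whole input.
const startWithoutM = text.match(/^\w+/g);
// With the m flag, ^ also matches right after the line break.
const startWithM = text.match(/^\w+/gm);

// Without the m flag, $ matches only at the end of the whole input.
const endWithoutM = text.match(/\d$/g);
// With the m flag, $ also matches right before the line break.
const endWithM = text.match(/\d$/gm);

console.log(startWithoutM); // [ 'Chapter' ]
console.log(startWithM);    // [ 'Chapter', 'See' ]
console.log(endWithoutM);   // [ '2' ]
console.log(endWithM);      // [ '1', '2' ]
```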

You cannot use a quantifier with an anchor. An anchor matches a position rather than a character, and there cannot be more than one position immediately before or after a line break or word boundary, so expressions such as ^* are not permitted.
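One way to see this restriction is to try compiling such a pattern. The sketch below assumes a modern JavaScript engine, where an invalid pattern causes the RegExp constructor to throw a SyntaxError. The helper name compiles is hypothetical and exists only for this example.

```javascript
// Hypothetical helper: reports whether a pattern source compiles.
function compiles(source) {
  try {
    new RegExp(source);
    return true;
  } catch (err) {
    // An invalid pattern makes the RegExp constructor throw.
    return false;
  }
}

const anchorQuantified = compiles("^*");    // quantifier applied to the ^ anchor
const boundaryQuantified = compiles("\\b+"); // quantifier applied to the \b anchor
const atomQuantified = compiles("^a*");     // quantifier applied to "a", not to ^

console.log(anchorQuantified, boundaryQuantified, atomQuantified); // false false true
```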

To match text at the beginning of a line of text, use the ^ character at the beginning of the regular expression. Do not confuse this use of ^ with its use inside a bracket expression, where a leading ^ negates the character set.
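The two roles of ^ can be compared side by side. This is a minimal sketch with illustrative pattern and variable names, not taken from the original article.

```javascript
// ^ at the start of the pattern is an anchor: the input must begin with a digit.
const startsWithDigit = /^[0-9]/;

// ^ as the first character inside brackets negates the set: any non-digit character.
const hasNonDigit = /[^0-9]/;

const anchorHit = startsWithDigit.test("42 apples");  // true: the input starts with "4"
const anchorMiss = startsWithDigit.test("apples 42"); // false: the input starts with "a"
const negatedMiss = hasNonDigit.test("42");           // false: every character is a digit
const negatedHit = hasNonDigit.test("42a");           // true: "a" is not a digit

console.log(anchorHit, anchorMiss, negatedMiss, negatedHit);
```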

To match text at the end of a line of text, use the $ character at the end of the regular expression.

For example, you can use anchors when searching for chapter headings. The following regular expression matches the word "Chapter" followed by a one- or two-digit chapter number, but only at the beginning of a line:

/^Chapter [1-9][0-9]{0,1}/

A true chapter heading not only occurs at the beginning of a line, it is also the only text on the line, so it occurs at both the beginning and the end of the same line. The following expression ensures that the pattern matches only chapter headings and not cross-references. It does so by requiring the match to span the whole line, from its beginning to its end.

/^Chapter [1-9][0-9]{0,1}$/
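To compare the two chapter-heading expressions, the following sketch tests each one against a few sample lines. The sample lines and variable names are assumptions made for this example, and each array element stands for one line of text.

```javascript
const lines = [
  "Chapter 1",
  "Chapter 12",
  "Chapter 0",
  "Chapter 123",
  "See Chapter 3 for details",
  "Chapter 3 covers anchors",
];

// Anchored at the start only: trailing text after the number is still allowed.
const startOnly = /^Chapter [1-9][0-9]{0,1}/;
// Anchored at both ends: the heading must be the only text on the line.
const wholeLine = /^Chapter [1-9][0-9]{0,1}$/;

const startMatches = lines.filter((line) => startOnly.test(line));
const wholeMatches = lines.filter((line) => wholeLine.test(line));

console.log(startMatches);
// [ 'Chapter 1', 'Chapter 12', 'Chapter 123', 'Chapter 3 covers anchors' ]
console.log(wholeMatches);
// [ 'Chapter 1', 'Chapter 12' ]
```

The start-only expression also accepts "Chapter 123", because it matches the leading "Chapter 12" and ignores what follows. Adding $ rejects that line along with the one that has trailing text.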

Matching word boundaries is a little different, but it adds an important capability to regular expressions. A word boundary is the position between a word character and a nonword character such as a space, or between a word character and the start or end of the input. A nonword boundary is any other position. The following expression matches the first three characters of the word "Chapter" because the characters appear following a word boundary:

/\bCha/

The position of the \b operator within the expression is critical. If it is at the beginning of the expression, it looks for the match at the beginning of a word. If it is at the end of the expression, it looks for the match at the end of a word. For example, the following expression matches the string "ter" in the word "Chapter" because it appears before a word boundary:

/ter\b/
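The effect of placing \b before or after the text can be sketched as follows. The counterexample strings "MyChapter" and "terminal" are assumptions chosen for this illustration.

```javascript
// \b before "Cha": the match must start at the beginning of a word.
const wordStart = /\bCha/;
// \b after "ter": the match must stop at the end of a word.
const wordEnd = /ter\b/;

const startIndex = wordStart.exec("Chapter").index; // 0: "Cha" begins the word
const startInside = wordStart.test("MyChapter");    // false: "y" precedes "Cha"

const endIndex = wordEnd.exec("Chapter").index;     // 4: "ter" ends the word
const endInside = wordEnd.test("terminal");         // false: "m" follows "ter"

console.log(startIndex, startInside, endIndex, endInside); // 0 false 4 false
```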

The following expression matches the string "apt" as it occurs in "Chapter" but not as it occurs in "aptitude":

/\Bapt/

The string "apt" occurs on a nonword boundary in the word "Chapter" but on a word boundary in the word "aptitude". The \B nonword boundary operator only requires that the position where it appears in the expression is not at the beginning or end of a word.
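As a quick check of the nonword boundary case, this sketch contrasts \B with \b on the same two words. The combined sample phrase and the variable names are illustrative assumptions.

```javascript
// \B before "apt": the position before "apt" must not be a word boundary.
const insideWord = /\Bapt/;
// \b before "apt": the position before "apt" must be a word boundary.
const atWordStart = /\bapt/;

const inChapter = insideWord.test("Chapter");      // true: "h" precedes "apt"
const inAptitude = insideWord.test("aptitude");    // false: "apt" starts the word
const startChapter = atWordStart.test("Chapter");  // false
const startAptitude = atWordStart.test("aptitude"); // true

// Mark only the "apt" that sits inside a word.
const marked = "Chapter on aptitude".replace(/\Bapt/g, "[apt]");

console.log(inChapter, inAptitude, startChapter, startAptitude);
console.log(marked); // Ch[apt]er on aptitude
```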
