7

Parentheses in Regular Expressions

Parentheses are powerful tools when you are using regular expressions. They can be used to group characters for several purposes, each of which will be explained later in the chapter. Parentheses can, for example, be used to express simple alternatives or to express multiple options.

Parentheses create one or more groups. Groups of matched characters can then be used later in the manipulation of text — for example, in the construction of replacement text for specified pieces of matched text.

In this chapter, you will learn how to do the following:

Use parentheses for grouping

Use quantifiers with groups of characters and/or metacharacters

Match literal opening and closing parenthesis characters

Provide alternatives or multiple options in regular expression patterns

Use capturing and non-capturing parentheses

Use back references

Grouping Using Parentheses

Parentheses inside regular expression patterns are used to group characters and remember matched text. Often, the group(s) of characters created by using parentheses are used for text manipulation purposes. This section looks at what grouping is and how it is achieved.

To create a group of characters, simply precede the character group with an opening parenthesis character and follow the group with a closing parenthesis character. For example, examine the following pattern:

United States

Containing only literal characters, this pattern could be used to match the text United States. Now examine the following pattern:

(United)( )(States)

It matches the same text but also creates three groups: the first for the sequence of characters United, the second for a space character, and the third for the sequence of characters States. If, for example, you replaced the group United with an uppercase U, the group containing the space character with nothing, and the group States with an uppercase S, you could, by a slightly cumbersome process, replace the string United States with the abbreviation US.
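The grouping and replacement just described can be sketched in Python. This is only an illustration under stated assumptions: the book's own examples use Komodo and PowerGrep, the sample sentence is invented, and the helper name abbreviate is hypothetical.

```python
import re

# The same three groups as the pattern in the text.
pattern = re.compile(r"(United)( )(States)")

match = pattern.search("The United States")
groups = match.groups()  # one entry per pair of parentheses


def abbreviate(m):
    """Build replacement text from the captured groups (hypothetical helper)."""
    # Keep the first letter of groups 1 and 3; group 2 (the space) is dropped.
    return m.group(1)[0] + m.group(3)[0]


abbreviated = pattern.sub(abbreviate, "The United States")

print(groups)
print(abbreviated)
```

A replacement function is used here because each group is replaced by something different; a fixed replacement string would not need the groups at all.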

Parentheses allow you to group sequences of characters. What you do with the groups depends on your text manipulation task.

When using parentheses in patterns, be careful not to include any unintended whitespace inside the parentheses. The regular expression engine treats that whitespace as part of the sequence of characters to be matched, so if the test text lacks it at that position, the match you expect will fail.
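A short Python sketch of the whitespace pitfall follows. The test strings are assumptions chosen for illustration, and the behavior shown is that of Python's re module in its default mode.

```python
import re

# A stray space inside the parentheses becomes part of what must be matched.
with_spaces = re.search(r"( hot )", "The hot water")

# The same pattern fails when the text has no spaces around the word.
bare_word = re.search(r"( hot )", "hot")

# Without the stray spaces, the bare word matches as expected.
clean = re.search(r"(hot)", "hot")

print(repr(with_spaces.group(1)))
print(bare_word)
print(clean.group(1))
```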

Try It Out: Grouping Characters

This example demonstrates simple grouping in action:

1. Open the Komodo Regular Expressions Toolkit, and delete any regular expression pattern and/or test text left over from previous use.

2. In the Enter a String to Match Against text area, enter the test text The hot water.

3. In the Enter a Regular Expression area, enter the pattern hot.

4. Inspect the matched text (which is the string hot), and notice the message in the gray area below the Enter a String to Match Against text area: Match succeeded: 0 groups.

5. Change the regular expression pattern to read (hot), inspect the matched text (it is still the string hot), and notice that the message in the gray area below the Enter a String to Match Against text area has changed. It now reads Match succeeded: 1 group.

Figure 7-1 shows the appearance after Step 5. By adding parentheses to a simple regular expression pattern you have, as indicated by the message mentioned in Step 5, created a group. Notice that in the Group Match Variables area, the value $1 appears in the Variable column, and the value hot appears in the Value column, indicating that the group can be referenced as $1 and that its value is the sequence of characters hot. This topic is further discussed later in the chapter.


Figure 7-1

How It Works

The first pattern hot contains only three literal characters and simply matches a sequence of characters: hot.

When the matched parentheses enclose the hot component of the pattern, those three literal characters form a group.
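The same comparison can be reproduced outside Komodo. The sketch below uses Python's re module as an assumed stand-in for the toolkit; in Python the first group is read with match.group(1) rather than through a $1 variable.

```python
import re

text = "The hot water"

plain = re.compile(r"hot")      # literal characters only
grouped = re.compile(r"(hot)")  # the same characters inside parentheses

plain_match = plain.search(text)
grouped_match = grouped.search(text)

# Pattern.groups reports how many capturing groups the pattern defines.
print(plain.groups, plain_match.group(0))
print(grouped.groups, grouped_match.group(0), grouped_match.group(1))
```

Both patterns match the same text; only the group count and the availability of group 1 differ.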

Parentheses and Quantifiers

One basic use for parentheses is in grouping characters and/or metacharacters so that a quantifier can be applied to the group contained by the parentheses.

For example, if you expect that a group of characters occurs more than once in succession, the relevant characters and/or metacharacters can be enclosed in parentheses and an appropriate quantifier used immediately following the closing parenthesis.

Suppose that you want to match a sequence of characters that consists of an uppercase A, followed by a numeric digit, followed by another uppercase A and another numeric digit. You can use the following pattern to match the described sequence of characters:

(A\d){2}

The (A\d) component of the pattern matches an uppercase A followed by a numeric digit. The parentheses do not match anything — neither a character in the test text nor a position. The quantifier {2} indicates that everything contained in the parentheses must occur exactly twice if there is to be a match.
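As a rough illustration, the quantified group can be tried in Python. The test strings are assumptions, and the note about group 1 describes Python's re module specifically, not necessarily every tool used in this chapter.

```python
import re

pattern = re.compile(r"(A\d){2}")

m = pattern.search("A3A4CDE")
whole = m.group(0)           # text matched by the whole pattern
last_repetition = m.group(1) # in Python, a repeated group keeps its final repetition

# Only one "A digit" pair is present here, so {2} cannot be satisfied.
no_match = pattern.search("A3B4CDE")

print(whole, last_repetition, no_match)
```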


Try It Out: Parentheses and Quantifiers

A test file, QuantifierTest.txt, is shown here:

A3A4CDE

B9B6XYZ

A2A9RTE

B4B4UIO

G2H1WEQ

1. Open PowerGrep, and enter the regular expression pattern (A\d){2} in the Search Text area.

2. Enter the text C:\BRegExp\Ch07 in the Folder text box. Amend this if you installed the code download in some other location.

3. Enter the filename QuantifierTest.txt in the File Mask text box.

4. Click the Search button, and inspect the outcome in the Results area, as shown in Figure 7-2.

Figure 7-2

How It Works

The pattern (A\d){2} has the same meaning, in terms of the sequence of characters that it matches, as the pattern A\dA\d. In other words, it matches an uppercase A, followed by a numeric digit, followed by another uppercase A and a numeric digit.

PowerGrep indicates that there are matches on lines 1 and 5.
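The equivalence of the two patterns can be checked with a small Python sketch. The five strings are the contents of QuantifierTest.txt as listed earlier, held in a list rather than read from disk; that substitution, and the use of Python in place of PowerGrep, are assumptions made to keep the example self-contained.

```python
import re

# The five test strings from QuantifierTest.txt.
lines = ["A3A4CDE", "B9B6XYZ", "A2A9RTE", "B4B4UIO", "G2H1WEQ"]

grouped = re.compile(r"(A\d){2}")  # quantified group
expanded = re.compile(r"A\dA\d")   # the same sequence written out in full

grouped_hits = [line for line in lines if grouped.search(line)]
expanded_hits = [line for line in lines if expanded.search(line)]

print(grouped_hits)
print(expanded_hits)
```

Both lists contain the same two strings, the ones that begin with an uppercase A, a digit, another uppercase A, and another digit.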

When the regular expression engine is at the position before the first A on line 1, it first attempts to match the first A on the line against the first character inside the parentheses, an uppercase A. There is a match. It next attempts to match the second component inside the parentheses, the metacharacter \d,
