Добавил:
Опубликованный материал нарушает ваши авторские права? Сообщите нам.
Вуз: Предмет: Файл:
Beginning Regular Expressions 2005.pdf
Скачиваний:
95
Добавлен:
17.08.2013
Размер:
25.42 Mб
Скачать

Parentheses in Regular Expressions

However, when you change the pattern to (ab|a), there is a two-character match on Line 2. The first option, ab, is evaluated first. Assuming that matching starts from the position before the a, the first option matches, because the pattern ab is matched by the sequence of characters ab. Because the first option matches, the second option is not evaluated.

Capturing Parentheses

The uses of parentheses you’ve looked at have routinely captured the content between the opening and closing parentheses. As you saw in Figure 7-1, there was a group of characters, with the value of hot, captured and assigned to a variable $1.

Numbering of Captured Groups

Numbering of captured groups is determined by the order of the opening parentheses in a regular expression pattern.

For example, use the following pattern:

(United) (States)

Match it against the following test text:

The United States

The sequence of characters United (which follows the first opening parenthesis) is the value of the variable $1, and the sequence of characters States (which follows the second opening parenthesis) is the value of the variable $2. Figure 7-8 shows this in the Komodo Regular Expressions Toolkit.

Figure 7-8

185

Chapter 7

Suppose that you modify the pattern so that the space character is also inside parentheses, as follows:

(United)( )(States)

Then there are three groups, as shown in Figure 7-9.

Figure 7-9

If you are wondering why you might want to capture a whitespace character, think of how you might combine two names:

([A-Za-z])( )([A-Za-z])

as a filename such as the following:

FirstWord_SecondWord.html

You might specifically want to replace the space character with an underline character.

Numbering When Using Nested Parentheses

When you use nested parentheses, the same rule applies. Variables are numbered in accordance with the order in which the opening parentheses characters occur.

You have the following sample text:

A22 33

And you want to match it against the following rather confusing pattern:

((\w(\d{2}))(( )(\d{2})))

186

Parentheses in Regular Expressions

The first question that might occur to you is why might you want to do this. Also, you may want to use the documentation techniques mentioned in Chapter 10 and illustrated in several of the languagespecific chapters later in the book.

To get a hint as to why nested parentheses can be useful, take a look at Figure 7-10.

Figure 7-10

If you want to use different parts of the test text A22 33 for different programmatic purposes, you are now in a position to do so. The variable $1 has the value A22 33. So you can use all six characters (including the space character) for some purpose. However, if you want to reuse the three-character sequence A22 for some purpose, the variable $2 exists and can be used in your code.

Other variables have been created with values that correspond to different chunks of the test text.

Named Groups

Python and the .NET languages allow named groups to be created. The syntax:

(?<GroupName>Pattern)

or:

(?’GroupName’Pattern)

is used in .NET languages, whereas the pattern:

(?P<GroupName>Pattern)

is used in Python.

187