Beginning Regular Expressions 2005.pdf
Character Classes

Figure 5-3 shows the use of the former pattern to match the test text gray in Komodo Regular Expression Toolkit.

Figure 5-3

The first two characters of the pattern gr[ae]y match the literal characters g and r. The character class [ae] matches the third character of the test text, a lowercase a. The final y of the pattern is matched by the final y of gray.
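The walkthrough above can be tried out with a short script. This is only a minimal sketch using Python's standard re module rather than the Komodo toolkit shown in the figure; the list of test words is an assumption made up for illustration.

```python
import re

# The pattern from the text: literal g and r, then exactly one
# character from the class [ae], then a literal y.
pattern = re.compile(r"gr[ae]y")

# Hypothetical test words (not from the book's sample files).
words = ["gray", "grey", "gry", "graey", "groy"]

# fullmatch() succeeds only if the whole word fits the pattern.
results = {word: bool(pattern.fullmatch(word)) for word in words}

for word, matched in results.items():
    print(f"{word}: {matched}")
```

Only gray and grey match: the class [ae] must consume exactly one character, and that character must be a or e.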

Using Quantifiers with Character Classes

So far, you have seen how a simple character class can be used when no quantifier is specified. Just as with a single character, the absence of a quantifier means that a character from the character class must occur exactly once.

However, the ?, *, and + quantifiers and the curly-brace quantifier syntax can be used with a character class just as they are used with a single character.
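To see the effect of a quantifier on a character class, the following sketch compares the class [AB] with no quantifier, with +, and with {2}. It uses Python's re module, and the test string is a made-up assumption rather than one of the book's sample files.

```python
import re

# Hypothetical test text.
text = "A AB ABBA"

# No quantifier: each match is exactly one character from the class.
single = re.findall(r"[AB]", text)

# + quantifier: each match is a run of one or more class characters.
one_or_more = re.findall(r"[AB]+", text)

# {2} quantifier: each match is exactly two class characters.
exactly_two = re.findall(r"[AB]{2}", text)

print(single)       # ['A', 'A', 'B', 'A', 'B', 'B', 'A']
print(one_or_more)  # ['A', 'AB', 'ABBA']
print(exactly_two)  # ['AB', 'AB', 'BA']
```

The quantifier applies to the class as a whole, so each repetition may match a different character from the class.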

For example, you could use the {2} quantifier in the following pattern to match against the test text in ABPartNumbers.txt:

[AB]{2}[12][0-9]

Figure 5-4 shows the results using the Find All button in OpenOffice.org Writer.

As you can see, all sequences of part numbers between AB10 and AB29 in the sample text are matched.

However, using the character class [AB] with the quantifier {2} can cause undesired matches. For example, if the test text contained a part number AA23 or BB19, each of those would match, although they are not desired matches according to the problem definition that was expressed earlier in this chapter.

Think about how it works. If the regular expression engine is at the position before the initial B of BB19, it attempts to match the character class [AB] against the uppercase B. That matches. Next, because of the {2} quantifier, it attempts to match the character class [AB] against the second B. That too matches. Then the character class [12] is matched against the numeric digit 1. That matches. Finally, the character class [0-9] is matched against the numeric digit 9, which matches. Because all components of the pattern have a corresponding match in the test text, the test text matches.
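The undesired matches described above are easy to reproduce. The sketch below uses Python's re module instead of OpenOffice.org Writer, and the candidate part numbers are assumptions chosen for illustration, since the contents of ABPartNumbers.txt are not reproduced here.

```python
import re

# The pattern from the text: two characters from [AB],
# then a 1 or 2, then any digit.
pattern = re.compile(r"[AB]{2}[12][0-9]")

# Hypothetical part numbers, including the problem cases AA23 and BB19.
candidates = ["AB10", "AB29", "AA23", "BB19", "AB30", "AC15"]

# Keep only the candidates that the pattern matches in full.
matched = [c for c in candidates if pattern.fullmatch(c)]

print(matched)  # ['AB10', 'AB29', 'AA23', 'BB19']
```

AA23 and BB19 match because each repetition of [AB] is free to pick either letter; nothing in the pattern requires an A followed by a B.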

Figure 5-4

Using the \b Metacharacter in Character Classes

The \b metacharacter, which you will read about in Chapter 6, has a different meaning inside a character class than it has outside one.

Inside a character class the \b metacharacter represents a backspace character. Outside a character class the \b metacharacter signifies a word boundary; at least, it does in several regular expression implementations.

The use of the \b metacharacter outside a character class is described in Chapter 6.
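The two meanings of \b can be compared side by side. This is a sketch in Python, whose re module is one of the implementations that treats \b as a word boundary outside a character class and as a backspace inside one; other tools may behave differently. The test string is a made-up assumption.

```python
import re

# Hypothetical test text. This is deliberately NOT a raw string, so the
# \b here is a real backspace character (U+0008) between "cat" and "s".
text = "cat\bs concat"

# Inside a character class, \b matches the backspace character itself.
inside = re.findall(r"[\b]", text)

# Outside a character class, \b matches a word boundary, so only the
# "cat" that starts a word is found, not the one inside "concat".
outside = re.findall(r"\bcat", text)

print([repr(ch) for ch in inside])  # ["'\\x08'"]
print(outside)                      # ['cat']
```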

The \b metacharacter isn’t the only metacharacter that has one meaning inside a character class and another meaning outside. The hyphen (-) and caret (^) metacharacters have special meanings inside character classes, as explored later in this chapter. Inside a character class the $ character simply matches itself. Outside a character class the $ metacharacter matches a position rather than a character, as is discussed in Chapter 6.
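The difference for the $ character can be sketched in Python as follows. The test string is an assumption invented for this example.

```python
import re

# Hypothetical test text.
text = "cost $5"

# Inside a character class, $ is an ordinary character that matches itself.
dollar_signs = re.findall(r"[$]", text)

# Outside a character class, $ matches the position at the end of the text,
# so "5$" finds a 5 only when it is the last character.
at_end = re.search(r"5$", text)

# The letter t occurs in the text but not at the end, so this finds nothing.
not_at_end = re.search(r"t$", text)

print(dollar_signs)   # ['$']
print(at_end.span())  # (6, 7)
print(not_at_end)     # None
```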

Selecting Literal Square Brackets

You have probably realized that if you use the [ and ] metacharacters to define the boundaries of a character class you cannot at the same time use those characters to select themselves literally. The text file SquareBrackets.txt, shown here, illustrates some situations in which square brackets may occur literally:

These are alphabetic characters [A to Z and a to z].

myVariable = myArray[3];

Character[7]

The first five characters in the ASCII character set after uppercase Z are [, \, ], ^, and _.

To select either square-bracket character literally, you must escape it. Escaping, in this context, simply means preceding the square bracket with a backslash character. So to select the left square-bracket character, [, use the following pattern:

\[

And use the following to select the right square-bracket character, ]:

\]

Figure 5-5 shows the use, in OpenOffice.org Writer, of the \[ pattern to select the left square-bracket character.

The backslash (\) character can be used to escape most metacharacters. To use the backslash literally you must escape it, too. So the pattern \\ selects a single backslash character.
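The three escaped patterns can be checked against the sample text from SquareBrackets.txt. This sketch uses Python's re module rather than OpenOffice.org Writer; the sample lines are copied from the listing above, and joining them with newlines is an assumption about the file's layout.

```python
import re

# The lines of SquareBrackets.txt as listed in the text. The last line is
# a raw string so that its literal backslash is kept as-is.
lines = [
    "These are alphabetic characters [A to Z and a to z].",
    "myVariable = myArray[3];",
    "Character[7]",
    r"The first five characters in the ASCII character set "
    r"after uppercase Z are [, \, ], ^, and _.",
]
text = "\n".join(lines)

# Escaped metacharacters match themselves literally.
left_brackets = re.findall(r"\[", text)
right_brackets = re.findall(r"\]", text)
backslashes = re.findall(r"\\", text)

print(len(left_brackets))   # 4
print(len(right_brackets))  # 4
print(len(backslashes))     # 1
```

Writing the patterns as raw strings (the r prefix) keeps Python from interpreting the backslashes before the regular expression engine sees them.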
