Beginning Regular Expressions 2005.pdf

Character Classes

Figure 5-3 shows the use of the former pattern to match the test text grey in Komodo Regular Expression Toolkit.

Figure 5-3

The first two characters of the pattern gr[ae]y match the literal characters g and r. The character class [ae] matches the third character of the test text, which is a lowercase a in gray and a lowercase e in grey. The final y of the pattern matches the final y of the test text.
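
The matching steps just described can be tried out in code. The following is a minimal sketch using Python's re module rather than the Komodo toolkit shown in the figure; the list of test words is an assumption chosen for illustration.

```python
import re

# gr[ae]y: literal g and r, then exactly one character from the
# class [ae], then a literal y.
pattern = re.compile(r"gr[ae]y")

# Hypothetical test words (not taken from the book's sample files).
words = ["gray", "grey", "groy", "graey"]

# fullmatch() requires the whole word to match the pattern.
results = {word: bool(pattern.fullmatch(word)) for word in words}
print(results)
```

Only gray and grey match. The word groy fails because o is not in the class, and graey fails because a class without a quantifier consumes exactly one character.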

Using Quantifiers with Character Classes

So far, you have seen how a simple character class can be used when no quantifier is specified. As with a single character, the absence of a quantifier means that a character from the character class must occur exactly once.

However, the ?, *, and + quantifiers and the curly-brace quantifier syntax can be used with a character class just as they are used with a single character.
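
To make this concrete, here is a small sketch in Python's re module showing a character class with no quantifier, with +, with a curly-brace quantifier, and with ?. The test strings are assumptions invented for this illustration.

```python
import re

# Hypothetical test text, chosen for illustration.
text = "A AB ABBA"

# No quantifier: each match is exactly one character from the class.
one = re.findall(r"[AB]", text)

# +: one or more characters from the class, as many as possible.
plus = re.findall(r"[AB]+", text)

# {2}: exactly two characters from the class.
two = re.findall(r"[AB]{2}", text)

# ?: the class is optional, so zero or one character may precede the 1.
optional = [s for s in ["1", "A1", "AB1"] if re.fullmatch(r"[AB]?1", s)]

print(len(one), plus, two, optional)
```

The quantifier applies to the class as a whole, so [AB]{2} accepts any two characters drawn from the class, not necessarily the same character twice.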

For example, you could use the {2} quantifier in the following pattern to match against the test text in ABPartNumbers.txt:

[AB]{2}[12][0-9]

Figure 5-4 shows the results using the Find All button in OpenOffice.org Writer.

As you can see, all sequences of part numbers between AB10 and AB29 in the sample text are matched.

However, using the character class [AB] with the quantifier {2} can cause undesired matches. For example, if the test text contained a part number AA23 or BB19, each of those would match, although they are not desired matches according to the problem definition that was expressed earlier in this chapter.

Think about how it works. If the regular expression engine is at the position before the initial B of BB19, it attempts to match the character class [AB] against the uppercase B. That matches. Next, because of the {2} quantifier, it attempts to match the character class [AB] against the second B. That too matches. Then the character class [12] is matched against the numeric digit 1. That matches. Finally, the character class [0-9] is matched against the numeric digit 9, which matches. Because all components of the pattern have a corresponding match in the test text, the test text matches.
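
The walkthrough above can be checked with a short sketch in Python's re module. The candidate part numbers are assumptions for illustration, not the contents of ABPartNumbers.txt, and the stricter pattern AB[12][0-9] is only one possible way to tighten the match, assuming the goal is part numbers AB10 through AB29.

```python
import re

# The pattern from the text: two characters from [AB], then 1 or 2,
# then any digit.
loose = re.compile(r"[AB]{2}[12][0-9]")

# One possible tightening (an assumption): literal A then literal B.
strict = re.compile(r"AB[12][0-9]")

# Hypothetical candidates, chosen for illustration.
candidates = ["AB10", "AB29", "AA23", "BB19", "AB30", "AC15"]

loose_matches = [c for c in candidates if loose.fullmatch(c)]
strict_matches = [c for c in candidates if strict.fullmatch(c)]

print(loose_matches)
print(strict_matches)
```

The loose pattern accepts AA23 and BB19 for exactly the reason traced above, while the literal prefix accepts only the AB part numbers.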

Figure 5-4

Using the \b Metacharacter in Character Classes

One metacharacter that has a different meaning inside a character class than it has outside is the \b metacharacter, which you will read about in Chapter 6.

Inside a character class the \b metacharacter represents a backspace character. Outside a character class the \b metacharacter signifies a word boundary; at least, it does in several regular expression implementations.

The use of the \b metacharacter outside a character class is described in Chapter 6.
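
Python's re module is one implementation that follows both conventions, so it can serve as a quick sketch of the difference. The test string is an assumption built for this illustration; it contains a real backspace character (U+0008) between two words.

```python
import re

# In an ordinary (non-raw) Python string literal, \b is a backspace
# character, so this string is c, a, t, backspace, d, o, g.
text = "cat\bdog"

# Inside a character class, \b matches the backspace character itself.
backspaces = re.findall(r"[\b]", text)

# Outside a character class, \b matches a position (a word boundary),
# so each match is empty; record where the boundaries fall.
boundaries = [m.start() for m in re.finditer(r"\b", text)]

print(backspaces, boundaries)
```

The class finds one character, while the bare \b finds four zero-width positions: the start and end of each of the two words.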

The \b metacharacter isn’t the only metacharacter that has one meaning inside a character class and another meaning outside. The hyphen (-) and caret (^) metacharacters have special meanings inside character classes, as explored later in this chapter. Inside a character class the $ character simply matches itself. Outside a character class the $ metacharacter matches a position rather than a character, as is discussed in Chapter 6.
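
The two roles of $ can be compared in a short sketch using Python's re module. The test string is an assumption chosen for illustration.

```python
import re

# Hypothetical test text containing a literal dollar sign.
text = "cost: $5"

# Inside a character class, $ matches the dollar-sign character.
inside = [m.start() for m in re.finditer(r"[$]", text)]

# Outside a character class, $ matches a position: here, the end of
# the string, which is one past the index of the last character.
outside = [m.start() for m in re.finditer(r"$", text)]

print(inside, outside)
```

The class match lands on the dollar sign at index 6, while the bare $ produces a single empty match at index 8, the end of the text.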

Selecting Literal Square Brackets

You have probably realized that if you use the [ and ] metacharacters to define the boundaries of a character class you cannot at the same time use those characters to select themselves literally. The text file SquareBrackets.txt, shown here, illustrates some situations in which square brackets may occur literally:

These are alphabetic characters [A to Z and a to z].

myVariable = myArray[3];

Character[7]

The first five characters in the ASCII character set after uppercase Z are [, \, ], ^, and _.

To select either square-bracket character you must escape the corresponding literal square-bracket character. Escaping, in this context, simply means preceding the square bracket by a backslash character. So to select the left square-bracket character, [, use the following pattern:

\[

And use the following to select the right square-bracket character, ]:

\]

Figure 5-5 shows the use, in OpenOffice.org Writer, of the \[ pattern to select the left square-bracket character.

The backslash (\) character can be used to escape most metacharacters. To use the backslash literally you must escape it, too. So the pattern \\ selects a single backslash character.
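
The escaping rules can be sketched in Python's re module against two lines taken from SquareBrackets.txt. The variable names are hypothetical, and raw strings (r"...") are used so that Python itself does not interpret the backslashes before the regular expression engine sees them.

```python
import re

# Two lines from SquareBrackets.txt, as shown earlier.
code_line = "myVariable = myArray[3];"
ascii_line = "uppercase Z are [, \\, ], ^, and _."  # contains one backslash

# \[ and \] select the literal square-bracket characters.
left = [m.start() for m in re.finditer(r"\[", code_line)]
right = [m.start() for m in re.finditer(r"\]", code_line)]

# \\ selects a single literal backslash.
backslashes = re.findall(r"\\", ascii_line)

print(left, right, len(backslashes))
```

Each pattern finds exactly one character in its test line: the brackets at indexes 20 and 22 of the code line, and the single backslash in the ASCII line.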
