Beginning Regular Expressions 2005.pdf

Character Classes

How It Works

If the regular expression engine starts at the position immediately before the A of the first line of the test file, the A is tested against the pattern [:alnum:]?. There is a match because uppercase A is an alphabetic character. The matched text is highlighted in reverse video.

When the Find All button is used, after that first successful match the regular expression engine moves to the position between A and B and attempts to match against the following character, B. That matches, and so it, too, is highlighted in reverse video. The regular expression engine moves to the next position and then matches the C, and so on. When the newline character is reached, there is no match against the pattern [:alnum:]?, and the regular expression engine moves on to the position after the newline character and attempts to match the next character.

When the regular expression engine reaches the position before the underscore character and attempts to match that character, there is no match, because the underscore character is neither an alphabetic character nor a numeric digit.
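The walk-through above can be reproduced roughly in Python. This is only a sketch under stated assumptions: Python's re module does not support the [:alnum:] syntax used by the tool in the text, so the class [A-Za-z0-9] is swapped in, and the two-line test text is hypothetical rather than the book's test file.

```python
import re

# Assumption: [A-Za-z0-9] stands in for the alphanumeric class [:alnum:],
# because Python's re module has no POSIX-style named classes.
ALNUM = re.compile(r"[A-Za-z0-9]")

# Hypothetical test text: letters, a newline, then digits with an underscore.
text = "ABC\n12_3"

# finditer plays the role of the Find All button: after each match the
# engine moves on and tries the next character.
matches = [(m.start(), m.group()) for m in ALNUM.finditer(text)]

for position, char in matches:
    print(position, char)

# Positions 3 (the newline) and 6 (the underscore) never appear in the
# results, because neither character is a letter or a digit.
skipped = [i for i in range(len(text)) if i not in {p for p, _ in matches}]
print("skipped positions:", skipped)
```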

Exercises

1. You have a document that contains American English and British English. State a problem definition to locate occurrences of license (U.S. English) and licence (British English). Specify a regular expression pattern using a character class to find both sequences of characters.

2. The pattern (20|19)[0-9]{2}[-./][01][0-9][-./][0123][0-9] was used earlier in this chapter to match dates. As written, this pattern would allow months such as 00, 13, or 19 and allow days such as 00, 32, and 39. Modify the relevant components of the pattern so that only months 01 through 12 and days 01 through 31 are allowed.
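Before attempting Exercise 2, it may help to confirm the problem it describes. The following sketch, written in Python as an assumption (the chapter itself uses other tools), runs the unmodified pattern against a few hypothetical date strings. It deliberately does not show a corrected pattern.

```python
import re

# The date pattern exactly as quoted in Exercise 2.
DATE = re.compile(r"(20|19)[0-9]{2}[-./][01][0-9][-./][0123][0-9]")

# Hypothetical test strings: one valid date, then the invalid months
# and days named in the exercise.
candidates = [
    "2004-12-31",  # valid
    "2004-00-15",  # month 00
    "2004-13-15",  # month 13
    "2004-19-15",  # month 19
    "2004-06-00",  # day 00
    "2004-06-32",  # day 32
    "2004-06-39",  # day 39
]

# fullmatch requires the whole string to match the pattern.
accepted = [s for s in candidates if DATE.fullmatch(s)]

for s in candidates:
    print(s, "accepted" if s in accepted else "rejected")
```

Every string in the list is accepted, including the six invalid dates, which is the weakness the exercise asks you to remove.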

Chapter 6: String, Line, and Word Boundaries

This chapter looks at metacharacters that match positions before, between, or after characters rather than selecting matching characters. These positional metacharacters complement the metacharacters that were described in Chapter 4, each of which signified characters to be matched.

For example, you will see how to match characters, or sequences of characters, that immediately follow the position at the beginning of a line. In plain English you might say that you want to match a specified sequence of characters only when it immediately follows the beginning of a line or the beginning of the whole test text. The implication is that you don’t want to match that sequence if it occurs anywhere else in the text. So using a positional metacharacter in this way can significantly change which sequences of characters match or fail to match.
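As a minimal sketch of that difference, the following Python example compares an unanchored pattern with two anchored ones. The choice of Python, the ^ and \A metacharacters, the re.MULTILINE flag, and the sample text are all assumptions made for illustration; this passage has not yet introduced any specific syntax.

```python
import re

# Hypothetical three-line test text.
text = "The cat sat.\nOn the mat, The dog.\nThe end."

# Unanchored: matches "The" wherever it occurs.
anywhere = [m.start() for m in re.finditer(r"The", text)]

# ^ with re.MULTILINE: matches "The" only at the beginning of a line.
line_start = [m.start() for m in re.finditer(r"^The", text, flags=re.MULTILINE)]

# \A: matches "The" only at the beginning of the whole text.
text_start = [m.start() for m in re.finditer(r"\AThe", text)]

print("anywhere:  ", anywhere)
print("line start:", line_start)
print("text start:", text_start)
```

The same three characters are being matched in each case. Only the position requirement changes, and with it the number of matches.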

Equally, you might want to look for whole words rather than arbitrary sequences of characters, or for sequences of characters only when they occur at the beginning or end of a word. Many regular expression implementations have positional metacharacters that allow you to do that.
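The following sketch illustrates the idea with Python's \b word-boundary metacharacter. Both the language and the metacharacter are assumptions here, since other implementations may use different syntax for word boundaries, and the sample sentence is hypothetical.

```python
import re

# Hypothetical sample sentence containing "cat" in several positions.
text = "The cat scattered the catalog; concat the cat."

# Plain sequence of characters: matches inside other words too.
substring = re.findall(r"cat", text)

# Whole word only: a boundary is required on both sides.
whole_word = re.findall(r"\bcat\b", text)

# Words that begin with "cat".
word_start = re.findall(r"\bcat\w*", text)

# Words that end with "cat".
word_end = re.findall(r"\w*cat\b", text)

print(len(substring), "substring matches")
print("whole words:     ", whole_word)
print("words starting:  ", word_start)
print("words ending:    ", word_end)
```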

This chapter provides you with the information needed to make matches based on the position of a sequence of characters.

The term anchor is sometimes used to refer to the metacharacters that match a position rather than a character.

In some documentation (for example, the documentation for .NET regular expression functionality), these same positional metacharacters are termed atomic zero-width assertions.
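The "zero-width" part of that term can be checked directly: a positional metacharacter used on its own produces matches that contain no characters. The sketch below assumes Python and its ^ and \b metacharacters, with a hypothetical two-line text.

```python
import re

# Hypothetical two-line test text.
text = "one\ntwo"

# Match the beginning-of-line position by itself.
line_starts = [m.span() for m in re.finditer(r"^", text, flags=re.MULTILINE)]

# Match every word boundary by itself.
boundaries = [m.span() for m in re.finditer(r"\b", text)]

# Each span is (start, end); for a position match the two are equal,
# so the matched text is the empty string.
widths = [end - start for start, end in line_starts + boundaries]

print("line starts:", line_starts)
print("boundaries: ", boundaries)
print("widths:     ", widths)
```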