Beginning Regular Expressions 2005.pdf

Metacharacters and Modifiers

The \D Metacharacter

The \d metacharacter, as you have seen, matches a numeric digit, 0 through 9. The \D metacharacter matches characters that don’t match the \d metacharacter. So the characters that match the \D metacharacter include alphabetic characters (both English-language and non–English-language alphabetic characters) and whitespace characters such as space characters.
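The following is a minimal sketch, in Python rather than any tool used in this chapter, of which kinds of characters \D accepts. The sample characters are illustrative assumptions, and the re.ASCII flag is used so that \d means only 0 through 9, as described above (without it, Python's \d also accepts digits from other scripts).

```python
import re

# re.ASCII restricts \d to 0-9, so \D means "any character except 0-9".
NON_DIGIT = re.compile(r"\D", re.ASCII)

# Illustrative samples: a digit, an English letter, a non-English letter,
# a space character, and a tab character.
samples = ["7", "A", "é", " ", "\t"]
results = {ch: bool(NON_DIGIT.fullmatch(ch)) for ch in samples}

for ch, matched in results.items():
    print(repr(ch), "matches \\D" if matched else "does not match \\D")
```

Only the digit 7 is rejected; the letters and the whitespace characters all match.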

Try It Out

The \D Metacharacter

1. Open the Komodo Regular Expression Toolkit, and clear any residual regular expression patterns and test text.

2. Enter sample text 321ABC in the Enter a String to Match Against area.

3. Enter the regular expression pattern \D in the Enter a Regular Expression area.

4. Inspect the results in the area below and within the Enter a String to Match Against area. Below the area, the message Match succeeded: 0 groups is expected. Within the area, the A of 321ABC should be highlighted.

Figure 4-10 shows the result’s appearance in the Komodo Regular Expression Toolkit.

Figure 4-10

5. In the Enter a String to Match Against area, click the mouse between 321 and ABC of 321ABC, and press the spacebar once.

6. Inspect the results in the Enter a String to Match Against area and in the gray area below it. In the former, the space character between the 1 and A of 321ABC should be highlighted in pale green (on-screen). In the latter, the message Match succeeded: 0 groups should be displayed.

Figure 4-11 shows the result.


Chapter 4

Figure 4-11

How It Works

First, consider what happens when the test text is 321ABC. The regular expression engine starts at the position before the 3 of 321ABC and attempts to find a match. There is no match because the next character, the numeric digit 3, does not match the pattern \D (which matches characters that are not numeric digits). So the regular expression engine moves to the position after the 3 of 321ABC and again attempts to find a match. That attempt fails too, as does the attempt at the position before the 1, because 2 and 1 are also numeric digits. When the regular expression engine moves to the position before the A of 321ABC and attempts a match, it is successful, because uppercase A is not a numeric digit and, therefore, is a match for the \D pattern. As mentioned earlier, in the Komodo Regular Expression Toolkit, matching characters are highlighted in pale green on-screen.

After Step 5, there is a space character between the 1 and A of the test text 321ABC. The regular expression fails to match at every starting position that precedes the position immediately before the space character. When the regular expression engine starts at that position, the next character, a space character, is not a numeric digit and therefore matches the \D pattern.
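The position-by-position search described above can be imitated in a few lines of Python. This is only a sketch of the idea, not how Komodo or any particular engine is implemented; the helper name first_non_digit is hypothetical, and re.ASCII is assumed so that \D excludes only 0 through 9.

```python
import re

NON_DIGIT = re.compile(r"\D", re.ASCII)

def first_non_digit(text):
    """Return (index, character) for the first non-digit, or None."""
    for start in range(len(text)):
        # Pattern.match(text, start) attempts a match only at `start`,
        # which mirrors the engine trying one starting position at a time.
        m = NON_DIGIT.match(text, start)
        if m:
            return start, m.group()
    return None

print(first_non_digit("321ABC"))   # (3, 'A')
print(first_non_digit("321 ABC"))  # (3, ' ')
```

In both cases the attempts at positions 0, 1, and 2 fail on the digits 3, 2, and 1, and the first success comes at position 3: the A in the original text, and the space character after Step 5.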

Alternatives to \d and \D

The \d metacharacter matches numeric digits 0 through 9. It is possible to match the same digits using other, less succinct regular expression patterns. The techniques to do this involve alternation, which is described in Chapter 7, or character classes, which are described in Chapter 5. You saw an example of using a character class a little earlier in this chapter. However, a couple of simple examples using alternation are shown here so you can see how to handle the matching of digits or nondigits in implementations that do not support the \d and \D metacharacters.

The \d metacharacter is a succinct way of expressing the notion of “0 or 1 or 2 or 3 or 4 or 5 or 6 or 7 or 8 or 9.” That notion can be expressed in a closely corresponding way using the following pattern, where the vertical bar (sometimes called a pipe) signifies logical OR:

(0|1|2|3|4|5|6|7|8|9)


Figure 4-12 shows the use of the preceding pattern in OpenOffice.org Writer, which does not support the \d metacharacter.

Figure 4-12
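As a quick check of the equivalence, the sketch below runs the alternation pattern and \d over the same text in Python and compares what each finds. The sample text is a made-up assumption, and re.ASCII is used so that \d is limited to 0 through 9.

```python
import re

ALTERNATION = re.compile(r"(0|1|2|3|4|5|6|7|8|9)")
SHORTHAND = re.compile(r"\d", re.ASCII)

text = "Order 66, aisle 4B"  # hypothetical sample text

# Collect every single-character match found by each pattern.
alt_hits = [m.group() for m in ALTERNATION.finditer(text)]
short_hits = [m.group() for m in SHORTHAND.finditer(text)]

print(alt_hits)    # ['6', '6', '4']
print(short_hits)  # ['6', '6', '4']
```

Both patterns pick out the same three digits, so the longer alternation can stand in for \d where the shorthand is unavailable.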

A more succinct way to express the same idea is to use a character class, where the start and the end of the character class are expressed using the [ and ] metacharacters, respectively. To express a character class containing the numeric digits, you could write the following pattern:

[0123456789]

Or, more succinctly, use a range in a character class, as shown here:

[0-9]

Figure 4-13 shows the [0-9] character class used to match all numeric digits in the sample file, Digits.txt.


Figure 4-13
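The two character class forms can be compared directly in Python, as in the sketch below. The sample strings are assumptions made for illustration. The last line uses a negated character class, [^0-9], which is not covered in this section; it is shown only as one likely stand-in for \D where that metacharacter is not supported.

```python
import re

text = "Room 101, floor 3"  # hypothetical sample text

explicit = re.findall(r"[0123456789]", text)  # every digit listed
ranged = re.findall(r"[0-9]", text)           # the same set as a range

print(explicit)  # ['1', '0', '1', '3']
print(ranged)    # ['1', '0', '1', '3']

# A negated class matches any character that is NOT in the class.
non_digits = re.findall(r"[^0-9]", "4 B")
print(non_digits)  # [' ', 'B']
```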

Whitespace and Non-Whitespace Metacharacters

Whitespace characters may often occur in significant places in data. For example, in XML it is common to have whitespace inside the start tag of an XML element. Suppose you had a simple XML document that contained data about a person, Person.xml:

<?xml version='1.0'?>
<Person DateOfBirth="1970/01/12">
  <FirstName>John</FirstName>
  <LastName>Scoliosis</LastName>
</Person>

The DateOfBirth attribute has a single space character before it; otherwise, the element type name would be incorrectly read as PersonDateOfBirth. In addition, after the closing double quote that follows the value of the DateOfBirth attribute, there is no whitespace, but the rules of XML allow the use
