Beginning Regular Expressions 2005.pdf

String, Line, and Word Boundaries

Using the ^ and $ Metacharacters Together

Using the ^ and $ metacharacters together lets you identify lines that consist entirely of the desired characters. This is particularly useful when validating user input.
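As a rough sketch of the validation use case, the following Python snippet anchors a pattern at both ends so that the whole input must consist of the desired characters. The digits-only rule and the name is_all_digits are hypothetical, chosen only for illustration; Python's re module stands in here for whatever tool you use.

```python
import re

# Hypothetical rule for illustration: the whole input must be digits.
DIGITS_ONLY = re.compile(r"^[0-9]+$")


def is_all_digits(value: str) -> bool:
    """Return True when the anchored pattern matches the input."""
    return DIGITS_ONLY.search(value) is not None


print(is_all_digits("12345"))     # True
print(is_all_digits("12a45"))     # False
print(is_all_digits("order 12"))  # False
```

Note that in Python the $ metacharacter also matches just before a single trailing newline, so "12345\n" is accepted by this sketch. If that matters, strip the input first or use re.fullmatch instead.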

The sample text, ABCPartNumbers.txt, is shown here:

ABC123

There is a part number ABC123.

ABC234

A purchase order for 400 of ABC345 was received yesterday.

ABC789

Notice that some lines consist only of a part number, whereas other lines include the part number as part of some surrounding text.

The intention is to match lines that consist only of a part number. The problem definition is as follows:

Match a beginning of line position, followed by the literal sequence of characters A, B, and C, followed by three numeric digits, followed by a position that is either the end-of-line position or an end-of-string position.
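The problem definition maps onto the pattern piece by piece. The sketch below spells that out in Python, using re.VERBOSE for the comments and re.MULTILINE so that ^ and $ match at line boundaries. The constant names are assumptions, and the sample text is retyped from the listing above.

```python
import re

# Sample text retyped from ABCPartNumbers.txt as shown above.
SAMPLE = (
    "ABC123\n"
    "There is a part number ABC123.\n"
    "ABC234\n"
    "A purchase order for 400 of ABC345 was received yesterday.\n"
    "ABC789\n"
)

PART_NUMBER_LINE = re.compile(
    r"""
    ^          # beginning-of-line position
    ABC        # the literal characters A, B, and C
    [0-9]{3}   # exactly three numeric digits
    $          # end-of-line or end-of-string position
    """,
    re.MULTILINE | re.VERBOSE,
)

print(PART_NUMBER_LINE.findall(SAMPLE))  # ['ABC123', 'ABC234', 'ABC789']
```

Python patterns are case sensitive by default, which corresponds to checking Match Case in a tool that offers that option.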

Try It Out

Matching Part Numbers

This example demonstrates using the ^ and $ metacharacters in the same pattern:

1. Open OpenOffice.org Writer, and open the test file ABCPartNumbers.txt.

2. Open the Find & Replace dialog box, using the Ctrl+F keyboard shortcut, and check the Regular Expressions and Match Case check boxes.

3. Enter the pattern ^ABC[0-9]{3}$ in the Search For text box.

4. Click the Find All button, and inspect the highlighted text, as shown in Figure 6-9. Notice how three occurrences of a sequence of characters representing a part number are highlighted as matches, while two occurrences of a part number are not highlighted because they are not matches.


Chapter 6

Figure 6-9

How It Works

The regular expression engine begins the matching process at the start of the test file. It attempts to match the ^ metacharacter against the current position. There is a match. It next attempts to match the literal character A in the pattern against the first character in the line, which is uppercase A. There is a match. The matching process is repeated successfully for the literal characters B and C. Then the regular expression engine attempts to match the pattern [0-9]{3}. It attempts to match the character class [0-9] against the character 1 in the test text. That matches. It then proceeds to match the character class [0-9] a second time, this time against the character 2. That also matches. It next proceeds to match the character class [0-9] for a third time, as indicated by the {3} quantifier, against the character 3. That too matches. Finally, it attempts to match the $ metacharacter against the position following the character 3. That matches because the position immediately precedes a Unicode newline character. Each component of the pattern matches; therefore, the entire pattern matches.

At the beginning of the second line, the regular expression engine successfully matches the ^ metacharacter. It next attempts to match the literal character A in the pattern against the first character on the line, an uppercase T. The attempt at matching fails. Every subsequent attempt on that line fails at the ^ metacharacter, because no later position on the line is the beginning of the line. That is why the part number embedded in the sentence is not matched.
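To see the effect of the anchors, it can help to compare the anchored pattern with the same pattern without ^ and $. The following Python sketch tests each line of the sample separately. The function names are hypothetical, and Python's re module may differ in detail from OpenOffice.org Writer.

```python
import re

# The five lines of ABCPartNumbers.txt, one string per line.
LINES = [
    "ABC123",
    "There is a part number ABC123.",
    "ABC234",
    "A purchase order for 400 of ABC345 was received yesterday.",
    "ABC789",
]

ANCHORED = re.compile(r"^ABC[0-9]{3}$")   # whole line must be a part number
UNANCHORED = re.compile(r"ABC[0-9]{3}")   # part number anywhere in the line


def whole_line_matches(lines):
    """Lines that consist only of a part number."""
    return [line for line in lines if ANCHORED.search(line)]


def lines_containing_part_number(lines):
    """Lines that contain a part number somewhere."""
    return [line for line in lines if UNANCHORED.search(line)]


print(len(whole_line_matches(LINES)))            # 3
print(len(lines_containing_part_number(LINES)))  # 5
```

The two lines that contain a part number inside a sentence are found only by the unanchored pattern, which corresponds to the two occurrences that are not highlighted in the walkthrough above.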

Matching Blank Lines

One potential use of the ^ and $ metacharacters together is to match blank lines. The following pattern should match a blank line, because the ^ metacharacter signifies the beginning of the line and the $ metacharacter signifies the position immediately before a Unicode newline character or at the end of the test string.

^$

However, not all tools support this pattern.

The test file, WithBlankLines.txt, is shown here:

Line 1

Line 3 which follows a blank line

Line 5 which follows a second blank line

Line 7 which follows a third blank line

After Line 7, there are two further blank lines to end the test file.
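As a hedged illustration outside OpenOffice.org Writer, the following Python sketch rebuilds the test file from its description and reports which lines the ^$ pattern matches. The variable names are assumptions, and the text is checked one line at a time so that the result does not depend on how a particular engine treats the end of the string.

```python
import re

# WithBlankLines.txt as described above: blank lines at lines 2, 4, and 6,
# then two further blank lines after Line 7.
TEXT = (
    "Line 1\n"
    "\n"
    "Line 3 which follows a blank line\n"
    "\n"
    "Line 5 which follows a second blank line\n"
    "\n"
    "Line 7 which follows a third blank line\n"
    "\n"
    "\n"
)

BLANK = re.compile(r"^$")

blank_line_numbers = [
    number
    for number, line in enumerate(TEXT.splitlines(), start=1)
    if BLANK.search(line)
]
print(blank_line_numbers)  # [2, 4, 6, 8, 9]
```

In this sketch Python also reports the two trailing blank lines. Other tools may treat blank lines at the end of a file differently.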

Try It Out

Replacing Blank Lines

1. Open OpenOffice.org Writer, and open the test file WithBlankLines.txt.

2. Open the Find & Replace dialog box using the Ctrl+F keyboard shortcut, and check the Regular Expressions and Match Case check boxes.

3. Enter the pattern ^$ in the Search For text box.

4. Click the Find All button, and inspect the results, as shown in Figure 6-10.


Figure 6-10

Each blank line, except the last two, is highlighted as a match. If you try to scroll down, you will find that OpenOffice.org Writer has lost one of the blank lines that is present if you open the WithBlankLines.txt file in Notepad. If you manually reenter one of the blank lines that OpenOffice.org Writer strips out, an additional blank line will match. A blank line at the end of a file seems not to match in OpenOffice.org Writer.

5. Click the Replace All button, and inspect the results, as shown in Figure 6-11. Notice that the three previously highlighted blank lines have been deleted.


Figure 6-11

How It Works

The second line of the original test file is a blank line. When the regular expression engine is at the position at the beginning of that blank line, matching is attempted against the ^ metacharacter. There is a match. Without moving its position, the regular expression engine then attempts to match the $ metacharacter against the same position. Because that position immediately precedes a Unicode newline character, there is a match for the $ metacharacter, too. Therefore, the entire pattern matches. In OpenOffice.org Writer, the matching of the blank line leads to the entire width of the text area on that line being highlighted.

When the regular expression engine is at the beginning of the third line of the original file, it first attempts to match the ^ metacharacter. That matches. It next attempts to match the $ metacharacter against the same position. Because that position is followed by an uppercase L rather than a Unicode newline character, the $ metacharacter does not match, and the attempt fails.
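The point that ^ and $ both match at the same position, without consuming any characters, can be checked in Python. The sketch below is illustrative only and uses a shortened version of the test text. It shows that the match is zero-width, and that in Python, unlike the Replace All behavior described above for OpenOffice.org Writer, deleting a blank line requires a pattern that also consumes the newline.

```python
import re

# Shortened version of the test text, with blank lines at lines 2 and 4.
TEXT = (
    "Line 1\n"
    "\n"
    "Line 3 which follows a blank line\n"
    "\n"
    "Line 5 which follows a second blank line\n"
)

# ^ and $ match at the same position, so the match is zero-width.
first = re.search(r"^$", TEXT, flags=re.MULTILINE)
print(first.span())  # (7, 7): the start of line 2

# Replacing a zero-width match with nothing leaves the text unchanged.
unchanged = re.sub(r"^$", "", TEXT, flags=re.MULTILINE)
print(unchanged == TEXT)  # True

# Consuming the newline after ^ actually removes each blank line.
without_blanks = re.sub(r"^\n", "", TEXT, flags=re.MULTILINE)
print(without_blanks.splitlines())
```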
