Beginning Regular Expressions 2005.pdf

Chapter 3

You will find matches on four lines, as shown in Figure 3-11. The preceding command line will work correctly only if the ABC123.txt file is in the current directory. If it is in a different directory, you will need to reflect that in the path for the file that you enter at the command line.

Figure 3-11

The next section will combine the techniques that you have seen so far to find a combination of literally expressed characters and a sequence of characters.

Matching Sequences of Different Characters

A common task in simple regular expressions is to find a combination of literally specified single characters plus a sequence of characters.

There is an almost infinite number of possibilities in terms of characters that you could test. Let’s focus on a very simple list of part numbers and look for part numbers with the code DOR followed by three numeric digits. In this case, the regular expression should do the following:

Look for a match for uppercase D. If a match is found, check if the next character matches uppercase O. If that matches, next check if the following character matches uppercase R. If those three matches are present, check if the next three characters are numeric digits.

Try It Out

Finding Literal Characters and Sequences of Characters

The file PartNumbers.txt is the sample file for this example.

BEF123
RRG417
DOR234
DOR123
CCG991

Simple Regular Expressions

First, try it in OpenOffice.org Writer, remembering that you need to use the regular expression pattern [0-9] instead of \d.

1. Open the file PartNumbers.txt in OpenOffice.org Writer, and open the Find and Replace dialog box by pressing Ctrl+F.

2. Check the Regular Expression check box and the Match Case check box.

3. Enter the pattern DOR[0-9][0-9][0-9] in the Search For text box, and click the Find All button.

The character sequences DOR234 and DOR123 are highlighted, indicating that those are matches for the regular expression.
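The same search can be sketched outside OpenOffice.org Writer. The following is a minimal illustration in Python, which is not one of the tools used in this chapter; it assumes the sample file holds one part number per line, and the variable names are chosen only for this example. Python's re module is case sensitive by default, which corresponds to checking the Match Case check box.

```python
import re

# Contents of PartNumbers.txt as listed above, assumed one part number per line.
part_numbers = "BEF123\nRRG417\nDOR234\nDOR123\nCCG991\n"

# Literal D, O, and R followed by three character classes, each matching one digit.
pattern = re.compile(r"DOR[0-9][0-9][0-9]")

# findall returns every non-overlapping match, scanning from left to right.
matches = pattern.findall(part_numbers)
print(matches)  # ['DOR234', 'DOR123']
```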

How It Works

The regular expression engine first looks for the literal character uppercase D, examining each character of the text in turn to determine whether it matches.

If a match is found, the regular expression engine checks whether the next character is an uppercase O. If that too matches, it checks whether the third character is an uppercase R. If all three of those characters match, the engine next checks whether the fourth character is a numeric digit. If so, it checks whether the fifth character is a numeric digit and then whether the sixth character is a numeric digit. If all six characters match, the entire regular expression pattern is matched. Each match is displayed in OpenOffice.org Writer as a highlighted sequence of characters.
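The character-by-character checks just described can be written out by hand. The sketch below is only an illustration of that sequence of checks for this one pattern; it is not how OpenOffice.org Writer or any real regular expression engine is implemented, and the function names matches_at and find_all are invented for this example.

```python
DIGITS = "0123456789"

def matches_at(text, start):
    """Check whether D, O, R and three digits begin at position start."""
    # First the three literal characters, in order.
    for offset, expected in enumerate("DOR"):
        if text[start + offset] != expected:
            return False
    # Then the fourth, fifth, and sixth characters must each be a digit.
    for offset in range(3, 6):
        if text[start + offset] not in DIGITS:
            return False
    return True

def find_all(text):
    """Try each starting position in turn and collect the six-character matches."""
    results = []
    start = 0
    while start + 6 <= len(text):
        if matches_at(text, start):
            results.append(text[start:start + 6])
            start += 6  # continue searching after the end of the match
        else:
            start += 1  # no match here, so move on to the next character
    return results

print(find_all("BEF123\nRRG417\nDOR234\nDOR123\nCCG991\n"))  # ['DOR234', 'DOR123']
```

A failed check at any step abandons the attempt at that position, which is why DOR12 followed by a space would not be reported as a match.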

You can check the PartNumbers.txt file for lines that contain a match for the pattern:

DOR[0-9][0-9][0-9]

using the findstr utility from the command line, as follows:

findstr /N DOR[0-9][0-9][0-9] PartNumbers.txt

As you can see in Figure 3-12, lines containing the same two matching sequences of characters, DOR234 and DOR123, are matched. If the directory that contains the file PartNumbers.txt is not the current directory in the command window, you will need to adjust the path to the file accordingly.
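For readers without access to findstr, a line-oriented search with line numbers can be approximated in Python. This is a rough sketch rather than a reimplementation of findstr: the function name numbered_matches is hypothetical, and the sample lines assume that PartNumbers.txt contains one part number per line.

```python
import re

def numbered_matches(lines, pattern=r"DOR[0-9][0-9][0-9]"):
    """Return (line number, line text) for each line containing a match."""
    regex = re.compile(pattern)
    hits = []
    for number, line in enumerate(lines, start=1):
        # search looks for the pattern anywhere in the line.
        if regex.search(line):
            hits.append((number, line.rstrip("\n")))
    return hits

# Stand-in for the lines of PartNumbers.txt (assumed layout).
sample_lines = ["BEF123\n", "RRG417\n", "DOR234\n", "DOR123\n", "CCG991\n"]

hits = numbered_matches(sample_lines)
for number, line in hits:
    print(f"{number}:{line}")
```

Because a Python file object yields its lines when iterated, an opened file could be passed to numbered_matches in place of sample_lines.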

Figure 3-12

The Komodo Regular Expression Toolkit can also be used to test the pattern DOR\d\d\d. As you can see in Figure 3-13, the test text DOR123 matches.
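The relationship between the \d and [0-9] forms of the pattern can also be checked in a language that supports both. The following Python sketch is an illustration only and does not use the Komodo toolkit. It passes the re.ASCII flag so that \d is restricted to the digits 0 through 9; without that flag, Python 3 lets \d match other Unicode decimal digits as well.

```python
import re

sample = "BEF123\nRRG417\nDOR234\nDOR123\nCCG991\n"

# The same pattern written with the \d metacharacter and with a character class.
with_metacharacter = re.compile(r"DOR\d\d\d", re.ASCII)
with_character_class = re.compile(r"DOR[0-9][0-9][0-9]")

metacharacter_matches = with_metacharacter.findall(sample)
character_class_matches = with_character_class.findall(sample)

print(metacharacter_matches)    # ['DOR234', 'DOR123']
print(character_class_matches)  # ['DOR234', 'DOR123']
```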

Now that you have looked at how to match sequences of characters, each of which occurs exactly once, let’s move on to look at matching characters that can occur a variable number of times.
