Beginning Regular Expressions 2005.pdf

2

Regular Expression Tools and an Approach to Using Them

This chapter takes a preliminary look at many of the tools with which you can use regular expressions. Basic familiarity with those tools allows the next several chapters to build up a range of examples progressively, so that individual aspects of regular expressions can be demonstrated in several usage scenarios.

Many tools that can be used on the Windows platform have at least some support for regular expressions or, to use a term found in Word and some other programs, wildcards. The tools described in the following sections are discussed in more detail in later chapters of this book.

The second part of the chapter takes you step by step through a fairly rigorous approach to using regular expressions. For all but the very simplest regular expressions, a systematic approach helps you ensure that the expressions you create do what you want them to do. It also makes maintenance of those regular expressions easier and more efficient, particularly when you carefully document them.

In this chapter, you will learn the following:

How to use some regular expression tools

How regular expressions are used in some popular programming languages and database management systems

Regular Expression Tools

This section introduces several utilities, tools, and languages with which you can use regular expressions on the Windows platform. Some of the tools mentioned (for example, MySQL) can be used on several platforms, not only on Windows. The brief introductory descriptions in this chapter apply to the behavior of those tools on the Windows platform. Note that the versions of some tools built for different platforms differ a little in their behavior.

In the examples that follow, it is assumed that you have downloaded the sample code for the book and have installed it in the directory C:\BRegExp\Ch02. If you have installed the code in some other location, you will need to make appropriate adjustments to the instructions given.

One of the issues discussed on many occasions in this book is the variation among tools in how they implement regular expressions. These introductory descriptions mention some variations in implementations or nonstandard usages, but fuller descriptions of these differences are found in later chapters.

findstr

The findstr utility is a command-line utility found in several versions of Windows. To run the findstr utility in Windows XP, simply open a command prompt window and type the following at the command-line prompt:

findstr /?

You should see output similar to that shown in Figure 2-1.

Figure 2-1

If you type just findstr at the command line, with no arguments, you will likely receive a bad-command-line error message, because you have not supplied the required command-line parameters.

In the C:\BRegExp\Ch02 directory are a couple of test files that you can use. The first is named Test.txt, and its contents are shown here:

test text

tent teat

You can use the findstr utility to locate any line, in any file in the current directory, that contains the text tent by using the following command:

findstr /N tent *.*

Strictly speaking, the tent on the command line is a regular expression pattern. The /N switch causes the line number to be displayed for each line that contains a match, together with the filename and the text of the matching line. The result of the preceding command is shown in Figure 2-2.

Figure 2-2

As you can see in Figure 2-2, the filename is displayed first, followed by the number of the line that contains a match for the literal regular expression pattern tent, followed by the text contained on the line that has a match for the regular expression pattern tent.
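The output layout just described (filename, then line number, then the text of the matching line) can be imitated in a few lines of Python. This is only a rough sketch: it uses Python's re module rather than the findstr engine, the function name findstr_n is hypothetical, and the in-memory sample file assumes that Test.txt consists of the two lines listed earlier.

```python
import re

def findstr_n(pattern, files):
    """Return 'filename:line_number:line' for every line that contains a match.

    `files` maps a filename to its text; this is a hypothetical stand-in
    for the files that *.* would select in a real directory.
    """
    regex = re.compile(pattern)
    results = []
    for name, text in files.items():
        for number, line in enumerate(text.splitlines(), start=1):
            if regex.search(line):
                results.append(f"{name}:{number}:{line}")
    return results

# Assumed contents of Test.txt, taken from the listing in the text.
sample_files = {"Test.txt": "test text\ntent teat\n"}

for hit in findstr_n("tent", sample_files):
    print(hit)
```

With the assumed two-line file, the only line reported is the second one, because it is the only line that contains the literal sequence tent.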

There are some limitations in how you can use regular expressions with the findstr utility. For example, findstr lacks a metacharacter that makes a character optional by matching it zero or one time (the ? quantifier in many other implementations). For some purposes, the * metacharacter, which makes a character optional but also allows it to occur more than once, provides appropriate functionality. At other times, it allows undesired matches.
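The difference between the two quantifiers can be seen in a small Python sketch. Python's re module supports both, so it is used here purely for illustration. The patterns and test words are hypothetical and do not come from the book's sample files.

```python
import re

# "Zero or one" quantifier: the u is optional but may appear at most once.
optional_once = re.compile(r"colou?r")
# "Zero or more" quantifier: the u is optional and may also repeat.
zero_or_more = re.compile(r"colou*r")

for word in ["color", "colour", "colouur"]:
    once = bool(optional_once.fullmatch(word))
    many = bool(zero_or_more.fullmatch(word))
    print(f"{word}: zero-or-one={once}, zero-or-more={many}")
```

Both patterns accept color and colour, but only the zero-or-more pattern also accepts colouur. That is the kind of undesired match that can appear when a zero-or-more quantifier stands in for a missing zero-or-one quantifier.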

The use of regular expressions with the findstr utility is described in more detail in Chapter 13.

Microsoft Word

Microsoft Word provides wildcards, which are an incomplete and nonstandard implementation of a few fairly simple pieces of regular expression functionality.

Wildcards differ from many regular expression implementations in, for example, the range of metacharacters that are supported, but they serve the same purpose: matching patterns of characters in text.

To make use of wildcard functionality in Microsoft Word, use the keyboard shortcut Ctrl+F to open the Find and Replace dialog box. By default, Word searches for literal text. To turn on wildcard functionality, you check the appropriate check box, which becomes accessible when you click the More button in the Find and Replace dialog box (see Figure 2-3, which shows its appearance in Word 2003).

Figure 2-3

Further options are then displayed in the Find and Replace dialog box. To use the wildcard functionality, click the Use Wildcards check box, as shown in Figure 2-4.

Figure 2-4

For example, you can search for words that differ from one another by a single character. You might want to find the following words, which are contained in the sample file ight.txt:

right

sight

might

light

You could find them all in a Microsoft Word document using the following regular expression pattern:

?ight

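The question mark is a nonstandard Word wildcard that stands for any single character. In most regular expression implementations that role is played by the period, and the question mark is instead a quantifier.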

Word allows only one match to be highlighted at a time. Figure 2-5 shows the result of using the regular expression pattern ?ight to search the document one match at a time. Figure 2-6 shows the match of sight, which is highlighted after you click the Find Next button twice.

Figure 2-5

Figure 2-6

Try It Out

Find a Match Using Word

To try out this simple example, follow these instructions:

1. Open Microsoft Word.

2. Open the file ight.txt.

3. Use the Ctrl+F keyboard shortcut to open the Find and Replace dialog box.

4. Click the More button.

5. Check the Use Wildcards check box.

6. Type ?ight in the Find What text box.

7. Click the Find Next button to highlight the first match.

8. Click the Find Next button again to find other matches.

Each of the four words in the file ight.txt is matched in turn.
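For comparison, the same one-match-at-a-time search can be sketched outside Word in Python. This is an approximation under stated assumptions: Word's ?ight wildcard is replaced by the regular expression .ight, and the contents of ight.txt are assumed to be the four listed words, one per line.

```python
import re

# Assumed contents of ight.txt: the four words from the text, one per line.
document = "right\nsight\nmight\nlight\n"

# The regular expression .ight approximates Word's ?ight wildcard:
# any one character (other than a newline) followed by the literal text ight.
pattern = re.compile(r".ight")

# finditer yields matches in document order, much like repeated Find Next clicks.
matches = [m.group() for m in pattern.finditer(document)]
for position, word in enumerate(matches, start=1):
    print(f"Find Next #{position}: {word}")
```

Each of the four words is reported in turn, with sight as the second match, which agrees with the order in which the Find Next button steps through the document.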
