Beginning Regular Expressions 2005.pdf
Chapter 12

Metacharacters Available

The metacharacters used in OpenOffice.org Writer are similar to the wildcards used in Microsoft Word but are not identical to them; nor are they identical to those found in, for example, Perl regular expressions. The following table summarizes the metacharacters supported in OpenOffice.org Writer. POSIX character classes that are supported in OpenOffice.org Writer are described separately in a later section.

In the following table, the term chunk is used to refer to a single character or a group contained in paired parentheses.

Metacharacter    Description
.                Matches almost any character, including many symbols and characters not used in English. Used alone or with the ?, *, or + quantifiers.
?                A quantifier indicating zero or one occurrence of the preceding chunk.
*                A quantifier indicating zero or more occurrences of the preceding chunk.
+                A quantifier indicating one or more occurrences of the preceding chunk.
\n               Nonstandard usage. It matches a new line only when it was created using Shift+Enter.
^                The beginning-of-line position.
$                The end-of-line position.
\t               Matches a tab character.
\<               Matches the beginning-of-word position.
\>               Matches the end-of-word position.
&                Nonstandard. Behaves similarly to a back reference.
[]               Character class.
|                Alternation. Matching is of the chunk that precedes or follows the | metacharacter.
{n,m}            Quantifier syntax.

The following metacharacters are not supported:

\b

\s

\w

\d
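For readers who test patterns outside Writer, the word-position metacharacters in the table need translating. The following is a minimal sketch in Python, used here only as a stand-in engine: Python's re module does not treat \< and \> as word anchors, so the sketch emulates them with lookarounds. The names WORD_START and WORD_END and the sample text are hypothetical, and Writer's own notion of a word boundary may differ in detail.

```python
import re

# Lookaround-based stand-ins for Writer's \< and \> (hypothetical names).
WORD_START = r"(?<!\w)(?=\w)"   # position where a word begins
WORD_END = r"(?<=\w)(?!\w)"     # position where a word ends

text = "cat concat catalog"

# Offsets where "cat" begins a word, and where "cat" ends a word.
starts = [m.start() for m in re.finditer(WORD_START + "cat", text)]
ends = [m.start() for m in re.finditer("cat" + WORD_END, text)]

print(starts)  # [0, 11]
print(ends)    # [0, 7]
```

In Writer itself, the equivalent searches would be written \<cat and cat\>.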

Regular Expressions in StarOffice/OpenOffice.org Writer

Quantifiers

The use of quantifiers in OpenOffice.org Writer is standard. The ?, *, +, and {n,m} quantifiers are all available.

The test file, AandSomeBs.txt, is shown here:

ABC

ABBC

AC

A3C

ABBBBBBC

AbbCC

The pattern AB?C matches the character sequences ABC and AC, as shown in Figure 12-3. Each of the matched character sequences consists of an uppercase A, followed by zero or one uppercase B, followed by an uppercase C.

Figure 12-3

If the pattern is edited to AB*C, the character sequences matched are ABC, ABBC, AC, and ABBBBBBC. Each of the matched character sequences consists of an uppercase A, followed by zero or more uppercase Bs, followed by an uppercase C.

If the pattern is edited to AB+C, the character sequence AC no longer matches, because it does not have an uppercase B. The pattern AB+C means “Match an uppercase A, followed by one or more uppercase Bs, followed by an uppercase C.”

If the pattern is edited to AB{2,4}C, only the character sequence ABBC matches, because it is the only character sequence in the test text that has an uppercase A, followed by between two and four uppercase Bs, followed by an uppercase C. ABC has only one B, and ABBBBBBC has six.
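The four quantifier patterns above can also be checked outside Writer. The sketch below uses Python's re module as a stand-in engine, which is an assumption: it is not the engine Writer uses, although these particular quantifiers behave the same way in both. The helper name matched_sequences is hypothetical, and matching is case-sensitive, as if Match Case were checked.

```python
import re

# Contents of AandSomeBs.txt, one entry per line as listed above.
LINES = ["ABC", "ABBC", "AC", "A3C", "ABBBBBBC", "AbbCC"]

def matched_sequences(pattern, lines=LINES):
    """Return the first matched substring from each line that contains a match."""
    compiled = re.compile(pattern)  # case-sensitive by default in Python
    found = (compiled.search(line) for line in lines)
    return [m.group(0) for m in found if m]

for pattern in ("AB?C", "AB*C", "AB+C", "AB{2,4}C"):
    print(pattern, matched_sequences(pattern))
# AB?C ['ABC', 'AC']
# AB*C ['ABC', 'ABBC', 'AC', 'ABBBBBBC']
# AB+C ['ABC', 'ABBC', 'ABBBBBBC']
# AB{2,4}C ['ABBC']
```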

Modes

OpenOffice.org Writer supports both case-insensitive (the default) and case-sensitive matching. The Match Case check box is the interface tool that controls which mode is used in matching.
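The effect of the two modes can be illustrated with a short Python sketch. Python is only a stand-in here, and its default is the opposite of Writer's: Python matches case-sensitively unless re.IGNORECASE is passed, whereas Writer matches case-insensitively unless Match Case is checked. The sample line is taken from AandSomeBs.txt.

```python
import re

text = "AbbCC"
pattern = "AB{2,4}C"

# Equivalent to Match Case checked: lowercase b does not satisfy B.
case_sensitive = re.search(pattern, text)

# Equivalent to Match Case unchecked: lowercase b is accepted for B.
case_insensitive = re.search(pattern, text, re.IGNORECASE)

print(case_sensitive)             # None
print(case_insensitive.group(0))  # AbbC
```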

Character Classes

The implementation of character classes in OpenOffice.org Writer is pretty much standard. Ranges are supported, as are negated character classes.

OpenOffice.org Writer does not support the \d metacharacter, which matches numeric digits, or the \w metacharacter, which matches word characters. Therefore, the regular expression author must use character classes instead: [0-9] to match a numeric digit and [A-Za-z] to match an alphabetic character of either case. Because \w in most implementations also matches digits and the underscore, [A-Za-z0-9_] is the closer equivalent when those characters matter. These character classes can be qualified by any of the quantifiers mentioned in the preceding “Quantifiers” section.
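As a rough illustration of these substitutions, the following Python sketch compares the explicit character classes with the shorthand they replace. Python is used only as a stand-in engine that happens to support both forms; the sample string is hypothetical, and re.ASCII is passed so that \d and \w are limited to ASCII characters for the comparison.

```python
import re

sample = "RD2 K9 993ABC under_score"

# Explicit classes, with quantifiers, as they would be typed in Writer.
digits = re.findall(r"[0-9]+", sample)
letters = re.findall(r"[A-Za-z]+", sample)
non_digits = re.findall(r"[^0-9 ]+", sample)  # negated class (space excluded too)

# The explicit classes agree with the ASCII-only shorthand forms.
same_as_d = digits == re.findall(r"\d+", sample, re.ASCII)
same_as_w = re.findall(r"[A-Za-z0-9_]+", sample) == re.findall(r"\w+", sample, re.ASCII)

print(digits)      # ['2', '9', '993']
print(letters)     # ['RD', 'K', 'ABC', 'under', 'score']
print(non_digits)  # ['RD', 'K', 'ABC', 'under_score']
print(same_as_d, same_as_w)  # True True
```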

The following test text, ClassTest.txt, is used in the Try It Out exercise that follows:

AB1

RD2

K9

993ABC

ABCDEFGHIJKLMNOPQRSTUVWXYZ

abcdefghijklmnopqrstuvwxyz

0123456789

Try It Out

Character Classes

1. Open OpenOffice.org Writer, and open the test file ClassTest.txt.

2. Open the Find & Replace dialog box using the Ctrl+F keyboard shortcut.

3. Check the Regular Expressions and Match Case check boxes.

4. In the Search For text box, type the pattern [0-9].

5. Click the Find All button, and inspect the results. As shown in Figure 12-4, all the numeric digits in the test document match the character class [0-9].

6. Edit the pattern in the Search For text box to [A-Z].

7. Click the Find All button, and inspect the results. As shown in Figure 12-5, all the uppercase alphabetic characters in the test document match the character class [A-Z].

8. Edit the pattern in the Search For text box to [a-z].

9. Click the Find All button, and inspect the results. All lowercase alphabetic characters should now be highlighted as matches of the new pattern.

10. Uncheck the Match Case check box.

11. Click the Find All button, and inspect the results. As shown in Figure 12-6, both lowercase and uppercase alphabetic characters are now highlighted as matches.

Figure 12-4

Figure 12-5

Figure 12-6
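The exercise above can be approximated outside Writer to check how many characters each search should highlight. This is a sketch only, using Python's re module as a stand-in for Writer's Find All; the function name find_all_count is hypothetical, and the match_case parameter mimics the Match Case check box.

```python
import re

# Contents of ClassTest.txt as listed above.
CLASS_TEST = "\n".join([
    "AB1", "RD2", "K9", "993ABC",
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ",
    "abcdefghijklmnopqrstuvwxyz",
    "0123456789",
])

def find_all_count(pattern, match_case=True, text=CLASS_TEST):
    """Count the matches a Find All with this pattern would be expected to highlight."""
    flags = 0 if match_case else re.IGNORECASE
    return len(re.findall(pattern, text, flags))

print(find_all_count("[0-9]"))                    # 16 digits (steps 4-5)
print(find_all_count("[A-Z]"))                    # 34 uppercase letters (steps 6-7)
print(find_all_count("[a-z]"))                    # 26 lowercase letters (steps 8-9)
print(find_all_count("[a-z]", match_case=False))  # 60 letters of either case (steps 10-11)
```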