Beginning Regular Expressions 2005.pdf

Regular Expressions in StarOffice/OpenOffice.org Writer

The pattern in Step 8, ^.*Heaven.*Hell.*$, uses the $ metacharacter, which matches the end of a paragraph in OpenOffice.org Writer. The ^ metacharacter matches the beginning-of-paragraph position, the first .* matches zero or more characters, Heaven matches literally, the second .* matches zero or more characters, Hell matches literally, the final .* matches zero or more characters, and the $ metacharacter matches the end-of-paragraph position. In effect, the pattern matches any paragraph in which Heaven occurs somewhere before Hell.
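The same pattern can be tried outside Writer. The following is a minimal Python sketch, assuming that each line of a string stands in for one Writer paragraph. With re.MULTILINE, Python's ^ and $ match at line boundaries in much the way Writer's match at paragraph boundaries. The sample text is invented for illustration.

```python
import re

# Assumption: each line of this string stands in for one Writer paragraph.
# With re.MULTILINE, ^ and $ match at the start and end of every line,
# which mimics how Writer applies them to paragraphs.
text = (
    "Heaven and Hell are both mentioned here.\n"
    "Hell comes first, then Heaven.\n"
    "Only Heaven appears in this one.\n"
    "From Heaven to Hell and back again.\n"
)

pattern = re.compile(r"^.*Heaven.*Hell.*$", re.MULTILINE)

# Each match is a whole line in which Heaven precedes Hell.
matches = pattern.findall(text)
for paragraph in matches:
    print(paragraph)
```

Only the first and fourth lines are printed. Because . does not match a newline by default in Python, a match cannot run past the end of a line, which mirrors how the Writer pattern stays inside one paragraph.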

Search-and-Replace Example

The following example illustrates a very practical use of regular expressions in OpenOffice.org Writer.

Online Chats

Information tools are changing very fast, and online chats can be one of the most useful places to keep up with cutting-edge information. However, unedited transcripts of many online chats are very difficult to read because the actual chat is often swamped by messages about participants joining or leaving. Regular expressions make it quick to clean up such documents.

A highly simplified sample document, Interesting Chat.sxw, is shown here:

Some interesting chat
A welcome message.
Some interesting information.
Somebody says something interesting.
8Andrew Smith has joined the conversation
8Jane Callander has left the conversation.
Another piece of real chat.
8Harry Danvers has joined the conversation
8Carol Clairvoyant has left the conversation
8Ceridwen Davies has joined the conversation.
Another real comment.

The 8 in the preceding sample is the representation of the nonalphabetic character used by the chat software to flag the joining and leaving actions.

On a really busy chat, the joining and leaving information can totally dominate the real information. For example, when I applied this technique to a real chat transcript on a day I was writing this chapter, over 1,200 lines were replaced.

Figure 12-12 shows the visual appearance of the sample document. Notice the right-pointing arrow at the beginning of lines that contain information about joining and leaving.

The aim is to remove the extraneous information about joining and leaving, making the document easier to read so the theme of the chat can be better assimilated. The problem definition is as follows:

Delete all lines that contain information about individuals joining or leaving the chat.


Chapter 12

Figure 12-12

The chat software conveniently presents all joining and leaving information on a separate line, making the task straightforward. The problem definition for the sample chat transcript can be refined as follows:

Match all lines that begin with a special character that the chat software uses, followed by zero or more characters of any kind, followed by an end-of-line position. Replace all matches with nothing.
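The refined problem definition can be tried in Python as a sketch. The flag character is an assumption: the hypothetical FLAG constant uses U+2192 (→) to stand in for whatever glyph a particular chat client emits, and the transcript is a shortened version of the sample shown earlier.

```python
import re

# Hypothetical stand-in for the chat software's special flag character;
# the real transcript uses a right-arrow glyph.
FLAG = "\u2192"  # →

transcript = "\n".join([
    "Some interesting chat",
    "A welcome message.",
    FLAG + "Andrew Smith has joined the conversation",
    FLAG + "Jane Callander has left the conversation.",
    "Another piece of real chat.",
])

# ^ and $ act per line because of re.MULTILINE; re.escape guards against
# a flag character that happens to be a regex metacharacter.
join_leave = re.compile("^" + re.escape(FLAG) + ".*$", re.MULTILINE)

# "Replace all matches with nothing": the flagged text disappears,
# but each flagged line survives as an empty line.
first_pass = join_leave.sub("", transcript)
print(first_pass.splitlines())
```

Replacing each match with nothing removes the text of the flagged lines but not the lines themselves, which is why the Try It Out procedure needs a second pass to get rid of the resulting blank lines.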

The real-life example I mentioned earlier was in a Microsoft Word document. Because Microsoft Word has no metacharacter to match a beginning-of-line or end-of-line position, it was more convenient to carry out the search and replace in OpenOffice.org Writer. I opened the document in OpenOffice.org Writer and used Writer’s more complete regular expression support to do the job.

The default behavior of Writer when opening a Word document is to open it read-only. To edit the document, click the Edit button on the toolbar; Writer asks whether you want to edit the document. Choosing Yes opens a new Writer (.sxw) document, which you can then clean up using Writer’s regular expressions. Afterward, save the cleaned document in Word format by using Writer’s Save As option.

Try It Out

Tidying Up an Online Chat Transcript

1. Open OpenOffice.org Writer; then open the test file Interesting Chat.sxw.

2. Open the Find & Replace dialog box using the Ctrl+F keyboard shortcut.

3. Check the Regular Expressions and Match Case check boxes.

4. Highlight the right-arrow symbol on one line of text and copy it to the clipboard (Ctrl+C).


5. In the Search For text box, type the ^ character, paste in the right-arrow symbol, and then type .*$. You should see the pattern shown in Figure 12-13 in the Search For text box. Notice that the pasted right arrow is displayed as a hollow square. Although the display is ambiguous, the matching proceeds correctly. Leave the Replace With text box blank.

6. Click the Find button once. The first line containing the right-arrow symbol is highlighted.

7. Click Replace once. The line that was highlighted after Step 6 is now blank.

8. Click the Replace All button once. All lines that contain the right-arrow symbol are now blank.

Figure 12-14 shows the appearance after Step 8. Notice that the text of every line that previously contained the right-arrow symbol has been deleted, leaving a blank line in its place.

9. In the Search For text box, enter the pattern ^$. Leave the Replace With text box blank.

10. Return the cursor to the beginning of the document. Click the Find button. The first blank line is now highlighted.

Figure 12-13


Figure 12-14

11. Click the Replace All button to delete all the blank lines, and inspect the results, as shown in Figure 12-15. Notice that all the lines that contained the right-arrow symbol (and therefore contained information about people joining or leaving) have now been deleted.
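Steps 5 through 11 amount to two search-and-replace passes. The following Python sketch reproduces them, assuming a hypothetical U+2192 (→) flag character and one line per Writer paragraph. One difference is worth noting: Writer deletes an empty paragraph when ^$ is replaced with nothing, whereas in Python ^$ matches only a zero-width position, so the sketch matches each empty line's terminating newline explicitly in the second pass. The function name tidy_transcript is illustrative only.

```python
import re

# Hypothetical stand-in for the right-arrow flag character.
FLAG = "\u2192"  # →


def tidy_transcript(text: str, flag: str = FLAG) -> str:
    """Emulate Steps 5-11: blank the flagged lines, then drop empty lines."""
    # Pass 1 (Steps 5-8): ^<flag>.*$ replaced with nothing.
    blanked = re.sub("^" + re.escape(flag) + ".*$", "", text, flags=re.MULTILINE)
    # Pass 2 (Steps 9-11): remove each empty line by matching the newline
    # that sits at the start of a line, i.e. the terminator of an empty line.
    return re.sub(r"^\n", "", blanked, flags=re.MULTILINE)


sample = "\n".join([
    "Some interesting chat",
    "A welcome message.",
    FLAG + "Andrew Smith has joined the conversation",
    FLAG + "Jane Callander has left the conversation.",
    "Another piece of real chat.",
    FLAG + "Harry Danvers has joined the conversation",
    "Another real comment.",
])
print(tidy_transcript(sample))
```

Like Replace All with ^$ in Writer, the second pass also removes any blank lines that were in the transcript to begin with, so run it only if those lines carry no meaning.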

How It Works

The pattern created in Step 5 matches any line that begins with the right-arrow symbol. The ^ metacharacter matches the position at the beginning of a line. The right-arrow symbol matches itself. The pattern .* matches zero or more characters. The $ metacharacter matches the position at the end of a line.

The chat transcript I used in real life had the right-arrow symbol as the first character of each line that contained joining or leaving information. Other chat clients may vary in how they treat lines that only contain joining or leaving information. You might, for example, have to insert a space character after the ^ metacharacter if the right-arrow symbol is preceded by a space.
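A variation of the pattern that tolerates leading spaces can be sketched in Python. Here ^ * (a space followed by *) accepts zero or more spaces before the flag, a generalization of the single space suggested above; the U+2192 (→) flag character and the sample lines are hypothetical stand-ins.

```python
import re

FLAG = "\u2192"  # hypothetical stand-in for the right-arrow flag

# " *" after ^ allows zero or more leading spaces before the flag, so the
# same pattern works whether or not a chat client indents these lines.
tolerant = re.compile("^ *" + re.escape(FLAG) + ".*$", re.MULTILINE)

lines = [
    FLAG + "Andrew Smith has joined the conversation",          # no leading space
    " " + FLAG + "Jane Callander has left the conversation.",   # one leading space
    "A real remark that mentions " + FLAG + " mid-line.",       # flag not at line start
]

# Keep only the lines the tolerant pattern would delete.
flagged = [line for line in lines if tolerant.search(line)]
print(flagged)
```

The third line is left alone because the flag does not appear at the start of the line (after optional spaces), so ordinary remarks that happen to contain the symbol survive the cleanup.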
