Beginning Regular Expressions 2005.pdf

Chapter 5

The next component of the pattern, [01], matches the digit 0 or 1, because months always have 0 or 1 as the first digit in this date format. Similarly, the next component, the character class [0-9], matches any single digit from 0 through 9. Together, these two classes would allow month values such as 14 or 18, which are obviously undesirable. One of the exercises at the end of this chapter will ask you to provide a more specific pattern that would allow only values from 01 to 12 inclusive.

Next, the character class pattern [-./] matches a single character that is a hyphen, a period, or a forward slash.

Finally, the pattern [0123][0-9] matches days of the month beginning with 0, 1, 2, or 3. As written, the pattern would allow values for the day of the month such as 00, 34 or 38. A later exercise will ask you to create a more specific pattern to constrain values to 01 through 31.
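To see how permissive these components are, the following minimal sketch uses Python's re module to test only the month, separator, and day parts described above. The year part of the full pattern is not shown in this section, so it is left out, and the sample strings are made up for illustration.

```python
import re

# Month, separator, and day components only; the year part of the
# full pattern is not shown in this section, so it is omitted here.
MONTH_DAY = re.compile(r"[01][0-9][-./][0123][0-9]")

# Hypothetical test values, chosen to include implausible dates.
samples = ["12-25", "01/31", "14.38", "00-00", "20-10", "1225"]

# fullmatch() succeeds only if the whole string fits the pattern.
results = {s: bool(MONTH_DAY.fullmatch(s)) for s in samples}

for value, ok in results.items():
    print(value, "matches" if ok else "does not match")
```

Values such as 14.38 and 00-00 are accepted even though they are not real dates, which is exactly the weakness the exercises ask you to fix.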

Finding HTML Heading Elements

One potential use for character classes is in finding HTML/XHTML heading elements. As you probably know, HTML and XHTML 1.0 have six heading elements: h1, h2, h3, h4, h5, and h6. In XHTML the h must be lowercase. In HTML it is permitted to be h or H.

First, assume that all the elements are written using a lowercase h. Assuming also that there are no attributes, you could match the start tag of all six elements using a fairly cumbersome regular expression with alternatives listed inside parentheses:

<(h1|h2|h3|h4|h5|h6)>

In this case the < character is the literal left angle bracket, which is the first character in the start tag. Then there is a choice of six two-character sequences representing the element type of each HTML/XHTML heading element. Finally, a > is the final literal character of the start tag.

However, because there is a sequence of numbers from 1 to 6, you can use a character class to match the same start tags, either by listing each number literally:

<h[123456]>

or by using a range in the character class:

<h[1-6]>
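A quick way to convince yourself that the three patterns are interchangeable is to run them against the same candidates. The sketch below uses Python's re module, and the candidate strings are invented for this check rather than taken from the sample file.

```python
import re

# The three patterns from the text: alternation, a listed class, a range.
patterns = [r"<(h1|h2|h3|h4|h5|h6)>", r"<h[123456]>", r"<h[1-6]>"]

# Hypothetical candidates, including tags that should be rejected.
candidates = ["<h1>", "<h6>", "<h0>", "<h7>", "<H2>", "<h3 class='x'>", "<h12>"]

def accepted(pattern):
    """Return the candidates that the pattern matches in full."""
    return [c for c in candidates if re.fullmatch(pattern, c)]

matches = [accepted(p) for p in patterns]

for pattern, found in zip(patterns, matches):
    print(pattern, "->", found)
```

All three patterns accept only <h1> and <h6> from this list. The uppercase <H2> and the tag with an attribute are rejected, in line with the assumptions stated earlier.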

The sample file, HTMLHeaders.txt, is shown here:

<h1>Some sample header text.</h1>

<h3>Some text.</h3>

<h6>Some header text.</h6>

<h4></h4>

<h5>Some text.</h5>

<h2>Some fairly meaningless text.</h2>

The file contains one example of each of the six heading elements.
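As a rough illustration, the range-based pattern can be applied to the sample content with Python's re module. The sketch embeds the text of HTMLHeaders.txt as a string instead of reading the file from disk, so that it runs on its own.

```python
import re

# Content of HTMLHeaders.txt, embedded here so the sketch is self-contained.
sample = """<h1>Some sample header text.</h1>
<h3>Some text.</h3>
<h6>Some header text.</h6>
<h4></h4>
<h5>Some text.</h5>
<h2>Some fairly meaningless text.</h2>
"""

# findall() returns every non-overlapping match of the start-tag pattern.
start_tags = re.findall(r"<h[1-6]>", sample)

print(start_tags)
```

Only the six start tags are found. The end tags such as </h1> are skipped because the pattern requires the h to follow the < immediately.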
