Beginning Regular Expressions 2005.pdf

Chapter 20

Again, the Boolean value returned by the Test() method is assigned to the MatchOrNot variable:

MatchOrNot = myRegexp.Test(TestString)

If there is a match, the MatchOrNot variable contains a value equivalent to Boolean True, so a message specifying the pattern and the match is added to the displayString variable:

If MatchOrNot Then
    displayString = displayString & VBCrLf & "When the pattern is '" & myRegExp.Pattern _
        & "' the input '" _
        & TestString & "' contains a match."

But if there is no match, a message indicating that is added to the displayString variable:

Else
    displayString = displayString & VBCrLf & "When the pattern is '" & myRegExp.Pattern _
        & "' the input '" _
        & TestString & "' does not contain a match."
End If

Finally, the value of the displayString variable (which contains information about two attempted matches with two different values of the Pattern property) is displayed in a message box:

MsgBox displayString

End Function

When the test string is A99, it matches the pattern [A-Z]\d{2} and also matches the pattern ^[A-Z]\d{2}$.

When the test string is A999, it matches the pattern [A-Z]\d{2} because there is an alphabetic character followed by two numeric digits. However, there is no match for the second pattern, ^[A-Z]\d{2}$, because three numeric digits, not the two required by the pattern, precede the end-of-line position that the $ metacharacter matches.

The test string A2A fails to match either pattern because there is no alphabetic character followed by two numeric digits, as would be required by both patterns, [A-Z]\d{2} and ^[A-Z]\d{2}$.
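The three cases just described can be checked in a single pass. The following is a minimal sketch, assuming it is saved as a .vbs file and run with cscript under Windows Script Host rather than from an HTML page. The loop structure, the variable names, and the use of WScript.Echo are illustrative assumptions and are not part of the chapter's listing.

```vbscript
Option Explicit

Dim myRegExp, patterns, inputs, p, s, verdict
Set myRegExp = New RegExp

' The unanchored and the anchored pattern discussed in the text
patterns = Array("[A-Z]\d{2}", "^[A-Z]\d{2}$")
' The three test strings discussed in the text
inputs = Array("A99", "A999", "A2A")

For Each p In patterns
    myRegExp.Pattern = p
    For Each s In inputs
        ' Test() returns True when the pattern matches somewhere in the string
        If myRegExp.Test(s) Then
            verdict = "contains a match"
        Else
            verdict = "does not contain a match"
        End If
        WScript.Echo "Pattern " & p & ", input " & s & ": " & verdict
    Next
Next
```

Only the combinations described above should report a match: A99 with both patterns, and A999 with the unanchored pattern alone.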

Character Classes

VBScript has full support for character classes. The VBScript documentation does, however, refer to character classes as character sets.

To match any character from A through L, the character class [ABCDEFGHIJKL] can be used. Equally, a range can be used, [A-L].

Negated character classes can be used, too. The character class [^A-D] will match any character except A through D.
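As a small illustration of ranges and negated classes, the sketch below tests single characters against [A-L] and [^A-D]. It assumes a .vbs file run with cscript; the ^ and $ anchors and the Describe helper are additions made for this example, so that each test string must consist of exactly one character from the class.

```vbscript
Option Explicit

Dim re
Set re = New RegExp

' Hypothetical helper: reports whether the current pattern matches the input
Sub Describe(ByVal text)
    Dim verdict
    If re.Test(text) Then
        verdict = "matches"
    Else
        verdict = "does not match"
    End If
    WScript.Echo re.Pattern & " " & verdict & " " & text
End Sub

re.Pattern = "^[A-L]$"   ' range form of [ABCDEFGHIJKL]
Describe "C"             ' C lies inside the range
Describe "M"             ' M lies outside the range

re.Pattern = "^[^A-D]$"  ' negated class: any character except A through D
Describe "E"             ' E is not in A through D
Describe "B"             ' B is excluded by the negation
Describe "b"             ' lowercase b is not excluded while IgnoreCase is False
```

The last call is a reminder that a negated class matches every character not listed, including lowercase letters, digits, and punctuation, unless the IgnoreCase property is set to True.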


Regular Expressions and VBScript

Word Boundaries

VBScript supports the \b metacharacter to match the position either at the beginning or at the end of a sequence of word characters. Often, the sequence of alphanumeric characters will form what a human reader will view as a word, but regular expression engines do not have knowledge of the concept of a word. The \b metacharacter matches in one of two situations:

A position where the preceding character is contained in [A-Za-z0-9_] and the following character is contained in [^A-Za-z0-9_], or the string ends. This is equivalent to the end of a word.

A position where the preceding character is contained in [^A-Za-z0-9_], or the string begins, and the following character is contained in [A-Za-z0-9_]. This is equivalent to the beginning of a word.
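The effect of \b can be seen by searching for a short word that also occurs inside longer words. The sketch below is a hedged illustration, assuming a .vbs file run with cscript; the test string and variable names are invented for this example.

```vbscript
Option Explicit

Dim re, matches, m
Set re = New RegExp
re.Global = True
re.Pattern = "\bcat\b"

' "cat" appears three times as a character sequence, but only the
' standalone word has a word boundary on both sides
Set matches = re.Execute("the cat concatenates cats")

WScript.Echo "Number of matches: " & matches.Count
For Each m In matches
    WScript.Echo "Found '" & m.Value & "' at index " & m.FirstIndex
Next
```

Only the standalone word, which starts at zero-based index 4, should be reported. The occurrences inside concatenates and cats are rejected because a word character sits on at least one side of them.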

Lookahead

VBScript supports lookahead. Both positive and negative lookaheads are supported. The syntax for a positive lookahead is (?=theLookahead) and for a negative lookahead is (?!theNegativeLookahead).

The following example demonstrates both positive and negative lookahead. The test file, Lookaheads.html, is shown here:

<html>
<head>
<title>Positive and Negative Lookahead</title>
<script language="vbscript" type="text/vbscript">
Function MatchLookaheads
    Dim myRegExp, TestString, Match, Matches, displayString
    displayString = ""
    Set myRegExp = new RegExp
    myRegExp.Pattern = "the(?=atre)" 'matches, for example, the in theatre
    myRegExp.IgnoreCase = True
    myRegExp.Global = True
    TestString = InputBox("Enter characters and numbers in the text box below.")
    Set Matches = myRegexp.Execute(TestString)
    displayString = displayString & "MATCH ATTEMPT 1: 'the' in 'theatre'" & VBCrLf
    For Each Match in Matches
        displayString = displayString & "Match found at position " & Match.FirstIndex & "."
        displayString = displayString & "The match value is '" & Match.Value & "'." & VBCrLf
    Next
    displayString = displayString & VBCrLf & VBCrLf
    'Begin a new match which produces a new Match collection.
    myRegExp.Pattern = "the(?!atre)" 'matches the NOT in theatre
    Set Matches = myRegexp.Execute(TestString)
    displayString = displayString & "MATCH ATTEMPT 2: 'the' not in 'theatre'" & VBCrLf
    For Each Match in Matches
        displayString = displayString & "Match found at position " & Match.FirstIndex & "."
        displayString = displayString & "The match value is '" & Match.Value & "'." & VBCrLf
    Next
    MsgBox displayString
End Function
</script>
</head>
<body onload="MatchLookaheads">
</body>
</html>

Try It Out

Positive and Negative Lookahead

1. Open Lookaheads.html in Internet Explorer.

2. In the input box, enter the character sequence They love the theatre theatrically. (including the final period).

3. Click the OK button, and inspect the result displayed in the message box, as shown in Figure 20-15. The MATCH ATTEMPT 1 section of the message box lists the matches found when the pattern is the(?=atre). The MATCH ATTEMPT 2 section lists the matches found when the pattern is the(?!atre).

Figure 20-15

How It Works

When the page loads, the MatchLookaheads function is called:

<body onload="MatchLookaheads">

The code uses the RegExp object’s Execute() method twice. First, with the following value in the Pattern property:

myRegExp.Pattern = "the(?=atre)" 'matches, for example, the in theatre

The Execute() method is executed:

Set Matches = myRegexp.Execute(TestString)

The displayString variable is assigned a label indicating that this is the first match attempt:

displayString = displayString & "MATCH ATTEMPT 1: 'the' in 'theatre'" & VBCrLf


For each Match object in the Matches collection (only one match in this case), information about the match is added to the displayString variable:

For Each Match in Matches
    displayString = displayString & "Match found at position " & Match.FirstIndex & "."
    displayString = displayString & "The match value is '" & Match.Value & "'." & VBCrLf
Next
displayString = displayString & VBCrLf & VBCrLf

The pattern being matched is the(?=atre), which matches the character sequence the when it is followed by the character sequence atre. This is positive lookahead. It matches the in, for example, theatre or theatres.

Then the second attempt at matching is made. The value of the Pattern property is assigned a pattern that includes a negative lookahead. This means that matches will occur when the character sequence the is not followed by the character sequence atre:

myRegExp.Pattern = "the(?!atre)" 'matches the NOT in theatre

Calling the Execute() method again and assigning its result to Matches replaces the former Matches collection with a new one. However, the information from the former Matches collection has already been captured in the displayString variable for later display:

Set Matches = myRegexp.Execute(TestString)

Information about the second Matches collection is now added to the displayString variable:

displayString = displayString & "MATCH ATTEMPT 2: 'the' not in 'theatre'" & VBCrLf
For Each Match in Matches
    displayString = displayString & "Match found at position " & Match.FirstIndex & "."
    displayString = displayString & "The match value is '" & Match.Value & "'." & VBCrLf
Next

The first match attempt matches the pattern the(?=atre), which includes the positive lookahead. In the test character sequence, They love the theatre theatrically., it matches the character sequence the in theatre. The Matches collection contains a single Match object.

The second match attempt matches the pattern the(?!atre), which includes the negative lookahead. In the test character sequence, They love the theatre theatrically., there are three matches. The second Matches collection, therefore, contains three Match objects corresponding to matches in The of They, the word the, and in the word theatrically. Notice that while matching is case insensitive, the case of the initial T of They is preserved.
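The two results can also be reproduced outside the browser. The sketch below is a condensed variant of the listing, assuming a .vbs file run with cscript and a hard-coded test string in place of the InputBox call; the ShowMatches helper and the use of the Length property are additions made for this example.

```vbscript
Option Explicit

Dim re, testString
testString = "They love the theatre theatrically."

Set re = New RegExp
re.IgnoreCase = True
re.Global = True

' Hypothetical helper: runs one pattern and lists every match it finds
Sub ShowMatches(ByVal pattern)
    Dim matches, m
    re.Pattern = pattern
    Set matches = re.Execute(testString)
    WScript.Echo pattern & " found " & matches.Count & " match(es)"
    For Each m In matches
        WScript.Echo "  index " & m.FirstIndex & ", value '" & m.Value & _
            "', length " & m.Length
    Next
End Sub

ShowMatches "the(?=atre)"   ' positive lookahead
ShowMatches "the(?!atre)"   ' negative lookahead
```

Counting characters from zero, the positive lookahead should report a single match at index 14, and the negative lookahead three matches at indexes 0, 10, and 22. Every match has a length of 3, which shows that the text examined by a lookahead is not included in the match itself.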
