
Visual Basic .NET and Regular Expressions

For each Match object, the position and value of the match are displayed using the Index property and the ToString() method:

Console.WriteLine("At position {0}, the match '{1}' was found", myMatch.Index, myMatch.ToString)

The Match object’s Groups property is assigned to the myGroupCollection variable:

myGroupCollection = myMatch.Groups

The inner For Each loop processes each Group object contained in the GroupCollection object:

For Each myGroup In myGroupCollection

The value and position of each group are displayed using the values of the Group object’s Value and Index properties:

Console.WriteLine("Group containing '{0}' found at position '{1}'.", myGroup.Value, myGroup.Index)

Next

Console.WriteLine()

Next

The first group displayed corresponds to the entire regular expression pattern. That group is present in every successful match. Additional groups occur in the GroupCollection when paired parentheses occur in the regular expression pattern. Because the regular expression pattern ([A-Z])(\d+) has two pairs of parentheses, each GroupCollection contains three groups. You can see in Figure 21-8 that three groups are displayed for each match.
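The group count described above can be checked with a short sketch. The input string A123 and the module name GroupCountSketch are assumptions chosen for illustration; only the pattern comes from the text.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module GroupCountSketch

    Sub Main()
        ' Two pairs of parentheses give groups 1 and 2; group 0 is the whole match.
        Dim myMatch As Match = Regex.Match("A123", "([A-Z])(\d+)")

        Console.WriteLine(myMatch.Groups.Count)     ' 3
        Console.WriteLine(myMatch.Groups(0).Value)  ' A123
        Console.WriteLine(myMatch.Groups(1).Value)  ' A
        Console.WriteLine(myMatch.Groups(2).Value)  ' 123
    End Sub

End Module
```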

The CaptureCollection and Capture Class

The CaptureCollection object contains a collection of one or more Capture objects. Each Capture object represents a single substring captured by a capturing group (a pair of parentheses). When a group is quantified, as in ([A-Z])+, the group can capture several times during one match, and each of those captures is held as a separate Capture object.

The Capture class has no public constructor, and a Capture object is immutable. It can be created only by a matching process, and each Capture object is part of a CaptureCollection.

Try It Out

The CaptureCollection Object and the Capture Class

The code contained in Module1.vb of the CapturesDemo project is shown here:

Imports System.Text.RegularExpressions

Module Module1

    Sub Main()
        Dim myRegex = New Regex("([A-Z])+(\d)+")
        Console.WriteLine("Enter a string on the following line:")
        Dim inputString = Console.ReadLine()
        Dim myMatchCollection = myRegex.Matches(inputString)
        Console.WriteLine()
        Console.WriteLine("There are {0} matches.", myMatchCollection.Count)
        Console.WriteLine()
        Dim myMatch As Match
        Dim myGroupCollection As GroupCollection
        Dim myGroup As Group
        For Each myMatch In myMatchCollection
            Console.WriteLine("At position {0}, the match '{1}' was found", myMatch.Index, myMatch.ToString)
            Console.WriteLine("This match has {0} groups.", myMatch.Groups.Count)
            myGroupCollection = myMatch.Groups
            For Each myGroup In myGroupCollection
                Dim myCaptureCollection As CaptureCollection = myGroup.Captures
                Dim myCapture As Capture
                Console.WriteLine("Group containing '{0}' found at position '{1}'.", myGroup.Value, myGroup.Index)
                For Each myCapture In myCaptureCollection
                    Console.WriteLine("    Capture: '{0}' at position '{1}'.", myCapture.Value, myCapture.Index)
                Next
            Next
            Console.WriteLine()
        Next
        Console.WriteLine()
        Console.WriteLine("Press Return to close this application.")
        Console.ReadLine()
    End Sub

End Module

1. Create a new project in Visual Studio 2003 based on a console application template. Name the project CapturesDemo.

2. Edit the default module to match the preceding code. Save the code, and press F5 to run it.

3. In the command window, enter the test string ABC1 A123.

4. Press Return and inspect the results, as shown in Figure 21-9.

Figure 21-9


How It Works

The regular expression pattern for this example differs subtly but importantly from the pattern in the preceding example:

Dim myRegex = New Regex("([A-Z])+(\d)+")

Notice that the character class [A-Z] is enclosed in paired parentheses and that the group is qualified by the + quantifier. That means the group captures a single character each time it matches. If there is one uppercase alphabetic character in the test string, the group holds one capture. If there are several consecutive uppercase characters, there is still only one group, but it holds one capture for each character.

Similar considerations apply to the numeric part of the pattern, because (\d) is a group that can match one or more times, depending on the content of the test string. The pattern (\d)+ records a separate capture for each numeric digit, which is different from the pattern (\d+), whose group records a single capture whether there is one numeric digit or ten.
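The difference between (\d)+ and (\d+) can be seen by comparing capture counts. The following is a minimal sketch, assuming the input A123 (one of the two matches in the test string used in this example); the module and variable names are hypothetical.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module CaptureCountSketch

    Sub Main()
        ' (\d)+ : the group matches once per digit, so it records three captures.
        Dim perDigit As Match = Regex.Match("A123", "([A-Z])+(\d)+")
        Console.WriteLine(perDigit.Groups.Count)               ' 3 (whole match plus two groups)
        Console.WriteLine(perDigit.Groups(2).Captures.Count)   ' 3
        Console.WriteLine(perDigit.Groups(2).Value)            ' 3 (the last digit captured)

        ' (\d+) : the group matches once, so it records a single capture.
        Dim allDigits As Match = Regex.Match("A123", "([A-Z])+(\d+)")
        Console.WriteLine(allDigits.Groups.Count)              ' 3
        Console.WriteLine(allDigits.Groups(2).Captures.Count)  ' 1
        Console.WriteLine(allDigits.Groups(2).Value)           ' 123
    End Sub

End Module
```

In both cases the number of groups is fixed by the number of parentheses in the pattern; only the number of captures inside a group depends on the input.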

After accepting the user’s test string, ABC1 A123, the matches, groups, and captures are processed inside three nested For Each loops:

For Each myMatch In myMatchCollection

First, the value of each match is displayed:

Console.WriteLine("At position {0}, the match '{1}' was found", myMatch.Index, myMatch.ToString)

Then the number of groups for that match is displayed:

Console.WriteLine("This match has {0} groups.", myMatch.Groups.Count)

The myGroupCollection variable is assigned the Groups property of the myMatch variable:

myGroupCollection = myMatch.Groups

Each Group object in the GroupCollection object is processed next:

For Each myGroup In myGroupCollection

The Captures property of the group, which holds every capture made by that group, is assigned to the myCaptureCollection variable:

Dim myCaptureCollection As CaptureCollection = myGroup.Captures

Dim myCapture As Capture

The value and position of each group are displayed. You may be surprised to see that when a group captures more than once, its Value and Index properties reflect only the final capture:

Console.WriteLine("Group containing '{0}' found at position '{1}'.", myGroup.Value, myGroup.Index)


Then each capture in the captures collection is displayed. Because each pair of parentheses captures only a single character, a character sequence such as ABC results in three captures being displayed:

For Each myCapture In myCaptureCollection

Console.WriteLine("    Capture: '{0}' at position '{1}'.", myCapture.Value, myCapture.Index)

 

The For Each loops each conclude with a Next statement:

Next

Next

Console.WriteLine()

Next

The RegexOptions Enumeration

The System.Text.RegularExpressions namespace includes a RegexOptions enumeration that controls the modes of operation of regular expression matching.

The following table summarizes the features of the RegexOptions enumeration.

Option                    Description
None                      Specifies that no options are set.
IgnoreCase                Specifies that matching is case insensitive.
Multiline                 Treats each line as a separate string for matching purposes. Therefore, the meaning of the ^ metacharacter is changed (it matches at the beginning of each line), as is the $ metacharacter (it matches at the end of each line).
ExplicitCapture           Specifies that only explicitly named or numbered groups, of the form (?<name>...), are captured. Unnamed parentheses then act as noncapturing groups.
Compiled                  Specifies that the regular expression is compiled to an assembly.
Singleline                Changes the meaning of the period metacharacter so that it matches every character. Normally, it matches every character except \n.
IgnorePatternWhitespace   Interprets unescaped whitespace as not part of the pattern. Allows comments inline preceded by #.
RightToLeft               Specifies that pattern matching proceeds from right to left.
ECMAScript                Enables (limited) ECMAScript compatibility.
CultureInvariant          Specifies that cultural differences in language are ignored.
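Because RegexOptions is a flags enumeration, several options can be combined with the Or operator. The following is a minimal sketch of that idea, assuming a hypothetical two-line input; the pattern, the module name, and the variable names are chosen for illustration only.

```vbnet
Imports System
Imports System.Text.RegularExpressions
Imports Microsoft.VisualBasic

Module OptionsSketch

    Sub Main()
        ' Hypothetical two-line input; vbLf is the \n line separator.
        Dim inputString As String = "abc1" & vbLf & "DEF2"

        ' Without Multiline, ^ matches only at the very start of the string.
        Console.WriteLine(Regex.Matches(inputString, "^[A-Z]+\d", RegexOptions.IgnoreCase).Count)  ' 1

        ' With Multiline added, ^ also matches at the start of the second line.
        Dim myOptions As RegexOptions = RegexOptions.IgnoreCase Or RegexOptions.Multiline
        Console.WriteLine(Regex.Matches(inputString, "^[A-Z]+\d", myOptions).Count)                ' 2
    End Sub

End Module
```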

 

 

Case-Insensitive Matching: The IgnoreCase Option

In Visual Basic .NET regular expressions, the default matching mode is case sensitive. To specify that matching be carried out in a case-insensitive way, the IgnoreCase option is used.


Try It Out

Case-Insensitive Matching

The code in Module1.vb in the IgnoreCaseDemo project is shown here:

Imports System.Text.RegularExpressions

Module Module1

    Sub Main()
        Dim myPattern As String = "[A-Z]+\d+"
        Console.WriteLine("Enter a string on the following line:")
        Dim inputString = Console.ReadLine()
        Dim myMatchCollection = Regex.Matches(inputString, myPattern)
        Console.WriteLine("This is case sensitive matching.")
        Console.WriteLine("There are {0} matches.", myMatchCollection.Count)
        Console.WriteLine()
        Dim myMatch As Match
        For Each myMatch In myMatchCollection
            Console.WriteLine("At position {0}, the match '{1}' was found", myMatch.Index, myMatch.ToString)
        Next
        Console.WriteLine()
        myMatchCollection = Regex.Matches(inputString, myPattern, RegexOptions.IgnoreCase)
        Console.WriteLine("This is case insensitive matching.")
        Console.WriteLine("There are {0} matches.", myMatchCollection.Count)
        Console.WriteLine()
        For Each myMatch In myMatchCollection
            Console.WriteLine("At position {0}, the match '{1}' was found", myMatch.Index, myMatch.ToString)
        Next
        Console.WriteLine()
        Console.WriteLine("Press Return to close this application.")
        Console.ReadLine()
    End Sub

End Module

1. Create a new project in Visual Studio 2003 using the console application template. Name the project CaseInsensitiveDemo.

2. Edit the code so that it reads the same as the preceding Module1.vb.

3. Save the code; then run it using the F5 key.

4. In the command window, enter the test text ABC123 abc123 DeF234.

5. Press the Return key, and inspect the results, as shown in Figure 21-10. Notice that when matching is case sensitive, there are two matches, and there are three matches when matching is case insensitive. Notice, too, that the case-sensitive match against DeF234 is F234, while the case-insensitive match is DeF234.


Figure 21-10

How It Works

The pattern assigned to the myPattern variable (notice that it is a String, not a Regex object) is [A-Z]+\d+. With the default case-sensitive matching in .NET, this pattern matches only character sequences whose alphabetic characters are uppercase:

Dim myPattern As String = "[A-Z]+\d+"

The MatchCollection object corresponding to the myMatchCollection variable is created using the shared Matches() method of the Regex class. No Regex object is instantiated. Notice that the Matches() method takes two arguments on this occasion (when case-sensitive matching is applied), the second argument being the string value containing the regular expression pattern:

Dim myMatchCollection = Regex.Matches(inputString, myPattern)

The character sequence ABC123 matches because only uppercase alphabetic characters are contained in the character sequence. The character sequence abc123, by the same measure, does not match. In the character sequence DeF234, the character sequence F234 matches because it contains one uppercase character followed by three numeric digits.

After the results of case-sensitive matching have been displayed, a new collection of Match objects is assigned to the myMatchCollection variable. Notice that on this occasion, the Matches() method takes three arguments. The third argument is a member of the RegexOptions enumeration, in this case, IgnoreCase:

myMatchCollection = Regex.Matches(inputString, myPattern, RegexOptions.IgnoreCase)

When matching is case insensitive, the three character sequences, ABC123, abc123, and DeF234 all match the pattern [A-Z]+\d+.
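The same option can also be supplied when a Regex object is instantiated, rather than on each call to the shared method. The following is a minimal sketch comparing the two forms, assuming the test text from this example; the module name IgnoreCaseSketch is hypothetical.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module IgnoreCaseSketch

    Sub Main()
        Dim inputString As String = "ABC123 abc123 DeF234"

        ' Shared method: the option is passed as the third argument.
        Console.WriteLine(Regex.Matches(inputString, "[A-Z]+\d+", RegexOptions.IgnoreCase).Count)  ' 3

        ' Instance form: the option is passed to the Regex constructor instead.
        Dim myRegex As New Regex("[A-Z]+\d+", RegexOptions.IgnoreCase)
        Console.WriteLine(myRegex.Matches(inputString).Count)                                      ' 3
    End Sub

End Module
```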
