Visual Basic .NET and Regular Expressions

If you want to specify multiple options, you must separate the options with the word Or. So to specify case-insensitive matching that matches from right to left, you could use code such as the following:

myMatchCollection = Regex.Matches(inputString, myPattern, _
    RegexOptions.IgnoreCase Or RegexOptions.RightToLeft)
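The following is a minimal sketch of combining options with Or, not code from the example projects. The module name, the function name, and the test string are assumptions made for illustration.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module CombinedOptionsDemo

    ' Hypothetical helper: finds every match of pattern in input,
    ' ignoring case and scanning the input from right to left.
    Function FindIgnoringCaseRightToLeft(ByVal input As String, _
                                         ByVal pattern As String) As MatchCollection
        ' Or combines the two option flags into a single RegexOptions value.
        Dim options As RegexOptions = _
            RegexOptions.IgnoreCase Or RegexOptions.RightToLeft
        Return Regex.Matches(input, pattern, options)
    End Function

    Sub Main()
        Dim found As MatchCollection = _
            FindIgnoringCaseRightToLeft("Cat cat CAT", "cat")
        Console.WriteLine(found.Count)     ' 3
        Console.WriteLine(found(0).Value)  ' CAT (the rightmost match comes first)
    End Sub

End Module
```

Because RegexOptions is a flags enumeration, Or sets both flags in one value, which is why a single argument can carry several options.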

Multiline Matching: The Effect on the ^ and $ Metacharacters

The ^ metacharacter normally matches the position before the first character at the beginning of a string, and the $ metacharacter normally matches the position after the last character at the end of a string.

When multiline matching is used, the ^ metacharacter matches the position before the first character at the beginning of each line, and the $ metacharacter matches the position after the last character on each line.
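The difference can be seen in a short sketch. The module name, the helper function, and the three-line test string are assumptions made for illustration; the lines are separated by line-feed characters only.

```vbnet
Imports System
Imports Microsoft.VisualBasic
Imports System.Text.RegularExpressions

Module MultilineDemo

    ' Hypothetical helper: counts the matches of pattern in input.
    Function CountMatches(ByVal input As String, ByVal pattern As String, _
                          ByVal options As RegexOptions) As Integer
        Return Regex.Matches(input, pattern, options).Count
    End Function

    Sub Main()
        ' Three lines separated by line feeds.
        Dim sample As String = "one" & vbLf & "two" & vbLf & "three"

        ' Default: ^ matches only at the beginning of the string.
        Console.WriteLine(CountMatches(sample, "^\w+", RegexOptions.None))       ' 1

        ' Multiline: ^ matches at the beginning of each line,
        ' and $ matches at the end of each line.
        Console.WriteLine(CountMatches(sample, "^\w+", RegexOptions.Multiline))  ' 3
        Console.WriteLine(CountMatches(sample, "\w+$", RegexOptions.Multiline))  ' 3
    End Sub

End Module
```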

Inline Documentation Using the IgnorePatternWhitespace Option

The IgnorePatternWhitespace option allows inline comments to be created that spell out the meaning of each part of the regular expression pattern.

Normally, when a regular expression pattern is matched, any whitespace in the pattern is significant. For example, a space character in the pattern is interpreted as a character to be matched. When the IgnorePatternWhitespace option is set, all unescaped whitespace in the pattern is ignored, including space characters and newline characters. This allows a single pattern to be laid out over several lines, which aids readability, leaves room for comments, and makes the pattern easier to maintain. To match a whitespace character in the test string, you can use the \s metacharacter.
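A small sketch shows the effect on a pattern that contains a space. The module name, the helper function, and the test strings are assumptions made for illustration.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module PatternWhitespaceDemo

    ' Hypothetical helper: reports whether pattern matches anywhere in input.
    Function Matches(ByVal input As String, ByVal pattern As String, _
                     ByVal options As RegexOptions) As Boolean
        Return Regex.IsMatch(input, pattern, options)
    End Function

    Sub Main()
        Dim ignoreWs As RegexOptions = RegexOptions.IgnorePatternWhitespace

        ' Default: the space in the pattern must be matched literally.
        Console.WriteLine(Matches("a b", "a b", RegexOptions.None))  ' True
        Console.WriteLine(Matches("ab", "a b", RegexOptions.None))   ' False

        ' IgnorePatternWhitespace: the pattern "a b" is treated as "ab".
        Console.WriteLine(Matches("ab", "a b", ignoreWs))            ' True
        Console.WriteLine(Matches("a b", "a b", ignoreWs))           ' False

        ' \s still matches whitespace in the test string.
        Console.WriteLine(Matches("a b", "a \s b", ignoreWs))        ' True
    End Sub

End Module
```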

In Visual Basic .NET, the syntax for adding inline comments is a little cumbersome. If you wanted to use the myRegex variable to match an uppercase alphabetic character followed by a numeric digit, you might typically write the following:

Dim myRegex = New Regex("[A-Z]\d")

However, to use the IgnorePatternWhitespace option to specify the same regular expression pattern and include comments inline, you must write something like the following:

Dim myRegex = New Regex( _
    "[A-Z] (?# A character class to match an uppercase alphabetic character)" & _
    "\d    (?# followed by a numeric digit)", _
    RegexOptions.IgnorePatternWhitespace)

The inline comments are preceded by the character sequence (?# and followed by a ) character. The Visual Basic .NET concatenation character, &, is used between the components of the pattern, and the line-continuation character (the underscore) is used to indicate that a statement is being continued on the following line.


Chapter 21

Try It Out: Using the IgnorePatternWhitespace Option

This example matches a U.S. Social Security number. The code contained in Module1.vb in the IgnorePatternWhitespaceDemo project is shown here:

Imports System.Text.RegularExpressions

Module Module1

    ' Each (?# ... ) group is an inline comment and takes no part in matching.
    Dim myRegex = _
        New Regex _
        ("^     (?# match the position before the first character)" & _
        "\d{3} (?# Three numeric digits, followed by)" & _
        "-     (?# a literal hyphen)" & _
        "\d{2} (?# then two numeric digits)" & _
        "-     (?# then a literal hyphen)" & _
        "\d{4} (?# then four numeric digits)" & _
        "$     (?# match the position after the last character)", _
        RegexOptions.IgnorePatternWhitespace)

    Sub Main()
        Console.WriteLine("Enter a string on the following line:")
        Dim inputString = Console.ReadLine()
        Dim myMatch = myRegex.Match(inputString)
        ' A nonzero Length is treated as True (this relies on Option Strict Off).
        If myMatch.ToString.Length Then
            Console.WriteLine("The match, '" & myMatch.Value & "' was found.")
        Else
            Console.WriteLine("There was no match")
        End If
        Console.WriteLine("Press Return to close this application.")
        Console.ReadLine()
    End Sub

End Module

1. Create a new console application project in Visual Studio 2003. Name the project IgnorePatternWhitespaceDemo.

2. Edit the code in the code window so that it matches the code in the preceding Module1.vb file. Save the code, and press F5 to run it.

3. At the command line, enter the test string 123-12-1234. Press Return, and inspect the results, as shown in Figure 21-11. Notice that there is a successful match.

Figure 21-11

4. In Visual Studio, press F5 to run the code again.

5. At the command line, enter the test string A123-12-1234. Press Return, and inspect the results, as shown in Figure 21-12. Notice that there is no match, because the leading A means the string does not begin with three numeric digits.


Figure 21-12

How It Works

The pattern to be matched, if written on a single line, is as follows:

^\d{3}-\d{2}-\d{4}$

You may recognize this as a simple pattern that will match a valid U.S. Social Security number (SSN).

In this example, the pattern is written with inline comments when the myRegex variable is dimensioned. As you can see, it is much more complex to write the pattern in this way, but the inline comments make it easier for you or another developer to work out precisely what the pattern was intended to do.

The Visual Basic .NET syntax of (?# ... ) for inline comments is less clean than the simple # construct in Visual C# .NET, for example. I find that the Visual Basic .NET syntax tends to get in the way of readability of the comments. Lining up the left parentheses on each line helps maximize readability in Visual Basic .NET:

New Regex _
("^     (?# match the position before the first character)" & _
"\d{3} (?# Three numeric digits, followed by)" & _
"-     (?# a literal hyphen)" & _
"\d{2} (?# then two numeric digits)" & _
"-     (?# then a literal hyphen)" & _
"\d{4} (?# then four numeric digits)" & _
"$     (?# match the position after the last character)", _
RegexOptions.IgnorePatternWhitespace)
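For comparison, here is a sketch of the # comment form written in Visual Basic .NET. It is an alternative layout, not the code used in the example project. When the IgnorePatternWhitespace option is set, an unescaped # begins a comment that runs to the end of the line, so each fragment of the pattern has to end with a line feed (vbLf here). The module and function names are assumptions made for illustration.

```vbnet
Imports System
Imports Microsoft.VisualBasic
Imports System.Text.RegularExpressions

Module HashCommentDemo

    ' Hypothetical helper: builds the SSN-format pattern with # comments.
    ' Each fragment ends with vbLf because a # comment runs to the end
    ' of the line in which it appears.
    Function BuildSsnRegex() As Regex
        Dim pattern As String = _
            "^      # the position before the first character" & vbLf & _
            "\d{3}  # three numeric digits" & vbLf & _
            "-      # a literal hyphen" & vbLf & _
            "\d{2}  # two numeric digits" & vbLf & _
            "-      # a literal hyphen" & vbLf & _
            "\d{4}  # four numeric digits" & vbLf & _
            "$      # the position after the last character"
        Return New Regex(pattern, RegexOptions.IgnorePatternWhitespace)
    End Function

    Sub Main()
        Dim ssnRegex As Regex = BuildSsnRegex()
        Console.WriteLine(ssnRegex.IsMatch("123-12-1234"))   ' True
        Console.WriteLine(ssnRegex.IsMatch("A123-12-1234"))  ' False
    End Sub

End Module
```

Whether this reads better than the (?# ... ) form is a matter of taste. The need to append vbLf to every fragment is the price paid in Visual Basic .NET, which has no multiline string literal in this version of the language.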

The pattern is equivalent to ^\d{3}-\d{2}-\d{4}$ and so matches an SSN. Therefore, when the test string is 123-12-1234, there is a match, as indicated in Figure 21-11. This is under control of the If statement in the following code. When the Length property is not 0, a match has been found, so the myMatch variable’s Value property contains the matching sequence of characters:

If myMatch.ToString.Length Then
    Console.WriteLine("The match, '" & myMatch.Value & "' was found.")

When the Length property of myMatch.ToString is 0, no match has been found, and a message indicating that is output in the Else clause:

Else
    Console.WriteLine("There was no match")
End If
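The test on the Length property works here because this pattern can never match an empty string. As a sketch of an alternative, the Success property of the Match object reports directly whether a match was found, and it also compiles with Option Strict On. The module and function names are assumptions made for illustration, and the pattern is written on a single line for brevity.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module MatchSuccessDemo

    ' Hypothetical helper: returns the message that the example would display.
    Function DescribeMatch(ByVal inputString As String) As String
        Dim myMatch As Match = Regex.Match(inputString, "^\d{3}-\d{2}-\d{4}$")
        ' Success is True when the pattern matched, whatever the match length.
        If myMatch.Success Then
            Return "The match, '" & myMatch.Value & "' was found."
        Else
            Return "There was no match"
        End If
    End Function

    Sub Main()
        Console.WriteLine(DescribeMatch("123-12-1234"))
        Console.WriteLine(DescribeMatch("A123-12-1234"))
    End Sub

End Module
```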

Right to Left Matching: The RightToLeft Option

When using English, the normal progression of characters along a line is from left to right. In some other languages, the progression of characters is from right to left. To support use of regular expressions in such languages, the .NET Framework provides the functionality to conduct matching from right to left. Unfortunately, my experience and that of others is that when using the RightToLeft option, the matching behavior is not fully reliable.
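In simple cases the effect of the option is easy to observe: the search starts at the end of the test string and moves toward its beginning, so the first match returned is the rightmost one. The following sketch assumes a hypothetical module, helper function, and test string.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module RightToLeftDemo

    ' Hypothetical helper: returns the first run of digits that the
    ' regular expression engine finds under the given options.
    Function FirstNumber(ByVal input As String, _
                         ByVal options As RegexOptions) As String
        Return Regex.Match(input, "\d+", options).Value
    End Function

    Sub Main()
        ' Default direction: the leftmost run of digits is found first.
        Console.WriteLine(FirstNumber("abc123def456", RegexOptions.None))         ' 123
        ' RightToLeft: the rightmost run of digits is found first.
        Console.WriteLine(FirstNumber("abc123def456", RegexOptions.RightToLeft))  ' 456
    End Sub

End Module
```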


The Metacharacters Supported in Visual Basic .NET

Visual Basic .NET has perhaps a more complete and extensive regular expressions implementation than any of the tools you have seen in earlier chapters of this book.

Much of the regular expression support in Visual Basic .NET can reasonably be termed standard. However, as with many Microsoft technologies, the standard syntax and techniques have been extended or modified in places.

The following table summarizes the metacharacters supported in Visual Basic .NET.

Metacharacter	Description

\d	Matches a numeric digit.
\D	Matches any character except a numeric digit.
\w	Matches a word character. When the ECMAScript option is set, it is equivalent to the character class [A-Za-z0-9_]; by default, it also matches other Unicode letters and digits.
\W	Matches any character that is not a \w character.
\b	Matches the position at the beginning of a sequence of \w characters or at the end of a sequence of \w characters. Colloquially, \b is referred to as a word-boundary metacharacter.
\B	Matches a position that is not a \b position.
\t	Matches a tab character.
\n	Matches a newline character.
\040	Matches an ASCII character, expressed in octal notation. The metacharacter \040 matches a space character.
\x20	Matches an ASCII character, expressed in hexadecimal notation with exactly two hexadecimal digits. The metacharacter \x20 matches a space character.
\u0020	Matches a Unicode character, expressed in hexadecimal notation with exactly four hexadecimal digits. The metacharacter \u0020 matches a space character.
[...]	Matches any character specified in the character class.
[^...]	Matches any character but the characters specified in the character class.
\s	Matches a whitespace character.
\S	Matches any character that is not a whitespace character.
^	Depending on whether the MultiLine option is set, it matches the position before the first character in a line or the position before the first character in a string.
$	Depending on whether the MultiLine option is set, it matches the position after the last character in a line or the position after the last character in a string.
$number	In a replacement string, substitutes the character sequence matched by the last occurrence of group number number.
${name}	In a replacement string, substitutes the character sequence matched by the last occurrence of the group named name.
\A	Matches the position before the first character in a string. Its behavior is not affected by the setting of the MultiLine option.
\Z	Matches the position after the last character in a string, or the position before a final newline character at the end of the string. Its behavior is not affected by the setting of the MultiLine option.
\G	Specifies that matches must be consecutive, without any intervening nonmatching characters.
?	A quantifier. Matches when there is zero or one occurrence of the preceding character or group.
*	A quantifier. Matches when there are zero or more occurrences of the preceding character or group.
+	A quantifier. Matches when there are one or more occurrences of the preceding character or group.
{n}	A quantifier. Matches when there are exactly n occurrences of the preceding character or group.
{n,m}	A quantifier. Matches when there are at least n occurrences and a maximum of m occurrences of the preceding character or group.
(substring)	Captures the contained substring.
(?<name>substring)	Captures the contained substring and assigns it a name.
(?:substring)	A non-capturing group.
(?=...)	A positive lookahead.
(?!...)	A negative lookahead.
(?<=...)	A positive lookbehind.
(?<!...)	A negative lookbehind.
\N (where N is a number)	A back reference to a numbered group.
\k<name>	A back reference to a named group (same meaning as the following).
\k'name'	A back reference to a named group (same meaning as the preceding).
|	Alternation.
(?imnsx-imnsx)	An alternate technique to specify RegexOptions settings inline.