Beginning Regular Expressions 2005.pdf

Chapter 22: C# and Regular Expressions

Metacharacters Supported in Visual C# .NET

Visual C# .NET has an extensive regular expressions implementation, which exceeds in functionality many of the tools you saw in earlier chapters of this book.

Much of the regular expression support in Visual C# .NET can reasonably be termed standard. However, as with many Microsoft technologies, the standard syntax and techniques have been extended or modified in places.

The following table summarizes many of the metacharacters supported in Visual C# .NET.

Metacharacter           Description
----------------------  -----------------------------------------------------------
\d                      Matches a numeric digit.
\D                      Matches any character except a numeric digit.
\w                      Matches a word character. With the ECMAScript option, equivalent to the character class [A-Za-z0-9_]; by default, Unicode letters and digits also match.
\W                      Matches any character that is not a \w character. With the ECMAScript option, equivalent to the character class [^A-Za-z0-9_].
\b                      Matches the position at the beginning of a sequence of \w characters or at the end of a sequence of \w characters. Colloquially, \b is referred to as a word-boundary metacharacter.
\B                      Matches a position that is not a \b position.
\t                      Matches a tab character.
\n                      Matches a newline character.
\040                    Matches an ASCII character expressed in octal notation. The metacharacter \040 matches a space character.
\x20                    Matches an ASCII character expressed in hexadecimal notation with exactly two hexadecimal digits. The metacharacter \x20 matches a space character.
\u0020                  Matches a Unicode character expressed in hexadecimal notation with exactly four hexadecimal digits. The metacharacter \u0020 matches a space character.
[...]                   Matches any character specified in the character class.
[^...]                  Matches any character but the characters specified in the character class.
\s                      Matches a whitespace character.
\S                      Matches any character that is not a whitespace character.
^                       Depending on whether the Multiline option is set, matches the position before the first character in a line or the position before the first character in a string.
$                       Depending on whether the Multiline option is set, matches the position after the last character in a line or the position after the last character in a string.
$number                 In a replacement string, substitutes the character sequence matched by the last occurrence of group number number.
${name}                 In a replacement string, substitutes the character sequence matched by the last occurrence of the group named name.
\A                      Matches the position before the first character in a string. Its behavior is not affected by the setting of the Multiline option.
\Z                      Matches the position after the last character in a string, or before a final newline at the end of the string. Its behavior is not affected by the setting of the Multiline option.
\G                      Specifies that matches must be consecutive, without any intervening nonmatching characters.
?                       A quantifier. Matches when there is zero or one occurrence of the preceding character or group.
*                       A quantifier. Matches when there are zero or more occurrences of the preceding character or group.
+                       A quantifier. Matches when there are one or more occurrences of the preceding character or group.
{n}                     A quantifier. Matches when there are exactly n occurrences of the preceding character or group.
{n,m}                   A quantifier. Matches when there are at least n occurrences and a maximum of m occurrences of the preceding character or group.
(substring)             Captures the contained substring.
(?<name>substring)      Captures the contained substring and assigns it a name.
(?:substring)           A non-capturing group.
(?=...)                 A positive lookahead.
(?!...)                 A negative lookahead.
(?<=...)                A positive lookbehind.
(?<!...)                A negative lookbehind.
\N where N is a number  A back reference to a numbered group.
\k<name>                A back reference to a named group (same meaning as the following).
\k'name'                A back reference to a named group (same meaning as the preceding).
|                       Alternation.
(?imnsx-imnsx)          An alternative technique to specify RegexOptions settings inline.
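A few rows of the table can be checked with a short console program. The following is only a minimal sketch: the class name MetacharacterSketch and the helper methods MatchesSpace and CountMatches are hypothetical names introduced for illustration. It compares the three notations for a space character and shows how the Multiline option changes what ^ matches, while \A stays anchored to the start of the string.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper class, not part of the chapter's projects.
public static class MetacharacterSketch
{
    // Returns true when the given pattern matches exactly one space character.
    public static bool MatchesSpace(string pattern)
    {
        return Regex.IsMatch(" ", "^" + pattern + "$");
    }

    // Counts the matches of a pattern in the input under the given options.
    public static int CountMatches(string input, string pattern, RegexOptions options)
    {
        return Regex.Matches(input, pattern, options).Count;
    }

    public static void Main()
    {
        // Octal, hexadecimal, and Unicode notations for the same space character.
        Console.WriteLine(MatchesSpace(@"\040"));   // True
        Console.WriteLine(MatchesSpace(@"\x20"));   // True
        Console.WriteLine(MatchesSpace(@"\u0020")); // True

        // \x reads exactly two hexadecimal digits, so \x020 means
        // the character \x02 followed by a literal 0, not a space.
        Console.WriteLine(MatchesSpace(@"\x020"));  // False

        string twoLines = "one\ntwo";

        // Without Multiline, ^ matches only at the start of the string.
        Console.WriteLine(CountMatches(twoLines, @"^\w+", RegexOptions.None));        // 1

        // With Multiline, ^ also matches at the start of each line.
        Console.WriteLine(CountMatches(twoLines, @"^\w+", RegexOptions.Multiline));   // 2

        // \A ignores the Multiline setting.
        Console.WriteLine(CountMatches(twoLines, @"\A\w+", RegexOptions.Multiline));  // 1
    }
}
```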

 

 

 


Using Named Groups

One of the features supported in the .NET Framework but not supported in many other regular expression implementations is the notion of named groups.

The syntax is (?<nameOfGroup>pattern). Naming a group can make code easier to understand and maintain than using numbered groups. For example, examine the following replacement pattern:

${lastName}, ${firstName}

The purpose of this pattern in a replacement string is more easily understood than the purpose of the same replacement operation expressed with numbered, rather than named, groups (here assuming that firstName is group 1 and lastName is group 2):

${2}, ${1}
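Which numbered form corresponds to a named replacement depends on the order of the groups in the pattern. The sketch below assumes the two-group pattern used in the example later in this section, where firstName is the first group and lastName is the second; the class and method names are hypothetical. Under that assumption, the named and numbered replacement strings give the same result.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper class for comparing named and numbered replacements.
public static class NamedVersusNumbered
{
    // Assumed pattern: firstName is group 1 and lastName is group 2.
    private const string Pattern = @"^(?<firstName>\w+)\s+(?<lastName>\w+)$";

    // Replacement written with group names.
    public static string SwapByName(string input)
    {
        return Regex.Replace(input, Pattern, "${lastName}, ${firstName}");
    }

    // The same replacement written with group numbers.
    public static string SwapByNumber(string input)
    {
        return Regex.Replace(input, Pattern, "${2}, ${1}");
    }

    public static void Main()
    {
        Console.WriteLine(SwapByName("John Smith"));   // Smith, John
        Console.WriteLine(SwapByNumber("John Smith")); // Smith, John
    }
}
```

If a group were later added or reordered in the pattern, the numbered replacement would need to be revisited, whereas the named one would keep its meaning.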

The following example reverses first name and last name using named groups.

Try It Out

Using Named Groups

1. Create a new project in Visual Studio 2003 using the Console Application template, and name the project NamedGroupsDemo.

2. In the code editor, add the following line after any default using statements:

using System.Text.RegularExpressions;

3. Enter the following code between the curly braces of the Main() method:

// Show the pattern and prompt for input.
Console.WriteLine(@"This will find a match for the regular expression '^(?<firstName>\w+)\s+(?<lastName>\w+)$'.");
Console.WriteLine("Enter a test string consisting of a first name then a last name.");
string inputString;
inputString = Console.ReadLine();
// Swap the two named groups in the replacement string.
string outputString = Regex.Replace(inputString, @"^(?<firstName>\w+)\s+(?<lastName>\w+)$", "${lastName}, ${firstName}");
Console.WriteLine("You entered the string: '" + inputString + "'.");
Console.WriteLine("The replaced string is '" + outputString + "'.");
// Keep the console window open until Return is pressed.
Console.ReadLine();

4. Save the code, and press F5 to run it.

5. At the command line, enter the test string John Smith, and inspect the displayed result, as shown in Figure 22-15.

Figure 22-15


How It Works

The content of the Main() method is explained here.

First, the pattern to be matched against is displayed, and the user is invited to enter a first name and last name. The pattern to be matched contains two named groups, represented respectively by (?<firstName>\w+) and (?<lastName>\w+):

Console.WriteLine(@"This will find a match for the regular expression '^(?<firstName>\w+)\s+(?<lastName>\w+)$'.");
Console.WriteLine("Enter a test string consisting of a first name then a last name.");

The inputString variable is declared; then the Console.ReadLine() method is used to capture the string entered by the user. That string value is assigned to the inputString variable:

string inputString;

inputString = Console.ReadLine();

The Regex class’s Replace() method is used statically, with three arguments. The first argument specifies the string in which replacement is to take place — in this case, the string specified by the inputString variable. The pattern to be used to match is specified by the second argument — in this case, the pattern ^(?<firstName>\w+)\s+(?<lastName>\w+)$. The third argument, which is formally a string value, uses the notation ${namedGroup} to represent each named group.

The ${firstName} group, not surprisingly, contains the sequence of word characters entered first, and the ${lastName} group contains the sequence of word characters entered second:

string outputString = Regex.Replace(inputString,
    @"^(?<firstName>\w+)\s+(?<lastName>\w+)$", "${lastName}, ${firstName}");

The user is shown the string that was entered and the string produced when the Replace() method was applied:

Console.WriteLine("You entered the string: '" + inputString + "'.");
Console.WriteLine("The replaced string is '" + outputString + "'.");
Console.ReadLine();
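The walkthrough above relies on Replace() to read the named groups. As a complementary sketch, the named groups can also be read directly from a Match object through its Groups collection, which makes it possible to detect input that does not fit the pattern. The class name NamedGroupAccess and the method TryReorder are hypothetical names used only for this illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper class showing direct access to named groups.
public static class NamedGroupAccess
{
    private const string Pattern = @"^(?<firstName>\w+)\s+(?<lastName>\w+)$";

    // Builds "lastName, firstName" from the named groups.
    // Returns false, with an empty result, when the input does not match.
    public static bool TryReorder(string input, out string result)
    {
        Match match = Regex.Match(input, Pattern);
        if (!match.Success)
        {
            result = string.Empty;
            return false;
        }

        string first = match.Groups["firstName"].Value;
        string last = match.Groups["lastName"].Value;
        result = last + ", " + first;
        return true;
    }

    public static void Main()
    {
        string reordered;

        if (TryReorder("John Smith", out reordered))
        {
            Console.WriteLine(reordered); // Smith, John
        }

        // Three words do not fit the two-group pattern, so there is no match.
        Console.WriteLine(TryReorder("John Q Smith", out reordered)); // False

        // Replace() returns the input unchanged when the pattern does not match.
        Console.WriteLine(Regex.Replace("John Q Smith", Pattern, "${lastName}, ${firstName}")); // John Q Smith
    }
}
```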

Using Back References

Back references are supported in C# .NET. A typical use for back references is finding doubled words and removing them. The following example shows this.

Try It Out

Using Back References

1. Create a new project in Visual Studio 2003 using the Console Application template, and name the project BackReferenceDemo.

2. Add a using System.Text.RegularExpressions; statement.


3. In the code editor, add the following code between the paired braces of the Main() method:

// Explain the example and prompt for input.
Console.WriteLine("This example will find a doubled word.");
Console.WriteLine("Using a backreference and the Replace() method the doubled word will be removed.");
Console.WriteLine("Enter a test string containing a doubled word.");
string inputString;
inputString = Console.ReadLine();
// \1 refers back to the text captured by the first group, (\w+).
string outputString = Regex.Replace(inputString, @"(\w+)\s+(\1)", "${1}");
Console.WriteLine("You entered the string: '" + inputString + "'.");
Console.WriteLine("The replaced string is '" + outputString + "'.");
// Keep the console window open until Return is pressed.
Console.ReadLine();

4. Save the code, and press F5 to run it.

5. Enter the test string Paris in the the Spring (note the doubled the in the test string); press Return; and inspect the displayed information, as shown in Figure 22-16.

Figure 22-16

6. Press Return to close the application. In Visual Studio, press F5 to run the code again.

7. Enter the test string Hello Hello, press Return, and inspect the displayed information. Again, the doubled word is identified and replaced with a single occurrence of the same word.
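The steps above can also be run without console input. The sketch below wraps the same pattern in a hypothetical helper, RemoveDoubled, and adds a second hypothetical helper, RemoveDoubledWholeWords, that is not part of the chapter's example: it surrounds the pattern with \b so that the back reference matches only whole words. This matters for input such as this is fine, where the unanchored pattern treats the is at the end of this as the first half of a doubled word.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper class, not part of the BackReferenceDemo project.
public static class DoubledWordSketch
{
    // The pattern used in the steps above.
    public static string RemoveDoubled(string input)
    {
        return Regex.Replace(input, @"(\w+)\s+(\1)", "${1}");
    }

    // Assumed variant: \b restricts the match to whole words.
    public static string RemoveDoubledWholeWords(string input)
    {
        return Regex.Replace(input, @"\b(\w+)\s+\1\b", "${1}");
    }

    public static void Main()
    {
        // The two test strings from the steps above.
        Console.WriteLine(RemoveDoubled("Paris in the the Spring")); // Paris in the Spring
        Console.WriteLine(RemoveDoubled("Hello Hello"));             // Hello

        // The unanchored pattern matches "is is" inside "this is".
        Console.WriteLine(RemoveDoubled("this is fine"));            // this fine

        // With word boundaries there is no doubled word to remove.
        Console.WriteLine(RemoveDoubledWholeWords("this is fine"));  // this is fine
        Console.WriteLine(RemoveDoubledWholeWords("Paris in the the Spring")); // Paris in the Spring
    }
}
```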

How It Works

The Main() method code begins by displaying information to the user about the use of back references and invites the user to enter a string containing a doubled word:

Console.WriteLine("This example will find a doubled word.");
Console.WriteLine("Using a backreference and the Replace() method the doubled word will be removed.");
Console.WriteLine("Enter a test string containing a doubled word.");

The inputString variable is declared, and the string that the user entered is assigned to it:

string inputString;

inputString = Console.ReadLine();

The Regex class’s Replace() method is used statically and is applied to the inputString variable, and the result is assigned to the outputString variable.
