Beginning Regular Expressions 2005.pdf

Chapter 21

Lookahead and Lookbehind

Support for positive and negative lookahead and lookbehind in Visual Basic .NET is good. All four options are supported.

Positive lookahead uses the (?=theLookahead) syntax. To match the word Star when followed by a space character and the character sequence Training, you could use the following code:

Dim myRegex = New Regex("Star(?= Training)")
Dim myMatch = myRegex.Match("The Star Training Company carries out great training.")

Negative lookahead uses the (?!theLookahead) syntax. To match the character sequence Star when it is not followed by a space character and the character sequence Training, you could use the following code:

Dim myRegex = New Regex("Star(?! Training)")
Dim myMatch = myRegex.Match("The Star Training Company carries out great training.")

Positive lookbehind uses the (?<=theLookbehind) syntax. To match the character sequence Training when it is preceded by the character sequence Star followed by a space character, you could use the following code:

Dim myRegex = New Regex("(?<=Star )Training")
Dim myMatch = myRegex.Match("The Star Training Company carries out great training.")

Negative lookbehind uses the (?<!theLookbehind) syntax. To match the character sequence Training when it is not preceded by the character sequence Star followed by a space character, you could use the following code:

Dim myRegex = New Regex("(?<!Star )Training")
Dim myMatch = myRegex.Match("The Star Training Company carries out great training.")
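The four lookaround forms can be compared side by side. The following is a minimal sketch rather than code from the chapter: it is written in C# (the language of the next chapter) and uses the same Regex class from System.Text.RegularExpressions. The class name LookaroundDemo and the helper FirstMatch are hypothetical names introduced for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LookaroundDemo
{
    // Hypothetical helper: returns the matched text, or null when nothing matches.
    public static string FirstMatch(string pattern, string input)
    {
        Match m = Regex.Match(input, pattern);
        return m.Success ? m.Value : null;
    }

    public static void Main()
    {
        string text = "The Star Training Company carries out great training.";
        string[] patterns = {
            @"Star(?= Training)",   // positive lookahead
            @"Star(?! Training)",   // negative lookahead
            @"(?<=Star )Training",  // positive lookbehind
            @"(?<!Star )Training"   // negative lookbehind
        };

        foreach (string pattern in patterns)
        {
            string result = FirstMatch(pattern, text);
            Console.WriteLine("{0,-22} -> {1}", pattern,
                result == null ? "(no match)" : "'" + result + "'");
        }
    }
}
```

With this test sentence, the two positive forms match Star and Training respectively, and the two negative forms find nothing: the only Star is followed by a space and Training, and the final training is lowercase, so it is not matched by a case-sensitive pattern.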

Exercises

1. Specify a pattern that will match the character sequence old only when it is part of a word such as cold or bold. Hint: Provide two solutions, one of which uses lookbehind and lookahead.

2. Create a console application that replaces the character sequence Doctor or Doc with the character sequence Dr..


Chapter 22

C# and Regular Expressions

Microsoft Visual C# .NET provides extensive, powerful, and flexible support for regular expression functionality. Visual C# .NET provides support comparable to Perl version 5, plus some extensions that are essentially specific to the .NET Framework (for example, right-to-left matching). The implementations of regular expressions are essentially playing an ongoing game of catch-up, and it is likely that at least some other languages will also implement features such as right-to-left matching in time.

In this chapter, you will learn how to do the following:

Use the objects contained in the System.Text.RegularExpressions namespace

Use the metacharacters supported in C#

Examples shown in this chapter have been tested with Visual Studio 2003 and the .NET Framework 1.1. I will assume that you have access to a copy of Visual Studio 2003 and have a working knowledge of at least the basics of Visual C# .NET. It isn't the intent of this chapter to provide a tutorial on the basics of using Visual Studio .NET 2003.

However, if you do not have access to a copy of Visual Studio 2003, there are copies of the .exe files you can run, although you won’t be able to view and edit the Visual C# .NET code if you use the .exe files.

The regular expression functionality in C# is based on the classes in the System.Text.RegularExpressions namespace. Those classes will be explained in some detail, including several examples of how the classes, their properties, and methods can be used in code.

The Classes of the System.Text.RegularExpressions Namespace

The regular expressions support in the .NET Framework class library is contained in the System.Text.RegularExpressions namespace.

An Introductory Example

This example demonstrates the basics of one way to use regular expressions when using Visual C#. Other techniques are discussed and demonstrated later in the chapter, when the classes of the System.Text.RegularExpressions namespace and their members are discussed in more detail.

Try It Out

An Introductory C# Console Application Example

The following code is contained in Class1.cs in the SimpleMatch project:

using System;
using System.Text.RegularExpressions;

namespace SimpleMatch
{
    /// <summary>
    /// This is a simple regular expression example which uses the Regex object.
    /// </summary>
    class Class1
    {
        /// <summary>
        /// The main entry point for the application.
        /// </summary>
        [STAThread]
        static void Main(string[] args)
        {
            Console.WriteLine(@"This will find a match for the regular expression '[A-Z]\d'.");
            Console.WriteLine("Enter a test string now.");
            Regex myRegex = new Regex(@"[A-Z]\d", RegexOptions.IgnoreCase);
            string inputString;
            inputString = Console.ReadLine();
            Match myMatch = myRegex.Match(inputString);
            Console.WriteLine("You entered the string: '" + inputString + "'.");
            if (myMatch.Success)
                Console.WriteLine("The match '" + myMatch.ToString() + "' was found in the string you entered.");
            Console.ReadLine();
        }
    }
}


The following instructions walk you through all the steps necessary to create a simple console application in Visual Studio 2003 using Visual C# .NET. If you have done much programming in C#, you will find most of the steps pretty self-evident.

1. Open Visual Studio 2003, and from the File menu, select New; then select Project to create a new solution that contains a single project.

Figure 22-1 shows the appearance of the Project screen, but with the choices specified in Steps 2 through 5 already made.

Figure 22-1

2. In the Project Types pane, select Visual C# Projects.

3. In the Templates pane, select Console Application.

4. In the Name text box, type SimpleMatch as the name of the project.

5. In the Location text box, type C:\BRegExp\Ch22 as the location (or select another location, if you prefer).


6. Click the OK button. After a short pause while Visual Studio 2003 creates the files needed for the project, the code editor will open with the following template code already in place:

using System;

namespace SimpleMatch
{
    /// <summary>
    /// Summary description for Class1.
    /// </summary>
    class Class1
    {
        /// <summary>
        /// The main entry point for the application.
        /// </summary>
        [STAThread]
        static void Main(string[] args)
        {
            //
            // TODO: Add code to start application here
            //
        }
    }
}

7. Edit the preceding template code so that it contains the code shown earlier in Class1.cs.

8. Save the code using Ctrl+Shift+S. Press F5 to run the code.

9. At the command line, enter the test text K9. Then press Return. Inspect the results, as shown in Figure 22-2.

Figure 22-2

How It Works

When using C# you must specify the components of the .NET Framework class library that you are using. Visual Studio 2003 automatically adds the following line when the file Class1.cs is created:

using System;

And because you are using the Regex class from the System.Text.RegularExpressions namespace, it is appropriate to add a using statement referencing that namespace, too:

using System.Text.RegularExpressions;


The alternative approach is to use fully qualified names whenever you refer to a type. With the using System.Text.RegularExpressions; statement in the code, you can simply write the following to declare the myRegex variable and assign it a value:

Regex myRegex = new Regex(@"[A-Z]\d", RegexOptions.IgnoreCase);

If the using System.Text.RegularExpressions; statement is missing, and you attempt to run the code, you will receive a bundle of error messages, including the following, because the Regex class is not found in the System namespace, the only namespace that is declared by the default template code created by Visual Studio 2003:

The type or namespace name ‘Regex’ could not be found.

So to declare the myRegex variable and assign it a value, you would have to write the following code, using fully qualified names, because the Regex and RegexOptions classes are contained in the System.Text.RegularExpressions namespace:

System.Text.RegularExpressions.Regex myRegex = new
    System.Text.RegularExpressions.Regex(@"[A-Z]\d",
    System.Text.RegularExpressions.RegexOptions.IgnoreCase);

Similarly, it would be necessary to write the following to declare the myMatch object variable and assign it a value:

System.Text.RegularExpressions.Match myMatch = myRegex.Match(inputString);

In all but the most trivial code, it is easier to write and read code when the using System.Text.RegularExpressions; statement is present.

There are automatically generated stubs for documentation comments in Class1.cs and an automatically generated namespace corresponding to the project name and a class name — by default, Class1.

The content of the Main() method is where the work of this simple example is carried out:

static void Main(string[] args)

{

First, a message is written to the command window using the Console class's WriteLine() method. The Console class is a member of the System namespace, which has already been referenced with the using System; statement, so you can simply write Console.WriteLine() with appropriate content between the parentheses:

Console.WriteLine(@"This will find a match for the regular expression '[A-Z]\d'.");

Notice that the first character inside the parentheses of the WriteLine() method is an @ character, which makes the string a verbatim string literal. Without the @, the compiler would report an error, because \d is not an escape sequence that C# recognizes. In the absence of the @ character, you would have to write the string in the double quotes as "This will find a match for the regular expression '[A-Z]\\d'.". In other words, in an ordinary string literal you must write \\d for the resulting string to contain the regular expression metacharacter \d.
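The equivalence of the two notations can be checked directly. This is a small sketch, not part of the SimpleMatch project; the class name VerbatimStringDemo and its two fields are hypothetical names used only for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class VerbatimStringDemo
{
    // Verbatim literal: the backslash is kept exactly as written.
    public static readonly string Verbatim = @"[A-Z]\d";

    // Ordinary literal: the backslash must be doubled for the C# compiler.
    public static readonly string Escaped = "[A-Z]\\d";

    public static void Main()
    {
        Console.WriteLine(Verbatim == Escaped);            // True
        Console.WriteLine(Verbatim.Length);                // 7
        Console.WriteLine(Regex.IsMatch("K9", Verbatim));  // True
    }
}
```

Both literals produce the same seven-character string, so the regular expression engine receives an identical pattern either way.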


Personally, I prefer adding the @ character, because I can then use the familiar regular expression syntax that I use in other languages. Because I use regular expressions across various languages and tools, I tend to avoid the double-backslash notation.

Next, a straightforward information string is output:

Console.WriteLine("Enter a test string now.");

Next, a variable, myRegex, is declared with the type Regex and assigned a new Regex object. As explained earlier, writing Regex is a convenient abbreviation for the fully qualified name System.Text.RegularExpressions.Regex. The regular expression pattern [A-Z]\d is the first argument to the Regex() constructor and specifies the pattern against which matching will take place. The second argument to the Regex() constructor specifies that case-insensitive matching is to be used:

Regex myRegex = new Regex(@"[A-Z]\d", RegexOptions.IgnoreCase);
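The effect of the second constructor argument can be seen by testing the same input with and without it. The following is an illustrative sketch only; IgnoreCaseDemo and HasMatch are hypothetical names, while Regex, RegexOptions.None, RegexOptions.IgnoreCase, and IsMatch() are members of the System.Text.RegularExpressions namespace.

```csharp
using System;
using System.Text.RegularExpressions;

public static class IgnoreCaseDemo
{
    // Hypothetical helper: builds the chapter's pattern with the given options
    // and reports whether the input contains a match.
    public static bool HasMatch(string input, RegexOptions options)
    {
        Regex regex = new Regex(@"[A-Z]\d", options);
        return regex.IsMatch(input);
    }

    public static void Main()
    {
        Console.WriteLine(HasMatch("K9", RegexOptions.None));        // True
        Console.WriteLine(HasMatch("k9", RegexOptions.None));        // False
        Console.WriteLine(HasMatch("k9", RegexOptions.IgnoreCase));  // True
    }
}
```

Without RegexOptions.IgnoreCase, the character class [A-Z] matches uppercase letters only, so the lowercase input k9 is rejected.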

Next, a string variable, inputString, is declared:

string inputString;

The Console class’s ReadLine() method is used to read the text entered by the user. The value read is assigned to the inputString variable:

inputString = Console.ReadLine();

The variable myMatch is declared with the type Match. The value assigned to it is the object returned by the Regex object's Match() method, called with the inputString variable as its argument. In other words, the myMatch variable holds the first match found in the inputString variable for the regular expression pattern [A-Z]\d that was supplied earlier when myRegex was created:

Match myMatch = myRegex.Match(inputString);
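That Match() reports only the first match can be seen with an input containing several candidates. This is a minimal sketch under assumed names (FirstMatchDemo, First, and the sample sentence are invented for illustration); the Value and Index properties belong to the Match class.

```csharp
using System;
using System.Text.RegularExpressions;

public static class FirstMatchDemo
{
    // Hypothetical helper: returns the first match of the chapter's pattern.
    public static Match First(string input)
    {
        Regex myRegex = new Regex(@"[A-Z]\d", RegexOptions.IgnoreCase);
        return myRegex.Match(input);
    }

    public static void Main()
    {
        // Three letter-digit pairs are present, but only the first is returned.
        Match myMatch = First("Rooms K9, b7 and C3 are free.");
        Console.WriteLine(myMatch.Value);  // K9
        Console.WriteLine(myMatch.Index);  // 6
    }
}
```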

Now that matching has been attempted, you first output the value of the inputString variable to remind or inform the user of the string that was captured using the Console.ReadLine() method:

Console.WriteLine("You entered the string: '" + inputString + "'.");

An if statement is used that tests the value of the Success property of the myMatch object variable. If a match has been found (as indicated by the value of the Success property), a string is output using Console.WriteLine() to inform the user of the content of the match:

if (myMatch.Success)
    Console.WriteLine("The match '" + myMatch.ToString() + "' was found in the string you entered.");
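The test of the Success property matters because Match() returns a Match object even when nothing is found. The sketch below illustrates both outcomes; SuccessDemo and Describe are hypothetical names, and the messages are invented for the example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SuccessDemo
{
    // Hypothetical helper: summarizes the outcome of matching [A-Z]\d against input.
    public static string Describe(string input)
    {
        Regex myRegex = new Regex(@"[A-Z]\d", RegexOptions.IgnoreCase);
        Match myMatch = myRegex.Match(input);

        if (myMatch.Success)
            return "Matched '" + myMatch.Value + "'";

        // On failure, Success is false and Value is an empty string.
        return "No match (Value has length " + myMatch.Value.Length + ")";
    }

    public static void Main()
    {
        Console.WriteLine(Describe("K9"));              // Matched 'K9'
        Console.WriteLine(Describe("no digits here"));  // No match (Value has length 0)
    }
}
```

In the SimpleMatch example nothing is printed when the match fails; adding an else branch along these lines is one way to tell the user that no match was found.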

The ReadLine() method is used so that the displayed match remains on-screen until the user presses the Return key:

Console.ReadLine();

}
