Beginning Regular Expressions 2005.pdf

Regular Expressions in Perl

Basics of Perl Regular Expression Usage

This section illustrates straightforward uses of regular expressions in Perl, for those readers who are not fluent in Perl. This chapter does not provide a full tutorial on how to use Perl. If you have little or no knowledge of Perl and want to use regular expressions in real Perl applications, I suggest that you take time to study a book such as Perl For Dummies, by Paul Hoffman (Wiley 2003).

To use regular expressions in Perl, you must use one or more of the regular expression operators.

Using the Perl Regular Expression Operators

The Perl regular expression operators interact intimately with regular expression patterns. The following table lists and briefly describes the regular expression operators.

Operator     Description
--------     -----------
m//          Used when matching a string against a regular expression
s///         Used when matching and then substituting a pattern
q// etc      Generalized quotes
split//      Splits a string into a list of strings

The simplest operator is the m// operator, which is used to test if there is a match between a string and a regular expression. As you will see later, the m of the m// operator isn’t essential in Perl code. However, I suggest that you use it routinely, because it makes clearer what is happening in the matching process.
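As a rough illustration of the operators in the table, the following sketch applies each one to a single string. The variable names and the sample text are assumptions made for this example only; they do not come from the chapter's code download.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample data for illustration.
my $text = 'red, green, blue';

# m// tests for a match; here the result is reduced to 1 or 0.
my $has_green = ($text =~ m/green/) ? 1 : 0;

# s/// substitutes the first match. Copying first leaves $text unchanged.
(my $replaced = $text) =~ s/green/yellow/;

# q// is a generalized single-quoted string, so $text is not interpolated.
my $literal = q/The variable $text is not interpolated here/;

# split breaks a string into a list, using the pattern as the separator.
my @colors = split /,\s*/, $text;

print "Match found: $has_green\n";
print "After substitution: $replaced\n";
print "Literal: $literal\n";
print "Number of colors: ", scalar(@colors), "\n";
```

Only m// and s/// are regular expression operators in the strict sense; q// and split are listed alongside them because they use similar delimiter syntax or take a pattern as an argument.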

Using the m// Operator

The m// operator is used together with the =~ operator to test whether a string contains a match for a specified regular expression.

Try It Out

Using the m// Operator

1. Create a new Perl file in Komodo 3.0 or in your chosen text editor, and edit the code to read as follows:

#!/usr/bin/perl -w
use strict;

my $myString = "Hello world!";
if ($myString =~ m/world/)
{
    print "There was a match.";
}
else
{
    print "There was no match.";
}

2. Save the code as SimpleMatch.pl.

3. Press F5, use the Browse button to select SimpleMatch.pl, and press Return to run the code in debug mode.

4. Inspect the results displayed in the Output pane (in the lower-right corner of the Komodo window), as shown in Figure 26-9. The displayed message simply states that a match was found.

Figure 26-9

How It Works

First, the string value Hello world! was assigned to the $myString variable, as before. Because the strict pragma is in force, you add a my before the variable name:

my $myString = "Hello world!";


Then an if statement is used to determine whether a message indicating successful matching or failed matching is to be displayed. The test of the if statement is whether or not the $myString variable contains a match for the literal regular expression pattern world. The combination of the =~ operator and the m// operator can be read as matches.

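Perl doesn’t have a Boolean datatype, but a match expression evaluates to a value that Perl treats as true or false, so it can serve directly as the test of an if statement: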

if ($myString =~ m/world/)

By default, matching in Perl is case sensitive.
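The two points above can be checked with a short sketch. It assumes the same $myString value as the example; the variables $lower and $upper are hypothetical names introduced here to hold the results of the two matches.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $myString = "Hello world!";

# In scalar context a successful match yields 1 and a failed match
# yields an empty string, which Perl treats as false.
my $lower = ($myString =~ m/world/);   # matches
my $upper = ($myString =~ m/World/);   # no match: the case differs

print "m/world/ matched.\n"       if $lower;
print "m/World/ did not match.\n" if !$upper;
```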

If a match is found (there is a match, given the code in this example file), a message is displayed indicating that matching was successful. In Perl, the paired curly braces are required to enclose the statement block that is executed when the test returns the equivalent of true, even if only a single statement is to be executed:

{
    print "There was a match.";
}

If no match is found, a message to that effect is displayed. Again, the paired curly braces of the else clause are required, even though there is only a single statement in the else statement block:

else
{
    print "There was no match.";
}

The m// operator can be used with any of the regular expression matching modes that Perl supports. The following example shows how matching can be carried out case insensitively. The case-insensitive matching mode is indicated by a lowercase i following the second forward slash of the paired forward slashes that delimit the regular expression pattern:

$myTestString =~ m/world/i;

The example also introduces a very useful function, chomp, which you will use often in code that accepts input from the user.

Try It Out

Matching Case Insensitively

1. Type the following code in Komodo or your chosen text editor, and save the code as MatchInsensitive.pl:

#!/usr/bin/perl -w
use strict;

print "Enter a string. It will be matched against the pattern '/Star/i'.\n\n";
my $myTestString = <STDIN>;
chomp($myTestString);
if ($myTestString =~ m/Star/i)
{
    print "There is a match for '$myTestString'.";
}
else
{
    print "No match was found in '$myTestString'.";
}

2. Either run the code inside Komodo 3.0 (by pressing F5, selecting MatchInsensitive.pl using the Browse button, and then pressing the Return key) or type perl MatchInsensitive.pl at the command line.

3. The first time that the code is run, enter the test string Startle, and press the Return key. Inspect the displayed message, as shown in Figure 26-10.

When entering text in the Komodo 3.0 Output pane, be sure that the focus has gone to the desired line. It is easy to type characters unintentionally into the Code pane, rather than the Output pane, with a resulting avalanche of syntax errors the next time you attempt to run the code.

Figure 26-10


4. Run the code again, enter the test string startle, and press the Return key. Inspect the displayed message. Again, a match is found, because the pattern Star, when matched case insensitively, matches the initial star of startle.

5. Run the code again, enter the test string Hello, and press the Return key. Inspect the displayed message, as shown in Figure 26-11.

Figure 26-11

How It Works

First, the print operator is used to display a message inviting the user to enter a test string:

print "Enter a string. It will be matched against the pattern '/Star/i'.\n\n";

The variable $myTestString is assigned the sequence of characters that the user enters at the command line. The <STDIN> operator reads in a line of characters from the standard input. Typically, the standard input device is the keyboard. So <STDIN> reads a line of characters from the keyboard, ending when you press the Return key. One of the minor inconveniences about the line of characters provided by the standard input is that it includes the newline character. Perl treats a newline character as part of the character sequence to be matched. So the newline needs to be removed to achieve the matching behavior that you would likely expect:

my $myTestString = <STDIN>;

Perl provides the chomp operator to remove the newline character from the end of the sequence of characters that have been read in from the standard input:

chomp($myTestString);
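The effect of the trailing newline can be seen without typing any input. The following sketch is an illustration only: it assumes a hard-coded string, $line, standing in for what <STDIN> would return, and compares it with the visible text before and after chomp().

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for a line read from <STDIN>.
my $line = "Startle\n";

# Before chomp, the string is not equal to the visible text.
my $equal_before = ($line eq 'Startle') ? 1 : 0;

# chomp modifies its argument and returns the number of characters removed.
my $removed = chomp($line);

# After chomp, only the visible characters remain.
my $equal_after = ($line eq 'Startle') ? 1 : 0;

print "Equal before chomp: $equal_before\n";
print "Characters removed: $removed\n";
print "Equal after chomp: $equal_after\n";
```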

The code file MatchInsensitiveLengths.pl, shown here and also included in the code download, displays the length of $myTestString before and after the chomp() function is used. Notice that when the test string is Startle, the length of the string is 8, one more than the number of visible characters. The newline character is the eighth character:

#!/usr/bin/perl -w
use strict;

print "Enter a string. It will be matched against the pattern '/Star/i'.\n\n";
my $myTestString = <STDIN>;
my $myLength = length($myTestString);
print "The length before chomp() is $myLength.\n\n";
chomp($myTestString);
$myLength = length($myTestString);
print "The length after chomp() is $myLength.\n\n";
if ($myTestString =~ m/Star/i)
{
    print "There is a match for '$myTestString'.\n\n";
}
else
{
    print "No match was found in '$myTestString'.";
}

Figure 26-12 shows the screen’s appearance when you run MatchInsensitiveLengths.pl from the command line. Notice the length of the $myTestString before and after chomp() is used.

Figure 26-12

One of the difficulties for beginners when using Perl is that many constructs can be written in more than one way. The next couple of examples illustrate some of these variations, which you may meet when you have to handle code created by other developers.

The character m in the m// operator is, in fact, optional. I suggest, for the sake of clarity (the m hints at the idea of matching), that you use m// rather than just //, as in the following example.

Try It Out

Optional “m”

1. Type the following code in Komodo 3.0 or an alternative text editor, and save the code as SimpleMatchNoM.pl:

#!/usr/bin/perl -w
use strict;

my $myString = "Hello world!";
if ($myString =~ /world/)
{
    print "There was a match.";
}
else
{
    print "There was no match.";
}

2. Press F5 and then press the Return key to run the code.

3. Inspect the result. Because the behavior of matching with // instead of m// is no different, the screen’s appearance is the same as was shown in Figure 26-9.
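One situation where the m cannot be dropped is worth knowing when reading other developers' code: the m is optional only when forward slashes are the delimiters. The sketch below is an illustration with a hypothetical $path variable; it shows the same pattern written with slashes and with two alternative delimiters.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample value for illustration.
my $path = "/usr/bin/perl";

# With slashes as delimiters the m is optional, but every / inside
# the pattern must be escaped.
my $with_slashes = ($path =~ /\/usr\/bin/) ? 1 : 0;

# With any other delimiter the m is required, and slashes in the
# pattern need no escaping.
my $with_braces = ($path =~ m{/usr/bin}) ? 1 : 0;
my $with_hash   = ($path =~ m#/usr/bin#) ? 1 : 0;

print "Slashes: $with_slashes, braces: $with_braces, hash: $with_hash\n";
```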


The chomp() function is something you are likely to use frequently, because it is useful to remove the newline character that ends a line of user input. The following example shows an alternative syntax for chomp() which, while less obvious to occasional Perl programmers, is more succinct.

Try It Out

An Alternative chomp() Syntax

1. Type the following code in Komodo 3.0 or an alternative text editor, and save the code as MatchAlternativeChomp.pl:

#!/usr/bin/perl -w
use strict;

print "Enter a string. It will be matched against the pattern '/Star/i'.\n\n";
chomp (my $myTestString = <STDIN>);
if ($myTestString =~ m/Star/i)
{
    print "There is a match for '$myTestString'.";
}
else
{
    print "No match was found in '$myTestString'.";
}

2. Run the code inside Komodo or, at the command line, type perl MatchAlternativeChomp.pl.

3. Enter the test string Star Training, and press the Return key. Inspect the displayed results, as shown in Figure 26-13.

Figure 26-13

How It Works

The line of code:

chomp (my $myTestString = <STDIN>);

is functionally equivalent to:

my $myTestString = <STDIN>;

chomp ($myTestString);

Because the assignment is written inside the parentheses of chomp(), it is evaluated first, as the argument to the function; chomp() is then applied to the variable that has just been assigned.
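The equivalence of the two forms can be checked with a sketch like the following. To keep it runnable without keyboard input, it assumes an in-memory filehandle, $fh, opened on a hypothetical string in place of STDIN; the two forms each read one line from it.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical input: two identical lines, read through an in-memory
# filehandle that stands in for STDIN.
my $input = "Star Training\nStar Training\n";
open(my $fh, '<', \$input) or die "Cannot open in-memory file: $!";

# Two-step form: assign, then chomp.
my $first = <$fh>;
chomp($first);

# One-step form: the assignment is chomp's argument, so chomp
# operates on the variable that was just assigned.
chomp(my $second = <$fh>);

close($fh);

print "Both forms give the same result.\n" if $first eq $second;
```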


There are also variants in how print can be used. For example, a print statement can be made conditional by following it with an if test. The following code is included in the file MatchAlternativeChomp2.pl in the code download:

print "Enter a string. It will be matched against the pattern '/Star/i'.\n\n";
chomp (my $myTestString = <STDIN>);

The if test is included in the same line as the print operator, after the string to be printed:

print "There is a match for '$myTestString'." if ($myTestString =~ m/Star/i);

The !~ operator in the if test means “There is not a match”:

print "There is no match for '$myTestString'." if ($myTestString !~ m/Star/i);
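The behavior of the two conditional forms can be sketched without user input. The example below is an illustration only: it loops over two hypothetical test strings and collects the messages in an array, @results, rather than reading from STDIN.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @results;

# Hypothetical test strings: one that matches and one that does not.
for my $myTestString ('Star Training', 'Hello') {
    # Trailing if with =~ : runs only when the pattern matches.
    push @results, "match: $myTestString"
        if $myTestString =~ m/Star/i;

    # Trailing if with !~ : runs only when the pattern does not match.
    push @results, "no match: $myTestString"
        if $myTestString !~ m/Star/i;
}

print "$_\n" for @results;
```

For any given string exactly one of the two statements runs, which is why the pair behaves like the if and else blocks used earlier.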

It isn’t necessary to write the pattern as a literal inside the m// operator. You have the option to store the pattern in a variable and match against that variable. Matching against a variable is useful when you want to match against the same pattern more than once in your code.

Try It Out

Matching Against a Variable

1. Type the following code in your chosen editor, and save the code as MatchUsingVariable.pl:

#!/usr/bin/perl -w
use strict;

my $myPattern = "^\\d{5}(-\\d{4})?\$";
print "Enter a US Zip Code: ";
my $myTestString = <STDIN>;
chomp ($myTestString);
print "You entered a Zip code.\n\n" if ($myTestString =~ m/$myPattern/);
print "The value you entered wasn't recognized as a US Zip code." if ($myTestString !~ m/$myPattern/);

2. Run the code in Komodo or at the command line. When prompted, enter the test string 12345, and inspect the displayed result.

3. Run the code again (F3 if you are using the Windows command line). When prompted, enter the test string 12345-6789, and inspect the displayed result.

4. Run the code again. When prompted, enter the test string Hello world! and inspect the result, as shown in Figure 26-14.

Figure 26-14
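Storing the pattern in a double-quoted string means that each backslash and the final dollar sign must be escaped. One alternative, not used in the example above, is Perl's qr// operator, which stores a compiled pattern written in ordinary regular expression syntax. The following sketch assumes the same Zip code pattern and the same three test strings; the %is_zip hash is a hypothetical name introduced to hold the results.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# qr// stores a compiled pattern; \d and $ need no extra escaping.
my $myPattern = qr/^\d{5}(-\d{4})?$/;

# Record, for each test string, whether it matched the pattern.
my %is_zip;
for my $candidate ('12345', '12345-6789', 'Hello world!') {
    $is_zip{$candidate} = ($candidate =~ m/$myPattern/) ? 1 : 0;
}

for my $candidate (sort keys %is_zip) {
    print "'$candidate': ", ($is_zip{$candidate} ? "Zip code" : "not a Zip code"), "\n";
}
```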
