Beginning Regular Expressions 2005.pdf
Regular Expressions in Perl

Metacharacter     Description

\b                Matches a word boundary; in other words, the position between a word character ([A-Za-z0-9_]) and a nonword character (or the start or end of the string).
[...]             Character class. It matches one character of the set of characters inside the square brackets.
[^...]            Negated character class. It matches one character that is not in the set of characters inside the square brackets.
\A                A positional metacharacter that always matches the position before the first character in the test string.
\Z                A positional metacharacter that matches at the end of the string, or immediately before a newline that ends the string.
\z                A positional metacharacter that always matches the position after the last character in a string, irrespective of mode.
(?= ...)          Positive lookahead.
(?! ...)          Negative lookahead.
(?<= ...)         Positive lookbehind.
(?<! ...)         Negative lookbehind.
\p{charClass}     Matches a character that is in a specified Unicode character class or block.
\P{charClass}     Matches a character that is not in a specified Unicode character class or block.
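The lookaround entries in the table are listed without examples, so the following is a minimal sketch of how \b and the four lookaround constructs behave. The test string and variable names are hypothetical and chosen only for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical test string, chosen only for illustration.
my $text = "cat concat cats price: 100USD 200EUR";

# \b on both sides: match "cat" only as a whole word.
my @whole_words = $text =~ /\bcat\b/g;
print "Whole-word matches for 'cat': ", scalar(@whole_words), "\n";

# Positive lookahead: digits that are followed by USD (USD is not consumed).
my ($usd) = $text =~ /(\d+)(?=USD)/;
print "Digits followed by USD: $usd\n";

# Negative lookahead: digits not followed by another digit or by USD.
my ($not_usd) = $text =~ /(\d+)(?!\d|USD)/;
print "Digits not followed by USD: $not_usd\n";

# Positive lookbehind: word characters that are preceded by "con".
my ($after_con) = $text =~ /(?<=con)(\w+)/;
print "Word characters after 'con': $after_con\n";

# Negative lookbehind: "cat" that is not preceded by "con".
my @not_after_con = $text =~ /(?<!con)cat/g;
print "Occurrences of 'cat' not preceded by 'con': ", scalar(@not_after_con), "\n";
```

The `\d` alternative inside the negative lookahead stops the regex engine from backtracking to a shorter run of digits such as 10, which would otherwise satisfy the lookahead.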

Using Quantifiers in Perl

Perl supports a fairly typical range of quantifiers.

The ? metacharacter matches the preceding character or group zero or one times. In other words, the preceding character or group is optional. To match bat and bats, you can use the pattern bats?. The ? metacharacter indicates that the s is optional.

The * metacharacter matches the preceding character or group zero or more times. In other words, the character or group can be absent or can occur any number of times. The pattern AB* matches the character sequences A, AB, ABB, ABBB, and so on.

The + metacharacter matches the preceding character or group one or more times. In other words, the character or group must occur at least one time but can occur any number of times greater than one. The pattern AB+ will match the character sequences AB, ABB, ABBB, and so on. But it will not match A, because there must be at least one B character for matching to succeed.

To match a literal ?, *, or + character, add a backslash before it. So you would write \?, \*, and \+, respectively.
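The behavior of the three quantifiers described above can be checked with a short script. This is a minimal sketch: the ^ and $ anchors are an addition not present in the patterns in the text, used here so that the whole string must fit the pattern, and the test strings are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The ^ and $ anchors are an addition for this sketch: they force the whole
# string to fit the pattern, so the effect of each quantifier is visible.
my (%optional_s, %star, %plus);

for my $word ("bat", "bats", "batss") {
    $optional_s{$word} = ($word =~ /^bats?$/) ? "match" : "no match";
    print "bats? against '$word': $optional_s{$word}\n";
}

for my $seq ("A", "AB", "ABBB") {
    $star{$seq} = ($seq =~ /^AB*$/) ? "match" : "no match";
    $plus{$seq} = ($seq =~ /^AB+$/) ? "match" : "no match";
    print "'$seq': AB* gives $star{$seq}, AB+ gives $plus{$seq}\n";
}

# A backslash turns a quantifier character into a literal.
my $sentence = "Is 2+2 equal to 4?";   # hypothetical test string
my $has_plus     = ($sentence =~ /\+/) ? 1 : 0;
my $has_question = ($sentence =~ /\?/) ? 1 : 0;
print "Literal + found: $has_plus, literal ? found: $has_question\n";
```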

Chapter 26

The curly-brace quantifier syntax, {n}, {n,m}, and {n,}, is also available. The pattern [A-Z]\d{3} matches an uppercase alphabetic character followed by exactly three numeric digits. The pattern [A-Z]\d{1,3} matches an uppercase alphabetic character followed by between one and three digits. So it will match A1, A12, and A123.

The pattern [A-Z]\d{2,} will match an uppercase alphabetic character followed by two or more numeric digits. So it will match A12, A123, A1234, A12345, and so on. But it will not match A1, because there must be at least two numeric digits for a successful match.
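The three curly-brace forms can be compared side by side. The sketch below assumes anchored versions of the patterns (the ^ and $ are additions for illustration) so that a string with extra digits is not accepted through a partial match; the final lines show what the unanchored pattern does with a longer string.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my %hits;   # records which quantifier forms accept each candidate
for my $candidate ("A1", "A12", "A123", "A1234") {
    my @accepted;
    push @accepted, '{3}'   if $candidate =~ /^[A-Z]\d{3}$/;
    push @accepted, '{1,3}' if $candidate =~ /^[A-Z]\d{1,3}$/;
    push @accepted, '{2,}'  if $candidate =~ /^[A-Z]\d{2,}$/;
    $hits{$candidate} = @accepted ? join(", ", @accepted) : "none";
    print "$candidate is accepted by: $hits{$candidate}\n";
}

# Without anchors, [A-Z]\d{3} still finds a match inside A1234:
# it matches the first four characters and leaves the final digit unmatched.
my $partial = "";
$partial = $& if "A1234" =~ /[A-Z]\d{3}/;
print "Unanchored match inside A1234: $partial\n";
```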

Using Positional Metacharacters

Perl supports both the ^ and $ positional metacharacters. By default, the ^ metacharacter matches the position immediately before the first character of a string; with the m modifier it also matches at the start of each line. The $ metacharacter matches the position at the end of a string or immediately before a newline that ends the string; with the m modifier it also matches at the end of each line.

The \A positional metacharacter matches the position immediately before the first character of a string, whether or not the m modifier is used.

The \z positional metacharacter matches the position immediately after the last character of a string.
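The differences between these anchors show up only when the string contains a newline. The following is a minimal sketch using hypothetical test strings; it relies on Perl's m modifier, which makes ^ and $ match at line boundaries as well as at the string boundaries.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $line = "escape\n";   # hypothetical input with a trailing newline

my $dollar  = ($line =~ /cape$/)  ? 1 : 0;  # $ can match before a final newline
my $big_z   = ($line =~ /cape\Z/) ? 1 : 0;  # \Z can also match before a final newline
my $small_z = ($line =~ /cape\z/) ? 1 : 0;  # \z matches only at the very end
print "\$: $dollar, \\Z: $big_z, \\z: $small_z\n";

my $two_lines = "first\nsecond";   # hypothetical two-line string

my $caret_default = ($two_lines =~ /^second/)   ? 1 : 0;  # ^ is start of string only
my $caret_multi   = ($two_lines =~ /^second/m)  ? 1 : 0;  # with m, ^ is start of any line
my $big_a_multi   = ($two_lines =~ /\Asecond/m) ? 1 : 0;  # \A ignores the m modifier
print "^: $caret_default, ^ with m: $caret_multi, \\A with m: $big_a_multi\n";
```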

Try It Out

Using Positional Metacharacters

1. Type the following code into your chosen text editor:

#!/usr/bin/perl -w
use strict;

print "\nThis example demonstrates the use of the ^ and \$ positional metacharacters.\n\n";

# A plain character sequence with no positional metacharacters.
my $myPattern = "cape";
my $myTestString = "escape";
print "In '$myTestString' there is a match for the pattern '$myPattern'.\n\n" if ($myTestString =~ m/$myPattern/);

# Anchor the pattern to the start of the string.
$myPattern = "^cape";
print "When the pattern is '$myPattern' there is no match for '$myTestString'.\n\n" if ($myTestString !~ m/$myPattern/);

# Anchor the pattern to the end of the string.
$myPattern = "cape\$";
print "But there is a match for '$myTestString' when the pattern is '$myPattern'.\n\n" if ($myTestString =~ m/$myPattern/);

2. Save the code as PositionalMetacharacters.pl.

3. Run the code, and inspect the displayed results, as shown in Figure 26-21.

Figure 26-21

How It Works

First, a simple informational message is displayed to the user:

print "\nThis example demonstrates the use of the ^ and \$ positional metacharacters.\n\n";

Then the first pattern to be used is defined. It is a simple character sequence without any positional metacharacters:

my $myPattern = "cape";

The test string is defined:

my $myTestString = "escape";

Then matching takes place. The if statement ensures that a message is displayed only when there is a successful match:

print "In '$myTestString' there is a match for the pattern '$myPattern'.\n\n" if ($myTestString =~ m/$myPattern/);

The pattern is modified so that it includes a ^ positional metacharacter. It will now match only when the character sequence has cape as its first four characters:

$myPattern = "^cape";

The test string escape does not begin with cape, so the !~ operator returns true and a message is displayed indicating that matching failed:

print "When the pattern is '$myPattern' there is no match for '$myTestString'.\n\n" if ($myTestString !~ m/$myPattern/);

The pattern is changed again. Now it will match only if cape appears as the last four characters of the test string:

$myPattern = "cape\$";

There is a match when matching against escape, so a message indicating a successful match is displayed:

print "But there is a match for '$myTestString' when the pattern is '$myPattern'.\n\n" if ($myTestString =~ m/$myPattern/);

Captured Groups in Perl

In Perl, captured groups are specified using paired parentheses. The first captured group is produced by the paired parentheses with the leftmost opening parenthesis. Additional captured groups are added for each pair of parentheses, with the numbering corresponding to the order of the opening parenthesis of a pair.

Captured groups can be accessed from outside the regular expression using the numbered variables $1, $2, and so on.

In Perl, the whole match is available in the $& variable.
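The numbering rule matters most when groups are nested. The following is a minimal sketch with a hypothetical date string and pattern: the outer parentheses open first, so they become group 1, and the groups nested inside them become groups 2 and 3.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $date = "Published 2005-03-17";   # hypothetical test string

# Opening parentheses, left to right:
#   1: ((\d{4})-(\d{2}))   year and month together
#   2: (\d{4})             year
#   3: (\d{2})             month
#   4: (\d{2})             day
my ($year_month, $year, $month, $day) =
    $date =~ /((\d{4})-(\d{2}))-(\d{2})/;

# After a successful match, $& holds the whole match and $1..$4 the groups.
my $whole = $&;
print "Whole match (\$&): $whole\n";
print "\$1: $1\n";
print "\$2: $2\n";
print "\$3: $3\n";
print "\$4: $4\n";
```

In list context the match operator returns the captured groups in the same order as $1, $2, and so on, which is why the list assignment above yields the same values as the numbered variables.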

Try It Out

Captured Groups in Perl Basics

1. Type the following code in your chosen text editor:

#!/usr/bin/perl -w
use strict;

my $myPattern = "([A-Z])(\\d)";
my $myTestString = "B99";
$myTestString =~ m/$myPattern/;

print "The pattern is '$myPattern'.\n";
print "The test string is '$myTestString'.\n";
print "The whole match is '$&', contained in the \$& variable.\n";
print "The first captured group is '$1', contained in '\$1'.\n";
print "The second captured group is '$2', contained in '\$2'.\n";

2. Save the code as CapturedGroupsDemo.pl.

3. Run the code, and inspect the displayed results, as shown in Figure 26-22. Notice that the whole match for the pattern ([A-Z])(\d), which is B9, is retrieved using the $& variable, while $1 holds B and $2 holds 9.

Figure 26-22

How It Works

The pattern to be matched against is assigned to the $myPattern variable:

my $myPattern = "([A-Z])(\\d)";

The test string is assigned to the $myTestString variable:

my $myTestString = "B99";

The value of the $myTestString variable is matched against the pattern held in the $myPattern variable:

$myTestString =~ m/$myPattern/;

The values of the test string and pattern are displayed to the user:

print "The pattern is '$myPattern'.\n";
print "The test string is '$myTestString'.\n";
