20
Regular Expressions and VBScript
Regular expressions were introduced to VBScript in version 5.0. They can be used to parse character sequences (strings) and can be used to provide flexible replace functionality.
VBScript can be used on the client side in Web pages when the Internet Explorer browser provides the VBScript interpreter or in the Windows Script Host (WSH).
The VBScript interpreter (vbscript.dll) does not itself provide file access. The associated Scripting Runtime library, scrrun.dll, supplies it: when VBScript is used with WSH, for example, scrrun.dll gives scripts access to files and allows directory manipulation.
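As a rough illustration of file access under WSH, the following sketch uses the FileSystemObject from scrrun.dll to read a text file and echo each line that contains three consecutive digits. The file name sample.txt and the pattern are hypothetical, chosen only for this example; the script is meant to be saved as a .vbs file and run with cscript.

```vbscript
' A minimal sketch, assuming it is run under Windows Script Host (cscript).
' The file name sample.txt is hypothetical.
Option Explicit

Const ForReading = 1

Dim fso, inputFile, myRegExp, currentLine

' Scripting.FileSystemObject is supplied by scrrun.dll.
Set fso = CreateObject("Scripting.FileSystemObject")

Set myRegExp = New RegExp
myRegExp.Pattern = "\d{3}"

If fso.FileExists("sample.txt") Then
    Set inputFile = fso.OpenTextFile("sample.txt", ForReading)
    Do Until inputFile.AtEndOfStream
        currentLine = inputFile.ReadLine
        ' Echo only the lines that contain three consecutive digits.
        If myRegExp.Test(currentLine) Then
            WScript.Echo currentLine
        End If
    Loop
    inputFile.Close
Else
    WScript.Echo "sample.txt was not found in the current directory."
End If
```

The same RegExp code would work in a Web page, but the WScript object and, typically, file access are available only when the script runs under WSH.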
In this chapter, you will learn the following:
How to use the properties and methods of the RegExp object
How to use the Match object and the Matches collection
The metacharacters supported in VBScript and how to use them
How to use VBScript regular expressions to solve some text-handling problems
The RegExp Object and How to Use It
The RegExp object, the Match object, and the Matches collection all relate to how regular expressions are used in VBScript. This section focuses on the RegExp object.
The RegExp object has three properties and three methods. The properties are as follows:
Pattern property
Global property
IgnoreCase property
Each of these properties is described in the following sections. The three methods are as follows:
Execute method
Replace method
Test method
Each of these methods is described and demonstrated in the following sections. To carry out simple matching, the Pattern property and the Test() method are often used.
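The three properties and three methods listed above can be seen together in one short sketch. It assumes the script runs under WSH (so that WScript.Echo is available), and the test string Banana is purely illustrative.

```vbscript
' A minimal sketch, assuming Windows Script Host; the test string is illustrative.
Option Explicit

Dim myRegExp, foundMatches, foundMatch

Set myRegExp = New RegExp

' The three properties.
myRegExp.Pattern = "a"        ' the regular expression, held as a string
myRegExp.Global = True        ' look for every match, not only the first
myRegExp.IgnoreCase = True    ' treat A and a as equivalent

' Test returns a Boolean value.
WScript.Echo "Test: " & myRegExp.Test("Banana")

' Replace returns a new string; the original string is not modified.
WScript.Echo "Replace: " & myRegExp.Replace("Banana", "o")

' Execute returns a Matches collection of Match objects.
Set foundMatches = myRegExp.Execute("Banana")
WScript.Echo "Execute found " & foundMatches.Count & " matches."
For Each foundMatch In foundMatches
    ' FirstIndex is the zero-based position of the match in the string.
    WScript.Echo foundMatch.Value & " at position " & foundMatch.FirstIndex
Next
```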
The RegExp Object’s Pattern Property
The VBScript RegExp object differs in functionality from the tools and languages discussed earlier in this book. For example, there is no syntax in VBScript like the following JScript declaration and assignment statement:
var myRegExp = /\d{3}/;
Instead, VBScript uses the value of the RegExp object’s Pattern property to hold a string value, which is the regular expression pattern.
So the VBScript equivalent of the preceding JScript code would look like this:
Dim myRegExp
Set myRegExp = new RegExp
myRegExp.Pattern = "\d{3}"
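Because the pattern is an ordinary VBScript string, the usual VBScript string rules apply when you write it. The sketch below, which assumes WSH and uses made-up test strings, shows two consequences: a backslash is written once, because VBScript strings have no backslash escapes, and a double quote inside the pattern is written as two double quotes.

```vbscript
' A minimal sketch, assuming Windows Script Host; the test strings are made up.
Option Explicit

Dim myRegExp
Set myRegExp = New RegExp

' A backslash is written once in a VBScript string literal.
myRegExp.Pattern = "\d{3}"
WScript.Echo "Contains three digits: " & myRegExp.Test("Room 101")

' A double quote inside a string literal is written as two double quotes,
' so this pattern is the five characters "\w+" including the quotes.
myRegExp.Pattern = """\w+"""
WScript.Echo "Contains a quoted word: " & myRegExp.Test("He said ""hello"" twice.")
```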
The following example shows a very simple function that uses the Pattern property as part of a simple match operation.
Try It Out: A Simple Match Operation
The sample file, TestForA.html, shows how the Pattern property is used:
<html>
<head>
<title>Test For Upper Case A</title>
<script language="vbscript" type="text/vbscript">
Function MatchTest
  Dim myRegExp, TestString
  Set myRegExp = new RegExp
  myRegExp.Pattern = "A"
  TestString = "Andrew"
  If myRegExp.Test(TestString) = True Then
    MsgBox "The test string '" & TestString & "' matches the pattern '" & myRegExp.Pattern & "'."
  Else
    MsgBox "There is no match."
  End If
End Function
</script>
</head>
<body onload="MatchTest">
</body>
</html>
Open TestForA.html in Internet Explorer, and notice the message box that is displayed when the page loads.
Figure 20-1 shows the appearance of the message box that is displayed. It correctly indicates that there is a match for the pattern A in the character sequence Andrew.
Figure 20-1
How It Works
When the HTML page loads, the MatchTest() function is called:
<body onload="MatchTest">
The functionality is all contained in the MatchTest() function:
Function MatchTest
First, the variables myRegExp (which contains the regular expression pattern in its Pattern property) and the TestString (which is supplied literally) are dimensioned:
Dim myRegExp, TestString
Next, in a Set statement, a reference to a new RegExp object is assigned to the variable myRegExp:
Set myRegExp = new RegExp
Next, a single literal character, A, is assigned to the myRegExp object’s Pattern property:
myRegExp.Pattern = "A"
Then the character sequence Andrew is assigned to the variable TestString:
TestString = "Andrew"
An If statement is used to display a message box when matching is successful. Notice that the Test() method of the myRegExp variable is used in the logical test of the If statement. The Test() method tests whether or not the string, which is the method’s single argument, contains a match for the value of the Pattern property of the same RegExp object, in this case, the RegExp object referenced by the myRegExp variable:
If myRegExp.Test(TestString) = True Then
The message to be displayed is constructed by concatenating literal text with the values of the TestString variable and the value of the myRegExp.Pattern property:
MsgBox "The test string '" & TestString & "' matches the pattern '" & _
  myRegExp.Pattern & "'."
An Else clause is also provided with an alternate message indicating that there is no match. However, with the supplied values for myRegExp.Pattern and TestString, the Else clause is never needed in this example. In later examples, where the test string is supplied by the user, you will need an Else clause to be in place:
Else
MsgBox "There is no match."
Finally, the End If and End Function statements complete the MatchTest function:
End If
End Function
The RegExp Object’s Global Property
The RegExp object’s Global property can have the Boolean values True or False. The Global property’s default value is False, which means that only a single match for the value of the Pattern property is sought. When the value of the Global property is True, matching continues to be attempted throughout the test string, and multiple matches may be returned.
Try It Out: The Global Property
The test file for this example, MatchGlobal.html, is shown here:
<html>
<head>
<title>Carry out a non-global replace and a global replace.</title>
<script language="vbscript" type="text/vbscript">
Dim myRegExp, InputString, ChangedString
Function MatchGlobal
  Set myRegExp = new RegExp
  myRegExp.Pattern = "A"
  DoReplaceDefault
  DoReplaceGlobal
End Function
Function DoReplaceDefault
  InputString = InputBox("Enter a string. It will be tested once to see if it contains" & VBCrLf & "any 'A' characters. Any 'A' will be replaced by 'B'")
  myRegExp.Global = False
  ChangedString = myRegExp.Replace(InputString, "B")
  If myRegExp.Test(InputString) = True Then
    MsgBox "The test string '" & InputString & "' matches the pattern '" & myRegExp.Pattern & "'." & VBCrLf _
      & "The changed string is " & ChangedString
  Else
    MsgBox "There is no match. '" & InputString & "' does not match " & VBCrLf _
      & "the pattern '" & myRegExp.Pattern & "'."
  End If
End Function
Function DoReplaceGlobal
  InputString = InputBox("Enter a string. It will be tested to see if it contains" & VBCrLf & "any 'A' characters. Any 'A' will be replaced by 'B'")
  myRegExp.Global = True
  ChangedString = myRegExp.Replace(InputString, "B")
  If myRegExp.Test(InputString) = True Then
    MsgBox "The test string '" & InputString & "' matches the pattern '" & myRegExp.Pattern & "'." & VBCrLf _
      & "The changed string is " & ChangedString
  Else
    MsgBox "There is no match. '" & InputString & "' does not match " & VBCrLf _
      & "the pattern '" & myRegExp.Pattern & "'."
  End If
End Function
</script>
</head>
<body onload="MatchGlobal">
</body>
</html>
As well as using the Global property, the code makes use of the RegExp object’s Replace() method.
When MatchGlobal.html is opened, two message boxes will be displayed. The first attempts only a single match, and the second matches as many times as there are matches in the test string.
1. Open MatchGlobal.html in Internet Explorer.
2. In the input box that appears, enter the character sequence THE APPLE IS A TASTY FRUIT. Because matching in VBScript is, by default, case sensitive, be sure to use all uppercase characters.
3. Click the OK button, and inspect the message box that is displayed.
Figure 20-2 shows the screen’s appearance after Step 3. Notice that only a single occurrence of uppercase A has been replaced.
Figure 20-2
4. Click the OK button in the message box, and reenter the same string, THE APPLE IS A TASTY FRUIT, in the input box.
5. Click the OK button, and inspect the result displayed in the message box.
Figure 20-3 shows the screen’s appearance after Step 5. Notice that each occurrence of uppercase A has been replaced with uppercase B.
Figure 20-3
How It Works
The code in this example is split into three functions, MatchGlobal, DoReplaceDefault, and DoReplaceGlobal.
Because the myRegExp variable is used in multiple functions, it, together with other variables, is declared globally:
Dim myRegExp, InputString, ChangedString
The MatchGlobal function uses the Set statement to create a reference, stored in the myRegExp variable, to a new RegExp object:
Function MatchGlobal
  Set myRegExp = new RegExp
  myRegExp.Pattern = "A"
  DoReplaceDefault
  DoReplaceGlobal
End Function
Assigning the string value A to be the value of the Pattern property of myRegExp makes it the simple literal pattern that is to be matched.
The following DoReplaceDefault function accepts a string typed by the user into an input box:
Function DoReplaceDefault
  InputString = InputBox("Enter a string. It will be tested once to see if it contains" & VBCrLf & "any 'A' characters. Any 'A' will be replaced by 'B'")
  myRegExp.Global = False
  ChangedString = myRegExp.Replace(InputString, "B")
  If myRegExp.Test(InputString) = True Then
    MsgBox "The test string '" & InputString & "' matches the pattern '" & myRegExp.Pattern & "'." & VBCrLf _
      & "The changed string is " & ChangedString
  Else
    MsgBox "There is no match. '" & InputString & "' does not match " & VBCrLf _
      & "the pattern '" & myRegExp.Pattern & "'."
  End If
End Function
The input box is displayed using the VBScript InputBox() function. Notice that the information given to the user states that the string will be tested for a match only once.
To carry out the replacement of the first occurrence of A in the string that the user enters into the input box, the RegExp object’s Replace() method is used:
ChangedString = myRegExp.Replace(InputString, "B")
Notice that it isn’t necessary to express the pattern to match as an argument to the Replace() method. That pattern was defined earlier when a pattern was assigned to myRegExp’s Pattern property:
myRegExp.Pattern = "A"
The two arguments to the Replace() method are, respectively, the string in which the replace operation is to be carried out (in this case, the value of the InputString variable) and the replacement text. Here, the first occurrence of text that matches the value of the Pattern property, the character A, is replaced by the second argument, the character B.
The first occurrence of A in THE APPLE IS A TASTY FRUIT is the initial character of APPLE. That A is replaced by the second argument of the Replace() method, B. So the value of the following ChangedString variable is THE BPPLE IS A TASTY FRUIT, with the only change being the creation of the character sequence BPPLE in place of the character sequence APPLE:
ChangedString = myRegExp.Replace(InputString, "B")
This is the default behavior: to match and, in this case, to replace only once.
The following DoReplaceGlobal function does almost the same thing, except that matching is attempted on the input string an unlimited number of times:
Function DoReplaceGlobal
  InputString = InputBox("Enter a string. It will be tested to see if it contains" & VBCrLf & "any 'A' characters. Any 'A' will be replaced by 'B'")
  myRegExp.Global = True
  ChangedString = myRegExp.Replace(InputString, "B")
  If myRegExp.Test(InputString) = True Then
    MsgBox "The test string '" & InputString & "' matches the pattern '" & myRegExp.Pattern & "'." & VBCrLf _
      & "The changed string is " & ChangedString
  Else
    MsgBox "There is no match. '" & InputString & "' does not match " & VBCrLf _
      & "the pattern '" & myRegExp.Pattern & "'."
  End If
End Function
This change in processing behavior occurs because the value of the Global property of the RegExp object has been set to True:
myRegExp.Global = True
Each time the value of the Pattern property, in this example, the uppercase character A, is matched, it is replaced in the variable ChangedString by the uppercase character B:
ChangedString = myRegExp.Replace(InputString, "B")
Thus, each A in the input string THE APPLE IS A TASTY FRUIT is replaced by B, giving the value of the ChangedString variable as THE BPPLE IS B TBSTY FRUIT.
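The effect of the Global property can also be observed without a browser. The following sketch assumes WSH and uses the same test string as the example; it additionally uses the Execute() method, which is not part of MatchGlobal.html, to count how many matches are found under each setting.

```vbscript
' A minimal sketch, assuming Windows Script Host.
Option Explicit

Dim myRegExp, testString

Set myRegExp = New RegExp
myRegExp.Pattern = "A"
testString = "THE APPLE IS A TASTY FRUIT"

' Global = False (the default): only the first match is used.
myRegExp.Global = False
WScript.Echo myRegExp.Replace(testString, "B")
WScript.Echo "Matches found: " & myRegExp.Execute(testString).Count

' Global = True: every match in the string is used.
myRegExp.Global = True
WScript.Echo myRegExp.Replace(testString, "B")
WScript.Echo "Matches found: " & myRegExp.Execute(testString).Count
```

With Global set to False, the script should echo THE BPPLE IS A TASTY FRUIT and a count of 1; with Global set to True, it should echo THE BPPLE IS B TBSTY FRUIT and a count of 3.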
The RegExp Object’s IgnoreCase Property
The RegExp object’s IgnoreCase property allows case-insensitive matching to be carried out.
The preceding MatchGlobal.html example used an input string that was all uppercase. More naturally, instead of THE APPLE IS A TASTY FRUIT, you might input the string The apple is a tasty fruit..
The following sample file, CaseReplace.html, allows you to try out case-sensitive (the default) and case-insensitive replacement:
<html>
<head>
<title>Carry out a case-sensitive replace and a case-insensitive replace.</title>
<script language="vbscript" type="text/vbscript">
Dim myRegExp, InputString, ChangedString
Function MatchCaseOptions
  Set myRegExp = new RegExp
  myRegExp.Pattern = "A"
  myRegExp.Global = True
  DoReplaceSensitive
  DoReplaceInsensitive
End Function
Function DoReplaceSensitive
  InputString = InputBox("Enter a string. It will be tested once to see if it contains" & VBCrLf & "any 'A' characters. Any 'A' will be replaced by 'B'")
  ChangedString = myRegExp.Replace(InputString, "B")
  If myRegExp.Test(InputString) = True Then
    MsgBox "The test string '" & InputString & "' matches the pattern '" & myRegExp.Pattern & "'." & VBCrLf _
      & "The changed string is " & ChangedString
  Else
    MsgBox "There is no match. '" & InputString & "' does not match " & VBCrLf _
      & "the pattern '" & myRegExp.Pattern & "'."
  End If
End Function
Function DoReplaceInsensitive
  myRegExp.IgnoreCase = True
  InputString = InputBox("Enter a string. It will be tested to see if it contains" & VBCrLf & "any 'A' characters. Any 'A' will be replaced by 'B'")
  ChangedString = myRegExp.Replace(InputString, "B")
  If myRegExp.Test(InputString) = True Then
    MsgBox "The test string '" & InputString & "' matches the pattern '" & myRegExp.Pattern & "'." & VBCrLf _
      & "The changed string is " & ChangedString
  Else
    MsgBox "There is no match. '" & InputString & "' does not match " & VBCrLf _
      & "the pattern '" & myRegExp.Pattern & "'."
  End If
End Function
</script>
</head>
<body onload="MatchCaseOptions">
</body>
</html>
Try It Out: Using the IgnoreCase Property of the RegExp Object
First, attempt to match an input string using the default value of the IgnoreCase property.
1. Open CaseReplace.html in Internet Explorer, and in the displayed input box, enter the test string The apple is a tasty fruit..
2. Click the OK button, and inspect the information displayed in the message box, as shown in Figure 20-4. Notice that there is no match at this point.
Figure 20-4
Next, attempt to match an input string with the value of the IgnoreCase property set to True. In other words, matching will be case insensitive.
3. Click the OK button to dismiss the message box that was displayed in Figure 20-4.
4. In the displayed input box, enter the test string The apple is a tasty fruit..
5. Click the OK button, and inspect the information displayed in the message box, as shown in Figure 20-5.
Figure 20-5
How It Works
When CaseReplace.html is loaded, the function MatchCaseOptions is called:
<body onload="MatchCaseOptions">
Notice that in MatchCaseOptions, the value of the Global property is set to True, so all matches will be replaced:
Function MatchCaseOptions
  Set myRegExp = new RegExp
  myRegExp.Pattern = "A"
  myRegExp.Global = True
  DoReplaceSensitive
  DoReplaceInsensitive
End Function
The DoReplaceSensitive function does a global attempted match similar to the one you saw in the preceding example.
The DoReplaceInsensitive function does a global attempted match but does it case insensitively because the value of the IgnoreCase property is set to True:
Function DoReplaceInsensitive
myRegExp.IgnoreCase = True
Every occurrence of the character A (whether lowercase or uppercase) in the input string The apple is a tasty fruit. is replaced by uppercase B, as you saw in Figure 20-5.
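The difference between the two settings of the IgnoreCase property can be condensed into a few lines. This sketch assumes WSH and reuses the pattern and test string from CaseReplace.html.

```vbscript
' A minimal sketch, assuming Windows Script Host.
Option Explicit

Dim myRegExp, testString

Set myRegExp = New RegExp
myRegExp.Pattern = "A"
myRegExp.Global = True
testString = "The apple is a tasty fruit."

' IgnoreCase = False is the default: only uppercase A can match.
myRegExp.IgnoreCase = False
WScript.Echo "Case-sensitive match found: " & myRegExp.Test(testString)
WScript.Echo myRegExp.Replace(testString, "B")

' IgnoreCase = True: both A and a match the pattern.
myRegExp.IgnoreCase = True
WScript.Echo "Case-insensitive match found: " & myRegExp.Test(testString)
WScript.Echo myRegExp.Replace(testString, "B")
```

In the case-sensitive pass the string contains no uppercase A, so Replace() returns it unchanged. In the case-insensitive pass each lowercase a is replaced by uppercase B, because the replacement text is inserted exactly as written.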
The RegExp Object’s Test() Method
The Test() method executes an attempted regular expression match against a specified test string and returns a Boolean value. The Boolean value returned by the Test() method can be used in a logical test. For example, you saw in an earlier example the following code, which tested whether the test string held in the TestString variable matched the value of the Pattern property and, depending on the Boolean value returned, would display a message box indicating a successful match or a failure to match.
If myRegExp.Test(TestString) = True Then
  MsgBox "The test string '" & TestString & "' matches the pattern '" & myRegExp.Pattern & "'."
Else
  MsgBox "There is no match."
End If
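Because Test() returns a Boolean value, it fits naturally inside a small validation function. The sketch below assumes WSH; the function name IsFiveDigits, the pattern, and the sample values are hypothetical and serve only to illustrate the idea. Note that the result of Test() can be used directly as the condition of an If statement, without the comparison to True.

```vbscript
' A minimal sketch, assuming Windows Script Host.
' IsFiveDigits is a hypothetical helper written for this illustration.
Option Explicit

' Returns True when candidate consists of exactly five digits.
Function IsFiveDigits(candidate)
    Dim myRegExp
    Set myRegExp = New RegExp
    ' ^ and $ anchor the pattern to the start and end of the string.
    myRegExp.Pattern = "^\d{5}$"
    IsFiveDigits = myRegExp.Test(candidate)
End Function

Dim sample
For Each sample In Array("12345", "1234", "12345a")
    If IsFiveDigits(sample) Then
        WScript.Echo sample & " passes the test."
    Else
        WScript.Echo sample & " fails the test."
    End If
Next
```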
The RegExp Object’s Replace() Method
The Replace() method replaces the part of a string that matches the pattern held in the Pattern property of the RegExp object with another string. The string in which replacement is to take place is the first argument to the Replace() method, and the replacement string is the second argument to the Replace() method.
The Replace() method, when used together with grouping parentheses, can be used to reverse the order of parts of the string in the first argument to the method. The groups captured in the matching string can be used in reverse order in the replacement string.
Try It Out: Using the Replace() Method to Reverse Order
The test file, ReverseName.html, is shown here:
<html>
<head>
<title>Reverse Surname and First Name</title>
<script language="vbscript" type="text/vbscript">
Function ReverseName
  Dim myRegExp, TestString, Match
  Set myRegExp = new RegExp
  myRegExp.Pattern = "(\S+)(\s+)(\S+)"
  TestString = InputBox("Enter your name below, in the form" & VBCrLf & _
    "first name, then a space then last name." & VBCrLf & "Don't enter an initial or middle name.")
  Match = myRegExp.Replace(TestString, "$3,$2$1")
  If Match <> "" Then
    MsgBox "Your name in last name, first name format is:" & VBCrLf & Match
  Else
    MsgBox "You didn't enter your name." & VBCrLf & "Press OK then F5 to run the example again."
  End If
End Function
</script>
</head>
<body onload="ReverseName">
</body>
</html>
1. Open ReverseName.html in Internet Explorer.
2. In the input box that is displayed, enter your name in first name–last name format, with the first and last names separated by at least one space character.
Figure 20-6 shows the name John Smith input in the desired format.
Figure 20-6
3.Click the OK button, and inspect the results, as shown in Figure 20-7.
If a name has been entered in the requested format, firstname lastname, the name is displayed in a message box in the format lastname, firstname.
Figure 20-7
How It Works
When the Web page loads, the ReverseName function is called:
<body onload="ReverseName">
The pattern (\S+)(\s+)(\S+) is assigned to the myRegExp variable’s Pattern property:
myRegExp.Pattern = "(\S+)(\s+)(\S+)"
The preceding pattern captures three groups. The first, specified by (\S+), matches and captures one or more nonwhitespace characters. The second, specified by (\s+), matches and captures one or more whitespace characters. The third, specified by (\S+), matches and captures one or more nonwhitespace characters. If the user enters a name in the requested format of firstname, followed by a space character, followed by lastname, the string assigned to the TestString variable contains a match. The captured groups are available to the replacement string as $1, $2, and $3, respectively. The name is obtained from the user with the InputBox() function:
TestString = InputBox("Enter your name below, in the form" & VBCrLf & _
  "first name, then a space then last name." & VBCrLf & "Don't enter an initial or middle name.")
You can use the $3, $2, and $1 variables in the replace operation, as follows:
Match = myRegexp.Replace(TestString, "$3,$2$1")
Because the last name should be contained in $3 and the first name in $1, you can reverse those and add a comma using the replacement string $3,$2$1. Because $2 holds whatever whitespace the user typed between the two names, that whitespace is echoed after the comma. If you wanted to standardize on a single space character in the output, you could alter the replacement string to $3, $1.
The If statement tests whether the string returned by the Replace() method is empty and displays either the reversed name or a message that no name was entered. Note that Replace() returns the input unchanged when the pattern does not match, so a nonempty entry that is not in the requested format (a single word, for example) is displayed exactly as it was entered:
If Match <> "" Then
  MsgBox "Your name in last name, first name format is:" & VBCrLf & Match
Else
  MsgBox "You didn't enter your name." & VBCrLF & "Press OK then F5 to run the example again."
End If
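If you would rather reject input that does not fit the pattern than echo it back, one option is to call Test() before Replace(). The following is a sketch only: ReverseNameOrEmpty is a hypothetical helper that is not part of ReverseName.html, the sample names are illustrative, and the code assumes it is run with Windows Script Host, so WScript.Echo replaces MsgBox.

```vbscript
Option Explicit

' ReverseNameOrEmpty is a hypothetical helper, not part of ReverseName.html.
' It returns "lastname, firstname", or the empty string when there is no match.
Function ReverseNameOrEmpty(TestString)
  Dim myRegExp
  Set myRegExp = New RegExp
  myRegExp.Pattern = "(\S+)(\s+)(\S+)"
  If myRegExp.Test(TestString) Then
    ' "$3, $1" standardizes on a single space after the comma
    ReverseNameOrEmpty = myRegExp.Replace(TestString, "$3, $1")
  Else
    ReverseNameOrEmpty = ""
  End If
End Function

' Two names separated by several spaces: the pattern matches
WScript.Echo ReverseNameOrEmpty("John   Smith")
' A single word: no match, so the empty string is returned between the brackets
WScript.Echo "[" & ReverseNameOrEmpty("Madonna") & "]"
```

With this approach the existing test for an empty string can be used to distinguish valid from invalid input.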
The RegExp Object’s Execute() Method
The RegExp object’s Execute() method executes regular expression matching against a specified string. The regular expression pattern is held as the value of the RegExp object’s Pattern property. The Execute() method returns a Matches collection, which contains a Match object for each match in the string being tested.
Try It Out
The Execute() Method
The test file, ExecuteDemo.html, is shown here:
<html>
<head>
<title>Demo of the Execute() Method</title>
<script language="vbscript" type="text/vbscript">
Function ExecuteDemo
  Dim myRegExp, TestString, Match, Matches, displayString
  displayString = ""
  Set myRegExp = new RegExp
  myRegExp.Pattern = "[A-Z]\d"
  myRegExp.IgnoreCase = True
  myRegExp.Global = False
  TestString = InputBox("Enter characters and numbers in the text box below.")
  Set Matches = myRegexp.Execute(TestString)
  For Each Match in Matches
    displayString = displayString & "Match found at position " & Match.FirstIndex & VBCrLf
    displayString = displayString & "The match value is '" & Match.Value & "'."
    MsgBox displayString
    displayString = ""
  Next
End Function
</script>
</head>
<body onload="ExecuteDemo">
</body>
</html>
1. Open ExecuteDemo.html in Internet Explorer, and in the text box within the input box, enter the character sequence A9.
2. Click the OK button, and inspect the text displayed in the message box, as shown in Figure 20-8. Notice that the match is reported at position 0, which indicates that character positions are numbered from zero.
Figure 20-8
3. Click OK, and press F5 to reload the page and run the script again.
4. In the text box within the input box, enter the character sequence A9 A10 A11 A12 A13.
5. Click the OK button, and inspect the text displayed in the message box, as shown in Figure 20-9. Notice that the result displayed is the same as before. This is because the value of the myRegExp.Global property is set to False, so only the first match, A9, is returned. (In a moment, you will repeat this example, but with the myRegExp.Global property set to True.)
Figure 20-9
How It Works
When the Web page loads, the ExecuteDemo function is called:
<body onload="ExecuteDemo">
First, the variables used in ExecuteDemo are dimensioned:
Function ExecuteDemo
Dim myRegExp, TestString, Match, Matches, displayString
The value of the displayString variable is set to the empty string, and a reference to a new RegExp object is set:
displayString = ""
Set myRegExp = new RegExp
The regular expression pattern [A-Z]\d, which matches an alphabetic character followed by a numeric digit, is assigned to the myRegExp object’s Pattern property:
myRegExp.Pattern = "[A-Z]\d"
The IgnoreCase property is set to True, which means that the matching process will be case insensitive:
myRegExp.IgnoreCase = True
The value of the Global property is set to False. This means that the Execute() method will return at most one match, the first one found in the test string:
myRegExp.Global = False
Input from the user is sought using the InputBox() function, and the input is assigned to the TestString variable:
TestString = InputBox("Enter characters and numbers in the text box below.")
The Execute() method is used to produce a Matches collection:
Set Matches = myRegexp.Execute(TestString)
Each Match in the Matches collection is then processed in a For Each loop. If there is no match, this loop produces no displayed output. The displayString variable is used to construct a string for display:
For Each Match in Matches
  displayString = displayString & "Match found at position " & Match.FirstIndex & VBCrLf
  displayString = displayString & "The match value is '" & Match.Value & "'."
Then the MsgBox function is used to display the value of the displayString variable:
MsgBox displayString
Finally, the value of the displayString variable is reset to the empty string, ready to be used in the next iteration of the For Each loop. If the value of the displayString variable were not reset, the information from each match would be concatenated into one long display string. You may prefer that approach, which, when the value of the Global property is set to True and the MsgBox call is moved after the loop, displays the information about all Match objects in the Matches collection in a single message box:
displayString = ""
Next
End Function
The following example modifies the code so that all matches in the test string are returned.
Try It Out
The Execute() Method with Global Equal to True
The test file, ExecuteDemoGlobal.html, is shown here:
<html>
<head>
<title>Demo of the Execute() Method</title>
<script language="vbscript" type="text/vbscript">
Function ExecuteDemo
  Dim myRegExp, TestString, Match, Matches, displayString
  displayString = ""
  Set myRegExp = new RegExp
  myRegExp.Pattern = "[A-Z]\d"
  myRegExp.IgnoreCase = True
  myRegExp.Global = True
  TestString = InputBox("Enter characters and numbers in the text box below.")
  Set Matches = myRegexp.Execute(TestString)
  For Each Match in Matches
    displayString = displayString & "Match found at position " & Match.FirstIndex & "."
    displayString = displayString & "The match value is '" & Match.Value & "'." & VBCrLf
    'displayString = ""
  Next
  MsgBox displayString
End Function
</script>
</head>
<body onload="ExecuteDemo">
</body>
</html>
1. Open the file ExecuteDemoGlobal.html in Internet Explorer.
2. Enter the test string A9 A10 A11 A12 A13 into the input box.
3. Click the OK button, and inspect the results displayed in the message box, as shown in Figure 20-10. Notice that each match in the Matches collection is now listed in the message box.
Figure 20-10
How It Works
The code works similarly to the code in ExecuteDemo.html. The crucial difference is that the myRegExp variable’s Global property is set to True:
myRegExp.Global = True
This means that the Execute() method will return all matches found. For each match, a Match object will be returned in the Matches collection. As Figure 20-10 shows, there are five matches on this occasion. Because \d matches exactly one digit, the first match is A9 and each of the other four is A1, the first two characters of A10, A11, A12, and A13.
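The effect of the Global property can also be seen by comparing the Count property of the Matches collection returned in each case. The following sketch assumes the code is run with Windows Script Host rather than in a Web page, so WScript.Echo is used in place of MsgBox and the test string is assigned directly instead of being read with InputBox().

```vbscript
Option Explicit

Dim myRegExp, Matches, TestString
Set myRegExp = New RegExp
myRegExp.Pattern = "[A-Z]\d"
myRegExp.IgnoreCase = True
TestString = "A9 A10 A11 A12 A13"

' Global = False: Execute() stops after the first match
myRegExp.Global = False
Set Matches = myRegExp.Execute(TestString)
WScript.Echo "Global = False: " & Matches.Count & " match(es)"

' Global = True: Execute() returns every match in the string
myRegExp.Global = True
Set Matches = myRegExp.Execute(TestString)
WScript.Echo "Global = True: " & Matches.Count & " match(es)"

' Individual Match objects can also be reached by zero-based index
If Matches.Count > 0 Then
  WScript.Echo "First match: " & Matches(0).Value
End If
```

Checking Count before indexing into the collection avoids an error when the test string contains no match at all.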
Using the Match Object and the Matches Collection
The Matches collection and its contained Match objects can be created only by using the Execute() method, described in the previous section.
Each Match object has three read-only properties:
FirstIndex — The position of the first character in a match
Length — The length of the match
Value — The value of the match
Together, these properties tell you what text was matched, where its first character is located (FirstIndex counts from zero), and how long the matching character sequence is.
The use of the FirstIndex and Value properties was demonstrated in the examples using the Execute() method in the preceding section.
The test file, MatchLength.html, is shown here:
<html>
<head>
<title>The Length Property of a Match Object</title>
<script language="vbscript" type="text/vbscript">
Function MatchLength
  Dim myRegExp, TestString, Match, Matches, displayString
  displayString = ""
  Set myRegExp = new RegExp
  myRegExp.Pattern = "[A-Z]\d+"
  myRegExp.IgnoreCase = True
  myRegExp.Global = True
  TestString = InputBox("Enter characters and numbers in the text box below.")
  Set Matches = myRegexp.Execute(TestString)
  For Each Match in Matches
    displayString = displayString & "Match found at position " & Match.FirstIndex & "."
    displayString = displayString & "The match value is '" & Match.Value & "'." & VBCrLf
    displayString = displayString & "Its length is " & Match.Length & " characters." & VBCrLf & VBCrLf
    'displayString = ""
  Next
  MsgBox displayString
End Function
</script>
</head>
<body onload="MatchLength">
</body>
</html>
Try It Out
The Length Property
1. Open MatchLength.html in Internet Explorer.
2. Enter the following character sequence into the text box of the input box: A9 B10 C110 D1123456 E1234567890 A3.
3. Click the OK button, and inspect the results displayed in the message box, as shown in Figure 20-11.
Figure 20-11
How It Works
When the Web page loads, the MatchLength function is called:
<body onload="MatchLength">
The regular expression pattern assigned to the Pattern property:
myRegExp.Pattern = "[A-Z]\d+"
matches a single alphabetic character followed by one or more numeric digits. The alphabetic character can be of either case, because the value of the IgnoreCase property is set to True:
myRegExp.IgnoreCase = True
The test string, A9 B10 C110 D1123456 E1234567890 A3, contains six matches. Because the value of the myRegExp variable’s Global property is set to True, all six matches are represented as Match objects in the Matches collection:
myRegExp.Global = True
The Matches collection is created when the following line of code is executed:
Set Matches = myRegexp.Execute(TestString)
The Execute() method creates the Matches collection, which, in this example, contains six Match objects. Each of them is processed in the same way by the following For Each loop:
For Each Match in Matches
  displayString = displayString & "Match found at position " & Match.FirstIndex & "."
  displayString = displayString & "The match value is '" & Match.Value & "'." & VBCrLf
  displayString = displayString & "Its length is " & Match.Length & " characters." & VBCrLf & VBCrLf
  'displayString = ""
Next
The displayString variable initially holds the empty string. Each time through the For Each loop, information is added to the displayString variable about the position of the match (held in the FirstIndex property), its value (held in the Value property), and its length (held in the Length property).
Finally, the value of the displayString variable is displayed in a message box, which shows the position, value, and length of each match:
MsgBox displayString
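The three properties are related: the text in Value is the part of the test string that starts at FirstIndex and runs for Length characters. The following sketch illustrates that relationship using VBScript's Mid() function. It assumes the code is run with Windows Script Host (WScript.Echo replaces MsgBox) and uses a shortened, illustrative test string. Because FirstIndex counts from zero but Mid() counts from one, 1 is added to the starting position.

```vbscript
Option Explicit

Dim myRegExp, Matches, Match, TestString, extracted
Set myRegExp = New RegExp
myRegExp.Pattern = "[A-Z]\d+"
myRegExp.IgnoreCase = True
myRegExp.Global = True
TestString = "A9 B10 C110"   ' illustrative test string

Set Matches = myRegExp.Execute(TestString)
For Each Match In Matches
  ' FirstIndex is zero-based; Mid() is one-based, hence the + 1
  extracted = Mid(TestString, Match.FirstIndex + 1, Match.Length)
  WScript.Echo "Value: " & Match.Value & _
    "  FirstIndex: " & Match.FirstIndex & _
    "  Length: " & Match.Length & _
    "  Mid() result: " & extracted
Next
```

For each match, the string extracted by Mid() should be the same as the Value property.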
Supported Metacharacters
The following table summarizes the metacharacters supported in VBScript.
Metacharacter | Description
^ | Matches the position at the beginning of an input string.
$ | Matches the position at the end of an input string.
? | A quantifier. It matches when there is zero or one occurrence of the preceding character or group.