Beginning Regular Expressions 2005.pdf

Chapter 24

9. Attempt to validate WordUnicode3.xml against WordUnicode3.xsd (see Figure 24-9). On this occasion, there is no match.

Figure 24-9

How It Works

The files WordUnicode.xml and WordUnicode.xsd attempt to validate the German word Führer (leader) against the pattern \w+. The match shows that, in W3C XML Schema, the \w metacharacter matches some letters that aren’t used in English, such as ü.

The files WordUnicode2.xml and WordUnicode2.xsd attempt to validate the German word Führer (leader) against the pattern \p{L}. Because the word Führer contains only Unicode letters, there is a match.

The files WordUnicode3.xml and WordUnicode3.xsd attempt to validate the German word Führer against the pattern \p{L} while also specifying the use of the Unicode character block BasicLatin, indicated by the pattern \p{IsBasicLatin}. Because the word Führer contains the letter ü, which is not in the range U+0000 through U+007F (it is U+00FC), there is no match, and validation fails.
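The two conditions in the third example can also be checked by hand, outside a schema processor. The following Python sketch is only an illustration and is not one of the chapter's files. Python's built-in re module does not support \p{L} or \p{IsBasicLatin}, so the helper functions is_all_letters and is_all_basic_latin (hypothetical names) approximate them with the unicodedata module and a code point range test.

```python
import unicodedata


def is_all_letters(text: str) -> bool:
    # Approximates "every character is a Unicode letter":
    # all letter categories (Lu, Ll, Lt, Lm, Lo) start with "L".
    return len(text) > 0 and all(
        unicodedata.category(ch).startswith("L") for ch in text
    )


def is_all_basic_latin(text: str) -> bool:
    # Approximates "every character is in the BasicLatin block",
    # that is, in the range U+0000 through U+007F.
    return len(text) > 0 and all(ord(ch) <= 0x7F for ch in text)


# "Führer", with the letter ü written as the single code point U+00FC.
word = "F\u00fchrer"

print(is_all_letters(word))      # True: ü is a lowercase letter
print(is_all_basic_latin(word))  # False: U+00FC lies outside U+0000-U+007F
```

The second check fails for the same reason validation fails in step 9: one character falls outside the BasicLatin range.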

Metacharacters Supported in W3C XML Schema

The metacharacters supported in W3C XML Schema include a few that relate directly to XML and are not implemented in most other regular expression implementations.

The following table summarizes the metacharacters supported in W3C XML Schema version 1.0. See also information in the preceding section about Unicode support in W3C XML Schema.

		Regular Expressions in W3C XML Schema

	Metacharacter	Description

	^	Not supported outside negated character classes (see the discussion of positional metacharacters).
	$	Not supported (see the discussion of positional metacharacters).
	\d	Matches a numeric digit.
	\D	Matches a character that is not a numeric digit.
	\s	Matches a whitespace character.
	\S	Matches a character that is not a whitespace character.
	\w	Matches a “word” character.
	\W	Matches a character that is not a “word” character.
	| (pipe character)	Alternation. Allows a choice between the options on either side of the pipe character.
	?	Quantifier. Specifies that there is zero or one occurrence of the preceding character or group.
	*	Quantifier. Specifies that there are zero or more occurrences of the preceding character or group.
	+	Quantifier. Specifies that there are one or more occurrences of the preceding character or group.
	{n,m}	Quantifier. Specifies that there is a minimum of n occurrences and a maximum of m occurrences of the preceding character or group.
	. (period character)	Matches any character except a newline or carriage return character.
	[...]	Positive character class. One character contained between the square brackets is matched once.
	[^...]	Negated character class. One character not contained between the square brackets is matched once.
	\i	Matches a character allowed as a first character in an XML name. For ASCII input, this corresponds to the character class [A-Za-z_:]; non-ASCII letters are also matched.
	\I	Matches a character not allowed as a first character in an XML name.
	\c	Matches an XML 1.0 name character. Includes the character class [A-Za-z0-9._:-].
	\C	Matches a character that is not an XML 1.0 name character.

Positional Metacharacters

In W3C XML Schema, the positional metacharacters ^ and $ are not supported as beginning-of-line (or beginning-of-string) and end-of-line (or end-of-string) anchors. The reason is a difference in how matching takes place in W3C XML Schema compared with many other regular expression implementations.

In many regular expression implementations, the pattern [A-Z][0-9] will match any string containing an uppercase alphabetic character followed by a numeric digit. However, in W3C XML Schema, there is a match only if the whole string is matched by the pattern. In other words, when matching in W3C XML Schema, the pattern [A-Z][0-9] is interpreted as though it were ^[A-Z][0-9]$.

Because all W3C XML Schema regular expression patterns are interpreted as though both the ^ and $ metacharacters were already present, they are not supported separately from that implicit mechanism.

The ^ metacharacter can, however, be used in a negated character class.
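The difference between substring matching and whole-string matching can be seen outside a schema processor. The following Python sketch is an approximation only: Python's re module is not the W3C XML Schema regular expression dialect, but re.search behaves like the "many implementations" described above, and re.fullmatch emulates the implicit ^ and $ of W3C XML Schema. The function names are hypothetical.

```python
import re

PATTERN = r"[A-Z][0-9]"


def matches_anywhere(value: str) -> bool:
    # Typical regex behavior: succeed if any substring matches.
    return re.search(PATTERN, value) is not None


def matches_whole_value(value: str) -> bool:
    # Schema-style behavior: the entire value must match,
    # as though the pattern were written ^[A-Z][0-9]$.
    return re.fullmatch(PATTERN, value) is not None


for value in ["A1", "xxA1yy", "A12"]:
    print(value, matches_anywhere(value), matches_whole_value(value))
# A1 True True
# xxA1yy True False
# A12 True False
```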

Matching Numeric Digits

The \d metacharacter can be used to match a numeric digit. For example, the sample document Document.xml contains a number attribute that must be a single numeric digit:

<?xml version="1.0" encoding="UTF-8"?>
<Document>
  <Section number="1">Content</Section>
  <Section number="2">Content</Section>
  <Section number="3">Content</Section>
</Document>

The corresponding W3C XML Schema document, Document.xsd, uses the \d metacharacter in an xs:pattern element to specify that the value of the Section element’s number attribute is a single numeric digit:

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified">
  <xs:element name="Document">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="Section" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <xs:element name="Section">
    <xs:complexType>
      <xs:simpleContent>
        <xs:extension base="xs:string">
          <xs:attribute name="number" use="required">
            <xs:simpleType>
              <xs:restriction base="xs:NMTOKEN">
                <xs:pattern value="\d" />
              </xs:restriction>
            </xs:simpleType>
          </xs:attribute>
        </xs:extension>
      </xs:simpleContent>
    </xs:complexType>
  </xs:element>
</xs:schema>

The value of the xs:restriction element’s base attribute is shown as the type xs:NMTOKEN, but other types could be used in this situation, such as xs:byte.
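To see what the \d pattern facet checks, the same rule can be applied by hand. The following Python sketch is an illustration under stated assumptions, not a schema validator: it parses a copy of the sample document with the standard library and tests each number attribute with re.fullmatch, which stands in for the whole-value matching of W3C XML Schema. The function name invalid_numbers is hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# A copy of the sample document's content (XML declaration omitted).
DOCUMENT_XML = """<Document>
  <Section number="1">Content</Section>
  <Section number="2">Content</Section>
  <Section number="3">Content</Section>
</Document>"""


def invalid_numbers(xml_text: str) -> list[str]:
    # Collect the number attribute values that are NOT exactly one digit.
    # A missing attribute is treated as an empty string, which also fails.
    root = ET.fromstring(xml_text)
    return [
        section.get("number", "")
        for section in root.iter("Section")
        if re.fullmatch(r"\d", section.get("number", "")) is None
    ]


print(invalid_numbers(DOCUMENT_XML))  # [] because every value is a single digit
```

A value such as 12 would be reported, because the pattern allows exactly one digit and the whole value must match.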

Alternation

Alternation is supported in W3C XML Schema. The example using BookPattern.xml and BookPattern.xsd earlier in this chapter shows how alternation can be used with the xs:pattern element.
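Whole-value matching also affects how alternation behaves. The pattern from BookPattern.xsd is not repeated here, so the following Python sketch uses a hypothetical pattern, Book|Article, purely for illustration. It uses re.fullmatch to emulate schema-style matching and contrasts it with a hand-anchored pattern in which ^ and $ each bind to only one alternative.

```python
import re

# Hypothetical pattern, not taken from BookPattern.xsd.
PATTERN = r"Book|Article"


def matches_whole_value(value: str) -> bool:
    # Schema-style: the entire value must equal one of the alternatives.
    return re.fullmatch(PATTERN, value) is not None


def matches_hand_anchored(value: str) -> bool:
    # Naive anchoring: this means "starts with Book" OR "ends with Article",
    # because alternation has lower precedence than the anchors.
    return re.search(r"^Book|Article$", value) is not None


for value in ["Book", "Article", "Bookshelf"]:
    print(value, matches_whole_value(value), matches_hand_anchored(value))
# Book True True
# Article True True
# Bookshelf False True
```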

Using the \w and \s Metacharacters

The \w metacharacter represents word characters, including uppercase and lowercase A through Z. The \s metacharacter represents a whitespace character.

The pattern \w+\s+\w+ can be used to represent a name written as a first name, followed by one or more whitespace characters, followed by a last name. A sample document, Name.xml, is shown here:

<?xml version="1.0" encoding="UTF-8"?>
<Names>
  <Name>John Smith</Name>
  <Name>Alicia Manton</Name>
  <Name>Pierre Laval</Name>
</Names>

A corresponding schema, Name.xsd, uses the pattern \w+\s+\w+ to specify how the value of the Name element is to be constructed:

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified">
  <xs:element name="Names">
    <xs:complexType>
      <xs:sequence>
        <!-- Declared with name (not ref) so that the simple type can be defined inline. -->
        <xs:element name="Name" maxOccurs="unbounded">
          <xs:simpleType>
            <xs:restriction base="xs:string">
              <xs:pattern value="\w+\s+\w+" />
            </xs:restriction>
          </xs:simpleType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>

The pattern matches a sequence of word characters followed by one or more whitespace characters, followed by a sequence of word characters.

The pattern wouldn’t match a name such as Maria Von Trapp. Because the whole value must match, \w+\s+\w+ means, in effect, ^\w+\s+\w+$, which allows exactly two words, and Maria Von Trapp contains three.
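The effect on the sample names can be approximated in Python. This is a sketch under stated assumptions, not a schema validator: re.fullmatch emulates the implicit anchoring, and the two dialects define \w slightly differently (for example, Python's \w includes the underscore). The function name is_valid_name is hypothetical.

```python
import re

NAME_PATTERN = r"\w+\s+\w+"


def is_valid_name(value: str) -> bool:
    # The whole value must be: word characters, whitespace, word characters.
    return re.fullmatch(NAME_PATTERN, value) is not None


for name in ["John Smith", "Alicia Manton", "Pierre Laval", "Maria Von Trapp"]:
    print(f"{name!r}: {is_valid_name(name)}")
# 'John Smith': True
# 'Alicia Manton': True
# 'Pierre Laval': True
# 'Maria Von Trapp': False
```

One way to admit names with more than two parts would be a pattern such as \w+(\s+\w+)+, which requires at least two words but allows more.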
