
Chapter 24

Derivation by Restriction

In W3C XML Schema there are often several ways to express a desired structure. Of the derivation methods in the preceding list, derivation by restriction is the most commonly used.

One method of restriction is to specify an enumeration. The following XML instance document, BookEnum.xml, is associated with a W3C XML Schema document that contains an enumeration:

<?xml version="1.0" encoding="UTF-8"?>
<Book xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:noNamespaceSchemaLocation="C:\BRegExp\Ch24\BookEnum.xsd">
  <Chapter number="1">Some content</Chapter>
  <Chapter number="2">Some content</Chapter>
  <Chapter number="3">Some content</Chapter>
  <Chapter number="4">Some content</Chapter>
  <Chapter number="5">Some content</Chapter>
</Book>

The associated W3C XML Schema document, BookEnum.xsd, which was generated by XMLSpy, restricts the number attribute of the Chapter element to an enumeration of the values 1 through 5:

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified">
  <xs:element name="Book">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="Chapter" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <xs:element name="Chapter">
    <xs:complexType>
      <xs:simpleContent>
        <xs:extension base="xs:string">
          <xs:attribute name="number" use="required">
            <xs:simpleType>
              <xs:restriction base="xs:NMTOKEN">
                <xs:enumeration value="1"/>
                <xs:enumeration value="2"/>
                <xs:enumeration value="3"/>
                <xs:enumeration value="4"/>
                <xs:enumeration value="5"/>
              </xs:restriction>
            </xs:simpleType>
          </xs:attribute>
        </xs:extension>
      </xs:simpleContent>
    </xs:complexType>
  </xs:element>
</xs:schema>


Regular Expressions in W3C XML Schema

The value of the number attribute is a simple type value. XMLSpy chose the xs:NMTOKEN datatype because the sample values 1, 2, 3, 4, and 5 in the XML instance document are all valid NMTOKEN values. However, the same constraint on values could be applied using the xs:pattern element, as in BookPattern.xsd, shown here:

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified">
  <xs:element name="Book">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="Chapter" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <xs:element name="Chapter">
    <xs:complexType>
      <xs:simpleContent>
        <xs:extension base="xs:string">
          <xs:attribute name="number" use="required">
            <xs:simpleType>
              <xs:restriction base="xs:NMTOKEN">
                <xs:pattern value="(1|2|3|4|5)"/>
              </xs:restriction>
            </xs:simpleType>
          </xs:attribute>
        </xs:extension>
      </xs:simpleContent>
    </xs:complexType>
  </xs:element>
</xs:schema>

An XML instance document associated with BookPattern.xsd is provided as BookPattern.xml in the code download. The only change from BookEnum.xml is that the xsi:noNamespaceSchemaLocation attribute points to the BookPattern.xsd file:

<Book xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:noNamespaceSchemaLocation="C:\BRegExp\Ch24\BookPattern.xsd">

The xs:pattern element is featured prominently in the remainder of this chapter, because it is the W3C XML Schema element that uses regular expressions. The value of the xs:pattern element’s value attribute is a regular expression pattern — hence, the name of the element.

In the pattern shown in the preceding listing, the value of the value attribute, (1|2|3|4|5), is a simple example of alternation: it allows the attribute value to be exactly one of 1, 2, 3, 4, or 5.
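In W3C XML Schema, a pattern must match the entire value rather than some substring of it. That is why the alternation accepts 3 but rejects a value such as 12 or 6. The same set of values could also be written with a character class. The following minimal schema is a sketch of that approach. The type name ChapterNumberType is hypothetical, and the schema is not part of the code download:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- Hypothetical named type, intended to accept the same values as the
       anonymous type used for the number attribute in BookPattern.xsd -->
  <xs:simpleType name="ChapterNumberType">
    <xs:restriction base="xs:NMTOKEN">
      <!-- [1-5] matches a single character in the range 1 through 5.
           The pattern must match the whole value, so "12" and "6" are rejected. -->
      <xs:pattern value="[1-5]"/>
    </xs:restriction>
  </xs:simpleType>
</xs:schema>
```

With a global type like this, the attribute declaration could be shortened to <xs:attribute name="number" type="ChapterNumberType" use="required"/>, and the anonymous xs:simpleType would no longer be needed.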

Before looking at the range of metacharacters supported in W3C XML Schema and how they can be used, it helps to understand how Unicode relates to regular expressions in W3C XML Schema documents.

Unicode and W3C XML Schema

XML documents consist of sequences of Unicode characters. Unicode contains many thousands of characters. In reality, few, if any, applications can display all Unicode characters, and very few human beings could easily understand all Unicode characters. To make Unicode more manageable, the characters are divided into Unicode character classes and Unicode blocks. Each of these is discussed later in this section.

Full information about Unicode is located at www.unicode.org. At the time of this writing, the current version of the Unicode Standard is version 4.0.1. Further information about the Unicode Standard is located at www.unicode.org/standard/standard.html.

Unicode Overview

The Unicode Standard defines the universal character set. The aim of Unicode is to allow the interchange of text content across all the languages of planet Earth. Unicode specifies a text encoding for most characters of most languages, as well as characters to assist in interoperability with older character encodings.

The Windows Character Map utility provides a convenient way to examine the Unicode code points of individual characters. Figure 24-6 shows uppercase A selected. Notice in the lower part of the figure that uppercase A is U+0041. The number following U+ is written in hexadecimal and has at least four digits. In this example, uppercase A is hexadecimal 0041, which is 65 in decimal notation (4 × 16 + 1 = 65).

Figure 24-6


In XML, uppercase A can also be written as the hexadecimal character reference &#x0041; or the decimal character reference &#65;. In most situations, however, it is simpler to write characters commonly used in English literally.
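The three spellings are interchangeable once the document has been parsed. The following hypothetical instance illustrates this. The element names Letters and Letter are made up for the example. A parser resolves character references before schema validation, so an xs:pattern facet applied to any of these elements would see the same single character, U+0041:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical document: after parsing, each Letter element
     contains the same character, U+0041 (LATIN CAPITAL LETTER A) -->
<Letters>
  <Letter>A</Letter>        <!-- literal character -->
  <Letter>&#x0041;</Letter> <!-- hexadecimal character reference -->
  <Letter>&#65;</Letter>    <!-- decimal character reference -->
</Letters>
```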

A Unicode character class indicates how a set of characters is used, for example as lowercase letters. A Unicode block is a contiguous range of code points, usually associated with a particular script (for example, Basic Latin or Cyrillic) or another set of symbols.

Using Unicode Character Classes

When using a Unicode character class in W3C XML Schema documents, the character class is specified as follows:

\p{characterClass}
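As an illustration, the following sketch shows the \p{} syntax inside xs:pattern facets. It uses class names that appear in the table below (Lu, Ll, and Nd). The type names are hypothetical. The second type uses the complementary form \P{...}, which matches any character that is not in the named class:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- Hypothetical type: one uppercase letter followed by zero or more
       lowercase letters, e.g. "Unicode" or "Émile", but not "unicode" -->
  <xs:simpleType name="CapitalizedWordType">
    <xs:restriction base="xs:string">
      <xs:pattern value="\p{Lu}\p{Ll}*"/>
    </xs:restriction>
  </xs:simpleType>
  <!-- Hypothetical type: \P{Nd} (uppercase P) matches any character that is
       not a decimal digit, so this accepts strings with no decimal digits,
       including the empty string -->
  <xs:simpleType name="NoDigitsType">
    <xs:restriction base="xs:string">
      <xs:pattern value="\P{Nd}*"/>
    </xs:restriction>
  </xs:simpleType>
</xs:schema>
```

Because \p{Lu} is defined in terms of Unicode rather than ASCII, CapitalizedWordType accepts accented capitals such as É (U+00C9). The ASCII-only class [A-Z] would reject them.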

The following table summarizes the Unicode character classes supported in W3C XML Schema.

Unicode Character Class	Description

C	Other characters
Cc	Control characters
Cf	Format characters
Cn	Unassigned code points
L	Letters
Ll	Lowercase letters
Lm	Modifier letters
Lo	Other letters
Lt	Title-case letters
Lu	Uppercase letters
M	All marks
Mc	Spacing combining marks
Me	Enclosing marks
Mn	Nonspacing marks
N	Numbers
Nd	Decimal digits
Nl	Letter numbers
No	Other numbers
P	Punctuation
Pc	Connector punctuation
