Beginning Regular Expressions 2005.pdf

Chapter 24

W3C XML Schema Basics

When XML version 1.0 was released early in 1998, it already had a schema language associated with it. The schema for an XML 1.0 document was a document type definition (DTD). The DTD had several limitations, not least of which was that it had very limited facilities to specify the type of XML data and that it lacked functionality to further constrain XML content.

A schema, in the context of XML documents, is a document that specifies the permitted structure and content of a class of XML documents.

There are two fundamental ways in which W3C XML Schema can constrain values: it can constrain the value space or it can constrain the lexical space. To distinguish between these two concepts, consider the value 100. The value is the same whether you write it as 100.0, 100.00, or 100.000. There is one value in the value space, but there are three representations of that value in the lexical space shown here, and many more are possible. Regular expressions in W3C XML Schema operate on the lexical space, not on the value space.
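The distinction can be sketched outside any schema processor. The following Python fragment is an illustration only, not W3C XML Schema itself: the decimal module stands in for the value space, and the re module stands in for a pattern applied to the lexical space. The two-decimal-places pattern is a hypothetical constraint chosen for the example, and Python's regular expression dialect differs in places from the one W3C XML Schema defines.

```python
import re
from decimal import Decimal

# Three lexical representations taken from the text above.
lexical_forms = ["100.0", "100.00", "100.000"]

# Value space: converting each string to a number shows that all
# three representations denote one and the same value.
values = {Decimal(form) for form in lexical_forms}
print(len(values))  # 1

# Lexical space: a pattern inspects the characters as written.
# Hypothetical constraint: exactly two digits after the decimal point.
# fullmatch() is used because a schema pattern must match the whole string.
two_places = re.compile(r"\d+\.\d{2}")
matching = [form for form in lexical_forms if two_places.fullmatch(form)]
print(matching)  # ['100.00']
```

All three strings are equal as values, yet only one of them satisfies the pattern, which is why a regular expression can tell apart representations that a numeric comparison cannot.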

Tools for Using W3C XML Schema

This chapter illustrates the use of XML editors to create XML instance documents and their corresponding W3C XML Schema documents. Validating the XML instance documents against the schema will allow you to look at the regular expression support in W3C XML Schema.

You can create XML documents and the associated W3C XML Schema documents using a simple text editor. However, using specialized XML editors provides support for some or all of the following functionality: syntax color coding, checking of well-formedness, validation of XML instance documents, association of XML instance documents with a schema, and creation of a W3C XML Schema document from an XML instance document.

The examples of XML documents and associated W3C XML Schema documents shown in this chapter have been created using XMLSpy, XMLWriter, and StylusStudio. Other XML editors have similar facilities that create a W3C XML Schema document from an instance document (or allow you to author a schema from scratch) and test whether an XML instance document validates against a schema, whether that schema is a DTD or a W3C XML Schema document.

When using XMLSpy or StylusStudio, you can create an XML instance document and then create a W3C XML Schema document from the XML instance document. Of course, depending on how typical the XML instance document is of the class of XML instance documents, you may have to do some editing of the W3C XML Schema document that is created for you.

Regular Expressions in W3C XML Schema

Trial downloads of XMLSpy, XMLWriter, and StylusStudio are available from www.xmlspy.com/download.html, www.xmlwriter.com/download/download.shtml, and www.stylusstudio.com/xml_download.html, respectively.

Comparing XML Schema and DTDs

If you had a simple XML document, PersonDataForDTD.xml, like the following, the line with the DOCTYPE declaration would indicate the location of a DTD for the XML instance document:

<!DOCTYPE PersonData SYSTEM "C:\BRegExp\Ch24\PersonData.dtd">
<PersonData>
  <Person>
    <LastName>Smith</LastName>
    <FirstName>John</FirstName>
  </Person>
</PersonData>

If you are unfamiliar with the syntax for the DOCTYPE declaration and have a tool like XMLSpy or Stylus Studio, you can use the software to create the DTD and associate the XML instance document with the DTD.

The first line of PersonDataForDTD.xml references a DTD located at C:\BRegExp\Ch24\PersonData.dtd. If you have downloaded the code files to a different location, you will need to edit that path in the DOCTYPE declaration before you can validate the XML in XMLSpy or a similar XML editor.

The DTD, PersonData.dtd, for that instance document is shown here:

<?xml version="1.0" encoding="UTF-8"?>
<!ELEMENT FirstName (#PCDATA)>
<!ELEMENT LastName (#PCDATA)>
<!ELEMENT Person (LastName, FirstName)>
<!ELEMENT PersonData (Person)>

The final line declares that the PersonData element contains a single Person element. In turn, the second-to-last line declares that the Person element contains a LastName element followed by a FirstName element. The second and third lines declare that the FirstName and LastName elements contain PCDATA (parsed character data). Essentially, all that says is that the content of the FirstName and LastName elements is a sequence of Unicode characters that will be parsed by the XML parser.
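To make the meaning of those content models concrete, here is a small Python sketch that checks the same structure by hand. It is not a DTD validator and does not read PersonData.dtd; the CONTENT_MODELS dictionary is transcribed manually from the declarations above, and the name conforms is invented for this illustration. It handles only the simple case used here, where each child element appears exactly once and in a fixed order.

```python
import xml.etree.ElementTree as ET

INSTANCE = """<PersonData>
  <Person>
    <LastName>Smith</LastName>
    <FirstName>John</FirstName>
  </Person>
</PersonData>"""

# Hand-transcribed from the DTD: each element name maps to the ordered
# list of child elements it must contain. An empty list stands for
# (#PCDATA), that is, character data with no child elements.
CONTENT_MODELS = {
    "PersonData": ["Person"],
    "Person": ["LastName", "FirstName"],
    "LastName": [],
    "FirstName": [],
}

def conforms(element):
    """Return True if element and all its descendants follow CONTENT_MODELS."""
    child_names = [child.tag for child in element]
    # An undeclared element yields None here, which never equals a list.
    if child_names != CONTENT_MODELS.get(element.tag):
        return False
    return all(conforms(child) for child in element)

print(conforms(ET.fromstring(INSTANCE)))  # True
```

Note that the check says nothing about the text inside LastName or FirstName: any character data is accepted, which mirrors what (#PCDATA) allows.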

DTDs can’t, for example, specify that an element is to contain a character sequence that is a valid credit card number, phone number, e-mail address, and so on. That limitation of DTDs was one of the reasons why W3C XML Schema was developed.

XMLSpy and StylusStudio can create, on request, a W3C XML Schema document to reflect the structure in a sample XML instance document. In XMLSpy, you can create a W3C XML Schema document automatically.

Figure 24-1 shows how to create a schema in XMLSpy for a sample XML instance document, PersonDataForSchema.xml:

<?xml version="1.0" encoding="UTF-8"?>
<PersonData>
  <Person>
    <LastName>Smith</LastName>
    <FirstName>John</FirstName>
  </Person>
</PersonData>

Figure 24-1

A dialog box is then displayed, as shown in Figure 24-2. To create a W3C XML Schema document, select the radio button for W3C Schema, as shown in the figure.

Figure 24-2

XMLSpy asks whether you want to associate the XML instance document with the W3C XML Schema document it has created. Figure 24-3 shows the dialog box. If you accept, XMLSpy adds the necessary code to the PersonData element so that a validating parser, which tools such as XMLSpy, XMLWriter, and StylusStudio have built in, can locate the W3C XML Schema document and carry out the validation process.

Figure 24-3

The created W3C XML Schema document, PersonData.xsd, is shown here:

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified">
  <xs:element name="FirstName" type="xs:string"/>
  <xs:element name="LastName" type="xs:string"/>
  <xs:element name="Person">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="LastName"/>
        <xs:element ref="FirstName"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <xs:element name="PersonData">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="Person"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>

If you compare this W3C XML Schema document to the DTD PersonData.dtd shown earlier, you will see that the corresponding W3C XML Schema document is much longer. The verbosity of W3C XML Schema attracted criticism but must simply be accepted now that the specification has been finalized.
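One consequence of the longer syntax is that a W3C XML Schema document is itself well-formed XML, so general-purpose XML tools can read it, which is not true of the DTD syntax. As a rough illustration, the following Python sketch parses the schema text with the standard library and lists its top-level element declarations. The function name global_elements is invented for this example, and the sketch only reads the schema as data; it does not perform validation.

```python
import xml.etree.ElementTree as ET

# Prefix-to-namespace mapping used when searching the parsed schema.
XS = {"xs": "http://www.w3.org/2001/XMLSchema"}

SCHEMA = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           elementFormDefault="qualified">
  <xs:element name="FirstName" type="xs:string"/>
  <xs:element name="LastName" type="xs:string"/>
  <xs:element name="Person">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="LastName"/>
        <xs:element ref="FirstName"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <xs:element name="PersonData">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="Person"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>"""

def global_elements(schema_text):
    """Return (name, type) pairs for element declarations directly under xs:schema."""
    root = ET.fromstring(schema_text)
    # findall() with a plain tag path matches direct children only, so the
    # nested xs:element ref="..." entries are not included.
    return [(el.get("name"), el.get("type")) for el in root.findall("xs:element", XS)]

for name, type_name in global_elements(SCHEMA):
    print(name, type_name)
```

FirstName and LastName are reported with the type xs:string, while Person and PersonData have no type attribute because their types are defined inline by xs:complexType.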

XMLSpy saves the W3C XML Schema document first, as shown in Figure 24-3, because the location of the saved W3C XML Schema file is then added to the XML instance file.

The modified file, PersonDataAssocSchema.xml, is shown here:

<?xml version="1.0" encoding="UTF-8"?>
<PersonData xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:noNamespaceSchemaLocation="C:\BRegExp\Ch24\PersonData.xsd">
  <Person>
    <LastName>Smith</LastName>
    <FirstName>John</FirstName>
  </Person>
</PersonData>

XMLSpy adds a namespace declaration for the XML Schema instance namespace:

xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"

The xsi:noNamespaceSchemaLocation attribute, which is in the XML Schema instance namespace, is also added to the document element, with its value, a URI, indicating the location of the W3C XML Schema document. In this case, the W3C XML Schema document is located at C:\BRegExp\Ch24\PersonData.xsd. If you want to validate the XML document and the schema is in some other location, you will need to change the value of the xsi:noNamespaceSchemaLocation attribute appropriately:

xsi:noNamespaceSchemaLocation="C:\BRegExp\Ch24\PersonData.xsd"
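If you prefer to inspect or change that attribute programmatically rather than in an editor, a sketch along the following lines may help. It uses only Python's standard library; the function names schema_location and with_schema_location are invented for this illustration, and the relative location "PersonData.xsd" is a hypothetical replacement value, not something the XML editors discussed here generate. The sketch reads and rewrites the attribute but does not validate the document.

```python
import xml.etree.ElementTree as ET

XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"
# ElementTree addresses namespaced attributes as "{namespace-uri}local-name".
LOCATION_ATTR = "{" + XSI_NS + "}noNamespaceSchemaLocation"

# Raw string so the backslashes in the Windows path are kept literally.
INSTANCE = r"""<PersonData xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:noNamespaceSchemaLocation="C:\BRegExp\Ch24\PersonData.xsd">
  <Person>
    <LastName>Smith</LastName>
    <FirstName>John</FirstName>
  </Person>
</PersonData>"""

def schema_location(xml_text):
    """Return the schema location recorded on the document element, or None."""
    return ET.fromstring(xml_text).get(LOCATION_ATTR)

def with_schema_location(xml_text, new_location):
    """Return the document as text with the schema location replaced."""
    root = ET.fromstring(xml_text)
    root.set(LOCATION_ATTR, new_location)
    return ET.tostring(root, encoding="unicode")

print(schema_location(INSTANCE))

# Hypothetical new location: a schema stored next to the instance document.
moved = with_schema_location(INSTANCE, "PersonData.xsd")
print(schema_location(moved))
```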

After XMLSpy has associated a W3C XML Schema document with an XML instance document, you can use XMLSpy to validate the XML instance document. The cursor in Figure 24-3 is hovering over the relevant toolbar button. Toward the bottom of Figure 24-3, you can see the message indicating that the document is valid according to the schema.

You can similarly validate an XML instance document, PersonDataAssocSchema.xml, in Stylus Studio (shown in Figure 24-4) or XMLWriter (shown in Figure 24-5). The arrow cursor in each figure shows you the relevant toolbar button to validate an XML instance document.

Figure 24-4

Whether you already have an XML editor or choose to use the trial downloads for XMLSpy, StylusStudio, or XMLWriter, you should now be in a position to validate an XML instance document against its schema. So you can now try out the examples in this chapter.
