Mastering UML with Rational Rose 2002.pdf

Chapter 16: XML DTD Code Generation and Reverse Engineering

Introduction to XML DTD

XML evolved as the need arose to structure data on the Web. HTML is very useful for displaying information, but it contains only a limited number of tags that you can use when creating a document. XML is much more flexible; you can create whatever tags you need to effectively describe the data in the document.

The tags are defined in the DTD (Document Type Definition) for the XML file. A DTD consists of declarations that define the types of data that can be included in the XML file.

Elements

An element declaration has three key pieces. The first is the ELEMENT keyword, which indicates that the text to follow defines an element. The second is the name of the element. Each element name must be unique. Further, XML reserves names that begin with the characters "xml" in any combination of upper and lower case. Finally, the content model defines the items that make up the element. For example, here we have an element called "book" that is made up of a title, table of contents, introduction, and section.

<!ELEMENT book (title, tableofcontents, introduction, section)>

The title, tableofcontents, introduction, and section make up the content model. An element can also contain text in its content model. We can indicate this by using the notation #PCDATA in the content model. For example:

<!ELEMENT title (#PCDATA)>

Here we have an element called title, which is simply a string of text. An element may contain other elements, text (PCDATA), or both in its content model.

This is useful, but it doesn't let us know whether the items in the content model are required or how many items can be contained within the element. There are three symbols we can use here to get more information:

A plus sign (+) indicates that the item is required and that there may be more than one.

An asterisk (*) indicates that the item is not required and that there may be more than one.

A question mark (?) indicates that the item is not required and that there can be only one.

Using these symbols, we return to our example:

<!ELEMENT book (title+, tableofcontents?, introduction?, section+)>

Our example says that a book must have at least one title, and can have more than one. It may or may not have a table of contents or introduction, and will never have more than one table of contents or introduction. It must have at least one section, but can have more than one.
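To see how a parser reads these symbols, the following minimal sketch uses Python's standard `xml.parsers.expat` module to report the occurrence symbol attached to each item in the book content model. The function name `occurrence_rules` and the tiny wrapper document are assumptions made for illustration. Expat does not validate, so the empty `<book/>` element is accepted even though it breaks the rules.

```python
import xml.parsers.expat as expat

# Map expat's quantifier constants back to the DTD symbols.
SYMBOLS = {
    expat.model.XML_CQUANT_NONE: "",
    expat.model.XML_CQUANT_OPT: "?",
    expat.model.XML_CQUANT_REP: "*",
    expat.model.XML_CQUANT_PLUS: "+",
}

DOC = """<?xml version="1.0"?>
<!DOCTYPE book [
  <!ELEMENT book (title+, tableofcontents?, introduction?, section+)>
]>
<book/>"""


def occurrence_rules(document):
    """Return {element: [(item, symbol), ...]} for each ELEMENT declaration."""
    rules = {}

    def on_element_decl(name, content_model):
        # expat passes the model as a (type, quantifier, name, children) tuple.
        _type, _quantifier, _name, children = content_model
        rules[name] = [(child[2], SYMBOLS[child[1]]) for child in children]

    parser = expat.ParserCreate()
    parser.ElementDeclHandler = on_element_decl
    parser.Parse(document, True)
    return rules


print(occurrence_rules(DOC)["book"])
```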

As you can see, we can get a lot of information about the element by including these three symbols in the content model. They are included in the DTD to spell out the rules that apply to the elements and the items in their content models.

Notice that the items in the content model are separated by commas. Commas indicate that the items must appear in the order they are listed in the content model. Our book must first have a title, then a table of contents, then an introduction, and finally its sections.

In some situations, however, you may want to indicate that there is a choice involved. To show a choice, you can use a choice operator (|). The notation would then be:

<!ELEMENT A (B|C)>

This notation indicates that element A consists of either B or C, but not both.
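The ordering and choice rules can be checked mechanically. The sketch below is not how an XML parser validates a document; it simply translates a simple content model into a regular expression over the child element names, which happens to work because the DTD symbols `+`, `*`, `?`, and `|` mean the same thing in regular expressions. The helper names `model_to_regex` and `conforms` are hypothetical.

```python
import re


def model_to_regex(content_model):
    """Compile a simple content model such as '(title+, section+)' to a regex.

    Each child name becomes the group '(?:name )', so +, *, ? and | keep
    their regex meaning and a comma (sequence) becomes plain concatenation.
    Sketch only: #PCDATA, EMPTY, and ANY are not handled.
    """
    compact = re.sub(r"\s+", "", content_model)
    pattern = re.sub(
        r"[A-Za-z_][\w.-]*",
        lambda match: "(?:" + match.group(0) + " )",
        compact,
    )
    return re.compile(pattern.replace(",", ""))


def conforms(content_model, children):
    """True if the ordered list of child element names satisfies the model."""
    sequence = "".join(name + " " for name in children)
    return model_to_regex(content_model).fullmatch(sequence) is not None


BOOK = "(title+, tableofcontents?, introduction?, section+)"
print(conforms(BOOK, ["title", "introduction", "section", "section"]))  # True
print(conforms(BOOK, ["title", "section", "introduction"]))  # False: wrong order
print(conforms("(B|C)", ["B"]))  # True
print(conforms("(B|C)", ["B", "C"]))  # False: only one of B or C is allowed
```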

Attributes

An element may have one or more attributes. An attribute is simply a piece of information about the element. Like attributes in the object model, an element's attribute has a name, data type, and optional default value.

An attribute is declared using the following notation:

<!ATTLIST ElementName AttributeName DataType DefaultValue>

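For example (both definitions end with a default declaration, #IMPLIED or #REQUIRED, which XML 1.0 requires and which are described below):

<!ATTLIST Author Name CDATA #IMPLIED>
<!ATTLIST Employee EmpID ID #REQUIRED>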

If an element has more than one attribute, they are listed as follows:

<!ATTLIST Employee
    Name    CDATA #IMPLIED
    Address CDATA #IMPLIED
    Phone   CDATA #IMPLIED>

Every attribute definition ends with either a default value or one of three keywords. The keyword #REQUIRED indicates that the attribute is mandatory. The #IMPLIED keyword indicates that the attribute is optional and has no default. Finally, the #FIXED keyword indicates that the attribute's value cannot change; a fixed attribute must be given a default value, which follows the keyword.

To assign a default value to an attribute, enter the value at the end of the attribute declaration. For example:

<!ATTLIST book language CDATA "Spanish">

This declaration assigns the default value "Spanish" to a book's language.
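As a rough illustration of what the default means in practice, the sketch below parses a small document with Python's standard `xml.parsers.expat` module. The `library` wrapper element and the function name `book_languages` are assumptions added for the example. A book that omits the attribute is reported with the default value, while an explicit value is left alone.

```python
import xml.parsers.expat as expat

DOC = """<?xml version="1.0"?>
<!DOCTYPE library [
  <!ATTLIST book language CDATA "Spanish">
]>
<library>
  <book/>
  <book language="English"/>
</library>"""


def book_languages(document):
    """Collect the language attribute the parser reports for each book."""
    languages = []

    def on_start(name, attributes):
        if name == "book":
            languages.append(attributes.get("language"))

    parser = expat.ParserCreate()
    parser.StartElementHandler = on_start
    parser.Parse(document, True)
    return languages


print(book_languages(DOC))
```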

Sometimes you want to set a list of valid values for an attribute. In our example, let's assume books must be in Spanish, English, or Japanese. We would specify this as follows:

<!ATTLIST book language (Spanish | English | Japanese) "Spanish">

Here, the language must be Spanish, English, or Japanese, and the default is Spanish.
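The different attribute declarations can be inspected with a short script. This is a minimal sketch using the `AttlistDeclHandler` callback of Python's standard `xml.parsers.expat` module; the attributes `isbn`, `subtitle`, and `format` and the function name `attribute_rules` are hypothetical. It assumes the XML 1.0 form of an enumerated attribute, in which the list of values takes the place of a type keyword such as CDATA.

```python
import xml.parsers.expat as expat

DOC = """<?xml version="1.0"?>
<!DOCTYPE book [
  <!ATTLIST book
      isbn     CDATA #REQUIRED
      subtitle CDATA #IMPLIED
      format   CDATA #FIXED "print"
      language (Spanish | English | Japanese) "Spanish">
]>
<book isbn="0"/>"""


def attribute_rules(document):
    """Return {attribute: (type, keyword, default)} for each ATTLIST entry."""
    rules = {}

    def on_attlist_decl(element, attribute, att_type, default, required):
        if required and default is None:
            keyword = "#REQUIRED"
        elif required:
            keyword = "#FIXED"
        elif default is None:
            keyword = "#IMPLIED"
        else:
            keyword = ""  # a plain default value with no keyword
        rules[attribute] = (att_type.replace(" ", ""), keyword, default)

    parser = expat.ParserCreate()
    parser.AttlistDeclHandler = on_attlist_decl
    parser.Parse(document, True)
    return rules


for name, rule in attribute_rules(DOC).items():
    print(name, rule)
```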

Entities and Notations

An entity is used when you want to use a simple word to represent a more complex string. It is a way to enter a lot of information by simply typing the entity name. Entities help simplify documents and keep you from repetitive typing.

An entity may be internal or external. An internal entity is defined in the DTD. An external entity is defined outside the DTD and corresponding XML document (for example, in another XML document). The SYSTEM keyword indicates that the entity is an external entity. External entities may be parsed or unparsed.

Some examples of entities include text strings, external files, and special characters.

Text Strings

If there is a long text string that is repeated many times, an entity can be used to represent the string. For example, instead of typing "the quick brown fox jumps over the lazy dog", you can just type "&lazydog;". The format of this type of entity looks like this:

<!ENTITY EntityName "entity text">

For example,

<!ENTITY lazydog "the quick brown fox jumps over the lazy dog">

We define the entity "lazydog" as the string "the quick brown fox jumps over the lazy dog". To use the entity, all we have to do is type an ampersand (&), the entity name, and a semicolon (;). Anywhere we type "&lazydog;", the XML parser will replace the reference with the full phrase.
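A quick way to watch the replacement happen is to parse a small document that declares the entity in its internal DTD subset. The sketch below uses Python's standard `xml.etree.ElementTree` module; the `note` element and its surrounding text are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

DOC = """<?xml version="1.0"?>
<!DOCTYPE note [
  <!ENTITY lazydog "the quick brown fox jumps over the lazy dog">
]>
<note>Typing test: &lazydog;.</note>"""

# The parser expands the reference before the text reaches the application.
note = ET.fromstring(DOC)
print(note.text)
```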

External Files

An entity can be used to represent an external XML file. In this situation, we need to add the SYSTEM keyword. The entity declaration looks like this:

<!ENTITY EntityName SYSTEM "entity location">

For example, if you have the text from the Declaration of Independence in another file, you can define an entity and use that entity name rather than type all of the text. Our entity declaration would look like this:

<!ENTITY Independence SYSTEM "/independence.xml">

Now all we need to do is use the entity reference &Independence; wherever we want a reference to the external file.
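Whether an external entity is actually fetched depends on the parser and how it is configured. The following sketch uses Python's standard `xml.parsers.expat` module, which leaves the lookup to a callback. To stay self-contained it resolves the system identifier against an in-memory dictionary named `FILES`, a hypothetical stand-in for reading the file from disk; the `speech` element and the function name `expand` are also assumptions.

```python
import xml.parsers.expat as expat

# Hypothetical stand-in for the file system: system identifier -> content.
FILES = {"/independence.xml": "When in the Course of human events"}

DOC = """<?xml version="1.0"?>
<!DOCTYPE speech [
  <!ENTITY Independence SYSTEM "/independence.xml">
]>
<speech>&Independence;</speech>"""


def expand(document, files):
    """Return the character data of the document with the entity expanded."""
    chunks = []
    parser = expat.ParserCreate()
    parser.CharacterDataHandler = chunks.append

    def on_external_entity(context, base, system_id, public_id):
        # Parse the referenced content with a child parser, which inherits
        # the character data handler from its parent.
        child = parser.ExternalEntityParserCreate(context)
        child.Parse(files[system_id], True)
        return 1  # a non-zero result tells expat to carry on

    parser.ExternalEntityRefHandler = on_external_entity
    parser.Parse(document, True)
    return "".join(chunks)


print(expand(DOC, FILES))
```

Resolving external entities from untrusted documents is a known security risk, which is one reason many parsers do not fetch them by default.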

Special Characters

When you need to use a special character, such as ®, you can define an entity that, when used, will be replaced by that special character. This saves you the headache of trying to remember the decimal value of the special character.
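For instance, the registered trademark sign has the decimal value 174, so an entity can wrap the character reference `&#174;`. The sketch below, using Python's standard `xml.etree.ElementTree` module, assumes an entity named `reg` and a `product` element, both chosen only for illustration.

```python
import xml.etree.ElementTree as ET

DOC = """<?xml version="1.0"?>
<!DOCTYPE product [
  <!ENTITY reg "&#174;">
]>
<product>Rational Rose&reg;</product>"""

# &reg; is replaced by the character whose decimal value is 174.
product = ET.fromstring(DOC)
print(product.text)
```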

Parsed and Unparsed Entities and Notations

An entity may be parsed or unparsed. A parsed entity is one that follows the rules we described earlier in this section; when the XML parser encounters the entity, it replaces the entity name with the text or file the entity represents.

The XML parser does not parse the content of an unparsed entity. So why use unparsed entities? They provide a way to include things such as graphics files, video, audio, or other files that are not in XML format. When the XML parser encounters an unparsed entity, it reports the entity to the application, which can then hand the file to a program that is able to process it. For example, an application might run a video whose entity declaration starts out like this:

<!ENTITY MyVideo SYSTEM "C:\Videos\Vacation.vid">

The XML parser knows which application to call on to process the entity because of a construct called a notation. A notation identifies the application to be used to process a particular entity, and provides the application location. A notation is documented as follows:

<!NOTATION NotationName SYSTEM "notation location">

For example:

<!NOTATION Video SYSTEM "C:\MyVideoPlayer.exe">

The picture is almost complete, but we're still missing one piece. How does the XML parser know that the Video notation applies to the MyVideo entity? We need to add one last piece to the entity declaration, to tie it to the Video notation. We use the NDATA keyword, followed by the notation name. So now our example contains the two lines:

<!ENTITY MyVideo SYSTEM "C:\Videos\Vacation.vid" NDATA Video>
<!NOTATION Video SYSTEM "C:\MyVideoPlayer.exe">
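The link between the entity and its notation can be followed in code. This minimal sketch uses the `NotationDeclHandler` and `UnparsedEntityDeclHandler` callbacks of Python's standard `xml.parsers.expat` module to pair each unparsed entity with the application named by its notation. The `album` element and the function name `applications_for_entities` are hypothetical, and the script only reports the pairing; it does not launch anything.

```python
import xml.parsers.expat as expat

# A raw string keeps the backslashes in the Windows paths intact.
DOC = r"""<?xml version="1.0"?>
<!DOCTYPE album [
  <!NOTATION Video SYSTEM "C:\MyVideoPlayer.exe">
  <!ENTITY MyVideo SYSTEM "C:\Videos\Vacation.vid" NDATA Video>
]>
<album/>"""


def applications_for_entities(document):
    """Return {entity name: (file location, application location)}."""
    notations = {}  # notation name -> application location
    entities = {}   # entity name -> (file location, notation name)

    def on_notation(name, base, system_id, public_id):
        notations[name] = system_id

    def on_unparsed_entity(name, base, system_id, public_id, notation_name):
        entities[name] = (system_id, notation_name)

    parser = expat.ParserCreate()
    parser.NotationDeclHandler = on_notation
    parser.UnparsedEntityDeclHandler = on_unparsed_entity
    parser.Parse(document, True)
    return {
        name: (location, notations[notation])
        for name, (location, notation) in entities.items()
    }


print(applications_for_entities(DOC))
```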

DTD-to-UML Mapping

When reverse engineering or generating a DTD, Rose maps the different DTD constructs to classes with the appropriate stereotype. In the remainder of this chapter, we will discuss in detail how the DTD constructs map to Rose model elements. Table 16.1 lists the XML DTD constructs and their corresponding model elements.

Table 16.1: DTD-to-Rose Mapping

DTD Construct                                Rose Element
Element                                      Class with stereotype DTDElement
Attribute                                    Attribute of element class
Entity                                       Class with stereotype DTDEntity
Notation                                     Class with stereotype DTDNotation
Empty element type                           Class with stereotype DTDElementEmpty
Any element type                             Class with stereotype DTDElementAny
Parsed character element type                Class with stereotype DTDElementPCDATA
Content model with items separated by ","    Class with stereotype DTDSequenceGroup
Content model with items separated by "&"    Class with stereotype DTDSet
Content model with items separated by "|"    Class with stereotype DTDChoiceGroup
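To make part of the table concrete, the sketch below looks up the stereotype that corresponds to the kind of content model in each element declaration. It is only an illustration built on Python's standard `xml.parsers.expat` module and is not Rose's own implementation or API. The dictionary `STEREOTYPES`, the function name `content_stereotypes`, and the sample declarations are assumptions. The "&" row is left out because the parser used here has no such content model type, and any model containing #PCDATA is treated as the parsed character row.

```python
import xml.parsers.expat as expat

# Rows of Table 16.1, keyed by expat's content-model type constants.
STEREOTYPES = {
    expat.model.XML_CTYPE_EMPTY: "DTDElementEmpty",
    expat.model.XML_CTYPE_ANY: "DTDElementAny",
    expat.model.XML_CTYPE_MIXED: "DTDElementPCDATA",
    expat.model.XML_CTYPE_SEQ: "DTDSequenceGroup",
    expat.model.XML_CTYPE_CHOICE: "DTDChoiceGroup",
}

DOC = """<?xml version="1.0"?>
<!DOCTYPE book [
  <!ELEMENT book (title, section+)>
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT section (paragraph | figure)>
  <!ELEMENT figure EMPTY>
  <!ELEMENT paragraph ANY>
]>
<book/>"""


def content_stereotypes(document):
    """Return {element name: stereotype for its content model}."""
    found = {}

    def on_element_decl(name, content_model):
        # The first item of the model tuple is the content-model type.
        found[name] = STEREOTYPES[content_model[0]]

    parser = expat.ParserCreate()
    parser.ElementDeclHandler = on_element_decl
    parser.Parse(document, True)
    return found


for element, stereotype in content_stereotypes(DOC).items():
    print(element, stereotype)
```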
