
Beginning Visual Basic 2005 Express Edition - From Novice To Professional (2006)


C H A P T E R 1 5 W O R K I N G W I T H X M L

When you use the XmlDocument/XPathNavigator combination, for example, you need to load the entire document into memory, which can be a bit of a chore if you are downloading it over a slow link and the document is huge. XmlReader lets you walk through a document node by node, without loading the whole document, processing just as much of the document as you need, and no more.

XmlReader lets you move progressively through an XML document, examining each node, attribute, or element. If you’ve done database programming before, it may help you to think of XmlReader as being cursor based. What that means is that XmlReader maintains a pointer, if you like, to the part of the document about to be read. For example, if we had this XML document

    <SomeDocument>
        <SomeElement>
            SomeValue
        </SomeElement>
    </SomeDocument>

we could call the XmlReader method ReadToFollowing("SomeElement") to position the reader at the opening tag of the SomeElement element. To move past the tag, we could call ReadStartElement(), which leaves the reader immediately after the <SomeElement> tag.
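To make the cursor idea concrete, here is a minimal sketch that runs the two calls against an in-memory copy of the sample document. The module name and the use of a StringReader are assumptions made for illustration; they are not part of the chapter's project.

```vb
Imports System
Imports System.IO
Imports System.Xml

Module CursorSketch
    Sub Main()
        ' Hypothetical in-memory copy of the sample document above.
        Dim xml As String = _
            "<SomeDocument><SomeElement>SomeValue</SomeElement></SomeDocument>"

        Using reader As XmlReader = XmlReader.Create(New StringReader(xml))
            ' Advance until the reader reaches the SomeElement start tag.
            If reader.ReadToFollowing("SomeElement") Then
                ' Prints "Element: SomeElement"
                Console.WriteLine("{0}: {1}", reader.NodeType, reader.Name)

                ' Step past the start tag; the reader now sits on the text node.
                reader.ReadStartElement()
                ' Prints "Text: SomeValue"
                Console.WriteLine("{0}: {1}", reader.NodeType, reader.Value)
            End If
        End Using
    End Sub
End Module
```

Printing NodeType after each call is a handy way to check where the cursor really is while you are learning the API.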

XmlReader has a bunch of methods for moving around documents like this, and they are shown in Table 15-1.

Table 15-1. The Methods of the XmlReader Class

Method                        Description
Read()                        Reads the next node, and returns False if there are no more nodes to read.
ReadContentAs...()            A group of methods that read the content at the reader’s current position as the specified type. For example, to grab a string you’d call ReadContentAsString(); for a float, you’d call ReadContentAsFloat(), and so on.
ReadElementContentAs...()     These work like the ReadContentAs...() methods, but read the content of the element the reader is currently positioned on. If you pass in an element name and namespace, the reader first checks that the current element matches. Just as with the ReadContentAs...() methods, you call the method matching the data type you are most interested in (that is, ReadElementContentAsString(), ReadElementContentAsInt(), and so on).
ReadElementString()           Because reading string elements is by far the most common operation people need to do, XmlReader includes a handy ReadElementString() method that takes the name of an element and returns its text contents from the document.
ReadStartElement()            Checks that the current node is an element and moves the reader past its start tag to the next node. You can also pass in the name of an element, and the reader will check that the current element has that name before moving on.
ReadEndElement()              Checks that the current node is a close tag and moves the reader to immediately after it.
ReadSubtree()                 Call this method when the reader is positioned on an element to get a new XmlReader returned just for processing that element and its child nodes.
ReadToDescendant()            This is the method to call when you want to move the reader to a specific child node. Just pass in the name of the element you want, and the reader will jump to it if it exists, or return False if it is unable.
ReadToFollowing()             Moves the reader to the next element matching the name passed in, regardless of whether that element is a child or a sibling. As with the other Read methods, this returns True if the move is successful, False if not.
ReadToNextSibling()           Moves the reader to the next element at the same level as the current one, with the name specified. It returns False if the read fails.
Skip()                        If the reader is currently positioned on an element with children, calling Skip() skips them and moves the reader to the next node at the same level.
MoveToAttribute()             Pass in the name of an attribute of the current element to position the reader at that attribute.
MoveToContent()               Moves the reader to the next content node, skipping white space, processing instructions, and comments.

XmlReader is an abstract class, which of course means that you can’t create one in the usual way. Instead, to create an XmlReader, you call a static method called Create() and pass in a stream or a string. Create() will return an appropriate subclass of XmlReader for you to use. In programming circles Create() is known as a factory method, a method that will basically create an object suited to your needs without you having to worry about working with a bunch of different types of XmlReader.
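As a rough illustration of the factory method, the sketch below asks Create() for a reader over two different sources and uses both through the same XmlReader type. The sample XML and the module name are assumptions made for this example. One point to be aware of: when you pass Create() a plain string, it is treated as a file name or URI to load from, not as XML text, which is why the sketch wraps the text in a StringReader.

```vb
Imports System
Imports System.IO
Imports System.Text
Imports System.Xml

Module FactorySketch
    Sub Main()
        ' Hypothetical sample document used only for this sketch.
        Dim xml As String = "<Greeting>Hello</Greeting>"

        ' Create() accepts a TextReader...
        Using fromText As XmlReader = XmlReader.Create(New StringReader(xml))
            fromText.MoveToContent()
            Console.WriteLine(fromText.ReadElementContentAsString())
        End Using

        ' ...or a Stream. Either way the caller only ever sees XmlReader.
        Dim bytes As Byte() = Encoding.UTF8.GetBytes(xml)
        Using fromStream As XmlReader = XmlReader.Create(New MemoryStream(bytes))
            fromStream.MoveToContent()
            Console.WriteLine(fromStream.ReadElementContentAsString())
        End Using
    End Sub
End Module
```

Both blocks print Hello. The concrete class behind each reader is an implementation detail that the calling code never needs to name.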

Let’s see all this in action and write a simple console application to list out all the articles in the Apress blog RSS feed.

Try It Out: Using XmlReader

Let’s focus on the code (not the UI) with a console application. Start up a new console project in Visual Basic 2005 Express.

As before, the first thing you need to do is grab the XML document (the Apress blog RSS feed). Unlike before, the XmlReader doesn’t work with the entire document but prefers streams, just reading in as much data as necessary to do the job. So, you’ll need to open a stream to read the data. You can do this with the WebClient class in the System.Net namespace. Because we want to work with streams, you’ll also need to use the System.IO namespace. Finally, because we’re working with XML, you’ll of course need to bring in the System.Xml namespace. So, the first order of business is to add three Imports statements to the top of Module1.vb:

    Imports System.IO
    Imports System.Net
    Imports System.Xml

    Module Module1

        Sub Main()
        End Sub

    End Module

With the Imports statements sorted out, you can start work on the code. Let’s first open a stream to grab the RSS feed. Go ahead and add a couple of lines of code to the Main() subroutine:

    Sub Main()
        Dim client As New WebClient()
        Dim rssFeedStream As Stream = _
            client.OpenRead("http://blogs.apress.com/wp-rss2.php")
    End Sub

The code here creates a new WebClient and then asks it to give us a stream that we can work with to access the RSS feed. After you have a stream (whether it’s from a website, a file, a memory block—it doesn’t matter) you can attach an XmlReader to it:

    Sub Main()
        Dim client As New WebClient()
        Dim rssFeedStream As Stream = _
            client.OpenRead("http://blogs.apress.com/wp-rss2.php")

        Dim reader As XmlReader = XmlReader.Create(rssFeedStream)
    End Sub

As you can see, creating a reader from a stream is pretty trivial. You just have to pass the stream into the Create() factory method and you’re good to go.


You’ll remember when you looked at the XML format of the RSS feed earlier that the document begins with a bunch of stuff we’re not particularly interested in. There are various XML directives at the top of the document, for example, to tell us that this is indeed an XML document, to reference the schema it uses, and so on. We can tell the reader to skip all this extraneous information with a call to MoveToContent():

    Sub Main()
        Dim client As New WebClient()
        Dim rssFeedStream As Stream = _
            client.OpenRead("http://blogs.apress.com/wp-rss2.php")

        Dim reader As XmlReader = XmlReader.Create(rssFeedStream)
        reader.MoveToContent()
    End Sub

What this actually does is make the reader pull data down from the stream until it gets to a content node. It skips any white space in the document, skips the directives, and positions the reader at the first real XML element. This is perfect. What we’re most interested in, though, are the item elements in the document. These contain information about the various blog posts that we want.
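The effect of MoveToContent() is easy to check offline. The following sketch uses a made-up document with an XML declaration and a comment in front of the root element; the document text and module name are assumptions for illustration only.

```vb
Imports System
Imports System.IO
Imports System.Xml

Module MoveToContentSketch
    Sub Main()
        ' Hypothetical document: a declaration and a comment precede the root.
        Dim xml As String = _
            "<?xml version=""1.0""?>" & _
            "<!-- generated feed -->" & _
            "<rss version=""2.0""><channel /></rss>"

        Using reader As XmlReader = XmlReader.Create(New StringReader(xml))
            ' MoveToContent() returns the type of the node it stopped on.
            Dim nodeType As XmlNodeType = reader.MoveToContent()

            ' Prints "Element: rss"
            Console.WriteLine("{0}: {1}", nodeType, reader.Name)
        End Using
    End Sub
End Module
```

The declaration and the comment are both skipped, and the reader stops on the rss element itself.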

There’s a method on the XmlReader listed in Table 15-1 called ReadToFollowing(). You can pass the name of an element into this method and it will make the reader advance through the document until it finds an element with a matching name. If the move is successful, the method returns True. If not, you get False. This makes it ideal for use in a loop. We can create a loop that repeatedly calls ReadToFollowing("item") to jump through each and every item in the document. Go ahead and add that loop now:

    Sub Main()
        Dim client As New WebClient()
        Dim rssFeedStream As Stream = _
            client.OpenRead("http://blogs.apress.com/wp-rss2.php")

        Dim reader As XmlReader = XmlReader.Create(rssFeedStream)
        reader.MoveToContent()

        While reader.ReadToFollowing("item")
        End While
    End Sub


Now the really neat part. A feature that I absolutely love about the XmlReader class is that it can generate new readers. Why would you want to do that? Well, as I’ve indicated, the XmlReader maintains information about where it is in a document. In this case you’re looping through the items, but when you find an item, you need to jump in and process the elements it contains. You don’t really want to affect what your reader is pointing at though; the while loop here is slick and simple to understand, so why complicate matters?

Instead, inside the loop you can generate a new reader just to process this tag and any child tags it contains, leaving your main reader's position in the loop alone. So, what you’ll do here is call out to a new subroutine, passing it a brand new reader just for processing one item in the feed. The way that you do this is by calling ReadSubtree(). That gives you a reader scoped to the current element and its children:

    Sub Main()
        Dim client As New WebClient()
        Dim rssFeedStream As Stream = _
            client.OpenRead("http://blogs.apress.com/wp-rss2.php")

        Dim reader As XmlReader = XmlReader.Create(rssFeedStream)
        reader.MoveToContent()

        While reader.ReadToFollowing("item")
            ProcessItem(reader.ReadSubtree())
        End While
    End Sub

    Private Sub ProcessItem(ByVal reader As XmlReader)
    End Sub

So you created a new method stub called ProcessItem() that takes a reader as a parameter, and inside the while loop you generate a new reader for the child elements of the item tag in our RSS feed. Perfect. All you need to do now is write code into the ProcessItem() method to extract data about the feed and print it out to the console.

How can you extract data with an XmlReader? Well, you can use ReadToFollowing() again to move the new reader to the specific element that you are most interested in. Then, you can use one of the ReadElementContentAs...() methods. These methods are some of the other strengths of the reader; you can access data from the XML document as native .NET types. If you want to grab a string, for example, you can call ReadElementContentAsString(). If you need an integer, call ReadElementContentAsInt(), and so on. We’re most interested in strings, so let’s add the code to find the title and link elements of the item, extract their contents, and print them to the console:

    Private Sub ProcessItem(ByVal reader As XmlReader)
        reader.ReadToFollowing("title")
        Dim title As String = _
            reader.ReadElementContentAsString("title", _
                reader.NamespaceURI)

        reader.ReadToFollowing("link")
        Dim link As String = _
            reader.ReadElementContentAsString("link", _
                reader.NamespaceURI)

        Console.WriteLine("{0}" + vbCrLf + " {1}", title, link)

        ' Close the subtree reader so the main reader can safely move on.
        reader.Close()
    End Sub

Notice how the call to ReadElementContentAsString() here takes two parameters: the name of the element that you want to read and its namespace. This is the XML namespace. XML documents can reference lots of different schemas and namespaces to define the elements the document can hold. Again, if this is new to you, check out http://www.xml.org to read all about it. In this case, you just pass in the namespace of the element the reader is currently positioned on as the second parameter.
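If you want to try the item-processing logic without a network connection, the following self-contained sketch runs the same pattern over a small in-memory feed. The feed text, the example.com links, and the module name are all made up for illustration, and the sketch closes each subtree reader with a Using block before the main reader moves on, which is an assumption about good practice rather than something the chapter's listing does.

```vb
Imports System
Imports System.IO
Imports System.Xml

Module ItemSketch
    Sub Main()
        ' Hypothetical two-item feed; the real feed's contents will differ.
        Dim xml As String = _
            "<rss version=""2.0""><channel>" & vbCrLf & _
            "<item>" & vbCrLf & _
            "  <title>First post</title>" & vbCrLf & _
            "  <link>http://example.com/1</link>" & vbCrLf & _
            "</item>" & vbCrLf & _
            "<item>" & vbCrLf & _
            "  <title>Second post</title>" & vbCrLf & _
            "  <link>http://example.com/2</link>" & vbCrLf & _
            "</item>" & vbCrLf & _
            "</channel></rss>"

        Using reader As XmlReader = XmlReader.Create(New StringReader(xml))
            reader.MoveToContent()

            While reader.ReadToFollowing("item")
                ' The subtree reader is closed at End Using, before the
                ' main reader is asked to move again.
                Using itemReader As XmlReader = reader.ReadSubtree()
                    ProcessItem(itemReader)
                End Using
            End While
        End Using
    End Sub

    Private Sub ProcessItem(ByVal reader As XmlReader)
        reader.ReadToFollowing("title")
        ' These elements have no namespace, so NamespaceURI is an empty string.
        Dim title As String = _
            reader.ReadElementContentAsString("title", reader.NamespaceURI)

        reader.ReadToFollowing("link")
        Dim link As String = _
            reader.ReadElementContentAsString("link", reader.NamespaceURI)

        Console.WriteLine("{0}" + vbCrLf + " {1}", title, link)
    End Sub
End Module
```

The sample feed deliberately has white space between the title and link elements, as the formatted feed in the chapter does. ReadToFollowing() always reads forward before it checks for a match, so if link followed title with nothing in between, the reader would already be sitting on link and the second ReadToFollowing() call would move past it.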

Finally, let’s add a couple of console calls back up in Main() to finish the application:

    Sub Main()
        Dim client As New WebClient()
        Dim rssFeedStream As Stream = _
            client.OpenRead("http://blogs.apress.com/wp-rss2.php")

        Dim reader As XmlReader = XmlReader.Create(rssFeedStream)
        reader.MoveToContent()

        While reader.ReadToFollowing("item")
            ProcessItem(reader.ReadSubtree())
        End While

        Console.WriteLine("All done!")
        Console.ReadLine()
    End Sub


Run the program now and you’ll see it print out the item titles and links as shown in Figure 15-8.

Figure 15-8. Run the program to see the links and titles of articles printed out.

Looking back through the code, using XmlReader certainly requires a lot more typing than hitting the document object model directly with XmlDocument and XPathNavigator. But look at the advantages. I think the code we just wrote is better structured and easier to follow. We’re grabbing data from an XML document as native .NET data types. Better yet, RSS feeds are XML documents that can at times get huge. Do we really want to load a whole chunk of data into memory to process it? XmlReader lets us process a moving stream, without loading up memory, processing just as much as we need.

The counterpart to the XmlReader, with the same benefits, is of course the XmlWriter.


Writing XML

Here’s the big thing with the XmlWriter. Typically, if you want to produce an XML document from code, you face two issues. The first is formatting. Although XML is a machine-readable data format, it’s not unusual for people to want to be able to read (poke around is probably a better way to put it) the XML source. Second, you always produce XML documents from data you already manage. Perhaps you’re writing out a set of variables you are managing to build a configuration file for your application. Perhaps you’re using XML to store the contents of entire object trees. Maybe you’re actually producing an RSS feed. The point is that the XML document you produce is a bloated copy of data you already have.

If you produce a document by using the XmlDocument object by hand, calling its Insert...() methods, you are actually putting into memory an even bigger copy of data you already have in memory. XmlWriter, just like the reader, is stream based. You write with it and it gets fired into the stream, without dramatically increasing memory footprint. It’s also fast because you don’t need to first construct a data model in memory, then open a stream, then convert the in-memory structure to a textual structure to go out on the stream. In general, XmlWriter is just a better way of working.

Let’s take a look.

Try It Out: Using XmlWriter

You know, the easiest way of producing an XML document is probably to just build a bunch of strings and write them to a file. That’s not really very “pure” though. Using XmlWriter also gives you the option to validate your XML document, making sure all the elements are closed properly, that attributes are declared properly, and so on. You could do it all the string way, but with a complex XML document you’d probably be condemning yourself to a few very long hours of debugging. You’d also have to format the string document by hand, something XmlWriter can do for you automatically.
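As a small sketch of the automatic formatting mentioned above, the XmlWriterSettings class has an Indent property that asks the writer to add line breaks and indentation. The example writes into a StringBuilder rather than a file so that the result can be printed straight to the console; the element names, the sample value, and the module name are assumptions for illustration.

```vb
Imports System
Imports System.Text
Imports System.Xml

Module IndentSketch
    Sub Main()
        Dim settings As New XmlWriterSettings()
        settings.Indent = True   ' ask for line breaks and indentation

        Dim output As New StringBuilder()
        Using writer As XmlWriter = XmlWriter.Create(output, settings)
            writer.WriteStartElement("Customers")
            ' Hypothetical element and value, used only for this sketch.
            writer.WriteElementString("Customer", "Hypothetical name")
            writer.WriteEndElement()
        End Using   ' disposing the writer flushes it into the StringBuilder

        Console.WriteLine(output.ToString())
    End Sub
End Module
```

Without the settings object, the same calls still produce well-formed XML, but everything ends up on a single line.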


Start a new console application project and add an Imports System.Xml statement to the top of Module1.vb:

    Imports System.Xml

    Module Module1

        Sub Main()
        End Sub

    End Module

Creating an XmlWriter works the same way as it does on an XmlReader; you call a static method called Create(). The difference is the parameters you can pass to the method. XmlWriter.Create() can take a filename, a StringBuilder to write into a string, a stream, or even a writer already set up to talk to a file. We’ll just write straight out to a file. Create the XmlWriter in the Main() function:

    Sub Main()
        Dim writer As XmlWriter = _
            XmlWriter.Create("c:\test.xml")
    End Sub

Here you’re just going to write out to a file called test.xml in the root directory of the C drive.

Every XML file should start with an XML directive identifying it as an XML file, and the XML standard that the file conforms to. XmlWriter can do all that for us with a call to WriteStartDocument():

    Sub Main()
        Dim writer As XmlWriter = _
            XmlWriter.Create("c:\test.xml")

        writer.WriteStartDocument()
    End Sub

Now you can get onto producing the meat of the document. You’ll produce a Customers document that contains details of a customer—it could easily be expanded to include a whole bunch of customers, but I’m trying to keep the code short.


To write an element that will contain other elements, you need to make two calls. The first is WriteStartElement(). You pass into this the name of the element you want to create, and the writer will write out that name surrounded by tag symbols (<>). When you’re finished adding content to the new element, a call to WriteEndElement() writes out the close tag. It’s a good idea to always write the two calls together so that you don’t accidentally leave one out.

Our document will have a root element called Customers, which will contain Customer tags. Change the code as follows to write out the start and end tags:

    Sub Main()
        Dim writer As XmlWriter = _
            XmlWriter.Create("c:\test.xml")

        writer.WriteStartDocument()

        writer.WriteStartElement("Customers")
        writer.WriteEndElement()
    End Sub

You can use the same pattern to build up your first customer in the document:

    Sub Main()
        Dim writer As XmlWriter = _
            XmlWriter.Create("c:\test.xml")

        writer.WriteStartDocument()

        writer.WriteStartElement("Customers")

        writer.WriteStartElement("Customer")
        writer.WriteEndElement()

        writer.WriteEndElement()
    End Sub