Beginning Visual Basic 2005 Express Edition - From Novice To Professional (2006)


CHAPTER 17 THE INTERNET AND VISUAL BASIC

Go ahead and add a new class to the solution—call it Book.vb. This class is going to hold the book titles and URLs and also override ToString() so that you can add the objects straight into the list box. The code looks like this:

Public Class Book

    Public BookTitle As String
    Public BookURL As String

    Public Sub New(ByVal title As String, ByVal url As String)
        BookTitle = title
        BookURL = url
    End Sub

    Public Overrides Function ToString() As String
        Return BookTitle
    End Function

End Class

With the new Book class in place, you can focus on getting information out of the web page. For that you’ll use a regular expression (or regex as it is more commonly known).

The first step is to create a Regex object. As I said in the sidebar, there isn’t scope here to give an in-depth tutorial on regexes, but I will cover the basics of using them in .NET.

Add the following line of code into our click event handler, and an Imports statement to gain access to the regex classes:

Imports System.Net
Imports System.Text.RegularExpressions

Public Class Form1

    Private Sub bookCheckButton_Click(ByVal sender As System.Object, _
            ByVal e As System.EventArgs) Handles bookCheckButton.Click
        Dim newBooksPage As String = _
            GetWebPage("http://www.apress.com/book/forthcoming.html")

        ' Use a regex here to find all the book info
        Dim newBooksRegEx As Regex = _
            New Regex( _
                "<a href=""(/book/bookDisplay\.html\?bID=[0-9]+)"">([^<]+)</a>", _
                RegexOptions.Singleline)
    End Sub

    Private Function GetWebPage(ByVal url As String) As String
        Dim web As New WebClient()
        Return web.DownloadString(url)
    End Function

End Class

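This creates a new Regex object, passing in two parameters: a string containing the regular expression itself, and the Singleline option. Despite what the name suggests, Singleline has nothing to do with how many lines the downloaded text contains; the string you get back from WebClient keeps whatever line breaks the page had. What the option does is let the period metacharacter match line-break characters as well as everything else. The only periods in this particular regex are escaped, so the option makes no difference to the matches here, but it does no harm and can save surprises if you later extend the pattern to span several lines of HTML.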

Now, although the regex itself (the string passed as the first parameter) looks horribly confusing, it’s actually quite easy to follow. Regexes are interpreted character by character, from left to right. So that string says that you are going to look for the < character, followed by the a character, followed by a space, followed by h, then r, then e, then f, and so on. Basically, you specify in the regex the exact format of the actual HTML code that wraps each book on Apress’s forthcoming books page. In addition, the regex contains parentheses to wrap up the parts of any match that contain the book’s URL and title.

Now that you have created the regex itself, you can apply your string against it and grab the matches. Time for some more code:

Private Sub bookCheckButton_Click(ByVal sender As System.Object, _
        ByVal e As System.EventArgs) Handles bookCheckButton.Click
    Dim newBooksPage As String = _
        GetWebPage("http://www.apress.com/book/forthcoming.html")

    ' Use a regex here to find all the book info
    Dim newBooksRegEx As Regex = _
        New Regex( _
            "<a href=""(/book/bookDisplay\.html\?bID=[0-9]+)"">([^<]+)</a>", _
            RegexOptions.Singleline)

    Dim books As MatchCollection = newBooksRegEx.Matches(newBooksPage)
    For Each bookMatch As Match In books
        bookList.Items.Add( _
            New Book(bookMatch.Groups(2).Value, _
                     bookMatch.Groups(1).Value))
    Next
End Sub

Calling Matches() on the Regex object and passing in a string (in this case the web page you downloaded) returns a MatchCollection object, which the code stores in books. You can iterate through this, pulling up each match that was found. The parentheses in the regex give match groups that you can use in the loop itself to extract the book’s URL (group 1) and title (group 2), and you can use those values to build an instance of your very own Book class. This gets added to the list box.
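To see how the match groups line up with the parentheses, here is a minimal console sketch that runs the same regex over a hand-written HTML fragment instead of the live page. The sample markup, the book titles, and the names RegexGroupsDemo and ExtractBooks are made up for illustration; the markup on the real page may well differ.

```vb
Imports System
Imports System.Text.RegularExpressions

Module RegexGroupsDemo

    ' Runs the chapter's regex over the given HTML and returns "title -> url" entries.
    Function ExtractBooks(ByVal html As String) As String()
        Dim newBooksRegEx As New Regex( _
            "<a href=""(/book/bookDisplay\.html\?bID=[0-9]+)"">([^<]+)</a>", _
            RegexOptions.Singleline)

        Dim books As MatchCollection = newBooksRegEx.Matches(html)
        Dim results(books.Count - 1) As String
        For i As Integer = 0 To books.Count - 1
            ' Groups(0) is the whole match; Groups(1) is the URL, Groups(2) the title.
            results(i) = books(i).Groups(2).Value & " -> " & books(i).Groups(1).Value
        Next
        Return results
    End Function

    Sub Main()
        ' Hypothetical sample markup, not copied from the real Apress page.
        Dim sample As String = _
            "<li><a href=""/book/bookDisplay.html?bID=101"">First Sample Book</a></li>" & _
            "<li><a href=""/about.html"">About us</a></li>" & _
            "<li><a href=""/book/bookDisplay.html?bID=202"">Second Sample Book</a></li>"

        For Each entry As String In ExtractBooks(sample)
            Console.WriteLine(entry)
        Next
    End Sub

End Module
```

Only the two links whose href follows the bookDisplay.html?bID= pattern come back; the "About us" link is skipped because its URL does not fit the first group.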

You’re nearly done. The last piece of the puzzle is to use this information to pop open a browser when the user double-clicks on a book in the list. Head back to the visual form designer and select the ListBox control. Now, double-click the DoubleClick event in the event browser to drop into the list’s double-click code, and start typing:

Private Sub bookList_DoubleClick(ByVal sender As System.Object, _
        ByVal e As System.EventArgs) Handles bookList.DoubleClick
    If bookList.SelectedIndex <> -1 Then
        Dim selectedBook As Book = CType(bookList.SelectedItem, Book)
        ' BookURL already starts with "/", so the base URL has no trailing slash
        browser.Navigate("http://www.apress.com" & _
            selectedBook.BookURL, True)
    End If
End Sub


AN ALTERNATE APPROACH TO DOWNLOADS

Just like files on your hard disk, Internet resources can be treated as streams. If you wanted more control over the download process, you could instead call OpenRead() on the web client. This would return a Stream object, which you could attach a StreamReader to, just as if you were reading data from a file. From that point on, reading data from the remote server really does work the same as if you were reading data from a file on disk; just call ReadToEnd() on the StreamReader object and the net result would be a string representing the entire web page.

Given how simple DownloadString() is to use, you may be wondering why on earth anyone would want to use streams to download in the first place. The answer is that streams let you write a program independent of where the data comes from. Your app could just as well load and display a text file from disk as it could download a web page over the Internet. The actual code would be the same; only the method of obtaining a stream to read in the first place would change. Internet Explorer works this way, for example. You can click Open on Internet Explorer’s File menu and enter either a filename or a web page to open. Presumably, behind the scenes, streams are used to treat both types of resources exactly the same way.
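The sidebar's point can be sketched in a few lines: a reading routine that accepts any Stream neither knows nor cares where the data comes from. In this sketch an in-memory stream stands in for the file or the download so that it runs without a network connection; the names StreamSourceDemo and ReadAllText and the file name page.html are assumptions for illustration only.

```vb
Imports System
Imports System.IO
Imports System.Net
Imports System.Text

Module StreamSourceDemo

    ' Reads everything from any stream as text, whatever the stream's origin.
    Function ReadAllText(ByVal source As Stream) As String
        Using reader As New StreamReader(source)
            Return reader.ReadToEnd()
        End Using
    End Function

    Sub Main()
        ' An in-memory stream stands in for a file or a download here.
        Dim bytes() As Byte = Encoding.UTF8.GetBytes("<html>hello</html>")
        Console.WriteLine(ReadAllText(New MemoryStream(bytes)))

        ' The same function would accept either of these (not run here):
        '   ReadAllText(File.OpenRead("page.html"))
        '   ReadAllText(New WebClient().OpenRead("http://www.apress.com/"))
    End Sub

End Module
```

Only the line that obtains the stream changes between the disk case and the web case; ReadAllText() stays the same.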

A WORD ON REGEXES (AND SPIDERS)

The solution that we’re developing here is the genesis of something known as a spider. Spiders are small programs, usually without any user interface of note, that can go out on the Web and repurpose the information they find. In effect, spiders turn the whole World Wide Web into their user’s own personal database on anything and everything. Spiders can do anything from grabbing stock quote information, to checking TV listing guides, to monitoring auctions on eBay. The sky is the limit with them, because their capabilities are constrained only by the amount of work that their developers are willing to put into them.

Spiders tend to use regexes, or regular expressions, to do most of their work because regexes are incredibly powerful tools when it comes to extracting tidbits of information from the mass of data that is the modern web page. However, regexes look hideous, and a complete discussion of how to use them, I’m afraid, deserves an entire book of its own and just doesn’t fit into the scope of this one.

The basic format of a regex is simple. A regex is a string that contains characters that you want to match and special meta characters that tell the regex engine what to do. For example, parentheses are used to form groups of characters (something we use in the “Try It Out” to identify the URL of a book and its name). Square brackets are used to identify specific characters or character ranges, the most commonly used being [0-9] to mean numbers, and [a-z] meaning all lowercase letters of the alphabet. The symbols for plus sign, period, asterisk (+ . *) and question mark (?) also have special meanings and so must be preceded with a backslash (\) if you are interested in looking for those characters, rather than the regex engine interpreting them as special regex commands.
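The escaping rule is easy to try out for yourself. The following sketch, with made-up names (RegexEscapeDemo, ExtractId) and a made-up sample string, shows a hand-escaped pattern, the Regex.Escape() helper that adds the backslashes for you, and what happens when a period is left unescaped.

```vb
Imports System
Imports System.Text.RegularExpressions

Module RegexEscapeDemo

    ' Pulls the digits after "bID=" out of a book URL, or returns "" if there is no match.
    Function ExtractId(ByVal url As String) As String
        ' \. and \? match a literal period and question mark.
        Dim byHand As New Regex("bookDisplay\.html\?bID=([0-9]+)")
        Dim found As Match = byHand.Match(url)
        If found.Success Then
            Return found.Groups(1).Value
        End If
        Return ""
    End Function

    Sub Main()
        Console.WriteLine(ExtractId("bookDisplay.html?bID=42"))

        ' Regex.Escape() inserts the backslashes for the special characters.
        Console.WriteLine(Regex.Escape("bookDisplay.html?bID="))

        ' Unescaped, the period matches any character, so this prints True.
        Console.WriteLine(Regex.IsMatch("bookDisplayXhtml", "bookDisplay.html"))
    End Sub

End Module
```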

For more information on all this and much more, search the online help for regex. If you plan on doing a lot of Internet spider-type work, then I’d heartily recommend picking up Jeffrey Friedl’s Mastering Regular Expressions book (O’Reilly Media, 2002).


The DoubleClick event fires on the list box whenever the user double-clicks it, whether it contains items or not. The first line of code then needs to make sure that the user did actually click an item, by checking that the ListBox’s SelectedIndex property is not -1 (meaning nothing selected).

Providing that’s all fine, the book is grabbed from the list, using the list’s SelectedItem property. Finally, the magic. Remember that you set the WebBrowser control to not be visible. The reason for this is that you aren’t going to display anything in it. The WebBrowser control’s Navigate() method can take two parameters. The first parameter is the URL to go to, and the second is a True or False indicator letting the control know whether it should open the page in a new browser window. So, in this code you just pass True here and the WebBrowser control opens a separate browser window that shows the user the book page in question.

Run the program and you’ll see a result much like mine in Figure 17-11.

Figure 17-11. Run the application. When you double-click an item in the list, a web browser appears, taking you directly to the page in question.


As you can see, the WebClient can play a vital, incredibly useful role in many Internet applications. It’s so easy to use that you’ll often find that the actual code needed to work with it is minimal compared to everything else you need to make a full application.

Bear in mind that this solution is, of course, dependent on Apress never ever changing their website URL (www.apress.com). In a true production-grade application, you’d probably drag the base URL out into a configuration file, but the code was written to get a point across more than to be used as a valuable tool for years to come.
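As a rough sketch of what pulling the base URL out might look like, the following keeps it in a single place and lets the Uri class join it to the relative path that the regex captured. The names BookUrlDemo, BaseUrl, and BuildBookUrl are hypothetical, and a constant stands in for the value that a production application would read from its configuration file.

```vb
Imports System

Module BookUrlDemo

    ' Assumption: a real application would load this from a configuration file.
    Private Const BaseUrl As String = "http://www.apress.com/"

    ' Combines the site's base URL with a relative path scraped from the page.
    Function BuildBookUrl(ByVal relativePath As String) As String
        Dim combined As New Uri(New Uri(BaseUrl), relativePath)
        Return combined.AbsoluteUri
    End Function

    Sub Main()
        Console.WriteLine(BuildBookUrl("/book/bookDisplay.html?bID=101"))
    End Sub

End Module
```

Letting Uri do the joining also avoids having to think about whether the base ends in a slash or the path starts with one.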

Handling Other Types of Data with WebClient

In the preceding “Try It Out,” you saw a great example of how to download a web page into a string. A web page is, after all, just text, and strings are great for storing that. WebClient, though, can handle any kind of data, and can both upload it and download it. Table 17-3 lists the methods of WebClient.

Table 17-3. Methods of System.Net.WebClient

Method             Description
-----------------  ----------------------------------------------------------------
OpenRead()         Opens a stream for reading, just like opening a file stream
DownloadData()     Returns data from the remote web server as a byte array
DownloadFile()     Saves data from the remote web server as a file on your computer
DownloadString()   Downloads the file as a string from the remote server
OpenWrite()        Opens a connection to a remote server you want to send data to
UploadString()     Sends a string to the remote server
UploadData()       Takes a byte array to upload to the server
UploadFile()       Uploads a local file to the remote server

There are methods I haven’t listed here. For every Upload and Download method, there is an alternate Async method that can be used to run on a separate thread. For example, DownloadDataAsync(), DownloadFileAsync(), UploadFileAsync(), and so on. The WebClient class raises a number of events to let you check on the status of these Async calls, and also handle the end result. For each Async call, there is a corresponding Completed event. For example, when a call to DownloadDataAsync() completes, the WebClient object will fire a DownloadDataCompleted event.


In the case of DownloadDataCompleted and DownloadStringCompleted, the actual data returned from the call is passed into the event handler. For example, here’s an event handler for DownloadStringCompleted:

Public Sub OnDownloadStringCompleted(ByVal sender As Object, _
        ByVal e As DownloadStringCompletedEventArgs)
    Console.WriteLine(e.Result)
End Sub

As you can see, the actual string downloaded is held in the Result property of the DownloadStringCompletedEventArgs.

The DownloadDataCompleted event works in a similar way. The only difference is that the Result property of DownloadDataCompletedEventArgs is declared as a byte array rather than a string, so you can assign it straight to a Byte() variable without a cast:

Public Sub OnDownloadDataCompleted(ByVal sender As Object, _
        ByVal e As DownloadDataCompletedEventArgs)
    Dim data() As Byte = e.Result
End Sub

What about the DownloadFileCompleted event? Both DownloadDataAsync() and DownloadStringAsync() need to return data, but DownloadFileAsync() just stores a file directly to your computer’s hard disk, so what does the event give us? Actually, it gives you pretty much the same as the other two completed events. Both DownloadDataCompletedEventArgs and DownloadStringCompletedEventArgs are subclasses of AsyncCompletedEventArgs, which is the type of object passed to DownloadFileCompleted events. There are two important properties in this class that you should check in any kind of Async-completed event handler: Cancelled and Error.

Cancelled is a Boolean (true/false) value that you need to check to see whether the operation was cancelled. WebClient itself has a CancelAsync() method that can get called to cancel a download operation. When it’s called, the download is cancelled, and the completed event is fired with Cancelled set to True.

Error is actually an Exception object. So if Error is not Nothing, you can grab it and treat it just like a standard exception, raising it for other code to catch if you want to.
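Here is one way those two checks might look in practice. Because every Completed event hands over an object derived from AsyncCompletedEventArgs, a single helper can cover them all. The names CompletedCheckDemo and DescribeOutcome are hypothetical, and the event args are built by hand here so that the sketch runs without a network connection.

```vb
Imports System
Imports System.ComponentModel

Module CompletedCheckDemo

    ' Works for any of the Completed events, because their event args
    ' all derive from AsyncCompletedEventArgs.
    Function DescribeOutcome(ByVal e As AsyncCompletedEventArgs) As String
        If e.Cancelled Then
            Return "Cancelled"
        ElseIf e.Error IsNot Nothing Then
            Return "Failed: " & e.Error.Message
        Else
            Return "Completed"
        End If
    End Function

    Sub Main()
        ' Hand-built event args stand in for what WebClient would pass to a handler.
        Console.WriteLine(DescribeOutcome(New AsyncCompletedEventArgs(Nothing, True, Nothing)))
        Console.WriteLine(DescribeOutcome( _
            New AsyncCompletedEventArgs(New Exception("timed out"), False, Nothing)))
        Console.WriteLine(DescribeOutcome(New AsyncCompletedEventArgs(Nothing, False, Nothing)))
    End Sub

End Module
```

In a DownloadStringCompleted or DownloadDataCompleted handler it is worth making these checks before touching e.Result, since reading Result after a cancelled or failed operation raises an exception instead of returning data.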

WebClient doesn’t just fire off events when operations complete. It will also fire events to let you track the progress of an upload or download: DownloadProgressChanged and UploadProgressChanged.

Both these events pass their handlers an object derived from ProgressChangedEventArgs (DownloadProgressChangedEventArgs and UploadProgressChangedEventArgs, respectively), which has a ProgressPercentage property attached. You can use this value directly to update a progress bar, or even just display a percentage complete value to the user.
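A minimal sketch of using that value follows. The helper takes the ProgressChangedEventArgs base class, so the object handed to either progress event can be passed straight in. The names ProgressDemo and DescribeProgress are made up, and the event args are constructed by hand so that the code runs without starting a download.

```vb
Imports System
Imports System.ComponentModel

Module ProgressDemo

    ' Turns the percentage carried by a progress event into text for the user.
    Function DescribeProgress(ByVal e As ProgressChangedEventArgs) As String
        Return e.ProgressPercentage.ToString() & "% complete"
    End Function

    Sub Main()
        ' A hand-built event args object stands in for one raised by WebClient.
        Console.WriteLine(DescribeProgress(New ProgressChangedEventArgs(40, Nothing)))

        ' In a form, the same value could drive a ProgressBar control
        ' (the control name is hypothetical):
        '   downloadProgressBar.Value = e.ProgressPercentage
    End Sub

End Module
```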

Let’s move away from the theory and put all this into practice, extending our previous “Try It Out” to work with asynchronous operations.


WHY ARE SO MANY CHANGES NECESSARY?

It’s my fault—I admit it. If you take a look at the code we just developed, you’ll notice that pretty much everything hangs off our button’s Click event. The button’s click event handler includes most of the code to download the page, then extract data from it, and then build the list box. That’s three distinct operations! It’s good practice to have your code structured so that it consists of methods that do just one thing and do it very well.

For example, we really should have a download method (got that one), a method to extract data from the page, and then another method to populate the list box. That’s the refactoring that we are going to do right now. So, by moving to an asynchronous download model, we’re not only improving the usability of the program for the user, but also improving the structure of our code.

Try It Out: Asynchronous WebClient Operations

Load up the NewBooks project you just worked on. We’re going to have to refactor the code in the project quite a bit to work asynchronously.

The first thing you need to do is move all the code out of the Click event. When the button is clicked, you want just one thing to happen: the download.

Add in two new methods—GetBookDetailsFromWebPage() and AddBooksToListBox()—by using the code that was in the Click event. The new code looks like this:

Private Sub bookCheckButton_Click(ByVal sender As System.Object, _
        ByVal e As System.EventArgs) Handles bookCheckButton.Click
    GetWebPage("http://www.apress.com/book/forthcoming.html")
End Sub

Private Function GetBookDetailsFromWebPage(ByVal webPage As String) _
        As MatchCollection
    ' Use a regex here to find all the book info
    Dim newBooksRegEx As Regex = _
        New Regex( _
            "<a href=""(/book/bookDisplay\.html\?bID=[0-9]+)"">([^<]+)</a>", _
            RegexOptions.Singleline)
    Return newBooksRegEx.Matches(webPage)
End Function

Private Sub AddBooksToListBox(ByVal books As MatchCollection)
    For Each bookMatch As Match In books
        bookList.Items.Add( _
            New Book(bookMatch.Groups(2).Value, _
                     bookMatch.Groups(1).Value))
    Next
End Sub

Pay special attention to the method signatures here. GetBookDetailsFromWebPage() takes a string containing the web page source as a parameter. It runs the regex you had previously over that string and returns the MatchCollection.

The MatchCollection is then used as a parameter on the AddBooksToListBox() method, iterating over each item in the collection to build up your list box.

The eagle-eyed may also have noticed that we’re no longer expecting a return value from our GetWebPage() method. The reason is that that method is no longer responsible for grabbing the web page source—it simply starts the process. You’ll rewrite the entire GetWebPage() method from scratch so you can see how it works. First clear out all the code and change the function type to a subroutine, like this:

Private Sub GetWebPage(ByVal url As String)
    Dim web As New WebClient()
End Sub

You’ll still need to create an instance of WebClient here, so I’ve left that line of code in. Because you’ll be downloading the web page asynchronously, the first thing you need to do here is hook up the event handlers.

Add the highlighted code shown here:

Private Sub GetWebPage(ByVal url As String)
    Dim web As New WebClient()
    AddHandler web.DownloadStringCompleted, AddressOf DownloadComplete
    AddHandler web.DownloadProgressChanged, AddressOf ProgressChanged
End Sub


Next, you’ll need two new methods to handle those events:

Private Sub GetWebPage(ByVal url As String)
    Dim web As New WebClient()
    AddHandler web.DownloadStringCompleted, AddressOf DownloadComplete
    AddHandler web.DownloadProgressChanged, AddressOf ProgressChanged
End Sub

Private Sub DownloadComplete(ByVal sender As Object, _
        ByVal e As DownloadStringCompletedEventArgs)
End Sub

Private Sub ProgressChanged(ByVal sender As Object, _
        ByVal e As DownloadProgressChangedEventArgs)
End Sub

Now, the final thing your GetWebPage() method needs to do is clear out anything already in the list box and start the download:

Private Sub GetWebPage(ByVal url As String)
    Dim web As New WebClient()
    AddHandler web.DownloadStringCompleted, AddressOf DownloadComplete
    AddHandler web.DownloadProgressChanged, AddressOf ProgressChanged

    bookList.Items.Clear()
    web.DownloadStringAsync(New Uri(url))
End Sub

Now here’s a strange thing. When you call DownloadString(), you can just pass in the URL. However, DownloadStringAsync (as with all the other Async methods) expects a special object known as a Uri. You can create this on the fly, passing in your URL to its constructor.
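If the URL might come from somewhere less trustworthy than a hard-coded string, it can be worth checking it before handing it to the Async call, because New Uri() throws a UriFormatException when the text is not a valid URL. The sketch below uses Uri.TryCreate() for that; the names UriDemo and ParseUrl are assumptions for illustration, not part of the NewBooks project.

```vb
Imports System

Module UriDemo

    ' Returns the parsed Uri, or Nothing when the text is not an absolute URL.
    Function ParseUrl(ByVal url As String) As Uri
        Dim result As Uri = Nothing
        If Uri.TryCreate(url, UriKind.Absolute, result) Then
            Return result
        End If
        Return Nothing
    End Function

    Sub Main()
        Dim page As Uri = ParseUrl("http://www.apress.com/book/forthcoming.html")
        Console.WriteLine(page.Host)
        Console.WriteLine(page.AbsolutePath)

        ' A relative path is not an absolute URL, so nothing comes back.
        Console.WriteLine(ParseUrl("book/forthcoming.html") Is Nothing)
    End Sub

End Module
```

Once you have a Uri object, properties such as Host and AbsolutePath give you the individual pieces of the address without any string slicing.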

You’re finished with the GetWebPage() method. In fact, you’re pretty much finished with all the big changes now. All you need to complete the app is to put some meaningful code into the two new event handlers. First, let’s tackle ProgressChanged(). All you want to do here is copy your current download progress value into