(ebook) Visual Studio .NET Mastering Visual Basic.pdf

THE HASHTABLE COLLECTION 497

Iterating an ArrayList

To iterate through the elements of an ArrayList collection, you can set up a For…Next loop like the following one:

For i = 0 To ArrayList.Count - 1
    ' process item ArrayList(i)
Next

This is a trivial operation, but the processing itself can get as complicated as the type of objects stored in the collection requires. The current item at each iteration is ArrayList(i). If you don’t know its exact type, assign it to an Object variable and then process it.

You could also use the For Each…Next loop with an Object variable, as shown next:

Dim itm As Object
For Each itm In ArrayList
    ' process item itm
Next

If all the items in the ArrayList are of the same type, you can use a variable of the same type to iterate through the collection, instead of a generic Object variable. If all the elements are Decimals, for example, you can declare the itm variable as Decimal.
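The two loop styles can be sketched as a small console program. This is an illustration only: the module name, the SumPrices() function, and the sample prices are made up for the example, and the inline declaration of the loop variable (For Each price As Decimal) requires a version of Visual Basic later than the original VB.NET release.

```vbnet
Imports System
Imports System.Collections

Module ArrayListIterationDemo
    ' Sums the items of an ArrayList that is assumed to hold only Decimal values.
    Function SumPrices(ByVal prices As ArrayList) As Decimal
        Dim total As Decimal = 0D
        ' The loop variable is typed, so each item is converted to Decimal as it is read.
        For Each price As Decimal In prices
            total += price
        Next
        Return total
    End Function

    Sub Main()
        Dim prices As New ArrayList()
        prices.Add(19.95D)
        prices.Add(5.5D)
        prices.Add(100D)

        ' Index-based loop: each item comes back as Object.
        For i As Integer = 0 To prices.Count - 1
            Console.WriteLine("Item " & i.ToString() & ": " & prices(i).ToString())
        Next

        Console.WriteLine("Total: " & SumPrices(prices).ToString())
    End Sub
End Module
```

If the collection contains an item that can't be converted to Decimal, the typed loop fails at runtime with an InvalidCastException, which is why the Object variable remains the safer choice for mixed collections.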

An even better method is to create an enumerator for the collection and use it to iterate through its items. This technique applies to all collections and is discussed in the section “Enumerating Collections,” later in this chapter.

The ArrayList class addresses most of the problems associated with the Array class, but one last problem remains—that of accessing the items in the collection through a meaningful key. This is the problem addressed by the HashTable collection.

The HashTable Collection

The ArrayList is a more convenient form of an array. It’s dynamic, it allows you to insert items anywhere and remove items from the collection with a single method call, and it supports all the convenient features of an array, like sorting and searching.

Yet both the array and the ArrayList have a drawback: you must access their elements by an index. Another collection, the HashTable collection, is similar to the ArrayList, but it allows you to access the items by a key. Each item has a value and a key. The value is the same value you would store in an array, but the key is a meaningful entity for accessing the items in the collection.

The HashTable exposes most of the properties and methods of the ArrayList, with a few notable exceptions. The Count property returns the number of items in the collection as usual, but the HashTable collection doesn’t expose a Capacity property. The HashTable collection uses fairly complicated logic to maintain the list of items, and it adjusts its capacity automatically. Fortunately, you need not know how the items are stored in the collection. In short, it automatically computes a hash code from the key of each item and uses this code to locate the item. It’s possible that two different keys will produce the same hash code (not very likely, but the possibility is not zero). The HashTable class handles these collisions for you, so you need not be concerned with the details. The Framework provides all these classes so that you won’t have to write low-level code.

Copyright ©2002 SYBEX, Inc., Alameda, CA

www.sybex.com

498 Chapter 11 STORING DATA IN COLLECTIONS

To use a HashTable in your code, you need not import any class. Just declare a HashTable variable with the following statement:

Dim hTable As New HashTable

To add an item to the HashTable, use the Add method, whose syntax is

hTable.Add(key, value)

value is the item you want to add (it can be any object), and key is a value you supply, which represents the item. This is the value you’ll use later to retrieve the item. If you’re setting up a structure for storing temperatures in various cities, use the city names as keys:

Dim Temperatures As New HashTable
Temperatures.Add("Houston", 81)
Temperatures.Add("Los Angeles", 78)

Notice that you can have duplicate values, but the keys must be unique. If you attempt to add an item with an existing key, an ArgumentException will be thrown. To find out whether a specific value or key is already in the collection, use the ContainsKey and ContainsValue methods. The syntax of the two methods is quite similar:

hTable.ContainsKey(object)

hTable.ContainsValue(object)

The HashTable collection exposes the Contains method too, which is identical to the ContainsKey method.

To find out whether a specific key is in use already, use the ContainsKey method, as shown in the following statements, which add a new item to the HashTable only if its key doesn’t exist already:

Dim value As New Rectangle(100, 100, 50, 50)
Dim key As String = "object1"
If Not hTable.ContainsKey(key) Then
    hTable.Add(key, value)
End If
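The following console sketch pulls together the rules for keys and values. The module name, the TryAddItem() helper, and the extra cities and temperatures are assumptions made for the example; only Add, ContainsKey, and the default Item property come from the Hashtable class itself.

```vbnet
Imports System
Imports System.Collections

Module HashtableKeysDemo
    ' Adds the pair only if the key is not in use; returns True when the item was added.
    Function TryAddItem(ByVal table As Hashtable, ByVal key As Object, ByVal value As Object) As Boolean
        If table.ContainsKey(key) Then Return False
        table.Add(key, value)
        Return True
    End Function

    Sub Main()
        Dim temperatures As New Hashtable()
        temperatures.Add("Houston", 81)
        temperatures.Add("Los Angeles", 78)
        temperatures.Add("Miami", 81)          ' duplicate values are allowed

        Try
            temperatures.Add("Houston", 85)    ' duplicate key: Add throws
        Catch ex As ArgumentException
            Console.WriteLine("Duplicate key rejected: " & ex.Message)
        End Try

        ' Checking first avoids the exception altogether.
        Console.WriteLine(TryAddItem(temperatures, "Houston", 85))
        Console.WriteLine(TryAddItem(temperatures, "Seattle", 62))

        ' Items are stored as Object, so convert them when you read them back by key.
        Dim houston As Integer = CType(temperatures("Houston"), Integer)
        Console.WriteLine("Houston: " & houston.ToString())
    End Sub
End Module
```

Testing with ContainsKey costs one extra lookup per insertion, but it is usually preferable to catching the exception when duplicate keys are expected.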

The Values and Keys properties allow you to retrieve all the values and the keys in the HashTable. Both properties are collections and expose the usual members of a collection. To iterate through the values stored in the HashTable hTable, use the following loop:

Dim itm As Object

For Each itm In hTable.Values

Console.WriteLine(itm)

Next

To remove an individual item from a HashTable, use the Remove method, which accepts as argument the key of the item to be removed (the Clear method removes all the items at once):

hTable.Remove(key)

To extract items from a HashTable, use the CopyTo method. This method copies the items to a one-dimensional array, starting at the specified index of the array, and its syntax is

hTable.CopyTo(arrayName, startIndex)

You must set up the array that will accept the items beforehand, because this method can throw several different exceptions for various error conditions. The array must be one-dimensional, and there should be enough space in it, from startIndex to its end, for all the items of the HashTable. Each item is copied as a DictionaryEntry structure, which exposes the item’s Key and Value properties, so the array’s type must be DictionaryEntry or Object. To copy only the values (or only the keys), call the CopyTo method of the Values (or Keys) collection instead.
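As a rough sketch of how the target array can be prepared, the following program copies a HashTable to an array in two ways. It relies on the Framework's Hashtable.CopyTo(array, index) method, which copies each item as a DictionaryEntry; the module name and the CopyEntries() and CopyValues() helpers are hypothetical names introduced for the example.

```vbnet
Imports System
Imports System.Collections

Module HashtableCopyDemo
    ' Copies all key/value pairs to a new DictionaryEntry array.
    Function CopyEntries(ByVal table As Hashtable) As DictionaryEntry()
        ' Arrays are declared by upper bound, so this array holds exactly Count items.
        Dim entries(table.Count - 1) As DictionaryEntry
        table.CopyTo(entries, 0)
        Return entries
    End Function

    ' Copies only the values to a new Object array.
    Function CopyValues(ByVal table As Hashtable) As Object()
        Dim values(table.Count - 1) As Object
        table.Values.CopyTo(values, 0)
        Return values
    End Function

    Sub Main()
        Dim temperatures As New Hashtable()
        temperatures.Add("Houston", 81)
        temperatures.Add("Los Angeles", 78)

        For Each entry As DictionaryEntry In CopyEntries(temperatures)
            Console.WriteLine(entry.Key.ToString() & " = " & entry.Value.ToString())
        Next
        Console.WriteLine(CopyValues(temperatures).Length)
    End Sub
End Module
```

The order of the copied items is not the order in which they were added; a HashTable makes no promise about ordering.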

Listing 11.7 demonstrates how to scan the keys of a HashTable through the Keys property and then use these keys to access the items through the Item property, passing each key as an argument.

Listing 11.7: Iterating a HashTable

Private Function ShowHashTableContents(ByVal table As Hashtable) As String
    Dim msg As String
    Dim element, key As Object
    msg = "The HashTable contains " & table.Count.ToString & " elements:" & vbCrLf
    ' Scan the keys and use each one to retrieve the matching item
    For Each key In table.Keys
        element = table.Item(key)
        msg = msg & vbCrLf
        msg = msg & " Element Type = " & element.GetType.ToString & vbCrLf
        msg = msg & " Element Key= " & key.ToString
        msg = msg & " Element Value= " & element.ToString & vbCrLf
    Next
    Return msg
End Function

To print the contents of a HashTable variable on the Output window, call the ShowHashTableContents() function, passing the name of the HashTable as argument, and then print the string returned by the function:

Dim HT As New HashTable
' statements to populate HashTable
Console.WriteLine(ShowHashTableContents(HT))

VB.NET at Work: The WordFrequencies Project

In this section, you’ll develop an application that counts word frequencies in a text. The WordFrequencies application scans text files and counts the occurrences of each word in the text. As you will see, the HashTable is the natural choice for storing this information, because you want to access a word’s frequency by the word. To retrieve (or update) the frequency of the word elaborate, for example, you will use the expression:

WordFrequencies("ELABORATE")

Arrays and ArrayLists are out of the question, because they can’t be accessed by a key. You could also use the SortedList collection, which is described later in this chapter, but this collection maintains its items sorted at all times. If you need this functionality as well, you can modify the application accordingly. The items in a SortedList are also accessed by keys, so you won’t have to introduce substantial changes in the code.
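To give an idea of how little the code would change, here is a minimal sketch of the same counting logic built on a SortedList. The module name, the BuildCounts() function, and the sample words are assumptions for the example and are not part of the WordFrequencies project.

```vbnet
Imports System
Imports System.Collections
Imports Microsoft.VisualBasic

Module SortedListDemo
    ' Builds a SortedList of word counts; the word is the key, as in a HashTable.
    Function BuildCounts(ByVal words() As String) As SortedList
        Dim counts As New SortedList()
        For Each word As String In words
            If counts.ContainsKey(word) Then
                counts(word) = CType(counts(word), Integer) + 1
            Else
                counts.Add(word, 1)
            End If
        Next
        Return counts
    End Function

    Sub Main()
        Dim counts As SortedList = BuildCounts(New String() {"PEAR", "APPLE", "PEAR", "FIG"})
        ' The entries come back ordered by key, regardless of the insertion order.
        For Each entry As DictionaryEntry In counts
            Console.WriteLine(entry.Key.ToString() & vbTab & entry.Value.ToString())
        Next
    End Sub
End Module
```

Keeping the items sorted has a cost on every insertion, so the SortedList pays off only when the sorted order is actually needed.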


Let me start with a few remarks. First, all words we locate in the various text files will be converted to uppercase. Because the keys of the HashTable are case-sensitive, converting every word to uppercase ensures that each word maps to a single key. This way, we don’t risk counting the same word in different cases as two or more different words.

The frequencies of the words can’t be calculated instantly, because we need to know the total number of words in the text. Instead, each value in the HashTable is the number of occurrences of a specific word. To calculate the actual frequency of a word, you must divide this value by the number of occurrences of all words, but this can happen only after we have scanned the entire text file and counted the occurrences of each word. Since this operation would introduce delays in the application, I’ve decided to keep track of the number of occurrences only and calculate the word frequencies when requested.
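The division described here can be sketched as follows. The ComputeFrequency() function and the sample counts are hypothetical; the sketch assumes that every value in the HashTable is an Integer count.

```vbnet
Imports System
Imports System.Collections

Module FrequencyDemo
    ' Returns the frequency of a word: its count divided by the count of all words.
    Function ComputeFrequency(ByVal counts As Hashtable, ByVal word As String) As Double
        Dim total As Integer = 0
        For Each count As Integer In counts.Values
            total += count
        Next
        ' Guard against an empty table or an unknown word.
        If total = 0 OrElse Not counts.ContainsKey(word) Then Return 0.0
        ' The / operator always performs floating-point division in Visual Basic.
        Return CType(counts(word), Integer) / total
    End Function

    Sub Main()
        Dim counts As New Hashtable()
        counts.Add("THE", 3)
        counts.Add("CAT", 1)
        Console.WriteLine(ComputeFrequency(counts, "THE"))
    End Sub
End Module
```

Because the total changes with every word that is read, the frequencies are worth computing only once the whole text has been scanned.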

When the code runs into another instance of the word elaborate, it simply increases the matching item of the HashTable by one:

WordFrequencies("ELABORATE") = CType(WordFrequencies("ELABORATE"), Integer) + 1

The application’s interface is shown in Figure 11.3. To scan another text file and process its words, click the Read Text File button. You’ll be prompted to select the name of the file to be processed with an Open dialog box. Then, you can click the Show Word Count button to count the number of occurrences of each word in the text. The last button on the form sorts the words according to their count.

Figure 11.3: The WordFrequencies project demonstrates how to use the HashTable collection.

The application maintains a single HashTable collection, the WordFrequencies collection, and it updates this collection rather than counting word occurrences from scratch. The Frequency Table menu contains the commands to save the collection’s items to a disk file and read the same data from the file. Use one of the Save commands to save the HashTable to a disk file, and use the equivalent Load command to read the data from the disk file into the HashTable. The commands in this menu can store the data either to a text file (Save SOAP/Load SOAP commands) or to a binary file (Save Binary/Load Binary). Use these commands to store the data generated in a single session, load the data in a later session, and process more files. These commands will be discussed in detail at the end of the chapter, where we’ll explore the Serialization class. For now, you can use the commands to continue processing text files in multiple sessions.

The WordFrequencies application uses techniques and classes we haven’t discussed yet. The topic of reading from (or writing to) files is discussed in the following chapter. You don’t really have to understand the code that opens a text file and reads its lines; just focus on the segments that manipulate the HashTable. To test the project, I used some very large files I downloaded from the Project Gutenberg Web site (http://promo.net/pg/). This site contains entire books in electronic format (plain text files), and you can borrow some files to test any program that manipulates text (in addition to reading them, of course).

The code reads the text into a string variable, the txtLine variable. Then, it calls the Split method of the String class to split the text into individual words. The Split method uses the space, comma, period, single quote, exclamation mark, colon, semicolon, and newline characters as delimiters. The individual words are stored in the Words array. The program goes through each word in the array and determines whether it’s a valid word by calling the IsValidWord() function. This function returns False if one of the characters in the word is not a letter; strings like “B2B” or “U2” are not considered proper words. IsValidWord() is a custom function, and you can edit it as you wish.
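The listing of IsValidWord() is not reproduced in this section, so the following is only one possible version of it, based on the description above, together with a hypothetical ExtractWords() helper that applies the same delimiters. This sketch also rejects empty strings, which the Split method produces whenever two delimiters appear next to each other.

```vbnet
Imports System
Imports System.Collections

Module SplitWordsDemo
    ' One possible IsValidWord(): True only if every character is a letter.
    Function IsValidWord(ByVal word As String) As Boolean
        If word.Length = 0 Then Return False
        For Each ch As Char In word
            If Not Char.IsLetter(ch) Then Return False
        Next
        Return True
    End Function

    ' Splits the text at the delimiters and keeps the valid words, in uppercase.
    Function ExtractWords(ByVal text As String) As String()
        Dim delimiters() As Char = {" "c, "."c, ","c, "'"c, "!"c, ";"c, ":"c, _
                                    ChrW(10), ChrW(13)}
        Dim result As New ArrayList()
        For Each candidate As String In text.Split(delimiters)
            Dim word As String = candidate.ToUpper()
            If IsValidWord(word) Then result.Add(word)
        Next
        Return CType(result.ToArray(GetType(String)), String())
    End Function

    Sub Main()
        For Each word As String In ExtractWords("The cat. The U2!")
            Console.WriteLine(word)
        Next
    End Sub
End Module
```

Splitting on the single quote breaks contractions such as "doesn't" into two fragments; whether that is acceptable depends on the texts you plan to process.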

Any valid word becomes a key to the WordFrequencies HashTable. The corresponding value is the number of occurrences of the specific word in the text. If a key (a new word) is added to the table, its value is set to 1. If the key exists already, then its value is increased by 1, with the following If statement:

If Not WordFrequencies.ContainsKey(word) Then

WordFrequencies.Add(word, 1)

Else

WordFrequencies(word) = CType(WordFrequencies(word), Integer) + 1

End If
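The same If statement can be tried outside the form with a short console program. The CountWords() function and the sample words are made up for the example; the body of the loop follows the statement shown above.

```vbnet
Imports System
Imports System.Collections

Module CountWordsDemo
    ' Returns a HashTable that maps each word (in uppercase) to its number of occurrences.
    Function CountWords(ByVal words() As String) As Hashtable
        Dim frequencies As New Hashtable()
        For Each rawWord As String In words
            Dim word As String = rawWord.ToUpper()
            If Not frequencies.ContainsKey(word) Then
                frequencies.Add(word, 1)
            Else
                ' The stored value is an Object, so convert it before adding 1.
                frequencies(word) = CType(frequencies(word), Integer) + 1
            End If
        Next
        Return frequencies
    End Function

    Sub Main()
        Dim counts As Hashtable = CountWords(New String() {"the", "cat", "The", "hat"})
        Console.WriteLine(counts("THE"))
        Console.WriteLine(counts.Count)
    End Sub
End Module
```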

The code that reads the text file and splits it into individual words is shown in Listing 11.8. The code prompts the user to select a text file with the Open dialog box and then reads the entire text into a string variable, the txtLine variable, and the individual words are isolated with the Split method of the String class.

Listing 11.8: Splitting a Text File into Words

' Requires Imports System.IO at the top of the file
Private Sub Button1_Click(ByVal sender As System.Object, _
        ByVal e As System.EventArgs) Handles Button1.Click
    OpenFileDialog1.DefaultExt = "TXT"
    OpenFileDialog1.Filter = "Text|*.TXT|All Files|*.*"
    OpenFileDialog1.ShowDialog()
    If OpenFileDialog1.FileName = "" Then Exit Sub
    Dim str As StreamReader
    Dim txtLine As String
    Dim Words() As String
    Dim Delimiters() As Char = {" "c, "."c, ","c, "'"c, "!"c, ";"c, ":"c, _
                                Chr(10), Chr(13)}
    ' Read the entire file into a string, then close the file
    str = File.OpenText(OpenFileDialog1.FileName)
    txtLine = str.ReadToEnd
    str.Close()
    ' Split the text into words and count each valid word
    Words = txtLine.Split(Delimiters)
    Dim iword As Integer, word As String
    For iword = 0 To Words.GetUpperBound(0)
        word = Words(iword).ToUpper
        If IsValidWord(word) Then
            If Not WordFrequencies.ContainsKey(word) Then
                WordFrequencies.Add(word, 1)
            Else
                WordFrequencies(word) = CType(WordFrequencies(word), Integer) + 1
            End If
        End If
    Next
End Sub

This event handler counts the occurrences of each unique word in the file. In a document with 130,000 words, it didn’t take more than a couple of seconds to perform all the calculations. The process of displaying the list of unique words on a TextBox control was very fast too, thanks to the StringBuilder class. The code behind the Show Word Count button (Listing 11.9) displays the list of words along with the number of occurrences of each word in the text.

Listing 11.9: Displaying the Count of Each Word in the Text

Private Sub Button2_Click(ByVal sender As System.Object, _
        ByVal e As System.EventArgs) Handles Button2.Click
    Dim wEnum As IDictionaryEnumerator
    Dim allWords As New System.Text.StringBuilder()
    ' Walk through the key/value pairs of the HashTable
    wEnum = WordFrequencies.GetEnumerator
    While wEnum.MoveNext
        allWords.Append(wEnum.Key.ToString & vbTab & "-->" & vbTab & _
                        wEnum.Value.ToString & vbCrLf)
    End While
    TextBox1.Text = allWords.ToString
End Sub

The last button on the form calculates the frequency of each word in the HashTable, sorts them according to their frequencies, and displays the list; its code is detailed in Listing 11.10.


Listing 11.10: Sorting the Words According to Frequency

Private Sub Button3_Click(ByVal sender As System.Object, _
        ByVal e As System.EventArgs) Handles Button3.Click
    Dim wEnum As IDictionaryEnumerator
    ' Arrays are declared by upper bound, so Count - 1 yields exactly Count elements
    Dim Words(WordFrequencies.Count - 1) As String
    Dim Frequencies(WordFrequencies.Count - 1) As Double
    Dim allWords As New System.Text.StringBuilder()
    Dim i As Integer
    Dim totCount As Double
    wEnum = WordFrequencies.GetEnumerator
    While wEnum.MoveNext
        Words(i) = CType(wEnum.Key, String)
        Frequencies(i) = CType(wEnum.Value, Integer)
        totCount = totCount + Frequencies(i)
        i = i + 1
    End While
    ' Convert the counts to frequencies
    For i = 0 To Words.GetUpperBound(0)
        Frequencies(i) = Frequencies(i) / totCount
    Next
    ' Sort the words by frequency, then list them in descending order
    Array.Sort(Frequencies, Words)
    TextBox1.Clear()
    For i = Words.GetUpperBound(0) To 0 Step -1
        allWords.Append(Words(i) & vbTab & "-->" & vbTab & _
                        Format(100 * Frequencies(i), "#.000") & vbCrLf)
    Next
    TextBox1.Text = allWords.ToString
End Sub

Handling Large Sets of Data

Incidentally, my first attempt was to display the list of unique words on a ListBox control. The process was incredibly slow. The first 10,000 words were added in a few seconds, but as the number of items increased, the time it took to add them to the control increased exponentially (or so it seemed).

Adding thousands of items to a ListBox control is a very slow process. It’s likely that you will run into situations where a seemingly simple task will turn out to be detrimental to your application’s performance. You should try different approaches, but also consider a total overhaul of your user interface. Ask yourself, who needs to see a list with 10,000 words? You can use the application to do the calculations and then retrieve the count of selected words, or display the 100 most common ones, or even display 100 words at a time. I’m displaying the list of words because this is a demonstration, but a real application shouldn’t display such a long list. The core of the application counts unique words in a text file, and it does it very efficiently.

Appending each word to a TextBox control was slow too, so I used a variable to build the text and then assigned it to the control. This variable is the allWords variable, which was declared with the StringBuilder type. As you will learn in the following chapter, the StringBuilder class manipulates strings like the String class, but it’s much faster when you build a long string in many small steps.
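The technique can be sketched in isolation as follows. The BuildList() function and the text of the items are assumptions for the example; in the application itself, the finished string would be assigned to the TextBox control in a single step.

```vbnet
Imports System
Imports System.Text
Imports Microsoft.VisualBasic

Module StringBuilderDemo
    ' Builds a list of numbered lines with a single StringBuilder.
    Function BuildList(ByVal itemCount As Integer) As String
        Dim buffer As New StringBuilder()
        For i As Integer = 1 To itemCount
            ' Append adds to the existing buffer instead of creating a new string.
            buffer.Append("Item ").Append(i).Append(vbCrLf)
        Next
        Return buffer.ToString()
    End Function

    Sub Main()
        Dim report As String = BuildList(10000)
        ' In a form, this is where the text would go to the control in one step,
        ' for example: TextBox1.Text = report
        Console.WriteLine(report.Length)
    End Sub
End Module
```

A String can't be modified in place, so every concatenation in a loop copies the entire text built so far; the StringBuilder avoids these copies by appending to an internal buffer.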
