Добавил:
Опубликованный материал нарушает ваши авторские права? Сообщите нам.
Вуз: Предмет: Файл:
Beginning Programming for Dummies 2004.pdf
Скачиваний:
109
Добавлен:
17.08.2013
Размер:
8.05 Mб
Скачать

292 Part V: Algorithms: Telling the Computer What to Do

The binary-search algorithm can work only on a sorted list.

Hashing

Finding something is always easier if you know where you stored it last. That’s why finding your car keys is easier if they’re hanging on a hook by the front door than if you must search the entire house because you can’t remember where you put them.

Hashing works on a similar principle. If you want to store an item, hashing first calculates a numeric value (known as a hash function) that identifies that item. Then the program uses this numeric value to store the item in a specific location in a data structure (such as an array or a linked list). Now, instead of needing to search an entire array or linked list to find a single item, the program just needs to look in the specific location using the hash function.

Suppose, for example, that you want to store an integer in an array. If you store the integer anywhere in the array, you must search the entire array to find that number again. But if you use hashing, the task is much simpler. First, calculate a hash function by using the following formula:

HashValue = Number to store MOD 5

This formula tells the computer to take the number that you want to store, divide it by five, and use the remaining value as the hash function. So if you want to store the number 26, the hash function is 1 (26 / 5 = 5 with a remainder of 1).

MOD is a special division command that tells the computer to divide a number and return the remainder. The command 26 MOD 5, for example, returns the value of 1. Because Liberty BASIC doesn’t support the MOD command, you need to use the following formula instead, which mimics the MOD command that you find in other languages:

HashValue = Number to Store - (INT(Number to Store / 5) * 5)

No matter what number you choose to store, the hash function calculates one of the following five values — 0, 1, 2, 3, or 4. You can create an array and use the hash function to determine the exact position at which to store an item in the array. Because the hash function of the number 26 equals 1, you can store this integer in array location number 1, as shown in Figure 21-2.

The value of the hash function determines where to store the item in a data structure.

Chapter 21: Searching 293

Figure 21-2:

Hashing calculates a value to determine where to store data initially and where to find the data later.

Array location 0 1 2 3 4

26

1.Store the number 26 in the array.

2.Calculate the hash function of 26: 26 MOD 5 = 1

3.Store the number 26 in array location 1.

If you want to find the number 26 in the array again, hashing calculates the same hash function (which calculates to 1), telling the program to look for the number 26 in the first array location. Because hashing doesn’t require searching every item in a list, hashing can prove much faster than sequential and binary searching, especially for long lists of data.

Dealing with collisions

Ideally, a hash function calculates a unique value. But a hash function probably calculates the same value for different items. For example, calculating a hash function by using MOD 5 — dividing a number by 5 and using the remainder as the hash function — returns a value of 2 for the numbers 7 and 32.

If you’re storing large numbers of items, more than one item is likely to share the same hash function. If two items share the same hash function, programmers say that a collision is occurring. To handle collisions, you need to create a data structure that can store multiple items with the same hash function. Figure 21-3 shows two data structures to handle collisions: a two-dimensional array and a one-dimensional array of linked lists. (Chapter 16 contains information about creating and using two-dimensional arrays. Chapter 18 contains information about linked lists. Remember that Liberty BASIC can’t create linked lists, but other languages — such as C/C++, C#, Pascal, and Java — can create linked lists.)

In Figure 21-3, the numbers 7 and 32 share the same hash function (which is 2). Because the number 7 is in array element number 2 (which represents the hash function 2), the program must store the number 32 beneath the number 7 underneath the same hash function of 2. Both a two-dimensional array and an array of linked lists enables the program to store multiple numbers underneath the same hash function location of 2.

294 Part V: Algorithms: Telling the Computer What to Do

 

 

 

 

Array location

 

 

 

 

Array location

 

 

 

 

0

1

2

3

 

4

 

0

1

2

3

 

4

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

26

7

 

 

 

 

 

 

 

26

7

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

32

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

32

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

Figure 21-3:

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

A two-dimensional array

 

 

 

 

 

 

 

 

 

 

A two-

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

dimensional

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

array or an

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

array of

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

linked lists

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

can handle

 

 

 

 

 

 

 

 

 

An array of linked lists

collisions.

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

Ideally, you want to use a data structure that can expand or shrink to handle items sharing identical hash functions.

Searching by using a hash function

After you store items in a data structure by using hashing, you can search for any of those items by calculating a hash function for the item that you want to find. Then you look for the item in the data structure that stores items sharing the same hash function.

If every item has a unique hash function, the hash function can pinpoint the exact location of an item extremely quickly. Even if multiple items share the same hash function (refer to Figure 21-2), hashing now limits the search to a smaller list of items with identical hash functions. As a result, hashing usually can search faster than sequential or binary searching.

The following Liberty BASIC program creates five random integers — in this case, 41, 50, 57, 75, 67 — and stores them in a two-dimensional array, as shown in Figure 21-4.

If you create an array in Liberty BASIC, the first array location is number one, the second location is number two, and so on. So to mimic an array that holds a hash function of 0 through 4, the following program calculates a hash function and adds one (1) to its value:

Chapter 21: Searching 295

Figure 21-4:

A twodimensional array can store values that its hash function organizes.

Array location 0 1 2 3 4

50 41 57

75 67

A two-dimensional array

MaxSize = 5

REDIM MyArray(MaxSize, MaxSize)

FOR I = 1 TO MaxSize ‘ Vertical

FOR J = 1 TO MaxSize ‘ Horizontal

MyArray(I, J) = 0

NEXT J

NEXT I

Count = 1

FOR J = 1 TO MaxSize StopNow = 0

StoreMe = INT(RND(1) * 100) + 1

HashValue = StoreMe - (INT(StoreMe / 5) * 5) + 1 WHILE StopNow <> 1

IF MyArray(Count, HashValue) = 0 THEN

MyArray(Count, HashValue) = StoreMe StopNow = 1

ELSE

Count = Count + 1 END IF

WEND

PRINT StoreMe; SPACE$(1); NEXT J

PRINT

PRINT

FOR I = 1 TO MaxSize ‘ Vertical

FOR J = 1 TO MaxSize ‘ Horizontal

PRINT MyArray(I, J); SPACE$(5);

NEXT J

PRINT

NEXT I

END

296 Part V: Algorithms: Telling the Computer What to Do

The preceding program uses hashing to store five random numbers in a twodimensional array, as follows:

1.The first and second lines define the variable MaxSize and set its value to 5. Then they create the two-dimensional array MyArray that can hold five by five (25) integers.

2.The third through seventh lines fill MyArray with zeroes. Notice that the I variable defines the row number and the J variable defines the column numbers in Figure 21-4. (Remember: Because Liberty BASIC arrays always start counting with the number one, the array location that 0 identifies in Figure 21-4 is actually the array location number one in Liberty BASIC; the array location that 1 identifies is actually the array location number two — and so on.)

3.The eighth line creates the variable Count and sets its value to one.

4.The ninth line starts a FOR-NEXT loop that starts the hash algorithm to store an integer in the two-dimensional array.

5.The tenth line creates the variable StopNow and sets its value to zero.

6.The eleventh line creates a random number from 1 to 100 and stores it in the variable StoreMe.

7.The twelfth line calculates a hash function and stores it in the variable HashValue. HashValue is always one of five numbers: 1, 2, 3, 4, or 5.

8.The thirteenth line starts a WHILE-WEND loop that tries to determine where to store the StoreMe number in MyArray.

9.The fourteenth through nineteenth lines try to store the StoreMe number in the first row (which the Count variable represents) of MyArray in the column that the value of HashValue represents. If a number is already in the array location, these instructions try to store the number in the next row down until they succeed in finding an empty spot in the array.

10.The twentieth line ends the WHILE-WEND loop that starts in the thirteenth line.

11.The twenty-first line prints the five random numbers that the program creates.

12.The twenty-second line represents the end of the FOR-NEXT loop that starts in the ninth line.

13.The twenty-third and twenty-fourth lines print two blank lines.

14.The twenty-fifth through thirtieth lines print the entire array so that you can see how hashing sorts the five random numbers that the program creates.

15.The thirty-first line marks the end of the program.