
The implementation of dequeue() begins by locating the item to be returned, which is simple; it is already at index 0 in the list. Although this is the item you return, it is not necessarily the item you delete from the underlying list. The only item that ever gets deleted is the one at the very end of the list. If there is only one item in the queue, the item you return is the one you delete; in all other cases, you need to exchange the last item with the first item and sink it down through the heap to reestablish the heap condition.
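
Purely as an illustration of that description, the sketch below shows how dequeue() might look. It assumes the heap is kept in a field named _list (an index-based list offering get(), set(), delete(), size(), and isEmpty()) and that the sink() method discussed next exists; the names and details are illustrative rather than the book's exact listing.

public Object dequeue() throws EmptyQueueException {
    if (isEmpty()) {
        throw new EmptyQueueException();
    }

    Object result = _list.get(0);                    // the largest item is always at index 0
    if (_list.size() > 1) {
        // exchange the first and last items...
        _list.set(0, _list.get(_list.size() - 1));
        _list.set(_list.size() - 1, result);
    }
    _list.delete(_list.size() - 1);                  // ...and delete only the very last item

    if (!_list.isEmpty()) {
        sink(0);                                     // re-establish the heap condition
    }

    return result;
}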

sink() is unfortunately a lot more complex than swim() because there are a couple of interesting cases to consider. The item in question might have no children, or it may only have one child. If it has a right child, it must also have a left child, so having only a right child is one case we can ignore.

You start by calculating the indices of your children. If these indices fall outside the valid range of items in the queue, you are done, as the item cannot sink any lower. Next, figure out which of your children (you now have at least one child) is the larger. It is the larger of the children that you exchange with the item in question, if required. You start by assuming that the left child is the larger, and change your assumption to the right child only if you have a right child and it is larger than the left child.

At this point, you know which of your children is the larger one. All that remains is to compare the item itself with the larger of the children. If the child is larger, swap them and recursively call sink() to continue the process down the heap until the heap condition is restored.
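
Putting those cases together, a sketch of sink() might look like the following. It assumes the same _list field plus a _comparator, and treats the children of the item at index i as the items at indices 2i + 1 and 2i + 2; again, this is an illustration rather than the book's exact listing.

private void sink(int index) {
    int leftChild = index * 2 + 1;
    int rightChild = index * 2 + 2;

    if (leftChild >= _list.size()) {
        return;                                      // no children: the item can sink no lower
    }

    // Assume the left child is the larger; switch to the right child only if it
    // exists and is larger than the left child.
    int largerChild = leftChild;
    if (rightChild < _list.size()
            && _comparator.compare(_list.get(rightChild), _list.get(leftChild)) > 0) {
        largerChild = rightChild;
    }

    // If the larger child outranks the item itself, swap them and keep sinking.
    if (_comparator.compare(_list.get(index), _list.get(largerChild)) < 0) {
        Object temp = _list.get(index);
        _list.set(index, _list.get(largerChild));
        _list.set(largerChild, temp);
        sink(largerChild);
    }
}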

The heap-ordered implementation of the priority queue is the final version included in this chapter. It is interesting because both adding an item to the queue and removing one from it are O(log N) operations. Any algorithm whose running time is proportional to the depth of a binary tree of the elements in question has this characteristic, and has a great advantage over those that treat the items in a long linear fashion; for example, a binary tree of 1,000 items is only about 10 levels deep (log2 1,000 ≈ 10). In the next section, we will compare our three priority queue implementations to see how they stack up against each other.

Comparing the Priority Queue Implementations

As in previous chapters, we will opt for a practical, rather than a theoretical, comparison of the various implementations. Once again, we will use CallCountingComparator to gain an understanding of how much effort the various implementations take to achieve their results. Remember not to take this single dimension of evaluation as total or definitive. Rather, use it to gain insight and to inspire further investigation. Many theoretically sound comparisons are available, so check Appendix B for resources if you’re interested in delving further into that area.

As when comparing sorting algorithms in the previous two chapters, this chapter considers best, worst, and average cases for each of the three priority queue implementations. We perform a mixed set of operations to add and remove items from the queues under test. The best case consists of adding data in sorted order. The worst case consists of adding the data in reverse sorted order, and the average case consists of adding randomly generated data to the queue.


The basic structure of the test driver class is shown here. You declare a constant to control the size of the tests, and then declare the lists for each of the best, worst, and average cases. Finally, you declare CallCountingComparator to collect the statistics:

public class PriorityQueueCallCountingTest extends TestCase {
    private static final int TEST_SIZE = 1000;

    private final List _sortedList = new ArrayList(TEST_SIZE);
    private final List _reverseList = new ArrayList(TEST_SIZE);
    private final List _randomList = new ArrayList(TEST_SIZE);

    private CallCountingComparator _comparator;

    ...
}

The setUp() method instantiates the comparator and fills the three lists with the appropriate test data:

protected void setUp() throws Exception {
    super.setUp();

    _comparator = new CallCountingComparator(NaturalComparator.INSTANCE);

    for (int i = 1; i < TEST_SIZE; ++i) {
        _sortedList.add(new Integer(i));
    }

    for (int i = TEST_SIZE; i > 0; --i) {
        _reverseList.add(new Integer(i));
    }

    for (int i = 1; i < TEST_SIZE; ++i) {
        _randomList.add(new Integer((int) (TEST_SIZE * Math.random())));
    }
}

Next are the three worst-case scenarios, all of which delegate to the runScenario() method:

public void testWorstCaseUnsortedList() {
    runScenario(new UnsortedListPriorityQueue(_comparator), _reverseList);
}

public void testWorstCaseSortedList() {
    runScenario(new SortedListPriorityQueue(_comparator), _reverseList);
}

public void testWorstCaseHeapOrderedList() {
    runScenario(new HeapOrderedListPriorityQueue(_comparator), _reverseList);
}


Now you define the three best-case scenarios, one for each of the priority queue implementations:

public void testBestCaseUnsortedList() {
    runScenario(new UnsortedListPriorityQueue(_comparator), _sortedList);
}

public void testBestCaseSortedList() {
    runScenario(new SortedListPriorityQueue(_comparator), _sortedList);
}

public void testBestCaseHeapOrderedList() {
    runScenario(new HeapOrderedListPriorityQueue(_comparator), _sortedList);
}

Finally, you have the three average-case scenarios:

public void testAverageCaseUnsortedList() {
    runScenario(new UnsortedListPriorityQueue(_comparator), _randomList);
}

public void testAverageCaseSortedList() {
    runScenario(new SortedListPriorityQueue(_comparator), _randomList);
}

public void testAverageCaseHeapOrderedList() {
    runScenario(new HeapOrderedListPriorityQueue(_comparator), _randomList);
}

The runScenario() method is shown next. It is provided with two parameters: a queue to test and a list of input data. Its approach is to iterate through the input data, adding the elements to the queue under test. However, every 100 items, it stops and takes 25 items off the queue. These numbers are entirely arbitrary and serve only to give you a mixture of both the enqueue() and dequeue() operations to better simulate how priority queues are used in practice. Before the method finishes, it completely drains the queue and calls reportCalls() to output a line summarizing the test:

private void runScenario(Queue queue, List input) {
    int i = 0;

    Iterator iterator = input.iterator();
    iterator.first();
    while (!iterator.isDone()) {
        ++i;
        queue.enqueue(iterator.current());
        if (i % 100 == 0) {
            for (int j = 0; j < 25; ++j) {
                queue.dequeue();
            }
        }
        iterator.next();
    }

    while (!queue.isEmpty()) {
        queue.dequeue();
    }

    reportCalls();
}


The final method in the driver program is a simple dump of the number of comparisons made during the test run:

private void reportCalls() {
    int callCount = _comparator.getCallCount();
    System.out.println(getName() + ": " + callCount + " calls");
}

The following results of the comparison of the three priority queue implementations are for the worst case:

testWorstCaseUnsortedList: 387000 calls
testWorstCaseSortedList: 387000 calls
testWorstCaseHeapOrderedList: 15286 calls

The heap-ordered version is a clear winner here, with no difference at all between the two simpler versions. Next are the best-case results:

testBestCaseUnsortedList: 386226 calls
testBestCaseSortedList: 998 calls
testBestCaseHeapOrderedList: 22684 calls

That’s interesting, although if you recall that insertion sort is excellent on already sorted data, you’ll understand why these results show the sorted list version doing the least amount of work. The brute-force version does almost exactly the same amount of work as it did in the worst case, and the heap version performs about 50 percent more operations than it did in the worst case by this measure.

Finally, look at the results that most indicate what is likely to happen in the real world:

testAverageCaseUnsortedList: 386226 calls
testAverageCaseSortedList: 153172 calls
testAverageCaseHeapOrderedList: 17324 calls

You can see that the sorted list version does fewer than half as many comparisons as the brute-force version, while the heap-ordered implementation is again the clear leader. Based on this simple test, the implementation built on the heap structure is clearly the most effective; however, whether to use it depends on your specific circumstances. You need to weigh the extra complexity against the extra efficiency to determine which implementation best suits your application.


Summary

This chapter covered a few key points:

You learned about a new data structure called a priority queue. This data structure is a more general form of the Queue that was covered in Chapter 4.

A priority queue provides access to the largest item in the queue at any given time. A comparator is used to determine the relative size of the items in the queue.

You implemented three different versions of a priority queue. The first simply added items to an underlying list and did a full scan of the items when required to return the largest. The second was an improvement on this, in that it kept the items in sorted order at all times, allowing rapid retrieval of the largest item at any time. The final version used a list arranged as a heap structure to achieve excellent performance for both add and remove operations. A thorough explanation of heaps and how they work was provided.

The three implementations were compared and contrasted using a practical, rather than a theoretical, approach.

Exercises

To test your understanding of priority queues, try the following exercises:

1. Use a priority queue to implement a Stack.

2. Use a priority queue to implement a FIFO Queue.

3. Use a priority queue to implement a ListSorter.

4. Write a priority queue that provides access to the smallest item, rather than the largest.


9

Binary Searching and Insertion

So far, this book has discussed basic structures for storing and sorting your data, but it has only touched on some rudimentary approaches to searching the data.

Modern software applications often deal with enormous amounts of data, and being able to search that data efficiently is important. Being able to locate a particular patient’s record quickly among tens of thousands of others can make or break an application. From now on, the chapters in this book focus largely on algorithms and data structures designed specifically for the efficient storage and searching of data.

Binary searching is one of the most basic techniques for efficiently searching through data in memory. Binary insertion is a variation on binary searching that enables you to maintain the data such that it can be efficiently searched.

This chapter discusses the following:

How to perform binary searching

Implementing a binary search using iteration and recursion

Comparing binary searching with other search techniques

Comparing binary insertion with other sorting techniques

Understanding Binary Searching

Binary searching is a technique for locating items in a sorted list. A binary search takes advantage of certain characteristics of sorted lists that a simple linear search doesn’t. Indeed, whereas a brute-force linear search runs in O(N) time, a binary search runs in O(log N), assuming the data to be searched is sorted.


As you saw in Chapter 2, the simplest way to search an unordered list is to start at the first item and continue forward until you either find a match or run out of items. This leads to an average-case running time of O(N). The actual average running time is around N/2; that is, you would need to traverse, on average, half the items in the list before you found the one for which you were looking. For data that is sorted, however, you can do a lot better.

Binary searching gets its name from the fact that you continually divide the data in half, progressively narrowing down the search space until you find a match or there are no more items to process.

Take, for example, an English dictionary. If you were asked to look up the word algorithm, where would you start? You would probably open the book at the first page and start flipping through, one page at a time.

If you were asked to find the word lama, you would probably start somewhere towards the middle, but why? Why not start at the end of the book? The reason, of course, is because you know a dictionary is arranged in ascending alphabetical order (A–Z), so you can make a reasonably good guess as to where you should begin looking. When searching for lama, for example, if you open the book at mandarin, then you would know you had gone too far and you’d skip back a few pages. If, conversely, you first encounter kangaroo, then you would know you hadn’t gone far enough and you would skip forwards some pages. Upon discovering that you are not yet at the page containing the word for which you are searching, the next question is, how far should you skip forwards or backwards?

In the specific example just given, you can probably guess how far to skip based on your knowledge of the language and the relative number of words beginning with each letter of the alphabet. But what if you had no idea about the contents? What if all you knew about the book was that it was sorted?

A binary search involves continually dividing the data in half — hence, binary — and searching the lower or upper half as appropriate. The steps involved in performing a binary search can be summarized as follows (a brief code sketch follows the list):

1. Start in the middle of the list.

2. Compare the search key with the item at the current location.

3. If the search key is at the current location, then you are done.

4. If the search key is less than the item at the current location, then it must be in the lower half of the data (if it exists at all), so divide the list in two and go to step 1, using the lower half.

5. Otherwise, it must be in the upper half of the list (again, if it exists at all), so go to step 1, using the upper half.
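
Before working through an example, here is a brief sketch of those steps as an iterative Java method. It operates on a plain, already-sorted array of Comparable items purely for illustration; it is not the ListSearcher implementation developed later in the chapter, which uses the book's own List and Comparator types.

public static int binarySearch(Comparable[] sorted, Comparable key) {
    int lower = 0;
    int upper = sorted.length - 1;

    while (lower <= upper) {
        int middle = (lower + upper) / 2;            // step 1: start in the middle
        int cmp = key.compareTo(sorted[middle]);     // step 2: compare with the current item

        if (cmp == 0) {
            return middle;                           // step 3: found it
        } else if (cmp < 0) {
            upper = middle - 1;                      // step 4: continue in the lower half
        } else {
            lower = middle + 1;                      // step 5: continue in the upper half
        }
    }

    return -(lower + 1);  // not found: encode the would-be position, as discussed later
}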

The following example demonstrates how to search for the letter K in the list of letters shown in Figure 9-1. This list contains nine letters in sorted order.

Index:   0  1  2  3  4  5  6  7  8
Letter:  A  D  F  H  I  K  L  M  P

Figure 9-1: List containing nine letters in ascending sorted order.


You start the search in the middle of the list by comparing the search key with the letter I, as shown in Figure 9-2.

Index:   0  1  2  3  4  5  6  7  8
Letter:  A  D  F  H [I] K  L  M  P

Figure 9-2: A search always begins with the middle item.

Because you haven’t yet found a match, you divide the list in half. Then, because the search key, K, sorts higher than the current item, you concentrate your efforts on the upper half (see Figure 9-3).

Index:   0  1  2  3  4  5  6  7  8
Letter:  A  D  F  H  I [K  L  M  P]

Figure 9-3: The search key must exist somewhere in the upper half of the list.

The new list consists of four letters: K, L, M, and P — an even number. Finding the middle item in a list containing an even number of items is clearly nonsensical. Luckily, though, it doesn’t really matter whether the two halves are strictly equal, so you can arbitrarily choose one of the two middle items: L or M. This example uses L (see Figure 9-4).
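
In code, this choice usually falls out of integer arithmetic rather than being made explicitly. As a hypothetical illustration using the indices from Figure 9-3:

int lower = 5;                     // first index of the remaining range (K)
int upper = 8;                     // last index of the remaining range (P)
int middle = (lower + upper) / 2;  // 13 / 2 == 6 in integer division, so L is chosen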

Index:   0  1  2  3  4  5  6  7  8
Letter:  A  D  F  H  I  K [L] M  P

Figure 9-4: The search continues with the “middle” item.

Now you compare the search key with the chosen middle item — L. Once again, it’s not the item you are looking for, so you divide the list in two and try again. This time, however, the search key sorts lower than the current item — K sorts before L — so you can assume that it will be found in the lower half, if indeed it exists at all.

Figure 9-5 shows that the search finally narrows to only one item, K, which in this case is the one you were looking for.

Index:   0  1  2  3  4  5  6  7  8
Letter:  A  D  F  H  I [K] L  M  P

Figure 9-5: The search finally narrows to only one item.


The search is complete and you managed to locate the key in only three comparisons: the two intermediary comparisons with I and L, and the final comparison that resulted in the match. The same search using a brute-force approach would have taken six comparisons: first with the A, then D, F, H, I, and finally K.

You could argue that the choice of search key makes binary searching seem more efficient than it would have appeared had the key been at the start of the list. For example, if the letter A had been used as the search key, the brute-force approach would have found it in only one comparison, whereas the binary search would have taken four!

So it is fair to say that, in some very limited cases, a brute-force search will do better than a binary search; however, in most cases, a binary search will do much better — a fact that is demonstrated concretely later in the chapter.

Binary Search Approaches

Now that you’ve observed how the algorithm works in principle, it’s time to turn words into code. This section demonstrates two binary search approaches: one involving recursion and another using iteration. Each one has the same general performance characteristics, but you will see that one seems a little more intuitive than the other.

A List Searcher

In the following Try It Out, you define an interface that is common to the two approaches (recursive and iterative) you can take when implementing a binary search. This will enable you to plug in the different implementations for testing and performance evaluation.

A list searcher enables you to search a list (in this case, a sorted list) for a specified key via a single method, search(). This method uses a comparator to determine whether the search key matches any of the items in the list. If the key is found, search() returns its position (0, 1, 2, . . .). If the item is not found, search() returns a negative value corresponding to the point at which it would have been found had it existed. At this point, you’re probably wondering how you can return the position information and at the same time indicate that the search key wasn’t found.

Part of the answer is to use a negative value. That way, you can use positive return values to indicate searches that were successful, and negative values to indicate unsuccessful searches. However, if you simply take the negative value of the position (for example, 1 becomes –1, 2 becomes –2, and so on), what do you do with the first position in the list, 0? A value of –0 doesn’t make sense.

The trick is to alter the return value so that a position of 0 becomes –1, 1 becomes –2, and so on. In this way, you can encode both the position and the fact that the search key wasn’t found.
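
Concretely, the encoding is -(position + 1), and a caller can reverse it with the same arithmetic. The fragment below is a hypothetical usage sketch (the variable names and the searcher instance are assumed), not code from the book:

int result = searcher.search(list, key);
if (result >= 0) {
    // found: result is the index of the key within the list
} else {
    int insertionPoint = -(result + 1);  // -1 decodes to 0, -2 to 1, and so on
    // not found: insertionPoint is where the key would have been
}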

Try It Out

Creating the List Searcher Interface

The first thing you need to do is to create the actual Java interface as follows:

package com.wrox.algorithms.bsearch;

import com.wrox.algorithms.lists.List;


public interface ListSearcher {
    public int search(List list, Object key);
}

How It Works

The interface has one method corresponding to the single search() operation discussed earlier. This method takes a list to search and a key to look for as arguments, and returns an integer corresponding to the position within the list.

Notice that you don’t pass a comparator to search() even though you will need one. Instead, it is assumed that any searcher will have been constructed with a comparator already. This separation of concerns enables a list searcher to be passed around without code needing to know how the ordering is being performed. This should become more obvious when you write the actual test code.

Try It Out

Writing the Tests

Now that you have an interface to work with, you write the tests. We’ve already identified at least two possible list searcher implementations, iterative and recursive, and you will end up with one more before the chapter is finished. We’ll first create a suite of tests that any list searcher must satisfy. This way, you won’t have to rewrite the tests for each different implementation.

Start by creating the test class itself:

package com.wrox.algorithms.bsearch;

import com.wrox.algorithms.lists.ArrayList;
import com.wrox.algorithms.lists.List;
import com.wrox.algorithms.sorting.Comparator;
import com.wrox.algorithms.sorting.NaturalComparator;
import junit.framework.TestCase;

public abstract class AbstractListSearcherTestCase extends TestCase {
    private static final Object[] VALUES = {"B", "C", "D", "F", "H", "I",
                                            "J", "K", "L", "M", "P", "Q"};

    private ListSearcher _searcher;
    private List _list;

    protected abstract ListSearcher createSearcher(Comparator comparator);

    protected void setUp() throws Exception {
        super.setUp();

        _searcher = createSearcher(NaturalComparator.INSTANCE);
        _list = new ArrayList(VALUES);
    }
}
