Добавил:
Опубликованный материал нарушает ваши авторские права? Сообщите нам.
Вуз: Предмет: Файл:

Beginning Algorithms (2006)

.pdf
Скачиваний:
252
Добавлен:
17.08.2013
Размер:
9.67 Mб
Скачать

Chapter 6

Figure 6-10 shows your hand of cards, face down. They are unsorted, although they may already be in just the order you need. (Even if they are, the algorithm will need to run its course.)

Figure 6-10: A hand of five cards.

You begin by turning over the first card. Nothing could be easier than sorting a single card, so you hold it in your hand on its own. In this case, it’s the seven of diamonds. Figure 6-11 shows the current situation: one sorted card and four still unsorted cards lying face down.

7 D

Figure 6-11: The first card is sorted by itself.

Pick up the second card. It’s the jack of spades. Because you know spades come before diamonds, you insert it into your hand to the left of your current card. Figure 6-12 shows the situation now.

134

Basic Sorting

J 7

S D

Figure 6-12: The second card is inserted before the first.

Pick up the third card. In the example it’s the ace of clubs. Looking at your two already sorted cards, this new one needs to be inserted between them. Figure 6-13 shows the state of your hand now.

J A 7

S C D

Figure 6-13: The third card is inserted in the middle.

An insertion sort works by dividing the data into two groups: already sorted items and unsorted items. Initially, the sorted group is empty and the unsorted group contains all the items. One by one, an item is taken from the unsorted group and inserted at the appropriate position in the growing group of sorted items. Eventually, all of the items are in the sorted group and the unsorted group is empty. Figure 6-14 shows what happens when you pick up the final two cards.

135

Chapter 6

J

A

7

Q

S

C

D

H

J

A

9

7

Q

S

C

C

D

H

Figure 6-14: The last two cards are inserted.

In the next Try It Out, you start by creating a test case for the insertion sort algorithm. Then you implement it to complete the three basic sorting algorithms for this chapter.

Try It Out

Testing InsertionSortListSorter

In the same way as you did for bubble sort and selection sort, you extend the AbstractListSorter test case for the insertion sort algorithm, as shown here:

package com.wrox.algorithms.sorting;

public class InsertionSortListSorterTest extends AbstractListSorterTest { protected ListSorter createListSorter(Comparator comparator) {

return new InsertionSortListSorter(comparator);

}

}

How It Works

Although you implemented a single method with a single line of code, the key point here is that you are extending the AbstractListSorterTest class described earlier in this chapter. The abstract class provides the test data and several test methods; all you need to do is provide the ListSorter implementation for these tests to use, and that’s what you have done here.

Try It Out

Implementing InsertionSortListSorter

By now you will be familiar with the basic structure of the sorting algorithm implementations. Use the following class declaration and constructor for InsertionSortListSorter:

136

Basic Sorting

package com.wrox.algorithms.sorting;

import com.wrox.algorithms.lists.List; import com.wrox.algorithms.lists.LinkedList;

import com.wrox.algorithms.iteration.Iterator;

public class InsertionSortListSorter implements ListSorter { private final Comparator _comparator;

public InsertionSortListSorter(Comparator comparator) { assert comparator != null : “comparator cannot be null”; _comparator = comparator;

}

...

}

How It Works

The implementation of the sort() method is very different from the two algorithms you have seen earlier in the chapter. This algorithm does not sort the objects in place by rearranging the order of the list it is given; rather, this algorithm creates a new, empty list and inserts each item from the original list into the result list in sorted order.

In addition, the original list is processed using an iterator instead of accessing the items by index because you have no need for direct access to the items in the original list. You simply process each one in turn, which is the natural idiom for an iterator:

public List sort(List list) {

assert list != null : “list cannot be null”;

final List result = new LinkedList();

Iterator it = list.iterator();

for (it.first(); !it.isDone(); it.next()) { int slot = result.size();

while (slot > 0) {

if (_comparator.compare(it.current(), result.get(slot - 1)) >= 0) { break;

}

--slot;

}

result.insert(slot, it.current());

}

return result;

}

Finally, notice that the inner loop is a while loop, rather than a for loop. Its task is to find the right position in the result list to insert the next item. After it finds the right position (or falls off the end of the result list), it exits the inner loop. The current item is then inserted into the result list. At all times, the result list is entirely sorted; each item is placed into position relative to those items already in the list, thereby maintaining the overall sorted sequence. This example uses a LinkedList for the result list because it is better suited to insertion operations.

137

Chapter 6

Note also that the algorithm searches backwards through the result list looking for the right position, rather than forwards. This is a big advantage when it comes to sorting already sorted or nearly sorted objects, as demonstrated in the section “Comparing the Basic Sorting Algorithms,” later in this chapter. It is also the reason why this algorithm is stable, which is the subject of the next section.

Understanding Stability

Some sorting algorithms share an interesting characteristic called stability. To illustrate this concept, examine the list of people sorted by their first names shown in Table 6-1.

Table 6-1: List Sorted by First Names

First Name

Last Name

 

 

Albert

Smith

Brian

Jackson

David

Barnes

John

Smith

John

Wilson

Mary

Smith

Tom

Barnes

Vince

De Marco

Walter

Clarke

 

 

Now imagine that you want to sort the same people by their last names. The list in Table 6-1 contains some common last names, such as Smith and Barnes. What would you expect to happen to the order of people with the same last name? You might expect that people with the same last name would be in the same relative order as the original list — that is, sorted by first name within the same last name group. This is stability. If a sorting algorithm maintains the relative order of items with a common sort key, it is said to be a stable algorithm.

Table 6-2 shows a stable last name sort of the people in this example.

Table 6-2: Stable Last Name Sort of Table 6-1

First Name

Last Name

David

Barnes

Tom

Barnes

Walter

Clarke

Vince

De Marco

138

Basic Sorting

First Name

Last Name

Brian

Jackson

Albert

Smith

John

Smith

Mary

Smith

John

Wilson

Two of the three implementations discussed so far — bubble sort and insertion sort — are stable. It is simple to make the selection sort implementation stable. Some of the more advanced sorting algorithms in later chapters may be faster than the three you have seen here, but they often fail to preserve stability, and you should take this into account if it is important to your particular application.

Comparing the Basic Sor ting Algorithms

Now that you have seen a number of sorting algorithms in action, and how you can easily plug in any implementation that supports the ListSorter interface, you might be wondering when to use which algorithm. This section compares each algorithm using a practical approach, rather than a theoretical or mathematical approach. This is not intended to give you a definitive list of criteria for selecting an algorithm; rather, it provides an example of how comparative analysis can be put to use when you need to make implementation choices in the systems you build.

Recall from the introduction to this chapter that sorting algorithms perform two basic steps many times: comparing items and moving items around. This discussion assesses the behavior of the three sorting algorithms with regard to the first of these operations and puts the algorithms through their paces using much larger data sets than you used when implementing them. This is important because any divergence in their relative performance will be clearer on larger sets of data. It is also important that each algorithm receive input data in varying arrangements, as follows:

Already sorted (the best case)

Already sorted but in reverse order from our desired order (the worst case)

Random order (the average case)

If you give each algorithm the same set of input data for each of these cases, then you can make an informed decision about the relative merits in a given real-world situation. The first task is to gather the information about how many times comparisons are made.

CallCountingListComparator

All comparisons in the sorting algorithms are performed by their respective comparator. To count the number of times the comparator’s compare() method is called, you could alter the code for each comparator to specify that it remember the number of calls. Alternatively, you could make all the comparators extend a common base class and put the call counting behavior there. However, to re-use much of

139

Chapter 6

the code you’ve already written, you can add the call counting behavior by decorating any other comparator you already have, as you did with the ReverseComparator:

public final class CallCountingComparator implements Comparator { private final Comparator _comparator;

private int _callCount;

public CallCountingComparator(Comparator comparator) { assert comparator != null : “comparator cannot be null”;

_comparator = comparator; _callCount = 0;

}

public int compare(Object left, Object right) { ++_callCount;

return _comparator.compare(left, right);

}

public int getCallCount() { return _callCount;

}

}

Just like the ReverseComparator, the CallCountingComparator accepts any other Comparator in its constructor. The CallCountingComparator delegates the actual comparison check to this underlying comparator after incrementing the call count. All that is left is to provide the getCallCount() method to retrieve the call count when the sorting is complete.

With the help of the CallCountingComparator, you can now build a program to drive each of the sorting algorithms with best case, worst case, and average case test data and collect the results.

ListSorterCallCountingTest

Although this is not actually a unit test, the program is written to drive the algorithms as a JUnit test case because you need to do some setup and run several discrete scenarios for each algorithm. You begin by creating the test class, a constant for the size of the lists of data, and instance variables for the best, worst, and average case data sets. You also need an instance variable that holds a reference to the CallCountingComparator created in the previous section:

package com.wrox.algorithms.sorting;

import junit.framework.TestCase; import com.wrox.algorithms.lists.List;

import com.wrox.algorithms.lists.ArrayList;

public class ListSorterCallCountingTest extends TestCase { private static final int TEST_SIZE = 1000;

private final List _sortedArrayList = new ArrayList(TEST_SIZE); private final List _reverseArrayList = new ArrayList(TEST_SIZE);

140

Basic Sorting

private final List _randomArrayList = new ArrayList(TEST_SIZE);

private CallCountingComparator _comparator;

...

}

Next you set up the test data. For the best and worst cases, you fill the respective lists with Integer objects with values ranging between 1 and 1,000. For the average case, you generate random numbers within this same range. You also create the call counting comparator by wrapping a

NaturalComparator. This works because java.lang.Integer supports the Comparable interface, just as the strings used in earlier examples do:

protected void setUp() throws Exception {

_comparator = new CallCountingComparator(NaturalComparator.INSTANCE);

for (int i = 1; i < TEST_SIZE; ++i) { _sortedArrayList.add(new Integer(i));

}

for (int i = TEST_SIZE; i > 0; --i) { _reverseArrayList.add(new Integer(i));

}

for (int i = 1; i < TEST_SIZE; ++i) {

_randomArrayList.add(new Integer((int)(TEST_SIZE * Math.random())));

}

}

To run each algorithm in the worst case, create the relevant Listsorter implementation and use it to sort the reverse-sorted list created in the setUp() method. The following code has a method to do this for each of our three algorithms. You might wonder how this works. If the reverse-sorted list is an instance variable and you first sort it using the bubble sort algorithm, how can it still be reverse-sorted when the next algorithm starts? This is one of the reasons you use JUnit to structure this driver program. JUnit creates a new instance of the driver class for each of the test methods, so each method in effect has its own copy of the reverse-sorted list, and setUp() will be run for each of them independently. This keeps the tests from interfering with one another:

public void testWorstCaseBubblesort() {

new BubblesortListSorter(_comparator).sort(_reverseArrayList); reportCalls(_comparator.getCallCount());

}

public void testWorstCaseSelectionSort() {

new SelectionSortListSorter(_comparator).sort(_reverseArrayList);

reportCalls(_comparator.getCallCount());

}

public void testWorstCaseInsertionSort() {

new InsertionSortListSorter(_comparator).sort(_reverseArrayList); reportCalls(_comparator.getCallCount());

}

141

Chapter 6

To produce its output, each of these methods uses the reportCalls() method, described later in this section. Next are three similar methods for the best-case scenario, in which each algorithm is used to sort the already sorted list created in setUp():

public void testBestCaseBubblesort() {

new BubblesortListSorter(_comparator).sort(_sortedArrayList); reportCalls(_comparator.getCallCount());

}

public void testBestCaseSelectionSort() {

new SelectionSortListSorter(_comparator).sort(_sortedArrayList);

reportCalls(_comparator.getCallCount());

}

public void testBestCaseInsertionSort() {

new InsertionSortListSorter(_comparator).sort(_sortedArrayList); reportCalls(_comparator.getCallCount());

}

You create three more methods to test the average case using the randomly generated list of numbers:

public void testAverageCaseBubblesort() {

new BubblesortListSorter(_comparator).sort(_randomArrayList); reportCalls(_comparator.getCallCount());

}

public void testAverageCaseSelectionSort() {

new SelectionSortListSorter(_comparator).sort(_randomArrayList);

reportCalls(_comparator.getCallCount());

}

public void testAverageCaseInsertionSort() {

new InsertionSortListSorter(_comparator).sort(_randomArrayList); reportCalls(_comparator.getCallCount());

}

Lastly, you define the reportCalls() method that produces the output for each scenario defined previously:

private void reportCalls(int callCount) { System.out.println(getName() + “: “ + callCount + “ calls”);

}

This simple code contains one subtle point of interest. It uses the getName() method provided by the JUnit TestCase superclass to print the name of the scenario itself. The output produced by the program for the worst case is shown here:

testWorstCaseBubblesort: 499500 calls testWorstCaseSelectionSort: 499500 calls testWorstCaseInsertionSort: 499500 calls

142

Basic Sorting

As you can see, all three algorithms do exactly the same number of comparisons when tasked with sorting a completely reverse-sorted list! Don’t take this to mean that they will always take the same amount of time to run; you are not measuring speed here. Always be careful to avoid jumping to conclusions based on simple statistics like those here. That said, this is a very interesting thing to know about these algorithms in this particular scenario.

The following numbers are produced for the best case:

testBestCaseBubblesort: 498501 calls testBestCaseSelectionSort: 498501 calls testBestCaseInsertionSort: 998 calls

Once again, the results are interesting. The bubble and selection sorts do the same number of comparisons, but the insertion sort does dramatically fewer indeed. You might want to review the insertion sort implementation now to see why this is the case.

The following numbers are produced in the average case:

testAverageCaseBubblesort: 498501 calls testAverageCaseSelectionSort: 498501 calls testAverageCaseInsertionSort: 262095 calls

Once again, the bubble and selection sorts performed the same number of comparisons, and the insertion sort required about half the number of comparisons to complete its job.

Understanding the Algorithm Comparison

You can draw a few conclusions from the comparative analysis just performed, but you must be careful not to draw too many. To really understand the difference in their behavior, you would need to run additional scenarios, such as the following:

Quantifying how many objects are moved during the sort

Using both LinkedList and ArrayList implementations for the test data

Measuring running times for each scenario

Bearing the limitations of the analysis in mind, you can make the following observations:

Bubble and selection sorts always do exactly the same number of comparisons.

The number of comparisons required by the bubble and selection sorts is independent of the state of the input data.

The number of comparisons required by an insertion sort is highly sensitive to the state of the input data. At worst, it requires as many comparisons as the other algorithms. At best, it requires fewer comparisons than the number of items in the input data.

Perhaps the most important point is that bubble and selection sorts are insensitive to the state of the input data. You can, therefore, consider them “brute force” algorithms, whereas the insertion sort is adaptive, because it does less work if less work is required. This is the main reason why the insertion sort tends to be favored over the other two algorithms in practice.

143