Добавил:
Опубликованный материал нарушает ваши авторские права? Сообщите нам.
Вуз: Предмет: Файл:

Beginning Algorithms (2006)

.pdf
Скачиваний:
252
Добавлен:
17.08.2013
Размер:
9.67 Mб
Скачать

Chapter 13

Maps are also known as associative arrays, dictionaries, indexes, and lookup tables.

In general, a map makes no guarantee of the iteration order.

Three common map implementations are the list map, the hash map, and the tree map.

List-based maps are useful for relatively small data sets, as operations run in O(N).

Hash table–based map operations run in O(1) with random iteration order.

Binary search tree–based sets provide O(log N) performance as well as predictable iteration order.

Exercises

1.Create an iterator that returns only the keys contained within a map.

2.Create an iterator that returns only the values contained within a map.

3.Create a set implementation that uses a map as the underlying storage mechanism for the values.

4.Create an empty map that throws UnsupportedOperationException anytime an attempt is made to modify it.

344

14

Ternar y Search Trees

So far, you’ve learned a number of ways to store data — from simple, unordered lists to sorted lists, binary search trees, and even hash tables. All of these are great for storing objects of arbitrary types. You’re about to learn one last data structure for storing strings. It’s not only fast to search, it also enables you to perform some quite different and interesting forms of searching.

This chapter discusses the following topics:

General properties of ternary search trees

How words are stored

How words can be looked up

How ternary search trees can be used for creating a dictionary

How to implement a simple application to help solve crossword puzzles

Understanding Ternar y Search Trees

Ternary search trees are specialized structures for storing and retrieving strings. Like a binary search tree, each node holds a reference to the smaller and larger values. However, unlike a binary search tree, a ternary search tree doesn’t hold the entire value in each node. Instead, a node holds a single letter from a word, and another reference — hence ternary — to a subtree of nodes containing any letters that follow it in the word.

Figure 14-1 shows how you could store “cup,” “ape,” “bat,” “map,” and “man” in a ternary search tree. Notice that we have depicted the siblings — smaller and larger letters — as solid links, and the children — letters that follow — as dashed links.

Chapter 14

 

 

c

 

a

 

u

m

p

b

p

a

e

a

 

p

 

t

 

n

Figure 14-1: A sample ternary search tree with “c” as the root. The highlighted nodes trace out a path for “bat.”

Although it doesn’t look much like one, if you take out the child links at the first level, then you’d have a perfectly valid binary search tree containing the values a, b, c, and m. You’ve just added an extra reference from each node to its child.

At each level, just like a binary search tree, each left node is smaller — b is larger than a, which is smaller than c, which is smaller than m — and each child node represents the next letter in a word — a comes after b, followed by t, in “bat.”

Searching for a Word

At each level in the tree, you perform a binary search starting with the first letter of the word for which you are looking. Just like searching a binary search tree, you start at the root and follow links left and right as appropriate. Once you find a matching node, you move down one level to its child and start again, this time with the second letter of the word. This continues until you either find all the letters — in which case, you have a match — or you run out of nodes.

To get a better idea of how a ternary search tree works, the following example searches for the word “bat” in the tree shown in Figure 14-1.

The search starts at the root node, c, looking for the first letter of the search word, a, as shown in Figure 14-2.

Because you don’t yet have a match, you need to visit one of the siblings — if there are any — and try again. In this case, your search letter, a, sorts before the current node, c, so you try the left sibling (see Figure 14-3).

346

Ternary Search Trees

 

 

c

 

a

 

u

m

p

b

p

a

e

a

 

p

 

t

 

n

Figure 14-2: A search for the first letter at the root of the tree.

 

 

c

 

a

 

u

m

p

b

p

a

e

a

 

p

 

t

 

n

Figure 14-3: The letter b sorts before c, so you follow the left link.

Next, you compare your search letter with the letter at the next node, but again you find a mismatch. However, because b sorts after a, this time you need to follow a right link (see Figure 14-4).

Finally, you have a match for the first letter of the word, so you can move on to the second letter, a, starting with the first child of b, as shown in Figure 14-5.

347

Chapter 14

 

 

c

 

a

 

u

m

p

b

p

a

e

a

 

p

 

t

 

n

Figure 14-4: The letter b sorts after a, so you follow the right link.

 

 

c

 

a

 

u

m

p

b

p

a

e

a

 

p

 

t

 

n

Figure 14-5: A search for the next letter starts at the first child.

This time, you get a hit straightaway, so repeating the process, you move on to the third letter, t, and continue searching at the next child node (see Figure 14-6).

Once again you find a matching letter and, because there are no more letters in the search word, a matching word, and you did so in a total of five individual character comparisons.

At each level, you’re looking for a letter in a binary search tree. Once found, you move down a level and perform another search in a binary search tree for the next letter. This is done for all the letters of the word you are looking for. From this, you can deduce the run time for performing a lookup in your ternary search tree. You might guess that ternary search trees are at least as efficient as binary search trees. It turns out, however, that ternary search trees can actually be more efficient than simple binary search trees.

348

Ternary Search Trees

 

 

c

 

a

 

u

m

p

b

p

a

e

a

 

p

 

t

 

n

Figure 14-6: The search ends when the last letter is found.

Imagine you were looking for the word “man” in the tree from Figure 14-1. Count the character comparisons. First you would compare “m” with “c” (one). Next, you compare “m” with “m” (two), followed by “a” with “a” (three). You’re now at the last letter, so you compare “n” with “p” (four) and eventually “n” with “n” (five).

Now compare the same search in an binary search tree that stores the same words, as shown in Figure 14-7. First you compare “man” with “cup,” but as you get a mismatch on the first letter, you can move to the next node having performed only one letter comparison. Next you compare “man” with “map”; that’s an extra three comparisons. Finally, you compare “man” with “man” for another three comparisons, giving you a grand total of 1 + 3 + 3 = 7 single-letter comparisons.

cup

ape

map

bat man

Figure 14-7: An equivalent binary search tree.

Even in such a simple tree as the one used here, you can see that a ternary search tree performs far fewer individual character comparisons compared with an equivalent binary search tree. This is because words that share a common prefix are compressed. Having traversed those common letters, you need never compare them again. Compare this with a binary search tree, in which you continually compare all the letters from each node, as shown previously.

349

Chapter 14

In addition to being efficient at finding positive results, ternary search trees particularly excel at quickly discarding words that don’t exist in the tree. Whereas a binary search tree continues searching until it runs out of nodes at the leaves, a ternary search tree will terminate as soon as no matching prefix is found.

Now that you understand how searching works, you can work out the general performance characteristics for ternary search trees. Imagine that every level contained all the letters from the alphabet arranged as nodes in a binary search tree. If the size of the alphabet (for example, A to Z for English) is represented by M, then you know that a search at any given level will take on average O(log M) comparisons. Of course, that’s just for one letter. To find a word of length N, you would need to perform one binary search for each letter, or O(N log M) comparisons. In practice, however, the performance turns out to be much better due to common prefixes and the fact that not every letter of the alphabet appears at each level of each branch in the tree.

Inserting a Word

Inserting a word into a ternary search tree isn’t much more difficult than performing a search; you simply add extra leaf nodes for any letters that don’t already exist. In Figure 14-8, you can see that inserting the word “bats” requires the addition of a single child node, tacked onto the end of the existing word “bat,” whereas inserting the word “mat” adds a single node as a sibling of the letter “p” in “map.”

 

 

c

 

 

a

 

u

 

m

p

b

p

 

a

e

a

 

 

p

 

t

 

n

t

s

Figure 14-8: Inserting “bats” or “mat” requires the addition of only one extra node.

Of course, you now have a situation in which multiple words — ”bat” and its plural, “bats” — share a common prefix. How can you tell them apart? How do you determine that “bat” is a word and so is “bats,” but that “ba,” “b,” or even “ap” aren’t?

350

Ternary Search Trees

The answer is simple: You store some associated information along with each node that tells you when you have found the end of a word, as shown in Figure 14-9. This associated information might take the form of a simple Boolean yes or no flag if all you needed was to determine whether a given search word was valid or not; or it might be the definition of the word if you were to implement a dictionary, for example. It doesn’t really matter what is used — the important point is that you mark, in some way, only those nodes that represent the last letter in a word.

 

 

c

 

a

 

u

m

p

b

p

a

e

a

 

p

 

t

 

n

s

Figure 14-9: Nodes are marked to show they represent the end of a word.

So far, you’ve only shown balanced trees, but just like binary search trees, ternary search trees can become unbalanced. The tree shown in Figure 14-9 is the result of inserting words in the order “cup,” “ape,” “bat,” “map,” and “man,” but what would happen if you inserted the words in sorted order: “ape,” “bat,” “cup,” “man,” and “map” instead? Unfortunately, in-order insertion of words leads to an unbalanced tree, as shown in Figure 14-10.

While searching a balanced ternary search tree runs in O(N log M) comparisons, searching an unbalanced tree requires O(NM) comparisons. The difference could be quite substantial for very large values of M, although in practice you typically see much better performance due to prefix-sharing and the fact that not every letter of the alphabet is represented on each level.

Prefix Searching

Perhaps you have used an application or a website that allowed you to select a value from a list by typing the first couple of letters from a word. As you type, the list of possibilities narrows until eventually it contains only a handful of values.

351

Chapter 14

a

p b

e a c

t u m

p a

n

p

Figure 14-10: In-order insertion leads to an unbalanced tree.

One interesting use for a ternary search tree is finding all words with a common prefix. The trick is to perform a standard search of the tree using only the prefix. Then perform an in-order traversal looking for every end-of-word marker in every subtree of the prefix.

An in-order traversal of a ternary search tree is very similar to that of a binary search tree except, of course, you need to include traversal of the child node. Having found the last node in the prefix, follow these steps:

1.

2.

3.

4.

Traverse the left subtree of the node.

Visit the node itself.

Traverse the node’s children.

Traverse the right subtree.

Figure 14-11 shows the first match for the prefix “ma.”

Having found the prefix, you start by traversing the left subtree, which in this case returns “man.” Next you traverse the node itself: “ma.” However, it doesn’t mark the end of a word. You would now traverse any children if there were any. Finally, you traverse the right subtree as shown in Figure 14-12, resulting in a match with “map.”

352

Ternary Search Trees

 

 

c

 

a

 

u

m

p

b

p

a

e

a

 

p

 

t

 

n

Figure 14-11: The first end-point for prefix “ma” is for “man.”

 

 

c

 

a

 

u

m

p

b

p

a

e

a

 

p

 

t

 

n

Figure 14-12: The next end-point for the prefix “ma” is for “map.”

As you traverse the tree, you can either print out the words as you find them or collect them and use them in another way — for example, displaying them in a list to a user.

Pattern Matching

Have you ever been doing a crossword puzzle or playing a game of Scrabble and racked your brain to think of a word that would fit between the existing letters? There in front of you are the letters: “a-r---t” — but not a single possibility springs to mind.

353