data BinTree a = BinLeaf
               | BinNode a (BinTree a) (BinTree a)
               deriving Show

The point of including a in the tree type is that it allows any type a to be used for the data attached to nodes, but it also requires that all the nodes have attached data of the same type. Here are some examples of correct tree definitions:

tree4 :: BinTree String
tree4 = BinNode "cat" BinLeaf (BinNode "dog" BinLeaf BinLeaf)

tree5 :: BinTree (Integer,Bool)
tree5 = BinNode (23,False)
          BinLeaf
          (BinNode (49,True) BinLeaf BinLeaf)

tree6 :: BinTree Int
tree6 = BinNode 4
          (BinNode 2
             (BinNode 1 BinLeaf BinLeaf)
             (BinNode 3 BinLeaf BinLeaf))
          (BinNode 6
             (BinNode 5 BinLeaf BinLeaf)
             (BinNode 7 BinLeaf BinLeaf))

The following definition produces a type error, because the nodes have attached data values of different types. The compiler will give an error message saying that Char, the type of one of the nodes, doesn’t match Bool, the type of the other nodes.

treeBad = BinNode 'c'
            (BinNode True BinLeaf BinLeaf)
            (BinNode False BinLeaf BinLeaf)

Exercise 1. Define a Haskell datatype Tree1 for a tree that contains a character and an integer in each node, along with exactly three subtrees.

Exercise 2. Define a Haskell datatype Tree2 for a tree that contains an integer in each node, and that allows each node to have any number of subtrees.

5.3 Processing Trees with Recursion

Just as a for-loop is the natural programming technique for processing an array, recursion is the natural programming technique for processing trees. Recursion fits well with trees, because each subtree is a complete tree in its own right, so subtrees can be handled by recursive function applications. This section presents several typical algorithms on trees and shows how to implement them with recursive functions.

5.3.1 Tree Traversal

A natural task is to visit each node in the tree in order to process the data in that node, building a list of the results. An algorithm that works through a tree in this fashion is called a tree traversal. A traversal needs to decide what order to visit the nodes, and there are three specific traversal orders for binary trees that are commonly used in algorithms:

Preorder traversal: first visit the root, then traverse the left subtree, then traverse the right subtree. A preorder traversal of the tree in Figure 5.3 produces [4, 2, 1, 3, 7, 5, 6, 8].

Inorder traversal: first visit the left subtree, then the root, then the right subtree. An inorder traversal of the tree in Figure 5.3 produces [1, 2, 3, 4, 5, 6, 7, 8].

Postorder traversal: first traverse the left subtree, then the right subtree, and finally visit the root. A postorder traversal of the tree in Figure 5.3 produces [1, 3, 2, 6, 5, 8, 7, 4].

Functions can be defined to perform each of the traversals using recursion. The traversal functions take an argument which is a tree of type BinTree a, and they produce a flat list with type [a] consisting of the data values found attached to the nodes. All three of the traversal functions will produce a list containing all the data values in a tree; the only difference among them is the order of items in the list. The definitions of these functions are similar to the informal definitions of the traversal algorithms.

inorder :: BinTree a -> [a]
inorder BinLeaf = []
inorder (BinNode x t1 t2) = inorder t1 ++ [x] ++ inorder t2

preorder :: BinTree a -> [a]
preorder BinLeaf = []
preorder (BinNode x t1 t2) = [x] ++ preorder t1 ++ preorder t2

postorder :: BinTree a -> [a]
postorder BinLeaf = []
postorder (BinNode x t1 t2) =
  postorder t1 ++ postorder t2 ++ [x]
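
As a quick check, applying the three traversals to tree6 (defined earlier) gives the following results, shown here as comments and easily verified in GHCi:

preorder  tree6    -- [4,2,1,3,6,5,7]
inorder   tree6    -- [1,2,3,4,5,6,7]
postorder tree6    -- [1,3,2,5,7,6,4]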

The traversal functions produce an empty list for an empty tree, and they use the ++ operator to construct lists for nonempty trees. The subtrees are handled by recursion, and the data value x attached to the current node is also placed in the result list. The best way to understand the recursion is by equational reasoning. (Some books approach recursion by considering stacks, return addresses, and other low level details, but those topics just confuse the issue and are appropriate only for compiler writers.) Here, for example, is the inorder traversal of a small tree:

  inorder (BinNode 4 (BinNode 8 BinLeaf BinLeaf) BinLeaf)
= inorder (BinNode 8 BinLeaf BinLeaf) ++ [4] ++ inorder BinLeaf
= (inorder BinLeaf ++ [8] ++ inorder BinLeaf) ++ [4] ++ []
= ([] ++ [8] ++ []) ++ [4] ++ []
= [8, 4]

All of these traversal functions convert a tree to a list containing the data attached to the tree’s nodes. The inorder traversal produces a list with the data in the same order in which it would appear if you read the tree from left to right. In many applications this is the most natural way to represent the tree’s data as a list. Because of this, the inorder function is often named flatten.
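
If that name is wanted, it is just a one-line alias for the function defined above:

flatten :: BinTree a -> [a]
flatten = inorder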

Exercise 3. Calculate the inorder traversal of tree3.

Exercise 4. Suppose that a tree has type BinTree a, and we have a function f :: a -> b. Write a new traversal function inorderf :: (a->b) -> BinTree a -> [b] that traverses the tree using inorder, but it applies f to the data value in each node before placing the result in the list. For example, inorder tree6 produces [1, 2, 3, 4, 5, 6, 7], but inorderf (2*) tree6 produces [2, 4, 6, 8, 10, 12, 14].

5.3.2 Processing Tree Structure

There are several functions that measure the size of a tree, or affect its shape. This section introduces some of the basic ones; more advanced operations are described in Chapter 12, which discusses AVL trees.

The reflect function takes a binary tree and returns its mirror image, where everything is reversed left-to-right.

reflect :: BinTree a -> BinTree a
reflect BinLeaf = BinLeaf
reflect (BinNode n l r) = BinNode n (reflect r) (reflect l)
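
For example, reflecting tree6 swaps every pair of subtrees, so an inorder traversal of the reflected tree lists the data from right to left (result shown as a comment):

inorder (reflect tree6)    -- [7,6,5,4,3,2,1]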

Some of the most important properties of trees are concerned with the numbers of nodes in the two branches and with the heights of trees and subtrees. The time required by many algorithms depends on the heights of trees, so the science of algorithmics is often concerned with these properties.

The height of a tree is the distance between its root and its deepest leaf. An empty tree, consisting only of a leaf, has height 0. The height of a nonempty tree is the height of its taller subtree, plus one to account for the root node. Thus the height satisfies the following equations:

height :: BinTree a -> Integer
height BinLeaf = 0
height (BinNode x t1 t2) = 1 + max (height t1) (height t2)

The size of a tree is the number of nodes that it contains. This is measured simply by adding up the sizes of the subtrees, plus one for the root node.

size :: BinTree a -> Int
size BinLeaf = 0
size (BinNode x t1 t2) = 1 + size t1 + size t2
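
Using tree6 again as a concrete check (results shown as comments):

height tree6    -- 3
size tree6      -- 7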

The size of a tree of type BinTree a tells you how many data values of type a are represented in the tree, and it is also related to the amount of computer memory required to represent the tree. The height of a tree is related to the tree’s shape. At one extreme, a tree like tree7 below, where all the left nodes are leaves, has a height that is the same as its size. At the other extreme, a tree like tree8, where the data is distributed evenly throughout the tree, will have a smaller height given the same amount of data. Such a tree is said to be balanced.

tree7, tree8 :: BinTree Integer

tree7 = BinNode 1
          BinLeaf
          (BinNode 2
             BinLeaf
             (BinNode 3 BinLeaf BinLeaf))

tree8 = BinNode 1
          (BinNode 2 BinLeaf BinLeaf)
          (BinNode 3 BinLeaf BinLeaf)

We can give a formal definition of balanced trees by writing down the following equations, which also define an executable function. The function application balanced t returns True if the tree t is perfectly balanced, and False otherwise.

balanced :: BinTree a -> Bool
balanced BinLeaf = True
balanced (BinNode x t1 t2) =
  balanced t1 && balanced t2 && (height t1 == height t2)
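
Applied to the two trees defined above (results shown as comments):

balanced tree7    -- False: the subtrees of the root have heights 0 and 2
balanced tree8    -- True:  both subtrees of the root have height 1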

Exercise 5. Define two trees of size seven, one with the largest possible height and the other with the smallest possible height.

Exercise 6. Suppose that the last equation of the function balanced were changed to the following: balanced (BinNode x t1 t2) = balanced t1 && balanced t2. Give an example showing that the modified function returns True for an unbalanced tree.

Exercise 7. Suppose that the last equation of the function balanced were changed to the following: balanced (BinNode x t1 t2) = height t1 == height t2. Give an example showing that the modified function returns True for an unbalanced tree.

5.3.3 Evaluating Expression Trees

Software that is working with text written in some language often uses trees to represent documents in the language. Trees express the essential structure of the text while omitting unimportant details. Examples include programs that manipulate natural language, as well as compilers and interpreters for programming languages.

Consider a simple expression language, consisting of integer constants, additions, and multiplications. A document in the language can be expressed with a tree type:

data Exp
  = Const Integer
  | Add Exp Exp
  | Mult Exp Exp

A simple programming language interpreter can now be written as a tree traversal. The function takes an expression tree and returns the value that it denotes.

eval :: Exp -> Integer
eval (Const n) = n
eval (Add e1 e2) = eval e1 + eval e2
eval (Mult e1 e2) = eval e1 * eval e2
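
For example, the expression 2 + 3 * 4 is represented by the tree Add (Const 2) (Mult (Const 3) (Const 4)), and eval computes its value (shown as a comment):

eval (Add (Const 2) (Mult (Const 3) (Const 4)))    -- 14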

5.3.4 Binary Search Trees

Suppose you have a large set of pairs of keys (with some type a) and corresponding values (with some type b). For example, a key might be a person’s name and the value their age. A crucial problem in computing with databases is to find the value corresponding to a given key.

A straightforward approach, called a linear search, is to store the data as a list of pairs, so that the database has type [(a,b)]. The search algorithm, linSearch, is given a key of type a and a database, and it works through the list sequentially until the right pair is found. The key type a must be in the equality class (i.e., it has to be a type that can be compared with ==) because we have to be able to compare two keys to determine whether they are the same. It is possible that the key might not appear in the database, so we return the result as a Maybe type. Thus a failed search returns Nothing and a successful search returns Just y.

linSearch :: Eq a => a -> [(a,b)] -> Maybe b
linSearch k [] = Nothing
linSearch k ((x,y):xs) =
  if k == x
    then Just y
    else linSearch k xs
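
A small example (the list of pairs is made up for illustration; the result is shown as a comment):

linSearch "dog" [("cat",3),("dog",5),("emu",2)]    -- Just 5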

The linear search is simple; its only problem is efficiency. Since databases may contain millions of key-value pairs, it is important to perform the search quickly. For random data, it is reasonable to assume that an average linear search will find the pair at the middle of the list: half of the searches will find the data sooner, half will find it later. Therefore the time required for the search is proportional to the length of the list. If a bank with a million customers were to use a linear search, the average search time would be half a million iterations.

A much faster approach is the binary search algorithm, and a particularly good way to implement this is with a binary search tree. The idea is to store the data in such a way that a particular key-value pair can be found quickly, without examining half of the database.

A binary search tree is a tree of type BinTree (a, b), where a is the key type and b is the value type. A binary search tree must also satisfy an additional property: if the key in a node has the value k, then all keys in the left subtree must be less than k, and all the keys in the right subtree must be greater than k. This property must hold throughout the entire tree, not just for the top node. If a tree of type BinTree (a, b) lacks this property, then it is still a perfectly good tree, but it is not a binary search tree. Since we need to compare keys for ordering, not just equality, the key type a must be in the Ord class (ensuring that < and > can be used).

The bstSearch algorithm first compares the key it is searching for with the key x of the root node. If there is a match, then the search has finished successfully. Otherwise, if key < x then it is guaranteed that the answer will be found in the left subtree, or not at all. Therefore the algorithm just continues with a recursive search down the left subtree. However, if key > x then the search continues down the right subtree.

bstSearch :: Ord a => a -> BinTree (a,b) -> Maybe b
bstSearch key BinLeaf = Nothing
bstSearch key (BinNode (x,y) t1 t2) =
  if key == x
    then Just y
    else if key < x
           then bstSearch key t1
           else bstSearch key t2

The average time required for a search is proportional to the height of the tree. With good luck, the database tree will be reasonably balanced, so that about half the data is in the left subtree and half in the right subtree. In this case, every step of the search algorithm discards half of the remaining data, so that the average search time for a database of size n is about log n, compared with about n/2 for the linear search. For a large database, the difference is huge: with one million items, the average linSearch time is 500,000 while the average bstSearch time is 20. For a database containing a billion items, the number of steps in the search increases by only fifty percent, to 30 steps, even though the size of the database has increased by a factor of a thousand. The search time gets better and better, relatively speaking, as the size of the database increases. This property is crucial for large scale practical applications.

In practical applications, we start with an empty binary search tree (i.e., BinLeaf), and construct the database by inserting the key-value pairs one by one. The insert function takes a new pair, and an existing binary search tree, and it returns a new tree that contains the additional data. The function is defined carefully so that it creates a valid binary search tree (provided that its argument is valid). The function definition has a recursive structure similar to that of the search.

insert :: Ord a => (a,b) -> BinTree (a,b) -> BinTree (a,b)
insert (key,d) BinLeaf = BinNode (key,d) BinLeaf BinLeaf
insert (key,d) (BinNode (x,y) t1 t2) =
  if key == x
    then BinNode (key,d) t1 t2
    else if key < x
           then BinNode (x,y) (insert (key,d) t1) t2
           else BinNode (x,y) t1 (insert (key,d) t2)
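
As a small illustration (the names and ages here are invented for the example), a database can be built by folding insert over a list of key-value pairs, starting from the empty tree, and then searched with bstSearch (results shown as comments):

ages :: BinTree (String,Integer)
ages = foldr insert BinLeaf [("bea",27), ("ann",34), ("cal",51)]

-- bstSearch "ann" ages  ==  Just 34
-- bstSearch "dan" ages  ==  Nothing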

There is an interesting point to notice about binary search trees. Sometimes the properties that a data structure must satisfy are specified completely by its type. For example, a linear search algorithm can be applied to a list of key-value pairs as long as the list has the right type, [(a,b)]. In contrast, the binary search tree must be of the right type, but it must also satisfy the ordering constraint. Many data structures are like this, with some additional properties required beyond just the type.

The distinction is important, because the type of the data structure can be checked by the compiler, whereas the additional properties cannot. The type system of a programming language provides a way to describe required properties that can be checked at compile time; if the type checking indicates no problem, then it is guaranteed that the program will never generate any data that lacks the necessary properties. For example, we said above that the linear search algorithm requires its database to have a certain type. When the compiler checks the type of the program and finds no type errors, this constitutes a formal proof that the program will never generate a database for the linear search which is in the wrong form, and could thus cause a runtime error. The type system is unable, however, to prove that a binary search tree built using the insert function satisfies the required properties. Therefore the programmer needs to do the proof manually.

There is much more to say about the execution times of these algorithms. Books on the analysis of algorithms give a detailed discussion, but we will make just a few comments. In the first place, the times for individual searches will vary. Consider the linear search: with good luck, the desired data will be found at the very beginning of the list, and with bad luck it will appear at the end. It is not the case that every search takes n/2 time, for a database of size n. Instead, the average time over a large number of searches can be expected to be n/2. The correct way to calculate the average execution time of an algorithm is to add up the times required by every possible case, weighted by the probability of that case occurring.

The efficiency of the bstSearch algorithm is extremely sensitive to the shape of the tree. If the tree is perfectly balanced, the search time is log n, but it is possible for the tree to be completely unbalanced. This causes bstSearch to behave much like a linear search, requiring linear time.

In large scale practical applications, it isn’t good enough to hope that the tree will be balanced through good fortune. A better approach is to change the insert function so that it guarantees a reasonably balanced result. This leads to a tradeoff: we can make the search faster at the cost of a slower insertion. Ideally, we would like algorithms that give good results for searches and insertions, regardless of the data values. These issues are considered in more depth in Chapter 12.

Exercise 8. Define a function mapTree that takes a function and applies it to every node in the tree, returning a new tree of results. The type should be mapTree :: (a->b) -> BinTree a -> BinTree b. This function is analogous to map, which operates over lists.

Exercise 9. Write concatTree, a function that takes a tree of lists and concatenates the lists in order from left to right. For example,

concatTree (Node [2] (Node [3,4] Tip Tip) (Node [5] Tip Tip))

==> [3,4,2,5]

Exercise 10. Write zipTree, a function that takes two trees and pairs each of the corresponding elements in a list. Your function should return Nothing if the two trees do not have the same shape. For example,

zipTree (Node 2 (Node 1 Tip Tip) (Node 3 Tip Tip))
        (Node 5 (Node 4 Tip Tip) (Node 6 Tip Tip))
  ==> Just [(1,4),(2,5),(3,6)]

Exercise 11. Write zipWithTree, a function that is like zipWith except that it takes trees instead of lists. The first argument is a function of type a->b->c, the second argument is a tree with elements of type a, and the third argument is a tree with elements of type b. The function returns a list with type [c].

5.4 Induction on Trees

There are many useful theorems about the properties of trees, and the behaviour of functions on trees. To prove them, we can use induction, but the principle of induction needs to be made slightly more general for tree structures.

The general idea behind tree induction is similar to list induction. The base case is used for empty trees (or leaves), and the induction case is used for nodes. The principle of tree induction is stated below for binary trees, but it can be generalised to other types of tree as well.

Theorem 28 (Principle of induction on binary trees). Let BinTree a be a binary tree type as defined above, and let P (t) be a proposition on trees. Suppose the following two requirements hold:

Base case: P (BinLeaf)

Induction case: For all t1 and t2 of type BinTree a, and all x :: a, suppose that the proposition holds for a tree consisting of a node with the value x and the subtrees t1 and t2, provided that the proposition holds for t1 and t2. Using the notation of the chapter on propositional logic, this can be written formally as P (t1) ∧ P (t2) → P (BinNode x t1 t2).

Then ∀t :: BinTree a . P (t); thus the proposition holds for all trees of finite size.

Carrying out a proof using tree induction is generally no harder than list induction. A number of examples are given below. In all cases, we will use trees of type BinTree a, for an arbitrary type a.

5.4.1 Repeated Reflection Theorem

Recall the reflect function, which reverses the order of the data in a binary tree. The following theorem says that if you reflect a tree twice, you get the same tree back.

Theorem 29. Let t :: BinTree a. Then reflect (reflect t) = t.

Proof. The proposition P (t) that we need to prove says reflect (reflect t) = t. The theorem is proved by induction over t. The base case is BinLeaf:

  reflect (reflect BinLeaf)
= reflect BinLeaf        { reflect.1 }
= BinLeaf                { reflect.1 }

For the inductive case, let t1, t2 :: BinTree a be trees, and assume P (t1) and P (t2). These are the inductive hypotheses. The aim is to prove that the proposition holds for the larger tree: P (BinNode x t1 t2).

  reflect (reflect (BinNode x t1 t2))
= reflect (BinNode x (reflect t2) (reflect t1))             { reflect.2 }
= BinNode x (reflect (reflect t1)) (reflect (reflect t2))   { reflect.2 }
= BinNode x t1 t2                                           { hypothesis }

Now we have proved the base case, and we have proved the logical implication required for the induction case: if the proposition holds for t1 and t2, then it must also hold for BinNode x t1 t2. Thus, by the principle of tree induction, the theorem holds for all finite trees.

5.4.2 Reflection and Reversing

Reflecting a tree changes the order of data values, in a similar way to the reversal of a list. It seems that reflect and reverse are somehow doing the same thing, but these functions are definitely not equal.

Whenever we find functions that seem intuitively to be doing related things, it’s useful to state this relationship precisely as an equation, rather than just a vague phrase in English. The resulting theorem gives a deeper understanding, and it may also be useful in practice. A powerful problem solving technique is to notice that a problem you have can be translated into a related notation for which you already have a solution.

To state precisely the relationship between reflecting a tree and reversing a list, we need to use an explicit translation between trees and lists. The inorder function is exactly what we need.

Theorem 30. Let t :: BinTree a be an arbitrary binary tree of finite size. Then inorder (reflect t) = reverse (inorder t).
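
As a quick sanity check before the proof, both sides of the equation can be evaluated on tree8, defined earlier (results shown as comments):

inorder (reflect tree8)    -- [3,1,2]
reverse (inorder tree8)    -- [3,1,2]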

The proof requires two lemmas that describe properties of reverse and (++). We leave the proofs of the lemmas as an exercise.

reverse xs ++ [x] = reverse ([x] ++ xs)
reverse (xs ++ ys) = reverse ys ++ reverse xs

Proof. Base case.
