Добавил:
Опубликованный материал нарушает ваши авторские права? Сообщите нам.
Вуз: Предмет: Файл:

Beginning Algorithms (2006)

.pdf
Скачиваний:
252
Добавлен:
17.08.2013
Размер:
9.67 Mб
Скачать

Chapter 15

private final List _entries = new ArrayList(_maxKeysPerNode + 1); private final List _children;

public Node(boolean leaf) {

_children = !leaf ? new ArrayList(_maxKeysPerNode + 2) : (List) EmptyList.INSTANCE;

}

public boolean isFull() {

return _entries.size() > _maxKeysPerNode;

}

public Entry search(Object key) { int index = indexOf(key);

if (index >= 0) {

Entry entry = (Entry) _entries.get(index); return !entry.isDeleted() ? entry : null;

}

return !isLeaf() ? ((Node) _children.get(-(index + 1))).search(key) :

null;

}

public Object set(Object key, Object value) { int index = indexOf(key);

if (index >= 0) {

return ((Entry) _entries.get(index)).setValue(value);

}

return set(key, value, -(index + 1));

}

private Object set(Object key, Object value, int index) { if (isLeaf()) {

_entries.insert(index, new Entry(key, value)); ++_size;

return null;

}

Node child = ((Node) _children.get(index));

Object oldValue = child.set(key, value);

if (child.isFull()) { child.split(this, index);

}

return oldValue;

}

private int indexOf(Object key) { int lowerIndex = 0;

int upperIndex = _entries.size() - 1;

while (lowerIndex <= upperIndex) {

384

B-Trees

int index = lowerIndex + (upperIndex - lowerIndex) / 2;

int cmp = _comparator.compare(key, ((Entry) _entries.get(index)).getKey());

if (cmp == 0) { return index;

}else if (cmp < 0) { upperIndex = index - 1;

}else {

lowerIndex = index + 1;

}

}

return -(lowerIndex + 1);

}

public void split(Node parent, int insertionPoint) { assert parent != null : “parent can’t be null”;

Node sibling = new Node(isLeaf());

int middle = _entries.size() / 2;

move(_entries, middle + 1, sibling._entries); move(_children, middle + 1, sibling._children);

parent._entries.insert(insertionPoint, _entries.delete(middle));

if (parent._children.isEmpty()) { parent._children.insert(insertionPoint, this);

}

parent._children.insert(insertionPoint + 1, sibling);

}

public void traverse(List list) {

assert list != null : “list can’t be null”;

Iterator children = _children.iterator();

Iterator entries = _entries.iterator();

children.first();

entries.first();

while (!children.isDone() || !entries.isDone()) { if (!children.isDone()) {

((Node) children.current()).traverse(list); children.next();

}

if (!entries.isDone()) {

Entry entry = (Entry) entries.current(); if (!entry.isDeleted()) {

list.add(entry);

385

Chapter 15

}

entries.next();

}

}

}

private void move(List source, int from, List target) { assert source != null : “source can’t be null”; assert target != null : “target can’t be null”;

while (from < source.size()) { target.add(source.delete(from));

}

}

private boolean isLeaf() {

return _children == EmptyList.INSTANCE;

}

}

private static final class Entry extends DefaultEntry { private boolean _deleted;

public Entry(Object key, Object value) { super(key, value);

}

public boolean isDeleted() { return _deleted;

}

public void setDeleted(boolean deleted) { _deleted = deleted;

}

}

}

How It Works

The BTreeMap class holds a comparator to use for ordering the keys, the maximum number of keys per node, the root node, and the number of entries in the map. Notice that the minimum number of keys allowed per node is two. Because a node is split, there needs to be at least one key in the left child, one in the right, and one to move into the parent node. If the minimum number of keys was set at one, a node would be considered full with only two keys and therefore there would be too few to perform a split:

package com.wrox.algorithms.btrees;

import com.wrox.algorithms.iteration.Iterator; import com.wrox.algorithms.lists.ArrayList; import com.wrox.algorithms.lists.EmptyList; import com.wrox.algorithms.lists.List;

import com.wrox.algorithms.maps.DefaultEntry; import com.wrox.algorithms.maps.Map;

386

B-Trees

import com.wrox.algorithms.sorting.Comparator;

public class BTreeMap implements Map {

private static final int MIN_KEYS_PER_NODE = 2;

private final Comparator _comparator; private final int _maxKeysPerNode; private Node _root;

private int _size;

public BTreeMap(Comparator comparator, int maxKeysPerNode) { assert comparator != null : “comparator can’t be null”;

assert maxKeysPerNode >= MIN_KEYS_PER_NODE : “maxKeysPerNode can’t be < “ + MIN_KEYS_PER_NODE;

_comparator = comparator; _maxKeysPerNode = maxKeysPerNode; clear();

}

...

}

There are also two inner classes — Entry and Node — that represent a Map.Entry and a B-Tree node, respectively.

In addition to extending DefaultEntry, the Entry inner class also holds a Boolean flag indicating whether it has been deleted or not. This flag can be switched on and off as appropriate and is used for deleting entries:

private static final class Entry extends DefaultEntry { private boolean _deleted;

public Entry(Object key, Object value) { super(key, value);

}

public boolean isDeleted() { return _deleted;

}

public void setDeleted(boolean deleted) { _deleted = deleted;

}

}

The Node inner class is where most of the work is performed, so we’ll discuss this class first, before the methods on the main BTreeMap class.

Each node is constructed with a Boolean to indicate whether it is to be a leaf node or not. Recall that leaf nodes have no children, which is reflected in the constructor: If the node is a leaf, the list of children is set to the empty list; otherwise, a new array list is allocated to hold the children. This is reflected in the method isLeaf(), which is used to determine whether a node is a leaf or not. In addition, there is a method, isFull(), for determining whether a node contains more than the maximum allowable number of keys:

387

Chapter 15

private final class Node {

private final List _entries = new ArrayList(); private final List _children;

public Node(boolean leaf) {

_children = !leaf ? new ArrayList() : (List) EmptyList.INSTANCE;

}

public boolean isFull() {

return _entries.size() > _maxKeysPerNode;

}

private boolean isLeaf() {

return _children == EmptyList.INSTANCE;

}

...

}

The first thing you need is a method for searching the entries to find a key. The indexOf() method performs a simple linear search of the entries to find a matching key. If found, the position within the list (0, 1, 2, . . .) is returned; otherwise, a negative index is returned to indicate where the key would have been, had it existed. (If you’re interested in a more in-depth discussion of how linear searching works, refer to Chapter 9, as the code is identical to the search() method of LinearListSearcher except that it first retrieves the key from the entry before calling compare().)

private int indexOf(Object key) { int index = 0;

Iterator i = _entries.iterator();

for (i.first(); !i.isDone(); i.next()) {

int cmp = _comparator.compare(key, ((Entry) i.current()).getKey()); if (cmp == 0) {

return index;

} else if (cmp < 0) { break;

}

++index;

}

return -(index + 1);

}

Now that you can find a key within a node, searching through the nodes to find an entry is fairly straightforward.

The search() method first searches for a matching key. If one is found (index >= 0), it is returned immediately. Otherwise, if the node is not a leaf, the search continues recursively in the appropriate child; otherwise, it terminates without finding a matching entry — leaf nodes have no children. Notice that the search() method ignores entries that are marked as deleted. This is an important point to remember later:

388

B-Trees

public Entry search(Object key) { int index = indexOf(key);

if (index >= 0) {

Entry entry = (Entry) _entries.get(index); return !entry.isDeleted() ? entry : null;

}

return !isLeaf() ? ((Node) _children.get(-(index + 1))).search(key)

: null;

}

Next, you want to be able to insert keys into a node, but before doing so, it will be necessary to implement some code to split a node.

The split() method takes a reference to the parent node and a position into which the newly created node will be inserted. The first thing split() does is create a new node as its sibling — hence, the leaf flag is also copied (a sibling of a leaf is also a leaf). Next, all the entries and children from the midpoint on are moved from the node into the sibling. Then, the middle entry is inserted into the parent, followed by the reference to the sibling. A reference to the node being split is only ever inserted into the parent when the parent is a newly created root node, i.e., it has no children:

public void split(Node parent, int insertionPoint) { assert parent != null : “parent can’t be null”;

Node sibling = new Node(isLeaf());

int middle = _entries.size() / 2;

move(_entries, middle + 1, sibling._entries); move(_children, middle + 1, sibling._children);

parent._entries.insert(insertionPoint, _entries.delete(middle));

if (parent._children.isEmpty()) { parent._children.insert(insertionPoint, this);

}

parent._children.insert(insertionPoint + 1, sibling);

}

private void move(List source, int from, List target) { assert source != null : “source can’t be null”; assert target != null : “target can’t be null”;

while (from < source.size()) { target.add(source.delete(from));

}

}

Now that you can split a node, you can go about adding new entries, remembering that a map guarantees uniqueness of keys. For this reason, entries are not always inserted. Instead, if an entry with a matching key already exists, the associated value is updated.

389

Chapter 15

The first set() method starts by obtaining the position to the key within the node. If the key was found (index >= 0), the corresponding entry is retrieved, the value updated, and the old value returned. If the key wasn’t found, then it might need to be inserted, but then again it might also exist within a child node. This logic is handled by the second set() method.

The second set() method first determines whether the node is a leaf. If it is, then the key doesn’t exist anywhere in the tree and is therefore inserted along with the value as a new entry, and the size of the map is incremented accordingly. If the node has children, however, the appropriate child is found and a recursive call is made to the first set() method. In this case, if after insertion the child becomes full, it will need to be split:

public Object set(Object key, Object value) { int index = indexOf(key);

if (index >= 0) {

return ((Entry) _entries.get(index)).setValue(value);

}

return set(key, value, -(index + 1));

}

private Object set(Object key, Object value, int index) { if (isLeaf()) {

_entries.insert(index, new Entry(key, value)); ++_size;

return null;

}

Node child = ((Node) _children.get(index));

Object oldValue = child.set(key, value);

if (child.isFull()) { child.split(this, index);

}

return oldValue;

}

The only other method on the node — traverse() — is used for iteration. This method adds all the entries in the tree into a list. It starts by adding all nondeleted entries in the current node. It then recursively calls each of its children to do the same. This is essentially a pre-order traversal (it is also possible to implement an in-order traversal, an exercise left to the reader):

public void traverse(List list) {

assert list != null : “list can’t be null”;

Iterator entries = _entries.iterator();

for (entries.first(); !entries.isDone(); entries.next()) { Entry entry = (Entry) entries.current();

if (!entry.isDeleted()) { list.add(entry);

}

}

Iterator children = _children.iterator();

390

B-Trees

for (children.first(); !children.isDone(); children.next()) { ((Node) children.current()).traverse(list);

}

}

Now that you’ve covered the Node inner class, you can proceed to the remaining BTreeMap methods required by the Map interface.

The get() method returns the value associated with a key. The search() method of the root node is called with the specified key. If an entry is found, the associated value is returned; otherwise, null is returned to indicate that the key doesn’t exist in the tree:

public Object get(Object key) { Entry entry = _root.search(key);

return entry != null ? entry.getValue() : null;

}

The contains() method determines whether a key exists within the tree. Again, the search() method is called on the root node and true is returned if an entry is found:

public boolean contains(Object key) { return _root.search(key) != null;

}

The set() method adds or updates the value associated with a specified key. Here, the set() method on the root node is called to do most of the work. After the method returns, the root node is checked to determine whether it is now full. If so, a new root node is created and the existing one is split. If not, no special handling is required. In either case, the old value associated with the key (if any) is returned to the caller as required by the Map interface:

public Object set(Object key, Object value) { Object oldValue = _root.set(key, value);

if (_root.isFull()) {

Node newRoot = new Node(false); _root.split(newRoot, 0);

_root = newRoot;

}

return oldValue;

}

The delete() method removes a specified key — and its associated value — from the map. Again, the search() method is called on the root node to find the entry for the specified key. If no entry is found, then null is returned to indicate that the key didn’t exist. Otherwise, the entry is marked as deleted, the size of the map is decremented accordingly, and the associated value is returned to the caller:

public Object delete(Object key) { Entry entry = _root.search(key); if (entry == null) {

return null;

}

entry.setDeleted(true);

391

Chapter 15

--_size;

return entry.setValue(null);

}

The iterator() method returns an iterator over all the entries in the map, in no particular order. The traverse() method on the root node is called, passing in a list to populate with all the entries in the tree, from which an iterator is returned and passed back to the caller:

public Iterator iterator() {

List list = new ArrayList(_size);

_root.traverse(list);

return list.iterator();

}

The clear() method removes all entries from the map. To empty the tree, the root node is set to a leaf node — as it has no children — and the size is reset to 0:

public void clear() { _root = new Node(true); _size = 0;

}

Finally, the size() and isEmpty() methods complete the interface:

public int size() { return _size;

}

public boolean isEmpty() { return size() == 0;

}

The implementation you’ve just created only works in memory. Creating a version that can be saved to and restored from some external medium such as a hard disk requires a little work, but it’s relatively straightforward. See [Cormen, 2001] for more information.

Summar y

This chapter demonstrated the following key points:

B-Trees are ideally suited for searching on external storage such as hard disks, compact discs, and so on.

B-Trees grow from the leaves up.

Each nonroot node is always at least half full.

Nodes split whenever they become “full.”

392

B-Trees

When a node splits, one of the keys is pushed up into the parent.

The height of a B-Tree only increases when the root node splits.

B-Trees remain “balanced,” guaranteeing O(log N) search times.

Exercises

1.Re-implement the traverse() method on Node to return the entries in key order.

2.Re-implement the indexOf() method on Node to perform a binary search instead of a linear search.

393