Добавил:
Опубликованный материал нарушает ваши авторские права? Сообщите нам.
Вуз: Предмет: Файл:

Beginning Algorithms (2006)

.pdf
Скачиваний:
252
Добавлен:
17.08.2013
Размер:
9.67 Mб
Скачать

Chapter 14

}

if (c == WILDCARD || c == node.getChar()) { if (index + 1 < pattern.length()) {

patternMatch(node.getChild(), pattern, index + 1, results); } else if (node.isEndOfWord()) {

results.add(node.getWord());

}

}

if (c == WILDCARD || c > node.getChar()) { patternMatch(node.getLarger(), pattern, index, results);

}

}

private void inOrderTraversal(Node node, List results) { assert results != null : “results can’t be null”;

if (node == null) { return;

}

inOrderTraversal(node.getSmaller(), results); if (node.isEndOfWord()) {

results.add(node.getWord());

}

inOrderTraversal(node.getChild(), results); inOrderTraversal(node.getLarger(), results);

}

private static final class Node { private final char _c; private Node _smaller; private Node _larger; private Node _child;

private String _word;

public Node(char c) { _c = c;

}

public char getChar() { return _c;

}

public Node getSmaller() { return _smaller;

}

public void setSmaller(Node smaller) { _smaller = smaller;

}

public Node getLarger() { return _larger;

364

Ternary Search Trees

}

public void setLarger(Node larger) { _larger = larger;

}

public Node getChild() { return _child;

}

public void setChild(Node child) { _child = child;

}

public String getWord() { return _word;

}

public void setWord(String word) { _word = word;

}

public boolean isEndOfWord() { return getWord() != null;

}

}

}

How It Works

The class definition for TernarySearchTree is rather bare, containing a single instance variable for holding the root node and defining a constant to be used as the wildcard character when pattern matching:

package com.wrox.algorithms.tstrees;

import com.wrox.algorithms.lists.List;

public class TernarySearchTree {

public static final char WILDCARD = ‘?’;

private Node _root;

...

}

You also defined the Node class that makes up the structure of the tree, a very simple class for holding and retrieving a character value and references to the smaller and larger siblings as well as any children. Notice the strange variable _word. Recall that you needed some way to mark the end of a word. You could have used a Boolean, but for the purposes of this exercise we’ve instead chosen to store the actual word itself. Although that obviously consumes more memory, it makes the business of collecting words when performing a search much easier. There is also a convenience method, isEndOfWord(), that returns true only if there is a word stored in the node:

private static final class Node { private final char _c; private Node _smaller;

365

Chapter 14

private Node _larger; private Node _child; private String _word;

public Node(char c) { _c = c;

}

public char getChar() { return _c;

}

public Node getSmaller() { return _smaller;

}

public void setSmaller(Node smaller) { _smaller = smaller;

}

public Node getLarger() { return _larger;

}

public void setLarger(Node larger) { _larger = larger;

}

public Node getChild() { return _child;

}

public void setChild(Node child) { _child = child;

}

public String getWord() { return _word;

}

public void setWord(String word) { _word = word;

}

public boolean isEndOfWord() { return getWord() != null;

}

}

One thing to note before getting into the remainder of the code is that because the algorithms that operate on ternary search trees lend themselves easily to recursion, all the methods in this class have been coded as such.

The contains() method returns true if and only if the word exists in the tree (ignoring prefixes); otherwise, it returns false. After first validating the input, you then call search(), passing in the

366

Ternary Search Trees

root node (if any), the word for which to search, and the position of the first character. Finally, true is returned only if a node marking the end of a word was found; otherwise, false is returned to indicate the word was not found:

public boolean contains(CharSequence word) { assert word != null : “word can’t be null”;

assert word.length() > 0 : “word can’t be empty”;

Node node = search(_root, word, 0); return node != null && node.isEndOfWord();

}

The private search() method takes a node from which to start looking, the word to search for, and the position within the word from which to start. In return, search() provides the node containing the last character in the word, or null if the word was not found.

If there is no current node (node == null), the search can terminate immediately. Otherwise, the character at the current position is retrieved and the search begins.

If the current search character matches the one at the current node and there are more characters in the string (index + 1 < word.length()), the search progresses to the next letter, starting at the child node.

If the characters don’t match, the search character must exist either before or after the current node. If the character you are looking for sorts before the current node, then the search continues starting with the smaller sibling; otherwise, it must sort after the current node — in which case, the search continues with the larger sibling.

Eventually, either the letters in the search word run out or you run out of nodes. At this point, whichever node you are currently at (if any) is returned as the result:

private Node search(Node node, CharSequence word, int index) { assert word != null : “word can’t be null”;

if (node == null) { return null;

}

char c = word.charAt(index);

if (c == node.getChar()) {

if (index + 1 < word.length()) {

node = search(node.getChild(), word, index + 1);

}

} else if (c < node.getChar()) {

node = search(node.getSmaller(), word, index); } else {

node = search(node.getLarger(), word, index);

}

return node;

}

The methods add() and insert() work together to add new words to the tree.

367

Chapter 14

After checking the arguments to the method, add() calls insert, passing in the root node (if any), the word to be added, and the position of the first character in the word. The only other thing that needs to be done is update the root node if necessary, with the node returned by insert():

public void add(CharSequence word) {

assert word != null : “word can’t be null”; assert word.length() > 0 : “word can’t be empty”;

Node node = insert(_root, word, 0); if (_root == null) {

_root = node;

}

}

The insert() method starts by obtaining the current character from the word. Then, if there is no current node, one is created — you are, after all, adding.

The current character is then compared with the character for the current node. If it matches, then there are two possibilities: If there are still more characters to insert, then you recurse with the next character starting from the child node; otherwise, you can set the word in the current node to indicate you’re done.

If the character doesn’t match, then there are an additional two possibilities: either the character sorts lower than the current node or it sorts higher. In either case, you need to recurse using the same character but with the smaller or larger node, respectively.

Notice how the return value is used to update the reference to the appropriate child or sibling node. This works because the insert() method always returns the node just inserted (or the appropriate existing node). This means that the node eventually returned to add() is for the first character in the word, not the last, as you may have assumed:

private Node insert(Node node, CharSequence word, int index) { assert word != null : “word can’t be null”;

char c = word.charAt(index);

if (node == null) { node = new Node(c);

}

if (c == node.getChar()) {

if (index + 1 < word.length()) { node.setChild(insert(node.getChild(), word, index + 1));

} else { node.setWord(word.toString());

}

}else if (c < node.getChar()) { node.setSmaller(insert(node.getSmaller(), word, index));

}else {

node.setLarger(insert(node.getLarger(), word, index));

}

return node;

}

368

Ternary Search Trees

The method prefixSearch() first performs a general search to find the node containing the last letter of the prefix. This node is then passed to inOrderTraversal() along with the list for storing the results:

public void prefixSearch(CharSequence prefix, List results) { assert prefix != null : “prefix can’t be null”;

assert prefix.length() > 0 : “prefix can’t be empty”;

inOrderTraversal(search(_root, prefix, 0), results);

}

The method inOrderTraversal() recursively traverses first the smaller sibling, then the node’s child, and finally the large sibling. Each time a word is encountered (node.isEndOfWord()), it is added to the results:

private void inOrderTraversal(Node node, List results) { assert results != null : “results can’t be null”;

if (node == null) { return;

}

inOrderTraversal(node.getSmaller(), results); if (node.isEndOfWord()) {

results.add(node.getWord());

}

inOrderTraversal(node.getChild(), results); inOrderTraversal(node.getLarger(), results);

}

The first patternMatch() method calls the private method of the same name, passing the root node, the pattern to match, the position of the first character in the pattern, and, of course, the list into which the results will be stored:

public void patternMatch(CharSequence pattern, List results) { assert pattern != null : “pattern can’t be null”;

assert pattern.length() > 0 : “pattern can’t be empty”; assert results != null : “results can’t be null”;

patternMatch(_root, pattern, 0, results);

}

The second patternMatch() method looks rather like an in-order traversal of the tree, with some restrictions.

First, instead of always traversing the left and right siblings, a check is first performed to determine whether the traversal is actually required. If the current pattern character sorts before the current node, then a traversal of the smaller sibling is made; if it sorts after the node, then a traversal of the larger sibling is made; and if it is the same as the current node, a recursive call is made, with the next character in the pattern starting at the first child.

369

Chapter 14

Second, at each point, if the current pattern character is WILDCARD, then you traverse no matter what. This way, a wildcard character matches all other characters.

Finally, the search will only consider words of the same length as the pattern — for example, a pattern of length five will only match words of length five:

private void patternMatch(Node node, CharSequence pattern, int index, List results) {

assert pattern != null : “pattern can’t be null”; assert results != null : “results can’t be null”;

if (node == null) { return;

}

char c = pattern.charAt(index);

if (c == WILDCARD || c < node.getChar()) { patternMatch(node.getSmaller(), pattern, index, results);

}

if (c == WILDCARD || c == node.getChar()) { if (index + 1 < pattern.length()) {

patternMatch(node.getChild(), pattern, index + 1, results); } else if (node.isEndOfWord()) {

results.add(node.getWord());

}

}

if (c == WILDCARD || c > node.getChar()) { patternMatch(node.getLarger(), pattern, index, results);

}

}

Crossword Helper Example

Armed with your fully tested and implemented pattern matching code, you can turn your hand to a sample application that demonstrates a novel use of ternary search trees: crossword solving. In this section, you’ll develop a very small command-line application that takes as its arguments a file containing words — one word per line — and a pattern to match, optionally containing wildcard characters.

Try It Out

Creating the Crossword Helper Application

Create the CrosswordHelper class as follows:

package com.wrox.algorithms.tstrees;

import com.wrox.algorithms.iteration.Iterator; import com.wrox.algorithms.lists.LinkedList; import com.wrox.algorithms.lists.List;

import java.io.BufferedReader;

370

Ternary Search Trees

import java.io.FileReader; import java.io.IOException;

public final class CrosswordHelper { private CrosswordHelper() {

}

public static void main(String[] args) throws IOException { assert args != null : “args can’t be null”;

if (args.length < 2) {

System.out.println(“Usage CrosswordHelper <word-list> <pattern> [repetitions]”);

System.exit(-1);

}

int repetitions = 1; if (args.length > 2) {

repetitions = Integer.parseInt(args[2]);

}

searchForPattern(loadWords(args[0]), args[1], repetitions);

}

private static void searchForPattern(TernarySearchTree tree, String pattern, int repetitions) {

assert tree != null : “tree can’t be null”;

System.out.println(“Searching for pattern ‘“ + pattern + “‘...” + repetitions + “ times”);

List words = null;

for (int i = 0; i < repetitions; ++i) { words = new LinkedList(); tree.patternMatch(pattern, words);

}

Iterator iterator = words.iterator();

for (iterator.first(); !iterator.isDone(); iterator.next()) { System.out.println(iterator.current());

}

}

private static TernarySearchTree loadWords(String fileName) throws IOException

{

TernarySearchTree tree = new TernarySearchTree();

System.out.println(“Loading words from ‘“ + fileName + “‘...”);

BufferedReader reader = new BufferedReader(new FileReader(fileName));

try {

371

Chapter 14

String word;

while ((word = reader.readLine()) != null) { tree.add(word);

}

} finally {

reader.close();

}

return tree;

}

}

How It Works

The CrosswordHelper class defines the application entry point main(). This method first verifies that there are at least two arguments on the command line — one for the file containing the word list and another for the pattern. The filename, args[0], is then passed to loadWords(), which, as you will see in a moment, returns a ternary search tree that is then passed along with the pattern, args[1], to searchForPattern()to do the actual matching:

package com.wrox.algorithms.tstrees;

import com.wrox.algorithms.iteration.Iterator; import com.wrox.algorithms.lists.LinkedList; import com.wrox.algorithms.lists.List;

import java.io.BufferedReader; import java.io.FileReader; import java.io.IOException;

public final class CrosswordHelper { private CrosswordHelper() {

}

public static void main(String[] args) throws IOException { assert args != null : “args can’t be null”;

if (args.length < 2) {

System.out.println(“Usage CrosswordHelper <word-list> <pattern>”); System.exit(-1);

}

searchForPattern(loadWords(args[0]), args[1]);

}

...

}

The method loadWords() takes the name of a file containing words — one each per line — and returns a fully populated ternary search tree. It starts by creating a ternary search tree into which the words will be stored. It then opens the file, reading each line and adding the word to the tree. The file is then closed and the newly populated tree is returned to the caller:

372

Ternary Search Trees

private static TernarySearchTree loadWords(String fileName) throws IOException {

TernarySearchTree tree = new TernarySearchTree();

System.out.println(“Loading words from ‘“ + fileName + “‘...”);

BufferedReader reader = new BufferedReader(new FileReader(fileName));

try {

String word;

while ((word = reader.readLine()) != null) { tree.add(word);

}

} finally { reader.close();

}

return tree;

}

Finally, you have the method that actually performs the search: searchForPattern(). This method simply creates a list for holding the results, calls patternMatch(), passing the pattern and the list, and then iterates over the results printing each one to the console:

private static void searchForPattern(TernarySearchTree tree, String pattern) { assert tree != null : “tree can’t be null”;

System.out.println(“Searching for pattern ‘“ + pattern + “‘...”);

List words = new LinkedList(); tree.patternMatch(pattern, words);

Iterator iterator = words.iterator();

for (iterator.first(); !iterator.isDone(); iterator.next()) { System.out.println(iterator.current());

}

}

Running the crossword helper with a list of around 114,000 English words for the pattern “a?r???t” produced the following results:

Loading words from ‘words.txt’...

Searching for pattern ‘a?r???t’...

abreact abreast acrobat aeriest airboat airiest airlift airport

373