Andrey Adamovich - Groovy 2 Cookbook - 2013.pdf

Working with Files in Groovy

Processing every word in a text file

Sometimes, you may need to make a word-based analysis of a text file, for example, for spell checking or statistics. This recipe shows how a file can be read word-by-word in Groovy.

Getting ready

For this recipe, you can create a new Groovy script file and download a large text file for testing purposes. The Project Gutenberg website has thousands of text files that can be used for text analysis, for example, William Shakespeare's Macbeth, available at http://www.gutenberg.net/cache/epub/2264/pg2264.txt.

How to do it...

We assume that the pg2264.txt file containing Shakespeare's masterpiece Macbeth has been downloaded, but any large text file will do for this example.

1. Add the following code to the Groovy script:

def file = new File('pg2264.txt') // Macbeth
int wordCount = 0

file.eachLine { String line ->
    line.tokenize().each { String word ->
        wordCount++
        println word
    }
}

println "Number of words: $wordCount"

2. After execution, the script should terminate with the following output:

...

FINIS. THE TRAGEDIE OF MACBETH.

Number of words: 20366

www.it-ebooks.info

Chapter 4

How it works...

The snippet in the previous section prints every word in the file on a separate line and, at the end, outputs the total number of words. The simplest way to pick all the words from a file is to read the file line by line with the help of the eachLine method (described in the Reading a text file line by line recipe) and then split each line into words. The java.lang.String class already provides a split method that takes a regular expression as a word separator. There is also a tokenize method added by the Groovy JDK, which splits a given string into a collection of strings using whitespace as the separator ([\s\t]+).

The tokenize method is equivalent to the following split method call:

line.split(/[\s\t]+/).findAll { it.trim() }.each { String word ->
    ...
}

Note that we filtered the empty words from the result of the split method using the findAll method available in all collections in Groovy. The tokenize method does this cleaning automatically for us.
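The difference between the two approaches is easiest to see on a line with leading and repeated whitespace. The following is a minimal sketch; the sample line is made up for illustration and does not come from the Macbeth file:

```groovy
// Sample input with leading, repeated, and trailing whitespace (illustrative only).
def line = '  Double, double toil   and trouble  '

// tokenize() never returns empty strings.
def tokens = line.tokenize()

// split() keeps an empty first element when the line starts with a separator.
def raw = line.split(/[\s\t]+/) as List

// Filtering with findAll removes the empty element again.
def cleaned = raw.findAll { it.trim() }

assert tokens == ['Double,', 'double', 'toil', 'and', 'trouble']
assert raw.first() == ''
assert raw.size() == 6
assert cleaned == tokens
```

Note that punctuation stays attached to the words in both cases, because only whitespace is treated as a separator.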

There's more...

Another way to split words in a text file is to use the splitEachLine method, one of Groovy's extensions to java.io.File. Like String's split method, it takes a regular expression as input, as well as a closure to which it passes the collection of strings obtained by splitting a line. With the help of this method, our original code snippet can be rewritten in the following way:

int wordCount = 0

file.splitEachLine(/[\s\t]+/) { Collection words ->
    words.findAll { it.trim() }.each { String word ->
        wordCount++
        println word
    }
}

println "Number of words: $wordCount"

As with the split method, we need to filter out empty words to get the same word count.
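To check that both approaches agree, the two counts can be compared on a small file. This is only a sketch: it writes two made-up lines to a temporary file (an assumption made so the example is self-contained) instead of reading pg2264.txt:

```groovy
// Temporary scratch file standing in for the downloaded text (illustrative only).
def file = File.createTempFile('words', '.txt')
file.deleteOnExit()
file.text = 'Double, double toil and trouble;\n  Fire burn, and cauldron bubble.\n'

// Count words with eachLine and tokenize.
int viaTokenize = 0
file.eachLine { String line ->
    viaTokenize += line.tokenize().size()
}

// Count words with splitEachLine, filtering out empty strings.
int viaSplit = 0
file.splitEachLine(/[\s\t]+/) { List words ->
    viaSplit += words.findAll { it.trim() }.size()
}

assert viaTokenize == 10
assert viaSplit == viaTokenize
```

Without the findAll filter, the second line would contribute one extra empty "word" because it starts with whitespace.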

See also

• http://groovy.codehaus.org/groovy-jdk/java/io/File.html

• http://groovy.codehaus.org/groovy-jdk/java/lang/String.html


Writing to a file

Java's I/O API demands a lot of "ceremony code" to cover the file output operations (and actually any other I/O resource). Groovy adds several extensions and syntax sugar to hide Java's complexity and make the code more concise than its Java counterpart. In this recipe, we will cover the file writing methods that are available in Groovy.

Getting ready

To start writing to a file, you just need to create an instance of java.io.File, for example:

File file = new File('output.txt')

How to do it...

Now let's see which writing operations we can perform on a File object.

1. To replace the full content of the file with a String, you can use the setText extension method of java.io.File (or just Groovy's property assignment syntax):

file.text = 'Just a text'

2. This also gives you the possibility of assigning multiple text lines at once:

file.text = '''What's in a name?
That which we call a rose
By any other name would smell as sweet.'''

3. You can also assign binary content with the help of the setBytes method:

file.bytes = [ 65, 66, 67, 68 ] as byte[]

4. If you just want to append some text at the end of the file, you can use the append method:

file.append('What\'s in a name? That which we call a rose,\n')
file.append('By any other name would smell as sweet.')

5. A more idiomatic way to append text to a file is to use the leftShift operator (<<):

file << 'What\'s in a name? That which we call a rose\n'
file << 'By any other name would smell as sweet.'


6. You can also take advantage of java.io.Writer with the help of the withWriter method:

file.withWriter { Writer writer ->
    writer << 'What\'s in a name? That which we call a rose\n'
    writer << 'By any other name would smell as sweet.'
}

7. And if you prefer working with streams, you can use the withOutputStream method:

file.withOutputStream { OutputStream stream ->
    stream << 'What\'s in a name? That which we call a rose\n'
    stream << 'By any other name would smell as sweet.'
}

How it works...

In the first and second code snippets, we used the setText method. It is equivalent to the write method, which also replaces the file content with the string that is passed. Both methods exist to give the developer the freedom to choose the wording that best describes his or her intent.
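The equivalence can be checked with a short script. This is a minimal sketch that uses a temporary file (an assumption, chosen so the example does not touch output.txt):

```groovy
// Temporary scratch file; any writable path would do.
def file = File.createTempFile('recipe', '.txt')
file.deleteOnExit()

// Property syntax, which calls setText behind the scenes.
file.text = 'Some earlier content'
file.text = 'Just a text'
def viaSetText = file.text

// Explicit call to write, with the same effect.
file.write('Just a text')
def viaWrite = file.text

// Both replace the whole content instead of appending to it.
assert viaSetText == 'Just a text'
assert viaWrite == viaSetText
```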

The leftShift method does exactly the same thing as the append method; we used them in steps 4 and 5. There are other special method names in Groovy, such as plus for + or minus for -, which you can add to your own classes to write more concise expressions with the help of well-known arithmetical operators.
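As an illustration of these special method names, here is a minimal sketch. The Basket class is hypothetical and exists only to show how leftShift, plus, and minus map to operators:

```groovy
// Hypothetical class, used only to illustrate operator method names.
class Basket {
    List<String> items = []

    // Called for: basket << 'item'
    Basket leftShift(String item) {
        items << item
        this // returning this allows chaining
    }

    // Called for: basket + otherBasket
    Basket plus(Basket other) {
        new Basket(items: items + other.items)
    }

    // Called for: basket - 'item'
    Basket minus(String item) {
        new Basket(items: items - item)
    }
}

def fruit = new Basket() << 'apple' << 'pear'
def more = new Basket() << 'plum'
def all = fruit + more
def withoutPear = all - 'pear'

assert all.items == ['apple', 'pear', 'plum']
assert withoutPear.items == ['apple', 'plum']
```

Returning this from leftShift is a design choice that makes chained calls possible, in the same way that file << ... << ... can be chained.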

Each call to the append or leftShift method opens and closes the file. That is, of course, not very efficient if you need to perform many write operations. To get more control, you can operate on a java.io.Writer object within a closure passed to the withWriter method, which we used in step 6.

As you might have noticed, we also used the left shift (<<) operator on the writer object. Groovy makes that operator available on all java.io.Writer instances, just as it does on java.io.File and java.io.OutputStream.

The withWriter method creates a new instance of java.io.BufferedWriter and ensures it is flushed and closed upon return. The file content will be fully replaced. In a similar way, you can use the withWriterAppend method to use the writer object to add content to the end of the file. Another available method is withPrintWriter, which gives you access to the java.io.PrintWriter instance within a closure.
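The difference between these three methods can be sketched as follows. The example writes to a temporary file, which is an assumption made to keep it self-contained:

```groovy
// Temporary scratch file; any writable path would do.
def file = File.createTempFile('rose', '.txt')
file.deleteOnExit()

// withWriter replaces whatever the file contained before.
file.withWriter { Writer writer ->
    writer << 'What\'s in a name?\n'
}

// withWriterAppend keeps the existing content and adds to the end.
file.withWriterAppend { Writer writer ->
    writer << 'That which we call a rose\n'
}
def afterAppend = file.readLines()

// withPrintWriter replaces the content again and offers print/println.
file.withPrintWriter { PrintWriter printer ->
    printer.println('By any other name would smell as sweet.')
}
def afterPrint = file.readLines()

assert afterAppend == ["What's in a name?", 'That which we call a rose']
assert afterPrint == ['By any other name would smell as sweet.']
```

In all three cases the writer is closed when the closure returns, so no explicit close call is needed.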

Another way of writing to a file is to use the withOutputStream method, which is similar to withWriter except that the closure operates on a java.io.BufferedOutputStream instance, as you can see in step 7.
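Streams are most useful for binary content. The following sketch writes raw bytes through withOutputStream and reads them back; the temporary file and the byte values (the ASCII codes used in step 3) are chosen for illustration:

```groovy
// Temporary scratch file; any writable path would do.
def file = File.createTempFile('bytes', '.bin')
file.deleteOnExit()

file.withOutputStream { OutputStream stream ->
    stream.write([65, 66] as byte[])  // raw bytes for 'A' and 'B'
    stream << ([67, 68] as byte[])    // << also accepts a byte array
}

// The stream is flushed and closed once the closure returns.
def written = file.bytes.toList()
def asText = file.getText('US-ASCII')

assert written == [65, 66, 67, 68]
assert asText == 'ABCD'
```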
