- Credits
- About the Authors
- About the Reviewers
- www.PacktPub.com
- Table of Contents
- Preface
- Introduction
- Installing Groovy on Windows
- Installing Groovy on Linux and OS X
- Executing Groovy code from the command line
- Using Groovy as a command-line text file editor
- Running Groovy with invokedynamic support
- Building Groovy from source
- Managing multiple Groovy installations on Linux
- Using groovysh to try out Groovy commands
- Starting groovyConsole to execute Groovy snippets
- Configuring Groovy in Eclipse
- Configuring Groovy in IntelliJ IDEA
- Introduction
- Using Java classes from Groovy
- Embedding Groovy into Java
- Compiling Groovy code
- Generating documentation for Groovy code
- Introduction
- Searching strings with regular expressions
- Writing less verbose Java Beans with Groovy Beans
- Inheriting constructors in Groovy classes
- Defining code as data in Groovy
- Defining data structures as code in Groovy
- Implementing multiple inheritance in Groovy
- Defining type-checking rules for dynamic code
- Adding automatic logging to Groovy classes
- Introduction
- Reading from a file
- Reading a text file line by line
- Processing every word in a text file
- Writing to a file
- Replacing tabs with spaces in a text file
- Deleting a file or directory
- Walking through a directory recursively
- Searching for files
- Changing file attributes on Windows
- Reading data from a ZIP file
- Reading an Excel file
- Extracting data from a PDF
- Introduction
- Reading XML using XmlSlurper
- Reading XML using XmlParser
- Reading XML content with namespaces
- Searching in XML with GPath
- Searching in XML with XPath
- Constructing XML content
- Modifying XML content
- Sorting XML nodes
- Serializing Groovy Beans to XML
- Introduction
- Parsing JSON messages with JsonSlurper
- Constructing JSON messages with JsonBuilder
- Modifying JSON messages
- Validating JSON messages
- Converting JSON message to XML
- Converting JSON message to Groovy Bean
- Using JSON to configure your scripts
- Introduction
- Creating a database table
- Connecting to an SQL database
- Modifying data in an SQL database
- Calling a stored procedure
- Reading BLOB/CLOB from a database
- Building a simple ORM framework
- Using Groovy to access Redis
- Using Groovy to access MongoDB
- Using Groovy to access Apache Cassandra
- Introduction
- Downloading content from the Internet
- Executing an HTTP GET request
- Executing an HTTP POST request
- Constructing and modifying complex URLs
- Issuing a REST request and parsing a response
- Issuing a SOAP request and parsing a response
- Consuming RSS and Atom feeds
- Using basic authentication for web service security
- Using OAuth for web service security
- Introduction
- Querying methods and properties
- Dynamically extending classes with new methods
- Overriding methods dynamically
- Adding performance logging to methods
- Adding transparent imports to a script
- DSL for executing commands over SSH
- DSL for generating reports from logfiles
- Introduction
- Processing collections concurrently
- Downloading files concurrently
- Splitting a large task into smaller parallel jobs
- Running tasks in parallel and asynchronously
- Using actors to build message-based concurrency
- Using STM to atomically update fields
- Using dataflow variables for lazy evaluation
- Index
10
Concurrent Programming in Groovy

In this chapter, we will cover:

- Processing collections concurrently
- Downloading files concurrently
- Splitting a large task into smaller parallel jobs
- Running tasks in parallel and asynchronously
- Using actors to build message-based concurrency
- Using STM to atomically update fields
- Using dataflow variables for lazy evaluation

Introduction
This chapter contains several recipes that deal with concurrent programming. We are going to examine a number of efficient algorithms and paradigms that leverage the modern architecture of multi-core CPUs.

Most of the recipes in this chapter use the excellent GPars (Groovy Parallel Systems) framework. GPars, which reached v1.0 at the end of 2012, is now part of the Groovy distribution. Its main objective is to abstract away the complexity of parallel programming. GPars offers a number of parallel and concurrent programming tools that have few equals in the JVM ecosystem. Most of the recipes show how to execute tasks in parallel to save time and make the best use of resources.
Processing collections concurrently
As mentioned in the introduction, this chapter's recipes make use of the features of the GPars framework.

In this recipe, we take a look at the Parallelizer, the common GPars term for Parallel Collections. These are a number of additional methods added by GPars to the Groovy collection framework that enable data-parallelism techniques.
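GPars is not required to see the underlying idea at work. As a rough analogy only (plain Java, not the GPars API; the word list is invented for illustration), the JDK's parallel streams provide the same kind of data-parallel filtering that findAllParallel offers:

```java
import java.util.List;
import java.util.stream.Collectors;

public class ParallelCollectionsSketch {
    public static void main(String[] args) {
        List<String> words = List.of("concurrency", "groovy", "parallelism", "gpars");

        // Filter the collection across all available cores; the terminal
        // collect() preserves encounter order, so the result is deterministic.
        List<String> longWords = words.parallelStream()
                .filter(w -> w.length() > 6)
                .collect(Collectors.toList());

        System.out.println(longWords); // prints [concurrency, parallelism]
    }
}
```

GPars plays the same role with withPool { words.findAllParallel { ... } }, but it decorates the ordinary Groovy collection methods instead of introducing a separate stream type.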
Getting ready
We will start by setting up the Gradle build (see the Integrating Groovy into the build process using Gradle recipe in Chapter 2, Using Groovy Ecosystem) and the folder structure that we will reuse across the recipes of this chapter. In a new folder aptly called parallel, create a build.gradle file with the following content:
apply plugin: 'groovy'

repositories {
  mavenCentral()
}

dependencies {
  compile 'org.codehaus.groovy:groovy-all:2.1.6'
  compile 'org.codehaus.gpars:gpars:1.0.0'
  compile 'com.google.guava:guava:14.0.1'
  compile group: 'org.codehaus.groovy.modules.http-builder',
          name: 'http-builder', version: '0.6'
  compile('org.multiverse:multiverse-beta:0.7-RC-1') {
    transitive = false
  }
  testCompile 'junit:junit:4.+'
  testCompile 'edu.stanford.nlp:stanford-corenlp:1.3.5'
}
Some dependencies may appear obscure, but each will be explained in the relevant recipe. The GPars dependency appears right after the Groovy one (note that the Groovy distribution is already packaged with GPars 1.0.0, located in the lib folder of the Groovy binary distribution).
Before delving into the code, we also need to create a folder structure to hold the classes and the tests. Create the following structure in the same folder where the build file resides:
src/main/groovy/org/groovy/cookbook
src/test/groovy/org/groovy/cookbook
How to do it...
In the following steps, we are going to fill our sample project structure with code.
1. Let's create a unit test, named ParallelizerTest.groovy, in the new src/test/groovy/org/groovy/cookbook directory. The unit test class will contain tests in which we sample the various parallel methods available from the Parallelizer framework:

package org.groovy.cookbook

import static groovyx.gpars.GParsPool.*
import org.junit.*

import edu.stanford.nlp.process.PTBTokenizer
import edu.stanford.nlp.process.CoreLabelTokenFactory
import edu.stanford.nlp.ling.CoreLabel

class ParallelizerTest {

  static words = []

  ...
}
2. Now we add a couple of test setup methods that generate a large collection of test data:

@BeforeClass
static void loadDict() {
  def libraryUrl = 'http://www.gutenberg.org/cache/epub/'
  def bookFile = '17405/pg17405.txt'
  def bigText = "${libraryUrl}${bookFile}".toURL()
  words = tokenize(bigText.text)
}

static tokenize(String txt) {
  List<String> words = []
  PTBTokenizer ptbt = new PTBTokenizer(
    new StringReader(txt),
    new CoreLabelTokenFactory(),
    ''
  )
  ptbt.each { entry ->
    words << entry.value()
  }
  words
}
3. And finally, add some tests:

@Test
void testParallelEach() {
  withPool {
    words.eachParallel { token ->
      if (token.length() > 10 && !token.startsWith('http')) {
        println token
      }
    }
  }
}

@Test
void testEveryParallel() {
  withPool {
    assert !(words.everyParallel { token ->
      token.length() > 20
    })
  }
}

@Test
void combinedParallel() {
  withPool {
    println words
      .findAllParallel { it.length() > 10 && !it.startsWith('http') }
      .groupByParallel { it.length() }
      .collectParallel { "WORD LENGTH ${it.key}: " + it.value*.toLowerCase().unique() }
  }
}
How it works...
In this test, we sample some of the methods available through the GParsPool class. This class uses a "fork/join" based pool (see http://en.wikipedia.org/wiki/Fork%E2%80%93join_queue) to provide parallel variants of the common Groovy iteration methods, such as each, collect, findAll, and others.
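The split-and-combine behaviour of such a pool can be sketched in plain Java (an illustrative RecursiveTask, not GPars code; the SumTask class and THRESHOLD value are invented for this example):

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// A fork/join task splits itself in half until the chunk is small enough
// to process directly, then the partial results are combined on the way up.
class SumTask extends RecursiveTask<Long> {
    private static final int THRESHOLD = 1_000;
    private final long[] data;
    private final int lo, hi;

    SumTask(long[] data, int lo, int hi) {
        this.data = data;
        this.lo = lo;
        this.hi = hi;
    }

    @Override
    protected Long compute() {
        if (hi - lo <= THRESHOLD) {          // small enough: sum sequentially
            long sum = 0;
            for (int i = lo; i < hi; i++) sum += data[i];
            return sum;
        }
        int mid = (lo + hi) >>> 1;
        SumTask left = new SumTask(data, lo, mid);
        SumTask right = new SumTask(data, mid, hi);
        left.fork();                          // schedule left half on the pool
        return right.compute() + left.join(); // compute right half here, then combine
    }
}

public class ForkJoinSketch {
    public static void main(String[] args) {
        long[] data = new long[10_000];
        for (int i = 0; i < data.length; i++) data[i] = i + 1;
        long sum = ForkJoinPool.commonPool().invoke(new SumTask(data, 0, data.length));
        System.out.println(sum); // prints 50005000
    }
}
```

left.fork() hands one half to the pool while the current thread computes the other half, so idle worker threads can steal the forked subtasks and keep every core busy.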
The tokenize method, in step 2, is used to split a large text downloaded from the Internet into a list of "tokens". To perform this operation, we use the excellent NLP (Natural Language Processing) library from Stanford University, which allows fast and accurate tokenizing of any English text. What really counts here is that we are able to quickly create a large List of values on which we can test some parallel methods. The downloaded text comes from the Project Gutenberg website, a large repository of literary works stored as plain text. We have already used files from Project Gutenberg in the Defining data structures as code in Groovy recipe in Chapter 3, Using Groovy Language Features, and in the Processing every word in a text file recipe from Chapter 4, Working with Files in Groovy.
All the tests require the GParsPool class. The withPool method is statically imported for brevity.
The first test uses eachParallel to traverse the List and print the tokens that meet a certain condition. On an 8-core processor, this method is between 35 and 45 percent faster than its sequential equivalent. The second test uses everyParallel to assert that not every token in the list is longer than 20 characters.
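Such figures are easy to check on your own hardware. A rough, hypothetical measurement harness in plain Java is sketched below (timings vary widely by machine and JIT warm-up, so the printed numbers are indicative only; the token data is synthetic):

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class SpeedupSketch {
    public static void main(String[] args) {
        // Build a synthetic list of one million tokens.
        List<String> words = IntStream.range(0, 1_000_000)
                .mapToObj(i -> "token" + i)
                .collect(Collectors.toList());

        long t0 = System.nanoTime();
        long seq = words.stream().filter(w -> w.hashCode() % 7 == 0).count();
        long t1 = System.nanoTime();
        long par = words.parallelStream().filter(w -> w.hashCode() % 7 == 0).count();
        long t2 = System.nanoTime();

        // The two counts must match; only the elapsed time differs.
        System.out.printf("sequential: %d ms, parallel: %d ms, same result: %b%n",
                (t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000, seq == par);
    }
}
```

A fair benchmark would repeat each run several times to let the JIT warm up; this sketch only illustrates that the parallel variant computes the same result by a different schedule.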
The third test shows a slightly more complex usage of the Parallelizer API and demonstrates how to combine several methods to aggregate data. The list is first filtered by word length, then a grouping is executed on the token length itself, and finally the collectParallel method is used to create a parallel array out of the supplied collection. The previous test would print something like the following:
[WORD LENGTH 22: [-LRB-801-RRB- 596-1887], WORD LENGTH 20: [trademark\/copyright], WORD LENGTH 19: [straightforwardness], WORD LENGTH 18: [commander-in-chief, business@pglaf.org], ...
The original list of tokens is aggregated into a Map, where the key is the word length and the value is a List of words having that length found in the text.
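The same length-keyed aggregation can be imitated with a concurrent collector in plain Java (a sketch with an invented token list, not the recipe's GPars pipeline):

```java
import java.util.List;
import java.util.concurrent.ConcurrentMap;
import java.util.stream.Collectors;

public class GroupByLengthSketch {
    public static void main(String[] args) {
        List<String> tokens = List.of("to", "be", "or", "not", "to", "be");

        // Key: word length; value: the distinct words of that length,
        // grouped concurrently in the spirit of groupByParallel.
        ConcurrentMap<Integer, List<String>> byLength = tokens.parallelStream()
                .distinct()
                .collect(Collectors.groupingByConcurrent(String::length));

        System.out.println(byLength);
    }
}
```

groupingByConcurrent accumulates into a shared ConcurrentMap from all worker threads, so the order of words within each value list is not guaranteed, just as the recipe's output order may vary between runs.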