Source file: Andrey Adamovich - Groovy 2 Cookbook - 2013.pdf

Chapter 10

Concurrent Programming in Groovy

In this chapter, we will cover:

- Processing collections concurrently
- Downloading files concurrently
- Splitting a large task into smaller parallel jobs
- Running tasks in parallel and asynchronously
- Using actors to build message-based concurrency
- Using STM to atomically update fields
- Using dataflow variables for lazy evaluation

Introduction

This chapter contains several recipes that deal with concurrent programming. We will examine a number of efficient algorithms and paradigms that take advantage of modern multi-core CPU architectures.

Most of the recipes in this chapter use the GPars (Groovy Parallel System) framework. GPars, which reached v1.0 at the end of 2012, is now part of the Groovy distribution. Its main objective is to abstract away the complexity of parallel programming. GPars offers a range of parallel and concurrent programming tools that have few equals in the JVM ecosystem. Most of the recipes show how to execute tasks in parallel to save time and make the best use of the available resources.


Processing collections concurrently

As mentioned in the introduction, this chapter's recipes rely on the features of the GPars framework.

In this recipe, we look at the Parallelizer, the GPars term for its parallel collections: a set of additional methods that GPars adds to the Groovy collection framework to enable data-parallel processing.
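A minimal sketch of the idea, assuming GPars is on the classpath (here it is fetched with @Grab; the 1.2.1 version is an assumption, while the build file in the Getting ready section pins 1.0.0). Inside a GParsPool.withPool block, ordinary collections gain parallel variants of the familiar iteration methods:

```groovy
@Grab('org.codehaus.gpars:gpars:1.2.1')
import groovyx.gpars.GParsPool

def numbers = (1..20).toList()
def squaresOfEvens

GParsPool.withPool {
    // Inside withPool, GPars adds the *Parallel methods to collections
    def evens = numbers.findAllParallel { it % 2 == 0 }   // filter on the fork/join pool
    squaresOfEvens = evens.collectParallel { it * it }    // transform on the fork/join pool
}

println squaresOfEvens
```

Outside the withPool block the *Parallel methods are not available, which keeps the parallel code clearly delimited.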

Getting ready

We start by setting up the Gradle build (see the Integrating Groovy into the build process using Gradle recipe in Chapter 2, Using Groovy Ecosystem) and the folder structure that we will reuse across the recipes of this chapter. In a new folder aptly called parallel, create a build.gradle file with the following content:

apply plugin: 'groovy'

repositories {
    mavenCentral()
}

dependencies {
    compile 'org.codehaus.groovy:groovy-all:2.1.6'
    compile 'org.codehaus.gpars:gpars:1.0.0'
    compile 'com.google.guava:guava:14.0.1'
    compile group: 'org.codehaus.groovy.modules.http-builder',
            name: 'http-builder',
            version: '0.6'
    compile('org.multiverse:multiverse-beta:0.7-RC-1') {
        transitive = false
    }
    testCompile 'junit:junit:4.+'
    testCompile 'edu.stanford.nlp:stanford-corenlp:1.3.5'
}

Some dependencies may look obscure, but each one is explained in the recipe that uses it. The GPars dependency follows the Groovy one (note that GPars 1.0.0 is already bundled with Groovy, in the lib folder of Groovy's binary distribution).

Before delving into the code, we also need to create a folder structure to hold the classes and the tests. Create the following structure in the same folder where the build file resides:

src/main/groovy/org/groovy/cookbook

src/test/groovy/org/groovy/cookbook


How to do it...

In the following steps, we are going to fill our sample project structure with code.

1. Create a unit test named ParallelizerTest.groovy in the new src/test/groovy/org/groovy/cookbook directory. The test class will contain tests that sample the various parallel methods available in the Parallelizer framework:

package org.groovy.cookbook

import static groovyx.gpars.GParsPool.*

import org.junit.*

import edu.stanford.nlp.process.PTBTokenizer
import edu.stanford.nlp.process.CoreLabelTokenFactory
import edu.stanford.nlp.ling.CoreLabel

class ParallelizerTest {

    static words = []

    ...

}

2. Next, add a setup method that downloads a large text and a helper method that splits it into tokens, giving us a large collection of test data:

@BeforeClass
static void loadDict() {
    def libraryUrl = 'http://www.gutenberg.org/cache/epub/'
    def bookFile = '17405/pg17405.txt'
    // Download a full-length book from Project Gutenberg
    def bigText = "${libraryUrl}${bookFile}".toURL()
    words = tokenize(bigText.text)
}

// Splits the text into tokens with Stanford's Penn Treebank tokenizer
static tokenize(String txt) {
    List<String> tokens = []
    PTBTokenizer ptbt = new PTBTokenizer(
        new StringReader(txt),
        new CoreLabelTokenFactory(),
        ''   // no extra tokenizer options
    )
    ptbt.each { entry ->
        tokens << entry.value()
    }
    tokens
}

3. Finally, add the tests:

@Test
void testParallelEach() {
    withPool {
        // Print long tokens that are not URLs; output order is not deterministic
        words.eachParallel { token ->
            if (token.length() > 10 && !token.startsWith('http')) {
                println token
            }
        }
    }
}

@Test
void testEveryParallel() {
    withPool {
        // Not every token is longer than 20 characters
        assert !(words.everyParallel { token ->
            token.length() > 20
        })
    }
}

@Test
void combinedParallel() {
    withPool {
        // Filter, group by length, then summarize each group
        println words
            .findAllParallel { it.length() > 10 && !it.startsWith('http') }
            .groupByParallel { it.length() }
            .collectParallel { "WORD LENGTH ${it.key}: " + it.value*.toLowerCase().unique() }
    }
}
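testEveryParallel relies on everyParallel, the parallel counterpart of Groovy's every; GPars also offers anyParallel, the counterpart of any. The following standalone sketch contrasts the two on a small hypothetical token list (GPars is fetched with @Grab; the 1.2.1 version is an assumption):

```groovy
@Grab('org.codehaus.gpars:gpars:1.2.1')
import groovyx.gpars.GParsPool

// Hypothetical sample tokens
def tokens = ['fork', 'join', 'pool', 'straightforwardness']
def allShort, anyLong

GParsPool.withPool {
    // true only if the predicate holds for every element
    allShort = tokens.everyParallel { it.length() <= 10 }
    // true if the predicate holds for at least one element
    anyLong = tokens.anyParallel { it.length() > 10 }
}

println "allShort=${allShort}, anyLong=${anyLong}"
```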


How it works...

These tests sample some of the methods available through the GParsPool class. This class uses a fork/join-based pool (see http://en.wikipedia.org/wiki/Fork%E2%80%93join_queue) to provide parallel variants of common Groovy iteration methods such as each, collect, findAll, and others.
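GParsPool.withPool also accepts the number of threads for the fork/join pool it creates. A minimal sketch with hypothetical sample words, calling withPool through the class name (no static import) and requesting four threads; GPars is fetched with @Grab, and the 1.2.1 version is an assumption:

```groovy
@Grab('org.codehaus.gpars:gpars:1.2.1')
import groovyx.gpars.GParsPool

// Hypothetical sample words
def words = ['fork', 'join', 'parallel', 'collections', 'groovy']
def lengths

// Request a pool of exactly 4 worker threads instead of the default size
GParsPool.withPool(4) {
    lengths = words.collectParallel { it.length() }
}

println lengths
```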

The tokenize method in step 2 splits a large text downloaded from the Internet into a list of "tokens". To perform this operation, we use the NLP (Natural Language Processing) library from Stanford University, which tokenizes English text quickly and accurately. What really matters here is that we can quickly create a large List of values on which to test the parallel methods. The downloaded text comes from the Project Gutenberg website, a large repository of literary works stored in plain text. We have already used files from Project Gutenberg in the Defining data structures as code in Groovy recipe in Chapter 3, Using Groovy Language Features, and in the Processing every word in a text file recipe in Chapter 4, Working with Files in Groovy.
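If the network or the Stanford library is unavailable, a crude tokenizer is enough to produce a large List for experimenting with the parallel methods. The sketch below swaps in a plain regular-expression split; simpleTokenize is a hypothetical helper, not part of the recipe, and it does not follow Penn Treebank conventions:

```groovy
// Hypothetical stand-in for Stanford's PTBTokenizer: splits on whitespace
// and common punctuation. Unlike PTB tokenization, "(" is simply dropped
// rather than turned into -LRB-.
List<String> simpleTokenize(String txt) {
    txt.split(/[\s,;:!?."()]+/).findAll { it } as List
}

def sample = 'Hello, world! (Groovy) makes "parallel" code easy.'
def tokens = simpleTokenize(sample)
println tokens
```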

All the tests require the GParsPool class. The withPool method is statically imported for brevity.

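The first test uses eachParallel to traverse the List and print every token that is longer than 10 characters and does not start with http. Because the closure runs on several threads at once, the tokens are printed in no particular order. On an 8-core processor, this method is between 35 and 45 percent faster than the sequential equivalent. The second test uses everyParallel to assert that not every token is longer than 20 characters.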
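To compare each and eachParallel on your own hardware, a rough timing sketch follows. The workload (200,000 synthetic strings) is hypothetical, there is no JIT warm-up, and the parallel timing includes creating and shutting down the pool, so the numbers it prints are only indicative and are not the measurement behind the figures quoted above. GPars is fetched with @Grab; the 1.2.1 version is an assumption.

```groovy
@Grab('org.codehaus.gpars:gpars:1.2.1')
import groovyx.gpars.GParsPool
import java.util.concurrent.atomic.AtomicInteger

// Hypothetical synthetic workload: "token1" .. "token200000"
def tokens = (1..200_000).collect { "token$it".toString() }

// AtomicInteger, because eachParallel runs the closure on several threads at once
def sequentialHits = new AtomicInteger()
def parallelHits = new AtomicInteger()

long start = System.nanoTime()
tokens.each { if (it.length() > 10) sequentialHits.incrementAndGet() }
long sequentialMs = (System.nanoTime() - start).intdiv(1_000_000)

start = System.nanoTime()
GParsPool.withPool {
    tokens.eachParallel { if (it.length() > 10) parallelHits.incrementAndGet() }
}
long parallelMs = (System.nanoTime() - start).intdiv(1_000_000)

println "each: ${sequentialMs} ms, eachParallel: ${parallelMs} ms"
```

A plain int counter would be a data race here; the atomic counter keeps both passes comparable, so any difference in hit counts would signal a bug rather than a timing effect.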

The third test shows a slightly more complex usage of the Parallelizer API and demonstrates how to combine several methods to aggregate data. The list is first filtered by word length (dropping URLs), then grouped by token length, and finally collectParallel turns each group into a descriptive string. The test prints something like the following:

[WORD LENGTH 22: [-LRB-801-RRB- 596-1887], WORD LENGTH 20: [trademark\/copyright], WORD LENGTH 19: [straightforwardness], WORD LENGTH 18: [commander-in-chief, business@pglaf.org],...

The groupByParallel step aggregates the original list of tokens into a Map whose keys are word lengths and whose values are the Lists of words of that length found in the text; collectParallel then renders each Map entry as one element of the printed list.
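To see the shape of the data at each stage, here is the same pipeline written with Groovy's sequential findAll, groupBy, and collect on a small hypothetical token list (not taken from the Gutenberg text). The parallel version in combinedParallel should produce the same groups, although the order of the entries, and of the words within a group, may differ.

```groovy
// Hypothetical sample tokens standing in for the downloaded book
def tokens = ['Straightforwardness', 'commander-in-chief', 'Extraordinary',
              'http://www.gutenberg.org', 'extraordinary', 'short', 'Unquestionably']

// 1. keep long tokens that are not URLs
def longWords = tokens.findAll { it.length() > 10 && !it.startsWith('http') }

// 2. Map<Integer, List<String>>: word length -> words of that length
def byLength = longWords.groupBy { it.length() }

// 3. one summary string per Map entry, lower-cased and de-duplicated
def summary = byLength.collect { "WORD LENGTH ${it.key}: " + it.value*.toLowerCase().unique() }

println byLength
println summary
```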
