
Recognizing and Predicting the Impact on Human Emotion (Affect) Using Computing Systems

David G. Cooper

University of Massachusetts, Department of Computer Science,

140 Governors Drive, Amherst MA 01003, USA

dcooper@cs.umass.edu

http://www.cs.umass.edu/~dcooper

Abstract. Emotional intelligence is a clear factor in education [1-3], health care [4], and day-to-day interaction. With the increasing use of computer technology, computers are interacting with more and more individuals. This interaction provides an opportunity to increase knowledge about human emotion for human consumption, well-being, and improved computer adaptation.

This research makes five main contributions. 1) Construct a method for determining a set of sensor features that can be automatically processed to predict human emotional changes in observed people. 2) Identify principles, algorithms, and classifiers that enable computational recognition of human emotion. 3) Apply this method to an intelligent tutoring system instrumented with sensors. 4) Apply and adapt the method to audio and video sensors for a number of applications, such as a) detection of psychological disorders, b) detection of emotional changes in health care providers, c) detection of the emotional impact of one person on another during video chat, and/or d) detection of the emotional impact of one fictional character on another in a motion picture. 5) Integrate emotion detection technologies so that they can be used in more realistic settings.

Keywords: emotional interaction, multi-sensor affective processing, smart environments, actionable affect, social signal processing.

Approach

I intend to research affective processing in three domains: Intelligent Tutoring Systems (ITS), clinical voice analysis, and personal interaction. The method will consist of data exploration over a number of data sets. For the ITS domain, we have already collected data from five different schools with more than ten classrooms and over 600 students, using between 0 and 4 sensors for one ITS. For the clinical voice analysis we have data from five studies. Three of the studies have a very consistent data-collection protocol; however, the population and the reason for collection differ. The other two studies are not as consistently controlled and may have more artifacts. The studies are: a multicultural study with 4 different cultures and 10 subjects from each culture, balanced for gender and age [5]; a study with a Greek population in which individuals were shown pictures meant to elicit an emotion and spontaneous speech was collected; and an additional study with an examiner and a child with typical development, apraxia of speech, or autism. For the personal interaction domain, a study will be developed in either a nursing lab, an architecture critique, or an office interaction, and will use audio and video processing as the source of affective features.

The first domain uses affect for the Intelligent Tutoring System (ITS) Wayang Outpost. For Wayang Outpost, data has been collected for more than 600 students using between 0 and 4 sensors in a classroom environment. Along with these data comes a sparse set of emotional labels pertaining to 4 different emotional states (Frustration, Confidence, Interest, and Excitement). So far I have shown that linear classifiers can be created that achieve good Specificity for Confidence and good Sensitivity for Interest and Excitement, using basic statistics on a per-problem basis. Results of a feature selection and ranking are summarized in Table 1. This is a follow-on study to [6].

Table 1. Classifier ranking using validation data from the Spring of 2009. Parametric (Tukey HSD) and non-parametric (NPMC) results are shown side by side.

Confident (Specificity)
  Tukey HSD: (confCameraA, confTutorA, confTutorM) > (confSeat, confTutorW) > confBaseline; confCameraB > confTutorW > confBaseline
  NPMC:      (confCameraA, confTutorA, confTutorM) > (confSeat, confTutorW) > confBaseline; confCameraB > confTutorW > confBaseline

Interested (Sensitivity)
  Tukey HSD: intCamera > intBaseline
  NPMC:      intCamera > intBaseline

Excited (Sensitivity)
  Tukey HSD: ((excCamera > excTutor), excCameraSeat) > excBaseline
  NPMC:      excCamera > excCameraSeat > excTutor > excBaseline
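The following is a minimal sketch of how such a per-problem linear classifier and its Specificity and Sensitivity might be evaluated. The feature matrix, labels, and train/validation split are hypothetical placeholders, not the actual Wayang Outpost pipeline or its sensor features.

```python
# Minimal sketch: train a linear classifier on per-problem sensor statistics
# and report Specificity (true negative rate) and Sensitivity (true positive rate).
# X and y are synthetic placeholders; the real study uses per-problem statistics
# from camera, seat, wrist, and mouse sensor streams.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix

rng = np.random.default_rng(0)
X = rng.normal(size=(600, 12))            # 600 examples x 12 per-problem sensor features
y = (X[:, 0] + 0.5 * X[:, 3] + rng.normal(scale=0.5, size=600)) > 0  # e.g. "confident" vs. not

X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
tn, fp, fn, tp = confusion_matrix(y_val, clf.predict(X_val)).ravel()

specificity = tn / (tn + fp)   # how well "not confident" problems are recognized
sensitivity = tp / (tp + fn)   # how well "confident" problems are recognized
print(f"specificity={specificity:.2f}  sensitivity={sensitivity:.2f}")
```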

The next steps include feature improvement, such as finding event-related sensor features, finding 'time series motifs' [7] in the time-series data, and using other sensor-specific methods; in addition, applying more advanced classifiers such as support vector clustering [8], the group method of data handling [9, 10], decision trees, and random forests will likely improve the current results.
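As an illustration of the time-series motif idea, the sketch below performs a brute-force search for the closest pair of non-overlapping, z-normalized subsequences in a single sensor stream. The stream and window length are made up for the example; a practical implementation would use the more efficient motif algorithms cited in [7].

```python
# Minimal sketch of brute-force time-series motif discovery: find the pair of
# non-overlapping windows of length m with the smallest z-normalized Euclidean
# distance. The sensor stream is synthetic; efficient motif algorithms avoid
# the O(n^2) pairwise scan used here.
import numpy as np

def znorm(w):
    s = w.std()
    return (w - w.mean()) / s if s > 0 else w - w.mean()

def find_motif(x, m):
    n = len(x) - m + 1
    windows = [znorm(x[i:i + m]) for i in range(n)]
    best = (np.inf, None, None)
    for i in range(n):
        for j in range(i + m, n):          # skip overlapping (trivial) matches
            d = np.linalg.norm(windows[i] - windows[j])
            if d < best[0]:
                best = (d, i, j)
    return best                             # (distance, start_i, start_j)

stream = np.sin(np.linspace(0, 20, 500)) + 0.1 * np.random.default_rng(1).normal(size=500)
print(find_motif(stream, m=25))
```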

In addition, in order to move to sensors that do not have to be on or near the body, I plan to integrate video- and audio-based emotion detection systems in order to run meaningful experiments on the detection of emotional impact. To that end, I intend both to extract new features from the video in the tutor data (e.g. head position, head motion, looking away) and to utilize new video features from distal video, such as body and face position relative to another body, body and hand gestures and articulation, as well as audio features such as prosody (rate of speech), inflection, and other acoustic changes in speech. The audio features will be explored in the clinical voice domain before they are used in the personal interaction domain.
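As a rough illustration of the kind of prosodic features intended here, the sketch below uses the open-source librosa library to estimate a speech-rate proxy (onset rate), pitch statistics, and energy variation from a recording. The file name is a placeholder, and these features are simplifications of what a clinical voice pipeline would actually use.

```python
# Minimal sketch of prosodic feature extraction with librosa (assumed available).
# "speech.wav" is a placeholder file; onset rate is only a crude proxy for rate
# of speech, and pitch statistics stand in for inflection.
import numpy as np
import librosa

y, sr = librosa.load("speech.wav", sr=16000)
duration = len(y) / sr

# Speech-rate proxy: acoustic onsets per second.
onsets = librosa.onset.onset_detect(y=y, sr=sr, units="time")
onset_rate = len(onsets) / duration

# Inflection proxy: fundamental-frequency statistics from the YIN pitch tracker.
f0 = librosa.yin(y, fmin=60.0, fmax=400.0, sr=sr)
f0 = f0[np.isfinite(f0)]
pitch_mean, pitch_std = f0.mean(), f0.std()

# Energy variation: frame-level RMS.
rms = librosa.feature.rms(y=y)[0]

print(f"onsets/s={onset_rate:.2f}  f0 mean={pitch_mean:.1f} Hz  "
      f"f0 sd={pitch_std:.1f} Hz  rms sd={rms.std():.4f}")
```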

Using the Viola-Jones face detector [11] implemented in OpenCV, the faces of the tracked people can be detected, extracted, and sent to a facial feature tracker such as the one [12] used with the Wayang Outpost tutor. The difficulties here are getting a connected sequence of faces at a fast enough frame rate and with sufficient resolution. Thus, when a face is too far from any camera, other features will likely need to be relied upon, such as audio features, body gesture, and head position and motion.
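A minimal sketch of the OpenCV-based Viola-Jones step appears below; the video path is a placeholder, and the downstream facial feature tracker is only indicated by a comment.

```python
# Minimal sketch: Viola-Jones face detection with OpenCV's bundled Haar cascade.
# "classroom.mp4" is a placeholder; detected face crops would be passed on to a
# facial feature tracker, and frames with no (or too small a) face would fall back
# to other features such as audio, body gesture, or head position and motion.
import cv2

cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

cap = cv2.VideoCapture("classroom.mp4")
while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5,
                                     minSize=(48, 48))
    for (x, y, w, h) in faces:
        face_crop = frame[y:y + h, x:x + w]   # would be sent to the feature tracker
cap.release()
```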

There are a number of ways that researchers have categorized the observation of emotion. The two most prevalent are 1) a two-dimensional feature space and 2) a discrete set of emotion classes. The two-dimensional feature space consists of valence (the pleasantness of an experience), ranging from negative to positive, and arousal, ranging from low to high [13]. This two-dimensional space tends to be adequate for generating agreement when placing an affective label on it, and it has been used in connection with observing facial expressions and physiological features since 1954 [14, 15].

A similar two-dimensional scale for personal interaction (PICI) was developed by Ralph Bierman [16]. Its two dimensions are rejecting-accepting and passive-active, and they describe how one individual is interacting with another. Though we do not use the valence and arousal dimensions directly, they may become useful factors for audio; in addition, the PICI may be a good first step for looking at personal interaction. In the case of a student interacting with an intelligent tutoring system, this scale may be useful at the extremes of accepting and rejecting; however, the personal and impersonal parts of the scale may be skipped altogether. In the personal interaction domain, the location of the two persons relative to each other (proximity) could imply acceptance vs. rejection, and the ability to detect the amount of motion of each body may indicate activity. Looking at full-body motion in video has been done for identification [17] and for estimating interaction cues such as head pose, fidgeting, body pose, etc. [18].
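To make the proximity/activity idea concrete, here is a small sketch that turns per-frame body bounding boxes from a tracker into two crude scores: one for rejecting-accepting (how close the two bodies are) and one for passive-active (how much each body moves). The box format, scale, and normalization constants are all assumptions made for illustration, not calibrated values from any of the studies described above.

```python
# Minimal sketch: map tracked body bounding boxes of two people to rough
# PICI-style scores. Boxes are (x, y, w, h) in pixels per frame; the
# normalization constants are arbitrary assumptions.
import numpy as np

def center(box):
    x, y, w, h = box
    return np.array([x + w / 2.0, y + h / 2.0])

def interaction_scores(boxes_a, boxes_b, frame_width):
    ca = np.array([center(b) for b in boxes_a])
    cb = np.array([center(b) for b in boxes_b])
    # Accepting proxy: 1 when the two bodies are together, 0 when far apart.
    dist = np.linalg.norm(ca - cb, axis=1).mean()
    accepting = 1.0 - min(dist / frame_width, 1.0)
    # Active proxy: average per-frame displacement of both body centers.
    motion = (np.linalg.norm(np.diff(ca, axis=0), axis=1).mean() +
              np.linalg.norm(np.diff(cb, axis=0), axis=1).mean()) / 2.0
    active = min(motion / 20.0, 1.0)        # 20 px/frame treated as "very active"
    return accepting, active

# Hypothetical tracker output for a short clip.
a = [(100 + 2 * t, 200, 80, 180) for t in range(30)]
b = [(400 - 1 * t, 210, 80, 180) for t in range(30)]
print(interaction_scores(a, b, frame_width=640))
```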

The timeline for this research is to perform a nursing student study over the next four to five months, using a lab that is already instrumented and using pre-test and post-test emotional reports as labels. I will spend a few months developing emotional classification methods on the data from the ITS, and then I will spend another few months adapting and applying those methods to the clinical voice data. I will then apply both the voice analysis and the ITS-based video interaction analysis to the nursing study. The goal is to finish this research by May of 2011.

I would appreciate advice on the details of the study to be performed. If there is a group with an instrumented room that might be interested in this research, then that would be a great help.

Modeling Long-Term Search Engine Usage

Ryen W. White, Ashish Kapoor, and Susan T. Dumais

Microsoft Research,

One Microsoft Way, Redmond WA 98052, USA

{ryenw,akapoor,sdumais}@microsoft.com

Abstract. Search engines are key components in the online world, and the choice of search engine is an important determinant of the user experience. In this work we seek to model user behaviors and determine key variables that affect search engine usage. In particular, we study the engine usage behavior of more than ten thousand users over a period of six months and use machine learning techniques to identify key trends in the usage of search engines and their relationship with user satisfaction. We also explore methods to determine indicators that are predictive of user trends and show that accurate predictive user models of search engine usage can be developed. Our findings have implications for users as well as search engine designers and marketers seeking to better understand and retain their users.

Keywords: Search Engine, Predictive Model.
