Recognizing and Predicting the Impact on Human Emotion (Affect) Using Computing Systems
David G. Cooper
University of Massachusetts, Department of Computer Science,
140 Governors Drive, Amherst MA 01003, USA
dcooper@cs.umass.edu
http://www.cs.umass.edu/~dcooper
Abstract. Emotional intelligence is a clear factor in education [1–3], health care [4], and day-to-day interaction. With the increasing use of computer technology, computers are interacting with more and more individuals. This interaction provides an opportunity to increase knowledge about human emotion, for human use, well-being, and improved computer adaptation.
This research makes five main contributions: 1) construct a method for determining a set of sensor features that can be automatically processed to predict emotional changes in observed people; 2) identify principles, algorithms, and classifiers that enable computational recognition of human emotion; 3) apply this method to an intelligent tutoring system instrumented with sensors; 4) apply and adapt the method to audio and video sensors for a number of applications, such as a) detection of psychological disorders, b) detection of emotional changes in health care providers, c) detection of the emotional impact of one person on another during video chat, and/or d) detection of the emotional impact of one fictional character on another in a motion picture; and 5) integrate emotion detection technologies so that they can be used in more realistic settings.
Keywords: emotional interaction, multi-sensor affective processing,
smart environments, actionable affect, social signal processing.
Approach
I intend to research affective processing in three domains: Intelligent Tutoring Systems (ITS), clinical voice analysis, and personal interaction. The method will consist of data exploration over a number of data sets. For the ITS domain, we have already collected data from five different schools, with more than ten classrooms and over 600 students, using between zero and four sensors for one ITS. For the clinical voice analysis we have data from five studies. Three of the studies have a very consistent data-collection protocol; however, the populations and the reasons for collection differ. The other two studies are not as consistently
controlled, and may have more artifacts. The studies include a multicultural study with four cultures and ten subjects from each culture, balanced for gender and age [5]; a study with a Greek population in which individuals were shown pictures meant to elicit an emotion while spontaneous speech was collected; and a study of an examiner interacting with children having typical development, apraxia of speech, or autism. For the personal interaction domain, a study will be developed in either a nursing lab, an architecture critique, or an office interaction, and will use audio and video processing as the source of affective features.
The first domain applies affect detection to the Intelligent Tutoring System (ITS) Wayang Outpost. For Wayang Outpost, data have been collected from more than 600 students using between zero and four sensors in a classroom environment. Along with these data is a sparse set of emotional labels pertaining to four emotional states (Frustration, Confidence, Interest, and Excitement). So far I have shown that linear classifiers can be created that achieve good specificity for Confidence, and good sensitivity for Interest and Excitement, using basic statistics computed on a per-problem basis. Results of a feature selection and ranking are summarized in Table 1; a sketch of this style of evaluation follows the table. This is a follow-on study to [6].
Table 1. Classifier ranking using validation data from the Spring of 2009. Parametric (Tukey HSD) and non-parametric (NPMC) results are shown side by side.

Confident (Specificity)
  Tukey HSD: (confCameraA ∼ confTutorA ∼ confTutorM) > (confSeat ∼ confTutorW) > confBaseline;
             confCameraB > confTutorW > confBaseline
  NPMC:      (confCameraA ∼ confTutorA ∼ confTutorM) > (confSeat ∼ confTutorW) > confBaseline;
             confCameraB > confTutorW > confBaseline

Interested (Sensitivity)
  Tukey HSD: intCamera > intBaseline
  NPMC:      intCamera > intBaseline

Excited (Sensitivity)
  Tukey HSD: ((excCamera > excTutor) ∼ excCameraSeat) > excBaseline
  NPMC:      excCamera > excCameraSeat > excTutor > excBaseline
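As an illustration of this style of per-problem evaluation, the sketch below trains a linear classifier on per-problem sensor summary statistics and reports validation specificity and sensitivity. It uses scikit-learn, and the feature and label setup is hypothetical; the study's actual feature selection and statistical ranking (Tukey HSD, NPMC) are not reproduced here.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix

def specificity_sensitivity(X_train, y_train, X_val, y_val):
    # X: one row per student-problem pair, columns are per-problem
    # summary statistics of a sensor stream (illustrative features).
    # y: a binarized emotion label, e.g. high vs. low Confidence.
    clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    tn, fp, fn, tp = confusion_matrix(y_val, clf.predict(X_val)).ravel()
    specificity = tn / (tn + fp)  # true-negative rate
    sensitivity = tp / (tp + fn)  # true-positive rate
    return specificity, sensitivity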
The next steps include feature improvement, such as finding event-related sensor features, finding 'time series motifs' [7] in the time-series data, and using other sensor-specific methods. In addition, applying more advanced classifiers, such as support vector clustering [8], the group method of data handling [9, 10], decision trees, and random forests, will likely improve the current results.
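To make the motif idea concrete, the following is a brute-force stand-in for the efficient motif-discovery algorithms of [7]: it z-normalizes all fixed-length subsequences of one sensor stream and returns the closest non-overlapping pair (the top motif). The window length is an arbitrary assumption.

import numpy as np

def znorm(x):
    s = x.std()
    return (x - x.mean()) / s if s > 0 else x - x.mean()

def find_motif(series, m):
    # Closest pair of z-normalized length-m subsequences,
    # skipping overlapping (trivial) matches. O(n^2 m) time;
    # [7] describes much faster exact methods.
    n = len(series) - m + 1
    subs = [znorm(series[i:i + m]) for i in range(n)]
    best = (np.inf, -1, -1)
    for i in range(n):
        for j in range(i + m, n):
            d = np.linalg.norm(subs[i] - subs[j])
            if d < best[0]:
                best = (d, i, j)
    return best  # (distance, start_i, start_j)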
In addition, to move to sensors that do not have to be on or near the body, I plan to integrate video- and audio-based emotion detection systems in order to run meaningful experiments on the detection of emotional impact. To that end, I intend both to extract new features from the video in the tutor data (e.g., head position, head motion, looking away) and to utilize new video features
from distal video, such as body and face position relative to another body, and body and hand gestures and articulation, as well as audio features, such as prosody (rate of speech), inflection, and other acoustical changes in speech. The audio features will be explored in the clinical voice domain before they are used in the personal interaction domain.
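To make the audio side concrete, the sketch below computes a few coarse prosodic statistics: a pitch track as an inflection proxy, an energy contour, and an acoustic onset rate as a rough speaking-rate proxy. It assumes a general-purpose audio library (librosa), not the clinical studies' actual tooling, and the frequency bounds are illustrative.

import numpy as np
import librosa

def prosody_features(path):
    y, sr = librosa.load(path, sr=16000)
    # Fundamental frequency track (pitch/inflection proxy);
    # unvoiced frames come back as NaN and are dropped.
    f0, _, _ = librosa.pyin(y, fmin=65.0, fmax=400.0, sr=sr)
    f0 = f0[~np.isnan(f0)]
    # Short-time energy contour.
    rms = librosa.feature.rms(y=y)[0]
    # Crude speaking-rate proxy: acoustic onsets per second.
    onsets = librosa.onset.onset_detect(y=y, sr=sr)
    duration = len(y) / sr
    return {
        "f0_mean": float(f0.mean()) if f0.size else 0.0,
        "f0_range": float(f0.max() - f0.min()) if f0.size else 0.0,
        "rms_mean": float(rms.mean()),
        "onset_rate": len(onsets) / duration,
    }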
Using the Viola-Jones face detector [11] implemented in OpenCV, the faces of the tracked people can be detected, extracted, and sent to a facial feature tracker such as the one [12] used with the Wayang Outpost tutor. The difficulties here are obtaining a connected sequence of faces at a sufficiently high frame rate, and obtaining sufficient resolution. Thus, when the face is too far from any camera, other features will likely need to be relied upon, such as audio features, body gesture, and head position and motion.
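A minimal sketch of this detection step, using OpenCV's Python bindings (the video file name and detector thresholds are illustrative):

import cv2

# Haar cascade shipped with OpenCV (Viola-Jones detector [11]).
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def detect_faces(frame):
    # Return bounding boxes (x, y, w, h) of faces in one frame.
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    gray = cv2.equalizeHist(gray)  # reduce lighting variation
    return cascade.detectMultiScale(gray, scaleFactor=1.1,
                                    minNeighbors=5, minSize=(24, 24))

cap = cv2.VideoCapture("classroom.avi")  # hypothetical recording
while True:
    ok, frame = cap.read()
    if not ok:
        break
    for (x, y, w, h) in detect_faces(frame):
        face = frame[y:y + h, x:x + w]
        # The crop would be passed to a facial feature tracker [12];
        # a small crop signals the face is too distant to be useful.
cap.release()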
There are a number of ways that researchers have categorized the observation of emotion. The two most prevalent are 1) a two-dimensional feature space and 2) a discrete set of emotion classes. The two-dimensional feature space consists of valence (the pleasantness of an experience), ranging from negative to positive, and arousal, ranging from low to high [13]. This two-dimensional space tends to be adequate for generating agreement when placing an affective label, and it has been used in connection with observing facial expressions and physiological features since 1954 [14, 15]. A similar two-dimensional scale, developed by Ralph Bierman for rating personal interaction (PICI), has the dimensions rejecting-accepting and passive-active [16]. They relate to how one individual interacts with another. Though we do not use the valence and arousal dimensions directly, they may become useful factors for audio. In addition, the PICI may be a good first step for looking at personal interaction. In the case of a student interacting with an intelligent tutoring system, this scale may be useful at the extremes of accepting and rejecting; however, the personal and impersonal parts of the scale may be skipped altogether. In the personal interaction domain, the location of the two persons relative to each other could determine proximity, which could imply acceptance vs. rejection, and the ability to detect the amount of motion of each body may indicate activity. Looking at full body motion in video has been done for identification [17] and for estimating interaction cues such as head pose, fidgeting, body pose, etc. [18].
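As a sketch of how such cues might be quantified, the following computes crude proximity and activity measures from two tracked bounding-box sequences (one per person); the mapping onto the PICI dimensions and all normalizations are illustrative assumptions, not a published coding scheme.

import numpy as np

def interaction_cues(boxes_a, boxes_b):
    # Each input is an array of (x, y, w, h) boxes, one per frame.
    centers_a = boxes_a[:, :2] + boxes_a[:, 2:] / 2.0
    centers_b = boxes_b[:, :2] + boxes_b[:, 2:] / 2.0
    # Proximity: inter-person distance normalized by body height
    # (low values would suggest accepting, high values rejecting).
    dist = np.linalg.norm(centers_a - centers_b, axis=1)
    scale = (boxes_a[:, 3] + boxes_b[:, 3]) / 2.0
    proximity = dist / scale
    # Activity: mean frame-to-frame motion of each person's center
    # (a proxy for the passive-active dimension).
    motion_a = np.linalg.norm(np.diff(centers_a, axis=0), axis=1).mean()
    motion_b = np.linalg.norm(np.diff(centers_b, axis=0), axis=1).mean()
    return proximity.mean(), motion_a, motion_b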
The timeline for this research is to perform a nursing student study over the next four to five months, using a lab that is already instrumented and using pre-test and post-test emotional reports as labels. I will spend a few months developing emotional classification methods on the data from the ITS, and then I will spend another few months adapting and applying those methods to the clinical voice data. I will then apply both the voice analysis and the ITS-based video interaction analysis to the nursing study. The goal is to finish this research by May of 2011.
I would appreciate advice on the details of the study to be performed. If there
is a group with an instrumented room that might be interested in this research,
then that would be a great help.