Results 1 - 10 of 46
Social Signal Processing: Survey of an Emerging Domain
, 2008
"... The ability to understand and manage social signals of a person we are communicating with is the core of social intelligence. Social intelligence is a facet of human intelligence that has been argued to be indispensable and perhaps the most important for success in life. This paper argues that next- ..."
Abstract
-
Cited by 153 (32 self)
- Add to MetaCart
The ability to understand and manage the social signals of a person we are communicating with is the core of social intelligence. Social intelligence is a facet of human intelligence that has been argued to be indispensable, and perhaps the most important, for success in life. This paper argues that next-generation computing needs to include the essence of social intelligence – the ability to recognize human social signals and social behaviours such as turn taking, politeness, and disagreement – in order to become more effective and more efficient. Although each of us understands the importance of social signals in everyday life, and in spite of recent advances in machine analysis of relevant behavioural cues such as blinks, smiles, crossed arms, and laughter, the design and development of automated systems for Social Signal Processing (SSP) remain difficult. This paper surveys past efforts to solve these problems by computer, summarizes the relevant findings in social psychology, and proposes a set of recommendations for enabling the development of the next generation of socially aware computing.
Human-Centred Intelligent Human-Computer Interaction (HCI²): . . .
, 2008
"... A widely accepted prediction is that computing will move to the background, weaving itself into the fabric of our everyday living spaces and projecting the human user into the foreground. To realise this prediction, next-generation computing should develop anticipatory user interfaces that are hum ..."
Abstract
-
Cited by 33 (16 self)
- Add to MetaCart
A widely accepted prediction is that computing will move to the background, weaving itself into the fabric of our everyday living spaces and projecting the human user into the foreground. To realise this prediction, next-generation computing should develop anticipatory user interfaces that are human-centred: built for humans and based on naturally occurring multimodal human communication. These interfaces should transcend the traditional keyboard and mouse and have the capacity to understand and emulate human communicative intentions as expressed through behavioural cues, such as affective and social signals. This article discusses how close we are to the goal of human-centred computing and Human-Centred Intelligent Human-Computer Interaction (HCI²) that can understand and respond to multimodal human communication.
Continuous prediction of spontaneous affect from multiple cues and modalities in valence-arousal space
- IEEE Transactions on Affective Computing
, 2011
"... Abstract—Past research in analysis of human affect has focused on recognition of prototypic expressions of six basic emotions based on posed data acquired in laboratory settings. Recently, there has been a shift toward subtle, continuous, and context-specific interpretations of affective displays re ..."
Abstract
-
Cited by 33 (5 self)
- Add to MetaCart
(Show Context)
Past research in the analysis of human affect has focused on recognition of prototypic expressions of six basic emotions based on posed data acquired in laboratory settings. Recently, there has been a shift toward subtle, continuous, and context-specific interpretations of affective displays recorded in naturalistic and real-world settings, and toward multimodal analysis and recognition of human affect. Converging with this shift, this paper presents, to the best of our knowledge, the first approach in the literature that: 1) fuses facial expression, shoulder gesture, and audio cues for dimensional and continuous prediction of emotions in valence-arousal space; 2) compares the performance of two state-of-the-art machine learning techniques applied to the target problem, bidirectional Long Short-Term Memory neural networks (BLSTM-NNs) and Support Vector Machines for Regression (SVR); and 3) proposes an output-associative fusion framework that incorporates correlations and covariances between the emotion dimensions. The proposed approach has been evaluated using spontaneous SAL data from four subjects and subject-dependent leave-one-sequence-out cross-validation. The experimental results show that: 1) on average, BLSTM-NNs outperform SVR due to their ability to learn past and future context; 2) the proposed output-associative fusion framework outperforms feature-level and model-level fusion by modelling and learning correlations and patterns between the valence and arousal dimensions; and 3) the proposed system reproduces well the valence and arousal ground truth obtained from human coders. Index Terms—Dimensional affect recognition, continuous affect prediction, valence and arousal dimensions, facial expressions, shoulder gestures, emotional acoustic signals, multicue and multimodal fusion, output-associative fusion.
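The output-associative fusion idea in this abstract lends itself to a short illustration. The sketch below is a minimal reading of it, assuming scikit-learn-style SVR regressors (the paper also evaluates BLSTM-NNs, which are not shown): per-dimension first-stage predictions are fed back as inputs, so each dimension's final estimate can exploit valence-arousal correlation. The feature arrays, function names, and two-stage layout are illustrative assumptions, not the authors' exact pipeline.

```python
import numpy as np
from sklearn.svm import SVR

def fit_output_associative(X_train, y_train):
    """y_train: (n_frames, 2) array of [valence, arousal] labels."""
    # Stage 1: one regressor per emotion dimension on the raw features.
    stage1 = [SVR(kernel="rbf").fit(X_train, y_train[:, d]) for d in range(2)]
    # Stage-1 outputs (both dimensions) become the inputs of stage 2.
    Z = np.column_stack([m.predict(X_train) for m in stage1])
    # Stage 2: each dimension is re-predicted from BOTH stage-1 outputs,
    # so valence-arousal covariance can inform the final estimate.
    stage2 = [SVR(kernel="rbf").fit(Z, y_train[:, d]) for d in range(2)]
    return stage1, stage2

def predict_output_associative(models, X):
    stage1, stage2 = models
    Z = np.column_stack([m.predict(X) for m in stage1])
    return np.column_stack([m.predict(Z) for m in stage2])
```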
Social Signal Processing: State-of-the-art and future perspectives of an emerging domain
- In Proceedings of the ACM International Conference on Multimedia
, 2008
"... The ability to understand and manage social signals of a person we are communicating with is the core of social intelligence. Social intelligence is a facet of human intelligence that has been argued to be indispensable and perhaps the most important for success in life. This paper argues that next- ..."
Abstract
-
Cited by 27 (7 self)
- Add to MetaCart
(Show Context)
The ability to understand and manage the social signals of a person we are communicating with is the core of social intelligence. Social intelligence is a facet of human intelligence that has been argued to be indispensable, and perhaps the most important, for success in life. This paper argues that next-generation computing needs to include the essence of social intelligence – the ability to recognize human social signals and social behaviours such as politeness and disagreement – in order to become more effective and more efficient. Although each of us understands the importance of social signals in everyday life, and in spite of recent advances in machine analysis of relevant behavioural cues such as blinks, smiles, crossed arms, and laughter, the design and development of automated systems for Social Signal Processing (SSP) remain difficult. This paper surveys past efforts to solve these problems by computer, summarizes the relevant findings in social psychology, and proposes a set of recommendations for enabling the development of the next generation of socially aware computing.
Support Vector Regression for Automatic Recognition of Spontaneous Emotions in Speech
"... We present novel methods for estimating spontaneously expressed emotions in speech. Three continuous-valued emotion primitives are used to describe emotions, namely valence, activation, and dominance. For the estimation of these primitives, Support Vector Machines (SVMs) are used in their applicatio ..."
Abstract
-
Cited by 25 (4 self)
- Add to MetaCart
(Show Context)
We present novel methods for estimating spontaneously expressed emotions in speech. Three continuous-valued emotion primitives are used to describe emotions, namely valence, activation, and dominance. For the estimation of these primitives, Support Vector Machines (SVMs) are used in their regression form (Support Vector Regression, SVR). Feature selection and parameter optimization are studied. The data was recorded from 47 speakers in a German TV talk show. The results were compared to a rule-based Fuzzy Logic classifier and a Fuzzy k-Nearest Neighbor classifier. SVR was found to give the best results and to be well suited for emotion estimation, yielding small estimation errors and high correlation between estimates and reference labels.
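As a rough sketch of the setup this abstract describes (one SVR per emotion primitive, with grid-searched hyperparameters), something like the following could be used. The acoustic feature matrix, the grid values, and the scoring choice are placeholder assumptions; the paper's actual feature set and optimization procedure are not reproduced here.

```python
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

PRIMITIVES = ["valence", "activation", "dominance"]

def train_primitive_estimators(X, labels):
    """labels: dict mapping primitive name -> array of continuous targets."""
    grid = {"svr__C": [0.1, 1, 10], "svr__epsilon": [0.01, 0.1]}
    models = {}
    for name in PRIMITIVES:
        # Standardize acoustic features, then regress with an RBF-kernel SVR.
        pipe = make_pipeline(StandardScaler(), SVR(kernel="rbf"))
        search = GridSearchCV(pipe, grid, cv=5,
                              scoring="neg_mean_absolute_error")
        models[name] = search.fit(X, labels[name])
    return models
```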
The sensitive artificial listener: an induction technique for generating emotionally coloured conversation
- in LREC2008 Workshop on Corpora for Research on Emotion and Affect
, 2008
"... The aim of the paper is to document and share an induction technique (The Sensitive Artificial Listener) that generates data that can be both tractable and reasonably naturalistic. The technique focuses on conversation between a human and an agent that either is or appears to be a machine. It is des ..."
Abstract
-
Cited by 25 (6 self)
- Add to MetaCart
The aim of the paper is to document and share an induction technique (The Sensitive Artificial Listener) that generates data that can be both tractable and reasonably naturalistic. The technique focuses on conversation between a human and an agent that either is or appears to be a machine. It is designed to capture a broad spectrum of emotional states, expressed in ‘emotionally coloured discourse’ of the type likely to be displayed in everyday conversation. The technique is based on the observation that it is possible for two people to have a conversation in which one pays little or no attention to the meaning of what the other says, and chooses responses on the basis of superficial cues. In SAL, system responses take the form of a repertoire of stock phrases keyed to the emotional colouring of what the user says. The technique has been used to collect data of sufficient quantity and quality to train machine recognition systems.
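To make the response mechanism concrete, a toy sketch of stock-phrase selection keyed to emotional colouring might look as follows. The colouring labels and phrases are invented for illustration; the actual SAL repertoire and its character personalities are not shown.

```python
import random

# Toy repertoire: phrases keyed only to the emotional colouring of the
# user's utterance; labels and phrases here are invented for illustration.
STOCK_PHRASES = {
    "positive": ["That sounds wonderful!", "Go on, this is great."],
    "negative": ["Oh dear, that's not good.", "That must be hard."],
    "neutral":  ["I see.", "Tell me more."],
}

def respond(estimated_colouring):
    # Superficial cue in, stock phrase out: no semantic understanding needed.
    phrases = STOCK_PHRASES.get(estimated_colouring, STOCK_PHRASES["neutral"])
    return random.choice(phrases)
```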
Audio-Visual Affect Recognition
- IEEE Transactions on Multimedia
, 2007
"... Abstract—The ability of a computer to detect and appropriately respond to changes in a user’s affective state has significant implications to Human–Computer Interaction (HCI). In this paper, we present our efforts toward audio–visual affect recognition on 11 affective states customized for HCI appli ..."
Abstract
-
Cited by 17 (2 self)
- Add to MetaCart
(Show Context)
The ability of a computer to detect and appropriately respond to changes in a user’s affective state has significant implications for Human-Computer Interaction (HCI). In this paper, we present our efforts toward audio-visual affect recognition of 11 affective states customized for HCI applications (four cognitive/motivational and seven basic affective states), using data from 20 non-actor subjects. A smoothing method is proposed to reduce the detrimental influence of speech on facial expression recognition. The feature selection analysis shows that, while speaking, subjects tend to express affect through brow movement in the face and through pitch and energy in prosody. For person-dependent recognition, we apply a voting method to combine the frame-based classification results from the audio and visual channels, which yields a 7.5% improvement over the best unimodal performance. For the person-independent test, we apply a multistream HMM to combine the information from multiple component streams, which yields a 6.1% improvement over the best component performance. Index Terms—Affect recognition, affective computing, emotion recognition, multimodal human-computer interaction.
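The person-dependent voting step described above is simple enough to sketch. Below is a hedged illustration of majority voting over frame-level decisions from the two channels; the label encoding and the equal channel weighting are assumptions, not details from the paper.

```python
from collections import Counter

def voting_fusion(audio_frame_labels, visual_frame_labels):
    """Majority vote over per-frame class labels from both channels."""
    # Pool the frame-level votes from audio and visual classifiers.
    votes = Counter(audio_frame_labels) + Counter(visual_frame_labels)
    label, _count = votes.most_common(1)[0]
    return label

# usage sketch:
# voting_fusion(["joy", "joy", "neutral"], ["joy", "frustration", "joy"])
# -> "joy"
```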
String-based audiovisual fusion of behavioural events for the assessment of dimensional affect
- in Proc. IEEE Intl. Conf. Automatic Face and Gesture Recognition
, 2011
"... Abstract — The automatic assessment of affect is mostly based on feature-level approaches, such as distances between facial points or prosodic and spectral information when it comes to audiovisual analysis. However, it is known and intuitive that behavioural events such as smiles, head shakes or lau ..."
Abstract
-
Cited by 13 (6 self)
- Add to MetaCart
(Show Context)
The automatic assessment of affect is mostly based on feature-level approaches, such as distances between facial points, or prosodic and spectral information when it comes to audiovisual analysis. However, it is known and intuitive that behavioural events such as smiles, head shakes, laughter, and sighs also bear highly relevant information about a subject’s affective display. Accordingly, we propose a novel string-based prediction approach to fuse such events and to predict human affect in a continuous dimensional space. Extensive analysis and evaluation have been conducted using the newly released SEMAINE database of human-to-agent communication. For a thorough understanding of the obtained results, we provide additional benchmarks obtained by more conventional feature-level modelling, and compare these and the string-based approach to the fusion of signal-based features and string-based events. Our experimental results show that the proposed string-based approach is the best-performing approach for automatic prediction of the Valence and Expectation dimensions, and improves prediction performance for the other dimensions when combined with at least acoustic signal-based features.
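A minimal sketch of the string-based idea follows, assuming invented event names, a bag-of-n-grams featurizer, and an SVR regressor (the paper's exact event inventory and learner are not specified here): discrete behavioural events in an analysis window are serialised into a token string, and n-grams of that string become features for predicting a continuous affect dimension.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVR

def events_to_string(events):
    """e.g. ['smile', 'laugh', 'head_shake'] -> 'smile laugh head_shake'"""
    return " ".join(events)

def build_string_predictor():
    # Token unigrams and bigrams over the serialised event string.
    vectorizer = CountVectorizer(ngram_range=(1, 2), token_pattern=r"\S+")
    return make_pipeline(vectorizer, SVR(kernel="linear"))

# usage sketch (train_events: list of event lists, valence: list of floats):
# model = build_string_predictor()
# model.fit([events_to_string(e) for e in train_events], valence)
```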
Audio-visual emotion recognition in adult attachment interview
- In Proceedings of the 8th international conference on Multimodal interfaces
, 2006
"... Automatic multimodal recognition of spontaneous affective expressions is a largely unexplored and challenging problem. In this paper, we explore audio-visual emotion recognition in a realistic human conversation setting--Adult Attachment Interview (AAI). Based on the assumption that facial expressio ..."
Abstract
-
Cited by 12 (0 self)
- Add to MetaCart
(Show Context)
Automatic multimodal recognition of spontaneous affective expressions is a largely unexplored and challenging problem. In this paper, we explore audio-visual emotion recognition in a realistic human conversation setting, the Adult Attachment Interview (AAI). Based on the assumption that facial expression and vocal expression convey the same coarse affective states, positive and negative emotion sequences are labeled according to Facial Action Coding System Emotion Codes. Facial texture in the visual channel and prosody in the audio channel are integrated in the framework of an AdaBoost multi-stream hidden Markov model (AMHMM), in which an AdaBoost learning scheme is used to fuse the component HMMs. Our approach is evaluated in preliminary AAI spontaneous emotion recognition experiments.
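For orientation, a simplified decision-level version of multi-stream HMM fusion is sketched below: each channel gets one HMM per emotion class, and per-class log-likelihoods are combined with stream weights before the argmax. The AdaBoost-learned combination in the paper is reduced here to a single fixed weight, and hmmlearn plus all names are assumptions.

```python
import numpy as np
from hmmlearn.hmm import GaussianHMM  # third-party library; an assumption here

def fused_decision(face_hmms, audio_hmms, face_obs, audio_obs, w_face=0.5):
    """*_hmms: dict emotion -> fitted GaussianHMM; *_obs: (T, d) arrays."""
    classes = list(face_hmms)
    scores = [
        # Weighted sum of per-stream log-likelihoods for each emotion class.
        w_face * face_hmms[c].score(face_obs)
        + (1.0 - w_face) * audio_hmms[c].score(audio_obs)
        for c in classes
    ]
    return classes[int(np.argmax(scores))]
```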
Emotion estimation in speech using a 3D emotion space concept
- Robust Speech Recognition and Understanding, I-Tech Education and Publishing, 2007
"... ..."
(Show Context)