Results 1 - 10 of 593
A Survey of Affect Recognition Methods: Audio, Visual, and Spontaneous Expressions
2009
"... Automated analysis of human affective behavior has attracted increasing attention from researchers in psychology, computer science, linguistics, neuroscience, and related disciplines. However, the existing methods typically handle only deliberately displayed and exaggerated expressions of prototypi ..."
Cited by 374 (51 self)
Abstract:
Automated analysis of human affective behavior has attracted increasing attention from researchers in psychology, computer science, linguistics, neuroscience, and related disciplines. However, the existing methods typically handle only deliberately displayed and exaggerated expressions of prototypical emotions, despite the fact that deliberate behavior differs in visual appearance, audio profile, and timing from spontaneously occurring behavior. To address this problem, efforts to develop algorithms that can process naturally occurring human affective behavior have recently emerged. Moreover, an increasing number of efforts are reported toward multimodal fusion for human affect analysis, including audiovisual fusion, linguistic and paralinguistic fusion, and multicue visual fusion based on facial expressions, head movements, and body gestures. This paper introduces and surveys these recent advances. We first discuss human emotion perception from a psychological perspective. Next, we examine available approaches for solving the problem of machine understanding of human affective behavior and discuss important issues like the collection and availability of training and test data. We finally outline some of the scientific and engineering challenges to advancing human affect sensing technology.
Automatic Facial Expression Analysis: A Survey
- PATTERN RECOGNITION, 1999
"... Over the last decade, automatic facial expression analysis has become an active research area that finds potential applications in areas such as more engaging human-computer interfaces, talking heads, image retrieval and human emotion analysis. Facial expressions reflect not only emotions, but ot ..."
Cited by 295 (0 self)
Abstract:
Over the last decade, automatic facial expression analysis has become an active research area that finds potential applications in areas such as more engaging human-computer interfaces, talking heads, image retrieval and human emotion analysis. Facial expressions reflect not only emotions, but other mental activities, social interaction and physiological signals. In this survey we introduce the most prominent automatic facial expression analysis methods and systems presented in the literature. Facial motion and deformation extraction approaches as well as classification methods are discussed with respect to issues such as face normalization, facial expression dynamics and facial expression intensity, but also with regard to their robustness towards environmental changes.
Dynamic Texture Recognition Using Local Binary Patterns with an Application to Facial Expressions
- IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2007
"... Dynamic texture is an extension of texture to the temporal domain. Description and recognition of dynamic textures have attracted growing attention. In this paper, a novel approach for recognizing dynamic textures is proposed and its simplifications and extensions to facial image analysis are also ..."
Cited by 291 (35 self)
Abstract:
Dynamic texture is an extension of texture to the temporal domain. Description and recognition of dynamic textures have attracted growing attention. In this paper, a novel approach for recognizing dynamic textures is proposed and its simplifications and extensions to facial image analysis are also considered. First, the textures are modeled with volume local binary patterns (VLBP), which are an extension of the LBP operator widely used in ordinary texture analysis, combining motion and appearance. To make the approach computationally simple and easy to extend, only the co-occurrences on three orthogonal planes (LBP-TOP) are then considered. A block-based method is also proposed to deal with specific dynamic events, such as facial expressions, in which local information and its spatial locations should also be taken into account. In experiments with two dynamic texture databases, DynTex and MIT, both the VLBP and LBP-TOP clearly outperformed the earlier approaches. The proposed block-based method was evaluated with the Cohn-Kanade facial expression database with excellent results. Advantages of our approach include local processing, robustness to monotonic gray-scale changes and simple computation.
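For readers who want a concrete picture of the LBP-TOP descriptor summarized above, the following is a minimal sketch, assuming a basic 8-neighbour, radius-1 LBP, a single block, and plain 256-bin histograms (no uniform-pattern reduction); lbp_codes and lbp_top_histogram are hypothetical helper names, not the authors' implementation.

```python
# Minimal LBP-TOP sketch: 2-D LBP codes are computed on the XY, XT and YT planes
# of a video volume and their histograms are concatenated into one descriptor.
import numpy as np

def lbp_codes(img):
    """Basic 8-neighbour, radius-1 LBP codes for a 2-D array (borders excluded)."""
    c = img[1:-1, 1:-1]
    neighbours = [
        img[:-2, :-2], img[:-2, 1:-1], img[:-2, 2:],
        img[1:-1, 2:], img[2:, 2:],    img[2:, 1:-1],
        img[2:, :-2],  img[1:-1, :-2],
    ]
    codes = np.zeros(c.shape, dtype=np.int32)
    for bit, n in enumerate(neighbours):
        codes += (n >= c).astype(np.int32) << bit
    return codes

def lbp_top_histogram(volume):
    """Concatenated LBP histograms from the three orthogonal planes of a (T, H, W) volume."""
    T, H, W = volume.shape
    plane_stacks = [
        [volume[t] for t in range(T)],        # XY planes (appearance)
        [volume[:, y, :] for y in range(H)],  # XT planes (horizontal motion)
        [volume[:, :, x] for x in range(W)],  # YT planes (vertical motion)
    ]
    hists = []
    for stack in plane_stacks:
        codes = np.concatenate([lbp_codes(p).ravel() for p in stack])
        hist, _ = np.histogram(codes, bins=256, range=(0, 256), density=True)
        hists.append(hist)
    return np.concatenate(hists)  # 3 x 256 = 768-dimensional descriptor

# Usage on a random stand-in clip; a block-based variant would split the face
# region into blocks and concatenate one such descriptor per block.
video = np.random.randint(0, 256, size=(30, 64, 64))
print(lbp_top_histogram(video).shape)  # (768,)
```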
Action MACH: a spatio-temporal maximum average correlation height filter for action recognition
- In Proceedings of IEEE International Conference on Computer Vision and Pattern Recognition, 2008
"... In this paper we introduce a template-based method for recognizing human actions called Action MACH. Our approach is based on a Maximum Average Correlation Height (MACH) filter. A common limitation of template-based methods is their inability to generate a single template using a collection of examp ..."
Cited by 237 (10 self)
Abstract:
In this paper we introduce a template-based method for recognizing human actions called Action MACH. Our approach is based on a Maximum Average Correlation Height (MACH) filter. A common limitation of template-based methods is their inability to generate a single template using a collection of examples. MACH is capable of capturing intra-class variability by synthesizing a single Action MACH filter for a given action class. We generalize the traditional MACH filter to video (3D spatiotemporal volume) and vector-valued data. By analyzing the response of the filter in the frequency domain, we avoid the high computational cost commonly incurred in template-based approaches. Vector-valued data is analyzed using the Clifford Fourier transform, a generalization of the Fourier transform intended for both scalar and vector-valued data. Finally, we perform an extensive set of experiments and compare our method with some of the most recent approaches in the field by using publicly available datasets, and two new annotated human action datasets which include actions performed in classic feature films and sports broadcast television.
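As a rough, hedged illustration of the frequency-domain template idea, here is a sketch under strong simplifying assumptions: scalar-valued volumes only (no Clifford Fourier transform) and the common simplification of dividing the class-average spectrum by the average power spectrum instead of solving the full MACH criterion; make_mach_filter and correlate are hypothetical names, not the paper's code.

```python
# Simplified spatio-temporal MACH-style filter: synthesize one frequency-domain
# template from several example clips, then score a test clip by correlation.
import numpy as np

def make_mach_filter(training_volumes, eps=1e-3):
    """One frequency-domain template for an action class from (T, H, W) example clips."""
    spectra = [np.fft.fftn(v) for v in training_volumes]
    mean_spectrum = np.mean(spectra, axis=0)                        # class average
    avg_power = np.mean([np.abs(s) ** 2 for s in spectra], axis=0)  # average spectral density
    return mean_spectrum / (avg_power + eps)                        # emphasize consistent frequencies

def correlate(filter_freq, test_volume):
    """Correlation response with a test clip, computed entirely in the frequency domain."""
    return np.real(np.fft.ifftn(np.conj(filter_freq) * np.fft.fftn(test_volume)))

# Usage with random stand-in clips of identical size.
rng = np.random.default_rng(0)
train = [rng.standard_normal((16, 32, 32)) for _ in range(5)]
mach = make_mach_filter(train)
response = correlate(mach, rng.standard_normal((16, 32, 32)))
print(response.max())  # peak height indicates how well the clip matches the class
```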
The CMU Pose, Illumination, and Expression (PIE) Database
2002
"... Between October 2000 and December 2000 we collected a database of over 40,000 facial images of 68 people. Using the CMU 3D Room we imaged each person across 13 different poses, under 43 different illumination conditions, and with 4 different expressions. We call this database the CMU Pose, Illuminat ..."
Cited by 235 (6 self)
Abstract:
Between October 2000 and December 2000 we collected a database of over 40,000 facial images of 68 people. Using the CMU 3D Room we imaged each person across 13 different poses, under 43 different illumination conditions, and with 4 different expressions. We call this database the CMU Pose, Illumination, and Expression (PIE) database. In this paper we describe the imaging hardware, the collection procedure, the organization of the database, several potential uses of the database, and how to obtain the database.
Facial Expression Recognition from Video Sequences: Temporal and Static Modelling
- Computer Vision and Image Understanding, 2003
"... Human-computer intelligent interaction (HCII) is an emerging field of science aimed at providing natural ways for humans to use computers as aids. It is argued that for the computer to be able to interact with humans, it needs to have the communication skills of humans. One of these skills is the ab ..."
Cited by 195 (18 self)
Abstract:
Human-computer intelligent interaction (HCII) is an emerging field of science aimed at providing natural ways for humans to use computers as aids. It is argued that for the computer to be able to interact with humans, it needs to have the communication skills of humans. One of these skills is the ability to understand the emotional state of the person. The most expressive way humans display emotions is through facial expressions. In this work we report on several advances we have made in building a system for classification of facial expressions from continuous video input. We introduce and test different architectures, focusing on changes in distribution assumptions and feature dependency structures. We also introduce facial expression recognition from live video input using temporal cues. Methods for using temporal information have been extensively explored for speech recognition applications. Among these methods are template matching using dynamic programming and hidden Markov models (HMMs). This work exploits existing methods and proposes a new HMM architecture for automatically segmenting and recognizing human facial expressions from video sequences. The architecture performs both segmentation and recognition of facial expressions automatically using a multi-level structure composed of an HMM layer and a Markov model layer. We explore both person-dependent and person-independent recognition of expressions and compare the different methods.
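As a much-reduced sketch of the HMM idea (the simpler "one HMM per expression, pick the highest likelihood" baseline, not the paper's multi-level segmenting architecture), assuming the third-party hmmlearn package and random stand-in features for per-frame facial motion:

```python
# One Gaussian HMM per expression class; a new sequence is assigned to the model
# with the highest log-likelihood. Features are random stand-ins, not real data.
import numpy as np
from hmmlearn.hmm import GaussianHMM  # assumed third-party dependency

EXPRESSIONS = ["happy", "surprise", "anger"]
rng = np.random.default_rng(0)

def fake_sequences(offset, n_seq=20, seq_len=15, dim=12):
    """Stand-in for per-frame facial motion features of several video sequences."""
    return [offset + rng.standard_normal((seq_len, dim)) for _ in range(n_seq)]

# Train one HMM per expression on that expression's sequences.
models = {}
for k, name in enumerate(EXPRESSIONS):
    seqs = fake_sequences(offset=k)
    X = np.concatenate(seqs)
    lengths = [len(s) for s in seqs]
    m = GaussianHMM(n_components=3, covariance_type="diag", n_iter=50, random_state=0)
    m.fit(X, lengths)
    models[name] = m

# Classify a new sequence by maximum log-likelihood over the per-class models.
test = fake_sequences(offset=1, n_seq=1)[0]
scores = {name: m.score(test) for name, m in models.items()}
print(max(scores, key=scores.get))  # expected: "surprise" for offset 1
```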
Toward an Affect-Sensitive Multimodal Human-Computer Interaction
- Proceedings of the IEEE, 2003
"... The ability to recognize affective states of a person... This paper argues that next-generation human-computer interaction (HCI) designs need to include the essence of emotional intelligence -- the ability to recognize a user's affective states -- in order to become more human-like, more effect ..."
Cited by 189 (30 self)
Abstract:
The ability to recognize affective states of a person... This paper argues that next-generation human-computer interaction (HCI) designs need to include the essence of emotional intelligence -- the ability to recognize a user's affective states -- in order to become more human-like, more effective, and more efficient. Affective arousal modulates all nonverbal communicative cues (facial expressions, body movements, and vocal and physiological reactions). In a face-to-face interaction, humans detect and interpret those interactive signals of their communicator with little or no effort. Yet design and development of an automated system that accomplishes these tasks is rather difficult. This paper surveys the past work in solving these problems by a computer and provides a set of recommendations for developing the first part of an intelligent multimodal HCI -- an automatic personalized analyzer of a user's nonverbal affective feedback.
Automatic Recognition of Facial Actions in Spontaneous Expressions
"... Abstract — Spontaneous facial expressions differ from posed expressions in both which muscles are moved, and in the dynamics of the movement. Advances in the field of automatic facial expression measurement will require development and assessment on spontaneous behavior. Here we present preliminary ..."
Cited by 147 (23 self)
Abstract:
Spontaneous facial expressions differ from posed expressions both in which muscles are moved and in the dynamics of the movement. Advances in the field of automatic facial expression measurement will require development and assessment on spontaneous behavior. Here we present preliminary results on a task of facial action detection in spontaneous facial expressions. We employ a user-independent, fully automatic system for real-time recognition of facial actions from the Facial Action Coding System (FACS). The system automatically detects frontal faces in the video stream and codes each frame with respect to 20 action units. The approach applies machine learning methods such as support vector machines and AdaBoost to texture-based image representations. The output margin for the learned classifiers predicts action unit intensity. Frame-by-frame intensity measurements will enable investigations into facial expression dynamics that were previously intractable by human coding.
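The "output margin as intensity" idea above can be sketched with scikit-learn: one binary detector per action unit, with the signed distance to the separating hyperplane read as a frame-by-frame intensity estimate. This is a hedged illustration only; the texture features are replaced by random vectors and all names are illustrative.

```python
# Per-AU binary SVM whose decision value serves as a frame-by-frame intensity proxy.
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
n_frames, dim = 400, 200
features = rng.standard_normal((n_frames, dim))  # stand-in texture features per frame
# Synthetic "AU present" labels loosely tied to the first feature dimension.
au_present = (features[:, 0] + 0.3 * rng.standard_normal(n_frames)) > 0

clf = LinearSVC(C=1.0)          # one such detector would be trained per action unit
clf.fit(features, au_present)

# Signed margin per frame: larger positive values read as stronger AU activation.
intensity = clf.decision_function(features)
print(intensity[:5])
```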
Facial expression recognition based on Local Binary Patterns: A comprehensive study
2009
"... ..."
The Extended Cohn-Kanade Dataset (CK+): A complete dataset for action unit and emotion-specified expression
"... In 2000, the Cohn-Kanade (CK) database was released for the purpose of promoting research into automatically detecting individual facial expressions. Since then, the CK database has become one of the most widely used test-beds for algorithm development and evaluation. During this period, three limit ..."
Cited by 122 (7 self)
Abstract:
In 2000, the Cohn-Kanade (CK) database was released for the purpose of promoting research into automatically detecting individual facial expressions. Since then, the CK database has become one of the most widely used test-beds for algorithm development and evaluation. During this period, three limitations have become apparent: 1) while AU codes are well validated, emotion labels are not, as they refer to what was requested rather than what was actually performed; 2) there is no common performance metric against which to evaluate new algorithms; and 3) standard protocols for common databases have not emerged. As a consequence, the CK database has been used for both AU and emotion detection (even though labels for the latter have not been validated), comparison with benchmark algorithms is missing, and use of random subsets of the original database makes meta-analyses difficult. To address these and other concerns, we present the Extended Cohn-Kanade (CK+) database. The number of sequences is increased by 22% and the number of subjects by 27%. The target expression for each sequence is fully FACS coded and emotion labels have been revised and validated. In addition, non-posed sequences for several types of smiles and their associated metadata have been added. We present baseline results using Active Appearance Models (AAMs) and a linear support vector machine (SVM) classifier with leave-one-subject-out cross-validation for both AU and emotion detection on the posed data. The emotion and AU labels, along with the extended image data and tracked landmarks, will be made available in July 2010.
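The baseline protocol described above (a linear SVM evaluated with leave-one-subject-out cross-validation) can be sketched as follows, assuming scikit-learn; the features, emotion labels, and subject_ids grouping are random stand-ins, not AAM outputs or CK+ data.

```python
# Leave-one-subject-out evaluation of a linear SVM emotion classifier.
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
n_sequences, dim, n_subjects = 300, 50, 30
features = rng.standard_normal((n_sequences, dim))           # stand-in per-sequence features
emotions = rng.integers(0, 7, size=n_sequences)              # 7 emotion categories
subject_ids = rng.integers(0, n_subjects, size=n_sequences)  # which subject produced each sequence

scores = cross_val_score(
    LinearSVC(C=1.0),
    features, emotions,
    groups=subject_ids,
    cv=LeaveOneGroupOut(),  # each fold holds out every sequence of one subject
)
print(scores.mean())  # near chance here because the labels are random
```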