Automatic Analysis of Multimodal Group Actions in Meetings
, 2003
Cited by 90 (26 self)
This paper investigates the recognition of group actions in meetings. A framework is employed in which group actions result from the interactions of the individual participants. The group actions are modelled using different HMM-based approaches, where the observations are provided by a set of audio-visual features monitoring the actions of individuals. Experiments demonstrate the importance of taking interactions into account in modelling the group actions. It is also shown that the visual modality contains useful information, even for predominantly audio-based events, motivating a multimodal approach to meeting analysis.
A Survey of Affect Recognition Methods: Audio, Visual, and Spontaneous Expressions
, 2009
Cited by 69 (17 self)
Automated analysis of human affective behavior has attracted increasing attention from researchers in psychology, computer science, linguistics, neuroscience, and related disciplines. However, the existing methods typically handle only deliberately displayed and exaggerated expressions of prototypical emotions, despite the fact that deliberate behavior differs in visual appearance, audio profile, and timing from spontaneously occurring behavior. To address this problem, efforts to develop algorithms that can process naturally occurring human affective behavior have recently emerged. Moreover, an increasing number of efforts are reported toward multimodal fusion for human affect analysis, including audiovisual fusion, linguistic and paralinguistic fusion, and multicue visual fusion based on facial expressions, head movements, and body gestures. This paper introduces and surveys these recent advances. We first discuss human emotion perception from a psychological perspective. Next, we examine available approaches for solving the problem of machine understanding of human affective behavior and discuss important issues like the collection and availability of training and test data. We finally outline some of the scientific and engineering challenges to advancing human affect sensing technology.
Audio-Visual Affect Recognition
- IEEE Transactions on Multimedia
, 2007
Cited by 9 (2 self)
The ability of a computer to detect and appropriately respond to changes in a user’s affective state has significant implications for Human–Computer Interaction (HCI). In this paper, we present our efforts toward audio–visual affect recognition on 11 affective states customized for HCI applications (four cognitive/motivational and seven basic affective states) of 20 nonactor subjects. A smoothing method is proposed to reduce the detrimental influence of speech on facial expression recognition. The feature selection analysis shows that subjects are prone to use brow movement in the face, and pitch and energy in prosody, to express their affects while speaking. For person-dependent recognition, we apply the voting method to combine the frame-based classification results from both audio and visual channels. The result shows a 7.5% improvement over the best unimodal performance. For the person-independent test, we apply a multistream HMM to combine the information from multiple component streams. This test shows a 6.1% improvement over the best component performance. Index Terms: Affect recognition, affective computing, emotion recognition, multimodal human–computer interaction.
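The frame-level voting mentioned in this abstract can be illustrated with a minimal sketch. The abstract does not specify the voting scheme, so this assumes plain majority voting over the per-frame labels of both channels; the function name `fuse_by_voting`, the example labels, and the alphabetical tie-break are all hypothetical choices made for illustration.

```python
from collections import Counter


def fuse_by_voting(audio_frames, visual_frames):
    """Return the label that receives the most frame-level votes.

    Assumption: each channel contributes one vote per frame and all
    votes carry equal weight. The paper may weight channels differently.
    """
    votes = Counter(audio_frames) + Counter(visual_frames)
    # Highest vote count wins; ties go to the alphabetically first label
    # (an arbitrary but deterministic choice for this sketch).
    return min(votes, key=lambda label: (-votes[label], label))


# Hypothetical per-frame predictions from two unimodal classifiers.
audio = ["happy", "happy", "neutral", "sad"]
visual = ["happy", "neutral", "neutral", "neutral"]
decision = fuse_by_voting(audio, visual)
```

Here "neutral" collects four votes against three for "happy", so the fused decision follows the visual channel even though the audio channel alone would have chosen "happy".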
An SVM Front-End Landmark Speech Recognition System
, 2008
Cited by 3 (1 self)
Support vector machines (SVMs) can be trained to detect manner transitions between phones and to identify the manner and place of articulation of any given phone. The SVMs can perform these tasks with high accuracy using a variety of acoustic representations. The SVMs generalize well to unseen test data if these data were created under identical conditions to the training corpus. Unseen acoustic data from different corpora present a problem for the SVM, even if these acoustic data were generated under similar conditions. The discriminant outputs of these SVMs are used to create both a hybrid SVM/HMM (hidden Markov model) phone recognition system and a hybrid SVM/HMM word recognition system. There is a significant improvement in both phone and word recognition accuracy when these SVM discriminant features are used instead of mel frequency cepstral coefficients (MFCCs).
Automatic recognition of emotions from speech: a review of the literature and recommendations for practical realisation
- In LNCS 4868
, 2008
Cited by 3 (1 self)
In this article we give guidelines on how to address the major technical challenges of automatic emotion recognition from speech in human-computer interfaces, which include audio segmentation to find appropriate units for emotions, extraction of emotion-relevant features, classification of emotions, and training databases with emotional speech. Research so far has mostly dealt with offline evaluation of vocal emotions, and online processing has hardly been addressed. Online processing is, however, a necessary prerequisite for the realization of human-computer interfaces that analyze and respond to the user’s emotions while he or she is interacting with an application. By means of a sample application, we demonstrate how the challenges arising from online processing may be solved. The overall objective of the paper is to help readers assess the feasibility of human-computer interfaces that are sensitive to the user’s emotional voice and to provide them with guidelines on how to technically realize such interfaces.
Speech Emotion Analysis: Exploring the Role of Context
- To appear in IEEE Transactions on Multimedia, 2010
, 2010
Cited by 1 (1 self)
Automated analysis of human affective behavior has attracted increasing attention in recent years. With the research shift toward spontaneous behavior, many challenges have surfaced, ranging from database collection strategies to the use of new feature sets (e.g. lexical cues apart from prosodic features). Use of contextual information, however, is rarely addressed in the field of affect expression recognition. Yet it is evident that affect recognition by humans is largely influenced by context information. Our contribution in this paper is threefold. First, we introduce a novel set of features based on cepstrum analysis of pitch and intensity contours. We evaluate the usefulness of these features on two different databases: the Berlin Database of emotional speech (EMO-DB) and a locally collected audiovisual database in car settings (CVRRCar-AVDB). The overall recognition accuracy achieved for seven emotions in the EMO-DB database is over 84%, and over 87% for three emotion classes in CVRRCar-AVDB. This is based on 10-fold stratified cross validation. Second, we introduce the collection of a new audiovisual database in an automobile setting (CVRRCar-AVDB). In the current study, we use only the audio channel of the database. Third, we systematically analyze the effects of different contexts on the two databases. We present context analysis of subject and text based on speaker/text dependent/independent analysis on the EMO-DB database. Furthermore, we perform context analysis based on gender information on EMO-DB and CVRRCar-AVDB. The results based on these analyses are promising.
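The stratified cross validation used for the accuracies quoted in this abstract can be sketched as follows. This is an illustration of the general technique, not the authors' evaluation code: the function names, the round-robin assignment, the lack of shuffling, and the `train_and_score` callback are all assumptions of this sketch.

```python
from collections import defaultdict


def stratified_folds(labels, k=10):
    """Split sample indices into k folds with similar class proportions."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        # Deal each class's samples round-robin across the folds so that
        # every fold receives a near-equal share of that class.
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)
    return folds


def cross_validate(labels, train_and_score, k=10):
    """Average the score of train_and_score(train_idx, test_idx) over k folds.

    train_and_score is a hypothetical callback supplied by the caller; it
    trains on train_idx and returns a score measured on test_idx.
    """
    scores = []
    for held_out in stratified_folds(labels, k):
        test_idx = set(held_out)
        train_idx = [i for i in range(len(labels)) if i not in test_idx]
        scores.append(train_and_score(train_idx, sorted(test_idx)))
    return sum(scores) / len(scores)


# Hypothetical label list: 20 samples of one class, 10 of another.
example_labels = ["anger"] * 20 + ["neutral"] * 10
example_folds = stratified_folds(example_labels, k=10)
```

With these example labels, each of the ten folds holds two "anger" samples and one "neutral" sample, preserving the 2:1 class ratio of the full set. A practical evaluation would normally shuffle the samples within each class before dealing them out.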
The recognition of emotions from speech using GentleBoost classifier
- International Conference on Computer Systems and Technologies - CompSysTech’06
Emotion Recognition Using IG-based Feature Compensation and Continuous Support Vector Machines
This paper presents an approach to feature compensation for emotion recognition from speech signals. In this approach, the intonation groups (IGs) of the input speech signals are extracted first. The speech features in each selected intonation group are then extracted. With the assumption of linear mapping between feature spaces in different emotional states, a feature compensation approach is proposed to characterize the feature space with better discriminability among emotional states. The compensation vector with respect to each emotional state is estimated using the Minimum Classification Error (MCE) algorithm. For the final emotional state decision, the IG-based feature vectors compensated by the compensation vectors are used to train the Continuous Support Vector Machines (CSVMs) for each emotional state. The emotional state with the maximal output probability is determined as the final output. The kernel function of the CSVM model is experimentally chosen to be a radial basis function, and the experimental results show that IG-based feature extraction and compensation can obtain encouraging performance for emotion recognition.
Emotion Recognition from Speech Using IG-Based Feature Compensation
This paper presents an approach to feature compensation for emotion recognition from speech signals. In this approach, the intonation groups (IGs) of the input speech signals are extracted first. The speech features in each selected intonation group are then extracted. With the assumption of linear mapping between feature spaces in different emotional states, a feature compensation approach is proposed to characterize the feature space with better discriminability among emotional states. The compensation vector with respect to each emotional state is estimated using the Minimum Classification Error (MCE) algorithm. For the final emotional state decision, the compensated IG-based feature vectors are used to train the Gaussian Mixture Models (GMMs) and Continuous Support Vector Machines (CSVMs) for each emotional state. For GMMs, the emotional state with the GMM having the maximal likelihood ratio is determined as the final output. For CSVMs, the emotional state is determined according to the probability outputs from the CSVMs. The kernel function in the CSVM is experimentally chosen to be a radial basis function. A comparison in the experiments shows that the proposed IG-based feature compensation can obtain encouraging performance for emotion recognition.
Spectral Features Detection of Speech Emotion and Speaking Styles Recognition Based on HMM Classifier
This paper deals with the influence of spectral features in recognizing emotions and speaking styles from speech signals. Throughout this study, MFCCs and Mel band energies are used as the base features. The investigation shows that each of these features has an important impact in recognizing several emotions and speaking styles with a satisfying recognition rate. The best recognition accuracy is reached with Mel band energies, attaining 79% for the slow speaking style among 10 different states to be recognized by the system. These results can be significantly enhanced by combining both features. The aim of this work is to identify which speaking states are better recognized with MFCCs and which with Mel band energies. For this approach, we use the HMM classifier. Results are given on text-independent emotion recognition using the SUSAS database (Speech Under Simulated and Actual Stress). We compare emotion recognition performance based on the features mentioned. Experimental results show that stressed and loud styles are better recognized using MFCC features than Mel band energies, by more than 12.5%, which is an important improvement. The average recognition accuracy over ten different emotions and speaking styles with HMM models exceeds 60% using the spectral features. On the text-independent SUSAS database, we achieve 62% average accuracy for recognition of 10 emotions and speaking styles using MFCC features.

