Social Signal Processing: Survey of an Emerging Domain
, 2008
Cited by 153 (32 self)
The ability to understand and manage social signals of a person we are communicating with is the core of social intelligence. Social intelligence is a facet of human intelligence that has been argued to be indispensable and perhaps the most important for success in life. This paper argues that next-generation computing needs to include the essence of social intelligence – the ability to recognize human social signals and social behaviours like turn taking, politeness, and disagreement – in order to become more effective and more efficient. Although each one of us understands the importance of social signals in everyday life situations, and in spite of recent advances in machine analysis of relevant behavioural cues like blinks, smiles, crossed arms, laughter, and similar, design and development of automated systems for Social Signal Processing (SSP) are rather difficult. This paper surveys the past efforts in solving these problems by a computer, it summarizes the relevant findings in social psychology, and it proposes a set of recommendations for enabling the development of the next generation of socially-aware computing.
Social Signal Processing: State-of-the-art and future perspectives of an emerging domain
- IN PROCEEDINGS OF THE ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA
, 2008
Cited by 27 (7 self)
The ability to understand and manage social signals of a person we are communicating with is the core of social intelligence. Social intelligence is a facet of human intelligence that has been argued to be indispensable and perhaps the most important for success in life. This paper argues that next-generation computing needs to include the essence of social intelligence – the ability to recognize human social signals and social behaviours like politeness, and disagreement – in order to become more effective and more efficient. Although each one of us understands the importance of social signals in everyday life situations, and in spite of recent advances in machine analysis of relevant behavioural cues like blinks, smiles, crossed arms, laughter, and similar, design and development of automated systems for Social Signal Processing (SSP) are rather difficult. This paper surveys the past efforts in solving these problems by a computer, it summarizes the relevant findings in social psychology, and it proposes a set of recommendations for enabling the development of the next generation of socially-aware computing.
Imitating conversational laughter with an articulatory speech synthesizer
- in Proceedings of the Interdisciplinary Workshop on The Phonetics of Laughter
, 2007
Cited by 12 (1 self)
In this study we present initial efforts to model laughter with an articulatory speech synthesizer. We aimed at imitating a real laugh taken from a spontaneous speech database and created several synthetic versions of it using articulatory synthesis and diphone synthesis. In modeling laughter with articulatory synthesis, we also approximated features like breathing noises that do not normally occur in speech. Evaluation with respect to the perceived degree of naturalness indicated that the laugh stimuli would pass as “laughs” in an appropriate conversational context. In isolation, though, significant differences could be measured with regard to the degree of variation (durational patterning, fundamental frequency, intensity) within each laugh.
Prediction and realisation of conversational characteristics by utilising spontaneous speech for unit selection
- in Speech Prosody
, 2010
Cited by 8 (3 self)
Unit selection speech synthesis has reached high levels of naturalness and intelligibility for neutral read aloud speech. However, synthetic speech generated using neutral read aloud data lacks all the attitude, intention and spontaneity associated with everyday conversations. Unit selection is heavily data dependent and thus in order to simulate human conversational speech, or create synthetic voices for believable virtual characters, we need to utilise speech data with examples of how people talk rather than how people read. In this paper we included carefully selected utterances from spontaneous conversational speech in a unit selection voice. Using this voice and by automatically predicting type and placement of lexical fillers and filled pauses we can synthesise utterances with conversational characteristics. A perceptual listening test showed that it is possible to make synthetic speech sound more conversational without degrading naturalness.
Index Terms: speech synthesis, unit selection, conversation, spontaneous speech, lexical fillers, filled pauses
Towards conversational speech synthesis; lessons learned from the expressive speech processing project
- in SSW6
, 2007
Cited by 6 (0 self)
This paper discusses some ideas for the requirements and methods of conversational speech synthesis, based on experience gained from the collection and analysis of a very large corpus of conversational speech in a variety of real-life everyday contexts. It shows that because variation in voice quality plays a significant part in the transmission of interpersonal and affect-related social information, this feature should be given priority in future speech synthesis research. Several solutions to this problem are proposed.
A new prosody annotation protocol for live sports commentaries
- in Interspeech
, 2013
Cited by 4 (3 self)
This paper proposes a new prosody annotation protocol specific to live sports commentaries. Two levels of annotation are defined with HMM-based speech synthesis in view. Local labels are assigned to all syllables and refer to accentual phenomena. Global labels classify sequences of words into five distinct subgenres, defined in terms of valence and arousal. The objective of the study is to provide a set of labels both related to a specific function and characterized by a distinct acoustic realization. The consideration of these constraints should allow for an automatic prediction of the labels either from the text or from the speech signal. Reasonable inter-annotator scores are achieved for both annotation levels. A prosodic analysis of all labels also shows that they can usually be distinguished by specific acoustic realizations. The integration of this new annotation protocol within HMM-based speech synthesis shows promising results.
Index Terms: Prosody, Expressive speech synthesis, Sports
Expressive speech synthesis: a review
- INT J SPEECH TECHNOL
, 2012
Cited by 1 (0 self)
The objective of the present work is to provide a detailed review of expressive speech synthesis (ESS). Among various approaches for ESS, the present paper focuses on the development of ESS systems by explicit control. In this approach, ESS is achieved by modifying the parameters of the neutral speech that is synthesized from the text. The present paper reviews the works addressing various issues related to the development of ESS systems by explicit control. The review covers the various approaches to text-to-speech synthesis, studies on the analysis and estimation of expressive parameters, and studies on methods for incorporating expressive parameters. Finally, the review concludes by outlining the scope of future work on ESS by explicit control.
Differences in the speaking styles of a Japanese male according to interlocutor; showing the effects of affect in conversational speech
- Computational Linguistics and Chinese Language Processing
, 2007
Cited by 1 (0 self)
There has been considerable interest recently in the processing of affect in spoken interactions. This paper presents an analysis of some conversational speech corpus data showing that four prosodic characteristics (duration, pitch, power, and voicing) all vary significantly according to both interlocutor differences and differences in familiarity over a fixed period of time with the same interlocutor.
Phone set selection for HMM-based dialect speech synthesis
Cited by 1 (1 self)
This paper describes a method for selecting an appropriate phone set in dialect speech synthesis for a so far undescribed dialect by applying hidden Markov model (HMM) based training and clustering methods. In this pilot study we show how a phone set derived from the phonetic surface can be optimized given a small amount of dialect speech training data.