Results 1 - 10 of 649
An overview of text-independent speaker recognition: from features to supervectors, 2009
Abstract - Cited by 156 (37 self)
This paper gives an overview of automatic speaker recognition technology, with an emphasis on text-independent recognition. Speaker recognition has been studied actively for several decades. We give an overview of both the classical and the state-of-the-art methods. We start with the fundamentals of automatic speaker recognition, concerning feature extraction and speaker modeling. We elaborate on advanced computational techniques to address robustness and session variability. The recent progress from vectors towards supervectors opens up a new area of exploration and represents a technology trend. We also provide an overview of this recent development and discuss the evaluation methodology of speaker recognition systems. We conclude the paper with a discussion of future directions.
Social Signal Processing: Survey of an Emerging Domain, 2008
Abstract - Cited by 153 (32 self)
The ability to understand and manage social signals of a person we are communicating with is the core of social intelligence. Social intelligence is a facet of human intelligence that has been argued to be indispensable and perhaps the most important for success in life. This paper argues that next-generation computing needs to include the essence of social intelligence (the ability to recognize human social signals and social behaviours like turn taking, politeness, and disagreement) in order to become more effective and more efficient. Although each one of us understands the importance of social signals in everyday life situations, and in spite of recent advances in machine analysis of relevant behavioural cues like blinks, smiles, crossed arms, and laughter, the design and development of automated systems for Social Signal Processing (SSP) remain rather difficult. This paper surveys past efforts to solve these problems computationally, summarizes the relevant findings in social psychology, and proposes a set of recommendations for enabling the development of the next generation of socially-aware computing.
Using linguistic cues for the automatic recognition of personality in conversation and text - Journal of Artificial Intelligence Research (JAIR), 2007
Abstract - Cited by 99 (4 self)
It is well known that utterances convey a great deal of information about the speaker in addition to their semantic content. One such type of information consists of cues to the speaker’s personality traits, the most fundamental dimension of variation between humans. Recent work explores the automatic detection of other types of pragmatic variation in text and conversation, such as emotion, deception, speaker charisma, dominance, point of view, subjectivity, opinion and sentiment. Personality affects these other aspects of linguistic production, and thus personality recognition may be useful for these tasks, in addition to many other potential applications. However, to date, there is little work on the automatic recognition of personality traits. This article reports experimental results for recognition of all Big Five personality traits, in both conversation and text, utilising both self and observer ratings of personality. While other work reports classification results, we experiment with classification, regression and ranking models. For each model, we analyse the effect of different feature sets on accuracy. Results show that for some traits, any type of statistical model performs significantly better than the baseline, but ranking models …
Reducing audible spectral discontinuities - IEEE Trans. Speech and Audio Processing, 2001
Suitability of dysphonia measurements for telemonitoring of Parkinson's disease - IEEE Trans. Biomedical Engineering
Abstract - Cited by 50 (4 self)
We present an assessment of the practical value of existing traditional and non-standard measures for discriminating healthy people from people with Parkinson's disease (PD) by detecting dysphonia. We introduce a new measure of dysphonia, Pitch Period Entropy (PPE), which is robust to many uncontrollable confounding effects including noisy acoustic environments and normal, healthy variations in voice frequency. We collected sustained phonations from 31 people, 23 with PD. We then selected 10 highly uncorrelated measures, and an exhaustive search of all possible combinations of these measures found four that in combination led to overall correct classification performance of 91.4%, using a kernel support vector machine. In conclusion, we find that non-standard methods in combination with traditional harmonics-to-noise ratios are best able to separate healthy from PD subjects. The selected non-standard methods are robust to many uncontrollable variations in acoustic environment and individual subjects, and are thus well-suited to telemonitoring applications. Index Terms: Acoustic measures, nervous system, speech analysis, telemedicine.
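The exhaustive feature-subset search this abstract describes is simple to sketch. The snippet below is a toy illustration only, not the authors' code: the data is made up, and a nearest-centroid scorer stands in for the paper's kernel support vector machine and its evaluation protocol.

```python
from itertools import combinations

def nearest_centroid_accuracy(X, y, subset):
    """Score a feature subset with a toy nearest-centroid classifier
    (a stand-in for the paper's kernel SVM)."""
    classes = sorted(set(y))
    centroids = {}
    for c in classes:
        rows = [[x[i] for i in subset] for x, t in zip(X, y) if t == c]
        centroids[c] = [sum(col) / len(rows) for col in zip(*rows)]
    correct = 0
    for x, t in zip(X, y):
        p = [x[i] for i in subset]
        pred = min(classes, key=lambda c: sum((a - b) ** 2
                                              for a, b in zip(p, centroids[c])))
        correct += pred == t
    return correct / len(y)

def best_subset(X, y, k):
    """Exhaustively score every k-feature combination and keep the best."""
    n_features = len(X[0])
    return max(combinations(range(n_features), k),
               key=lambda s: nearest_centroid_accuracy(X, y, s))

# Toy data: feature 0 separates the classes; features 1 and 2 are noise.
X = [[0.1, 5, 9], [0.2, 1, 2], [0.9, 5, 1], [1.0, 1, 8]]
y = [0, 0, 1, 1]
print(best_subset(X, y, 1))  # -> (0,)
```

With 10 candidate measures the full search covers only 2^10 - 1 = 1023 subsets, which is why an exhaustive search is feasible here.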
Predicting the unpredictable: Interpreting neutralized segments in Dutch - Language 79, 5–38, 2003
Abstract - Cited by 42 (10 self)
Among the most fascinating data for phonology are those showing how speakers incorporate new words and foreign words into their language system, since these data provide cues to the actual principles underlying language. In this article, we address how speakers deal with neutralized obstruents in new words. We formulate four hypotheses and test them on the basis of Dutch word-final obstruents, which are neutral for [voice]. Our experiments show that speakers predict the characteristics of neutralized segments on the basis of phonologically similar morphemes stored in the mental lexicon. This effect of the similar morphemes can be modeled in several ways. We compare five models, among them STOCHASTIC OPTIMALITY THEORY and ANALOGICAL MODELING OF LANGUAGE; all perform approximately equally well, but they differ in their complexity, with analogical modeling of language providing the most economical explanation.
Exploiting Nonlinear Recurrence and Fractal Scaling Properties for Voice Disorder Detection, 2007
Abstract - Cited by 38 (4 self)
Background: Voice disorders affect patients profoundly, and acoustic tools can potentially measure voice function objectively. Disordered sustained vowels exhibit wide-ranging phenomena, from nearly periodic to highly complex, aperiodic vibrations, and increased “breathiness”. Modelling and surrogate data studies have shown significant nonlinear and non-Gaussian random properties in these sounds. Nonetheless, existing tools are limited to analysing voices displaying near periodicity, and do not account for this inherent biophysical nonlinearity and non-Gaussian randomness, often using linear signal processing methods insensitive to these properties. They do not directly measure the two main biophysical symptoms of disorder: complex nonlinear aperiodicity, and turbulent, aeroacoustic, non-Gaussian randomness. Often these tools cannot be applied to more severe disordered voices, limiting their clinical usefulness. Methods: This paper introduces two new tools to speech analysis: recurrence and fractal scaling, which overcome the range limitations of existing tools by addressing directly these two symptoms of disorder, together reproducing a “hoarseness” diagram. A simple bootstrapped classifier then uses these two features to distinguish normal from disordered voices.
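To give a flavour of the recurrence idea, the sketch below computes a recurrence rate for a scalar series: the fraction of sample pairs that fall within a tolerance of one another. It is a deliberate simplification of the paper's analysis (it omits the time-delay embedding and the specific recurrence-period measure used there); the point is only that a nearly periodic signal revisits the same values often and so scores higher than a drifting one.

```python
def recurrence_rate(x, eps):
    """Fraction of sample pairs (i, j) with |x[i] - x[j]| < eps.
    A crude scalar stand-in for a full recurrence analysis, which
    would first time-delay-embed the signal."""
    n = len(x)
    close = sum(1 for i in range(n) for j in range(n) if abs(x[i] - x[j]) < eps)
    return close / (n * n)

periodic = [0.0, 1.0, 0.0, 1.0, 0.0, 1.0]   # keeps revisiting two values
drifting = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]   # never returns
print(recurrence_rate(periodic, 0.25))  # -> 0.5
print(recurrence_rate(drifting, 0.25))  # ~0.44, lower: less recurrence
```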
Detecting Certainness in Spoken Tutorial Dialogues - Proceedings of Interspeech, 2005
Abstract - Cited by 37 (4 self)
What role does affect play in spoken tutorial systems and is it automatically detectable? We investigated the classification of student certainness in a corpus collected for ITSPOKE, a speech-enabled Intelligent Tutoring System (ITS). Our study suggests that tutors respond to indications of student uncertainty differently from student certainty. Results of machine learning experiments indicate that acoustic-prosodic features can distinguish student certainness from other student states. A combination of acoustic-prosodic features extracted at two levels of intonational analysis (breath groups and turns) achieves 76.42% classification accuracy, a 15.8% relative improvement over baseline performance. Our results suggest that student certainness can be automatically detected and utilized to create better spoken dialog ITSs.
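The baseline behind the 15.8% figure is not stated in the snippet, but it can be recovered from the two reported numbers, since relative improvement is (new - old) / old:

```python
accuracy = 76.42            # reported classification accuracy (%)
rel_improvement = 0.158     # reported relative improvement over baseline
baseline = accuracy / (1 + rel_improvement)
print(round(baseline, 1))   # -> 66.0 (%), the implied baseline accuracy
```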
DEAP: A database for emotion analysis using physiological signals - IEEE Transactions on Affective Computing, 2012
Abstract - Cited by 36 (9 self)
We present a multimodal data set for the analysis of human affective states. The electroencephalogram (EEG) and peripheral physiological signals of 32 participants were recorded as each watched 40 one-minute long excerpts of music videos. Participants rated each video in terms of the levels of arousal, valence, like/dislike, dominance, and familiarity. For 22 of the 32 participants, frontal face video was also recorded. A novel method for stimuli selection is proposed using retrieval by affective tags from the last.fm website, video highlight detection, and an online assessment tool. An extensive analysis of the participants’ ratings during the experiment is presented. Correlates between the EEG signal frequencies and the participants’ ratings are investigated. Methods and results are presented for single-trial classification of arousal, valence, and like/dislike ratings using the modalities of EEG, peripheral physiological signals, and multimedia content analysis. Finally, decision fusion of the classification results from different modalities is performed. The data set is made publicly available and we encourage other researchers to use it for testing their own affective state estimation methods.
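The decision fusion mentioned at the end of this abstract can be sketched as a weighted average of per-modality class posteriors, one simple late-fusion rule among several. The function and the numbers below are illustrative stand-ins, not the paper's actual fusion scheme or results.

```python
def fuse(posteriors, weights=None):
    """Late (decision-level) fusion: weighted average of per-modality
    class-probability vectors, then argmax over classes."""
    n = len(posteriors)
    weights = weights or [1.0 / n] * n
    n_classes = len(posteriors[0])
    fused = [sum(w * p[c] for w, p in zip(weights, posteriors))
             for c in range(n_classes)]
    return fused.index(max(fused))

# Hypothetical [P(high arousal), P(low arousal)] from three modalities.
eeg        = [0.6, 0.4]
peripheral = [0.3, 0.7]
video      = [0.2, 0.8]
print(fuse([eeg, peripheral, video]))  # -> 1 (low arousal wins on average)
```

Weighting the modalities by their individual validation accuracy is a common refinement of this equal-weight scheme.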
A New Model Of Sensorimotor Coupling In The Development Of Speech, 2004
Abstract - Cited by 36 (2 self)
We present a computational model that learns a coupling between motor parameters and their sensory consequences in vocal production during a babbling phase. Based on the coupling, preferred motor parameters and prototypically perceived sounds develop concurrently. Exposure to an ambient language modifies perception to coincide with the sounds from the language. The model develops motor mirror neurons that are active when an external sound is perceived. An extension to visual mirror neurons for oral gestures is suggested.