Results 1 - 10 of 260
Functional Phonology -- Formalizing the interactions between articulatory and perceptual drives
, 1998
"... ..."
A Matlab toolbox for musical feature extraction from audio
- DAFx
, 2007
"... We present MIRtoolbox, an integrated set of functions written in Matlab, dedicated to the extraction of musical features from audio files. The design is based on a modular framework: the different algorithms are decomposed into stages, formalized using a minimal set of elementary mechanisms, and int ..."
Abstract
-
Cited by 114 (7 self)
- Add to MetaCart
(Show Context)
We present MIRtoolbox, an integrated set of functions written in Matlab, dedicated to the extraction of musical features from audio files. The design is based on a modular framework: the different algorithms are decomposed into stages and formalized using a minimal set of elementary mechanisms, integrating different variants proposed by alternative approaches, including new strategies we have developed, that users can select and parametrize. This paper offers an overview of the set of features that can be extracted with MIRtoolbox, related among others to timbre, tonality, rhythm and form. Four particular analyses are provided as examples. The toolbox also includes functions for statistical analysis, segmentation and clustering. Particular attention has been paid to the design of a syntax that offers both simplicity of use and transparent adaptiveness to a multiplicity of possible input types. Each feature extraction method can accept as argument an audio file, or any preliminary result from intermediary stages of the chain of operations. The same syntax can also be used for analyses of single audio files, batches of files, series of audio segments, multichannel signals, etc. For that purpose, the data and methods of the toolbox are organised in an object-oriented architecture.
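As a rough illustration of this chained, modular design, here is a minimal Python sketch using librosa rather than MIRtoolbox itself (the toolbox is Matlab-only; the stages below are analogies, not its actual API):

```python
# Sketch of a MIRtoolbox-style chained feature extraction, transplanted to
# Python/librosa (an analogy, not the toolbox's real syntax).
import numpy as np
import librosa

# A synthetic 1-second test tone stands in for an audio file; in practice
# y, sr = librosa.load("song.wav") would start the chain from disk.
sr = 22050
y = 0.5 * np.sin(2 * np.pi * 440 * np.arange(sr) / sr)

# Each stage consumes the result of a previous one, mirroring the
# "any intermediary result can be passed on" design.
spec = np.abs(librosa.stft(y))                          # spectrum stage
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)      # timbre stage
chroma = librosa.feature.chroma_stft(S=spec**2, sr=sr)  # tonality stage
tempo, beats = librosa.beat.beat_track(y=y, sr=sr)      # rhythm stage

print(mfcc.shape, chroma.shape, float(np.atleast_1d(tempo)[0]))
```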
Loudness predicts prominence: fundamental frequency lends little
- J. Acoust. Soc. Am
, 2005
"... We explored a database covering seven dialects of British and Irish English and three different styles of speech to find acoustic correlates of prominence. We built classifiers, trained the classifiers on human prominence/non-prominence judgements, and then evaluated how well they behaved. The class ..."
Abstract
-
Cited by 88 (8 self)
- Add to MetaCart
(Show Context)
We explored a database covering seven dialects of British and Irish English and three different styles of speech to find acoustic correlates of prominence. We built classifiers, trained the classifiers on human prominence/non-prominence judgements, and then evaluated how well they behaved. The classifiers operate on 452 ms windows centered on syllables, using different acoustic measures. By comparing the performance of classifiers based on different measures, we can learn how prominence is expressed in speech. Contrary to textbooks and common assumption, fundamental frequency (f0) played a minor role in distinguishing prominent syllables from the rest of the utterance. Instead, speakers primarily marked prominence with patterns of loudness and duration. Two other acoustic measures that we examined also played a minor role, comparable to f0. All dialects and speaking styles studied here share a common definition of prominence. The result is robust to differences in labeling practice and the dialect of the labeler.
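A minimal sketch of this comparison methodology on synthetic data, training one classifier per acoustic measure; the feature names, effect sizes and use of scikit-learn are illustrative assumptions, not the study's data or code:

```python
# Train one classifier per acoustic measure and compare accuracies.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 2000
prominent = rng.integers(0, 2, n)  # stand-in prominence judgements (0/1)

# Synthetic per-syllable measures: loudness and duration carry signal,
# f0 only weakly, echoing the paper's finding.
features = {
    "loudness": prominent * 1.5 + rng.normal(0, 1, n),
    "duration": prominent * 1.2 + rng.normal(0, 1, n),
    "f0":       prominent * 0.2 + rng.normal(0, 1, n),
}

for name, x in features.items():
    acc = cross_val_score(LogisticRegression(), x.reshape(-1, 1),
                          prominent, cv=5).mean()
    print(f"{name:8s} accuracy = {acc:.2f}")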
The Production and Recognition of Emotions in Speech: Features and Algorithms
- Int’l J. Human-Computer Studies
, 2003
"... This paper presents algorithms that allow a robot to express its emotions by modulating the intonation of its voice. They are very simple and efficiently provide life-like speech thanks to the use of concatenative speech synthesis. We describe a technique which allows to continuously control both th ..."
Abstract
-
Cited by 84 (0 self)
- Add to MetaCart
This paper presents algorithms that allow a robot to express its emotions by modulating the intonation of its voice. They are very simple and efficiently provide life-like speech thanks to the use of concatenative speech synthesis. We describe a technique that allows continuous control of both the age of a synthetic voice and the quantity of emotion expressed. We also present the first large-scale data-mining experiment on the automatic recognition of basic emotions in informal everyday short utterances, focusing on the speaker-dependent problem. We compare a large set of machine learning algorithms, ranging from neural networks and Support Vector Machines to decision trees, together with 200 features, using a database of several thousand examples. We show that the difference in performance among learning schemes can be substantial, and that some previously unexplored features are of crucial importance. An optimal feature set is derived through the use of a genetic algorithm. Finally, we explain how this study can be applied to real-world situations in which very few examples are available. Furthermore, we describe a game to play with a personal robot which facilitates the teaching of examples of emotional utterances in a natural and rather unconstrained manner.
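A minimal sketch of this kind of learner comparison on synthetic stand-in features; scikit-learn, the dataset shape and all parameters are assumptions for illustration:

```python
# Compare several learning schemes by cross-validated accuracy.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic "utterance features" standing in for the study's 200 features.
X, y = make_classification(n_samples=500, n_features=50,
                           n_informative=10, random_state=0)

learners = {
    "neural net":    MLPClassifier(max_iter=1000, random_state=0),
    "SVM":           SVC(),
    "decision tree": DecisionTreeClassifier(random_state=0),
}
for name, clf in learners.items():
    print(name, cross_val_score(clf, X, y, cv=5).mean().round(2))
```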
Comparing Feature Sets for Acted and Spontaneous Speech in View of Automatic Emotion Recognition
- IEEE International Conference on Multimedia and Expo (ICME), 2005
"... We present a data-mining experiment on feature selection for automatic emotion recognition. Starting from more than 1000 features derived from pitch, energy and MFCC time series, the most relevant features in respect to the data are selected from this set by removing correlated features. The feature ..."
Abstract
-
Cited by 58 (11 self)
- Add to MetaCart
(Show Context)
We present a data-mining experiment on feature selection for automatic emotion recognition. Starting from more than 1000 features derived from pitch, energy and MFCC time series, the features most relevant to the data are selected from this set by removing correlated features. The features selected for acted and realistic emotions are analysed and show significant differences. All features are computed automatically, and we also contrast automatically determined with manually determined units of analysis. A higher degree of automation did not prove to be a disadvantage in terms of recognition accuracy.
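A minimal sketch of the correlation-based pruning step, assuming a simple greedy rule and an illustrative 0.95 threshold (not necessarily the paper's exact procedure):

```python
# Greedily drop any feature whose absolute correlation with an
# already-kept feature exceeds a threshold.
import numpy as np

def drop_correlated(X, threshold=0.95):
    corr = np.abs(np.corrcoef(X, rowvar=False))
    keep = []
    for j in range(X.shape[1]):
        if all(corr[j, k] < threshold for k in keep):
            keep.append(j)
    return keep

rng = np.random.default_rng(1)
base = rng.normal(size=(200, 5))
# Append five near-duplicate columns to simulate correlated features.
X = np.hstack([base, base + rng.normal(0, 0.01, base.shape)])
print(drop_correlated(X))  # keeps one copy of each correlated pair
```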
Suitability of dysphonia measurements for telemonitoring of Parkinson's disease
- IEEE Trans. Biomedical Engineering
"... We present an assessment of the practical value of existing traditional and non-standard measures for discriminating healthy people from people with Parkinson‟s disease (PD) by detecting dysphonia. We introduce a new measure of dysphonia, Pitch Period Entropy (PPE), which is robust to many uncontrol ..."
Abstract
-
Cited by 50 (4 self)
- Add to MetaCart
(Show Context)
We present an assessment of the practical value of existing traditional and non-standard measures for discriminating healthy people from people with Parkinson's disease (PD) by detecting dysphonia. We introduce a new measure of dysphonia, Pitch Period Entropy (PPE), which is robust to many uncontrollable confounding effects, including noisy acoustic environments and normal, healthy variations in voice frequency. We collected sustained phonations from 31 people, 23 with PD. We then selected 10 highly uncorrelated measures, and an exhaustive search of all possible combinations of these measures found four that in combination lead to an overall correct classification performance of 91.4%, using a kernel support vector machine. In conclusion, we find that non-standard methods in combination with traditional harmonics-to-noise ratios are best able to separate healthy from PD subjects. The selected non-standard methods are robust to many uncontrollable variations in acoustic environment and individual subjects, and are thus well-suited to telemonitoring applications. Index Terms: Acoustic measures, nervous system, speech analysis, telemedicine.
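A minimal sketch of the exhaustive combination search with a kernel SVM, on synthetic stand-in data (the cohort size, measures and labels below are placeholders, not the study's recordings):

```python
# Try every 4-way combination of 10 candidate measures and keep the
# subset with the best cross-validated SVM accuracy.
from itertools import combinations
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n = 60                       # roughly the scale of a small clinical set
y = rng.integers(0, 2, n)    # 0 = healthy, 1 = PD (synthetic labels)
X = rng.normal(size=(n, 10))
X[:, :4] += y[:, None] * 1.0  # plant signal in four of the ten measures

best = max(combinations(range(10), 4),
           key=lambda c: cross_val_score(SVC(kernel="rbf"),
                                         X[:, c], y, cv=5).mean())
print("best 4-measure subset:", best)
```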
Exploiting Nonlinear Recurrence and Fractal Scaling Properties for Voice Disorder Detection
, 2007
"... Background: Voice disorders affect patients profoundly, and acoustic tools can potentially measure voice function objectively. Disordered sustained vowels exhibit wide-ranging phenomena, from nearly periodic to highly complex, aperiodic vibrations, and increased “breathiness”. Modelling and surrogat ..."
Abstract
-
Cited by 38 (4 self)
- Add to MetaCart
(Show Context)
Background: Voice disorders affect patients profoundly, and acoustic tools can potentially measure voice function objectively. Disordered sustained vowels exhibit wide-ranging phenomena, from nearly periodic to highly complex, aperiodic vibrations, and increased “breathiness”. Modelling and surrogate data studies have shown significant nonlinear and non-Gaussian random properties in these sounds. Nonetheless, existing tools are limited to analysing voices displaying near periodicity, and do not account for this inherent biophysical nonlinearity and non-Gaussian randomness, often using linear signal processing methods insensitive to these properties. They do not directly measure the two main biophysical symptoms of disorder: complex nonlinear aperiodicity, and turbulent, aeroacoustic, non-Gaussian randomness. Often these tools cannot be applied to more severe disordered voices, limiting their clinical usefulness. Methods: This paper introduces two new tools to speech analysis: recurrence and fractal scaling, which overcome the range limitations of existing tools by addressing directly these two symptoms of disorder, together reproducing a “hoarseness” diagram. A simple bootstrapped classifier then uses these two features to distinguish normal from disordered voices.
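The fractal-scaling feature is commonly computed via detrended fluctuation analysis; a minimal DFA sketch follows, simplified relative to the published method:

```python
# Minimal detrended fluctuation analysis: the scaling exponent is the
# slope of log F(s) versus log s over the chosen window sizes.
import numpy as np

def dfa_exponent(x, scales=(16, 32, 64, 128, 256)):
    y = np.cumsum(x - np.mean(x))            # integrated signal
    fluct = []
    for s in scales:
        n_win = len(y) // s
        f2 = 0.0
        for i in range(n_win):
            seg = y[i * s:(i + 1) * s]
            t = np.arange(s)
            trend = np.polyval(np.polyfit(t, seg, 1), t)  # local detrend
            f2 += np.mean((seg - trend) ** 2)
        fluct.append(np.sqrt(f2 / n_win))
    return np.polyfit(np.log(scales), np.log(fluct), 1)[0]

rng = np.random.default_rng(0)
print(dfa_exponent(rng.normal(size=4096)))  # ~0.5 for white noise
```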
MySong: Automatic Accompaniment Generation for Vocal Melodies
"... We introduce MySong, a system that automatically chooses chords to accompany a vocal melody. A user with no musical experience can create a song with instrumental accompaniment just by singing into a microphone, and can experiment with different styles and chord patterns using interactions designed ..."
Abstract
-
Cited by 37 (4 self)
- Add to MetaCart
(Show Context)
We introduce MySong, a system that automatically chooses chords to accompany a vocal melody. A user with no musical experience can create a song with instrumental accompaniment just by singing into a microphone, and can experiment with different styles and chord patterns using interactions designed to be intuitive to non-musicians. We describe the implementation of MySong, which trains a Hidden Markov Model using a music database and uses that model to select chords for new melodies. Model parameters are intuitively exposed to the user. We present results from a study demonstrating that chords assigned to melodies using MySong and chords assigned manually by musicians receive similar subjective ratings. We then present results from a second study showing that thirteen users with no background in music theory are able to rapidly create musical accompaniments using MySong, and that these accompaniments are rated positively by evaluators.
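A toy sketch of the decoding step such a system needs, choosing a chord sequence with the Viterbi algorithm; the two-chord model and all probabilities below are invented for illustration and are not MySong's trained parameters:

```python
# Given per-measure melody observations, pick the most likely chord
# sequence under a toy HMM using Viterbi decoding.
import numpy as np

chords = ["C", "G"]
log_trans = np.log([[0.7, 0.3],    # P(next chord | current chord)
                    [0.4, 0.6]])
log_emit = np.log([[0.8, 0.2],     # P(observed note class | chord)
                   [0.3, 0.7]])
obs = [0, 0, 1, 1]                 # simplified per-measure note classes

# Standard Viterbi recursion over the chord lattice.
v = np.log([0.5, 0.5]) + log_emit[:, obs[0]]
back = []
for o in obs[1:]:
    scores = v[:, None] + log_trans    # score of moving i -> j
    back.append(scores.argmax(axis=0))
    v = scores.max(axis=0) + log_emit[:, o]

state = int(v.argmax())
path = [state]
for b in reversed(back):               # backtrack the best path
    state = int(b[state])
    path.append(state)
print([chords[s] for s in reversed(path)])  # ['C', 'C', 'G', 'G']
```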
A sawtooth waveform inspired pitch estimator for speech and music
- The Journal of the Acoustical Society of America
, 2008
"... 2 Dedico esta disertación a mis queridos abuelos ..."
(Show Context)
Pitch determination and voice quality analysis using subharmonic-to-harmonic ratio
- in Proc. of ICASSP, Orlando, USA, 2002
"... This paper presents an improvement of a previously proposed pitch determination algorithm (PDA). Particularly aiming at handling alternate cycles in speech signal, the algorithm estimates pitch through spectrum shifting on logarithmic frequency scale and calculating the Subharmonic-to-Harmonic Ratio ..."
Abstract
-
Cited by 30 (4 self)
- Add to MetaCart
(Show Context)
This paper presents an improvement of a previously proposed pitch determination algorithm (PDA). Aiming particularly at handling alternate cycles in the speech signal, the algorithm estimates pitch by shifting the spectrum on a logarithmic frequency scale and calculating the Subharmonic-to-Harmonic Ratio (SHR). Evaluation results on two databases show that this algorithm performs considerably better than the other PDAs compared. An application of SHR to voice quality analysis is also presented. The implementation and evaluation routines are available from
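A simplified sketch of the SHR idea, comparing summed spectral amplitude at harmonic versus subharmonic locations for a candidate f0; the published algorithm instead shifts the log-amplitude spectrum on a log frequency scale, so this direct version is an approximation:

```python
# For a candidate f0, compare amplitude summed at harmonics n*f0 against
# amplitude at subharmonic locations (n - 0.5)*f0.
import numpy as np

def shr(signal, sr, f0, n_harm=8):
    spec = np.abs(np.fft.rfft(signal * np.hanning(len(signal))))
    freqs = np.fft.rfftfreq(len(signal), 1.0 / sr)
    amp = lambda f: np.interp(f, freqs, spec)
    harm = sum(amp(n * f0) for n in range(1, n_harm + 1))
    sub = sum(amp((n - 0.5) * f0) for n in range(1, n_harm + 1))
    return sub / harm

# A tone with energy at 100, 200 and 300 Hz has strong subharmonic
# content relative to a 200 Hz hypothesis, so SHR(200) >> SHR(100).
sr = 16000
t = np.arange(sr) / sr
x = sum(np.sin(2 * np.pi * f * t) for f in (100, 200, 300))
print(shr(x, sr, 100), shr(x, sr, 200))
```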