Results 1–10 of 16
Uncertainty decoding for noise robust speech recognition
in Proc. Interspeech, 2004
Cited by 44 (12 self)
Abstract: This dissertation is the result of my own work and includes nothing which is the outcome of work done in collaboration. It has not been submitted in whole or in part for a degree at any other university. Some of the work has been published previously in conference proceedings.
Switching Linear Dynamical Systems for Noise Robust Speech Recognition
IEEE Trans. Audio, Speech and Language Processing, 2007 (to appear)
Augmented Statistical Models for Classifying Sequence Data
2006
Cited by 21 (0 self)
Abstract: Declaration This dissertation is the result of my own work and includes nothing that is the outcome of work done in collaboration. It has not been submitted in whole or in part for a degree at any other university. Some of the work has been published previously in conference proceedings [66,69], two journal articles [36,68], two workshop papers [35,67] and a technical report [65]. The length of this thesis including appendices, bibliography, footnotes, tables and equations is approximately 60,000 words. This thesis contains 27 figures and 20 tables.
Modeling musical sounds with an interpolating state model
in Proc. Eur. Signal Process. Conf., 2005
Cited by 4 (4 self)
Abstract: A computationally efficient algorithm is proposed for modeling and coding the time-varying spectra of musical sounds. The aim is to encode individual data sets and not the statistical properties of the sounds. A given sequence of acoustic feature vectors is modeled by finding a set of “states” (anchor points in the feature space) such that the input data can be efficiently represented by interpolating between them. The achieved modeling accuracy for a database of musical sounds was approximately two times better than that of a conventional vector quantization model, where the input data was k-means clustered and the input data vectors were then replaced by their corresponding cluster centroids. The computational complexity of the proposed algorithm as a function of the input sequence length T is O(T log T).
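The anchor-point idea in this abstract can be sketched in a few lines. This is a toy illustration, not the authors' algorithm: it picks equally spaced frames as anchor "states" (the paper optimizes the anchor set), and compares linear interpolation between anchors against a crude k-means vector-quantization baseline on a smooth synthetic feature path.

```python
import numpy as np

def vq_reconstruct(X, k, rng):
    """Baseline: crude k-means VQ -- replace each frame with its nearest centroid."""
    C = X[rng.choice(len(X), k, replace=False)]
    for _ in range(10):  # a few Lloyd iterations
        labels = np.argmin(((X[:, None, :] - C[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                C[j] = X[labels == j].mean(axis=0)
    return C[labels]

def interp_reconstruct(X, k):
    """Interpolating-state sketch: take k equally spaced frames as anchor
    'states' and rebuild the sequence by linear interpolation between them."""
    T = len(X)
    anchors = np.linspace(0, T - 1, k).astype(int)
    out = np.empty_like(X)
    for t in range(T):
        j = min(np.searchsorted(anchors, t, side="right") - 1, k - 2)
        a, b = anchors[j], anchors[j + 1]
        w = (t - a) / (b - a)
        out[t] = (1 - w) * X[a] + w * X[b]
    return out

rng = np.random.default_rng(0)
t = np.linspace(0, 1, 200)
X = np.stack([np.sin(2 * np.pi * t), t], axis=1)  # smooth 2-D "feature" path
err_vq = np.mean((X - vq_reconstruct(X, 8, rng)) ** 2)
err_in = np.mean((X - interp_reconstruct(X, 8)) ** 2)
print(err_in < err_vq)  # interpolation should win on smoothly varying data
```

With the same budget of 8 anchor points, interpolation tracks the smoothly varying path far better than replacing frames by cluster centroids, which is the qualitative result the abstract reports.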
Development and Exploration of a Timbre Space Representation of Audio
2005
Cited by 4 (0 self)
Abstract: Sound is an important part of the human experience and provides valuable information about the world around us. Auditory human-computer interfaces do not have the same richness of expression and variety as audio in the world, and it has been said that this is primarily due to a lack of reasonable design tools for audio interfaces. There are a number of good guidelines for audio design and a strong psychoacoustic understanding of how sounds are interpreted. There are also a number of sound manipulation techniques developed for computer music. This research takes these ideas as the basis for an audio interface design system. A proof of concept of this system has been developed in order to explore the design possibilities it allows. The core of this novel audio design system is the timbre space, which provides a multidimensional representation of a sound. Each sound is represented as a path in the timbre space, and this path can be manipulated geometrically. Several timbre spaces are compared to determine which among them is best suited for audio interface design. The various transformations available in the timbre space are discussed, and the perceptual relevance of two novel transformations is explored by encoding “urgency” as a design parameter. This research demonstrates that the timbre space is a viable option for audio interface design and provides novel features that are not found in current audio design systems. A number of problems with the approach and some suggested solutions are discussed. The timbre space opens up new possibilities for audio designers to explore combinations of sounds and sound design based on perceptual cues rather than synthesiser parameters.
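The core construction described here, a sound as a geometrically manipulable path in a multidimensional space, can be sketched with PCA. Everything below is a hypothetical stand-in: the dimensions, the random "spectral frames", and the scaling transformation are illustrative choices, not the thesis's actual feature set or timbre-space construction.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in "spectral frames" for one sound: 60 frames x 40 frequency bins
# (hypothetical numbers; a real system would use perceptually motivated features).
frames = np.abs(rng.normal(size=(60, 40))).cumsum(axis=0)  # smoothly evolving

# A minimal timbre space via PCA: each frame becomes a point in a
# low-dimensional space, and the whole sound is the path through those points.
X = frames - frames.mean(axis=0)
U, S, Vt = np.linalg.svd(X, full_matrices=False)
path = X @ Vt[:3].T  # the sound as a path in a 3-D timbre space

# Geometric manipulation of the path = manipulation of the sound's timbre;
# e.g. scaling the first axis exaggerates the dominant spectral variation.
stretched = path * np.array([1.5, 1.0, 1.0])

# Map the edited path back to spectral frames for resynthesis.
recon = stretched @ Vt[:3] + frames.mean(axis=0)
print(path.shape, recon.shape)
```

The point of the sketch is only the round trip: frames → low-dimensional path → geometric edit → frames, which is the workflow the abstract attributes to the timbre space.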
Interpolating hidden Markov model and its application to automatic instrument recognition
in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., 2009
Cited by 2 (2 self)
Abstract: This paper proposes an interpolating extension to hidden Markov models (HMMs), which allows more accurate modeling of natural sound sources. The model is able to produce observations from distributions which are interpolated between discrete HMM states. The model uses Gaussian mixture state emission densities, and the interpolation is implemented by introducing interpolating states in which the mixture weights, means, and variances are interpolated from the discrete HMM state densities. We propose an algorithm, extended from the Baum-Welch algorithm, for estimating the parameters of the interpolating model. The model was evaluated in an automatic instrument classification task, where it produced systematically better recognition accuracy than a baseline HMM recognition algorithm. Index Terms—Hidden Markov models, acoustic signal processing, musical instruments, pattern classification
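The interpolating-state mechanism described in this abstract can be illustrated with single-Gaussian emissions (the paper uses Gaussian mixtures; the parameter values below are made up for the example). An interpolating state's mean and variance are linear blends of two discrete states' parameters, so an observation lying between the states is explained better by an intermediate state than by either endpoint.

```python
import numpy as np

# Hypothetical diagonal-Gaussian emission parameters for two discrete HMM states.
mu_a, var_a = np.array([0.0, 0.0]), np.array([1.0, 1.0])
mu_b, var_b = np.array([4.0, 2.0]), np.array([0.5, 2.0])

def interpolated_state(alpha):
    """Emission parameters of an interpolating state a fraction alpha of the
    way from state a to state b: means and variances blended linearly."""
    mu = (1 - alpha) * mu_a + alpha * mu_b
    var = (1 - alpha) * var_a + alpha * var_b
    return mu, var

def log_gauss(x, mu, var):
    """Diagonal-covariance Gaussian log density."""
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mu) ** 2 / var)

# An observation halfway between the two state means scores highest under
# the alpha = 0.5 interpolating state, not under either discrete state.
x = 0.5 * (mu_a + mu_b)
scores = {a: log_gauss(x, *interpolated_state(a)) for a in (0.0, 0.5, 1.0)}
print(max(scores, key=scores.get))  # → 0.5
```

A full model would also interpolate mixture weights across components and estimate everything with the paper's extended Baum-Welch procedure; this sketch only shows why interpolated emission densities fit between-state observations better.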
Generative factor analyzed HMM for automatic speech recognition
2005
Cited by 1 (0 self)
Abstract: We present a generative factor analyzed hidden Markov model (GFA-HMM) for automatic speech recognition. In a standard HMM, observation vectors are represented by mixtures of Gaussians (MoG) that are dependent on a discrete-valued hidden state sequence. The GFA-HMM introduces a hierarchy of continuous-valued latent representations of observation vectors, where latent vectors in one level are acoustic-unit dependent and latent vectors in a higher level are acoustic-unit independent.
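The factor-analysis building block underlying this abstract is the generative mapping x = W z + mu + eps: a low-dimensional continuous latent vector z is projected into observation space with diagonal Gaussian noise, giving a structured covariance W Wᵀ + diag(psi). The sketch below is a generic factor-analysis sampler with made-up dimensions, not the GFA-HMM itself (which ties such models to HMM states in a hierarchy).

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative dimensions (not from the paper): 13-dim observations,
# 4-dim continuous latent factor.
p, q = 13, 4
W = 0.5 * rng.normal(size=(p, q))  # factor loading matrix
mu = rng.normal(size=p)            # observation mean
psi = 0.1 * np.ones(p)             # diagonal noise variances

def sample_observation():
    """One factor-analysis generative step: x = W z + mu + eps,
    with z standard normal and eps diagonal Gaussian noise."""
    z = rng.normal(size=q)
    eps = rng.normal(size=p) * np.sqrt(psi)
    return W @ z + mu + eps

X = np.stack([sample_observation() for _ in range(20000)])

# The model implies covariance W W^T + diag(psi); the sample covariance
# should match it up to Monte Carlo error.
cov_model = W @ W.T + np.diag(psi)
print(np.allclose(np.cov(X.T), cov_model, atol=0.5))
```

In the GFA-HMM the loading matrices and means would be tied to the hidden state sequence, so the discrete HMM dynamics select which low-rank covariance structure generates each frame.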
Abstract—A computationally efficient algorithm is proposed for modeling and representing time-varying musical sounds. The aim is to encode individual sounds and not the statistical properties of several sounds representing a certain class. A given sequence of acoustic feature vectors is modeled by finding a set of “states” (anchor points in the feature space) such that the input data can be efficiently represented by interpolating between them. The proposed interpolating state model is generic and can be used to represent any multidimensional data sequence. In this paper, it is applied to represent musical instrument sounds in a compact and accurate form. Simulation experiments were carried out which show that the proposed method clearly outperforms the conventional vector quantization approach, where the acoustic feature data is k-means clustered and the feature vectors are replaced by the corresponding cluster centroids. The computational complexity of the proposed algorithm as a function of the input sequence length T is O(T log T). Index Terms—Acoustic signal processing, audio coding, interpolation, vector quantization, discrete cosine transforms
Section of Electrical and Electronic Engineering, École Polytechnique Fédérale de Lausanne, thesis submitted for the degree of Docteur ès Sciences
2008
Abstract: Computer science engineer with an EPF diploma, of Swiss nationality, originating from Siviriez (FR); accepted on the recommendation of the jury: Prof. A. Ijspeert, jury president; Prof. H. Bourlard, thesis director.
Inference in Switching Linear Dynamical Systems Applied to Noise Robust Speech Recognition of Isolated Digits
Abstract: Real-world applications such as hands-free dialling in cars may have to perform recognition of spoken digits in potentially very noisy environments. Existing state-of-the-art solutions to this problem use feature-based hidden Markov models (HMMs), with a preprocessing stage to clean the noisy signal. However, the effect that the noise has on the induced HMM features is difficult to model exactly and limits the performance of the HMM system. An alternative to feature-based HMMs is to model the clean speech waveform directly, which has the potential advantage that including an explicit model of additive noise is straightforward. One of the simplest models of the clean speech waveform is the autoregressive (AR) process. Being too simple to cope with the nonlinearity of the speech signal, the AR process is generally embedded into a more elaborate model, such as the switching autoregressive HMM (SAR-HMM). In this thesis, we extend the SAR-HMM to jointly model the clean speech waveform and additive Gaussian white noise. This is achieved by using a switching linear dynamical system (SLDS) whose internal dynamics are autoregressive. On an isolated digit recognition task where utterances have been corrupted by additive Gaussian white noise, the proposed SLDS outperforms a state-of-the-art HMM system. For more natural noise sources, at low signal-to-noise ratios (SNRs), it is also significantly ...
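The key modeling step in this abstract, embedding an AR waveform model in a linear dynamical system so that additive noise enters only through the observation equation, can be sketched for a single AR(2) regime (the thesis switches between several such regimes). All coefficients below are illustrative, and the Kalman filter stands in for the thesis's more involved SLDS inference.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "clean speech" as an AR(2) process: x_t = a1*x_{t-1} + a2*x_{t-2} + v_t.
a1, a2, q = 1.6, -0.9, 0.1  # illustrative, stable coefficients
T = 2000
x = np.zeros(T)
for t in range(2, T):
    x[t] = a1 * x[t - 1] + a2 * x[t - 2] + rng.normal(scale=np.sqrt(q))

r = 1.0                     # additive white observation-noise variance
y = x + rng.normal(scale=np.sqrt(r), size=T)

# The AR process as an LDS: state s_t = [x_t, x_{t-1}], so the additive
# noise appears only in the observation equation y_t = H s_t + noise.
A = np.array([[a1, a2], [1.0, 0.0]])
Q = np.array([[q, 0.0], [0.0, 0.0]])
H = np.array([[1.0, 0.0]])

def kalman_filter(y):
    """Standard Kalman filter over the AR-in-state-space model."""
    s, P = np.zeros(2), np.eye(2)
    est = np.empty_like(y)
    for t, yt in enumerate(y):
        s, P = A @ s, A @ P @ A.T + Q          # predict
        K = P @ H.T / (H @ P @ H.T + r)        # Kalman gain
        s = s + (K * (yt - H @ s)).ravel()     # update with noisy sample
        P = P - K @ H @ P
        est[t] = s[0]
    return est

x_hat = kalman_filter(y)
mse_noisy = np.mean((y - x)[100:] ** 2)
mse_filt = np.mean((x_hat - x)[100:] ** 2)
print(mse_filt < mse_noisy)  # explicit noise model recovers the clean waveform better
```

Because the noise model is explicit, the filtered estimate tracks the clean waveform much more closely than the raw noisy observations, which is the advantage the abstract claims for waveform-level modeling over feature-domain HMMs.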