Results 1 - 6 of 6
Uncertainty decoding for noise robust speech recognition
- in Proc. Interspeech, 2004
Cited by 26 (8 self)
This dissertation is the result of my own work and includes nothing which is the outcome of work done in collaboration. It has not been submitted in whole or in part for a degree at any other university. Some of the work has been published previously in conference proceedings
Switching Linear Dynamical Systems for Noise Robust Speech Recognition
- IEEE Trans. Audio, Speech and Language Processing, 2007
"... to appear in ..."
Augmented Statistical Models for Classifying Sequence Data
- 2006
Cited by 7 (0 self)
Declaration: This dissertation is the result of my own work and includes nothing that is the outcome of work done in collaboration. It has not been submitted in whole or in part for a degree at any other university. Some of the work has been published previously in conference proceedings [66,69], two journal articles [36,68], two workshop papers [35,67] and a technical report [65]. The length of this thesis including appendices, bibliography, footnotes, tables and equations is approximately 60,000 words. This thesis contains 27 figures and 20 tables.
Modeling musical sounds with an interpolating state model
- In Proceedings of European Signal Processing Conference, 2005
Cited by 1 (1 self)
A computationally efficient algorithm is proposed for modeling and coding the time-varying spectra of musical sounds. The aim is to encode individual data sets and not the statistical properties of the sounds. A given sequence of acoustic feature vectors is modeled by finding a set of “states” (anchor points in the feature space) such that the input data can be efficiently represented by interpolating between them. The achieved modeling accuracy for a database of musical sounds was approximately two times better than that of a conventional “vector quantization” model where the input data was k-means clustered and the input data vectors were then replaced by their corresponding cluster centroids. The computational complexity of the proposed algorithm as a function of the input sequence length T is O(T log T).
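The contrast drawn in this abstract, between replacing each frame with a cluster centroid and interpolating between a few anchor points, can be illustrated with a toy sketch. This is not the paper's algorithm: the anchors here are simply placed at evenly spaced time indices (an assumption made for illustration), the k-means baseline is a plain Lloyd iteration with a deterministic start, and the function names `vq_reconstruct`, `interp_reconstruct` and `mse` are hypothetical.

```python
import numpy as np

def vq_reconstruct(X, k, iters=20):
    """Baseline: k-means cluster the frames, then replace each by its centroid."""
    X = np.asarray(X, dtype=float)
    T = len(X)
    # Assumption: deterministic initialisation from evenly spaced frames.
    centroids = X[np.linspace(0, T - 1, k).astype(int)].copy()
    for _ in range(iters):
        # Squared distance from every frame to every centroid, shape (T, k).
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members):
                centroids[j] = members.mean(axis=0)
    return centroids[labels]

def interp_reconstruct(X, k):
    """Keep k anchor frames and linearly interpolate between them over time."""
    X = np.asarray(X, dtype=float)
    T = len(X)
    # Assumption: anchors at evenly spaced time indices, not an optimised set.
    anchor_t = np.linspace(0, T - 1, k).astype(int)
    t = np.arange(T)
    columns = [np.interp(t, anchor_t, X[anchor_t, d]) for d in range(X.shape[1])]
    return np.stack(columns, axis=1)

def mse(X, Y):
    """Mean squared error over all frames and feature dimensions."""
    return float(np.mean((np.asarray(X, dtype=float) - np.asarray(Y, dtype=float)) ** 2))

# Toy "feature trajectory": a smooth closed curve in a 2-D feature space.
t = np.linspace(0.0, 1.0, 200)
X = np.stack([np.sin(2 * np.pi * t), np.cos(2 * np.pi * t)], axis=1)
print("VQ error:           ", mse(X, vq_reconstruct(X, 8)))
print("Interpolation error:", mse(X, interp_reconstruct(X, 8)))
```

On a smooth trajectory like this one, interpolating between eight anchors tracks the curve more closely than eight centroids do. That matches the direction of the reported result, although the numbers from this toy example say nothing about the paper's musical-sound database.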
Development and Exploration of a Timbre Space Representation of Audio
- 2005
Cited by 1 (0 self)
Sound is an important part of the human experience and provides valuable information about the world around us. Auditory human-computer interfaces do not have the same richness of expression and variety as audio in the world, and it has been said that this is primarily due to a lack of reasonable design tools for audio interfaces. There are a number of good guidelines for audio design and a strong psychoacoustic understanding of how sounds are interpreted. There are also a number of sound manipulation techniques developed for computer music. This research takes these ideas as the basis for an audio interface design system. A proof-of-concept of this system has been developed in order to explore the design possibilities allowed by the new system. The core of this novel audio design system is the timbre space. This provides a multi-dimensional representation of a sound. Each sound is represented as a path in the timbre space and this path can be manipulated geometrically. Several timbre spaces are compared to determine which amongst them is the best one for audio interface design. The various transformations available in the timbre space are discussed and the perceptual relevance of two novel transformations is explored by encoding “urgency” as a design parameter. This research demonstrates that the timbre space is a viable option for audio interface design and provides novel features that are not found in current audio design systems. A number of problems with the approach and some suggested solutions are discussed. The timbre space opens up new possibilities for audio designers to explore combinations of sounds and sound design based on perceptual cues rather than synthesiser parameters.
Generative factor analyzed HMM for automatic speech recognition
- 2005
We present a generative factor analyzed hidden Markov model (GFA-HMM) for automatic speech recognition. In a standard HMM, observation vectors are represented by a mixture of Gaussians (MoG) that are dependent on a discrete-valued hidden state sequence. The GFA-HMM introduces a hierarchy of continuous-valued latent representations of observation vectors, where latent vectors in one level are acoustic-unit dependent and latent vectors in a higher level are acoustic-unit independent.

