Results 1–10 of 220
Dynamic Bayesian Networks: Representation, Inference and Learning
, 2002
Abstract

Cited by 598 (3 self)
Modelling sequential data is important in many areas of science and engineering. Hidden Markov models (HMMs) and Kalman filter models (KFMs) are popular for this because they are simple and flexible. For example, HMMs have been used for speech recognition and biosequence analysis, and KFMs have been used for problems ranging from tracking planes and missiles to predicting the economy. However, HMMs and KFMs are limited in their “expressive power”. Dynamic Bayesian Networks (DBNs) generalize HMMs by allowing the state space to be represented in factored form, instead of as a single discrete random variable. DBNs generalize KFMs by allowing arbitrary probability distributions, not just (unimodal) linear-Gaussian. In this thesis, I will discuss how to represent many different kinds of models as DBNs, how to perform exact and approximate inference in DBNs, and how to learn DBN models from sequential data.
In particular, the main novel technical contributions of this thesis are as follows: a way of representing Hierarchical HMMs as DBNs, which enables inference to be done in O(T) time instead of O(T^3), where T is the length of the sequence; an exact smoothing algorithm that takes O(log T) space instead of O(T); a simple way of using the junction tree algorithm for online inference in DBNs; new complexity bounds on exact online inference in DBNs; a new deterministic approximate inference algorithm called factored frontier; an analysis of the relationship between the BK algorithm and loopy belief propagation; a way of applying Rao-Blackwellised particle filtering to DBNs in general, and the SLAM (simultaneous localization and mapping) problem in particular; a way of extending the structural EM algorithm to DBNs; and a variety of different applications of DBNs. However, perhaps the main value of the thesis is its catholic presentation of the field of sequential data modelling.
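As a concrete illustration of the kind of sequence model being generalized here, the sketch below implements the standard HMM forward recursion in NumPy. The function name and toy parameters are hypothetical, not from the thesis; an HMM is the simplest DBN (one discrete state variable per time slice), and the recursion runs in O(T·K²) time for T observations and K states.

```python
import numpy as np

def hmm_forward(pi, A, B, obs):
    """Log-likelihood of a discrete observation sequence under an HMM.

    pi  : (K,)   initial state distribution
    A   : (K, K) transition matrix, A[i, j] = P(z_t = j | z_{t-1} = i)
    B   : (K, M) emission matrix,   B[j, o] = P(x_t = o | z_t = j)
    obs : sequence of observation symbols in {0, ..., M-1}
    """
    alpha = pi * B[:, obs[0]]          # unnormalized filtered state distribution
    loglik = np.log(alpha.sum())
    alpha /= alpha.sum()
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]  # predict, then condition on the observation
        scale = alpha.sum()            # rescale to avoid numerical underflow
        loglik += np.log(scale)
        alpha /= scale
    return loglik
```

A DBN replaces the single state variable per slice with several factored variables, which is what enables the representations (and the O(T) hierarchical-HMM inference) described in the abstract.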
Maximum Likelihood Discriminant Feature Spaces
 in Proc. ICASSP
, 2000
Abstract

Cited by 82 (17 self)
Linear discriminant analysis (LDA) is known to be inappropriate for the case of classes with unequal sample covariances. In recent years, there has been an interest in generalizing LDA to heteroscedastic discriminant analysis (HDA) by removing the equal within-class covariance constraint. This paper presents a new approach to HDA by defining an objective function which maximizes the class discrimination in the projected subspace while ignoring the rejected dimensions. Moreover, we will investigate the link between discrimination and the likelihood of the projected samples and show that HDA can be viewed as a constrained ML projection for a full covariance Gaussian model, the constraint being given by the maximization of the projected between-class scatter volume. It will be shown that, under diagonal covariance Gaussian modeling constraints, applying a diagonalizing linear transformation (MLLT) to the HDA space results in increased classification accuracy even though HDA alone actually...
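For reference, the classical LDA projection that HDA relaxes can be computed as below. This is a generic NumPy sketch (the function name is hypothetical, and this is not the paper's HDA algorithm); it makes exactly the shared within-class covariance assumption that the paper's objective removes.

```python
import numpy as np

def lda_projection(X, y, dim):
    """Return a (D, dim) LDA projection for data X (N, D) with labels y."""
    mean = X.mean(axis=0)
    D = X.shape[1]
    Sw = np.zeros((D, D))   # within-class scatter (assumed shared across classes)
    Sb = np.zeros((D, D))   # between-class scatter
    for c in np.unique(y):
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)
        Sb += len(Xc) * np.outer(mc - mean, mc - mean)
    # Solve the generalized eigenproblem Sb v = lambda Sw v.
    evals, evecs = np.linalg.eig(np.linalg.solve(Sw, Sb))
    order = np.argsort(evals.real)[::-1]
    return evecs[:, order[:dim]].real
```

When the class covariances genuinely differ, this pooled-scatter objective picks a suboptimal subspace, which is the failure mode motivating HDA.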
Graphical models and automatic speech recognition
 Mathematical Foundations of Speech and Language Processing
, 2003
Abstract

Cited by 69 (13 self)
Graphical models provide a promising paradigm to study both existing and novel techniques for automatic speech recognition. This paper first provides a brief overview of graphical models and their uses as statistical models. It is then shown that the statistical assumptions behind many pattern recognition techniques commonly used as part of a speech recognition system can be described by a graph – this includes Gaussian distributions, mixture models, decision trees, factor analysis, principal component analysis, linear discriminant analysis, and hidden Markov models. Moreover, this paper shows that many advanced models for speech recognition and language processing can also be simply described by a graph, including many at the acoustic, pronunciation, and language-modeling levels. A number of speech recognition techniques born directly out of the graphical-models paradigm are also surveyed. Additionally, this paper includes a novel graphical analysis regarding why derivative (or delta) features improve hidden Markov model-based speech recognition by improving structural discriminability. It also includes an example where a graph can be used to represent language model smoothing constraints. As will be seen, the space of models describable by a graph is quite large. A thorough exploration of this space should yield techniques that ultimately will supersede the hidden Markov model.
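The point about derivative features can be made concrete. The sketch below computes standard regression-based delta features (the common HTK-style formula, shown here as a generic illustration, not code from the paper); each static feature vector is augmented with a local slope estimate over a window of ±N frames.

```python
import numpy as np

def delta_features(feats, N=2):
    """Regression-based delta (derivative) features.

    feats : (T, D) static feature matrix (e.g., MFCCs)
    N     : half-window size
    Returns (T, D) deltas via d_t = sum_n n*(c_{t+n} - c_{t-n}) / (2*sum_n n^2),
    with edge frames replicated for padding.
    """
    T = feats.shape[0]
    denom = 2 * sum(n * n for n in range(1, N + 1))
    padded = np.concatenate([np.repeat(feats[:1], N, axis=0),
                             feats,
                             np.repeat(feats[-1:], N, axis=0)])
    deltas = np.zeros_like(feats, dtype=float)
    for n in range(1, N + 1):
        deltas += n * (padded[N + n:N + n + T] - padded[N - n:N - n + T])
    return deltas / denom
```

On a linear ramp of features the interior deltas come out exactly equal to the slope, which is the sanity check for the regression formula.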
Robust speaker-adaptive HMM-based text-to-speech synthesis
 IEEE Trans. on Audio, Speech and Language Processing
, 2009
Abstract

Cited by 39 (14 self)
Abstract—This paper describes a speaker-adaptive HMM-based speech synthesis system. The new system, called “HTS-2007,” employs speaker adaptation (CSMAPLR+MAP), feature-space adaptive training, mixed-gender modeling, and full-covariance modeling using CSMAPLR transforms, in addition to several other techniques that have proved effective in our previous systems. Subjective evaluation results show that the new system generates significantly better quality synthetic speech than speaker-dependent approaches with realistic amounts of speech data, and that it bears comparison with speaker-dependent approaches even when large amounts of speech data are available. In addition, a comparison study with several speech synthesis techniques shows the new system is very robust: It is able to build voices from less-than-ideal speech data and synthesize good-quality speech even for out-of-domain sentences. Index Terms—Average voice, HMM-based speech synthesis, HMM Speech Synthesis System, HTS, speaker adaptation, speech synthesis, voice conversion.
Factored sparse inverse covariance matrices
 In Proc. IEEE Intl. Conf. on Acoustics, Speech, and Signal Processing
, 2000
Abstract

Cited by 39 (10 self)
Most HMM-based speech recognition systems use Gaussian mixtures as observation probability density functions. An important goal in all such systems is to improve parsimony. One method is to adjust the type of covariance matrices used. In this work, factored sparse inverse covariance matrices are introduced. Based on a U′U factorization, the inverse covariance matrix can be represented using linear regressive coefficients which 1) correspond to sparse patterns in the inverse covariance matrix (and therefore represent conditional independence properties of the Gaussian), and 2) result in a method of partial tying of the covariance matrices without requiring nonlinear EM update equations. Results show that the performance of full-covariance Gaussians can be matched by factored sparse inverse covariance Gaussians having significantly fewer parameters.
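The factorization idea can be illustrated directly. The sketch below is a generic NumPy illustration built on the standard Cholesky identity, not the paper's training procedure: any positive-definite precision (inverse covariance) matrix K factors as K = UᵀDU with U unit upper-triangular, and the off-diagonal entries of U act as linear regression coefficients, so zeros in U correspond to conditional independencies of the Gaussian.

```python
import numpy as np

def factor_precision(K):
    """Factor a positive-definite precision matrix as K = U.T @ diag(d) @ U,
    with U unit upper-triangular and d a vector of positive scalars."""
    L = np.linalg.cholesky(K)        # K = L @ L.T with L lower-triangular
    d = np.diag(L) ** 2              # positive per-variable "innovation" precisions
    U = (L / np.diag(L)).T           # unit upper-triangular regression part
    return U, d
```

Zeroing selected off-diagonal entries of U (and re-estimating the rest) yields sparse-precision Gaussians with fewer parameters, which is the kind of structure the abstract describes.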
Uncertainty decoding for noise robust speech recognition
 in Proc. Interspeech
, 2004
Abstract

Cited by 37 (12 self)
This dissertation is the result of my own work and includes nothing which is the outcome of work done in collaboration. It has not been submitted in whole or in part for a degree at any other university. Some of the work has been published previously in conference proceedings
Modeling inverse covariance matrices by basis expansion
 IEEE Transactions on Acoustics, Speech, and Signal Processing
Modeling With A Subspace Constraint On Inverse Covariance Matrices
 in Proc. ICSLP
, 2002
Abstract

Cited by 34 (9 self)
We consider a family of Gaussian mixture models for use in HMM-based speech recognition systems. These "SPAM" models have state-independent choices of subspaces to which the precision (inverse covariance) matrices and means are restricted to belong. They provide a flexible tool for robust, compact, and fast acoustic modeling. The focus of this paper is on the case where the means are unconstrained. The models in this case already generalize the recently introduced EMLLT models, which themselves interpolate between MLLT and full covariance models. We describe an algorithm to train both the state-dependent and state-independent parameters. Results are reported on one speech recognition task. The SPAM models are seen to yield significant improvements in accuracy over EMLLT models with comparable model size and runtime speed. We find a 10% relative reduction in error rate over an MLLT model can be obtained while decreasing the acoustic modeling time by 20%.
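The subspace constraint itself is easy to state in code: each state's precision matrix is a state-specific linear combination of a small, state-independent basis of symmetric matrices. The sketch below uses hypothetical toy values (the dimensions, basis construction, and weights are illustrative, not from the paper).

```python
import numpy as np

rng = np.random.default_rng(0)
D, K = 4, 3                          # feature dimension, basis size (toy values)

# State-independent symmetric basis matrices S_k. Including the identity as
# S_0 makes it easy to keep the resulting precision positive definite.
basis = [np.eye(D)]
for _ in range(K - 1):
    M = rng.normal(size=(D, D))
    basis.append(0.1 * (M + M.T))

def precision(lam):
    """Precision matrix of one state from its weight vector lam (length K)."""
    return sum(l * S for l, S in zip(lam, basis))

# One hypothetical state's weights; the weights must keep the precision
# positive definite for the Gaussian to be valid.
P = precision(np.array([2.0, 0.3, -0.2]))
```

Training alternates between the state-dependent weights and the shared basis; per the abstract, with unconstrained means this family subsumes EMLLT and, at the extremes, MLLT and full-covariance models.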
Joint uncertainty decoding for robust large vocabulary speech recognition
, 2006
Abstract

Cited by 32 (28 self)
Standard techniques to increase automatic speech recognition noise robustness typically assume recognition models are clean trained. This “clean” training data may in fact not be clean at all, but may contain channel variations, varying noise conditions, as well as different speakers. Hence rather than considering noise robustness techniques as compensating clean acoustic models for environmental noise, they may be thought of as reducing the acoustic mismatch between training and test conditions. This report examines the application of VTS model compensation or model-based Joint uncertainty decoding to clean and multi-style trained systems. An EM-based noise estimation procedure is also presented to produce ML VTS or Joint noise models depending on the form of compensation used. Alternatively, compared to multi-style training, adaptive training with Joint uncertainty transforms, also referred to as JAT in this work, provides a better method for handling heterogeneous data. With JAT, the uncertainty bias added to the model variances deweights observations proportional to the noise level. In this way, Joint transforms normalise the noise from the data, allowing the canonical model to solely represent the underlying “clean” acoustic signal.
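The variance-bias effect described above can be seen in a tiny example (toy numbers, not the report's actual compensation scheme): adding an uncertainty bias to a Gaussian's variance flattens its log-likelihood, raising it for outlying (noise-corrupted) frames and lowering it near the mean, so mismatched observations carry less weight in decoding.

```python
import numpy as np

def gauss_loglik(x, mu, var):
    """Log-density of a scalar Gaussian N(mu, var) evaluated at x."""
    return -0.5 * (np.log(2 * np.pi * var) + (x - mu) ** 2 / var)

mu, var, bias = 0.0, 1.0, 4.0       # toy model and toy uncertainty bias

outlier_plain  = gauss_loglik(3.0, mu, var)         # noisy frame, no bias
outlier_biased = gauss_loglik(3.0, mu, var + bias)  # bias flattens the tail
near_plain     = gauss_loglik(0.1, mu, var)         # frame near the mean
near_biased    = gauss_loglik(0.1, mu, var + bias)
```

The biased variance pulls the outlier's log-likelihood up toward the bulk of the distribution while slightly lowering it near the mean, which is exactly the deweighting behaviour described.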
The Kaldi speech recognition toolkit
 In IEEE 2011 Workshop on Automatic Speech Recognition and Understanding (ASRU)
, 2011
Abstract

Cited by 32 (7 self)
Abstract—We describe the design of Kaldi, a free, open-source toolkit for speech recognition research. Kaldi provides a speech recognition system based on finite-state transducers (using the freely available OpenFst), together with detailed documentation and scripts for building complete recognition systems. Kaldi is written in C++, and the core library supports modeling of arbitrary phonetic-context sizes, acoustic modeling with subspace Gaussian mixture models (SGMM) as well as standard Gaussian mixture models, together with all commonly used linear and affine transforms. Kaldi is released under the Apache License v2.0, which is highly nonrestrictive, making it suitable for a wide community of users.