Results 1-10 of 251
Dynamic Bayesian Networks: Representation, Inference and Learning
, 2002
"... Modelling sequential data is important in many areas of science and engineering. Hidden Markov models (HMMs) and Kalman filter models (KFMs) are popular for this because they are simple and flexible. For example, HMMs have been used for speech recognition and biosequence analysis, and KFMs have bee ..."
Abstract

Cited by 759 (3 self)
Modelling sequential data is important in many areas of science and engineering. Hidden Markov models (HMMs) and Kalman filter models (KFMs) are popular for this because they are simple and flexible. For example, HMMs have been used for speech recognition and biosequence analysis, and KFMs have been used for problems ranging from tracking planes and missiles to predicting the economy. However, HMMs
and KFMs are limited in their “expressive power”. Dynamic Bayesian Networks (DBNs) generalize HMMs by allowing the state space to be represented in factored form, instead of as a single discrete random variable. DBNs generalize KFMs by allowing arbitrary probability distributions, not just (unimodal) linear-Gaussian. In this thesis, I will discuss how to represent many different kinds of models as DBNs, how to perform exact and approximate inference in DBNs, and how to learn DBN models from sequential data.
In particular, the main novel technical contributions of this thesis are as follows: a way of representing
Hierarchical HMMs as DBNs, which enables inference to be done in O(T) time instead of O(T^3), where T is the length of the sequence; an exact smoothing algorithm that takes O(log T) space instead of O(T); a simple way of using the junction tree algorithm for online inference in DBNs; new complexity bounds on exact online inference in DBNs; a new deterministic approximate inference algorithm called factored frontier; an analysis of the relationship between the BK algorithm and loopy belief propagation; a way of
applying Rao-Blackwellised particle filtering to DBNs in general, and the SLAM (simultaneous localization
and mapping) problem in particular; a way of extending the structural EM algorithm to DBNs; and a variety of different applications of DBNs. However, perhaps the main value of the thesis is its catholic presentation of the field of sequential data modelling.
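The factored-state idea in the abstract above can be made concrete with a toy sketch: a DBN whose hidden state is two independent binary chains is equivalent to an HMM over the product state space, with the joint transition matrix given by a Kronecker product of the per-chain transition matrices. All names and numbers below are illustrative, not taken from the thesis.

```python
# Hypothetical sketch: a 2-chain DBN flattened into an HMM whose state is
# the cross product of the chains' states. Numbers are illustrative only.
import numpy as np

def forward(trans, emit, obs, init):
    """Standard scaled HMM forward pass; returns log-likelihood of obs."""
    alpha = init * emit[:, obs[0]]
    ll = np.log(alpha.sum()); alpha /= alpha.sum()
    for o in obs[1:]:
        alpha = (trans.T @ alpha) * emit[:, o]   # predict, then weight by emission
        ll += np.log(alpha.sum()); alpha /= alpha.sum()
    return ll

# Two independent binary chains -> flattened 4-state HMM.
A1 = np.array([[0.9, 0.1], [0.2, 0.8]])   # chain 1 transitions
A2 = np.array([[0.7, 0.3], [0.4, 0.6]])   # chain 2 transitions
trans = np.kron(A1, A2)                    # factored transition = Kronecker product
emit = np.array([[0.8, 0.2], [0.5, 0.5], [0.5, 0.5], [0.1, 0.9]])
init = np.full(4, 0.25)
obs = [0, 1, 1, 0]
ll = forward(trans, emit, obs, init)
```

The point of the DBN representation is that one never needs to build `trans` explicitly: inference can exploit the factored structure directly, which is where the complexity gains discussed in the thesis come from.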
The Kaldi speech recognition toolkit
 In IEEE 2011 workshop
, 2011
"... Abstract—We describe the design of Kaldi, a free, opensource toolkit for speech recognition research. Kaldi provides a speech recognition system based on finitestate transducers (using the freely available OpenFst), together with detailed documentation and scripts for building complete recognition ..."
Abstract

Cited by 118 (14 self)
Abstract—We describe the design of Kaldi, a free, open-source toolkit for speech recognition research. Kaldi provides a speech recognition system based on finite-state transducers (using the freely available OpenFst), together with detailed documentation and scripts for building complete recognition systems. Kaldi is written in C++, and the core library supports modeling of arbitrary phonetic-context sizes, acoustic modeling with subspace Gaussian mixture models (SGMM) as well as standard Gaussian mixture models, together with all commonly used linear and affine transforms. Kaldi is released under the Apache License v2.0, which is highly non-restrictive, making it suitable for a wide community of users.
Maximum Likelihood Discriminant Feature Spaces
 in Proc. ICASSP
, 2000
"... Linear discriminant analysis (LDA) is known to be inappropriate for the case of classes with unequal sample covariances. In recent years, there has been an interest in generalizing LDA to heteroscedastic discriminant analysis (HDA) by removing the equal withinclass covariance constraint. This paper ..."
Abstract

Cited by 95 (18 self)
Linear discriminant analysis (LDA) is known to be inappropriate for the case of classes with unequal sample covariances. In recent years, there has been an interest in generalizing LDA to heteroscedastic discriminant analysis (HDA) by removing the equal within-class covariance constraint. This paper presents a new approach to HDA by defining an objective function which maximizes the class discrimination in the projected subspace while ignoring the rejected dimensions. Moreover, we will investigate the link between discrimination and the likelihood of the projected samples and show that HDA can be viewed as a constrained ML projection for a full-covariance Gaussian model, the constraint being given by the maximization of the projected between-class scatter volume. It will be shown that, under diagonal-covariance Gaussian modeling constraints, applying a diagonalizing linear transformation (MLLT) to the HDA space results in increased classification accuracy even though HDA alone actually...
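For reference, the homoscedastic LDA baseline that this abstract generalizes reduces to the generalized eigenproblem S_b v = lambda S_w v (between-class vs. within-class scatter). A minimal numpy-only sketch on synthetic two-class data, with all values illustrative:

```python
# Illustrative LDA projection; synthetic data, not from the paper.
import numpy as np

rng = np.random.default_rng(0)
# Two 3-D Gaussian classes separated along dimension 0.
X = np.vstack([rng.normal(m, 1.0, size=(50, 3)) for m in ([0, 0, 0], [3, 0, 0])])
y = np.repeat([0, 1], 50)

mu = X.mean(0)
# Within-class scatter: per-class covariance scaled by class count.
Sw = sum(np.cov(X[y == c].T, bias=True) * np.sum(y == c) for c in (0, 1))
# Between-class scatter: weighted outer products of mean offsets.
Sb = sum(np.sum(y == c) * np.outer(X[y == c].mean(0) - mu, X[y == c].mean(0) - mu)
         for c in (0, 1))

# Solve S_b v = lambda S_w v via eig(Sw^{-1} Sb).
evals, evecs = np.linalg.eig(np.linalg.solve(Sw, Sb))
proj = np.real(evecs[:, np.argmax(np.real(evals))])  # most discriminative direction
```

Since the classes here differ only in dimension 0, the recovered direction loads almost entirely on that axis; the paper's HDA objective replaces the single pooled S_w with per-class covariances.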
Analysis of speaker adaptation algorithms for HMM-based speech synthesis and a constrained SMAPLR adaptation algorithm
 IEEE Trans. Audio Speech Lang. Process
, 2009
"... Abstract—In this paper, we analyze the effects of several factors and configuration choices encountered during training and model construction when we want to obtain better and more stable adaptation in HMMbased speech synthesis. We then propose a new adaptation algorithm called constrained structu ..."
Abstract

Cited by 87 (28 self)
Abstract—In this paper, we analyze the effects of several factors and configuration choices encountered during training and model construction when we want to obtain better and more stable adaptation in HMM-based speech synthesis. We then propose a new adaptation algorithm called constrained structural maximum a posteriori linear regression (CSMAPLR) whose derivation is based on the knowledge obtained in this analysis and on the results of comparing several conventional adaptation algorithms. Here, we investigate six major aspects of the speaker adaptation: initial models; the amount of the training data for the initial models; the transform functions, estimation criteria, and sensitivity of several linear regression adaptation algorithms; and combination algorithms. Analyzing the effect of the initial model, we compare speaker-dependent models, gender-independent models, and the simultaneous use of the gender-dependent models to single use of the gender-dependent models. Analyzing the effect of the transform functions, we compare the transform function for only mean vectors with that for mean vectors and covariance matrices. Analyzing the effect of the estimation criteria, we compare the ML criterion with a robust estimation criterion called structural MAP. We evaluate the sensitivity of several thresholds for the piecewise linear regression algorithms and take up methods combining MAP adaptation with the linear regression algorithms. We incorporate these adaptation algorithms into our speech synthesis system and present several subjective and objective evaluation results showing the utility and effectiveness of these algorithms in speaker adaptation for HMM-based speech synthesis. Index Terms—Average voice, hidden Markov model (HMM)-based speech synthesis, speaker adaptation, speech synthesis, voice conversion.
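As a hedged illustration of the "transform function for only mean vectors" that this abstract compares, linear-regression adaptation applies an affine map to each Gaussian mean, adapted mu = A @ mu + b. The sketch below fits (A, b) by least squares against synthetic target means; real MLLR-family algorithms instead maximize likelihood over adaptation data, and all names and numbers here are made up.

```python
# Hedged sketch of an MLLR-style affine mean transform; synthetic data.
import numpy as np

rng = np.random.default_rng(1)
d, n = 3, 20
mus = rng.normal(size=(n, d))                    # speaker-independent means
A_true = np.eye(d) + 0.1 * rng.normal(size=(d, d))
b_true = rng.normal(size=d)
targets = mus @ A_true.T + b_true                # ideal adapted means

# Fit W = [A b] by least squares on extended means [mu; 1].
ext = np.hstack([mus, np.ones((n, 1))])
W, *_ = np.linalg.lstsq(ext, targets, rcond=None)
A_hat, b_hat = W[:d].T, W[d]
adapted = mus @ A_hat.T + b_hat                  # apply the learned transform
```

With noise-free targets and more means than parameters, the transform is recovered exactly; the "mean vectors and covariance matrices" variant in the abstract additionally transforms each covariance.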
Graphical models and automatic speech recognition
 Mathematical Foundations of Speech and Language Processing
, 2003
"... Graphical models provide a promising paradigm to study both existing and novel techniques for automatic speech recognition. This paper first provides a brief overview of graphical models and their uses as statistical models. It is then shown that the statistical assumptions behind many pattern recog ..."
Abstract

Cited by 77 (13 self)
Graphical models provide a promising paradigm to study both existing and novel techniques for automatic speech recognition. This paper first provides a brief overview of graphical models and their uses as statistical models. It is then shown that the statistical assumptions behind many pattern recognition techniques commonly used as part of a speech recognition system can be described by a graph – this includes Gaussian distributions, mixture models, decision trees, factor analysis, principal component analysis, linear discriminant analysis, and hidden Markov models. Moreover, this paper shows that many advanced models for speech recognition and language processing can also be simply described by a graph, including many at the acoustic, pronunciation, and language-modeling levels. A number of speech recognition techniques born directly out of the graphical-models paradigm are also surveyed. Additionally, this paper includes a novel graphical analysis regarding why derivative (or delta) features improve hidden Markov model-based speech recognition by improving structural discriminability. It also includes an example where a graph can be used to represent language model smoothing constraints. As will be seen, the space of models describable by a graph is quite large. A thorough exploration of this space should yield techniques that ultimately will supersede the hidden Markov model.
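The delta (derivative) features this abstract analyzes are conventionally computed with the regression formula delta_t = sum_{n=1..N} n * (c_{t+n} - c_{t-n}) / (2 * sum_{n=1..N} n^2), with edge frames replicated. A small sketch; the window size N=2 is a typical choice, not taken from the paper:

```python
# Standard delta-feature regression over a 1-D feature track.
import numpy as np

def delta(feats, N=2):
    """Regression-based derivative features with replicated edge frames."""
    T = len(feats)
    denom = 2 * sum(n * n for n in range(1, N + 1))
    padded = np.concatenate([[feats[0]] * N, feats, [feats[-1]] * N])
    out = np.empty(T)
    for t in range(T):
        out[t] = sum(n * (padded[t + N + n] - padded[t + N - n])
                     for n in range(1, N + 1)) / denom
    return out

d = delta(np.arange(10.0))   # a linear ramp: interior deltas equal the slope
```

On a linear ramp the interior deltas recover the slope exactly, which is the sanity check one would expect from a derivative estimator.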
Robust speaker-adaptive HMM-based text-to-speech synthesis
 IEEE Trans. on Audio, Speech and Language Processing
, 2009
"... Abstract—This paper describes a speakeradaptive HMMbased speech synthesis system. The new system, called “HTS2007, ” employs speaker adaptation (CSMAPLR+MAP), featurespace adaptive training, mixedgender modeling, and fullcovariance modeling using CSMAPLR transforms, in addition to several othe ..."
Abstract

Cited by 58 (18 self)
Abstract—This paper describes a speaker-adaptive HMM-based speech synthesis system. The new system, called “HTS-2007,” employs speaker adaptation (CSMAPLR+MAP), feature-space adaptive training, mixed-gender modeling, and full-covariance modeling using CSMAPLR transforms, in addition to several other techniques that have proved effective in our previous systems. Subjective evaluation results show that the new system generates significantly better quality synthetic speech than speaker-dependent approaches with realistic amounts of speech data, and that it bears comparison with speaker-dependent approaches even when large amounts of speech data are available. In addition, a comparison study with several speech synthesis techniques shows the new system is very robust: It is able to build voices from less-than-ideal speech data and synthesize good-quality speech even for out-of-domain sentences. Index Terms—Average voice, HMM-based speech synthesis, HMM Speech Synthesis System, HTS, speaker adaptation, speech synthesis, voice conversion.
Sequence-discriminative training of deep neural networks
 in Proc. INTERSPEECH
, 2013
"... Sequencediscriminative training of deep neural networks (DNNs) is investigated on a 300 hour American English conversational telephone speech task. Different sequencediscriminative criteria — maximum mutual information (MMI), minimum phone error (MPE), statelevel minimum Bayes risk (sMBR), and boo ..."
Abstract

Cited by 49 (4 self)
Sequence-discriminative training of deep neural networks (DNNs) is investigated on a 300 hour American English conversational telephone speech task. Different sequence-discriminative criteria — maximum mutual information (MMI), minimum phone error (MPE), state-level minimum Bayes risk (sMBR), and boosted MMI — are compared. Two different heuristics are investigated to improve the performance of the DNNs trained using sequence-based criteria — lattices are regenerated after the first iteration of training; and, for MMI and BMMI, the frames where the numerator and denominator hypotheses are disjoint are removed from the gradient computation. Starting from a competitive DNN baseline trained using cross-entropy, different sequence-discriminative criteria are shown to lower word error rates by 8-9% relative, on average. Little difference is noticed between the different sequence-based criteria that are investigated. The experiments are done using the open-source Kaldi toolkit, which makes it possible for the wider community to reproduce these results. Index Terms: speech recognition, deep learning, sequence-criterion training, neural networks, reproducible research
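For context, the MMI criterion named in this abstract is conventionally written as follows (standard form, not quoted from the paper):

```latex
\mathcal{F}_{\mathrm{MMI}}(\theta)
  = \sum_{u} \log
    \frac{p_\theta(\mathbf{O}_u \mid \mathcal{M}_{w_u})^{\kappa}\, P(w_u)}
         {\sum_{w'} p_\theta(\mathbf{O}_u \mid \mathcal{M}_{w'})^{\kappa}\, P(w')}
```

where O_u is the observation sequence of utterance u, w_u its reference transcript, M_w the HMM composed for word sequence w, and kappa an acoustic scale; the numerator and denominator here are the "numerator and denominator hypotheses" whose disjoint frames the abstract's heuristic removes from the gradient.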
Factored sparse inverse covariance matrices
 In Proc. IEEE Intl. Conf. on Acoustics, Speech, and Signal Processing
, 2000
"... Most HMMbased speech recognition systems use Gaussian mixtures as observation probability density functions. An important goal in all such systems is to improve parsimony. One method is to adjust the type of covariance matrices used. In this work, factored sparse inverse covariance matrices are int ..."
Abstract

Cited by 47 (10 self)
Most HMM-based speech recognition systems use Gaussian mixtures as observation probability density functions. An important goal in all such systems is to improve parsimony. One method is to adjust the type of covariance matrices used. In this work, factored sparse inverse covariance matrices are introduced. Based on a U′DU factorization, the inverse covariance matrix can be represented using linear regressive coefficients which 1) correspond to sparse patterns in the inverse covariance matrix (and therefore represent conditional independence properties of the Gaussian), and 2) result in a method of partial tying of the covariance matrices without requiring nonlinear EM update equations. Results show that the performance of full-covariance Gaussians can be matched by factored sparse inverse covariance Gaussians having significantly fewer parameters.
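The conditional-independence reading of sparsity in the inverse covariance, mentioned in point 1) of this abstract, can be checked numerically: for a Gaussian, precision[i, j] == 0 means x_i and x_j are conditionally independent given the remaining variables. The AR(1)-style covariance below is a synthetic example, not from the paper.

```python
# Zero entries of the precision (inverse covariance) matrix encode
# conditional independence in a Gaussian. Synthetic 3-D example.
import numpy as np

rho = 0.5
cov = np.array([[1.0,    rho, rho**2],
                [rho,    1.0, rho],
                [rho**2, rho, 1.0]])   # Markov chain x0 - x1 - x2
prec = np.linalg.inv(cov)
# x0 and x2 are conditionally independent given x1, so prec[0, 2] is ~0,
# while the directly-linked pairs have nonzero precision entries.
```

For such a tridiagonal precision matrix, a triangular factorization is correspondingly sparse (bidiagonal), which is the kind of regressive structure the paper exploits for parameter tying.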
Application of Pretrained Deep Neural Networks to Large Vocabulary Conversational Speech Recognition
, 2012
"... ..."
(Show Context)
Uncertainty decoding for noise robust speech recognition
 in Proc. Interspeech
, 2004
"... This dissertation is the result of my own work and includes nothing which is the outcome of work done in collaboration. It has not been submitted in whole or in part for a degree at any other university. Some of the work has been published previously in conference proceedings ..."
Abstract

Cited by 44 (12 self)
This dissertation is the result of my own work and includes nothing which is the outcome of work done in collaboration. It has not been submitted in whole or in part for a degree at any other university. Some of the work has been published previously in conference proceedings