Results 1–10 of 13
Dynamic Bayesian Networks: Representation, Inference and Learning
, 2002
"... Modelling sequential data is important in many areas of science and engineering. Hidden Markov models (HMMs) and Kalman filter models (KFMs) are popular for this because they are simple and flexible. For example, HMMs have been used for speech recognition and biosequence analysis, and KFMs have bee ..."
Abstract

Cited by 564 (3 self)
 Add to MetaCart
Modelling sequential data is important in many areas of science and engineering. Hidden Markov models (HMMs) and Kalman filter models (KFMs) are popular for this because they are simple and flexible. For example, HMMs have been used for speech recognition and biosequence analysis, and KFMs have been used for problems ranging from tracking planes and missiles to predicting the economy. However, HMMs and KFMs are limited in their “expressive power”. Dynamic Bayesian Networks (DBNs) generalize HMMs by allowing the state space to be represented in factored form, instead of as a single discrete random variable. DBNs generalize KFMs by allowing arbitrary probability distributions, not just (unimodal) linear-Gaussian. In this thesis, I will discuss how to represent many different kinds of models as DBNs, how to perform exact and approximate inference in DBNs, and how to learn DBN models from sequential data.

In particular, the main novel technical contributions of this thesis are as follows: a way of representing Hierarchical HMMs as DBNs, which enables inference to be done in O(T) time instead of O(T^3), where T is the length of the sequence; an exact smoothing algorithm that takes O(log T) space instead of O(T); a simple way of using the junction tree algorithm for online inference in DBNs; new complexity bounds on exact online inference in DBNs; a new deterministic approximate inference algorithm called factored frontier; an analysis of the relationship between the BK algorithm and loopy belief propagation; a way of applying Rao-Blackwellised particle filtering to DBNs in general, and the SLAM (simultaneous localization and mapping) problem in particular; a way of extending the structural EM algorithm to DBNs; and a variety of different applications of DBNs. However, perhaps the main value of the thesis is its catholic presentation of the field of sequential data modelling.
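The factored-state idea in the abstract above can be made concrete with a small sketch (illustrative, not from the thesis): a DBN whose hidden state consists of two independent binary chains is equivalent to a flat HMM whose transition matrix is the Kronecker product of the per-chain matrices, so the flat state space grows multiplicatively while the DBN's parameter count grows only additively.

```python
import numpy as np

# Illustrative toy example: a DBN hidden state made of two independent
# binary chains, each with its own 2x2 transition matrix.
A1 = np.array([[0.9, 0.1],
               [0.2, 0.8]])
A2 = np.array([[0.7, 0.3],
               [0.4, 0.6]])

# The equivalent flat HMM has 2*2 = 4 joint states, and its transition
# matrix is the Kronecker product of the per-chain matrices. The DBN has
# 2 free parameters per chain (4 total); the flat 4x4 stochastic matrix
# has 12 free parameters.
A_flat = np.kron(A1, A2)

assert A_flat.shape == (4, 4)
assert np.allclose(A_flat.sum(axis=1), 1.0)  # each row is a distribution
```

With k binary chains the flat state space has 2^k states, which is why factored representations matter for both model size and inference cost.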
The graphical models toolkit: An open source software system for speech and time-series processing
 In Proceedings of IEEE Int. Conf. Acoust., Speech, and Signal Processing
, 2002
"... This paper describes the Graphical Models Toolkit (GMTK), an open source, publically available toolkit for developing graphicalmodel based speech recognition and general time series systems. Graphical models are a flexible, concise, and expressive probabilistic modeling framework with which one may ..."
Abstract

Cited by 105 (28 self)
 Add to MetaCart
This paper describes the Graphical Models Toolkit (GMTK), an open source, publicly available toolkit for developing graphical-model based speech recognition and general time series systems. Graphical models are a flexible, concise, and expressive probabilistic modeling framework with which one may rapidly specify a vast collection of statistical models. This paper begins with a brief description of the representational and computational aspects of the framework. Following that is a detailed description of GMTK’s features, including a language for specifying structures and probability distributions, logarithmic space exact training and decoding procedures, the concept of switching parents, and a generalized EM training method which allows arbitrary sub-Gaussian parameter tying. Taken together, these features endow GMTK with a degree of expressiveness and functionality that significantly complements other publicly available packages. GMTK was recently used in the 2001 Johns Hopkins Summer Workshop, and experimental results are described in detail both herein and in a companion paper.
Lattice-Based Unsupervised MLLR For Speaker Adaptation
 Proc. ISCA ITRW ASR2000
, 2000
"... In this paper we explore the use of latticebased information for unsupervised speaker adaptation. As initially formulated, maximum likelihood linear regression (MLLR) aims to linearly transform the means of the gaussian models in order to maximize the likelihood of the adaptation data given the cor ..."
Abstract

Cited by 16 (0 self)
 Add to MetaCart
In this paper we explore the use of lattice-based information for unsupervised speaker adaptation. As initially formulated, maximum likelihood linear regression (MLLR) aims to linearly transform the means of the Gaussian models in order to maximize the likelihood of the adaptation data given the correct hypothesis (supervised MLLR) or the decoded hypothesis (unsupervised MLLR). For the latter, if the first-pass decoded hypothesis is extremely erroneous (as is the case for large vocabulary telephony applications), MLLR will often find a transform that increases the likelihood for the incorrect models, and may even lower the likelihood of the correct hypothesis. Since the oracle word error rate of a lattice is much lower than that of the 1-best or N-best hypotheses, by performing adaptation against a word lattice, the correct models are more likely to be used in estimating the transform. Furthermore, the particular MAP lattice that we propose enables the use of a natural confidence mea...
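The mean-transform idea underlying MLLR can be sketched roughly as follows (a hedged illustration, not the paper's estimator): each Gaussian mean is rewritten as mu' = A·mu + b, and with identity covariances and one aligned observation per Gaussian, the maximum-likelihood W = [b A] reduces to a least-squares fit over extended means xi = [1, mu]. All data below are synthetic and the setup is a simplification.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 3
# Hypothetical model means for 10 Gaussians, and "adaptation" observations
# generated noiselessly from a known affine transform (for illustration).
mus = rng.normal(size=(10, d))
A_true = np.eye(d) + 0.1 * rng.normal(size=(d, d))
b_true = rng.normal(size=d)
obs = mus @ A_true.T + b_true

# Extended means xi = [1, mu]; solving xi @ W ~= obs by least squares is
# the ML estimate of W = [b; A^T] under identity covariances.
xi = np.hstack([np.ones((len(mus), 1)), mus])
W, *_ = np.linalg.lstsq(xi, obs, rcond=None)
b_est, A_est = W[0], W[1:].T

assert np.allclose(A_est, A_true, atol=1e-6)
assert np.allclose(b_est, b_true, atol=1e-6)
```

In practice the transform is estimated from occupation-weighted sufficient statistics over many frames; the paper's contribution is to accumulate those statistics over a lattice rather than a single erroneous first-pass hypothesis.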
Techniques for modelling Phonological Processes in Automatic Speech Recognition
, 2001
"... Declaration This dissertation is the result of my own work and includes nothing which is the outcome of work done in collaboration, except where stated. It has not been submitted in whole or part for a degree at any other university. The length of this thesis including footnotes and appendices does ..."
Abstract

Cited by 7 (0 self)
 Add to MetaCart
Declaration This dissertation is the result of my own work and includes nothing which is the outcome of work done in collaboration, except where stated. It has not been submitted in whole or part for a degree at any other university. The length of this thesis including footnotes and appendices does not exceed 29,500 words and includes no more than 40 figures.

Systems which automatically transcribe carefully dictated speech are now commercially available, but their performance degrades dramatically when the speaking style of users becomes more relaxed or conversational. This dissertation focuses on techniques that aim to improve the robustness of statistical speech transcription systems to conversational speaking styles. The dissertation shows first that the performance degradation occurring as speech becomes more conversational is severe and is partially attributable to differences in the acoustic realizations of sentences. Hypothesizing that the quantifiably wider range of
A TIME-SYNCHRONOUS PHONETIC DECODER FOR A LONG-CONTEXTUAL-SPAN HIDDEN TRAJECTORY MODEL
"... A novel timesynchronous decoder, designed specifically for a Hidden Trajectory Model (HTM) whose likelihood score computation depends on longspan phonetic contexts, is presented. HTM is a recently developed acoustic model aimed to capture the underlying dynamic structure of speech coarticulation a ..."
Abstract

Cited by 1 (1 self)
 Add to MetaCart
A novel time-synchronous decoder, designed specifically for a Hidden Trajectory Model (HTM) whose likelihood score computation depends on long-span phonetic contexts, is presented. HTM is a recently developed acoustic model aimed at capturing the underlying dynamic structure of speech coarticulation and reduction using a compact set of parameters. The long-span nature of the HTM has posed a great technical challenge for developing efficient search algorithms for full evaluation of the model. Taking on the challenge, the decoding algorithm is developed to deal effectively with the exponentially increased search space, using HTM-specific techniques for hypothesis representation, word-ending recombination, and hypothesis pruning. Experimental results obtained on the TIMIT phonetic recognition task are reported, extending our earlier HTM evaluation paradigms based on N-best and A* lattice rescoring.
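The recombination and pruning steps mentioned above can be sketched generically (a toy time-synchronous Viterbi beam search; the function name, score structures, and pruning scheme are assumptions, not the paper's HTM decoder): at each frame every hypothesis is extended by one state, only the best predecessor of each state is kept, and hypotheses outside a fixed beam of the frame's best score are discarded.

```python
import math

def beam_decode(log_obs, log_trans, beam=5.0):
    """Toy time-synchronous Viterbi beam search (illustrative only).

    log_obs:   list over frames of {state: observation log-likelihood}
    log_trans: {(prev_state, state): transition log-probability}
    """
    hyps = dict(log_obs[0])  # surviving hypotheses after frame 0
    for frame in log_obs[1:]:
        new = {}
        for s, ll in frame.items():
            # Recombine: keep only the best-scoring predecessor of s.
            best = max((h + log_trans.get((p, s), -math.inf)
                        for p, h in hyps.items()), default=-math.inf)
            if best > -math.inf:
                new[s] = best + ll
        # Prune: drop hypotheses outside a fixed beam of the frame's best.
        top = max(new.values())
        hyps = {s: v for s, v in new.items() if v >= top - beam}
    return max(hyps, key=hyps.get)
```

For a long-span model like HTM, the hypothesis key must include enough phonetic context that two hypotheses are only recombined when their future scores are guaranteed to agree, which is exactly what makes the search space blow up without model-specific pruning.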
Clustering Wide Contexts and HMM Topologies for Spontaneous Speech Recognition
, 2001
"... In most speech recognition systems today, all the acoustic variation associated with a phoneme is characterized in terms of the identity of its neighboring phonemes. The neighbors influence only the state observation density of a fixed Hidden Markov Model. Other sources of variation are captured imp ..."
Abstract

Cited by 1 (0 self)
 Add to MetaCart
In most speech recognition systems today, all the acoustic variation associated with a phoneme is characterized in terms of the identity of its neighboring phonemes. The neighbors influence only the state observation density of a fixed Hidden Markov Model. Other sources of variation are captured implicitly by using Gaussian mixture models for the state observations. Consequently, these models can be very broad, particularly for casual spontaneous speech. In this thesis, we explore conditioning of phonemes on higher level linguistic structure, specifically syllable and word-level structure, to learn models for phonemes that are more specific to the context, reporting experimental results on a large vocabulary (35k words) conversational speech task (Switchboard). In particular, this thesis makes three main contributions related to wide context conditioning. First, we demonstrate that syllable and word-level structure can be incorporated into current acoustic models to improve recognition accuracy over triphones. For a fixed number of parameters, these models are computationally more efficient than pentaphones, both in training and in testing. In addition, use of syllable and word features leads to a small but significant improvement in performance. The wide contexts used in our acoustic model can implicitly capture resyllabification effects to a certain extent. However, we find that explicitly modeling resyllabification does not improve recognition further, because there are only a small number of phones that exhibit acoustic differences after resyllabification. The second contribution addresses the difficulties that arise when a large number of additional conditioning features are used. As the number of conditioning features increases, the training cost can increase exponentially.

Moreover, a large fraction of the training labels tends to have too few examples to have reliable statistics associated with them, and this could potentially cause decision trees to learn bad clusters. A new method has been developed for clustering with multiple stages, where each stage clusters a different subset of features and also has a choice of using the partitions learned in the previous stages. Apart from reducing the risk of unreliable statistics, it is designed to ameliorate the data fragmentation problem and is computationally less expensive. This method was successfully demonstrated with pentaphones, resulting in equivalent performance at a lower cost. Finally, a new algorithm is described to design context-specific HMMs. The idea is to model reduction of a phone for certain contexts, and to learn a more constrained topology. Using contextual information, the algorithm clusters HMM paths where each path has a different number of states. An HMM distance measure has been formulated to prune out the paths which are similar. During decoding, the paths are allocated dynamically for each subword unit according to their context. We investigated this algorithm to model phone topologies, finding improved characterization of speech given known word sequences but no significant improvement in word error rate.
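The decision-tree clustering step referred to above can be illustrated with a minimal likelihood-gain criterion (a generic sketch under simplifying assumptions, not the thesis's multi-stage algorithm): each candidate question splits the frames assigned to a state, each side is modeled by a single 1-D Gaussian, and the question with the largest gain in total log-likelihood is chosen. The feature names and data are invented for the example.

```python
import math

def gauss_loglik(xs):
    # Log-likelihood of samples under a single ML-fit 1-D Gaussian
    # (variance floored to avoid degenerate zero-variance clusters).
    n = len(xs)
    mean = sum(xs) / n
    var = max(sum((x - mean) ** 2 for x in xs) / n, 1e-8)
    return -0.5 * n * (math.log(2 * math.pi * var) + 1.0)

def best_split(frames, questions):
    # frames: list of (feature_set, value); questions: candidate features.
    # Returns the (question, gain) pair maximizing the likelihood gain.
    base = gauss_loglik([v for _, v in frames])
    best = None
    for q in questions:
        yes = [v for f, v in frames if q in f]
        no = [v for f, v in frames if q not in f]
        if not yes or not no:
            continue  # split leaves one side empty: no usable statistics
        gain = gauss_loglik(yes) + gauss_loglik(no) - base
        if best is None or gain > best[1]:
            best = (q, gain)
    return best
```

The fragility the abstract points to shows up here directly: when a question isolates only a handful of frames, the per-side variance estimates become unreliable, which is what the multi-stage clustering is designed to mitigate.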
Proposed design for gR, a graphical models toolkit for R
, 2003
"... This document is an extension of the talk I gave at the gR meeting in Aalborg on 19 September 2003. It outlines a proposed design for gR, a graphical models library for R. This design is similar to the design of BNT, but is much more general, in that it supports undirected models and chain graphs, a ..."
Abstract
 Add to MetaCart
This document is an extension of the talk I gave at the gR meeting in Aalborg on 19 September 2003. It outlines a proposed design for gR, a graphical models library for R. This design is similar to the design of BNT, but is much more general, in that it supports undirected models and chain graphs, and allows parameters to be represented as random variables (Bayesian modeling).
Automatic Speech Recognition using Dynamic Bayesian Networks
"... New ideas to improve automatic speech recognition have been proposed that make use of context user information such as gender, age and dialect. To incorporate this information into a speech recognition system a new framework is being developed at the mmi department of the ewi faculty at the Delft Un ..."
Abstract
 Add to MetaCart
New ideas to improve automatic speech recognition have been proposed that make use of user context information such as gender, age and dialect. To incorporate this information into a speech recognition system, a new framework is being developed at the MMI department of the EWI faculty at the Delft University of Technology. This toolkit is called Gaia and makes use of Dynamic Bayesian Networks (DBNs). In this thesis a basic speech recognition system was built using Gaia to test whether speech recognition is possible using Gaia and DBNs. DBN models were designed for the acoustic model, language model and training part of the speech recognizer. Experiments using a small data set showed that speech recognition is possible using Gaia. Other results showed that training using Gaia is not working yet. This issue needs to be
GMTK: The Graphical Models Toolkit
"... 0.2 Representation and Computation.............................. 5 ..."