Learning in graphical models
, 2004
Cited by 612 (11 self)
Statistical applications in fields such as bioinformatics, information retrieval, speech processing, image processing and communications often involve large-scale models in which thousands or millions of random variables are linked in complex ways. Graphical models provide a general methodology for approaching these problems, and indeed many of the models developed by researchers in these applied fields are instances of the general graphical model formalism. We review some of the basic ideas underlying graphical models, including the algorithmic ideas that allow graphical models to be deployed in large-scale data analysis problems. We also present examples of graphical models in bioinformatics, error-control coding and language processing. Key words and phrases: Probabilistic graphical models, junction tree algorithm, sum-product algorithm, Markov chain Monte Carlo, variational inference, bioinformatics, error-control coding.
Dynamic Bayesian Networks: Representation, Inference and Learning
, 2002
Cited by 564 (3 self)
Modelling sequential data is important in many areas of science and engineering. Hidden Markov models (HMMs) and Kalman filter models (KFMs) are popular for this because they are simple and flexible. For example, HMMs have been used for speech recognition and biosequence analysis, and KFMs have been used for problems ranging from tracking planes and missiles to predicting the economy. However, HMMs and KFMs are limited in their “expressive power”. Dynamic Bayesian Networks (DBNs) generalize HMMs by allowing the state space to be represented in factored form, instead of as a single discrete random variable. DBNs generalize KFMs by allowing arbitrary probability distributions, not just (unimodal) linear-Gaussian. In this thesis, I will discuss how to represent many different kinds of models as DBNs, how to perform exact and approximate inference in DBNs, and how to learn DBN models from sequential data.
In particular, the main novel technical contributions of this thesis are as follows: a way of representing Hierarchical HMMs as DBNs, which enables inference to be done in O(T) time instead of O(T^3), where T is the length of the sequence; an exact smoothing algorithm that takes O(log T) space instead of O(T); a simple way of using the junction tree algorithm for online inference in DBNs; new complexity bounds on exact online inference in DBNs; a new deterministic approximate inference algorithm called factored frontier; an analysis of the relationship between the BK algorithm and loopy belief propagation; a way of applying Rao-Blackwellised particle filtering to DBNs in general, and the SLAM (simultaneous localization and mapping) problem in particular; a way of extending the structural EM algorithm to DBNs; and a variety of different applications of DBNs. However, perhaps the main value of the thesis is its catholic presentation of the field of sequential data modelling.
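For orientation, the following is a minimal sketch of the standard scaled forward recursion for a flat, discrete HMM, the baseline model that the abstract above says DBNs generalize. It is not taken from the thesis and does not implement any of its DBN algorithms; the function name `forward_loglik` and the list-of-lists parameter layout are assumptions made for illustration. The loop makes one pass over the observations, so the cost is linear in the sequence length T (and quadratic in the number of states).

```python
import math

def forward_loglik(init, trans, emit, obs):
    """Log-likelihood of a discrete observation sequence under a flat HMM.

    init[i]     : P(first state = i)              (hypothetical layout)
    trans[i][j] : P(next state = j | state = i)
    emit[i][k]  : P(observation = k | state = i)
    obs         : list of observation indices
    """
    n = len(init)
    # alpha[i] is proportional to P(obs[0..t], state_t = i)
    alpha = [init[i] * emit[i][obs[0]] for i in range(n)]
    loglik = 0.0
    for t in range(len(obs)):
        if t > 0:
            # Propagate through the transition model, then weight by the emission.
            alpha = [
                sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][obs[t]]
                for j in range(n)
            ]
        # Rescale at every step to avoid underflow on long sequences.
        scale = sum(alpha)
        loglik += math.log(scale)
        alpha = [a / scale for a in alpha]
    return loglik
```

A factored state made of K binary variables would correspond to 2^K states in this flat representation, which is one way to see why a factored (DBN) representation can be attractive.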
Linear Time Inference in Hierarchical HMMs
 In Proceedings of Neural Information Processing Systems
, 2001
Cited by 87 (4 self)
The hierarchical hidden Markov model (HHMM) is a generalization of the hidden Markov model (HMM) that models sequences with structure at many length/time scales [FST98]. Unfortunately, the original inference algorithm is rather complicated, and takes O(T^3) time, where T is the length of the sequence, making it impractical for many domains. In this paper, we show how HHMMs are a special kind of dynamic Bayesian network (DBN), and thereby derive a much simpler inference algorithm, which only takes O(T) time. Furthermore, by drawing the connection between HHMMs and DBNs, we enable the application of many standard approximation techniques to further speed up inference.
Hidden-Articulator Markov Models For Speech Recognition
 In Proc. IEEE Intl. Conf. on Acoustics, Speech, and Signal Processing
, 2000
Cited by 85 (20 self)
In traditional speech recognition using Hidden Markov Models (HMMs), each state represents an acoustic portion of a phoneme. We explore the concept of an articulator-based HMM, where each state represents a particular articulatory configuration [Erler 1996]. In this paper, we present a novel articulatory feature mapping and a new technique for model initialization. In addition, we use diphone modeling, which allows context-dependent training of transition probabilities. Our goal is to confirm that articulatory knowledge can assist speech recognition. We demonstrate this by showing that our mapping of articulatory configurations to phonemes performs better than random mappings. Furthermore, we demonstrate the practicality of the model by showing that, in combination with a standard model, a 12-21% relative word error rate decrease occurs relative to the standard model alone. 1. INTRODUCTION Hidden Markov Models (HMMs) are a popular approach for speech recognition. Commonly, a left-to-r...
Graphical models and automatic speech recognition
 Mathematical Foundations of Speech and Language Processing
, 2003
Cited by 67 (13 self)
Graphical models provide a promising paradigm to study both existing and novel techniques for automatic speech recognition. This paper first provides a brief overview of graphical models and their uses as statistical models. It is then shown that the statistical assumptions behind many pattern recognition techniques commonly used as part of a speech recognition system can be described by a graph: this includes Gaussian distributions, mixture models, decision trees, factor analysis, principal component analysis, linear discriminant analysis, and hidden Markov models. Moreover, this paper shows that many advanced models for speech recognition and language processing can also be simply described by a graph, including many at the acoustic, pronunciation, and language-modeling levels. A number of speech recognition techniques born directly out of the graphical-models paradigm are also surveyed. Additionally, this paper includes a novel graphical analysis regarding why derivative (or delta) features improve hidden Markov model-based speech recognition by improving structural discriminability. It also includes an example where a graph can be used to represent language model smoothing constraints. As will be seen, the space of models describable by a graph is quite large. A thorough exploration of this space should yield techniques that ultimately will supersede the hidden Markov model.
Moving Beyond the `Beads-On-A-String' Model of Speech
 In Proc. IEEE ASRU Workshop
, 1999
Cited by 62 (0 self)
The notion that a word is composed of a sequence of phone segments, sometimes referred to as `beads on a string', has formed the basis of most speech recognition work for over 15 years. However, as more researchers tackle spontaneous speech recognition tasks, that view is being called into question. This paper raises problems with the phoneme as the basic subword unit in speech recognition, suggesting that finer-grained control is needed to capture the sort of pronunciation variability observed in spontaneous speech. We offer two different alternatives (automatically derived subword units and linguistically motivated distinctive feature systems) and discuss current work in these directions. In addition, we look at problems that arise in acoustic modeling when trying to incorporate higher-level structure with these two strategies. 1. INTRODUCTION It has often been noted that automatic speech recognition performance is much worse on spontaneous speech than on carefully planned or r...
Bayesian Clustering by Dynamics
, 2001
Cited by 56 (7 self)
This paper introduces a Bayesian method for clustering dynamic processes. The method models dynamics as Markov chains and then applies an agglomerative clustering procedure to discover the most probable set of clusters capturing different dynamics. To increase efficiency, the method uses an entropy-based heuristic search strategy. A controlled experiment suggests that the method is very accurate when applied to artificial time series in a broad range of conditions and, when applied to clustering sensor data from mobile robots, it produces clusters that are meaningful in the domain of application.
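The idea of modelling each series as a Markov chain and merging series with similar dynamics can be sketched as follows. This is an illustration under stated assumptions, not the paper's procedure: it assumes discrete states, independent symmetric Dirichlet priors on each row of the transition matrix (the `alpha` parameter), and a simple log marginal likelihood difference as the merge score. The paper's actual scoring details and its entropy-based search heuristic are not reproduced here, and all function names are hypothetical.

```python
import math

def transition_counts(series, n_states):
    """Count observed transitions a -> b in a sequence of state indices."""
    counts = [[0] * n_states for _ in range(n_states)]
    for a, b in zip(series, series[1:]):
        counts[a][b] += 1
    return counts

def log_marginal_likelihood(counts, alpha=1.0):
    """Log marginal likelihood of the counts, assuming a symmetric
    Dirichlet(alpha) prior on each row of the transition matrix."""
    total = 0.0
    for row in counts:
        k, n = len(row), sum(row)
        total += math.lgamma(k * alpha) - math.lgamma(k * alpha + n)
        total += sum(math.lgamma(alpha + c) - math.lgamma(alpha) for c in row)
    return total

def merge_gain(counts_a, counts_b, alpha=1.0):
    """Change in log marginal likelihood if two series share one chain.
    A positive value favours putting them in the same cluster."""
    merged = [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(counts_a, counts_b)]
    separate = (log_marginal_likelihood(counts_a, alpha)
                + log_marginal_likelihood(counts_b, alpha))
    return log_marginal_likelihood(merged, alpha) - separate
```

An agglomerative procedure in this spirit would repeatedly merge the pair of clusters with the largest positive gain and stop when no merge improves the score.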
HMMs and Coupled HMMs for Multichannel EEG Classification
, 2002
Cited by 31 (6 self)
A variety of Coupled HMMs (CHMMs) have recently been proposed as extensions of HMM to better characterize multiple interdependent sequences. This paper introduces a novel distance coupled HMM. It then compares the performance of several HMM and CHMM models for a multichannel EEG classification problem. The results show that, of all approaches examined, the multivariate HMM that has low computational complexity surprisingly outperforms all other models.
What HMMs can do
, 2002
Cited by 30 (4 self)
Since their inception over thirty years ago, hidden Markov models (HMMs) have become the predominant methodology for automatic speech recognition (ASR) systems; today, most state-of-the-art speech systems are HMM-based. There have been a number of ways to explain HMMs and to list their capabilities, each of these ways having both advantages and disadvantages. In an effort to better understand what HMMs can do, this tutorial analyzes HMMs by exploring a novel way in which an HMM can be defined, namely in terms of random variables and conditional independence assumptions. We prefer this definition as it allows us to reason more thoroughly about the capabilities of HMMs. In particular, it is possible to deduce that there are, in theory at least, no limitations to the class of probability distributions representable by HMMs. This paper concludes that, in the search for a model to supersede the HMM for ASR, rather than trying to correct for HMM limitations in the general case, we should seek new models based on their potential for better parsimony, computational requirements, and noise insensitivity.
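As a reminder of what a definition "in terms of random variables and conditional independence assumptions" looks like, the standard textbook statement is sketched below. The symbols are assumptions chosen for illustration (q_t for the hidden state and x_t for the observation at time t) and are not necessarily the tutorial's own notation.

```latex
% Conditional independence assumptions of an HMM:
%   the state depends on the past only through the previous state;
%   the observation depends on everything else only through the current state.
q_t \perp \{ q_{1:t-2},\, x_{1:t-1} \} \mid q_{t-1}, \qquad
x_t \perp \{ q_{\neg t},\, x_{\neg t} \} \mid q_t

% These two assumptions yield the familiar factorization of the joint:
p(x_{1:T}, q_{1:T}) = p(q_1)\, p(x_1 \mid q_1)
    \prod_{t=2}^{T} p(q_t \mid q_{t-1})\, p(x_t \mid q_t)
```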
Hidden-Articulator Markov Models: Performance Improvements And Robustness To Noise
 in Proc. ICSLP
, 2000
Cited by 24 (5 self)
A Hidden-Articulator Markov Model (HAMM) is a Hidden Markov Model (HMM) in which each state represents an articulatory configuration. Articulatory knowledge, known to be useful for speech recognition [4], is represented by specifying a mapping of phonemes to articulatory configurations; vocal tract dynamics are represented via transitions between articulatory configurations. In previous work [13], we extended the articulatory-feature model introduced by Erler [7] by using diphone units and a new technique for model initialization. By comparing it with a purely random model, we showed that the HAMM can take advantage of articulatory knowledge. In this paper, we extend that work in three ways. First, we decrease the number of parameters, making it comparable in size to standard HMMs. Second, we evaluate our model in noisy contexts, verifying that articulatory knowledge can provide benefits in adverse acoustic conditions. Third, we use a corpus of side-by-side speech and articulator tra...