## The graphical models toolkit: An open source software system for speech and time-series processing (2002)

Venue: Proceedings of the IEEE Int. Conf. on Acoustics, Speech, and Signal Processing

Citations: 109 (28 self)

### BibTeX

```bibtex
@INPROCEEDINGS{Bilmes02thegraphical,
  author    = {Jeff Bilmes and Geoffrey Zweig},
  title     = {The graphical models toolkit: An open source software system for speech and time-series processing},
  booktitle = {Proceedings of the IEEE Int. Conf. on Acoustics, Speech, and Signal Processing},
  year      = {2002},
  pages     = {3916--3919}
}
```

### Abstract

This paper describes the Graphical Models Toolkit (GMTK), an open source, publicly available toolkit for developing graphical-model based speech recognition and general time series systems. Graphical models are a flexible, concise, and expressive probabilistic modeling framework with which one may rapidly specify a vast collection of statistical models. This paper begins with a brief description of the representational and computational aspects of the framework. Following that is a detailed description of GMTK’s features, including a language for specifying structures and probability distributions, logarithmic space exact training and decoding procedures, the concept of switching parents, and a generalized EM training method which allows arbitrary sub-Gaussian parameter tying. Taken together, these features endow GMTK with a degree of expressiveness and functionality that significantly complements other publicly available packages. GMTK was recently used in the 2001 Johns Hopkins Summer Workshop, and experimental results are described in detail both herein and in a companion paper.

### Citations

7334 | Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference
- Pearl
- 1988
Citation Context ...hidden. The name hidden Markov model, for example, results from there being a Markov chain consisting only of hidden variables. The first version of GMTK uses the semantics of Bayesian networks (BNs) [15, 13]. This means that the graphs are directed, and conditional independence properties are determined by the notion of "d-separation" [15]. Using d-separation one may read off conditional independence sta...
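The d-separation property quoted in this excerpt can be illustrated with a minimal numerical sketch (not GMTK code; the distributions are made up for illustration): in a chain A → B → C, conditioning on B renders A and C independent, which follows directly from the factored joint.

```python
import numpy as np

# Hypothetical chain-structured Bayesian network A -> B -> C, binary variables.
# The joint factorizes as p(a, b, c) = p(a) * p(b|a) * p(c|b).
p_a = np.array([0.6, 0.4])                      # p(A)
p_b_a = np.array([[0.7, 0.3], [0.2, 0.8]])      # p(B|A), rows indexed by a
p_c_b = np.array([[0.9, 0.1], [0.5, 0.5]])      # p(C|B), rows indexed by b

# Full joint table p(a, b, c) via broadcasting.
joint = p_a[:, None, None] * p_b_a[:, :, None] * p_c_b[None, :, :]

# d-separation predicts A _|_ C | B, i.e. p(c | a, b) is the same for every a.
for b in range(2):
    p_ab = joint[:, b, :].sum(axis=1)           # p(a, b)
    p_c_given_ab = joint[:, b, :] / p_ab[:, None]
    assert np.allclose(p_c_given_ab[0], p_c_given_ab[1])
```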

1140 | Graphical Models
- Lauritzen
- 1996
Citation Context ...phical models, such as Bayesian networks (a type of directed graphical model), Markov random fields (undirected models), causal models, chain graphs, and so on. Each type has its own formal semantics [14] for specifying conditional independence relations. Only along with its agreed upon semantics does a GM precisely specify conditional independence properties. Variables in a GM may either be observed ...

953 | An Introduction to Bayesian Networks
- Jensen
- 1996
Citation Context ...hidden. The name hidden Markov model, for example, results from there being a Markov chain consisting only of hidden variables. The first version of GMTK uses the semantics of Bayesian networks (BNs) [15, 13]. This means that the graphs are directed, and conditional independence properties are determined by the notion of "d-separation" [15]. Using d-separation one may read off conditional independence sta...

366 | The aurora experimental framework for the performance evaluation of speech recognition systems under noisy conditions
- Hirsch, Pearce
- 2000
Citation Context ...[word recognition rate rows for GMTK-PH and HP elided] Table 1. Word recognition rates: baseline GMTK emulating an HMM system as function of SNR. HP is from [12]. from parent values to child distribution is specified using a decision tree, allowing a sparse representation of this mapping. A vector observation variable spans over a region of the feature vector...
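The decision-tree CPT representation mentioned in this excerpt can be sketched as follows. This is not GMTK's actual file format or API; the parent names, splits, and probabilities are all invented to show how a tree lets many parent configurations share one leaf distribution instead of storing a dense table.

```python
# Hypothetical: child variable C has 3 values; its parents are (phone, state).
shared_leaf = [0.8, 0.15, 0.05]           # one distribution reused by many branches

def cpt_lookup(phone: int, state: int):
    """Walk a tiny decision tree over the parent values and return p(C | parents)."""
    if phone < 10:                        # first split: query on `phone`
        return shared_leaf                # all phones below 10 share a single leaf
    elif state == 0:                      # second split: query on `state`
        return [0.1, 0.2, 0.7]
    else:
        return shared_leaf                # tying: reuse the same leaf again

# Two very different parent configurations resolve to the identical leaf object,
# which is the sparseness/tying win over a dense CPT.
assert cpt_lookup(3, 1) is cpt_lookup(12, 2)
```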

306 | The generalized distributive law
- Aji, McEliece
- 2000
Citation Context ...ssuming independence relations making the computation probabilistically valid. Graphical models use inference algorithms (e.g., the junction-tree algorithm [15, 13] or the generalized distributive law [5]) that provably correspond to valid calculations on probabilistic equations. These algorithms essentially distribute summations to the right into products as efficiently as possible, as above. There a...
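The "distribute summations into products" point in this excerpt can be shown concretely on a chain p(a)p(b|a)p(c|b) (a toy example, not taken from the paper): the naive marginal sums over all joint states, while the distributed form reorders to nested sums and the two agree.

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(0)
K = 4                                         # hypothetical state count
p_a = rng.dirichlet(np.ones(K))               # p(A)
p_b_a = rng.dirichlet(np.ones(K), size=K)     # rows: p(B | A=a)
p_c_b = rng.dirichlet(np.ones(K), size=K)     # rows: p(C | B=b)

# Naive marginal p(c): O(K^3) terms, one per joint assignment.
naive = np.zeros(K)
for a, b, c in product(range(K), repeat=3):
    naive[c] += p_a[a] * p_b_a[a, b] * p_c_b[b, c]

# Distributed form: sum_b p(c|b) * (sum_a p(a) p(b|a)),
# i.e. two O(K^2) matrix-vector products.
distributed = (p_a @ p_b_a) @ p_c_b

assert np.allclose(naive, distributed)
```

The savings scale with chain length: for T variables the naive sum has K^T terms, while the distributed recursion costs O(T K^2), which is exactly the forward-recursion structure exploited by HMMs.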

173 | Probabilistic independence networks for hidden Markov probability models
- Smyth, Heckerman, et al.
- 1997
Citation Context ...R techniques --- no other known statistical abstraction appears to have this property. For example, it has been shown that the standard HMM Baum-Welch algorithm is only a special case of GM inference [16]. More importantly, the space of statistical algorithms representable with a GM is enormous; much larger than what has so far been explored for ASR. The time therefore seems ripe to start seriously ex...

113 | Speech recognition with dynamic Bayesian networks
- Zweig, Russell
- 1998
Citation Context ...ure also represents constraints on variable values and sequences of values (such as valid phone-sequences in a speech-recognizer). These representational aspects of graphical modeling are detailed in [18, 6]. Lastly, ASR is inherently a problem of pattern classification, and requires statistical models to discriminate between different speech utterances. Apart from discriminatively learned model paramete...

111 | Probabilistic temporal reasoning
- Dean, Kanazawa
- 1988
Citation Context ...s value, given the values of its parents in the graph. Speech is a time signal, and any GM intending to model speech must somehow take this into account. Accordingly, dynamic Bayesian networks (DBNs) [11] are Bayesian networks which include directed edges pointing in the direction of time. Other than the existence of time-edges, DBNs have the same semantics as other BNs. The structure of a graphical m...

90 | Hidden-articulator Markov models for speech recognition (ITRW ASR2000)
- Richardson, Bilmes, et al.
- 2000
Citation Context ...atrices [9]. GMTK can also represent arbitrary switching dependencies between individual elements of successive observation vectors. GMTK thus supports both linear and non-linear buried Markov models [7]. All in all, GMTK supports an extremely rich set of observation distributions. 4. EXPERIMENTAL VALIDATION This section validates GMTK by producing a GMTK-based ASR system for the Aurora2.0 noisy digi...

62 | Dynamic Bayesian Multinets
- Bilmes
- 2000
Citation Context ...terances. Apart from discriminatively learned model parameters (such as means, variances, or transition matrices), graphical models are ideally suited for experimenting with discriminative structures [8, 19]. 2.2. Computation Probabilistic inference, such as evaluating (or computing the most likely value of) a conditional distribution, is the foundation behind all statistical computing. Graphical models ...

54 | Natural Statistical Models for Automatic Speech Recognition
- Bilmes
- 1998
Citation Context ...ure also represents constraints on variable values and sequences of values (such as valid phone-sequences in a speech-recognizer). These representational aspects of graphical modeling are detailed in [18, 6]. Lastly, ASR is inherently a problem of pattern classification, and requires statistical models to discriminate between different speech utterances. Apart from discriminatively learned model paramete...

38 | Factored Sparse Inverse Covariance Matrices
- Bilmes
- 2000
Citation Context ...ctorization orderings, and all subsets of parents in any of these factorizations. Under this framework, GMTK supports diagonal, full, banded, and semi-tied factored sparse inverse covariance matrices [9]. GMTK can also represent arbitrary switching dependencies between individual elements of successive observation vectors. GMTK thus supports both linear and non-linear buried Markov models [7]. All in...
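The sparse-inverse-covariance idea in this excerpt can be sketched numerically (a toy example, not GMTK code): zeros in the precision (inverse covariance) matrix encode absent dependencies between feature dimensions, so a banded precision stores far fewer parameters than a full covariance while still defining a valid Gaussian.

```python
import numpy as np

d = 4
# Hypothetical banded (tridiagonal) precision matrix: diagonally dominant,
# hence positive definite; only O(d) nonzero parameters instead of O(d^2).
prec = np.diag(np.full(d, 2.0)) \
     + np.diag(np.full(d - 1, -0.5), 1) \
     + np.diag(np.full(d - 1, -0.5), -1)

def gauss_logpdf(x, mean, precision):
    """Gaussian log-density parameterized directly by the precision matrix."""
    diff = x - mean
    sign, logdet = np.linalg.slogdet(precision)
    assert sign > 0                       # precision must be positive definite
    return 0.5 * (logdet - d * np.log(2 * np.pi) - diff @ precision @ diff)

ll = gauss_logpdf(np.zeros(d), np.zeros(d), prec)
assert np.isfinite(ll)
```

Working in the precision parameterization is what makes the quadratic form cheap when the matrix is banded; the log-determinant term is the only piece that couples all the parameters.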

32 | Space-efficient inference in dynamic probabilistic networks
- Binder, Murphy, et al.
- 1997
Citation Context ... inference, which stores all clique values for all time, would result in (an obviously prohibitive) gigabytes of required storage. To avoid this problem, GMTK implements a recently developed procedure [10, 20] that reduces memory requirements exponentially from O(T) to O(log T). This reduction has a truly dramatic effect on memory usage, and can additionally be combined with GMTK's beam-pruning procedure...
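The O(T) to O(log T) memory reduction cited here rests on a divide-and-conquer recursion (often called the island algorithm). The following is a toy sketch of that idea only, with an illustrative stand-in for the real per-frame forward update: rather than storing forward messages for all T frames, each recursion level recomputes messages up to the sequence midpoint and keeps a single checkpoint, so only O(log T) checkpoints are live at once, at the cost of extra recomputation.

```python
def forward_step(alpha, frame):
    # Placeholder for one frame of the real forward recursion.
    return alpha + frame

def island(alpha, frames, visit):
    """Visit every frame's forward message while holding O(log T) checkpoints."""
    if len(frames) == 1:
        visit(frames[0], alpha)              # e.g. combine with a backward pass here
        return
    mid = len(frames) // 2
    alpha_mid = alpha
    for f in frames[:mid]:                   # recompute forward up to the midpoint
        alpha_mid = forward_step(alpha_mid, f)
    island(alpha_mid, frames[mid:], visit)   # right half, from the new checkpoint
    island(alpha, frames[:mid], visit)       # left half, from the old checkpoint

visited = []
island(0, list(range(8)), lambda f, a: visited.append(f))
assert sorted(visited) == list(range(8))     # every frame is visited exactly once
```

The recursion depth, and hence the number of simultaneously stored checkpoints, is log2(T), while total work grows only by a log factor.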

32 | Probabilistic modeling with Bayesian networks for automatic speech recognition
- Zweig, Russell
- 1998
Citation Context ...eling framework derives from the fact that these algorithms work with any graph structure, and a wide variety of conditional probability representations. GMTK uses the Frontier Algorithm, detailed in [18, 21], which converts arbitrary graphs into equivalent chain-structured ones, and then executes a forwards-backwards recursion. The chain structure is particularly advantageous because it supports beam-prun...
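The beam-pruning advantage of a chain structure mentioned in this excerpt can be sketched with a generic forward recursion (not GMTK's implementation; the model, sizes, and beam width are all illustrative): at each frame, states whose forward probability falls below a fraction of the frame's best are zeroed out before the next step.

```python
import numpy as np

rng = np.random.default_rng(1)
K, T, beam = 16, 50, 1e-3                 # states, frames, relative beam width
trans = rng.dirichlet(np.ones(K), size=K) # rows: p(s_t | s_{t-1})
obs = rng.random((T, K))                  # stand-in per-frame observation scores

alpha = np.full(K, 1.0 / K) * obs[0]      # forward messages at frame 0
for t in range(1, T):
    alpha = alpha / alpha.sum()                 # rescale to avoid underflow
    alpha[alpha < beam * alpha.max()] = 0.0     # prune low-probability states
    alpha = (alpha @ trans) * obs[t]            # forward step over survivors

assert alpha.sum() > 0                    # the best-scoring path always survives
```

Pruning is safe to apply per frame precisely because the chain makes each frame's frontier a single small distribution; in a real decoder the zeroed states are simply skipped, which is where the speedup comes from.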

17 | Exact Alpha-Beta Computation in Logarithmic Space with Application to
- Zweig, Padmanabhan
- 2000

16 | Structurally discriminative graphical models for automatic speech recognition: Results from the 2001 Johns Hopkins summer workshop
- Zweig, Bilmes, et al.
- 2002
Citation Context ...4] packages. This paper provides an overview of GMTK, its notation, algorithms, main features, and reports baseline GMTK results. Research-related results are described in detail in a companion paper [19] which describes how GMTK was used in a recent Johns Hopkins University summer workshop. Section 2 describes the main representational ability of graphical models including the meanings of graphs, fac...