## Speech Recognition with Dynamic Bayesian Networks (1998)

Citations: 113 (8 self)

### BibTeX

@TECHREPORT{Zweig98speechrecognition,
  author = {Geoffrey Zweig and Stuart Russell},
  title = {Speech Recognition with Dynamic Bayesian Networks},
  institution = {},
  year = {1998}
}

### Abstract

Dynamic Bayesian networks (DBNs) are a useful tool for representing complex stochastic processes. Recent developments in inference and learning in DBNs allow their use in real-world applications. In this paper, we apply DBNs to the problem of speech recognition. The factored state representation enabled by DBNs allows us to explicitly represent long-term articulatory and acoustic context in addition to the phonetic-state information maintained by hidden Markov models (HMMs). Furthermore, it enables us to model the short-term correlations among multiple observation streams within single time-frames. Given a DBN structure capable of representing these long- and short-term correlations, we applied the EM algorithm to learn models with up to 500,000 parameters. The use of structured DBN models decreased the error rate by 12 to 29% on a large-vocabulary isolated-word recognition task, compared to a discrete HMM; it also improved significantly on other published results for the same task. Th...

### Citations

1641 | Fundamentals of Speech Recognition
- Rabiner, Juang
- 1993
Citation Context: ...speech signal. Introduction Over the last twenty years, probabilistic models have emerged as the method of choice for large-scale speech recognition tasks in two dominant forms: hidden Markov models (Rabiner & Juang 1993), and neural networks with explicitly probabilistic interpretations (Bourlard & Morgan 1994; Robinson & Fallside 1991). Despite numerous successes in both isolated-word recognition and continuous spee...

837 | Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences
- Davis, Mermelstein
- 1980
Citation Context: ...s and telephone transmission characteristics." These characteristics make it a challenging data set. The data was processed in 25ms windows to generate 10 mel-frequency cepstral coefficients (MFCCs) (Davis & Mermelstein 1980) and their derivatives every 8.4ms. MFCCs are generated by computing the power spectrum with an FFT; then the total energy in 20 different frequency ranges is computed. The cosine transform of the lo...

515 | Factorial Hidden Markov Models
- Ghahramani, Jordan
- 1997
Citation Context: ...an exponential decrease in the number of parameters required to represent a probability distribution. Often there is a concomitant decrease in the computational load (Smyth, Heckerman, & Jordan 1997; Ghahramani & Jordan 1997; Russell et al. 1995). More precisely, a Bayesian network represents a probability distribution over a set of random variables $V = V_1, \ldots, V_n$. The variables are connected by a directed acyclic grap...

486 | Connectionist Speech Recognition - A Hybrid Approach
- Bourlard, Morgan
- 1994
Citation Context: ...as the method of choice for large-scale speech recognition tasks in two dominant forms: hidden Markov models (Rabiner & Juang 1993), and neural networks with explicitly probabilistic interpretations (Bourlard & Morgan 1994; Robinson & Fallside 1991). Despite numerous successes in both isolated-word recognition and continuous speech recognition, both methodologies suffer from important deficiencies. HMMs use a single sta...

230 | The EM algorithm for graphical association models with missing data
- Lauritzen
- 1995
Citation Context: ...e probability of a set of observations is computed using an algorithm derived from (Peot & Shachter 1991). Conditional probabilities can be learned using gradient methods (Russell et al. 1995) or EM (Lauritzen 1995). We have adapted these algorithms for dynamic Bayesian networks, using special techniques to handle the deterministic variables that are a key feature of our speech models (see below). A full treatm...

222 | Automatic Speech Recognition: The Development of the SPHINX System
- Lee
- 1989
Citation Context: ...sues of parameter tying and phonetic transcriptions. keep the number of parameters manageable with these multiple observation streams, a further conditional independence assumption is typically made (Lee 1989): $P(o_i \mid s_i) = \prod_j P(o_i^j \mid s_i)$. Bayesian Networks A Bayesian network is a general way of representing joint probability distributions with the chain rule and conditional independence assumption...

173 | Probabilistic Independence Networks for Hidden Markov Probability Models
- Smyth, Heckerman, et al.
- 1996

81 | Local learning in probabilistic networks with hidden variables
- Russell, Binder, et al.
- 1995
Citation Context: ... the number of parameters required to represent a probability distribution. Often there is a concomitant decrease in the computational load (Smyth, Heckerman, & Jordan 1997; Ghahramani & Jordan 1997; Russell et al. 1995). More precisely, a Bayesian network represents a probability distribution over a set of random variables $V = V_1, \ldots, V_n$. The variables are connected by a directed acyclic graph whose arcs specify ...

61 | PhoneBook: A phonetically-rich isolated-word telephone-speech database
- Pitrelli, Fong, et al.
- 1995
Citation Context: ... to reflect temporal continuity. Experimental Results Database and Task As a test-bed, we selected the Phonebook database, a large-vocabulary, isolated-word database compiled by researchers at NYNEX (Pitrelli et al. 1995). The words were chosen with the goal of "incorporating all phonemes in as many segmental/stress contexts as are likely to produce coarticulatory variations, while also spanning a variety of talkers ...

28 | Hybrid HMM/ANN systems for training independent tasks
- Dupont, Bourlard, et al.
- 1997

25 | A model for reasoning about persistence and causation. Computational Intelligence 5:142–150
- Dean, Kanazawa
- 1989
Citation Context: ...ans, as is often done in HMM systems. When the variables represent a temporal sequence and are thus ordered in time, the resulting Bayesian network is referred to as a dynamic Bayesian network (DBN) (Dean & Kanazawa 1989). These networks maintain values for a set of variables $X_i$ at each point in time. $X_{ij}$ represents the value of the $i$th variable at time $j$. These variables are partitioned into equivalence sets that ...

9 | A recurrent error propagation speech recognition system
- Robinson, Fallside
- 1991
Citation Context: ...for large-scale speech recognition tasks in two dominant forms: hidden Markov models (Rabiner & Juang 1993), and neural networks with explicitly probabilistic interpretations (Bourlard & Morgan 1994; Robinson & Fallside 1991). Despite numerous successes in both isolated-word recognition and continuous speech recognition, both methodologies suffer from important deficiencies. HMMs use a single state variable to encode all ...

6 | Fusion and propagation with multiple observations
- Peot, Shachter
- 1991
Citation Context: ...orithms. As with HMMs, there are standard algorithms for computing with Bayesian networks. In our implementation, the probability of a set of observations is computed using an algorithm derived from (Peot & Shachter 1991). Conditional probabilities can be learned using gradient methods (Russell et al. 1995) or EM (Lauritzen 1995). We have adapted these algorithms for dynamic Bayesian networks, using special technique...

6 | Compositional modelling with DPNs
- Zweig, Russell
- 1997
Citation Context: ...q). Figure 2 illustrates a DBN structured for speech recognition in this manner. In the following two sections, we discuss the pronunciation model and acoustic model in turn. Pronunciation Model. In (Zweig & Russell 1997; Zweig 1998), it is shown that the DBN model structure we use can represent any distribution over phone sequences that can be represented by an HMM. For the purposes of simplifying the presentation i...
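
The context quoted under the Lee (1989) entry describes the usual multi-stream simplification: given the state, the observation streams within one frame are treated as independent, so the frame likelihood is the product of per-stream likelihoods. This is the baseline that the abstract says the DBN models relax. The following is a minimal Python sketch of that product for discrete observations. The function name, the state labels, the symbols, and the probability values are all hypothetical and are not taken from the paper.

```python
import math


def frame_likelihood(stream_tables, state, observations):
    """Return P(o | s) as the product over streams j of P(o^j | s).

    stream_tables: one table per stream, mapping state -> {symbol: probability}
    observations:  one observed symbol per stream, in the same order
    """
    return math.prod(
        table[state][symbol]
        for table, symbol in zip(stream_tables, observations)
    )


# Hypothetical two-stream discrete model with two states (illustrative values).
stream_tables = [
    {"s0": {"a": 0.5, "b": 0.5}, "s1": {"a": 0.9, "b": 0.1}},
    {"s0": {"x": 0.25, "y": 0.75}, "s1": {"x": 0.5, "y": 0.5}},
]

p_s0 = frame_likelihood(stream_tables, "s0", ["a", "x"])  # 0.5 * 0.25
p_s1 = frame_likelihood(stream_tables, "s1", ["a", "x"])  # 0.9 * 0.5
```

In a real recognizer the product would normally be accumulated in the log domain to avoid underflow over many frames; the plain product is used here only to mirror the quoted formula.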