## Continuous Word Recognition Based on the Stochastic Segment Model (1992)

Venue: | Proc. DARPA Workshop CSR |

Citations: | 9 - 2 self |

### BibTeX

@INPROCEEDINGS{Ostendorf92continuousword,

author = {Mari Ostendorf and Ashvin Kannan and Owen Kimball and J. Robin Rohlicek},

title = {Continuous Word Recognition Based on the Stochastic Segment Model},

booktitle = {Proc. DARPA Workshop CSR},

year = {1992},

pages = {53--58}

}

### Abstract

This paper presents an overview of the Boston University continuous word recognition system, which is based on the Stochastic Segment Model (SSM). The key components of the system described here include: a segment-based acoustic model that uses a family of Gaussian distributions to characterize variable-length segments; a divisive clustering technique for estimating robust context-dependent models; and recognition using the N-best rescoring formalism, which also provides a mechanism for combining different knowledge sources (e.g., SSM and HMM scores). Results are reported for the speaker-independent portion of the Resource Management Corpus, for both the SSM system and a combined BU-SSM/BBN-HMM system.

1. INTRODUCTION

In the last decade, most of the research on continuous speech recognition has focused on different variations of hidden Markov models (HMMs), and the various efforts have led to significant improvements in recognition performance. However, some researchers have begun to ...
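As a rough illustration of the N-best rescoring idea named in the abstract, the sketch below re-ranks a short list of sentence hypotheses by combining scores from two knowledge sources. It assumes a weighted linear combination of per-source log scores; the function name `rescore_nbest`, the source labels `hmm` and `ssm`, the sentences, and all numbers are hypothetical and not taken from the paper.

```python
from typing import Dict, List, Tuple

Hypothesis = Tuple[str, Dict[str, float]]  # (sentence, {source name: log score})


def rescore_nbest(hypotheses: List[Hypothesis],
                  weights: Dict[str, float]) -> List[Tuple[str, float]]:
    """Rank hypotheses by a weighted sum of their per-source log scores.

    Assumption: scores are combined linearly; only sources listed in
    `weights` contribute to the combined score.
    """
    ranked = []
    for sentence, scores in hypotheses:
        combined = sum(weights[name] * scores[name] for name in weights)
        ranked.append((sentence, combined))
    # Higher (less negative) combined log score ranks first.
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked


# Hypothetical 2-best list with made-up log scores.
nbest = [
    ("show all ships", {"hmm": -120.0, "ssm": -95.0}),
    ("show old ships", {"hmm": -118.0, "ssm": -101.0}),
]
weights = {"hmm": 1.0, "ssm": 1.0}
best_sentence, best_score = rescore_nbest(nbest, weights)[0]
```

In this toy list the HMM score alone prefers the second hypothesis, while the combined score prefers the first, which is the kind of re-ranking that combining knowledge sources is meant to allow.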

### Citations

72 |
Segregation of speakers for speech recognition and speaker identification
- Siu, Yu, et al.
- 1991
Citation Context ... we wish to cluster together data which can be described with a common Gaussian distribution, we evaluate a two-way partition of data in a node according to a likelihood ratio test along the lines of [7] to choose between one of two hypotheses: H0: the observations were generated from two different distributions (that represent the distributions of the child nodes), and H1: the observatio... |
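The two-way partition test described in this context can be sketched as follows. This is a simplified illustration, assuming scalar observations and maximum-likelihood Gaussian fits, whereas the paper works with multivariate segment distributions; the function names and the data are hypothetical. The quantity computed is the log likelihood of the data under two separate Gaussians minus the log likelihood under one pooled Gaussian.

```python
import math
from typing import Sequence


def ml_gaussian_loglik(x: Sequence[float]) -> float:
    """Log likelihood of x under a 1-D Gaussian fit to x by maximum likelihood.

    With ML mean and variance this reduces to -n/2 * (log(2*pi*var) + 1).
    Assumes the sample variance is strictly positive.
    """
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n
    return -0.5 * n * (math.log(2.0 * math.pi * var) + 1.0)


def split_log_likelihood_ratio(left: Sequence[float],
                               right: Sequence[float]) -> float:
    """Log likelihood ratio of 'two distributions' versus 'one distribution'."""
    pooled = list(left) + list(right)
    return (ml_gaussian_loglik(left) + ml_gaussian_loglik(right)
            - ml_gaussian_loglik(pooled))


# Hypothetical data: one clearly separated pair and one overlapping pair.
separated = split_log_likelihood_ratio([0.0, 0.2, -0.2, 0.1],
                                       [5.0, 5.1, 4.9, 5.2])
overlapping = split_log_likelihood_ratio([0.0, 0.2, -0.2, 0.1],
                                         [0.1, -0.1, 0.2, 0.0])
```

A larger ratio favors keeping the two child nodes separate; how the paper thresholds or compares this quantity is not shown in the excerpt.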

66 |
A Stochastic Segment Model for Phoneme-Based Continuous Speech Recognition
- Ostendorf, Roukos
- 1989
Citation Context ...Resource Management task, and conclude with a summary of the key features of the system and a discussion of possible future developments. 2. GENERAL SSM DESCRIPTION The Stochastic Segment Model (SSM) [1, 2] is an alternative to the Hidden Markov Model (HMM) for representing variable-duration phonemes. The SSM provides a joint Gaussian model for a sequence of observations. Assuming each segment generates... |

34 |
Integration of Diverse Recognition Methodologies Through Reevaluation of N-Best Sentence Hypotheses
- Ostendorf, Kannan, et al.
- 1991
Citation Context ...g classes given simply by the left and right phone labels, while at the same time reducing the number of covariance parameters (and storage costs) by a factor of two. 4. N-BEST RESCORING FORMALISM In [9], we introduced a general formalism for integrating different speech recognition methodologies using the N-best rescoring formalism. The rescoring formalism is reviewed below, followed by a descriptio... |

27 | Context Dependent Modeling of Phones in Continuous Speech Using Decision Trees
- Bahl, Souza, et al.
- 1991
Citation Context ...ally motivated subsets. Recently, we have investigated the use of automatic clustering techniques to determine the classes for tying. This approach is motivated by previous work in context clustering [5, 6], but differs from other approaches in that we cluster continuous rather than discrete distributions, in the specific clustering criterion used, and in that the goal of clustering is to determine clas... |

22 | Efficient, High-Performance Algorithms for N-Best Search
- Schwartz, Austin
- 1990
Citation Context ..., we consider only two systems here. The BBN Byblos system was used to generate the N-best hypotheses, and the Boston University SSM system was used to rescore the N hypotheses. The BBN Byblos system [10, 11] is a high performance HMM system that uses context-dependent models including cross-word-boundary contexts. The HMM observation densities are modeled by tied Gaussian mixtures. Word recognition by th... |

21 |
Allophone clustering for continuous speech recognition
- Lee, Hayamizu, et al.
- 1990
Citation Context ...ally motivated subsets. Recently, we have investigated the use of automatic clustering techniques to determine the classes for tying. This approach is motivated by previous work in context clustering [5, 6], but differs from other approaches in that we cluster continuous rather than discrete distributions, in the specific clustering criterion used, and in that the goal of clustering is to determine clas... |

18 |
Stochastic segment modeling using the estimate-maximize algorithm
- Roucos, Ostendorf, et al.
- 1988
Citation Context ...Resource Management task, and conclude with a summary of the key features of the system and a discussion of possible future developments. 2. GENERAL SSM DESCRIPTION The Stochastic Segment Model (SSM) [1, 2] is an alternative to the Hidden Markov Model (HMM) for representing variable-duration phonemes. The SSM provides a joint Gaussian model for a sequence of observations. Assuming each segment generates... |

16 |
A dynamical system approach to continuous speech recognition
- Digalakis, Rohlicek, et al.
- 1993
Citation Context ...for exploring issues associated with robust context modeling and word recognition system implementation, which will facilitate incorporation of acoustic models with less restrictive assumptions (e.g. [3]). The parameter estimation algorithm for the SSM is an iterative procedure analogous to "Viterbi training" for HMMs, which involves iteratively finding the most likely segmentation and the maximum li... |

11 |
Context Modeling with the Stochastic Segment
- Kimball, Ostendorf, et al.
- 1992
Citation Context ...therefore suffers from poorly estimated models for underrepresented contexts. To obtain robust estimates for context-dependent models in the SSM, covariance parameters are tied across similar classes [4]. Simple examples of classes for tying include left-context, right-context and hand-specified linguistically motivated subsets. Recently, we have investigated the use of automatic clustering technique... |

8 | BYBLOS Speech Recognition Benchmark Results
- Kubala, Austin, et al.
- 1991
Citation Context ..., we consider only two systems here. The BBN Byblos system was used to generate the N-best hypotheses, and the Boston University SSM system was used to rescore the N hypotheses. The BBN Byblos system [10, 11] is a high performance HMM system that uses context-dependent models including cross-word-boundary contexts. The HMM observation densities are modeled by tied Gaussian mixtures. Word recognition by th... |

7 |
Weight Estimation for N-Best Rescoring
- Kannan, Ostendorf, et al.
- 1992
Citation Context ... [9] approached using Powell's method. However, we noticed that optimization was sensitive to the large number of local minima in the error function, and therefore introduced an alternative procedure [12], reviewed below. We begin by evaluating the error function at a large number of points in the weight-space, specifically, on a multi-dimensional lattice spanning the range of probable weights to dete... |
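The first step described in this context, evaluating an error function on a multi-dimensional lattice of weights, might look like the sketch below. It covers only the lattice evaluation, since the excerpt is truncated before it says how the results are used; returning the points sorted by error is an assumption made here for illustration. The names `evaluate_on_lattice` and `toy_error` and the grid values are hypothetical, and the toy error is a smooth stand-in rather than a recognition error with many local minima.

```python
import itertools
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]


def evaluate_on_lattice(error_fn: Callable[[Point], float],
                        axes: Sequence[Sequence[float]]) -> List[Tuple[float, Point]]:
    """Evaluate error_fn at every lattice point and sort by increasing error.

    `axes` holds the candidate values for each weight; the lattice is their
    Cartesian product.
    """
    results = [(error_fn(point), point) for point in itertools.product(*axes)]
    results.sort(key=lambda item: item[0])
    return results


def toy_error(w: Point) -> float:
    """Hypothetical stand-in for an error measured on held-out N-best lists."""
    return (w[0] - 1.0) ** 2 + (w[1] + 0.5) ** 2


# Hypothetical ranges for two combination weights.
axes = [[0.0, 0.5, 1.0, 1.5], [-1.0, -0.5, 0.0]]
ranked = evaluate_on_lattice(toy_error, axes)
best_error, best_point = ranked[0]
```

Evaluating on a coarse lattice first is one way to avoid committing to a single starting point when the error surface has many local minima, which is the sensitivity the excerpt mentions.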

3 |
Robust Estimation of Stochastic Segment Models for Word Recognition
- Kannan
- 1992
Citation Context ...this quantity for all binary partitions allowed by the question set and over all terminal nodes, and then split the terminal node with the question that results in the largest reduction in distortion [8]. For the context clustering tree, it is assumed that all valid terminal nodes must have more than T_c observations, where T_c is an empirically determined threshold to indicate that a reliable covari... |

3 |
Recognition Using Classification and Segmentation Scoring
- Kimball, Ostendorf, et al.
- 1992
Citation Context ...s of shared mixture distributions, and more effective use of the segmental framework either through time correlation modeling [3] and/or segmental features in a classification/segmentation framework [13], and possibly unsupervised adaptation. ACKNOWLEDGMENTS The authors gratefully acknowledge BBN, especially George Zavaliagkos, for their help in providing the N-best sentence hypotheses. We also thank... |