## The Stochastic Segment Model for Continuous Speech Recognition (1991)

Venue: Proceedings of the 25th Asilomar Conference on Signals, Systems and Computers

Citations: 5 (1 self)

### BibTeX

@INPROCEEDINGS{Digalakis91thestochastic,

author = {M. Ostendorf and V. Digalakis},

title = {The Stochastic Segment Model for Continuous Speech Recognition},

booktitle = {Proceedings of the 25th Asilomar Conference on Signals, Systems and Computers},

year = {1991},

pages = {964--968}

}

### Abstract

A new direction in speech recognition via statistical methods is to move from frame-based models, such as Hidden Markov Models (HMMs), to segment-based models that provide a better framework for modeling the dynamics of the speech production mechanism. The Stochastic Segment Model (SSM) is a joint model for a sequence of observations, which provides explicit modeling of time correlation as well as a formalism for incorporating segmental features. In this work, the focus is on modeling time correlation within a segment. We consider three Gaussian model variations based on different assumptions about the form of statistical dependency, including a Gauss-Markov model, a dynamical system model and a target state model, all of which can be formulated in terms of the dynamical system model. Evaluation of the different modeling assumptions is in terms of both phoneme classification performance and the predictive power of linear models.
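The idea that a Gauss-Markov model can be written as a special case of a dynamical system model can be illustrated with a minimal scalar sketch. The state-space form used here is the standard textbook linear-Gaussian one, and the function names and the parameters `f`, `h`, `q_std`, `r_std` are hypothetical; this is an assumption for illustration and not necessarily the parameterization used in the paper.

```python
import random


def simulate_segment(length, f, h, q_std, r_std, x0, rng):
    """Draw one scalar observation sequence y[0..length-1].

    Hypothetical state-space form (an assumption, not the paper's notation):
        x[t+1] = f * x[t] + w[t],   w[t] ~ N(0, q_std**2)
        y[t]   = h * x[t] + v[t],   v[t] ~ N(0, r_std**2)
    """
    x = x0
    ys = []
    for _ in range(length):
        ys.append(h * x + rng.gauss(0.0, r_std))  # noisy observation of the state
        x = f * x + rng.gauss(0.0, q_std)         # state transition
    return ys


def simulate_gauss_markov(length, f, q_std, y0, rng):
    """Special case h = 1, r_std = 0, so that y[t+1] = f * y[t] + w[t]."""
    return simulate_segment(length, f, 1.0, q_std, 0.0, y0, rng)
```

Setting the observation noise to zero and `h = 1` makes each observation equal to the state, so every frame depends on the past only through the previous frame, which is the first-order Markov dependency. A real system would use vector-valued features and matrices in place of the scalars.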

### Citations

237 |
Estimating optimal transformations for multiple regression and correlation
- Breiman, Friedman
- 1985
Citation Context: ...linear regression in explaining the variance of a particular observation within a segment to the performance of a nonlinear regression based on the alternating conditional expectation (ACE) algorithm [3]. The ACE method estimates a set of nonlinear functions for each of the dependent and independent variables so that the percentage of the variance not explained by a linear regression on the transform...

137 |
Speech Database Development: Design and Analysis of the Acoustic-Phonetic Corpus
- Lamel, Kassel, et al.
- 1986
Citation Context: ...s of a segment. Experimental Results We have implemented a system based on our correlation invariance assumption and performed phoneme classification and recognition experiments on the TIMIT database [10]. The feature vectors included Mel-warped cepstra and their derivatives, computed from a 20 ms window at 100 frames per second. Each segment model had five different distributions (time-invariant regio...

98 |
Context-Dependent Phonetic Hidden Markov Models for Speaker-Independent Continuous Speech Recognition
- Lee
- 1990
Citation Context: ...ique has yielded good results experimentally: a 37% reduction in word recognition error with a combined left and right context model, which is comparable to error reduction observed for discrete HMMs [11]. The context-dependent word recognition results confirm that the independent-frame SSM can achieve performance comparable to HMMs, and therefore we may be able to improve performance by explicitly mo...

66 |
A Stochastic Segment Model for Phoneme-Based Continuous Speech Recognition
- Ostendorf, Roukos
- 1989
Citation Context: ...by researchers at MIT (e.g., [13]). The segment-based approach is also a natural formalism for neural network classifiers, as shown in [12, 1]. Our work is based on the Stochastic Segment Model (SSM) [16, 17], which provides a joint Gaussian model for a sequence of observations. Each segment generates an observation sequence $\{y_t\}_1^L$ of random length $L$, according to the density $b_{\alpha,L}(y_1, y_2, \ldots$...

42 |
A linear predictive HMM for vector-valued observations with applications to speech recognition
- Kenny, Lennig, et al.
- 1990
Citation Context: ...n the independent-frame model. The Gauss-Markov segment model is analogous to an HMM system which assumes that output distributions are conditioned on the current state and the past observation, as in [4, 9]. As for the HMM systems, our experiments showed that the Gauss-Markov model was useful for cepstral features alone, but performance degraded below that of the independent-frame model with the additio...

24 |
The acoustic modelling problem in automatic speech recognition
- Brown
- 1987
Citation Context: ...n the independent-frame model. The Gauss-Markov segment model is analogous to an HMM system which assumes that output distributions are conditioned on the current state and the past observation, as in [4, 9]. As for the HMM systems, our experiments showed that the Gauss-Markov model was useful for cepstral features alone, but performance degraded below that of the independent-frame model with the additio...

20 |
Speech recognition using segmental neural nets
- Austin, Zavaliagkos, et al.
- 1992
Citation Context: ...duration. Segmental features have been more extensively explored by researchers at MIT (e.g., [13]). The segment-based approach is also a natural formalism for neural network classifiers, as shown in [12, 1]. Our work is based on the Stochastic Segment Model (SSM) [16, 17], which provides a joint Gaussian model for a sequence of observations. Each segment generates an observation sequence $\{y_t\}_1^L$ of r...

18 |
Network-based connected digit recognition
- Bush, Kopec
- 1987
Citation Context: ... implications of these results for future work in acoustic modeling. 2 Overview of Segment-Based Models Perhaps the first segment-based model used in speech recognition was proposed by Bush and Kopec [5]. Their approach used vector quantizers designed for specific phone segments, with a recognition search based on minimum quantization error. Although the model was proposed to provide a formalism for ...

18 |
Stochastic segment modeling using the estimate-maximize algorithm
- Roukos, Ostendorf, et al.
- 1988
Citation Context: ...by researchers at MIT (e.g., [13]). The segment-based approach is also a natural formalism for neural network classifiers, as shown in [12, 1]. Our work is based on the Stochastic Segment Model (SSM) [16, 17], which provides a joint Gaussian model for a sequence of observations. Each segment generates an observation sequence $\{y_t\}_1^L$ of random length $L$, according to the density $b_{\alpha,L}(y_1, y_2, \ldots$...

16 |
A dynamical system approach to continuous speech recognition
- Digalakis, Rohlicek, et al.
- 1993
Citation Context: ...hin the segment, as for the original segment model. The different regions are defined by a fixed time warping; linear time warping has been used thus far. The dynamical system model was introduced in [8], where training and recognition algorithms are described. Training is equivalent to the maximum likelihood identification of a stochastic dynamical system, and we have developed a simple alternative ...

11 |
Context Modeling with the Stochastic Segment
- Kimball, Ostendorf, et al.
- 1992
Citation Context: ...hat different from the traditional HMM techniques. Our distribution estimates are based on the assumption that covariance matrices of different distributions are tied across models of similar context [2]. This technique has yielded good results experimentally: a 37% reduction in word recognition error with a combined left and right context model, which is comparable to error reduction observed for di...

10 |
Phonetic classification using multi-layer perceptrons
- Leung, Zue
- 1990
Citation Context: ...duration. Segmental features have been more extensively explored by researchers at MIT (e.g., [13]). The segment-based approach is also a natural formalism for neural network classifiers, as shown in [12, 1]. Our work is based on the Stochastic Segment Model (SSM) [16, 17], which provides a joint Gaussian model for a sequence of observations. Each segment generates an observation sequence $\{y_t\}_1^L$ of r...

10 |
Signal representation comparison for phonetic classification
- Meng, Zue
Citation Context: ...orporating segmental features, the only feature they found to improve recognition performance was segment duration. Segmental features have been more extensively explored by researchers at MIT (e.g., [13]). The segment-based approach is also a natural formalism for neural network classifiers, as shown in [12, 1]. Our work is based on the Stochastic Segment Model (SSM) [16, 17], which provides a joint ...

9 |
Hidden Markov modeling using the most likely state sequence
- Merhav, Ephraim
- 1991
Citation Context: ...chanisms for modeling the temporal dynamics of a phoneme, or the statistical dependency of features over time, in the segmental framework. It has been observed experimentally (and shown theoretically [14]) that the joint likelihood of a particular state and observation sequence is dominated by the terms of the output distributions as the dimension of the feature vector increases; hence, the hidden Mar...

7 |
Fast Search Algorithms for Connected Phone Recognition Using the Stochastic Segment Model
- Digalakis, Ostendorf, et al.
- 1990
Citation Context: ...general acoustic models. Of course, the use of segmental models incurs the cost of more complex recognition search algorithms. However, various mechanisms have been developed to overcome this problem [7, 15]. The focus of this paper is on mechanisms for modeling the temporal dynamics of a phoneme, or the statistical dependency of features over time, in the segmental framework. It has been observed experi...

4 |
Improvements in the Stochastic Segment Model for Phoneme Recognition
- Digalakis, Ostendorf, et al.
- 1989
Citation Context: ...a hidden Markov model with a particular complex topology. If the observation distributions dominate the probability of a phoneme, the difference between the two approaches will be minimal. Indeed, in [6] we were able to show that for equivalent numbers of parameters, an HMM and an independent-frame SSM give the same phone classification performance with context-independent models. Thus, the potential...

1 |
Integration of Different Recognition Methodologies Through Reevaluation of N-Best Sentence Hypotheses
- Ostendorf, Kannan, et al.
- 1991
Citation Context: ...general acoustic models. Of course, the use of segmental models incurs the cost of more complex recognition search algorithms. However, various mechanisms have been developed to overcome this problem [7, 15]. The focus of this paper is on mechanisms for modeling the temporal dynamics of a phoneme, or the statistical dependency of features over time, in the segmental framework. It has been observed experi...