From HMM's to Segment Models: A Unified View of Stochastic Modeling for Speech Recognition
, 1996
The HDM: A Segmental Hidden Dynamic Model of Coarticulation
 In Proc. of the IEEE Int. Conf. on Acoustics, Speech and Sig. Proc. (ICASSP'99), volume I
, 1999
Cited by 43 (1 self)
This paper introduces a new approach to acoustic-phonetic modelling, the Hidden Dynamic Model (HDM), which explicitly accounts for the coarticulation and transitions between neighbouring phones. Inspired by the fact that speech is really produced by an underlying dynamic system, the HDM consists of a single vector target per phone in a hidden dynamic space in which speech trajectories are produced by a simple dynamic system. The hidden space is mapped to the surface acoustic representation via a nonlinear mapping in the form of a multilayer perceptron (MLP). Algorithms are presented for training of all the parameters (target vectors and MLP weights) from segmented and labelled acoustic observations alone, with no special initialisation. The model captures the dynamic structure of speech, and appears to aid a speech recognition task based on the SwitchBoard corpus.
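The generative side of the model described in this abstract (one target per phone, a simple dynamic system in a hidden space, an MLP mapping to acoustics) can be illustrated with a small sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the first-order filter used as the "simple dynamic system", the `rate` constant, the layer sizes, and the random, untrained MLP weights. Training is not shown.

```python
import numpy as np

def hidden_trajectory(targets, durations, rate=0.8, x0=None):
    """Move a hidden state toward each phone's target with a first-order filter.

    targets:   (n_phones, d) array, one target vector per phone
    durations: number of frames spent in each phone
    rate:      hypothetical smoothing constant in (0, 1); larger means slower motion
    """
    targets = np.asarray(targets, dtype=float)
    x = np.zeros(targets.shape[1]) if x0 is None else np.asarray(x0, dtype=float)
    frames = []
    for target, n_frames in zip(targets, durations):
        for _ in range(n_frames):
            # Assumed dynamics: exponential approach to the current target.
            x = rate * x + (1.0 - rate) * target
            frames.append(x)
    return np.stack(frames)

def mlp(hidden, w1, b1, w2, b2):
    """One-hidden-layer perceptron mapping hidden states to acoustic vectors."""
    return np.tanh(hidden @ w1 + b1) @ w2 + b2

rng = np.random.default_rng(0)
targets = np.array([[1.0, 0.0], [0.0, 1.0]])    # two phones, 2-D hidden space
traj = hidden_trajectory(targets, durations=[20, 20])

# Random (untrained) weights, used only to show the shapes involved.
w1, b1 = rng.normal(size=(2, 8)), np.zeros(8)
w2, b2 = rng.normal(size=(8, 12)), np.zeros(12)
acoustic = mlp(traj, w1, b1, w2, b2)            # one 12-D vector per frame
```

Because the hidden state is carried across the phone boundary, the frames just after the boundary still reflect the previous target, which is the kind of coarticulation effect the abstract refers to.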
Statistical Trajectory Models for Phonetic Recognition
, 1994
Cited by 33 (3 self)
The main goal of this work is to develop an alternative methodology for acoustic-phonetic modelling of speech sounds. The approach utilizes a segment-based framework to capture the dynamical behavior and statistical dependencies of the acoustic attributes used to represent the speech waveform. Temporal behavior is modelled explicitly by creating dynamic tracks of the acoustic attributes used to represent the waveform, and by estimating the spatio-temporal correlation structure of the resulting errors. The tracks serve as templates from which synthetic segments of the acoustic attributes are generated. Scoring of a hypothesized phonetic segment is then based on the error between the measured acoustic attributes and the synthetic segments generated for each phonetic model.
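The scoring idea in this abstract (compare a measured segment against a per-phone track and score the error) can be sketched as follows. This is a simplified illustration under stated assumptions, not the method of the paper: the segment is linearly resampled to the track length, and the error is scored with an independent Gaussian per element, whereas the abstract describes estimating a full spatio-temporal correlation structure. The names `resample`, `score`, `classify` and the toy tracks are hypothetical.

```python
import numpy as np

def resample(segment, n_points):
    """Linearly interpolate a (T, d) segment onto n_points equally spaced times."""
    segment = np.asarray(segment, dtype=float)
    src = np.linspace(0.0, 1.0, len(segment))
    dst = np.linspace(0.0, 1.0, n_points)
    return np.column_stack(
        [np.interp(dst, src, segment[:, j]) for j in range(segment.shape[1])]
    )

def score(segment, track, error_var):
    """Gaussian log-likelihood of the error between a segment and a track.

    Assumes independent errors with variance error_var (same shape as track);
    this diagonal form is a simplification of a full error covariance.
    """
    err = resample(segment, len(track)) - track
    return -0.5 * np.sum(err ** 2 / error_var + np.log(2.0 * np.pi * error_var))

def classify(segment, models):
    """Return the name of the model whose track best explains the segment."""
    return max(models, key=lambda name: score(segment, *models[name]))

# Toy 1-D example: one rising and one falling track, five points each.
models = {
    "rise": (np.linspace(0.0, 1.0, 5)[:, None], np.ones((5, 1))),
    "fall": (np.linspace(1.0, 0.0, 5)[:, None], np.ones((5, 1))),
}
segment = np.linspace(0.0, 1.0, 7)[:, None]     # a seven-frame rising segment
best = classify(segment, models)
```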
Parametric Subspace Modeling Of Speech Transitions
 Speech Communication
, 1998
Cited by 13 (2 self)
This report describes an attempt at capturing segmental transition information for speech recognition tasks. The slowly varying dynamics of spectral trajectories carry much discriminant information that is very crudely modelled by traditional approaches such as HMMs. In approaches such as recurrent neural networks there is the hope, but not the convincing demonstration, that such transitional information could be captured. The method presented here starts from the very different position of explicitly capturing the trajectory of short-time spectral parameter vectors on a subspace in which the temporal sequence information is preserved. We approach this by introducing a temporal constraint into the well-known technique of Principal Component Analysis. On this subspace, we attempt a parametric modelling of the trajectory, and compute a distance metric to perform classification of diphones. We use the principal curves method of Hastie and Stuetzle and the Generative Topographic map (GTM...
Statistical Modelling in Continuous Speech Recognition (CSR)
 IN CONFERENCE ON UNCERTAINTY IN ARTIFICIAL INTELLIGENCE
, 2001
Cited by 11 (1 self)
Automatic continuous speech recognition (CSR) is sufficiently
Articulatory Methods for Speech Production and Recognition
, 1996
Cited by 10 (0 self)
... production-based knowledge into the recognition framework. By using an explicit time-domain articulatory model of the mechanisms of coarticulation, it is hoped to obtain a more accurate model of contextual effects in the acoustic signal, while using fewer parameters than traditional acoustically-driven approaches. Separate articulatory and acoustic models are provided, and in each case the parameters of the models are automatically optimised over a training data set. A predictive statistically-based model of coarticulation is described, and found to yield improved articulatory modelling accuracy compared with X-ray articulatory traces. Parameterised acoustic vectors are synthesised by a set of artificial neural networks, and the resulting acoustic representations are used to rescore N-best recognition hypothesis lists produced by an HMM-based recogniser. The system is evaluated on two test databases, one including speaker-specific X-ray training data and the other aco
Continuous Word Recognition Based on the Stochastic Segment Model
 Proc. DARPA Workshop CSR
, 1992
Cited by 7 (2 self)
This paper presents an overview of the Boston University continuous word recognition system, which is based on the Stochastic Segment Model (SSM). The key components of the system described here include: a segment-based acoustic model that uses a family of Gaussian distributions to characterize variable-length segments; a divisive clustering technique for estimating robust context-dependent models; and recognition using the N-best rescoring formalism, which also provides a mechanism for combining different knowledge sources (e.g. SSM and HMM scores). Results are reported for the speaker-independent portion of the Resource Management Corpus, for both the SSM system and a combined BU-SSM/BBN-HMM system.
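The N-best rescoring formalism mentioned in this abstract can be sketched as re-ranking a list of hypotheses by a weighted sum of log scores from several knowledge sources. The data layout, the `rescore_nbest` name, the weights and the score values below are all hypothetical and chosen only to make the mechanism concrete; the abstract does not say how scores are weighted or combined in the actual system.

```python
def rescore_nbest(hypotheses, weights):
    """Re-rank hypotheses by a weighted sum of their knowledge-source scores.

    hypotheses: list of dicts with a "words" string and a "scores" dict of
                log scores keyed by knowledge source (e.g. "hmm", "ssm")
    weights:    dict mapping knowledge-source name to its combination weight
    """
    def combined(hyp):
        return sum(weights[name] * hyp["scores"][name] for name in weights)

    # Higher (less negative) combined log score ranks first.
    return sorted(hypotheses, key=combined, reverse=True)

# Made-up log scores for two competing sentence hypotheses.
nbest = [
    {"words": "show old ships", "scores": {"hmm": -118.0, "ssm": -105.0}},
    {"words": "show all ships", "scores": {"hmm": -120.0, "ssm": -98.0}},
]

hmm_only = rescore_nbest(nbest, {"hmm": 1.0})
combined = rescore_nbest(nbest, {"hmm": 1.0, "ssm": 1.0})
```

In this toy list the HMM score alone prefers the first hypothesis, while adding the segment-model score reverses the order, which is how a second knowledge source can correct the first.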
The Stochastic Segment Model for Continuous Speech Recognition
 In Proceedings of the 25th Asilomar Conference on Signals, Systems and Computers
, 1991
Cited by 6 (1 self)
A new direction in speech recognition via statistical methods is to move from frame-based models, such as Hidden Markov Models (HMMs), to segment-based models that provide a better framework for modeling the dynamics of the speech production mechanism. The Stochastic Segment Model (SSM) is a joint model for a sequence of observations, which provides explicit modeling of time correlation as well as a formalism for incorporating segmental features. In this work, the focus is on modeling time correlation within a segment. We consider three Gaussian model variations based on different assumptions about the form of statistical dependency, including a Gauss-Markov model, a dynamical system model and a target state model, all of which can be formulated in terms of the dynamical system model. Evaluation of the different modeling assumptions is in terms of both phoneme classification performance and the predictive power of linear models.
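As an illustration of modeling time correlation within a segment, the sketch below evaluates the log-likelihood of a segment under a first-order Gauss-Markov model, where each frame is Gaussian around a linear function of the previous frame. The parameterisation (`mu0`, `cov0`, `A`, `Q`), the function names and the toy values are assumptions for illustration; the abstract does not give the exact form used in the paper, and the dynamical system and target state variants are not shown.

```python
import numpy as np

def gaussian_logpdf(x, mean, cov):
    """Log-density of a multivariate Gaussian at x."""
    diff = x - mean
    _, logdet = np.linalg.slogdet(cov)
    maha = diff @ np.linalg.solve(cov, diff)
    return -0.5 * (len(x) * np.log(2.0 * np.pi) + logdet + maha)

def gauss_markov_loglik(segment, mu0, cov0, A, Q):
    """Log-likelihood of a (T, d) segment under x_t = A x_{t-1} + w_t.

    The first frame is N(mu0, cov0); w_t is zero-mean Gaussian with covariance Q.
    """
    segment = np.asarray(segment, dtype=float)
    loglik = gaussian_logpdf(segment[0], mu0, cov0)
    for prev, cur in zip(segment[:-1], segment[1:]):
        loglik += gaussian_logpdf(cur, A @ prev, Q)
    return loglik

# Toy 1-D model in which each frame is expected to be half the previous one.
mu0, cov0 = np.array([1.0]), np.array([[1.0]])
A, Q = np.array([[0.5]]), np.array([[0.1]])

decaying = np.array([[1.0], [0.5], [0.25]])     # follows the assumed dynamics
constant = np.array([[1.0], [1.0], [1.0]])      # violates them
ll_decaying = gauss_markov_loglik(decaying, mu0, cov0, A, Q)
ll_constant = gauss_markov_loglik(constant, mu0, cov0, A, Q)
```

A frame-independent model would score both segments from their marginal statistics alone; here the segment that follows the assumed frame-to-frame dynamics receives the higher likelihood.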
Smoothness Analysis For Trajectory Features
 Int. Conf. on Acoustics, Speech and Signal Processing
, 1997
Cited by 5 (1 self)
Dynamic modeling of speech is potentially a major improvement on Hidden Markov Models (HMMs). In one approach, trajectory models [1] are used to model the dynamics of the spectrum, and are used as a basis for classification [1, 2]. Although some improvement has been achieved in this way, one would hope for more substantial improvements given that the independence assumption is removed. One reason why this was not achieved may be that the trajectory models are based on cepstral coefficients; we show that these tracks contain spurious oscillations. This suggests that these trajectory features might have a high within-class variance. We introduce a measure for evaluating the smoothness of trajectory-based features. This measure provides a method of selecting the best of a set of similar features. Formant trajectories prove to be significantly smoother than trajectories of mel-scale cepstral coefficients (MFCC) by this measure, but this does not translate directly to improved performance.
Production-Oriented Models for Speech Recognition
, 2006
Cited by 4 (0 self)
Acoustic modeling in speech recognition uses very little knowledge of the speech production process. At many levels our models continue to model speech as a surface phenomenon. Typically, hidden Markov model (HMM) parameters operate primarily in the acoustic space or in a linear transformation thereof; state-to-state evolution is modeled only crudely, with no explicit relationship between states, such as would be afforded by the use of phonetic features commonly used by linguists to describe speech phenomena, or by the continuity and smoothness of the production parameters governing speech. This survey article attempts to provide an overview of proposals by several researchers for improving acoustic modeling in these regards. Such topics as the controversial Motor Theory of Speech Perception, work by Hogden explicitly using a continuity constraint in a pseudo-articulatory domain, the Kalman-filter-based Hidden Dynamic Model, and work by many groups showing the benefits of using articulatory features instead of phones as the underlying units of speech, will be covered.