From HMM's to Segment Models: A Unified View of Stochastic Modeling for Speech Recognition
1996
Statistical Trajectory Models for Phonetic Recognition
1994
Cited by 27 (3 self)
The main goal of this work is to develop an alternative methodology for acoustic-phonetic modelling of speech sounds. The approach utilizes a segment-based framework to capture the dynamical behavior and statistical dependencies of the acoustic attributes used to represent the speech waveform. Temporal behavior is modelled explicitly by creating dynamic tracks of the acoustic attributes used to represent the waveform, and by estimating the spatio-temporal correlation structure of the resulting errors. The tracks serve as templates from which synthetic segments of the acoustic attributes are generated. Scoring of a hypothesized phonetic segment is then based on the error between the measured acoustic attributes and the synthetic segments generated for each phonetic model.
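As a rough illustration of the scoring idea in the abstract above, the sketch below stretches a stored track to the length of a hypothesized segment and measures the error against the observed frames. Everything in it is an assumption made for illustration and is not taken from the paper: the function names, the linear-interpolation resampling, the single scalar attribute per frame, and the use of plain mean squared error in place of the spatio-temporal error correlation model the abstract describes.

```python
def resample_track(track, n_frames):
    """Linearly interpolate a template track to n_frames points (illustrative)."""
    if n_frames == 1:
        return [track[0]]
    out = []
    for i in range(n_frames):
        # Position of output frame i on the template's index axis.
        pos = i * (len(track) - 1) / (n_frames - 1)
        lo = int(pos)
        hi = min(lo + 1, len(track) - 1)
        frac = pos - lo
        out.append((1.0 - frac) * track[lo] + frac * track[hi])
    return out


def segment_error(frames, track):
    """Mean squared error between observed frames and the synthetic segment."""
    synthetic = resample_track(track, len(frames))
    return sum((f - s) ** 2 for f, s in zip(frames, synthetic)) / len(frames)


def classify(frames, tracks):
    """Return the label whose track yields the smallest error."""
    return min(tracks, key=lambda label: segment_error(frames, tracks[label]))


# Hypothetical one-dimensional tracks for two made-up labels.
tracks = {"rising": [0.0, 1.0, 2.0], "falling": [2.0, 1.0, 0.0]}
observed = [0.1, 0.6, 1.1, 1.4, 2.0]
best = classify(observed, tracks)
```

A fuller treatment would replace the squared error with a likelihood under an estimated error covariance, which is the part of the method this sketch leaves out.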
The HDM: A Segmental Hidden Dynamic Model of Coarticulation
In Proc. of the IEEE Int. Conf. on Acoustics, Speech and Sig. Proc. (ICASSP’99), volume I, 1999
Cited by 24 (1 self)
This paper introduces a new approach to acoustic-phonetic modelling, the Hidden Dynamic Model (HDM), which explicitly accounts for the coarticulation and transitions between neighbouring phones. Inspired by the fact that speech is really produced by an underlying dynamic system, the HDM consists of a single vector target per phone in a hidden dynamic space in which speech trajectories are produced by a simple dynamic system. The hidden space is mapped to the surface acoustic representation via a non-linear mapping in the form of a multilayer perceptron (MLP). Algorithms are presented for training of all the parameters (target vectors and MLP weights) from segmented and labelled acoustic observations alone, with no special initialisation. The model captures the dynamic structure of speech, and appears to aid a speech recognition task based on the SwitchBoard corpus. 1. INTRODUCTION Much of the complexity and indirectness of the relationship between the acoustic patterns of speech and ...
Parametric Subspace Modeling Of Speech Transitions
Speech Communication, 1998
Cited by 9 (2 self)
This report describes an attempt at capturing segmental transition information for speech recognition tasks. The slowly varying dynamics of spectral trajectories carry much discriminant information that is very crudely modelled by traditional approaches such as HMMs. In approaches such as recurrent neural networks there is the hope, but not the convincing demonstration, that such transitional information could be captured. The method presented here starts from the very different position of explicitly capturing the trajectory of short-time spectral parameter vectors on a subspace in which the temporal sequence information is preserved. We approach this by introducing a temporal constraint into the well-known technique of Principal Component Analysis. On this subspace, we attempt a parametric modelling of the trajectory, and compute a distance metric to perform classification of diphones. We use the principal curves method of Hastie and Stuetzle and the Generative Topographic map (GTM...
Articulatory Methods for Speech Production and Recognition
1996
Cited by 9 (0 self)
... production-based knowledge into the recognition framework. By using an explicit time-domain articulatory model of the mechanisms of co-articulation, it is hoped to obtain a more accurate model of contextual effects in the acoustic signal, while using fewer parameters than traditional acoustically-driven approaches. Separate articulatory and acoustic models are provided, and in each case the parameters of the models are automatically optimised over a training data set. A predictive statistically-based model of co-articulation is described, and found to yield improved articulatory modelling accuracy compared with X-ray articulatory traces. Parameterised acoustic vectors are synthesised by a set of artificial neural networks, and the resulting acoustic representations are used to re-score N-best recognition hypothesis lists produced by an HMM-based recogniser. The system is evaluated on two test databases, one including speaker-specific X-ray training data and the other aco...
Continuous Word Recognition Based on the Stochastic Segment Model
Proc. DARPA Workshop CSR, 1992
Cited by 8 (2 self)
This paper presents an overview of the Boston University continuous word recognition system, which is based on the Stochastic Segment Model (SSM). The key components of the system described here include: a segment-based acoustic model that uses a family of Gaussian distributions to characterize variable length segments; a divisive clustering technique for estimating robust context-dependent models; and recognition using the N-best rescoring formalism, which also provides a mechanism for combining different knowledge sources (e.g. SSM and HMM scores). Results are reported for the speaker-independent portion of the Resource Management Corpus, for both the SSM system and a combined BU-SSM/BBN-HMM system. 1. INTRODUCTION In the last decade, most of the research on continuous speech recognition has focused on different variations of hidden Markov models (HMMs), and the various efforts have led to significant improvements in recognition performance. However, some researchers have begun to ...
Statistical Modelling in Continuous Speech Recognition (CSR)
In Conference on Uncertainty in Artificial Intelligence, 2001
Cited by 7 (1 self)
Automatic continuous speech recognition (CSR) is sufficiently
Smoothness Analysis For Trajectory Features
Int. Conf. in Acoustics, Speech and Signal Processing, 1997
Cited by 5 (1 self)
Dynamic modeling of speech is potentially a major improvement on Hidden Markov Models (HMMs). In one approach, trajectory models [1] are used to model the dynamics of the spectrum, and are used as a basis for classification [1, 2]. Although some improvement has been achieved in this way, one would hope for more substantial improvements given that the independence assumption is removed. One reason why this was not achieved may be that the trajectory models are based on cepstral coefficients; we show that these tracks contain spurious oscillations. This suggests that these trajectory features might have a high within-class variance. We introduce a measure for evaluating the smoothness of trajectory-based features. This measure provides a method of selecting the best of a set of similar features. Formant trajectories prove to be significantly smoother than trajectories of mel scale cepstral coefficients (MFCC) by this measure, but this does not translate directly to improved performance. 1. I...
The Stochastic Segment Model for Continuous Speech Recognition
In Proceedings of the 25th Asilomar Conference on Signals, Systems and Computers, 1991
Cited by 5 (1 self)
A new direction in speech recognition via statistical methods is to move from frame-based models, such as Hidden Markov Models (HMMs), to segment-based models that provide a better framework for modeling the dynamics of the speech production mechanism. The Stochastic Segment Model (SSM) is a joint model for a sequence of observations, which provides explicit modeling of time correlation as well as a formalism for incorporating segmental features. In this work, the focus is on modeling time correlation within a segment. We consider three Gaussian model variations based on different assumptions about the form of statistical dependency, including a Gauss-Markov model, a dynamical system model and a target state model, all of which can be formulated in terms of the dynamical system model. Evaluation of the different modeling assumptions is in terms of both phoneme classification performance and the predictive power of linear models. 1 Introduction Most of the existing speaker-independent ...
Situated State Hidden Markov Models
1993
Cited by 1 (0 self)
We introduce a probabilistic model called a Situated State Hidden Markov Model (SSHMM), in which states are 'situated' (i.e. assigned positions) and assumed to correspond to regions of an underlying continuous state space. Transition probabilities among states are induced by the assigned state positions in such a way that transitions occur more frequently between nearby states. The model is formally defined, and a maximum likelihood estimation procedure is described. Experiments on synthetic data are described and demonstrate that SSHMMs can learn the structure of an underlying continuous state space even when observed through high-dimensional discontinuous functions. Experiments using SSHMMs for speaker-independent phonetic classification are also reported. 1. INTRODUCTION AND OVERVIEW Hidden Markov models have been used with considerable success for automatic speech recognition, but suffer from several serious shortcomings as models of speech for analysis and synthesis applications....
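The SSHMM abstract above says only that transition probabilities are induced by state positions so that nearby states are visited more often. One way such a scheme could look is sketched below. The Gaussian-shaped distance kernel, the `width` parameter, the one-dimensional positions, and the function name are all assumptions chosen for illustration. The abstract does not specify the functional form.

```python
import math


def transition_matrix(positions, width):
    """Row-stochastic transition matrix derived from state positions (illustrative)."""
    rows = []
    for xi in positions:
        # Unnormalized weight decays with squared distance between states.
        weights = [
            math.exp(-((xi - xj) ** 2) / (2.0 * width ** 2)) for xj in positions
        ]
        total = sum(weights)
        # Normalize so each row is a probability distribution.
        rows.append([w / total for w in weights])
    return rows


# Hypothetical states placed on a line at unit spacing.
positions = [0.0, 1.0, 2.0, 3.0]
A = transition_matrix(positions, width=1.0)
```

With this choice, each row of `A` sums to one and the probability of moving from a state falls off as the target state lies further away. In a trainable model the positions themselves would be the parameters to estimate.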

