Results 1–10 of 23
From HMM's to Segment Models: A Unified View of Stochastic Modeling for Speech Recognition
, 1996
Probabilistic-trajectory Segmental HMMs. Computer Speech and Language
, 1999
Abstract

Cited by 32 (2 self)
“Segmental hidden Markov models” (SHMMs) are intended to overcome important speech-modelling limitations of the conventional HMM approach by representing sequences (or segments) of features and incorporating the concept of trajectories to describe how features change over time. A novel feature of the approach presented in this paper is that extra-segmental variability between different examples of a subphonemic speech segment is modelled separately from intra-segmental variability within any one example. The extra-segmental component of the model is represented in terms of variability in the trajectory parameters, and these models are therefore referred to as “probabilistic-trajectory segmental HMMs” (PTSHMMs). This paper presents the theory of PTSHMMs using a linear trajectory description characterized by slope and midpoint parameters, and presents theoretical and experimental comparisons between different types of PTSHMMs, simpler SHMMs and conventional HMMs. Experiments have demonstrated that, for any given feature set, a linear PTSHMM can substantially reduce the error rate in comparison with a conventional HMM, both for a connected-digit recognition task and for a phonetic classification task. Performance benefits have been demonstrated from incorporating a linear trajectory description and additionally from modelling variability in the midpoint parameter. © 1999 British Crown Copyright/DERA
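The two levels of variability in a linear PTSHMM can be illustrated with a minimal generative sketch (1-D features; the parameter values are hypothetical, not taken from the paper): the midpoint of a segment's linear trajectory is drawn once per segment (extra-segmental variability), and each frame then deviates from that trajectory independently (intra-segmental variability).

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_ptshmm_segment(n_frames, mid_mean, mid_var, slope, frame_var):
    """Sample one 1-D segment from a linear probabilistic-trajectory model.

    Extra-segmental variability: the trajectory midpoint is drawn once per
    segment from N(mid_mean, mid_var).
    Intra-segmental variability: each frame adds independent noise
    N(0, frame_var) around the sampled linear trajectory.
    """
    midpoint = rng.normal(mid_mean, np.sqrt(mid_var))  # one draw per segment
    t = np.arange(n_frames) - (n_frames - 1) / 2.0     # time centred on the segment
    trajectory = midpoint + slope * t                  # linear trajectory
    return trajectory + rng.normal(0.0, np.sqrt(frame_var), n_frames)

seg = sample_ptshmm_segment(n_frames=7, mid_mean=1.0, mid_var=0.25,
                            slope=0.1, frame_var=0.01)
```

A simpler SHMM corresponds to setting `mid_var` to zero, so that every example of the segment shares the same trajectory and all variability is intra-segmental.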
Generalised linear Gaussian models
, 2001
Abstract

Cited by 20 (7 self)
This paper addresses the time-series modelling of high-dimensional data. Currently, the hidden Markov model (HMM) is the most popular and successful model, especially in speech recognition. However, there are well-known shortcomings in HMMs, particularly in the modelling of the correlation between successive observation vectors; that is, inter-frame correlation. Standard diagonal covariance matrix HMMs also lack the modelling of the spatial correlation in the feature vectors; that is, intra-frame correlation. Several other time-series models have been proposed recently, especially in the segment model framework, to address the inter-frame correlation problem, such as Gauss-Markov and dynamical system segment models. The lack of intra-frame correlation has been compensated for with transform schemes such as semi-tied full covariance matrices (STC). All these models can be regarded as belonging to the broad class of generalised linear Gaussian models. Linear Gaussian models (LGMs) are popular as many forms may be trained efficiently using the expectation-maximisation algorithm. In this paper, several LGMs and generalised LGMs are reviewed. The models can be roughly categorised into four combinations according to two different state evolution and two different observation processes. The state evolution process can be based on a discrete finite state machine, as in HMMs, or a linear first-order Gauss-Markov process, as in traditional linear dynamical systems. The observation process can be represented as a factor analysis model or a linear discriminant analysis model. General HMMs, and schemes proposed to improve their performance such as STC, can be regarded as special cases in this framework.
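As a concrete instance of the generalised LGM family described above, the following sketch (dimensions and parameter values are illustrative) samples from a linear dynamical system: a first-order Gauss-Markov state evolution paired with a factor-analysis-style observation process.

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_lds(A, C, Q, R, x0, T):
    """Sample T steps from a linear dynamical system (one generalised LGM).

    State evolution (Gauss-Markov):  x_t = A x_{t-1} + w_t,  w_t ~ N(0, Q)
    Observation (factor analysis):   y_t = C x_t + v_t,      v_t ~ N(0, R)
    """
    dx, dy = A.shape[0], C.shape[0]
    x, states, obs = x0, [], []
    for _ in range(T):
        x = A @ x + rng.multivariate_normal(np.zeros(dx), Q)
        states.append(x)
        obs.append(C @ x + rng.multivariate_normal(np.zeros(dy), R))
    return np.array(states), np.array(obs)

A = np.array([[0.9]])               # 1-D state, slowly decaying
C = np.array([[1.0], [0.5]])        # 2-D observations from a 1-D factor
Q = np.eye(1) * 0.1
R = np.eye(2) * 0.05
X, Y = sample_lds(A, C, Q, R, x0=np.zeros(1), T=10)
```

Replacing the Gauss-Markov evolution with a discrete finite state machine recovers the HMM corner of the same four-way categorisation.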
Augmented Statistical Models for Classifying Sequence Data
, 2006
Abstract

Cited by 19 (0 self)
Declaration This dissertation is the result of my own work and includes nothing that is the outcome of work done in collaboration. It has not been submitted in whole or in part for a degree at any other university. Some of the work has been published previously in conference proceedings [66,69], two journal articles [36,68], two workshop papers [35,67] and a technical report [65]. The length of this thesis, including appendices, bibliography, footnotes, tables and equations, is approximately 60,000 words. This thesis contains 27 figures and 20 tables.
Near-Miss Modeling: A Segment-Based Approach to Speech Recognition
, 1998
Abstract

Cited by 17 (0 self)
Currently, most approaches to speech recognition are frame-based, in that they represent speech as a temporal sequence of feature vectors. Although these approaches have been successful, they cannot easily incorporate complex modeling strategies that may further improve speech recognition performance. In contrast, segment-based approaches represent speech as a temporal graph of feature vectors and facilitate the incorporation of a wide range of modeling strategies. However, difficulties in segment-based recognition have impeded the realization of potential advantages in modeling. This thesis ...
Linear Gaussian models for speech recognition
 Cambridge University
, 2004
Abstract

Cited by 16 (0 self)
Currently the most popular acoustic model for speech recognition is the hidden Markov model (HMM). However, HMMs are based on a series of assumptions, some of which are known to be poor. In particular, the assumption that successive speech frames are conditionally independent given the discrete state that generated them is not a good assumption for speech recognition. State space models may be used to address some shortcomings of this assumption. State space models are based on a continuous state vector evolving through time according to a state evolution ...
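Inference in such a state space model can be sketched with one predict/update step of the standard Kalman filter, assuming known linear-Gaussian parameters (the matrices in the example call are illustrative, not values from the thesis).

```python
import numpy as np

def kalman_step(mu, P, y, A, C, Q, R):
    """One predict/update step of the Kalman filter for the state-space model
    x_t = A x_{t-1} + w_t, w_t ~ N(0, Q);  y_t = C x_t + v_t, v_t ~ N(0, R)."""
    # predict the next state mean and covariance
    mu_pred = A @ mu
    P_pred = A @ P @ A.T + Q
    # update with the new observation y
    S = C @ P_pred @ C.T + R                 # innovation covariance
    K = P_pred @ C.T @ np.linalg.inv(S)      # Kalman gain
    mu_post = mu_pred + K @ (y - C @ mu_pred)
    P_post = (np.eye(len(mu)) - K @ C) @ P_pred
    return mu_post, P_post

# scalar example: prior N(0, 1), transition a=1, q=0.1, observation noise r=0.5
mu_post, P_post = kalman_step(np.zeros(1), np.eye(1), np.array([1.0]),
                              np.array([[1.0]]), np.array([[1.0]]),
                              np.array([[0.1]]), np.array([[0.5]]))
# posterior mean 0.6875, posterior variance 0.34375
```

The posterior variance is smaller than the predicted variance, reflecting the information the observation carries about the continuous state.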
Switching Linear Dynamical Systems For Speech Recognition
, 2003
Abstract

Cited by 15 (6 self)
This paper describes the application of Rao-Blackwellised Gibbs sampling (RBGS) to speech recognition using switching linear dynamical systems (SLDSs) as the acoustic model.
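A generative sketch of an SLDS in the scalar case (the two regimes and the transition matrix below are hypothetical): a discrete Markov switch state selects which linear dynamical system drives the continuous state at each frame. The RBGS inference described in the paper is not shown.

```python
import numpy as np

rng = np.random.default_rng(2)

def sample_slds(params, trans, T, x0=0.0):
    """Sample T observations from a scalar switching linear dynamical system.

    At each frame a discrete switch state s_t follows a Markov chain with
    transition matrix `trans`; the continuous state then evolves with the
    (a, q) parameters selected by s_t, and the observation adds noise r.
    """
    s, x, obs = 0, x0, []
    for _ in range(T):
        s = rng.choice(len(trans), p=trans[s])       # discrete switch state
        a, q, r = params[s]
        x = a * x + rng.normal(0.0, np.sqrt(q))      # LDS chosen by s_t
        obs.append(x + rng.normal(0.0, np.sqrt(r)))  # noisy observation
    return np.array(obs)

params = [(0.95, 0.01, 0.02),   # regime 0: slow, smooth dynamics
          (0.50, 0.20, 0.02)]   # regime 1: fast, noisy dynamics
trans = np.array([[0.9, 0.1],
                  [0.2, 0.8]])
y = sample_slds(params, trans, T=20)
```

Exact inference over the switch path is exponential in T, which is why sampling schemes such as RBGS are attractive for this model class.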
Parametric Subspace Modeling Of Speech Transitions
 Speech Communication
, 1998
Abstract

Cited by 12 (2 self)
This report describes an attempt at capturing segmental transition information for speech recognition tasks. The slowly varying dynamics of spectral trajectories carry much discriminant information that is only crudely modelled by traditional approaches such as HMMs. In approaches such as recurrent neural networks there is the hope, but not the convincing demonstration, that such transitional information could be captured. The method presented here starts from the very different position of explicitly capturing the trajectory of short-time spectral parameter vectors on a subspace in which the temporal sequence information is preserved. We approach this by introducing a temporal constraint into the well-known technique of Principal Component Analysis. On this subspace, we attempt a parametric modelling of the trajectory, and compute a distance metric to perform classification of diphones. We use the principal curves method of Hastie and Stuetzle and the Generative Topographic Map (GTM ...
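One simple way to realise a temporal constraint in PCA (an illustrative sketch, not necessarily the authors' exact formulation) is to append a weighted time index to each spectral frame before computing the principal subspace, so that projections onto the subspace preserve the temporal ordering of the trajectory.

```python
import numpy as np

def temporal_pca(frames, time_weight=1.0, n_components=2):
    """Project a sequence of frames onto a PCA subspace computed from the
    frames augmented with a weighted time index, so temporal order is
    reflected in the subspace."""
    T = frames.shape[0]
    t = (np.arange(T) / max(T - 1, 1))[:, None] * time_weight
    aug = np.hstack([frames, t])                 # frames augmented with time
    aug = aug - aug.mean(axis=0)                 # centre before PCA
    _, _, Vt = np.linalg.svd(aug, full_matrices=False)
    return aug @ Vt[:n_components].T             # trajectory on the subspace

frames = np.random.default_rng(3).normal(size=(12, 8))  # 12 frames, 8 features
traj = temporal_pca(frames)
```

Increasing `time_weight` forces the leading components to track elapsed time more strongly, at the cost of spectral variance explained.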
A Comparison Of Trajectory And Mixture Modeling In Segment-Based Word Recognition
 in Proc. Int'l. Conf. on Acoust., Speech and Signal Proc
, 1993
Abstract

Cited by 5 (3 self)
This paper presents a mechanism for implementing mixtures at a phone-subsegment (microsegment) level for continuous word recognition based on the Stochastic Segment Model (SSM). We investigate the issues that are involved in trade-offs between trajectory and mixture modeling in segment-based word recognition. Experimental results are reported on DARPA's speaker-independent Resource Management corpus.

1. INTRODUCTION

In earlier work, the Stochastic Segment Model (SSM) [1, 2] has been shown to be a viable alternative to the Hidden Markov Model (HMM) for representing variable-duration phones. The SSM provides a joint Gaussian model for a sequence of observations. Assuming each segment generates an observation sequence of random length, the model for a phone consists of 1) a family of joint density functions (one for every observation length), and 2) a collection of mappings that specify the particular density function for a given observation length. Typically, the model assumes that segme...
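The length-mapping idea in the SSM can be sketched in a simplified 1-D, diagonal-covariance form (a hypothetical helper, not the paper's implementation): a fixed-length sequence of model means is mapped to the observed length, here by linear interpolation, and each frame is then scored under an independent Gaussian.

```python
import numpy as np

def ssm_loglik(obs, model_means, var):
    """Score a 1-D observation sequence under a stochastic segment model:
    the fixed-length model mean sequence is mapped to the observed length
    by linear interpolation, then each frame gets a Gaussian log-density."""
    T, M = len(obs), len(model_means)
    src = np.linspace(0, M - 1, T)                      # length mapping
    means = np.interp(src, np.arange(M), model_means)   # resampled trajectory
    return float(np.sum(-0.5 * (np.log(2 * np.pi * var)
                                + (obs - means) ** 2 / var)))

# a rising model trajectory scores a matching segment above a deviating one
match = ssm_loglik(np.array([0.0, 1.0, 2.0]), np.array([0.0, 2.0]), 1.0)
mismatch = ssm_loglik(np.array([0.0, 1.0, 3.0]), np.array([0.0, 2.0]), 1.0)
```

Mixture modelling at the microsegment level would replace each single Gaussian with a mixture, trading trajectory precision for coverage of alternative realisations.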
A Maximum-entropy Solution to the Frame-dependency Problem in Speech Recognition
, 2001
Abstract

Cited by 5 (0 self)
The HMM assumption of conditional independence of observations causes a variety of problems for speech-recognition applications. Previous attempts to construct acoustic models that remove this assumption have suffered from a significant increase in the number of parameters to train. Another weakness of current acoustic models is that they do not account for the origin of derived features (estimated derivatives). We show how to both remove the independence assumption and properly account for derived features, with little or no increase in the number of parameters to train, by applying the principle of maximum entropy. We also show that ignoring the origins of derived features in training HMM acoustic models can lead to severe distortions of the effective language model. Evaluation of our maxent model on a simple problem cuts an already-low error rate in half compared to an equivalent HMM with the same number of parameters.