Semi-Tied Covariance Matrices For Hidden Markov Models
 IEEE Transactions on Speech and Audio Processing
, 1999
Abstract

Cited by 181 (27 self)
There is normally a simple choice made in the form of the covariance matrix to be used with continuous-density HMMs. Either a diagonal covariance matrix is used, with the underlying assumption that elements of the feature vector are independent, or a full or block-diagonal matrix is used, where all or some of the correlations are explicitly modelled. Unfortunately, when using full or block-diagonal covariance matrices there tends to be a dramatic increase in the number of parameters per Gaussian component, limiting the number of components which may be robustly estimated. This paper introduces a new form of covariance matrix which allows a few "full" covariance matrices to be shared over many distributions, whilst each distribution maintains its own "diagonal" covariance matrix. In contrast to other schemes which have hypothesised a similar form, this technique fits within the standard maximum-likelihood criterion used for training HMMs. The new form of covariance matrix is evaluated on a large-vocabulary speech-recognition task. In initial experiments the performance of the standard system was achieved using approximately half the number of parameters. Moreover, a 10% reduction in word error rate compared to a standard system can be achieved with less than a 1% increase in the number of parameters and little increase in recognition time.
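To make the parameter trade-off concrete, the sketch below evaluates one Gaussian component whose covariance has the semi-tied form Σ = H Σ_diag H^T, where the full matrix H is shared by many components and each component keeps its own diagonal Σ_diag. Writing A = H^{-1} with rows a_i, the log-determinant splits as -½ log|Σ| = log|det A| - ½ Σ_i log σ_i², and the quadratic form becomes Σ_i (a_i·(o - μ))² / σ_i², so only the shared A and the component's diagonal variances are needed. This is a minimal pure-Python sketch under those assumptions; the function names are hypothetical, and the maximum-likelihood estimation of H is not shown.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def matvec(a: Matrix, v: Sequence[float]) -> List[float]:
    """Multiply a square matrix by a vector."""
    return [sum(a_ij * v_j for a_ij, v_j in zip(row, v)) for row in a]


def log_abs_det(a: Matrix) -> float:
    """log|det(a)| by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) for row in a]
    total = 0.0
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if m[piv][col] == 0.0:
            raise ValueError("semi-tied transform is singular")
        m[col], m[piv] = m[piv], m[col]
        total += math.log(abs(m[col][col]))
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n):
                m[r][k] -= f * m[col][k]
    return total


def semi_tied_log_likelihood(o: Sequence[float], mean: Sequence[float],
                             diag_var: Sequence[float], a_inv: Matrix) -> float:
    """log N(o; mean, H diag(diag_var) H^T), given the shared A = H^{-1}.

    Only A and the component's own diagonal variances are used, so no
    full covariance matrix is built or inverted per component.
    """
    d = len(o)
    z = matvec(a_inv, [oi - mi for oi, mi in zip(o, mean)])
    quad = sum(zi * zi / v for zi, v in zip(z, diag_var))
    return (log_abs_det(a_inv)
            - 0.5 * sum(math.log(v) for v in diag_var)
            - 0.5 * quad
            - 0.5 * d * math.log(2.0 * math.pi))


def parameter_counts(d: int, n_components: int,
                     n_transforms: int) -> Tuple[int, int, int]:
    """(diagonal, full, semi-tied) parameter counts, means included."""
    diagonal = n_components * 2 * d
    full = n_components * (d + d * (d + 1) // 2)
    semi_tied = diagonal + n_transforms * d * d
    return diagonal, full, semi_tied
```

In practice log|det A| would be computed once per shared transform rather than once per component; it is recomputed here only to keep the function self-contained.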
Genones: Generalized Mixture Tying in Continuous Hidden Markov Model-Based Speech Recognizers
 IEEE Transactions on Speech and Audio Processing
, 1996
Abstract

Cited by 41 (7 self)
An algorithm is proposed that achieves a good tradeoff between modeling resolution and robustness by using a new, general scheme for tying of mixture components in continuous mixture-density hidden Markov model (HMM)-based speech recognizers. The sets of HMM states that share the same mixture components are determined automatically using agglomerative clustering techniques. Experimental results on ARPA's Wall Street Journal corpus show that this scheme reduces errors by 25% over typical tied-mixture systems. New fast algorithms for computing Gaussian likelihoods (the most time-consuming aspect of continuous-density HMM systems) are also presented. These new algorithms significantly reduce the number of Gaussian densities that are evaluated with little or no impact on speech recognition accuracy. Corresponding Author: Vassilios Digalakis. Address: Electronic and Computer Engineering Department, Technical University of Crete, Kounoupidiana, Chania, 73100 Greece. Phone: +30821...
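The abstract states only that the state sets are found by agglomerative clustering. The sketch below assumes, for illustration, that each HMM state is summarised by an occupancy count and a mean vector, and that the pair of clusters with the smallest Ward-style increase in squared error is merged until the desired number of genones remains. The distance measure and the names (`cluster_states`, `merge_cost`) are assumptions, not the paper's method.

```python
from typing import Dict, List, Sequence, Tuple

Cluster = Tuple[List[str], float, List[float]]  # (state names, occupancy, mean)


def merge_cost(n1: float, m1: Sequence[float],
               n2: float, m2: Sequence[float]) -> float:
    """Ward-style increase in within-cluster squared error if two clusters merge."""
    sq = sum((a - b) ** 2 for a, b in zip(m1, m2))
    return n1 * n2 / (n1 + n2) * sq


def cluster_states(states: Dict[str, Tuple[float, List[float]]],
                   n_genones: int) -> List[List[str]]:
    """Greedily merge HMM states until n_genones groups remain.

    Each resulting group of states would share one set of Gaussians (a genone).
    """
    clusters: List[Cluster] = [([name], n, list(mean))
                               for name, (n, mean) in states.items()]
    while len(clusters) > n_genones:
        # pick the cheapest pair to merge
        _, i, j = min(
            (merge_cost(clusters[i][1], clusters[i][2],
                        clusters[j][1], clusters[j][2]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        names_i, n_i, m_i = clusters[i]
        names_j, n_j, m_j = clusters[j]
        n = n_i + n_j
        # occupancy-weighted mean of the merged cluster
        merged: Cluster = (names_i + names_j, n,
                           [(n_i * a + n_j * b) / n for a, b in zip(m_i, m_j)])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return [sorted(names) for names, _, _ in clusters]
```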
Uncertainty decoding for noise robust speech recognition
 in Proc. Interspeech
, 2004
Abstract

Cited by 36 (12 self)
This dissertation is the result of my own work and includes nothing which is the outcome of work done in collaboration. It has not been submitted in whole or in part for a degree at any other university. Some of the work has been published previously in conference proceedings
Statistical Trajectory Models for Phonetic Recognition
, 1994
Abstract

Cited by 28 (3 self)
The main goal of this work is to develop an alternative methodology for acoustic phonetic modelling of speech sounds. The approach utilizes a segment-based framework to capture the dynamical behavior and statistical dependencies of the acoustic attributes used to represent the speech waveform. Temporal behavior is modelled explicitly by creating dynamic tracks of the acoustic attributes used to represent the waveform, and by estimating the spatio-temporal correlation structure of the resulting errors. The tracks serve as templates from which synthetic segments of the acoustic attributes are generated. Scoring of a hypothesized phonetic segment is then based on the error between the measured acoustic attributes and the synthetic segments generated for each phonetic model.
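A minimal sketch of the scoring idea, assuming a template track is stored as K points per acoustic attribute, linearly time-warped to the length of a hypothesised segment, and compared frame by frame. For simplicity the errors are modelled as independent Gaussians with a per-dimension variance; the work itself estimates a spatio-temporal correlation structure, which this sketch does not attempt. The function names are hypothetical.

```python
import math
from typing import List, Sequence


def resample_track(track: Sequence[Sequence[float]],
                   n_frames: int) -> List[List[float]]:
    """Linearly time-warp a K-point template track onto n_frames frames."""
    k = len(track)
    if k == 1 or n_frames == 1:
        return [list(track[0]) for _ in range(n_frames)]
    out = []
    for t in range(n_frames):
        pos = t * (k - 1) / (n_frames - 1)  # position along the template
        lo = min(int(pos), k - 2)
        frac = pos - lo
        out.append([(1.0 - frac) * a + frac * b
                    for a, b in zip(track[lo], track[lo + 1])])
    return out


def segment_log_score(segment: Sequence[Sequence[float]],
                      track: Sequence[Sequence[float]],
                      err_var: Sequence[float]) -> float:
    """Score a hypothesised segment by its error against the synthetic segment.

    Simplification: errors are independent Gaussians with a per-dimension
    variance, ignoring any spatio-temporal correlation.
    """
    synthetic = resample_track(track, len(segment))
    score = 0.0
    for frame, synth in zip(segment, synthetic):
        for x, y, v in zip(frame, synth, err_var):
            score -= 0.5 * (math.log(2.0 * math.pi * v) + (x - y) ** 2 / v)
    return score
```

Competing phonetic models would each generate their own synthetic segment, and the model with the highest score would be preferred for that segment.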
State-Based Gaussian Selection In Large Vocabulary Continuous Speech Recognition Using HMMs
, 1997
Abstract

Cited by 23 (2 self)
This paper investigates the use of Gaussian Selection (GS) to increase the speed of a large vocabulary speech recognition system. Typically 30-70% of the computational time of a HMM-based speech recogniser is spent calculating probabilities. The aim of GS is to reduce this load by dividing the acoustic space into a set of clusters and associating a "shortlist" of Gaussians with each of these clusters. Any Gaussian not in the shortlist is simply approximated. This paper examines new techniques for obtaining "good" shortlists. All the new schemes make use of state information, specifically which state each of the components belongs to. In this way a maximum number of components per state may be specified, hence reducing the size of the shortlist. The first technique introduced is a simple extension of the standard GS one, which uses this state information. Then, more complex schemes based on maximising the likelihood of the training data are proposed. These new approaches are compared...
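The sketch below illustrates state-based shortlists under stated assumptions: the acoustic space is represented by a small codebook, each codeword's shortlist keeps at most `max_per_state` components from every state (ranked by a variance-normalised distance, an assumed measure), and Gaussians outside the shortlist are approximated by a constant log-likelihood floor (one simple reading of "simply approximated"). All names are hypothetical, and the likelihood-based shortlist schemes are not shown.

```python
import math
from typing import Dict, List, Sequence, Set, Tuple

Component = Tuple[str, List[float], List[float]]  # (state, mean, diagonal variance)


def diag_log_gauss(x: Sequence[float], mean: Sequence[float],
                   var: Sequence[float]) -> float:
    """Log-density of a diagonal-covariance Gaussian."""
    return -0.5 * sum(math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))


def build_shortlists(codebook: List[List[float]], components: List[Component],
                     max_per_state: int) -> List[Set[int]]:
    """For each codeword, keep at most max_per_state components from every state."""
    shortlists = []
    for cw in codebook:
        by_state: Dict[str, List[Tuple[float, int]]] = {}
        for idx, (state, mean, var) in enumerate(components):
            # variance-normalised distance from codeword to component mean (assumed)
            dist = sum((c - m) ** 2 / v for c, m, v in zip(cw, mean, var))
            by_state.setdefault(state, []).append((dist, idx))
        keep: Set[int] = set()
        for candidates in by_state.values():
            candidates.sort()
            keep.update(idx for _, idx in candidates[:max_per_state])
        shortlists.append(keep)
    return shortlists


def selected_log_likelihoods(x: Sequence[float], codebook: List[List[float]],
                             shortlists: List[Set[int]],
                             components: List[Component],
                             floor: float) -> List[float]:
    """Evaluate only shortlisted Gaussians; approximate the rest by a constant floor."""
    # quantise the frame to its nearest codeword
    q = min(range(len(codebook)),
            key=lambda i: sum((a - b) ** 2 for a, b in zip(x, codebook[i])))
    return [diag_log_gauss(x, mean, var) if idx in shortlists[q] else floor
            for idx, (_, mean, var) in enumerate(components)]
```

Capping components per state bounds the shortlist size even when many Gaussians from one state sit near the same codeword.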
Hierarchical search for large vocabulary conversational speech recognition
 IEEE Signal Processing Magazine
, 1999
Abstract

Cited by 16 (5 self)
Speaker-independent speech recognition technology has made significant progress from the days of isolated word recognition. Today, state-of-the-art systems are capable of performing large vocabulary continuous speech recognition (LVCSR) on audio streams derived from complex information sources such as broadcast news and two-way telephone dialogs. A significant contribution to this advancement in technology is the development of search techniques that find suboptimal but accurate solutions in problems involving large search spaces and extremely complex statistical models. Moreover, these search strategies are capable of dynamically integrating information from a number of diverse knowledge sources to determine the correct word hypothesis, and limit the scope of the search by using a hierarchical search strategy. We refer to this problem as the decoding or search problem. This paper describes the complexity associated with decoding using hierarchical representations for linguistic and acoustic knowledge sources. An extensible object-oriented decoder available in the public domain that leverages current state-of-the-art technology is described to illustrate these concepts. This decoder supports efficient handling of acoustic models for cross-word context-dependent phones, multiple pronunciations of words using lexical trees, and rescoring of word graphs based on N-gram language models in a single pass. It employs a state-of-the-art Viterbi-style dynamic programming algorithm, and is equipped with several heuristic pruning criteria to minimize the consumption of computational resources while maintaining good accuracy.
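As an illustration of Viterbi-style dynamic programming with pruning, the sketch below runs a time-synchronous Viterbi search over a flat state graph and applies a single beam: hypotheses more than `beam` (in log probability) below the best at each frame are dropped before being extended. Lexical trees, cross-word contexts and N-gram rescoring are omitted, and `viterbi_beam` is a hypothetical name, not the decoder's API.

```python
import math
from typing import Dict, List, Tuple

LogProbs = Dict[str, float]


def viterbi_beam(log_init: LogProbs, log_trans: Dict[str, LogProbs],
                 log_emit: List[LogProbs], beam: float) -> Tuple[List[str], float]:
    """Viterbi-style dynamic programming with beam pruning.

    log_init:  initial log probability per state
    log_trans: log_trans[s][s2] is the log probability of s -> s2
    log_emit:  per frame, the emission log probability of each state
    """
    scores = {s: lp + log_emit[0].get(s, -math.inf) for s, lp in log_init.items()}
    backpointers: List[Dict[str, str]] = []
    for t in range(1, len(log_emit)):
        best = max(scores.values())
        # beam pruning: keep only hypotheses close to the current best
        active = {s: v for s, v in scores.items() if v >= best - beam}
        extended: LogProbs = {}
        bp: Dict[str, str] = {}
        for s, v in active.items():
            for nxt, lt in log_trans.get(s, {}).items():
                if v + lt > extended.get(nxt, -math.inf):
                    extended[nxt], bp[nxt] = v + lt, s
        scores = {s: v + log_emit[t].get(s, -math.inf) for s, v in extended.items()}
        backpointers.append(bp)
    # trace back the best path from the best final state
    end = max(scores, key=scores.get)
    path = [end]
    for bp in reversed(backpointers):
        path.append(bp[path[-1]])
    return path[::-1], scores[end]
```

A tighter beam saves work by extending fewer hypotheses, at the risk of pruning the path that would eventually have scored best.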
Linear Gaussian models for speech recognition
 Cambridge University
, 2004
Abstract

Cited by 15 (0 self)
Currently the most popular acoustic model for speech recognition is the hidden Markov model (HMM). However, HMMs are based on a series of assumptions, some of which are known to be poor. In particular, the assumption that successive speech frames are conditionally independent given the discrete state that generated them is not a good assumption for speech recognition. State space models may be used to address some shortcomings of this assumption. State space models are based on a continuous state vector evolving through time according to a state evo...
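The abstract is truncated before the model equations. Assuming the standard linear-Gaussian state space form, reduced here to a scalar state and scalar observation, the likelihood of an observation sequence can be computed with the Kalman filter, as sketched below. Vector states, per-state parameters and training are beyond this sketch, and `lds_log_likelihood` is a hypothetical name.

```python
import math
from typing import Sequence


def lds_log_likelihood(obs: Sequence[float], a: float, q: float,
                       c: float, r: float, x0: float, p0: float) -> float:
    """Log-likelihood of a scalar linear dynamical system via the Kalman filter.

    State:        x_t = a * x_{t-1} + w_t,  w_t ~ N(0, q)
    Observation:  o_t = c * x_t + v_t,      v_t ~ N(0, r)
    Initial:      x_1 ~ N(x0, p0)
    """
    x, p = x0, p0
    total = 0.0
    for t, o in enumerate(obs):
        if t > 0:
            # predict: propagate the state estimate through the dynamics
            x, p = a * x, a * a * p + q
        s = c * c * p + r   # predictive variance of o_t
        e = o - c * x       # innovation
        total -= 0.5 * (math.log(2.0 * math.pi * s) + e * e / s)
        # update: condition the state estimate on o_t
        k = p * c / s
        x, p = x + k * e, (1.0 - k * c) * p
    return total
```

Unlike an HMM state, the filtered state carries information from every earlier frame, which is how such models relax the conditional-independence assumption described above.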
Factor analysed hidden Markov models for Speech Recognition
 Computer Speech and Language
, 2004
Abstract

Cited by 14 (6 self)
Recently, various techniques to improve the correlation model of feature vector elements in speech recognition systems have been proposed. Such techniques include semi-tied covariance HMMs and systems based on factor analysis. All these schemes have been shown to improve speech recognition performance without dramatically increasing the number of model parameters compared to standard diagonal covariance Gaussian mixture HMMs. This paper introduces a general form of acoustic model, the factor analysed HMM (FAHMM). A variety of configurations of this model and parameter sharing schemes, some of which correspond to standard systems, were examined. An EM algorithm for the parameter optimisation is presented along with a number of methods to increase the efficiency of training. The performance of FAHMMs on medium to large vocabulary continuous speech recognition tasks was investigated. The experiments show that without elaborate complexity control, performance equivalent to or better than a standard diagonal covariance Gaussian mixture HMM system can be achieved with considerably fewer parameters.
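For the simplest configuration, assumed here, a factor analyser generates o = C x + v with a k-dimensional factor x ~ N(0, I) and noise v ~ N(0, Ψ), Ψ diagonal, so the implied covariance is Σ = C C^T + Ψ. The sketch builds Σ, evaluates a Gaussian with it via a Cholesky factorisation, and counts covariance parameters (d k + d, versus d(d + 1)/2 for a full matrix). Mixtures over factors and noise and any state dynamics are not shown, and all names are hypothetical.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def fa_covariance(loading: Matrix, psi: Sequence[float]) -> Matrix:
    """Sigma = C C^T + diag(psi) for a d x k loading matrix C."""
    d = len(loading)
    return [[sum(ci * cj for ci, cj in zip(loading[i], loading[j]))
             + (psi[i] if i == j else 0.0)
             for j in range(d)]
            for i in range(d)]


def cholesky(s: Matrix) -> Matrix:
    """Lower-triangular L with L L^T = s (s symmetric positive definite)."""
    n = len(s)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            acc = s[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(acc) if i == j else acc / low[j][j]
    return low


def gauss_loglik(x: Sequence[float], mean: Sequence[float], cov: Matrix) -> float:
    """log N(x; mean, cov) using a Cholesky factorisation of cov."""
    low = cholesky(cov)
    z: List[float] = []
    for i in range(len(x)):
        # forward substitution: solve L z = x - mean
        z.append((x[i] - mean[i] - sum(low[i][k] * z[k] for k in range(i)))
                 / low[i][i])
    log_det = 2.0 * sum(math.log(low[i][i]) for i in range(len(x)))
    return -0.5 * (len(x) * math.log(2.0 * math.pi) + log_det + sum(v * v for v in z))


def fa_parameter_count(d: int, k: int) -> int:
    """Covariance parameters of a factor analyser: d*k loadings plus d noise variances."""
    return d * k + d
```

For d = 39 and k = 3, that is 156 covariance parameters rather than 780 for a full covariance matrix.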
Semi-Tied Full-Covariance Matrices For Hidden Markov Models
, 1997
Abstract

Cited by 14 (3 self)
There is normally a simple choice made in the form of the covariance matrix to be used with HMMs. Either a diagonal covariance matrix is used, with the underlying assumption that elements of the feature vector are independent, or a full or block-diagonal matrix is used, where all or some of the correlations are explicitly modelled. Unfortunately, when using full or block-diagonal covariance matrices there tends to be a dramatic increase in the number of parameters per Gaussian component, limiting the number of components which may be robustly estimated. This paper introduces a new form of covariance matrix which allows a few "full" covariance matrices to be shared over many distributions, whilst each distribution maintains its own "diagonal" covariance matrix. In contrast to other schemes which have hypothesised a similar form, this technique fits within the standard maximum-likelihood criterion used for training HMMs. The new form of covariance matrix is evaluated on a large-vocabulary...
Combining Linguistic with Statistical Methods in Automatic Speech Understanding
, 1994
Abstract

Cited by 11 (0 self)
...this paper will argue, combining knowledge and techniques from the two communities can yield results that neither community alone could achieve.