Results 11 - 20 of 61
Monte Carlo Hidden Markov Models: Learning Non-Parametric Models of Partially Observable Stochastic Processes
- In Proc. of the International Conference on Machine Learning (ICML)
, 1999
Cited by 21 (6 self)
We present a learning algorithm for non-parametric hidden Markov models with continuous state and observation spaces. All necessary probability densities are approximated using samples, along with density trees generated from such samples. A Monte Carlo version of Baum-Welch (EM) is employed to learn models from data. Regularization during learning is achieved using an exponential shrinking technique. The shrinkage factor, which determines the effective capacity of the learning algorithm, is annealed down over multiple iterations of Baum-Welch, and early stopping is applied to select the right model. Once trained, Monte Carlo HMMs can be run in an any-time fashion. We prove that under mild assumptions, Monte Carlo Hidden Markov Models converge to a local maximum in likelihood space, just like conventional HMMs. In addition, we provide empirical results obtained in a gesture recognition domain. 1 Introduction Hidden Markov models (HMMs) [27] have been applied successfully to a large rang...
MAP Estimation of Continuous Density HMM: Theory and Applications
- In: Proceedings of DARPA Speech and Natural Language Workshop
, 1992
Cited by 20 (6 self)
We discuss maximum a posteriori estimation of continuous density hidden Markov models (CDHMM). The classical MLE reestimation algorithms, namely the forward-backward algorithm and the segmental k-means algorithm, are expanded and reestimation formulas are given for HMM with Gaussian mixture observation densities. Because of its adaptive nature, Bayesian learning serves as a unified approach for the following four speech recognition applications, namely parameter smoothing, speaker adaptation, speaker group modeling and corrective training. New experimental results on all four applications are provided to show the effectiveness of the MAP estimation approach. INTRODUCTION Estimation of hidden Markov model (HMM) is usually obtained by the method of maximum likelihood (ML) [1, 10, 6] assuming that the size of the training data is large enough to provide robust estimates. This paper investigates maximum a posteriori (MAP) estimate of continuous density hidden Markov models (CDHMM). The MAP ...
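As a rough illustration of why MAP estimation suits smoothing and adaptation, the sketch below covers only the simplest textbook case: a single Gaussian mean with known variance and a conjugate normal prior. It is not the paper's CDHMM reestimation formulas. The function name `map_gaussian_mean` and the prior weight `tau` are assumptions introduced here for illustration.

```python
def map_gaussian_mean(samples, prior_mean, tau):
    """MAP estimate of a Gaussian mean under a conjugate normal prior.

    Assumes the observation variance sigma^2 is known and the prior on the
    mean is N(prior_mean, sigma^2 / tau), so tau acts like a count of
    pseudo-observations located at prior_mean (hypothetical parameter name).
    """
    n = len(samples)
    if tau + n == 0:
        raise ValueError("need a positive prior weight or at least one sample")
    # Weighted interpolation between the prior mean and the sample mean.
    return (tau * prior_mean + sum(samples)) / (tau + n)


# With no adaptation data the estimate stays at the prior mean; as more data
# arrives it moves toward the maximum-likelihood (sample-mean) estimate.
no_data = map_gaussian_mean([], prior_mean=1.0, tau=2.0)
some_data = map_gaussian_mean([2.0, 4.0], prior_mean=0.0, tau=2.0)
ml_limit = map_gaussian_mean([2.0, 4.0], prior_mean=0.0, tau=0.0)
```

With little data the prior dominates, which is the behaviour one would want for speaker adaptation from a small sample; with `tau = 0` the estimate reduces to the ML sample mean.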
Using Self-Organizing Maps and Learning Vector Quantization for Mixture Density Hidden Markov Models
, 1997
Cited by 19 (8 self)
This work presents experiments to recognize pattern sequences using hidden Markov models (HMMs). The pattern sequences in the experiments are computed from speech signals and the recognition task is to decode the corresponding phoneme sequences. The training of the HMMs of the phonemes using the collected speech samples is a difficult task because of the natural variation in the speech. Two neural computing paradigms, the Self-Organizing Map (SOM) and the Learning Vector Quantization (LVQ) are used in the experiments to improve the recognition performance of the models. A HMM consists of sequential states which are trained to model the feature changes in the signal produced during the modeled process. The output densities applied in this work are mixtures of Gaussian density functions. SOMs are applied to initialize and train the mixtures to give a smooth and faithful presentation of the feature vector space defined by the corresponding training samples. The SOM maps similar feature vect...
Generalised linear Gaussian models
, 2001
Cited by 17 (7 self)
This paper addresses the time-series modelling of high dimensional data. Currently, the hidden Markov model (HMM) is the most popular and successful model especially in speech recognition. However, there are well known shortcomings in HMMs particularly in the modelling of the correlation between successive observation vectors; that is, inter-frame correlation. Standard diagonal covariance matrix HMMs also lack the modelling of the spatial correlation in the feature vectors; that is, intra-frame correlation. Several other time-series models have been proposed recently especially in the segment model framework to address the inter-frame correlation problem such as Gauss-Markov and dynamical system segment models. The lack of intra-frame correlation has been compensated for with transform schemes such as semi-tied full covariance matrices (STC). All these models can be regarded as belonging to the broad class of generalised linear Gaussian models. Linear Gaussian models (LGM) are popular as many forms may be trained efficiently using the expectation maximisation algorithm. In this paper, several LGMs and generalised LGMs are reviewed. The models can be roughly categorised into four combinations according to two different state evolution and two different observation processes. The state evolution process can be based on a discrete finite state machine such as in the HMMs or a linear first-order Gauss-Markov process such as in the traditional linear dynamical systems. The observation process can be represented as a factor analysis model or a linear discriminant analysis model. General HMMs and schemes proposed to improve their performance such as STC can be regarded as special cases in this framework.
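For orientation, one of the four combinations described above (linear first-order Gauss-Markov state evolution with a factor-analysis observation process) can be written in the generic textbook form below. The symbols are assumptions chosen here and need not match the paper's notation.

```latex
% Assumed notation: x_t is the continuous state vector, o_t the observation,
% A the state transition matrix, C the observation (loading) matrix.
\begin{align}
  x_{t+1} &= A x_t + w_t, & w_t &\sim \mathcal{N}(\mu_w, Q) \\
  o_t     &= C x_t + v_t, & v_t &\sim \mathcal{N}(\mu_v, R), \quad R \text{ diagonal}
\end{align}
```

In the combinations with discrete state evolution, the first equation is replaced by a finite state machine as in an HMM.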
On adaptive decision rules and decision parameter adaptation for automatic speech recognition
- Proc. IEEE
, 2000
Cited by 16 (3 self)
Recent advances in automatic speech recognition are accomplished by designing a plug-in maximum a posteriori decision rule such that the forms of the acoustic and language model distributions are specified and the parameters of the assumed distributions are estimated from a collection of speech and language training corpora. Maximum-likelihood point estimation is by far the most prevailing training method. However, due to the problems of unknown speech distributions, sparse training data, high spectral and temporal variabilities in speech, and possible mismatch between training and testing conditions, a dynamic training strategy is needed. To cope with the changing speakers and speaking conditions in real operational conditions for high-performance speech recognition, such paradigms incorporate a small amount of speaker and environment specific adaptation data into the training process. Bayesian adaptive learning is an optimal way to combine
Discriminative Training of Hidden Markov Models
, 1998
Cited by 14 (0 self)
Contents: Abbreviations; Notation; 1 Introduction; 2 Hidden Markov Models (2.1 Definition; 2.2 HMM Modelling Assumptions; 2.3 HMM Topology; 2.4 Finding the Best Transcription; 2.5 Setting the Parameters; 2.6 Summary); 3 Objective Functions (3.1 Properties of Maximum Likelihood Estimators; 3.2 Maximum Likelihood; 3.3 Maximum Mutual Information; 3.4 Frame Discrimination; ...)
An Adaptive Spatial Diversity Receiver For Non-Gaussian Interference And Noise
, 1997
Cited by 14 (7 self)
Standard linear diversity combining techniques are not effective in combating fading in the presence of non-Gaussian noise. An adaptive spatial diversity receiver is developed for wireless communication channels with slow, flat fading and additive non-Gaussian noise. The noise is modeled as a mixture of Gaussian distributions, and the expectation-maximization (EM) algorithm is used to derive estimates for the model parameters. The transmitted signals are detected using a likelihood ratio test based on the parameter estimates. The new adaptive receiver converges rapidly, its bit error rate performance is very close to optimum when relatively short training sequences are used, and it appears to be relatively insensitive to mismatch between the noise model and the actual noise distribution. Simulation results are included that illustrate various aspects of the adaptive receiver performance. Revision of Paper SP 9544 for IEEE Transactions on Signal Processing May 20, 1998 Ele...
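A minimal sketch of the noise-modelling step, under strong simplifying assumptions that are not taken from the paper: real scalar noise, exactly two zero-mean Gaussian components (a background term and an impulsive term), and noise-only samples available for fitting. The names `em_two_component_noise` and `simulate_noise` are hypothetical, and the receiver's fading estimation and likelihood ratio detector are not covered.

```python
import math
import random


def gaussian_pdf(x, var):
    """Density of a zero-mean Gaussian with variance var, evaluated at x."""
    return math.exp(-x * x / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def em_two_component_noise(samples, n_iter=200):
    """Fit (1 - eps) * N(0, var_bg) + eps * N(0, var_imp) to samples by EM."""
    n = len(samples)
    power = sum(x * x for x in samples) / n
    # Crude initial guess: equal weights, variances bracketing the sample power.
    eps, var_bg, var_imp = 0.5, 0.5 * power, 2.0 * power
    for _ in range(n_iter):
        # E-step: posterior probability that each sample is impulsive.
        resp = []
        for x in samples:
            p_bg = (1.0 - eps) * gaussian_pdf(x, var_bg)
            p_imp = eps * gaussian_pdf(x, var_imp)
            total = p_bg + p_imp
            resp.append(p_imp / total if total > 0.0 else 1.0)
        # M-step: re-estimate the weight and both variances.
        r_sum = sum(resp)
        eps = r_sum / n
        var_imp = sum(r * x * x for r, x in zip(resp, samples)) / r_sum
        var_bg = sum((1.0 - r) * x * x for r, x in zip(resp, samples)) / (n - r_sum)
    return eps, var_bg, var_imp


def simulate_noise(n, eps, var_bg, var_imp, seed=0):
    """Draw n samples from the two-component zero-mean mixture."""
    rng = random.Random(seed)
    return [
        rng.gauss(0.0, math.sqrt(var_imp if rng.random() < eps else var_bg))
        for _ in range(n)
    ]


noise = simulate_noise(3000, eps=0.1, var_bg=1.0, var_imp=100.0)
eps_hat, var_bg_hat, var_imp_hat = em_two_component_noise(noise)
```

The fitted mixture density could then be plugged into a likelihood ratio for detection, which is the role the abstract assigns to the parameter estimates.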
Large margin hidden Markov models for speech recognition
, 2005
Cited by 12 (1 self)
In this work, motivated by large margin classifiers in machine learning, we propose a novel method to estimate continuous density hidden Markov model (CDHMM) for speech recognition according to the principle of maximizing the minimum multi-class separation margin. The approach is named as large margin HMM. Firstly, we show this type of large margin HMM estimation problem can be formulated as a constrained minimax optimization problem. Secondly, by imposing different constraints to the minimax problem, we propose three solutions to the large margin HMM estimation problem, namely the iterative localized optimization method, the constrained joint optimization method and the semidefinite programming (SDP) method. These new training methods are evaluated in the isolated E-set recognition task using ISOLET database and the TIDIGITS connected digit string recognition task. Experimental results clearly show that the large margin HMMs consistently outperform the conventional HMM training methods. It has been consistently observed that the large margin training method yields significant recognition error rate reduction even on top of some popular discriminative training methods.
Linear Gaussian models for speech recognition
- CAMBRIDGE UNIVERSITY
, 2004
Cited by 10 (0 self)
Currently the most popular acoustic model for speech recognition is the hidden Markov model (HMM). However, HMMs are based on a series of assumptions some of which are known to be poor. In particular, the assumption that successive speech frames are conditionally independent given the discrete state that generated them is not a good assumption for speech recognition. State space models may be used to address some shortcomings of this assumption. State space models are based on a continuous state vector evolving through time according to a state evo-
Face Image Retrieval Using HMMs
, 1999
Cited by 10 (2 self)
This paper introduces a new face recognition system that can be used to index (and thus retrieve) images and videos of a database of faces. New face recognition approaches are needed because, although much progress has been made to identify faces taken from different viewpoints, we still cannot robustly identify faces under different illumination conditions, or when the facial expression changes, or when a part of the face is occluded on account of glasses or parts of clothing. When face recognition methods have worked in the past, it was only when all possible "image variations" were learned. Principal Components Analysis (PCA) and Fisher Discriminant Analysis (FDA) are well-known cases of such methods. In this paper we present a different approach to the indexing of face images. Our approach is based on identifying frontal faces and it allows reasonable variability in facial expressions, illumination conditions, and occlusions caused by eye-wear or items of clothing such as scarves. W...

