Hybrid training method for tied mixture density hidden Markov models using Learning Vector Quantization and Viterbi estimation
Cited by 6 (6 self)
In this work the output density functions of hidden Markov models are phoneme-wise tied mixture Gaussians. For training these tied mixture density HMMs, modified versions of Viterbi training and LVQ-based corrective tuning are described. The mean vectors of the mixture Gaussians are initialized by first composing small Self-Organizing Maps representing each phoneme and then combining them into a single large codebook to be trained by Learning Vector Quantization (LVQ). The experiments on the proposed training methods are carried out using a speech recognition system for Finnish phoneme sequences. Compared with the corresponding continuous density and semi-continuous HMMs in [9] and [8] with respect to the number of parameters, the recognition time, and the average error rate, the performance of the phoneme-wise tied mixture HMMs is superior. INTRODUCTION Hidden Markov models are widely used in automatic speech recognition as phoneme models to combine the modelin...
Online Bayesian tree-structured transformation of HMMs with optimal model selection for speaker adaptation
- IEEE Trans. Speech and Audio Proc
, 2001
Cited by 6 (2 self)
This paper presents a new recursive Bayesian learning approach for transformation parameter estimation in speaker adaptation. Our goal is to incrementally transform or adapt a set of hidden Markov model (HMM) parameters for a new speaker and gain a large performance improvement from a small amount of adaptation data. By constructing a clustering tree of HMM Gaussian mixture components, the linear regression (LR) or affine transformation parameters for HMM Gaussian mixture components are dynamically searched. An online Bayesian learning technique is proposed for recursive maximum a posteriori (MAP) estimation of LR and affine transformation parameters. This technique has the advantage of being able to accommodate flexible forms of transformation functions as well as a priori probability density functions (pdfs). To balance model complexity against goodness of fit to the adaptation data, a dynamic programming algorithm is developed for selecting models using a Bayesian variant of the "minimum description length" (MDL) principle. Speaker adaptation experiments with a 26-letter English alphabet vocabulary were conducted, and the results confirmed the effectiveness of the online learning framework. Index Terms: Affine transformation, Bayesian model selection, hidden Markov models (HMMs), linear regression (LR), model
A Comparison of Hidden Markov Model Features for the Recognition of Cursive Handwriting
, 1996
Cited by 4 (1 self)
Due to the difficulty of character segmentation in cursive handwriting recognition, much recent research has turned to segmentation-free approaches to word recognition. While techniques of feature extraction for presegmented characters have been thoroughly explored in the literature, evaluations of features for use with segmentation-during-recognition techniques remain sparse. The main purpose of this thesis is to provide a comparison of a number of feature extraction techniques applied to the domain of legal amount recognition in bank checks. An experimental system using Hidden Markov Models and a horizontally sliding window is described. Results are presented for the recognition of the entire legal field using a variety of features. Of the experiments presented here, the best results were obtained by concatenating the feature vectors from the present, previous, and next window...
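The window concatenation mentioned at the end of this abstract can be illustrated with a minimal sketch. It is not taken from the thesis: the function name `stack_context`, the ordering (previous, present, next), and the choice to repeat the edge window where a neighbour is missing are all assumptions made for illustration.

```python
def stack_context(frames):
    """Concatenate each window's feature vector with its neighbours.

    frames: list of equal-length feature vectors, one per sliding-window
    position, in left-to-right order.
    Assumption: at the first and last position the missing neighbour is
    replaced by the edge window itself.
    """
    if not frames:
        return []
    last = len(frames) - 1
    stacked = []
    for i, current in enumerate(frames):
        previous = frames[max(i - 1, 0)]      # clamp at the left edge
        following = frames[min(i + 1, last)]  # clamp at the right edge
        stacked.append(list(previous) + list(current) + list(following))
    return stacked


# Three window positions with two hypothetical features each.
windows = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
stacked = stack_context(windows)
```

Each stacked vector is three times as long as a single window's vector, so an HMM observation at one position also carries information about its immediate left and right context.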
A Robust Loose Coupling for Speech Recognition and Natural Language Understanding
- IEEE, Bob O'Hara and Al
, 1995
Cited by 4 (0 self)
The focus of this thesis proposal is to improve the ability of a computational system to understand spoken utterances in a dialogue with a human. Available computational methods for word recognition do not perform as well on spontaneous speech as we would hope. Even a state-of-the-art recognizer achieves slightly worse than 70% word accuracy on (nearly) spontaneous speech in a conversation about a specific problem. To address this problem, I will explore novel methods for post-processing the output of a speech recognizer in order to correct errors. I adopt statistical techniques for modeling the noisy channel from the speaker to the listener in order to correct some of the errors introduced there. The statistical model accounts for frequent errors such as simple word/word confusions and short phrasal problems (one-to-many word substitutions and many-to-one word concatenations). To use the model, a search algorithm is required to find the most likely correction of a given word sequence ...
Segmental LVQ3 training for phoneme-wise tied mixture density HMMs
- In European Signal Processing Conference
, 1996
Cited by 4 (4 self)
This work presents training methods and recognition experiments for phoneme-wise tied mixture densities in hidden Markov models (HMM). The system trains speaker-dependent, but vocabulary-independent, phoneme models for the recognition of Finnish words. The Learning Vector Quantization (LVQ) methods are applied to increase the discrimination between the phoneme models. A segmental LVQ3 training is proposed to replace the LVQ2-based corrective tuning as a parameter estimation method. The experiments indicate that the new method can provide comparable recognition accuracy, but with less training and more robustness to the initial models. Experiments to upscale the current system by introducing context vectors and larger mixture pools show up to 40% reduction of recognition errors compared to the earlier results in [10].
Training Mixture Density HMMs with SOM and LVQ
, 1997
Cited by 4 (2 self)
The objective of this paper is to present experiments and discussions of how some neural network algorithms can help phoneme recognition with mixture density hidden Markov models (MDHMMs). In MDHMMs the modeling of the stochastic observation processes associated with the states is based on the estimation of the probability density function of the short-time observations in each state as a mixture of Gaussian densities. The Learning Vector Quantization (LVQ) is used to increase the discrimination between different phoneme models both during the initialization of the Gaussian codebooks and during the actual MDHMM training. The Self-Organizing Map (SOM) is applied to provide a suitably smoothed mapping of the training vectors to accelerate the convergence of the actual training. The obtained codebook topology can also be exploited in the recognition phase to speed up the calculations to approximate the observation probabilities. The experiments with LVQ and SOMs show reductions both...
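How LVQ can sharpen the discrimination between class-labelled codebook vectors can be seen in a minimal sketch of the basic LVQ1 update. This is only an illustration of the general idea: the papers listed here use LVQ2- and LVQ3-based variants tied to HMM training, and the function name `lvq1_step`, the learning rate, and the toy phoneme labels are assumptions.

```python
def lvq1_step(codebook, labels, x, x_label, rate=0.1):
    """Apply one basic LVQ1 update in place and return the winner's index.

    codebook: list of codebook vectors (lists of floats), e.g. Gaussian means.
    labels:   class label (e.g. phoneme) attached to each codebook vector.
    x:        training vector; x_label: its correct class.
    """
    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    # Find the codebook vector closest to the training vector.
    winner = min(range(len(codebook)), key=lambda k: sq_dist(codebook[k], x))

    # Move the winner toward x if its class is correct, away from x otherwise.
    sign = 1.0 if labels[winner] == x_label else -1.0
    codebook[winner] = [m + sign * rate * (xi - m)
                        for m, xi in zip(codebook[winner], x)]
    return winner


# Two hypothetical codebook vectors labelled with phonemes "a" and "i".
codebook = [[0.0, 0.0], [1.0, 1.0]]
labels = ["a", "i"]
winner = lvq1_step(codebook, labels, [0.2, 0.0], "a", rate=0.5)
```

In this toy call the nearest vector already carries the correct label, so it is pulled halfway toward the training vector; a mislabelled winner would be pushed away by the same amount.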
Hidden Model Sequence Models for Automatic Speech Recognition
, 2001
Cited by 4 (0 self)
Most modern automatic speech recognition systems make use of acoustic models based on hidden Markov models. To obtain reasonable recognition performance within a large vocabulary framework, the acoustic models usually include a pronunciation model, together with complex parameter tying schemes. In many cases the pronunciation model operates on a phoneme level and is derived independently of the underlying models. In contrast, this work is aimed at improving pronunciation modelling on a sub-phone level in a combined framework. The modelling of pronunciation variation is assumed to be of special importance for recognition of spontaneous speech.
Tied Posteriors: An Approach for Effective Introduction of Context . . .
, 2000
Cited by 3 (2 self)
This paper presents a method to improve the recognition rate of hybrid connectionist/HMM speech recognition systems. At the same time this approach allows the easy introduction of context-dependent models in the hybrid framework. The approach is based on a standard hybrid connectionist/HMM recognizer, in which the neural nets are trained to estimate the a posteriori probabilities for all phones in each input frame. In the approach presented here, the probabilities of the neural nets are used to replace the codebook of a tied-mixture HMM system. Therefore the resulting system is called tied posterior. The advantages of this structure are that an arbitrary HMM topology can be used, and that all context dependency and all clustering techniques used in tied-mixture systems can be applied to this hybrid speech recognition system. The approach has been evaluated on the Wall Street Journal (WSJ) database, with the result that it outperforms the standard hybrid approach on this task.
Towards A Compact Speech Recognizer: Subspace Distribution Clustering Hidden Markov Model
, 1998
Cited by 2 (1 self)
1 Introduction
1.1 The Problem: Too Many Parameters
1.2 Proposed Solution: It Is Time to Share More!
1.3 Thesis Summary and Outline
2 Review of Acoustic Modeling Using Hidden Markov Model
2.1 Speech Characteristics
2.2 Selection of Input Speech Space and Speech Model
2.2.1 Cepstral Input
2.2.2 Hidden Markov Model
2.2.3 Our Choice of HMM for Acoustic Modeling
2.3 Speech Unit to Model ...
Hierarchical Unsupervised Learning of Event Categories
, 2001
Cited by 2 (0 self)
We consider the problem of unsupervised classification of temporal sequences of events in video. This problem arises in the design of an adaptive visual agent, which must be capable of identifying appropriate classes of visual events to effectively complete its tasks. We present a multilevel dynamic Bayesian network that learns the high-level dynamics of visual events simultaneously with models of the events themselves. We show how the parameters of the model can be learned in a scalable and efficient way. We present preliminary results using real video data and a class of simulated dynamic event models. The results show that our model correctly classifies the input data at rates comparable to a standard event classification approach, while also learning the high-level semantic model parameters. 1. Introduction The goals of this paper are twofold. First, we wish to show that it is feasible to simultaneously learn classes of events from video input and the high-level temporal structur...

