The HTK Hidden Markov Model Toolkit: Design and Philosophy
- Entropic Cambridge Research Laboratory, Ltd, 1994
Cited by 41 (0 self)
ion. However, they are not actually abstract data types. Far from it, all HTK data types are very concrete. The full definition of each type is visible outside of the module that defines it and the program which uses that type is free to manipulate its innards. Thus, from a software engineering perspective, the construction of HTK is unsafe since it is all too easy for an external agent to corrupt the internal operation of a module. Furthermore, it is necessary for an external agent to have a detailed understanding of each library module data type in order to use it effectively. Again, the HMMDef type provides a good example since this type represents a large hierarchical structure which HTK tools need to traverse and manipulate. To do this they have to access and manipulate the structure directly and since it is complex, this kind of operation will be prone to error. There are several reasons why HTK has been constructed like this. Firstly, and perhaps most importantly, it is very har...
High Performance Speaker-Independent Phone Recognition Using CDHMM
- In Proc. Eurospeech, 1993
Cited by 41 (11 self)
In this paper we report high phone accuracies on three corpora: WSJ0, BREF and TIMIT. The main characteristics of the phone recognizer are: high dimensional feature vector (48), context- and gender-dependent phone models with duration distribution, continuous density HMM with Gaussian mixtures, and n-gram probabilities for the phonotactic constraints. These models are trained on speech data that have either phonetic or orthographic transcriptions using maximum likelihood and maximum a posteriori estimation techniques. On the WSJ0 corpus with a 46 phone set we obtain phone accuracies of 72.4% and 74.4% using 500 and 1600 CD phone units, respectively. Accuracy on BREF with 35 phones is as high as 78.7% with only 428 CD phone units. On TIMIT using the 61 phone symbols and only 500 CD phone units, we obtain a phone accuracy of 67.2% which corresponds to 73.4% when the recognizer output is mapped to the commonly used 39 phone set. Making reference to our work on large vocabulary CSR, we show that ...
A Wearable Computer Based American Sign Language Recognizer
- 1997
Cited by 38 (0 self)
Modern wearable computer designs package workstation level performance in systems small enough to be worn as clothing. These machines enable technology to be brought where it is needed the most for the handicapped: everyday mobile environments. This paper describes a research effort to make a wearable computer that can recognize (with the possible goal of translating) sentence level American Sign Language (ASL) using only a baseball cap mounted camera for input. Current accuracy exceeds 97% per word on a 40 word lexicon.
Speaker-Independent Continuous Speech Dictation
- SPEECH COMMUNICATION, 1994
Cited by 26 (12 self)
In this paper we report on progress made at LIMSI in speaker-independent large vocabulary speech dictation using newspaper-based speech corpora in English and French. The recognizer makes use of continuous density HMMs with Gaussian mixtures for acoustic modeling and n-gram statistics estimated on newspaper texts for language modeling. Acoustic modeling uses cepstrum-based features, context-dependent phone models (intra and interword), phone duration models, and sex-dependent models. For English the ARPA Wall Street Journal-based CSR corpus is used and for French the BREF corpus containing recordings of texts from the French newspaper Le Monde is used. Experiments were carried out with both these corpora at the phone level and at the word level with vocabularies containing up to 20,000 words. Word recognition experiments are also described for the ARPA RM task which has been widely used to evaluate and compare systems.
Learning Models for Robot Navigation
- 1998
Cited by 26 (2 self)
Hidden Markov models (hmms) and partially observable Markov decision processes (pomdps) provide a useful tool for modeling dynamical systems. They are particularly useful for representing environments such as road networks and office buildings, which are typical for robot navigation and planning. The work presented here describes a formal framework for incorporating readily available odometric information into both the models and the algorithm that learns them. By taking advantage of such information, learning hmms/pomdps can be made better and require fewer iterations, while being robust in the face of data reduction. That is, the performance of our algorithm does not significantly deteriorate as the training sequences provided to it become significantly shorter. Formal proofs for the convergence of the algorithm to a local maximum of the likelihood function are provided. Experimental results, obtained from both simulated and real robot data, demonstrate the effectiveness of the approach....
Bayesian Adaptive Learning of the Parameters of Hidden Markov Model for Speech Recognition
Cited by 21 (3 self)
In this paper a theoretical framework for Bayesian adaptive learning of discrete HMMs and semi-continuous HMMs with Gaussian mixture state observation densities is presented. Corresponding to the well-known Baum-Welch and segmental k-means algorithms respectively for HMM training, formulations of MAP (maximum a posteriori) and segmental MAP estimation of HMM parameters are developed. Furthermore, a computationally efficient method of segmental quasi-Bayes estimation for semi-continuous HMMs is also presented. The important issue of prior density estimation is discussed and a simplified method of moment estimate is given. The method proposed in this paper will be applicable to some problems in HMM training for speech recognition such as sequential or batch training, model adaptation, and parameter smoothing.
Monte Carlo Hidden Markov Models: Learning Non-Parametric Models of Partially Observable Stochastic Processes
- In Proc. of the International Conference on Machine Learning (ICML), 1999
Cited by 21 (6 self)
We present a learning algorithm for non-parametric hidden Markov models with continuous state and observation spaces. All necessary probability densities are approximated using samples, along with density trees generated from such samples. A Monte Carlo version of Baum-Welch (EM) is employed to learn models from data. Regularization during learning is achieved using an exponential shrinking technique. The shrinkage factor, which determines the effective capacity of the learning algorithm, is annealed down over multiple iterations of Baum-Welch, and early stopping is applied to select the right model. Once trained, Monte Carlo HMMs can be run in an any-time fashion. We prove that under mild assumptions, Monte Carlo Hidden Markov Models converge to a local maximum in likelihood space, just like conventional HMMs. In addition, we provide empirical results obtained in a gesture recognition domain. 1 Introduction Hidden Markov models (HMMs) [27] have been applied successfully to a large rang...
MAP Estimation of Continuous Density HMM: Theory and Applications
- In: Proceedings of DARPA Speech and Natural Language Workshop, 1992
Cited by 20 (6 self)
We discuss maximum a posteriori estimation of continuous density hidden Markov models (CDHMM). The classical MLE reestimation algorithms, namely the forward-backward algorithm and the segmental k-means algorithm, are expanded and reestimation formulas are given for HMM with Gaussian mixture observation densities. Because of its adaptive nature, Bayesian learning serves as a unified approach for the following four speech recognition applications, namely parameter smoothing, speaker adaptation, speaker group modeling and corrective training. New experimental results on all four applications are provided to show the effectiveness of the MAP estimation approach. INTRODUCTION Estimation of hidden Markov model (HMM) is usually obtained by the method of maximum likelihood (ML) [1, 10, 6] assuming that the size of the training data is large enough to provide robust estimates. This paper investigates maximum a posteriori (MAP) estimate of continuous density hidden Markov models (CDHMM). The MAP ...
Using Self-Organizing Maps and Learning Vector Quantization for Mixture Density Hidden Markov Models
- 1997
Cited by 19 (8 self)
This work presents experiments to recognize pattern sequences using hidden Markov models (HMMs). The pattern sequences in the experiments are computed from speech signals and the recognition task is to decode the corresponding phoneme sequences. The training of the HMMs of the phonemes using the collected speech samples is a difficult task because of the natural variation in the speech. Two neural computing paradigms, the Self-Organizing Map (SOM) and the Learning Vector Quantization (LVQ), are used in the experiments to improve the recognition performance of the models. An HMM consists of sequential states which are trained to model the feature changes in the signal produced during the modeled process. The output densities applied in this work are mixtures of Gaussian density functions. SOMs are applied to initialize and train the mixtures to give a smooth and faithful presentation of the feature vector space defined by the corresponding training samples. The SOM maps similar feature vect...
Cross-Lingual Experiments with Phone Recognition
- Proc. IEEE ICASSP-93
Cited by 15 (9 self)
This paper presents some of the recent research on speaker-independent continuous phone recognition for both French and English. The phone accuracy is assessed on the BREF corpus for French, and on the Wall Street Journal and TIMIT corpora for English. Cross-language differences concerning language properties are presented. It was found that French is easier to recognize at the phone level (the phone error for BREF is 23.6% vs. 30.1% for WSJ), but harder to recognize at the lexical level due to the larger number of homophones. Experiments with signal analysis indicate that a 4kHz signal bandwidth is sufficient for French, whereas 8kHz is needed for English. Phone recognition is a powerful technique for language, sex, and speaker identification. With 2s of speech, the language can be identified with better than 99% accuracy. Sex-identification for BREF and WSJ is error-free. Speaker identification accuracies of 98.2% on TIMIT (462 speakers) and 99.1% on BREF (57 speakers), were obtained w...

