Results 1–10 of 15
Shared-Distribution Hidden Markov Models for Speech Recognition
, 1991
"... Parameter sharing plays an important role in statistical modeling since training data are usually limited. On the one hand, we would like to use models that are as detailed as possible. On the other hand, with models too detailed, we can no longer reliably estimate the parameters. Triphone generaliz ..."
Abstract

Cited by 275 (7 self)
Parameter sharing plays an important role in statistical modeling since training data are usually limited. On the one hand, we would like to use models that are as detailed as possible. On the other hand, if the models are too detailed, we can no longer reliably estimate the parameters. Triphone generalization may force two models to be merged when only parts of their output distributions are similar, while the rest of the output distributions are different. This problem can be avoided if clustering is carried out at the distribution level. In this paper, a shared-distribution model is proposed to replace generalized triphone models for speaker-independent continuous speech recognition. Here, output distributions in the hidden Markov model are shared with each other if they exhibit acoustic similarity. In addition to detailed representation, this also gives us the freedom to use a large number of states for each phonetic model. Although an increase in the number of states will inc...
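The distribution-level clustering this abstract argues for can be illustrated with a toy sketch: greedily merge the pair of output distributions that are most acoustically similar until a target number of shared distributions remains. The symmetric-KL distance, the moment-matched merge rule, and the stopping criterion below are illustrative assumptions, not the paper's actual procedure, and the distributions are simplified to one-dimensional Gaussians.

```python
# Toy sketch of distribution-level tying: agglomeratively merge the most
# similar output distributions (1-D Gaussians here, a simplification) until
# a target number of shared distributions remains.
import math

def sym_kl(p, q):
    """Symmetric KL divergence between two 1-D Gaussians (mean, var)."""
    kl = lambda a, b: 0.5 * (a[1] / b[1] + (b[0] - a[0]) ** 2 / b[1]
                             - 1 + math.log(b[1] / a[1]))
    return kl(p, q) + kl(q, p)

def tie_distributions(dists, n_shared):
    """Greedily merge the closest pair until n_shared distributions remain."""
    dists = list(dists)
    while len(dists) > n_shared:
        i, j = min(((a, b) for a in range(len(dists))
                    for b in range(a + 1, len(dists))),
                   key=lambda ab: sym_kl(dists[ab[0]], dists[ab[1]]))
        (m1, v1), (m2, v2) = dists[i], dists[j]
        # moment-matched merge assuming equal occupancy weights
        m = 0.5 * (m1 + m2)
        v = 0.5 * (v1 + v2) + 0.25 * (m1 - m2) ** 2
        dists = [d for k, d in enumerate(dists) if k not in (i, j)] + [(m, v)]
    return dists

shared = tie_distributions([(0.0, 1.0), (0.1, 1.0), (5.0, 2.0), (5.2, 2.1)], 2)
print(len(shared))  # the two acoustically similar pairs have been tied
```

The merged variance includes the between-mean term, so a merge of dissimilar distributions is penalized on the next distance computation.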
Signal modeling techniques in speech recognition
 PROCEEDINGS OF THE IEEE
, 1993
"... We have seen three important trends develop in the last five years in speech recognition. First, heterogeneous parameter sets that mix absolute spectral information with dynamic, or timederivative, spectral information, have become common. Second, similariry transform techniques, often used to norm ..."
Abstract

Cited by 126 (5 self)
We have seen three important trends develop in the last five years in speech recognition. First, heterogeneous parameter sets that mix absolute spectral information with dynamic, or time-derivative, spectral information have become common. Second, similarity transform techniques, often used to normalize and decorrelate parameters in some computationally inexpensive way, have become popular. Third, the signal parameter estimation problem has merged with the speech recognition process so that more sophisticated statistical models of the signal's spectrum can be estimated in a closed-loop manner. In this paper, we review the signal processing components of these algorithms. These algorithms are presented as part of a unified view of the signal parameterization problem in which there are three major tasks: measurement, transformation, and statistical modeling. This paper is by no means a comprehensive survey of all possible techniques of signal modeling in speech recognition. There are far too many algorithms in use today to make an exhaustive survey feasible (and cohesive). Instead, this paper is meant to serve as a tutorial on signal processing in state-of-the-art speech recognition systems and to review those techniques most commonly used. In keeping with this goal, a complete mathematical description of each algorithm has been included in the paper.
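The "dynamic, or time-derivative" parameters mentioned in the first trend are commonly computed as a regression over a short window of static feature vectors. The sketch below uses a standard delta-regression formula with edge clamping; the window length and weights are common choices, not taken from this survey.

```python
# Sketch of appending dynamic (delta) parameters to static feature vectors:
# each delta is a least-squares slope estimate over a +/-K frame window.
def delta_features(frames, K=2):
    """frames: list of equal-length feature vectors; returns per-frame deltas."""
    T, D = len(frames), len(frames[0])
    denom = 2 * sum(k * k for k in range(1, K + 1))
    deltas = []
    for t in range(T):
        d = []
        for i in range(D):
            # clamp indices at the utterance edges
            num = sum(k * (frames[min(t + k, T - 1)][i]
                           - frames[max(t - k, 0)][i])
                      for k in range(1, K + 1))
            d.append(num / denom)
        deltas.append(d)
    return deltas

static = [[0.0], [1.0], [2.0], [3.0], [4.0]]
print(delta_features(static)[2])  # interior frame of a linear ramp: slope 1.0
```

A heterogeneous parameter set is then simply the concatenation of each static vector with its delta (and often delta-delta) vector.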
Genones: Generalized Mixture Tying in Continuous Hidden Markov Model-Based Speech Recognizers
 IEEE Transactions on Speech and Audio Processing
, 1996
"... An algorithm is proposed that achieves a good tradeoff between modeling resolution and robustness by using a new, general scheme for tying of mixture components in continuous mixturedensity hidden Markov model (HMM)based speech recognizers. The sets of HMM states that share the same mixture co ..."
Abstract

Cited by 41 (7 self)
An algorithm is proposed that achieves a good tradeoff between modeling resolution and robustness by using a new, general scheme for tying of mixture components in continuous mixture-density hidden Markov model (HMM) based speech recognizers. The sets of HMM states that share the same mixture components are determined automatically using agglomerative clustering techniques. Experimental results on ARPA's Wall Street Journal corpus show that this scheme reduces errors by 25% over typical tied-mixture systems. New fast algorithms for computing Gaussian likelihoods (the most time-consuming aspect of continuous-density HMM systems) are also presented. These new algorithms significantly reduce the number of Gaussian densities that are evaluated, with little or no impact on speech recognition accuracy. Corresponding Author: Vassilios Digalakis. Address: Electronic and Computer Engineering Department, Technical University of Crete, Kounoupidiana, Chania, 73100, Greece. Phone: +30821...
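One generic way to reduce the number of Gaussians evaluated per frame, in the spirit of the fast algorithms this abstract mentions, is a shortlist: pre-assign each Gaussian to its nearest entry in a small vector-quantization codebook, then at run time evaluate only the Gaussians listed under the observation's nearest codeword. This is a hedged sketch of that general idea, not the paper's exact algorithm; the codebook and Gaussians are toy values.

```python
# Shortlist-style fast Gaussian evaluation: a small VQ codebook routes each
# observation to a subset of Gaussians, skipping the rest.
import math

def log_gauss(x, mean, var):
    """Log density of a diagonal-covariance Gaussian."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))

def build_shortlists(gaussians, codewords):
    """Assign every Gaussian (by its mean) to its nearest codeword."""
    lists = {i: [] for i in range(len(codewords))}
    for g, (mean, _var) in enumerate(gaussians):
        i = min(range(len(codewords)),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(mean, codewords[c])))
        lists[i].append(g)
    return lists

def fast_max_loglik(x, gaussians, codewords, lists):
    """Evaluate only the Gaussians shortlisted under x's nearest codeword."""
    c = min(range(len(codewords)),
            key=lambda i: sum((a - b) ** 2 for a, b in zip(x, codewords[i])))
    return max(log_gauss(x, *gaussians[g]) for g in lists[c])

gaussians = [((-5.0,), (1.0,)), ((0.0,), (1.0,)), ((5.0,), (1.0,))]
codewords = [(-5.0,), (0.0,), (5.0,)]
lists = build_shortlists(gaussians, codewords)
print(round(fast_max_loglik((0.2,), gaussians, codewords, lists), 3))
```

Only one of the three Gaussians is evaluated for this observation; the trade-off is a small risk of missing the true best component when clusters overlap.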
Using Self-Organizing Maps and Learning Vector Quantization for Mixture Density Hidden Markov Models
, 1997
"... This work presents experiments to recognize pattern sequences using hidden Markov models (HMMs). The pattern sequences in the experiments are computed from speech signals and the recognition task is to decode the corresponding phoneme sequences. The training of the HMMs of the phonemes using the col ..."
Abstract

Cited by 20 (8 self)
This work presents experiments to recognize pattern sequences using hidden Markov models (HMMs). The pattern sequences in the experiments are computed from speech signals, and the recognition task is to decode the corresponding phoneme sequences. The training of the HMMs of the phonemes using the collected speech samples is a difficult task because of the natural variation in the speech. Two neural computing paradigms, the Self-Organizing Map (SOM) and the Learning Vector Quantization (LVQ), are used in the experiments to improve the recognition performance of the models. An HMM consists of sequential states which are trained to model the feature changes in the signal produced during the modeled process. The output densities applied in this work are mixtures of Gaussian density functions. SOMs are applied to initialize and train the mixtures to give a smooth and faithful presentation of the feature vector space defined by the corresponding training samples. The SOM maps similar feature vect...
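The SOM-based initialization the abstract describes can be sketched minimally: map units act as candidate Gaussian means, and each training vector pulls the winning unit and its map neighbors toward it, so the resulting codebook is smooth and ordered. The one-dimensional map, learning-rate schedule, and neighborhood radius below are arbitrary toy choices, not the paper's settings.

```python
# Minimal 1-D Self-Organizing Map: units become smooth, ordered candidates
# for Gaussian mixture means.
import random

def train_som(data, n_units=4, epochs=50, lr=0.3, radius=1):
    random.seed(0)  # reproducible toy run
    units = [random.uniform(min(data), max(data)) for _ in range(n_units)]
    for e in range(epochs):
        for x in data:
            w = min(range(n_units), key=lambda i: (x - units[i]) ** 2)  # winner
            for i in range(n_units):
                if abs(i - w) <= radius:  # winner and its map neighbors move
                    units[i] += lr * (1 - e / epochs) * (x - units[i])
    return sorted(units)

data = [0.0, 0.1, -0.1, 5.0, 5.1, 4.9]  # two clusters of "feature vectors"
means = train_som(data)
print(means)  # units spread over both clusters
```

The decaying learning rate lets units converge; the mixture weights and variances would then be estimated from the vectors each unit attracts.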
Improved acoustic modeling for continuous speech recognition
 Proc. DARPA Speech and Natural Language Workshop
, 1990
"... We report on some recent improvements to an HMMbased, continuous speech recognition system which is being developed at AT&T Bell Laboratories. These advances, which include the incorporation of interword, contextdependent units and an improved feature analysis, lead to a recognition system which a ..."
Abstract

Cited by 10 (5 self)
We report on some recent improvements to an HMM-based, continuous speech recognition system which is being developed at AT&T Bell Laboratories. These advances, which include the incorporation of inter-word, context-dependent units and an improved feature analysis, lead to a recognition system which achieves better than 95% word accuracy for speaker-independent recognition of the 1000-word DARPA resource management task using the standard word-pair grammar (with a perplexity of about 60). It will be shown that the incorporation of inter-word units into training results in better acoustic models of word-juncture coarticulation and gives a 20% reduction in error rate. The effect of an improved set of spectral and log energy features is to further reduce the word error rate by about 30%. We also found that the spectral vectors corresponding to the same speech unit behave differently statistically, depending on whether they are at word boundaries or within a word. The results suggest that intra-word and inter-word units should be modeled independently, even when they appear in the same context. Using a set of subword units which included variants for intra-word and inter-word, context-dependent phones, an additional decrease of about 10% in word error rate resulted.
A New Approach To Generalized Mixture Tying For Continuous HMM-Based Speech Recognition
 Proc. EUROSPEECH, Rhodes
, 1997
"... In this paper we present a new approach for a generalized tying of mixture components for continuous mixturedensity HMMbased speech recognition systems. With an iterative pruning and splitting procedure for the mixture components, this approach offers a very accurate and detailed representation of ..."
Abstract

Cited by 4 (3 self)
In this paper we present a new approach to generalized tying of mixture components for continuous mixture-density HMM-based speech recognition systems. With an iterative pruning and splitting procedure for the mixture components, this approach offers a very accurate and detailed representation of the acoustic space while at the same time keeping the number of parameters reasonably small, in favor of robust parameter estimation and fast decoding. Contrary to other approaches, it does not require a strict clustering of the pdfs into subsets that share their mixture components, so it is capable of providing more general and flexible types of mixture tying. We applied the new approach to a semi-continuous HMM (SCHMM) system for the Resource Management task, improved its recognition performance by 12%, and vastly accelerated the decoding because of a much faster likelihood computation.
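A weight-driven prune-and-split pass of the general kind this abstract outlines can be sketched in a few lines: drop components whose mixture weight fell below a floor, then split the heaviest components (nudging the means apart) to restore the component count. The floor, the perturbation, and the equal-weight split are illustrative assumptions, not the paper's criteria.

```python
# Toy prune-and-split pass over a 1-D Gaussian mixture, components given as
# (weight, mean, var) tuples. Component count is preserved.
def prune_and_split(components, floor=0.05, eps=0.1):
    kept = [c for c in components if c[0] >= floor]   # prune weak components
    pruned = len(components) - len(kept)
    for _ in range(pruned):                           # split heaviest to refill
        kept.sort(key=lambda c: -c[0])
        w, m, v = kept[0]
        # replace heaviest with two half-weight copies, means nudged apart
        kept[0] = (w / 2, m - eps * v ** 0.5, v)
        kept.append((w / 2, m + eps * v ** 0.5, v))
    total = sum(w for w, _, _ in kept)
    return [(w / total, m, v) for w, m, v in kept]    # renormalize weights

mix = [(0.6, 0.0, 1.0), (0.38, 2.0, 1.0), (0.02, 9.0, 1.0)]
new = prune_and_split(mix)
print(len(new), round(sum(w for w, _, _ in new), 6))
```

In a real system each pass would be followed by re-estimation, and the split/prune decisions would use occupancy statistics rather than raw weights.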
Training Mixture Density HMMs with SOM and LVQ
, 1997
"... ¯ The objective of this paper is to present experiments and discussions of how some neural network algorithms can help the phoneme recognition with mixture density hidden Markov models (MDHMMs). In MDHMMs the modeling of the stochastic observation processes associated with the states is based on the ..."
Abstract

Cited by 4 (2 self)
The objective of this paper is to present experiments and discussions of how some neural network algorithms can help phoneme recognition with mixture density hidden Markov models (MDHMMs). In MDHMMs the modeling of the stochastic observation processes associated with the states is based on the estimation of the probability density function of the short-time observations in each state as a mixture of Gaussian densities. The Learning Vector Quantization (LVQ) is used to increase the discrimination between different phoneme models, both during the initialization of the Gaussian codebooks and during the actual MDHMM training. The Self-Organizing Map (SOM) is applied to provide a suitably smoothed mapping of the training vectors to accelerate the convergence of the actual training. The obtained codebook topology can also be exploited in the recognition phase to speed up the calculations that approximate the observation probabilities. The experiments with LVQ and SOMs show reductions both...
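The discriminative codebook tuning the abstract attributes to LVQ can be illustrated with the basic LVQ1 rule: the winning codebook vector moves toward a training vector of its own class and away from one of another class. The scalar features, class labels, and learning rate below are toy assumptions.

```python
# Basic LVQ1 update on a scalar codebook with one prototype per class.
def lvq1_step(codebook, labels, x, x_label, lr=0.1):
    w = min(range(len(codebook)), key=lambda i: (x - codebook[i]) ** 2)  # winner
    sign = 1.0 if labels[w] == x_label else -1.0  # attract own class, repel others
    codebook[w] += sign * lr * (x - codebook[w])
    return codebook

codebook = [0.0, 5.0]   # one prototype per phoneme class
labels = ["a", "b"]
lvq1_step(codebook, labels, 0.5, "a")  # correct winner: pulled toward 0.5
lvq1_step(codebook, labels, 4.0, "a")  # wrong winner "b": pushed away from 4.0
print(codebook)
```

Repelling the wrong-class winner sharpens the decision boundary between phoneme models, which is the discrimination gain the abstract refers to.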
Acoustic modeling of subword units for large vocabulary speaker independent speech recognition
 In Proceedings. DARPA Speech and Natural Language Workshop
, 1989
"... The field of large vocabulary, continuous speech recognition has advanced to the point where there are several systems capable of attaining between 90 and 95 % word accuracy for speaker independent recognition of a 1000 word vocabulary, spoken fluently for a task with a perplexity (average word bran ..."
Abstract

Cited by 4 (0 self)
The field of large vocabulary, continuous speech recognition has advanced to the point where there are several systems capable of attaining between 90% and 95% word accuracy for speaker-independent recognition of a 1000-word vocabulary, spoken fluently, for a task with a perplexity (average word branching factor) of about 60. There are several factors which account for the high performance achieved by these systems, including the use of hidden Markov models (HMMs) for acoustic modeling, the use of context-dependent subword units, the representation of between-word phonemic variation, and the use of corrective training techniques to emphasize differences between acoustically similar words in the vocabulary. In this paper we describe one of the large vocabulary speech recognition systems being developed at AT&T Bell Laboratories, and discuss the methods used to provide high word recognition accuracy. In particular, we focus on the techniques used to obtain acoustic models of the subword units (both context-independent and context-dependent units), and discuss the resulting system performance as a function of the type of acoustic modeling used.
Fuzzy Approaches to Speech . . .
, 2000
"... Statistical pattern recognition is the most successful approach to automatic speech and speaker recognition (ASASR). Of all the statistical pattern recognition techniques, the hidden Markov model (HMM) is the most important. The Gaussian mixture model (GMM) and vector quantisation (VQ) are also eff ..."
Abstract
Statistical pattern recognition is the most successful approach to automatic speech and speaker recognition (ASASR). Of all the statistical pattern recognition techniques, the hidden Markov model (HMM) is the most important. The Gaussian mixture model (GMM) and vector quantisation (VQ) are also effective techniques, especially for speaker recognition and, in conjunction with HMMs, for speech recognition. However, the performance of these techniques degrades rapidly in the context of insufficient training data and in the presence of noise or distortion. Fuzzy approaches with their adjustable parameters can reduce such degradation. Fuzzy set theory is one of the most successful approaches in pattern recognition, where, based on the idea of a fuzzy membership function, fuzzy C-means (FCM) clustering and noise clustering (NC) are the most important techniques. To establish fuzzy approaches to ASASR, the following basic problems are solved. First, a time-dependent fuzzy membership function is defined for the HMM. Second, a general distance is proposed to obtain a relationship between modelling and clustering techniques. Third, fuzzy entropy (FE) clustering is proposed to relate fuzzy models to statistical models. Finally, fuzzy membership functions are proposed as discriminant functions in decision making. The following models are proposed: 1) the FE-HMM, NC-FE-HMM, FE-GMM, NC-FE-GMM, FE-VQ and NC-FE-VQ in the FE approach; 2) the FCM-HMM, NC-FCM-HMM, FCM-GMM and NC-FCM-GMM in the FCM approach; and 3) the hard HMM and GMM as the special models of both FE and FCM approaches. Finally, a fuzzy approach to speaker verification and a further extension using possibility theory are also proposed. The evaluation experiments performed on the TI46, ANDOSL and YOHO corpora show better results for all of the proposed techniques in comparison with the non-fuzzy baseline techniques.
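The fuzzy membership idea underlying the FCM-based models surveyed here can be shown with the standard fuzzy C-means membership function: each vector belongs to every cluster with a degree that falls off with its relative distance to the cluster centers. The scalar data, the two centers, and the fuzzifier m = 2 are common illustrative choices, not values from this thesis.

```python
# Standard fuzzy C-means membership: u_i = 1 / sum_j (d_i / d_j)^(2/(m-1)),
# computed here for a scalar observation against a list of cluster centers.
def fcm_memberships(x, centers, m=2.0):
    d = [abs(x - c) for c in centers]
    if any(di == 0 for di in d):          # exactly on a center: crisp membership
        return [1.0 if di == 0 else 0.0 for di in d]
    p = 2.0 / (m - 1.0)
    return [1.0 / sum((d[i] / d[j]) ** p for j in range(len(centers)))
            for i in range(len(centers))]

u = fcm_memberships(1.0, [0.0, 4.0])
print([round(ui, 4) for ui in u])  # the closer center gets the larger membership
```

The memberships always sum to one; as m approaches 1 they harden toward crisp (nearest-center) assignment, which is the "hard" special case the abstract mentions.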
ON THE USE OF TIED-MIXTURE
"... Tiedmixture (or semicontinuous) distributions are an important tool for acoustic modeling, used in many highperformance speech recognition systems today. This paper provides a survey of the work in this area, outlining the different options available for tied mixture modeling, introducing algorith ..."
Abstract
Tied-mixture (or semi-continuous) distributions are an important tool for acoustic modeling, used in many high-performance speech recognition systems today. This paper provides a survey of the work in this area, outlining the different options available for tied-mixture modeling, introducing algorithms for reducing training time, and providing experimental results assessing the tradeoffs for speaker-independent recognition on the Resource Management task. Additionally, we describe an extension of tied mixtures to segment-level distributions.
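The defining property of the tied-mixture distributions this survey covers is that every state shares a single Gaussian codebook and differs only in its mixture weights, so the Gaussians are evaluated once per frame and reused by all states. A minimal sketch, with a toy codebook and weights assumed for illustration:

```python
# Tied-mixture (semi-continuous) state likelihoods: one shared Gaussian
# codebook, per-state mixture weights.
import math

def gauss(x, mean, var):
    """Density of a 1-D Gaussian."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2 * math.pi * var)

def state_likelihoods(x, codebook, state_weights):
    dens = [gauss(x, m, v) for m, v in codebook]   # evaluated once, shared
    return [sum(w * d for w, d in zip(weights, dens))
            for weights in state_weights]

codebook = [(0.0, 1.0), (5.0, 1.0)]        # shared by all states
state_weights = [[0.9, 0.1], [0.1, 0.9]]   # per-state tied-mixture weights
lik = state_likelihoods(0.0, codebook, state_weights)
print(lik[0] > lik[1])  # True: state 0 puts its weight on the matching Gaussian
```

With C codebook Gaussians and S states, each frame costs C density evaluations plus S weighted sums, instead of the S × M evaluations of an untied continuous-density system.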