Results 1 - 6 of 6
Connected Digit Recognition Using Statistical Template Matching
 Proc. 1995 Europ. Conf. on Speech Communication and Technology
, 1995
Cited by 10 (9 self)
In this paper we describe the optimization of 'conventional' template matching techniques for connected digit recognition (TI/NIST connected digit corpus). In particular, we carried out a series of experiments in which we studied various aspects of signal processing, acoustic modeling, mixture densities and linear transforms of the acoustic vector. After all optimization steps, our best string error rate on the TI/NIST connected digit corpus was 1.71% for single densities and 0.74% for mixture densities. 1. INTRODUCTION Over the last five years much progress has been made in connected digit recognition [3, 7, 8, 9]. This paper describes how the systematic optimization of various components of a 'conventional' recognition system leads to high performance comparable with other systems that use much more complicated techniques. Experimental results on the adult corpus of the TI/NIST connected digit corpus are given. The optimization steps presented in this paper are: 1. Several methods f...
The Use of Speaker Correlation Information for Automatic Speech Recognition
, 1998
Cited by 9 (3 self)
This dissertation addresses the independence of observations assumption which is typically made by today's automatic speech recognition systems. This assumption ignores within-speaker correlations which are known to exist. The assumption clearly damages the recognition ability of standard speaker-independent systems, as can be seen in the severe drop in performance exhibited by systems between their speaker-dependent mode and their speaker-independent mode. The typical solution to this problem is to apply speaker adaptation to the models of the speaker-independent system. This approach is examined in this thesis with the explicit goal of improving the rapid adaptation capabilities of the system by incorporating within-speaker correlation information into the adaptation process. This is achieved through the creation of an adaptation technique called reference speaker weighting and the development of a speaker clustering technique called speaker cluster weighting. However, speaker adaptation is just one way in which the independence assumption can be attacked. This dissertation also introduces a novel speech recognition technique called consistency modeling. This technique utilizes a priori knowledge about the within-speaker correlations which exist between different phonetic events for the purpose of incorporating speaker constraint into a speech recognition system without explicitly applying speaker adaptation. These new techniques are implemented within a segment-based speech recognition system, and evaluation results are reported on the DARPA Resource Management recognition task.
Using MAP Estimated Parameters To Improve HMM Speech Recognition Performance
, 1994
Cited by 1 (0 self)
Hidden Markov models (HMMs) have been quite successfully applied to speech recognition tasks, but many unsolved problems still remain. HMMs do not directly model all phenomena that might be useful for recognition. This is the case, for example, for duration modeling. Mechanisms are needed to incorporate additional information into an HMM system. This paper presents a maximum a posteriori (MAP) parameter estimation approach for improving the state-duration modeling capability and incorporating a priori knowledge about the word-duration distribution into an HMM. The MAP-based approach is evaluated on a talker-independent, connected alpha-digit task for various prior distributions on duration. The results, in terms of both computational complexity and recognition performance, are compared with the results of HMM-based systems trained with the traditional maximum-likelihood criterion. 1. INTRODUCTION Over the past decade, there has been considerable research in the area of speech r...
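To illustrate the general idea of MAP estimation with a prior on duration, here is a minimal sketch. It is not the method of the paper above: it assumes, purely for illustration, that state durations are Gaussian with known variance and that the prior on the mean duration is also Gaussian. The function names, the example durations, and all numeric values are hypothetical.

```python
def ml_mean(samples):
    """Maximum-likelihood estimate of a Gaussian mean: the sample average."""
    return sum(samples) / len(samples)


def map_mean(samples, prior_mean, prior_var, obs_var):
    """MAP estimate of a Gaussian mean under a Gaussian prior.

    Assumes a known observation variance (obs_var). For this conjugate
    model the posterior is Gaussian, so its mode (the MAP estimate)
    equals its mean.
    """
    n = len(samples)
    if n == 0:
        # With no data, the estimate falls back to the prior mean.
        return prior_mean
    total = sum(samples)
    return (obs_var * prior_mean + prior_var * total) / (obs_var + n * prior_var)


# Hypothetical state durations, measured in frames.
durations = [12.0, 9.0, 15.0]
print("ML :", ml_mean(durations))
print("MAP:", map_mean(durations, prior_mean=10.0, prior_var=4.0, obs_var=8.0))
```

With only a few observations the MAP estimate is pulled toward the prior mean, and as the number of observations grows it approaches the maximum-likelihood estimate.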
Speaker Adaptation Using Regularization And Network Adaptation For Hybrid MMINN/HMM Speech Recognition
, 1999
This paper describes how to perform speaker adaptation for a hybrid large vocabulary speech recognition system. The hybrid system is based on a Maximum Mutual Information Neural Network (MMINN), which is used as a Vector Quantizer (VQ) for a discrete HMM speech recognizer. The combination of MMINNs and HMMs has shown good performance on several large vocabulary speech recognition tasks like RM and WSJ. This paper presents two approaches to speaker adaptation with this hybrid system. The first approach is a transformation of the feature space, performed by a neural network with maximum likelihood (ML) as the objective function for the complete system, which means that the parameters of the NN are estimated in order to match the HMM parameters of the pretrained speaker-independent system. The second approach is to adapt the HMM parameters depending on the amount of training data available per HMM, using a regularization approach. Both approaches can be applied join...
THE SIMILARITY MEASURE AMONG ACOUSTIC MODELS AND ITS TWO APPLICATIONS
The distance measure of two stochastic processes is a key problem in the processing of stochastic signals. In speech recognition, the distance between two basic recognition models can provide information about the relation and the difference between these two units. In fact, the distance measure can depict the model’s availability. We can improve the hit rate of recognition results by adjusting the distance between basic unit models. In recent years, many definitions have been put forward for calculating the exact value of the distance between stochastic processes. We have also developed a simplified distance measure based on CDCPM (Center-Distance Continuous Probability Model), which is an improved version of CHMM (Continuous Hidden Markov Model). Since the CDN (Center-Distance Normal) distribution is derived from the normal distribution, the definition can easily be extended to other types of acoustic models such as Segmental HMM. In this paper, we focus on this simplified definition of the distance measure and propose two examples applied to continuous speech recognition. The experimental results show that it preserves very good performance without additional computation. 1. INTRODUCTION TO
Bayesian Approaches to Acoustic Modeling: A Review
, 2012
This paper focuses on applications of Bayesian approaches to acoustic modeling for speech recognition and related speech processing applications. Bayesian approaches have been widely studied in the fields of statistics and machine learning, and one of their advantages is that their generalization capability is better than that of conventional approaches (e.g., maximum likelihood). On the other hand, since inference in Bayesian approaches involves integrals and expectations that are mathematically intractable in most cases and require heavy numerical computations, it is generally difficult to apply them to practical speech recognition problems. However, there have been many such attempts, and this paper aims to summarize these attempts to encourage further progress on Bayesian approaches in the speech processing field. This paper describes various applications of Bayesian approaches to speech processing in terms of the four typical ways of approximating Bayesian inferences, i.e., maximum a posteriori approximation, model complexity control using a Bayesian information criterion based on asymptotic approximation, variational approximation, and Markov chain Monte Carlo based sampling techniques.
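For orientation, the first two approximations named in this abstract are commonly written as follows. This is a sketch in standard textbook notation, not taken from the paper itself; the symbols are assumptions: $O$ is the observed data, $\theta$ the model parameters, $m$ a candidate model, $k_m$ its number of free parameters, and $N$ the number of observations.

```latex
% MAP approximation: replace the posterior over parameters by its mode
\hat{\theta}_{\mathrm{MAP}} = \arg\max_{\theta} \; p(O \mid \theta)\, p(\theta)

% BIC-style asymptotic approximation of the log evidence of model m
\log p(O \mid m) \approx \log p(O \mid \hat{\theta}_{\mathrm{ML}}, m) - \frac{k_m}{2} \log N
```

The second expression penalizes the maximized log-likelihood by a term that grows with the number of free parameters, which is how it can be used for model complexity control.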