Genones: Generalized Mixture Tying in Continuous Hidden Markov Model-Based Speech Recognizers
IEEE Transactions on Speech and Audio Processing, 1996
Cited by 36 (7 self)
Abstract
An algorithm is proposed that achieves a good trade-off between modeling resolution and robustness by using a new, general scheme for tying mixture components in continuous mixture-density hidden Markov model (HMM)-based speech recognizers. The sets of HMM states that share the same mixture components are determined automatically using agglomerative clustering techniques. Experimental results on ARPA's Wall Street Journal corpus show that this scheme reduces errors by 25% compared with typical tied-mixture systems. New fast algorithms are also presented for computing Gaussian likelihoods, the most time-consuming aspect of continuous-density HMM systems. These algorithms significantly reduce the number of Gaussian densities that are evaluated, with little or no impact on speech recognition accuracy.
Corresponding author: Vassilios Digalakis
Address: Electronic and Computer Engineering Department, Technical University of Crete, Kounoupidiana, Chania 73100, Greece
Phone: +30-821...
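The abstract says only that the state sets sharing mixture components are found by agglomerative clustering; it does not state the merge criterion. The sketch below assumes, for illustration, that each HMM state is summarized by its occupancy counts over a shared set of mixture components, and that the pair whose pooling causes the smallest increase in count-weighted entropy is merged first. The function names (`weighted_entropy`, `merge_cost`, `cluster_states`) are hypothetical.

```python
import math
from itertools import combinations


def weighted_entropy(counts):
    """Count-weighted entropy N * H(p) of a vector of (soft) occupancy counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    h = -sum((c / total) * math.log(c / total) for c in counts if c > 0)
    return total * h


def merge_cost(a, b):
    """Increase in weighted entropy caused by pooling two count vectors (always >= 0)."""
    pooled = [x + y for x, y in zip(a, b)]
    return weighted_entropy(pooled) - weighted_entropy(a) - weighted_entropy(b)


def cluster_states(state_counts, num_clusters):
    """Greedy bottom-up clustering of states; returns a list of sets of state names.

    Assumption: state_counts maps a state name to its counts over shared mixture components.
    """
    clusters = {frozenset([name]): list(c) for name, c in state_counts.items()}
    while len(clusters) > num_clusters:
        # Merge the pair whose pooling loses the least information.
        a, b = min(combinations(clusters, 2),
                   key=lambda pair: merge_cost(clusters[pair[0]], clusters[pair[1]]))
        pooled = [x + y for x, y in zip(clusters.pop(a), clusters.pop(b))]
        clusters[a | b] = pooled
    return [set(k) for k in clusters]
```

With states whose count vectors peak on the same components, such as `{"s1": [9, 1, 0], "s2": [8, 2, 0], "s3": [0, 1, 9], "s4": [0, 2, 8]}`, reducing to two clusters groups `s1` with `s2` and `s3` with `s4`. Each resulting group would then share one set of Gaussians.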
Training Data Clustering For Improved Speech Recognition
In Proceedings of EUROSPEECH, 1995
Cited by 11 (3 self)
Abstract
We present an approach to clustering the training data for automatic speech recognition (ASR). A relative-entropy-based distance metric between training data clusters is defined and used to cluster the training data hierarchically. The metric can also be used to select the training data clusters closest to a small amount of data from the test speaker. The selected clusters are then used to estimate a set of hidden Markov models (HMMs) for recognizing that speaker's speech. We present preliminary experimental results for the clustering algorithm and its application to ASR.

1 Introduction

While progress in ASR has been encouraging, it has become increasingly clear that ASR systems must perform well in the presence of mismatches between the training and testing environments. ASR systems trained in one environment often perform poorly in a new one because of such mismatches. Common sources of mismatches include different tran...
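The abstract does not give the exact form of the relative-entropy metric. As a minimal sketch, assume each training data cluster is summarized by a single diagonal-covariance Gaussian over its feature frames and that the distance is the symmetrized Kullback-Leibler divergence (KL in both directions, summed). The same distance then ranks training clusters against a Gaussian fitted to the test speaker's few frames. All function names are hypothetical.

```python
import numpy as np


def fit_diag_gaussian(frames):
    """Mean and variance (diagonal covariance) of a (T, D) array of feature frames."""
    frames = np.asarray(frames, dtype=float)
    return frames.mean(axis=0), frames.var(axis=0) + 1e-6  # small floor avoids zero variance


def kl_diag(m0, v0, m1, v1):
    """KL(N0 || N1) for two diagonal-covariance Gaussians."""
    return 0.5 * np.sum(np.log(v1 / v0) + (v0 + (m0 - m1) ** 2) / v1 - 1.0)


def symmetric_kl(a, b):
    """Symmetrized relative entropy between two (mean, var) pairs (an assumed choice)."""
    return kl_diag(*a, *b) + kl_diag(*b, *a)


def closest_clusters(test_frames, clusters, k):
    """Names of the k training clusters nearest to the test speaker's data.

    clusters: dict mapping a cluster name to its (mean, var) pair.
    """
    test = fit_diag_gaussian(test_frames)
    ranked = sorted(clusters, key=lambda name: symmetric_kl(test, clusters[name]))
    return ranked[:k]
```

For hierarchical clustering, the same `symmetric_kl` could serve as the linkage distance between cluster Gaussians. In the adaptation step, the frames of the `k` selected clusters would be pooled to train the speaker's HMMs.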
Improved Modeling and Efficiency for Automatic Transcription of Broadcast News
2000
Cited by 5 (0 self)
Abstract
Over the last few years, the DARPA-sponsored Hub4 continuous speech recognition evaluations have driven speech recognition technology forward on the interesting and difficult task of automatically transcribing broadcast news. In this paper, we report on our research and progress on this problem. We focus on the individual techniques we developed rather than on descriptions of our evaluation systems, and we provide comparative experimental results showing the improvements obtained with these novel approaches.

1 Introduction

In recent years there has been increasing interest in developing large-vocabulary continuous speech recognition (LVCSR) systems for speech from real-world sources. Broadcast news, in particular, has been the testbed for the DARPA-sponsored Hub4 continuous speech recognition (CSR) evaluations over the last few years and represents a significant challenge to speech recognition researchers. Many interesting problems are associated with the automatic recognition of b...
Performance Prediction for Exponential Language Models
Cited by 5 (3 self)
Abstract
We investigate the task of performance prediction for language models belonging to the exponential family. First, we attempt to empirically discover a formula for predicting test set cross-entropy for n-gram language models. We build models over varying domains, data set sizes, and n-gram orders, and perform linear regression to see whether we can model test set performance as a simple function of training set performance and various model statistics. Remarkably, we find a simple relationship that predicts test set performance with a correlation of 0.9997. We analyze why this relationship holds and show that it holds for other exponential language models as well, including class-based models and minimum discrimination information models. Finally, we discuss how this relationship can be applied to improve language model performance.
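The regression step can be sketched as an ordinary least-squares fit of test set cross-entropy on per-model features. The abstract does not list which model statistics were used. Here the feature matrix is simply assumed to hold, for each model, its training set cross-entropy plus any other statistics one chooses. The correlation between fitted and actual values is the figure of merit the abstract reports. `fit_performance_predictor` is a hypothetical name.

```python
import numpy as np


def fit_performance_predictor(features, test_xent):
    """Least-squares fit of test_xent ~ features @ w + b.

    features: (num_models, num_features) array, e.g. training cross-entropy
              and other model statistics (the exact statistics are an assumption).
    test_xent: (num_models,) array of measured test set cross-entropies.
    Returns (w, b, r), where r is the correlation of predictions with test_xent.
    """
    X = np.asarray(features, dtype=float)
    y = np.asarray(test_xent, dtype=float)
    A = np.column_stack([X, np.ones(len(X))])  # extra column of ones for the intercept
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    pred = A @ coef
    r = np.corrcoef(pred, y)[0, 1]  # Pearson correlation, predicted vs. actual
    return coef[:-1], coef[-1], r
```

When the targets really are a linear function of the features, the fit recovers the coefficients and `r` is essentially 1. Any such check on synthetic numbers only confirms the fitting code; it is not a reproduction of the paper's result.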
Segmental Modeling Using a Continuous Mixture of Non-parametric Models
IEEE Transactions on Speech and Audio Processing, 1997
Cited by 3 (0 self)
Abstract
The aim of the research described in this paper is to overcome the modeling limitations of conventional hidden Markov models. We present a segmental model that consists of two elements. The first is a nonparametric representation of both the mean and variance trajectories, which describes the local dynamics. The second is a parameterized transformation of the trajectory (e.g., a random shift) that is global to the segment and models long-term variations such as speaker identity.

Introduction

Speech sounds are produced by a time-varying dynamic system; consequently, speech signals are highly correlated and nonstationary. Despite this, most applications of hidden Markov models (HMMs) to speech recognition build in the assumption that successive observations in a state are independent and identically distributed. These limitations arise because the HMM is a frame-based approach. An alternative approach is segmental modeling, w...
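To make the two elements concrete, here is a minimal sketch for scalar features. It makes several assumptions not given in the abstract: the nonparametric trajectories are stored as mean and variance values at a fixed number of normalized time points and linearly time-warped to the segment length, and the global transformation is an additive shift b ~ N(0, shift_var) shared by all frames of the segment. Integrating out b makes the segment jointly Gaussian with covariance diag(var) + shift_var * 11ᵀ, so frames within a segment are no longer independent. The function names are hypothetical.

```python
import numpy as np


def resample(traj, length):
    """Linearly time-warp a sampled trajectory onto `length` frames."""
    traj = np.asarray(traj, dtype=float)
    src = np.linspace(0.0, 1.0, len(traj))
    dst = np.linspace(0.0, 1.0, length)
    return np.interp(dst, src, traj)


def segment_loglik(obs, mean_traj, var_traj, shift_var):
    """Log-likelihood of a 1-D segment under a trajectory model whose mean is
    offset by a segment-level random shift b ~ N(0, shift_var), integrated out."""
    obs = np.asarray(obs, dtype=float)
    n = len(obs)
    mean = resample(mean_traj, n)
    # Frame-level variances on the diagonal plus a shared shift term coupling all frames.
    cov = np.diag(resample(var_traj, n)) + shift_var * np.ones((n, n))
    diff = obs - mean
    _, logdet = np.linalg.slogdet(cov)
    quad = diff @ np.linalg.solve(cov, diff)
    return -0.5 * (n * np.log(2 * np.pi) + logdet + quad)
```

With `shift_var = 0` the score reduces to a product of independent per-frame Gaussians, i.e. a frame-based model following the time-warped trajectory. A positive `shift_var` lets a whole segment that is uniformly offset, for instance by a speaker-dependent bias, score much better than it would under independent frames.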
HMM State Clustering Across Allophone Class Boundaries
Cited by 1 (0 self)
Abstract
We present a novel approach to hidden Markov model (HMM) state clustering based on the use of broad phone classes and an allophone class entropy measure. Most state-of-the-art large-vocabulary speech recognizers are based on context-dependent (CD) phone HMMs that use Gaussian mixture models for the state-conditioned observation densities. A common approach to robust HMM parameter estimation is to cluster HMM states so that each state cluster shares a set of parameters, such as the components of a Gaussian mixture model. In all current state clustering algorithms, HMM states are clustered only within their respective allophone classes. While this makes some intuitive sense, it prevents clustering states across allophone class boundaries, even when the states are acoustically similar. Our algorithm allows clustering across allophone class boundaries by defining broad phone groups within which two states from different allophone classes can be clustered together. An allophone ...
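The key change, permitting merges between states of different phones as long as they share a broad phone group, can be illustrated with a greedy bottom-up clusterer. The abstract does not specify the allophone class entropy measure, so the sketch substitutes a common likelihood-based merge cost. Each state is represented by sufficient statistics (frame count, sum, sum of squares) of a single diagonal Gaussian, and the pair whose merge loses the least log-likelihood is combined first. The broad-class table `BROAD_CLASS` and all function names are hypothetical.

```python
import numpy as np
from itertools import combinations

# Hypothetical broad phone groups; the paper's actual groupings are not given here.
BROAD_CLASS = {"p": "stop", "b": "stop", "t": "stop", "m": "nasal", "n": "nasal"}


def ml_loglik(n, s, ss):
    """Log-likelihood of n frames under their own ML diagonal Gaussian,
    computed from the per-dimension sum s and sum of squares ss."""
    var = np.maximum(ss / n - (s / n) ** 2, 1e-6)
    return -0.5 * n * np.sum(np.log(2 * np.pi * var) + 1.0)


def cluster_across_allophones(states, num_clusters):
    """states: {(phone, state_id): (n, s, ss)}. Merges are allowed only when both
    clusters lie in the same broad class, even if their phones differ."""
    clusters = {frozenset([k]): v for k, v in states.items()}

    def broad(cluster):
        # All members share one broad class, so any member identifies it.
        return BROAD_CLASS[next(iter(cluster))[0]]

    def cost(pair):
        (na, sa, ssa), (nb, sb, ssb) = clusters[pair[0]], clusters[pair[1]]
        return (ml_loglik(na, sa, ssa) + ml_loglik(nb, sb, ssb)
                - ml_loglik(na + nb, sa + sb, ssa + ssb))

    while len(clusters) > num_clusters:
        candidates = [(a, b) for a, b in combinations(clusters, 2) if broad(a) == broad(b)]
        if not candidates:
            break  # no legal merge left; may return more clusters than requested
        a, b = min(candidates, key=cost)
        na, sa, ssa = clusters.pop(a)
        nb, sb, ssb = clusters.pop(b)
        clusters[a | b] = (na + nb, sa + sb, ssa + ssb)
    return [set(c) for c in clusters]
```

Under this constraint, states of /p/ and /b/ can share parameters because both are stops. A nasal state stays out of the stop cluster even if it happens to be acoustically close to it.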
Training Issues and Channel Equalization Techniques for the Construction of Telephone Acoustic Models Using a High-Quality Speech Corpus
Abstract
We describe an approach for estimating acoustic phonetic models for a hidden Markov model (HMM) recognizer operating over the telephone. We explore two complementary techniques for developing telephone acoustic models. The first technique comprises two new channel compensation algorithms; experimental results on the Wall Street Journal corpus show no significant improvement over sentence-based cepstral-mean removal. The second technique uses an existing "high-quality" speech corpus to train acoustic models that are appropriate for the Switchboard Credit Card task over long-distance telephone lines. Experimental results show that cross-database acoustic training yields performance similar to that of conventional task-dependent acoustic training.
1. This research was supported by the Advanced Research Projects Agency under Contracts ONR N00014-93-C-0142 and ONR N00014-92-C-0154. It was also supported by grant NSF IRI-9014829 from the National Science Foundation,...
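The baseline named in this abstract, sentence-based cepstral-mean removal, is simple enough to show directly. A fixed linear channel multiplies the speech spectrum, which becomes an additive constant in the log-spectral and hence cepstral domain. Subtracting each coefficient's mean over the sentence therefore cancels a time-invariant channel. This sketch assumes the cepstra of one sentence are given as a (frames × coefficients) array; the function name is hypothetical.

```python
import numpy as np


def cepstral_mean_removal(cepstra):
    """Subtract the per-sentence mean from each cepstral coefficient.

    cepstra: (T, D) array, one row per frame of a single sentence.
    """
    cepstra = np.asarray(cepstra, dtype=float)
    return cepstra - cepstra.mean(axis=0, keepdims=True)
```

Because the mean is estimated over the whole sentence, the output is identical whether or not a constant channel offset was added to every frame. The price is that the entire sentence must be available before normalization.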

