An Application of Recurrent Nets to Phone Probability Estimation
- IEEE Transactions on Neural Networks, 1994
Cited by 165 (8 self)
This paper presents an application of recurrent networks to phone probability estimation in large vocabulary speech recognition. The need for efficient exploitation of context information is discussed ...
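The abstract does not describe the network, so what follows is only a minimal sketch of the general idea. It assumes a simple Elman-style recurrent net with a tanh hidden layer and a softmax output over phone classes. The function names, layer sizes, and random weights are hypothetical and not taken from the paper. The point is that the fed-back hidden state lets each frame's phone probabilities draw on earlier context.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def rnn_phone_posteriors(frames, w_in, w_rec, w_out):
    """Return one phone posterior distribution per acoustic frame.

    The hidden state is fed back at every step, so the estimate for
    frame t can depend on all frames up to and including t.
    """
    hidden = [0.0] * len(w_rec)
    posteriors = []
    for x in frames:
        pre = [a + b for a, b in zip(matvec(w_in, x), matvec(w_rec, hidden))]
        hidden = [math.tanh(v) for v in pre]
        posteriors.append(softmax(matvec(w_out, hidden)))
    return posteriors


# Toy usage with random (untrained) weights: 3 features, 4 hidden units, 5 phones.
rng = random.Random(0)


def random_matrix(rows, cols):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


w_in = random_matrix(4, 3)
w_rec = random_matrix(4, 4)
w_out = random_matrix(5, 4)
frames = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(6)]
posteriors = rnn_phone_posteriors(frames, w_in, w_rec, w_out)
```

A real system would train the weights (for example with backpropagation through time) and might delay the output to see a few future frames; neither is shown here.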
Maximum Likelihood Discriminant Feature Spaces
- in Proc. ICASSP, 2000
Cited by 58 (16 self)
Linear discriminant analysis (LDA) is known to be inappropriate for the case of classes with unequal sample covariances. In recent years, there has been interest in generalizing LDA to heteroscedastic discriminant analysis (HDA) by removing the equal within-class covariance constraint. This paper presents a new approach to HDA by defining an objective function which maximizes the class discrimination in the projected subspace while ignoring the rejected dimensions. Moreover, we investigate the link between discrimination and the likelihood of the projected samples and show that HDA can be viewed as a constrained ML projection for a full-covariance Gaussian model, the constraint being given by the maximization of the projected between-class scatter volume. It will be shown that, under diagonal-covariance Gaussian modeling constraints, applying a diagonalizing linear transformation (MLLT) to the HDA space results in increased classification accuracy even though HDA alone actually...
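The abstract states the HDA objective only in words. One way to write it, as we recall it from the HDA literature (treat the exact form as an assumption), uses a p × n projection θ, the between-class scatter B, the class covariances Σ_j, and the class sample counts N_j with N = Σ_j N_j:

```latex
H(\theta) = \prod_{j=1}^{J} \left( \frac{|\theta B \theta^{T}|}{|\theta \Sigma_j \theta^{T}|} \right)^{N_j},
\qquad
\log H(\theta) = N \log |\theta B \theta^{T}| - \sum_{j=1}^{J} N_j \log |\theta \Sigma_j \theta^{T}|
```

As a consistency check, setting every Σ_j equal to a common within-class covariance W gives log H(θ) = N (log|θBθᵀ| − log|θWθᵀ|), which is N times the log of the usual LDA determinant-ratio criterion. This matches the abstract's framing of HDA as a generalization of LDA.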
Should recognizers have ears?
- Speech Communication, 1998
Cited by 44 (3 self)
The paper discusses the author’s experience with applying auditory knowledge to automatic speech recognition. It indirectly argues against blindly implementing scattered, accidental knowledge that may be irrelevant to a speech recognition task. It advances the notion that the reason for applying knowledge of human auditory perception in engineering applications should be the ability of perception to suppress some parts of the information in the speech message. Three properties of human speech perception are discussed in some detail: limited spectral resolution, use of information from segments of about syllable length, and the ability to alleviate unreliable cues. Overall, we advocate selective use of auditory knowledge, optimized on real speech data. (Fig. I: A good, hard-working man. Fig. II: A foolish man?)
Maximum Likelihood and Minimum Classification Error Factor Analysis for Automatic Speech Recognition
- IEEE Transactions on Speech and Audio Processing, 1997
Cited by 34 (3 self)
Hidden Markov models (HMMs) for automatic speech recognition rely on high-dimensional feature vectors to summarize the short-time properties of speech. Correlations between features can arise when the speech signal is non-stationary or corrupted by noise. We investigate how to model these correlations using factor analysis, a statistical method for dimensionality reduction. Factor analysis uses a small number of parameters to model the covariance structure of high-dimensional data. These parameters can be chosen in two ways: (i) to maximize the likelihood of observed speech signals, or (ii) to minimize the number of classification errors. We derive an Expectation-Maximization (EM) algorithm for maximum likelihood estimation and a gradient descent algorithm for improved class discrimination. Speech recognizers are evaluated on two tasks, one with a small vocabulary (connected alpha-digits) and one with a medium-sized vocabulary (New Jersey town names). We find that modeling feature correlations...
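Factor analysis models a covariance matrix as Λ Λᵀ + Ψ, with a d × k loading matrix Λ (k much smaller than d) and a diagonal noise matrix Ψ. The plain-Python sketch below builds that covariance and compares raw parameter counts against full and diagonal covariances. The 39-dimensional feature size and k = 3 are illustrative assumptions, not values from the paper, and neither the EM nor the discriminative training procedure is shown.

```python
def fa_covariance(loadings, psi):
    """Covariance implied by a factor analyzer: Lambda Lambda^T + diag(psi).

    loadings: d rows, each holding k factor loadings (the d x k matrix Lambda)
    psi: d diagonal noise variances
    """
    d = len(loadings)
    cov = [[sum(a * b for a, b in zip(loadings[i], loadings[j])) for j in range(d)]
           for i in range(d)]
    for i in range(d):
        cov[i][i] += psi[i]
    return cov


def covariance_param_counts(d, k):
    """Raw free-parameter counts for one d-dimensional Gaussian's covariance."""
    full = d * (d + 1) // 2      # symmetric full covariance
    diagonal = d                 # diagonal covariance
    factor = d * k + d           # k loadings per dimension plus diag(psi)
    return full, diagonal, factor


# Hypothetical example: 39-dimensional features, 3 factors.
print(covariance_param_counts(39, 3))             # (780, 39, 156)
print(fa_covariance([[1.0], [2.0]], [0.5, 0.5]))  # [[1.5, 2.0], [2.0, 4.5]]
```

The factor count is a raw count; it ignores the rotational ambiguity of Λ, so the number of identifiable parameters is slightly smaller.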
Comparison of Discriminative Training Criteria and Optimization Methods for Speech Recognition
- 2001
Cited by 32 (6 self)
The aim of this work is to build a common framework for a class of discriminative training criteria and optimization methods for continuous speech recognition. A unified discriminative criterion based on likelihood ratios of correct and competing models with optional smoothing is presented. The unified criterion leads to particular criteria through the choice of competing word sequences and the choice of smoothing. Analytic and experimental comparisons are presented for both the maximum mutual information (MMI) and the minimum classification error (MCE) criteria, together with two optimization methods: gradient descent (GD) and the extended Baum (EB) algorithm. A tree-search-based restricted recognition method using word graphs is presented to reduce the computational complexity of large vocabulary discriminative training. Moreover, for MCE training, a method using word graphs for efficient calculation of discriminative statistics is introduced. Experiments were performed for continuous speech recognition using the ARPA Wall Street Journal (WSJ) corpus with a vocabulary of 5k words, and for the recognition of continuously spoken digit strings using both the TI digit string corpus for American English digits and the SieTill corpus for telephone-line recorded German digits. For the MMI criterion, neither analytical nor experimental results indicate significant differences between EB and GD optimization. For acoustic models of low complexity, MCE training gave significantly better results than MMI training. The recognition results for large vocabulary MMI training on the WSJ corpus show a significant dependence on the context length of the language model used for training. The best results were obtained using a unigram language model for MMI training. No significant co...
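For orientation, the two criteria compared above can be sketched for a single utterance in their common textbook forms. This is not the paper's unified criterion. The scores are hypothetical joint log scores (acoustic plus language model) for the correct and competing word sequences. The MCE form uses the usual soft maximum over competitors (parameter `eta`) and a sigmoid with slope `alpha`; both parameter names are assumptions.

```python
import math


def logsumexp(values):
    """log(sum(exp(v))) computed stably."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def mmi_objective(correct, competitors):
    """Log posterior of the correct sentence (MMI criterion for one utterance).

    The denominator sums over every hypothesis, including the correct one.
    """
    return correct - logsumexp([correct] + competitors)


def mce_loss(correct, competitors, eta=1.0, alpha=1.0):
    """Sigmoid-smoothed MCE loss for one utterance.

    The misclassification measure compares the correct score with a soft
    maximum (controlled by eta) over the competing hypotheses only.
    """
    soft_max = (logsumexp([eta * s for s in competitors]) - math.log(len(competitors))) / eta
    d = -correct + soft_max
    return 1.0 / (1.0 + math.exp(-alpha * d))


print(mmi_objective(0.0, [0.0]))  # equals -log 2
print(mce_loss(0.0, [0.0]))       # 0.5, exactly on the decision boundary
```

MMI training raises the log posterior. MCE training lowers a smoothed count of errors: the loss falls toward 0 as the correct hypothesis outscores its competitors.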
Strategies for name recognition in automatic directory assistance systems
- In Proc. IVTTA, 1998
Cited by 15 (2 self)
The commercial viability of automating large-scale directory assistance is shown by presenting new results on the recognition of large numbers of different names. Satisfactory recognition performance is achieved by employing a stochastic combination of N-best lists retrieved from multiple user utterances, with the telephone database as an additional knowledge source. The strategy is used in a prototype of a fully automated directory information system which is designed to cover a whole country: after the city has been selected, the user is asked for the first and last name of the desired person and, if necessary, also for the street or a spelling of the last name. Confidence measures are used to optimize the dialogue flow.
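The abstract does not give the combination formula. The hypothetical sketch below shows only the general idea of using the database as a knowledge source. It assumes that each dialogue turn yields an N-best dictionary of log scores for one field, that scores are simply added across turns, and that a fixed floor score is used for values missing from a list. All names and numbers are illustrative.

```python
def rank_directory_entries(field_nbests, database, floor=-50.0):
    """Rank database entries by combining per-turn N-best scores.

    field_nbests: one dict per dialogue turn (e.g. first name, last name),
        mapping each recognized word to a log score
    database: list of tuples, one value per field, for entries that exist
    floor: log score assumed for a value absent from that turn's N-best list
    """
    scored = []
    for entry in database:
        score = sum(nbest.get(value, floor) for nbest, value in zip(field_nbests, entry))
        scored.append((score, entry))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [entry for _, entry in scored]


first_names = {"Anna": -1.0, "Hanna": -0.8}
last_names = {"Meier": -0.5, "Maier": -0.7}
directory = [("Anna", "Meier"), ("Hanna", "Schulz"), ("Anna", "Maier")]
ranking = rank_directory_entries([first_names, last_names], directory)
print(ranking)  # [('Anna', 'Meier'), ('Anna', 'Maier'), ('Hanna', 'Schulz')]
```

The top first-name hypothesis ("Hanna") loses because no matching full name exists in the directory. This is the kind of correction a database constraint can provide.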
Automatic Question Generation For Decision Tree Based State Tying
- Proceedings of the IEEE Conference on Acoustics, Speech and Signal Processing, 1998
Cited by 13 (3 self)
Decision-tree-based state tying uses so-called phonetic questions to assign triphone states to reasonable acoustic models. These phonetic questions are in fact phonetic categories such as vowels, plosives, or fricatives. The underlying assumption is that context phonemes belonging to the same phonetic class have a similar influence on the pronunciation of a phoneme. For a new phoneme set, which has to be used, e.g., when switching to a different corpus, a phonetic expert is needed to define proper phonetic questions. In this paper, a new method is presented which automatically defines good phonetic questions for a phoneme set. This method uses the intermediate clusters from a phoneme clustering algorithm, which are afterwards reduced to an appropriate number. Recognition results on the Wall Street Journal data for within-word and across-word phoneme models show that the automatically generated questions perform competitively with our best handcrafted question set.
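The idea of turning intermediate clusters into questions can be sketched with plain agglomerative clustering. The following assumptions are ours, not the paper's: each phoneme is represented by a hypothetical feature vector, clusters are merged by centroid distance, and the reduction step simply keeps the last (broadest) merges.

```python
def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def generate_questions(phoneme_vectors, max_questions):
    """Agglomerative phoneme clustering; each intermediate cluster is a question.

    phoneme_vectors: dict mapping a phoneme symbol to a feature vector
    max_questions: how many questions to keep (here: the last, broadest merges)
    """
    clusters = [frozenset([p]) for p in phoneme_vectors]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                ci = centroid([phoneme_vectors[p] for p in clusters[i]])
                cj = centroid([phoneme_vectors[p] for p in clusters[j]])
                d = sq_dist(ci, cj)
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        merged = clusters[i] | clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
        merges.append(merged)
    # The final merge contains every phoneme, so it cannot split any state set.
    questions = merges[:-1]
    return questions[max(0, len(questions) - max_questions):]


# Hypothetical 1-D positions: vowels near 0 to 1.5, plosives near 10 to 12.
phones = {"a": [0.0], "e": [1.0], "i": [1.5], "p": [10.0], "t": [10.4], "k": [12.0]}
questions = generate_questions(phones, 2)
```

On this toy data, the two broadest questions recovered are {a, e, i} and {k, p, t}, that is, a vowel class and a plosive class.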
Speech Recognition Using Augmented Conditional Random Fields
Cited by 11 (0 self)
Acoustic modeling based on hidden Markov models (HMMs) is employed by state-of-the-art stochastic speech recognition systems. Although HMMs are a natural choice to warp the time axis and model the temporal phenomena in the speech signal, their conditional independence properties limit their ability to model spectral phenomena well. In this paper, a new acoustic modeling paradigm based on augmented conditional random fields (ACRFs) is investigated and developed. This paradigm addresses some limitations of HMMs while maintaining many of the aspects which have made them successful. In particular, the acoustic modeling problem is reformulated in a data-driven, sparse, augmented space to increase discrimination. Acoustic context modeling is explicitly integrated to handle the sequential phenomena of the speech signal. We present an efficient framework for estimating these models that ensures scalability and generality. In the TIMIT ...
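The augmented CRF itself is not specified in the abstract. For background, here is a minimal sketch of a generic linear-chain CRF, the model family it builds on. It shows how such a model normalizes over whole label sequences given the acoustics, rather than factoring into per-frame generative terms as an HMM does. The score tables are hypothetical inputs, and nothing here reproduces the ACRF's augmented feature space.

```python
import math


def logsumexp(values):
    """log(sum(exp(v))) computed stably."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def crf_log_likelihood(unary, transition, labels):
    """Conditional log-likelihood log p(labels | x) for a linear-chain CRF.

    unary[t][s]: score of label s at frame t (computed from the acoustics x)
    transition[r][s]: score of moving from label r to label s
    labels: the reference label sequence, one label per frame
    """
    n_labels = len(unary[0])
    # Forward recursion in the log domain gives the log partition function log Z(x).
    alpha = list(unary[0])
    for t in range(1, len(unary)):
        alpha = [unary[t][s] + logsumexp([alpha[r] + transition[r][s] for r in range(n_labels)])
                 for s in range(n_labels)]
    log_z = logsumexp(alpha)
    # Unnormalized score of the reference path.
    path = unary[0][labels[0]]
    for t in range(1, len(labels)):
        path += transition[labels[t - 1]][labels[t]] + unary[t][labels[t]]
    return path - log_z


U = [[0.5, -0.2], [0.1, 0.3], [-0.4, 0.8]]  # 3 frames, 2 labels (toy values)
T = [[0.2, -0.1], [-0.3, 0.4]]
print(crf_log_likelihood(U, T, [0, 1, 1]))
```

Because log Z(x) sums over every label path, the probabilities of all label sequences for a given input add up to one.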
Data Based Filter Design for RASTA-like Channel Normalization in ASR
- In ICSLP, 1996
Cited by 10 (2 self)
RASTA processing has proven to be a successful technique for channel normalization in automatic speech recognition (ASR). We present two approaches to the design of RASTA-like filters from training data. One consists of finding the solution to a constrained optimization problem on the feature time trajectories, while the other uses Linear Discriminant Analysis (LDA). Whereas LDA is often applied to one or a few frames of the feature vectors, we apply LDA to feature time trajectories. Both approaches result in similar filters, which are consistent with the ad hoc designed RASTA filter. 1. Introduction: Relatively unstructured data-driven systems are the mainstream in today's ASR. These systems acquire their structure from large amounts of training data and are susceptible to failure when used in conditions that assume a different structure than that acquired during training. It is our belief that more knowledge-constrained and structured designs will result in simpler and ultimatel...
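For reference, the ad hoc RASTA filter that the data-derived filters are compared against is commonly quoted as the band-pass H(z) = 0.1 z⁴ (2 + z⁻¹ − z⁻³ − 2z⁻⁴) / (1 − 0.98 z⁻¹), applied to each log-spectral trajectory. Some implementations use a pole of 0.94 instead. The sketch below assumes that form and drops the z⁴ advance, so the output lags by four frames. Because the numerator taps sum to zero, a constant offset from a fixed convolutional channel is removed in steady state.

```python
import math


def rasta_filter(trajectory, pole=0.98):
    """Filter one feature's time trajectory with the classic RASTA band-pass.

    Causal form of H(z) = 0.1 * (2 + z^-1 - z^-3 - 2 z^-4) / (1 - pole * z^-1);
    samples before the start of the trajectory are taken to be zero.
    """
    numerator = [0.2, 0.1, 0.0, -0.1, -0.2]  # taps sum to zero: no DC gain
    out = []
    prev_y = 0.0
    for n in range(len(trajectory)):
        acc = 0.0
        for k, b in enumerate(numerator):
            if n - k >= 0:
                acc += b * trajectory[n - k]
        prev_y = pole * prev_y + acc  # single-pole recursive (integrating) part
        out.append(prev_y)
    return out


# Hypothetical log-spectral trajectory, with and without a constant channel offset.
clean = [math.sin(0.3 * n) for n in range(500)]
shifted = [x + 3.0 for x in clean]
gap = abs(rasta_filter(shifted)[-1] - rasta_filter(clean)[-1])
print(gap)  # small: the offset's contribution has decayed
```

The offset's contribution decays geometrically with the pole, so the pole value trades off how quickly channel effects are forgotten against how much slow speech modulation is preserved.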
A Comparative Study Of Linear Feature Transformation Techniques For Automatic Speech Recognition
- in Proc. Int. Conf. on Spoken Language Processing, 1996
Cited by 10 (1 self)
Although Linear Discriminant Analysis (LDA) is widely used, there are still open questions concerning which of its properties account for its success in many speech recognition systems. To gain more insight into the nature of the transformation, we compare LDA with mel-cepstral feature vectors with respect to the following criteria: decorrelation and ordering property, invariance under linear transforms, automatic learning of dynamical features, and data dependence of the transformation.
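The data dependence of LDA is easiest to see in the two-class case, where the discriminant direction is w = S_w⁻¹ (m_a − m_b). The minimal 2-D sketch below, on hypothetical toy data, shows LDA ignoring a high-variance dimension that carries no class information. A variance-driven transform such as PCA would favour that dimension.

```python
def mean(vectors):
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]


def scatter(vectors, m):
    """2x2 scatter matrix: sum of (x - m)(x - m)^T over the class."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for v in vectors:
        d = [v[0] - m[0], v[1] - m[1]]
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    return s


def fisher_direction(class_a, class_b):
    """Two-class LDA (Fisher) direction w = S_w^{-1} (m_a - m_b) for 2-D data."""
    ma, mb = mean(class_a), mean(class_b)
    sa, sb = scatter(class_a, ma), scatter(class_b, mb)
    sw = [[sa[i][j] + sb[i][j] for j in range(2)] for i in range(2)]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    inv = [[sw[1][1] / det, -sw[0][1] / det], [-sw[1][0] / det, sw[0][0] / det]]
    diff = [ma[0] - mb[0], ma[1] - mb[1]]
    return [inv[0][0] * diff[0] + inv[0][1] * diff[1],
            inv[1][0] * diff[0] + inv[1][1] * diff[1]]


# Classes differ only along x; y has larger spread but no class information.
a = [[0.0, 0.0], [0.0, 4.0], [1.0, 2.0], [-1.0, 2.0]]
b = [[3.0, 0.0], [3.0, 4.0], [4.0, 2.0], [2.0, 2.0]]
w = fisher_direction(a, b)
print(w)  # [-0.75, 0.0]: the projection uses x only
```

With more classes, LDA takes the leading eigenvectors of S_w⁻¹ S_b instead of a single direction, but the dependence on the labeled training data is the same.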

