Using Mutual Information To Design Feature Combinations, 2000
Cited by 14 (1 self)
Combination of different feature streams is a well-established method for improving speech recognition performance. This empirical success, however, poses theoretical problems when trying to design combination systems: is it possible to predict which feature streams will combine most advantageously, and which of the many possible combination strategies will be most successful for the particular feature streams in question? We approach these questions with the tool of conditional mutual information (CMI), estimating the amount of information that one feature stream contains about the other, given knowledge of the correct subword unit label. We argue that CMI of the raw feature streams should be useful in deciding whether to use independent or conjoint acoustic models for the streams; this is only weakly supported by our results. We also argue that CMI between the outputs of independent classifiers based on each stream should help predict which streams can be combined most beneficially. ...
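The abstract relies on conditional mutual information, I(X; Y | C) = Σ p(x, y, c) log2[p(c) p(x, y, c) / (p(x, c) p(y, c))]. The sketch below gives a plug-in estimate of it from discrete samples. The abstract does not say how the authors estimated CMI, so this sketch makes two assumptions. First, both streams have already been reduced to discrete symbols, for example quantized feature vectors or the per-frame decisions of the two independent classifiers. Second, C is the reference subword-unit label. The function name is hypothetical, and plug-in estimates are biased upward on small samples.

```python
from collections import Counter
from math import log2


def conditional_mutual_information(xs, ys, cs):
    """Plug-in estimate of I(X; Y | C) in bits (hypothetical helper).

    xs, ys: per-frame discrete symbols from the two streams, e.g. quantized
            features or the decisions of two independent classifiers.
    cs:     per-frame reference subword-unit labels.
    """
    if not xs or not (len(xs) == len(ys) == len(cs)):
        raise ValueError("need non-empty sequences of equal length")
    n = len(xs)
    n_xyc = Counter(zip(xs, ys, cs))
    n_xc = Counter(zip(xs, cs))
    n_yc = Counter(zip(ys, cs))
    n_c = Counter(cs)
    cmi = 0.0
    for (x, y, c), k in n_xyc.items():
        # p(x,y,c) * log2[p(c) p(x,y,c) / (p(x,c) p(y,c))], written with counts;
        # the factors of n cancel inside the logarithm.
        cmi += (k / n) * log2(k * n_c[c] / (n_xc[(x, c)] * n_yc[(y, c)]))
    return cmi
```

Take xs = ys = [0, 1, 0, 1] with a constant label cs = [0, 0, 0, 0]. The estimate is 1 bit. With cs = [0, 1, 0, 1] it drops to 0, because the label already accounts for all of the agreement between the two streams.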
From Multi-Band Full Combination to Multi-Stream Full Combination Processing in Robust ASR, 2000
Cited by 6 (4 self)
The multi-band processing paradigm for noise robust ASR was originally motivated by the observation that human recognition appears to be based on independent processing of separate frequency sub-bands, and also by "missing data" results which have shown that ASR can be made significantly more robust to band-limited noise if noisy sub-bands can be detected and then ignored. Of the different multi-band models which have been proposed, only the "Full Combination" or "all-wise" multi-band HMM/ANN hybrid approach allows us to consistently overcome the difficult problem of deciding which sub-bands are noisy, by integrating over all possible positions of noisy sub-bands. While this system has performed better than any other multi-band system which we have tested, we have also found that it only shows significantly improved robustness to noise when the noise is strongly band-limited. In real noise environments this is rarely the case. An alternative paradigm for noise robust ASR is multi-stream, a...
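The full-combination idea of integrating over all possible positions of the noisy sub-bands can be written as P(q | x) = Σ_S w_S P(q | x_S). Here S ranges over every subset of sub-bands, x_S is the data restricted to S, and w_S is the probability that S is exactly the set of clean bands. The minimal sketch below assumes the per-subset expert posteriors and the weights w_S are already available. The abstract does not describe how they are trained or estimated, and the function names are hypothetical. The empty subset stands for an expert that sees no data, so its "posterior" is simply the class prior.

```python
from itertools import combinations


def all_subsets(num_bands):
    """Every subset of sub-band indices, from the empty set to all bands."""
    return [frozenset(s)
            for r in range(num_bands + 1)
            for s in combinations(range(num_bands), r)]


def full_combination(posteriors, weights):
    """Full-combination posterior P(q | x) = sum_S w_S * P(q | x_S).

    posteriors: maps each subset S (frozenset of band indices) to the class
                posteriors of an expert trained on exactly those bands.
    weights:    maps each subset S to the probability that S is the set of
                clean bands; the weights must sum to 1.
    """
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("subset weights must sum to 1")
    num_classes = len(next(iter(posteriors.values())))
    combined = [0.0] * num_classes
    for subset, w in weights.items():
        for k, p in enumerate(posteriors[subset]):
            combined[k] += w * p
    return combined


# Two sub-bands -> four experts; the empty subset falls back to the class prior.
subsets = all_subsets(2)
experts = {
    frozenset(): [0.5, 0.5],
    frozenset({0}): [0.9, 0.1],
    frozenset({1}): [0.2, 0.8],
    frozenset({0, 1}): [0.8, 0.2],
}
uniform = {s: 1 / len(subsets) for s in subsets}
combined = full_combination(experts, uniform)  # approximately [0.6, 0.4]
```

Putting all the weight on {0} reproduces the band-0 expert. Spreading the weight uniformly averages all four experts. With B sub-bands this scheme needs 2^B experts, one per subset.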
PUBLISHED AS, 2003
Cited by 1 (0 self)
... for his love and his continuous support in good and bad times throughout this thesis. To Laura Lou, for her smiles and the energy they gave me when I needed it most. To my parents, for their perspective about the relative importance of a thesis and other things in life.
State-of-the-art automatic speech recognition (ASR) techniques are typically based on hidden Markov models (HMMs) for the modeling of temporal sequences of feature vectors extracted from the speech ...
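The thesis abstract opens with the standard HMM framing. To show how an HMM scores a sequence of feature vectors, the sketch below computes log p(o_1, ..., o_T) with the textbook forward recursion in the log domain. It assumes per-frame emission log-likelihoods are supplied as a T-by-N table. The emission model is therefore left open: it could be Gaussian mixtures, or scaled posteriors from a neural network in a hybrid HMM/ANN system. The function names are ours, not the thesis's.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def forward_log_likelihood(log_init, log_trans, log_emit):
    """log p(o_1..o_T) for an HMM via the forward recursion.

    log_init[j]:     log P(state j at t = 1)
    log_trans[i][j]: log P(state j at t + 1 | state i at t)
    log_emit[t][j]:  log p(o_t | state j); how it is computed is left open.
    """
    n_states = len(log_init)
    alpha = [log_init[j] + log_emit[0][j] for j in range(n_states)]
    for t in range(1, len(log_emit)):
        # alpha_t(j) = [sum_i alpha_{t-1}(i) * a_ij] * b_j(o_t), in log space
        alpha = [
            logsumexp([alpha[i] + log_trans[i][j] for i in range(n_states)])
            + log_emit[t][j]
            for j in range(n_states)
        ]
    return logsumexp(alpha)
```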
Significance of Joint Features Derived from the Modified Group Delay Function in Speech Processing, 2007
Research Article, doi:10.1155/2007/79032
This paper investigates the significance of combining cepstral features derived from the modified group delay function with those derived from the short-time spectral magnitude, such as the MFCC. The conventional group delay function fails to capture the resonant structure and the dynamic range of the speech spectrum, primarily because pitch periodicity effects introduce spikes into it. The group delay function is modified to suppress these spikes and to restore the dynamic range of the speech spectrum. The cepstral features derived from the modified group delay function are called the modified group delay feature (MODGDF). The complementarity and robustness of the MODGDF relative to the MFCC are also analyzed using spectral reconstruction techniques. The paper describes combining several spectral magnitude-based features with the MODGDF, using both feature fusion and likelihood combination. These features are then used for three speech processing tasks: syllable, speaker, and language recognition. Results indicate that combining the MODGDF with the MFCC at the feature level gives significant improvements for speech recognition tasks in noise. Combining the MODGDF with the spectral magnitude-based features increases recognition performance significantly, by 11% at best, while combining any two features derived from the spectral magnitude does not give any significant improvement.
Copyright © 2007 Rajesh M. Hegde et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
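The abstract describes the modification only in words. The minimal sketch below assumes the formulation usually given for the MODGDF in related work on this feature; this abstract does not restate it. Let X(k) be the DFT of a frame x[n], Y(k) the DFT of n·x[n], and S(k) a cepstrally smoothed version of |X(k)|. The steps are:

1. Compute τ(k) = (X_R Y_R + X_I Y_I) / S(k)^(2γ).
2. Compress it as τ_m(k) = sign(τ(k)) |τ(k)|^α.
3. Take a DCT to obtain cepstral coefficients.

Replacing |X(k)|^2 by the smoothed S(k)^(2γ) is intended to suppress the pitch-induced spikes. The exponent α is meant to restore a usable dynamic range. The parameter values (alpha, gamma, lifter length, number of coefficients) are illustrative assumptions, not the paper's settings. The naive DFT is there only to keep the example self-contained.

```python
import cmath
import math


def dft(seq):
    """Naive O(N^2) DFT; adequate for a short illustrative frame."""
    n = len(seq)
    return [sum(seq[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def smoothed_magnitude(spectrum, lifter):
    """Cepstrally smoothed |X(k)|: keep only the low-quefrency cepstrum."""
    n = len(spectrum)
    log_mag = [math.log(abs(v) + 1e-12) for v in spectrum]
    # Real cepstrum; for real input, Re(DFT)/n equals Re(inverse DFT).
    ceps = [c.real / n for c in dft(log_mag)]
    # Symmetric lifter: keep quefrencies 0..lifter-1 and their mirror images.
    kept = [c if (q < lifter or q > n - lifter) else 0.0
            for q, c in enumerate(ceps)]
    return [math.exp(v.real) for v in dft(kept)]


def modgdf(frame, alpha=0.4, gamma=0.9, lifter=8, num_ceps=13):
    """Modified group delay cepstral features for one (windowed) frame.

    alpha, gamma, lifter and num_ceps are illustrative, assumed values.
    """
    n = len(frame)
    x_spec = dft(frame)
    y_spec = dft([t * v for t, v in enumerate(frame)])  # DFT of n * x[n]
    s_spec = smoothed_magnitude(x_spec, lifter)
    half = n // 2 + 1  # non-negative frequency bins of a real frame
    tau_m = []
    for k in range(half):
        # Group delay numerator, with the smoothed spectrum S(k)^(2*gamma)
        # replacing |X(k)|^2 in the denominator.
        tau = (x_spec[k].real * y_spec[k].real
               + x_spec[k].imag * y_spec[k].imag) / s_spec[k] ** (2 * gamma)
        # Compress the dynamic range while keeping the sign.
        tau_m.append(math.copysign(abs(tau) ** alpha, tau))
    # DCT-II of the modified group delay spectrum gives cepstral coefficients.
    return [sum(v * math.cos(math.pi * m * (k + 0.5) / half)
                for k, v in enumerate(tau_m))
            for m in range(num_ceps)]
```

Consider a delayed impulse δ[n - 3] in a 32-sample frame. Its group delay is the constant 3, so only the zeroth coefficient is non-zero: 17 · 3^0.4, since the DCT runs over 17 non-negative frequency bins. For the feature-level fusion mentioned in the abstract, one would, for instance, append these coefficients to the MFCC vector of the same frame.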

