Prosodic Cues to Recognition Errors
In Proceedings of the Automatic Speech Recognition and Understanding Workshop, 1999
Cited by 24 (7 self)
We identify methods of distinguishing between correctly and incorrectly recognized utterances (scored by hand for semantic concept accuracy) for a speech recognition system, using acoustic/prosodic characteristics. The analysis was performed on data collected during independent experiments done with an interactive voice response system that provides travel information over the phone.

1. INTRODUCTION

There has been little research in the field of automatic speech recognition (ASR) on the question of how misrecognized utterances differ from correctly recognized utterances. Recognition performance is known to vary depending upon the relative formality or casualness of speaking style [14], but there has been little attempt to identify this variation precisely. An exception is a study of the effect of speaking style on recognition performance in the Switchboard Corpus in which a standard recognition system was augmented with a conditioning variable, the speaking style (mode) [8]. Lexical ...
Acoustic Model Clustering Based on Syllable Structure
2002
Cited by 2 (0 self)
Current speech recognition systems perform poorly on conversational speech as compared to read speech, arguably due to the large acoustic variability inherent in conversational speech. Our hypothesis is that there are systematic effects in local context, associated with syllabic structure, that are not being captured in the current acoustic models. Such variation may be modeled using a broader definition of context than in traditional systems which restrict context to be the neighboring phonemes. In this paper, we study the use of word- and syllable-level context conditioning in recognizing conversational speech. We describe a method to extend standard tree-based clustering to incorporate a large number of features, and we report results on the Switchboard task which indicate that syllable structure outperforms pentaphones and incurs less computational cost. It has been hypothesized that previous work in using syllable models for recognition of English was limited because of ignoring the phenomenon of re-syllabification (change of syllable structure at word boundaries), but our analysis shows that accounting for re-syllabification does not impact recognition performance.
Indicator Variable Dependent Output Probability Modelling via Continuous Posterior Functions
This paper investigates the problem of inserting an additional hidden variable into a standard HMM. It is shown that this can be done by introducing a continuous feature which is used to calculate the probability of observing the different states of the hidden variable. The posteriors are modelled by softmax functions with polynomial exponents and an efficient method is developed for reestimating their parameters. After analysing a two dimensional reestimation example on artificial data, the proposed HMM is evaluated on the 1997 Broadcast News task with a particular focus on spontaneous speech. To derive a good indicator variable for this purpose, classification experiments are carried out on fast and slow classes of phones on the 1997 Broadcast News training data. Finally, recognition experiments on the test set of this task show that the proposed model gives an improvement over a standard HMM with a comparable number of parameters.
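As a rough illustration of the idea in this abstract, the sketch below computes posteriors over the states of a hidden variable from one continuous feature, using a softmax whose exponents are polynomials in that feature. The abstract does not give the exact parameterisation, so the function names, the coefficient layout, and the posterior-weighted sum used for the output probability are all assumptions made for illustration, not the paper's formulation.

```python
import math

def state_posteriors(x, coeffs):
    """Softmax over polynomial scores (illustrative assumption).

    coeffs[k][j] is the hypothetical coefficient of x**j for hidden state k,
    so P(state k | x) is proportional to exp(sum_j coeffs[k][j] * x**j).
    """
    scores = [sum(c * x**j for j, c in enumerate(ck)) for ck in coeffs]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def output_probability(likelihoods, x, coeffs):
    """Assumed combination: weight each state-conditional likelihood
    by the posterior of that hidden state given the feature x."""
    posteriors = state_posteriors(x, coeffs)
    return sum(p * b for p, b in zip(posteriors, likelihoods))

# Two hidden states with first-order polynomial exponents (made-up values).
coeffs = [[0.0, 1.0], [0.0, -1.0]]
print(state_posteriors(0.0, coeffs))
print(output_probability([0.2, 0.6], 0.0, coeffs))
```

With these made-up coefficients, a feature value of zero leaves both states equally likely, and larger values shift the posterior mass towards the first state.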

