Error-responsive feedback mechanisms for speech recognizers
1997
Cited by 37 (4 self)
This thesis is about modeling, analyzing, and predicting errorful behavior in large vocabulary continuous speech recognition systems. Because today's state-of-the-art recognizers are not designed to be situated naturally in an error feedback loop, they are ill-positioned for inclusion in multi-modal interfaces, multi-media databases, and other interesting applications. I make improvements to the current approach to predicting and analyzing error behaviors, which is currently based only on the measurement of word error rate. The speech recognizer's functionality is extended to include confidence annotations, which are "meta-level" markings that indicate how certain the recognizer is that it has decoded its input correctly. This is accomplished by feeding externally defined error conditions back to the recognizer. Error feedback enables the construction of statistical models that map measurements of the recognizer's internal states and behaviors to externally defined error conditions.
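For reference, the word error rate mentioned in this abstract is conventionally computed as the word-level edit distance (substitutions, deletions, and insertions) between a reference transcript and the recognizer's hypothesis, divided by the number of reference words. The sketch below illustrates that standard definition only; the function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration and are not taken from the thesis.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Assumption: words are separated by whitespace and compared exactly.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# One substitution ("sat" -> "sit") and one deletion ("the") over 6 reference words.
wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
```

A single aggregate figure like this says nothing about which individual words are wrong, which is the gap that per-word confidence annotations are meant to address.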
Word And Acoustic Confidence Annotation For Large Vocabulary Speech Recognition
Cited by 23 (0 self)
We present improvements in confidence annotation of automatic speech recognizer output for large vocabulary, speaker-independent systems. Several strong additions to the set of predictor variables used for this purpose are discussed. Extensions which allow prediction of separate types of errors, as opposed to the simple presence of an error, are presented. A new development, acoustic confidence annotation, is explored, in which a predictor is built that indicates the likely successes and failures of the acoustic models alone. Four separate learning mechanisms are compared in terms of their ability to provide good confidence annotations from the same set of predictor variables. Performance figures are reported on both read news (the North American Business news corpus) and conversational telephone speech (the Switchboard corpus), both in American English. The Sphinx-II system [1] is used for the NAB tests. The Janus system [2] is used for the Switchboard tests. 1. Annotation of Read Spe...
Integration of Continuous Speech Recognition and Information Retrieval for Mutually Optimal Performance
- COMPUTER SCIENCE DEPARTMENT, CARNEGIE MELLON UNIVERSITY. HTTP://WWW.CS.CMU.EDU/~MSIEGLER/PUBLISH/PHD/THESIS.PS.GZ SINGHAL
1999
Cited by 15 (1 self)
Traditionally, indexing and searching of speech content in multimedia databases have been achieved through a combination of separately constructed speech recognition and information retrieval engines. Although each technology has a legacy of research, only recently have efforts been made to study the potential suboptimality of this strategy, and none of these efforts specifically addresses the presence of uncertainty in automatically generated transcriptions. This research develops a refinement of the most common information retrieval relevance formula, TFIDF, to incorporate uncertainty as a retrieval feature, along with a set of techniques to acquire this uncertainty from multiple hypotheses produced by existing speech recognition data structures. In the process a greater amount of evidence is extracted than is available in the most likely transcription hypothesis, and overall retrieval precision and recall are improved. The term weighting scheme known as the inverse document frequenc...
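One way to picture uncertainty as a retrieval feature is to replace raw term counts with expected counts, where each recognized word contributes its confidence rather than a full count of 1. The sketch below is a minimal illustration under that assumption. It is not the formula developed in the thesis, and the names `expected_tf`, `smoothed_idf`, and `score`, the smoothed IDF variant, and the toy confidences are all hypothetical.

```python
import math
from collections import defaultdict


def expected_tf(doc):
    """Sum per-word confidences to get an expected term count.

    `doc` is a list of (word, confidence) pairs, confidence in [0, 1].
    """
    tf = defaultdict(float)
    for word, confidence in doc:
        tf[word] += confidence
    return dict(tf)


def smoothed_idf(docs_tf):
    """Smoothed inverse document frequency: ln((1 + N) / (1 + df)) + 1."""
    n_docs = len(docs_tf)
    df = defaultdict(int)
    for tf in docs_tf:
        for word in tf:
            df[word] += 1
    return {w: math.log((1 + n_docs) / (1 + d)) + 1.0 for w, d in df.items()}


def score(query_terms, tf, idf_weights):
    """Sum of expected-count TF times IDF over the query terms."""
    return sum(tf.get(w, 0.0) * idf_weights.get(w, 0.0) for w in query_terms)


# Toy recognizer output: each word carries a hypothetical confidence.
docs = [
    [("stock", 0.9), ("market", 0.8), ("stock", 0.4)],
    [("weather", 0.95), ("market", 0.3)],
]
docs_tf = [expected_tf(d) for d in docs]
weights = smoothed_idf(docs_tf)
scores = [score(["stock", "market"], tf, weights) for tf in docs_tf]
```

With this weighting, a low-confidence occurrence of a query term still contributes evidence, but less than a confidently recognized one. Confidences pooled from several recognition hypotheses could be fed into the same `expected_tf` step.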
Experiments In Information Retrieval From Spoken Documents
1998
Cited by 6 (0 self)
This paper describes the experiments performed as part of the TREC-97 Spoken Document Retrieval Track. The task was to pick the correct document from 35 hours of recognized speech documents, based on a text query describing exactly one document. Among the experiments we describe here are: vocabulary size experiments to assess the effect of words missing from the speech recognition vocabulary; experiments with speech recognition using a stemmed language model; using confidence annotations that estimate the correctness of each recognized word; and using multiple hypotheses from the recognizer. Finally, we also measured the effects of corpus size on the SDR task. Despite fairly high word error rates, information retrieval performance was only slightly degraded for speech recognizer transcribed documents. 1. INTRODUCTION For the first time, the 1997 Text REtrieval Conference (TREC97) included an evaluation track for information retrieval on spoken documents. In this paper, we describe ...

