TRIPS: An Integrated Intelligent Problem-Solving Assistant
- In Proc. 15th Nat. Conf. AI, 1998
- Cited by 76 (11 self)
We discuss what constitutes an integrated system in AI, and why AI researchers should be interested in building and studying them. Taking integrated systems to be ones that integrate a variety of components in order to perform some task from start to finish, we believe that such systems (a) allow us to better ground our theoretical work in actual tasks, and (b) provide an opportunity for much-needed evaluation based on task performance. We describe one particular integrated system we have developed that supports spoken-language dialogue to collaboratively solve planning problems. We discuss how the integrated system provides key advantages for our work both in natural language dialogue processing and in interactive planning and problem solving, and consider the opportunities such an approach affords for the future.
Content areas: AI systems, natural language understanding, planning and control, problem solving, user interfaces
Robust Understanding in a Dialogue System
- In 34th Meeting of the Association for Computational Linguistics, 1995
- Cited by 29 (2 self)
This paper describes a system that leads us to believe in the feasibility of constructing natural spoken dialogue systems in task-oriented domains. It specifically addresses the issue of robust interpretation of speech in the presence of recognition errors. Robustness is achieved by a combination of statistical error post-correction, syntactically- and semantically-driven robust parsing, and extensive use of the dialogue context. We present an evaluation of the system using time-to-completion and the quality of the final solution that suggests that most native speakers of English can use the system successfully with virtually no training.
The SRI March 2000 Hub-5 conversational speech transcription system
- In Proceedings of the NIST Speech Transcription Workshop, 2000
- Cited by 26 (6 self)
We describe SRI’s large vocabulary conversational speech recognition system as used in the March 2000 NIST Hub-5E evaluation. The system performs four recognition passes: (1) bigram recognition with phone-loop-adapted, within-word triphone acoustic models, (2) lattice generation with transcription-mode-adapted models, (3) trigram lattice recognition with adapted cross-word triphone models, and (4) N-best rescoring and reranking with various additional knowledge sources. The system incorporates two new kinds of acoustic model: triphone models conditioned on speaking rate, and an explicit joint model of within-word phone durations. We also obtained an unusually large improvement from modeling cross-word pronunciation variants in “multiword” vocabulary items. The language model (LM) was enhanced with an “anti-LM” representing acoustically confusable word sequences. Finally, we applied a generalized ROVER algorithm to combine the N-best hypotheses from several systems based on different acoustic models.
A Fertility Channel Model for Post-Correction of Continuous Speech Recognition
- In Proceedings of the Fourth International Conference on Spoken Language Processing (ICSLP-96), 1996
- Cited by 17 (3 self)
We have implemented a post-processor called SPEECHPP to correct word-level errors committed by an arbitrary speech recognizer. Applying a noisy-channel model, SPEECHPP uses a Viterbi beam-search that employs language and channel models. Previous work demonstrated that a simple word-for-word channel model was sufficient to yield substantial increases in word accuracy. This paper demonstrates that some improvements in word accuracy result from augmenting the channel model with an account of word fertility in the channel. This work further demonstrates that a modern continuous speech recognizer can be used in “black-box” fashion for robustly recognizing speech for which the recognizer was not originally trained. This work also demonstrates that in the case where the recognizer can be tuned to the new task, environment, or speaker, the post-processor can also contribute to performance improvements.
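The word-for-word noisy-channel idea in this abstract can be illustrated with a minimal sketch. It is not the SPEECHPP implementation: it runs an exhaustive Viterbi search with no beam pruning, omits the fertility model entirely, and every name and probability table (`viterbi_correct`, `channel`, `bigram`, `floor`) is a hypothetical toy input.

```python
import math


def viterbi_correct(observed, channel, bigram, vocab, floor=1e-6):
    """Return the most likely intended word sequence for `observed`.

    channel[(intended, observed_word)] -> P(observed_word | intended)
    bigram[(prev, word)]               -> P(word | prev), "<s>" marks the start
    Both tables are toy assumptions; unseen events get probability `floor`.
    """
    # best[w] = (log-probability, path) of the best path ending in intended word w
    best = {"<s>": (0.0, [])}
    for obs in observed:
        nxt = {}
        for w in vocab:
            emit = math.log(channel.get((w, obs), floor))
            # Pick the best predecessor for intended word w.
            score, path = max(
                (
                    (s + math.log(bigram.get((prev, w), floor)) + emit, p)
                    for prev, (s, p) in best.items()
                ),
                key=lambda t: t[0],
            )
            nxt[w] = (score, path + [w])
        best = nxt
    return max(best.values(), key=lambda t: t[0])[1]


# Toy example: the recognizer output "two" where "to" was intended.
vocab = ["i", "want", "to", "two", "go"]
channel = {("i", "i"): 0.9, ("want", "want"): 0.9, ("go", "go"): 0.9,
           ("to", "two"): 0.4, ("two", "two"): 0.6}
bigram = {("<s>", "i"): 0.5, ("i", "want"): 0.5, ("want", "to"): 0.6,
          ("want", "two"): 0.01, ("to", "go"): 0.5, ("two", "go"): 0.01}
print(viterbi_correct(["i", "want", "two", "go"], channel, bigram, vocab))
# prints ['i', 'want', 'to', 'go']
```

The channel table scores how likely each recognizer output is given an intended word, and the bigram table scores the intended sequence itself; the search combines the two in log space.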
Corrective Language Modeling For Large Vocabulary ASR With The Perceptron Algorithm
- In Proc. ICASSP, 2004
- Cited by 16 (3 self)
This paper investigates error-corrective language modeling using the perceptron algorithm on word lattices. The resulting model is encoded as a weighted finite-state automaton, and is used by intersecting the model with word lattices, making it simple and inexpensive to apply during decoding. We present results for various training scenarios for the Switchboard task, including using n-gram features of different orders, and performing n-best extraction versus using full word lattices. We demonstrate the importance of making the training conditions as close as possible to testing conditions. The best approach yields a 1.3 percent improvement in first-pass accuracy, which translates to a 0.5 percent improvement after other rescoring passes.
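A rough sketch of perceptron training over n-gram features follows. It covers only the simpler n-best variant the abstract mentions, not the lattice or finite-state automaton encoding, and it leaves out the baseline recognizer score that such a model would normally use as a feature. The function names and the `(candidates, oracle_index)` data layout are assumptions made for illustration.

```python
from collections import Counter


def ngram_features(words, order=2):
    """Count all n-grams up to `order`, with sentence boundary padding."""
    feats = Counter()
    padded = ["<s>"] + list(words) + ["</s>"]
    for n in range(1, order + 1):
        for i in range(len(padded) - n + 1):
            feats[tuple(padded[i:i + n])] += 1
    return feats


def score(weights, feats):
    """Linear model score: dot product of weights and feature counts."""
    return sum(weights.get(f, 0.0) * v for f, v in feats.items())


def train_perceptron(nbest_lists, epochs=5, order=2):
    """Train n-gram weights from n-best lists.

    nbest_lists: list of (candidates, oracle_index), where candidates is a
    list of word lists and oracle_index points at the lowest-error candidate.
    """
    weights = {}
    for _ in range(epochs):
        for candidates, oracle in nbest_lists:
            feats = [ngram_features(c, order) for c in candidates]
            guess = max(range(len(candidates)),
                        key=lambda i: score(weights, feats[i]))
            if guess != oracle:
                # Standard perceptron update: reward the oracle's features,
                # penalize the features of the wrongly preferred candidate.
                for f, v in feats[oracle].items():
                    weights[f] = weights.get(f, 0.0) + v
                for f, v in feats[guess].items():
                    weights[f] = weights.get(f, 0.0) - v
    return weights
```

At test time the learned weights rerank candidates by `score`; matching the candidate lists used in training to those seen at test time reflects the abstract's point about keeping training and testing conditions close.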
Learning N-Best Correction Models from Implicit User Feedback in a Multi-Modal Local Search Application
- Cited by 3 (1 self)
We describe a novel n-best correction model that can leverage implicit user feedback (in the form of clicks) to improve performance in a multi-modal speech-search application. The proposed model works in two stages. First, the n-best list generated by the speech recognizer is expanded with additional candidates, based on confusability information captured via user click statistics. In the second stage, this expanded list is rescored and pruned to produce a more accurate and compact n-best list. Results indicate that the proposed n-best correction model leads to significant improvements over the existing baseline, as well as other traditional n-best rescoring approaches.
Improving Automatic Speech Recognition for Lectures through Transformation-based Rules Learned from Minimal Data
- Cited by 2 (0 self)
We demonstrate that transformation-based learning can be used to correct noisy speech recognition transcripts in the lecture domain with an average word error rate reduction of 12.9%. Our method is distinguished from earlier related work by its robustness to small amounts of training data, and its resulting efficiency, in spite of its use of true word error rate computations as a rule scoring function.
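The rule scoring function mentioned here depends on word error rate. As a minimal sketch, WER can be computed as the word-level edit distance divided by the reference length; the function name and the whitespace tokenization are assumptions, not details from the paper.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the first i-1 reference words
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution or match
        prev = cur
    return prev[-1] / len(ref)


print(word_error_rate("a b c d", "a x c"))  # prints 0.5
```

A candidate correction rule could then be scored by how much it lowers this value on held-out transcripts.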
A Fertility Channel Model For Post-Correction Of Continuous Speech Recognition
- In Proceedings of the Fourth International Conference on Spoken Language Processing (ICSLP-96)
We have implemented a post-processor called SPEECHPP to correct word-level errors committed by an arbitrary speech recognizer. Applying a noisy-channel model, SPEECHPP uses a Viterbi beam-search that employs language and channel models. Previous work demonstrated that a simple word-for-word channel model was sufficient to yield substantial increases in word accuracy. This paper demonstrates that some improvements in word accuracy result from augmenting the channel model with an account of word fertility in the channel. This work further demonstrates that a modern continuous speech recognizer can be used in "black-box" fashion for robustly recognizing speech for which the recognizer was not originally trained. This work also demonstrates that in the case where the recognizer can be tuned to the new task, environment, or speaker, the post-processor can also contribute to performance improvements.
Post-Editing Error Correction Algorithm For Speech Recognition using Bing Spelling Suggestion
Automatic speech recognition (ASR) is the process of converting spoken speech into text that can be manipulated by a computer. Although ASR has several applications, it is still erroneous and imprecise, especially when used in a harsh environment in which the input speech is of low quality. This paper proposes a post-editing ASR error correction method and algorithm based on Bing’s online spelling suggestion. In this approach, the ASR recognized output text is spell-checked using Bing’s spelling suggestion technology to detect and correct misrecognized words. More specifically, the proposed algorithm breaks down the ASR output text into several word-tokens that are submitted as search queries to the Bing search engine. A returned spelling suggestion implies that a query is misspelled, and thus it is replaced by the suggested correction; otherwise, no correction is performed and the algorithm continues with the next token until all tokens are validated. Experiments carried out on various speeches in different languages indicated a successful decrease in the number of ASR errors and an improvement in the overall error correction rate. Future research can improve upon the proposed algorithm so that it can be parallelized to take advantage of multiprocessor computers.
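The token-by-token loop described in this abstract might look like the sketch below. The `suggest` callable is a hypothetical stand-in for whatever returns a spelling suggestion; no real search engine API is called or implied, and the paper's exact tokenization is not known from the abstract.

```python
def post_edit(text, suggest):
    """Replace each token with its spelling suggestion when one is returned.

    `suggest` is an assumed callable: it takes a token and returns either a
    corrected string or None when the token is considered valid.
    """
    corrected = []
    for token in text.split():
        suggestion = suggest(token)
        # A returned suggestion is treated as evidence of a misrecognition.
        corrected.append(suggestion if suggestion else token)
    return " ".join(corrected)


# Usage with a toy lookup table standing in for the suggestion service.
toy_suggestions = {"recieve": "receive"}
print(post_edit("please recieve this", toy_suggestions.get))
# prints "please receive this"
```

Because each token is checked independently, the loop could be split across workers, which matches the parallelization the abstract proposes as future work.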
A Natural Language Correction Model for Continuous Speech Recognition
We have developed a method of improving and controlling the accuracy of automated continuous speech recognition through linguistic postprocessing. In this approach, an output from a speech recognition system is passed to a trainable Correction Box module which attempts to locate and repair any transcription errors. The Correction Box consists of a text alignment program, a correction-rule generator, and a series of rule application and verification steps. In the training phase, the correction rules are learned by aligning the recognized speech samples with their original, fully correct versions, on a sentence-by-sentence basis. Misaligned sections give rise to candidate context-free correlation rules, e.g., from → frontal; there were made → the remainder, etc. Validation against a text corpus leads to context-sensitive correction rules, such as from view → frontal view. The system is applied to medical dictation in the area of clinical radiology.
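The alignment step that produces candidate rules can be sketched as follows. This is only an illustration of the idea, not the Correction Box itself: it uses Python's `difflib.SequenceMatcher` as the aligner, which is an assumption, and it stops at the context-free candidate stage without the corpus validation that yields context-sensitive rules.

```python
from collections import Counter
from difflib import SequenceMatcher


def candidate_rules(pairs):
    """Count (recognized phrase, correct phrase) rules from sentence pairs.

    pairs: iterable of (recognized_sentence, correct_sentence) strings.
    """
    rules = Counter()
    for recognized, correct in pairs:
        rec, ref = recognized.split(), correct.split()
        matcher = SequenceMatcher(a=rec, b=ref, autojunk=False)
        for tag, i1, i2, j1, j2 in matcher.get_opcodes():
            # Only substituted spans become candidate rewrite rules here.
            if tag == "replace":
                rules[(" ".join(rec[i1:i2]), " ".join(ref[j1:j2]))] += 1
    return rules


pairs = [("the from view shows", "the frontal view shows")]
print(candidate_rules(pairs))
# prints Counter({('from', 'frontal'): 1})
```

A rule such as from → frontal is too broad to apply everywhere, which is why the abstract describes a later validation step that adds context, as in from view → frontal view.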

