A Maximum Entropy Approach to Adaptive Statistical Language Modeling
Computer, Speech and Language, 1996
Cited by 201 (11 self)
An adaptive statistical language model is described, which successfully integrates long-distance linguistic information with other knowledge sources. Most existing statistical language models exploit only the immediate history of a text. To extract information from further back in the document's history, we propose and use trigger pairs as the basic information-bearing elements. This allows the model to adapt its expectations to the topic of discourse. Next, statistical evidence from multiple sources must be combined. Traditionally, linear interpolation and its variants have been used, but these are shown here to be seriously deficient. Instead, we apply the principle of Maximum Entropy (ME). Each information source gives rise to a set of constraints, to be imposed on the combined estimate. The intersection of these constraints is the set of probability functions which are consistent with all the information sources. The function with the highest entropy within that set is the ME solution...
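To make the combination step concrete, here is a minimal sketch of fitting a maximum-entropy distribution with Generalized Iterative Scaling (GIS), a standard ME training algorithm. It is a simplification, not the paper's model: it fits a distribution over next words for one fixed history rather than a conditional model over all histories, and the vocabulary, the two indicator features, and their target probabilities are hypothetical stand-ins for an n-gram source and a trigger source. Each (feature, target) pair is one constraint; GIS returns the exponential-form distribution that satisfies all of them, which is the highest-entropy member of their intersection.

```python
import math

def maxent_gis(outcomes, features, targets, iterations=100):
    """Fit p(x) proportional to exp(sum_i lam_i * f_i(x)) so that E_p[f_i] == targets[i].

    GIS requires sum_i f_i(x) to equal the same constant C for every outcome,
    so a slack feature is appended to enforce that. Assumes non-negative
    features, each firing on some outcome, and a strictly positive slack target.
    """
    C = max(sum(f(x) for f in features) for x in outcomes)
    feats = list(features) + [lambda x: C - sum(f(x) for f in features)]
    tgts = list(targets) + [C - sum(targets)]
    lam = [0.0] * len(feats)
    for _ in range(iterations):
        # Model distribution under the current weights.
        scores = [math.exp(sum(l * f(x) for l, f in zip(lam, feats))) for x in outcomes]
        z = sum(scores)
        p = [s / z for s in scores]
        # Simultaneous update: nudge each weight toward matching its target expectation.
        for i, f in enumerate(feats):
            expected = sum(pi * f(x) for pi, x in zip(p, outcomes))
            lam[i] += math.log(tgts[i] / expected) / C
    scores = [math.exp(sum(l * f(x) for l, f in zip(lam, feats))) for x in outcomes]
    z = sum(scores)
    return {x: s / z for x, s in zip(outcomes, scores)}

# Hypothetical constraints from two sources for a single fixed history.
vocab = ["stock", "market", "bank", "river"]
finance = lambda w: 1.0 if w in ("stock", "market") else 0.0  # stand-in for an n-gram source
bank = lambda w: 1.0 if w == "bank" else 0.0                  # stand-in for a trigger source
p = maxent_gis(vocab, [finance, bank], [0.6, 0.3])
print({w: round(q, 3) for w, q in p.items()})
# {'stock': 0.3, 'market': 0.3, 'bank': 0.3, 'river': 0.1}
```

No constraint distinguishes "stock" from "market", so the ME solution splits their combined 0.6 evenly and assigns the remaining 0.1 to "river": it commits to nothing beyond what the constraints require.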
Adaptive Statistical Language Modeling: A Maximum Entropy Approach
1994
Cited by 154 (5 self)
Language modeling is the attempt to characterize, capture and exploit regularities in natural language. In statistical language modeling, large amounts of text are used to automatically determine the model's parameters. Language modeling is useful in automatic speech recognition, machine translation, and any other application that processes natural language with incomplete knowledge. In this thesis, I view language as an information source which emits a stream of symbols from a finite alphabet (the vocabulary). The goal of language modeling is then to identify and exploit sources of information in the language stream, so as to minimize its perceived entropy. Most existing statistical language models exploit the immediate past only. To extract information from further back in the document's history, I use trigger pairs as the basic information-bearing elements. This allows the model to adapt its expectations to the topic of discourse. Next, statistical evidence from many sources must...
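Since the thesis frames the goal as minimizing the perceived entropy of the language stream, here is a small sketch of how that quantity is commonly measured on test text: per-word cross-entropy in bits, and perplexity, which is 2 raised to that value. The two probability lists are hypothetical, standing in for the probabilities two models assign to the words of the same 8-word text.

```python
import math

def cross_entropy_bits(probs):
    """Per-word cross-entropy in bits, given the model probability of each observed word."""
    return -sum(math.log2(p) for p in probs) / len(probs)

def perplexity(probs):
    """Perplexity is 2 ** (cross-entropy in bits): lower means the text looks less surprising."""
    return 2 ** cross_entropy_bits(probs)

# Hypothetical per-word probabilities for the same 8-word test text.
uniform = [0.25] * 8                                    # uniform over a 4-word vocabulary
adaptive = [0.25, 0.25, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5]   # e.g. sharper once the topic is known
print(perplexity(uniform))   # 4.0
print(perplexity(adaptive))  # about 2.38
```

A model that raises the probability of the words that actually occur, as a topic-adapted model aims to, lowers both numbers.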
Integrating Multiple Knowledge Sources For Detection And Correction Of Repairs In Human-Computer Dialog
1992
Cited by 84 (12 self)
We have analyzed 607 sentences of spontaneous human-computer speech data containing repairs, drawn from a total corpus of 10,718 sentences. We present here criteria and techniques for automatically detecting the presence of a repair, its location, and making the appropriate correction. The criteria involve integration of knowledge from several sources: pattern matching, syntactic and semantic analysis, and acoustics.
Automatic Detection and Correction of Repairs in Human-Computer Dialog
Proceedings of the DARPA Speech and Natural Language Workshop, 1992
Cited by 18 (2 self)
We have analyzed 607 sentences of spontaneous human-computer speech data containing repairs (drawn from a corpus of 10,718). We present here criteria and techniques for automatically detecting the presence of a repair, its location, and making the appropriate correction. The criteria involve integration of knowledge from several sources: pattern matching, syntactic and semantic analysis, and acoustics.
1. INTRODUCTION
Spontaneous spoken language often includes speech that is not intended by the speaker to be part of the content of the utterance. This speech must be detected and deleted in order to correctly identify the intended meaning. This broad class of disfluencies encompasses a number of phenomena, including word fragments, interjections, filled pauses, restarts, and repairs. We are analyzing the repairs in a large subset (over ten thousand sentences) of spontaneous speech data collected for the DARPA spoken language program. We have categorized these disfluencies as to type and...
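The pattern-matching knowledge source can be illustrated with a deliberately naive heuristic: if a word recurs within a few words, delete the span from its first occurrence up to the repeat. This is a hypothetical sketch; the functions `find_repair` and `correct_repair` and the `max_gap` window are illustrative and are not the authors' criteria.

```python
def find_repair(words, max_gap=3):
    """Return (start, end) of a candidate reparandum, or None.

    Hypothetical heuristic: if a word recurs with at most max_gap words in
    between, treat the span from the first occurrence up to (not including)
    the second occurrence as the material the speaker meant to replace.
    """
    for i, w in enumerate(words):
        for j in range(i + 1, min(len(words), i + 2 + max_gap)):
            if words[j] == w:
                return i, j
    return None

def correct_repair(words, max_gap=3):
    """Delete the first candidate reparandum, if any, and return the corrected word list."""
    span = find_repair(words, max_gap)
    if span is None:
        return list(words)
    start, end = span
    return words[:start] + words[end:]

utterance = "show me flights from boston from denver".split()
print(" ".join(correct_repair(utterance)))  # show me flights from denver
```

The same rule turns the fluent request "show flights to boston and flights to denver" into "show flights to denver". False alarms of this kind are why repetition patterns need to be weighed against syntactic, semantic, and acoustic evidence.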
Near-Miss Modeling: A Segment-Based Approach to Speech Recognition
1998
Cited by 15 (0 self)
Currently, most approaches to speech recognition are frame-based in that they represent speech as a temporal sequence of feature vectors. Although these approaches have been successful, they cannot easily incorporate complex modeling strategies that may further improve speech recognition performance. In contrast, segment-based approaches represent speech as a temporal graph of feature vectors and facilitate the incorporation of a wide range of modeling strategies. However, difficulties in segment-based recognition have impeded the realization of potential advantages in modeling. This thesis...
Multi-Site Data Collection for a Spoken Language Corpus: MADCOW
In Proceedings of the DARPA Speech and Natural Language Workshop, 1992
Cited by 15 (2 self)
This paper describes a recently collected spoken language corpus for the ATIS (Air Travel Information System) domain. This data collection effort has been co-ordinated by MADCOW (Multi-site ATIS Data COllection Working group). We summarize the motivation for this effort, the goals, the implementation of a multi-site data collection paradigm, and the accomplishments of MADCOW in monitoring the collection and distribution of 12,000 utterances of spontaneous speech from five sites for use in a multi-site common evaluation of speech, natural language and spoken language.
1. Introduction
Following the February 1991 DARPA Speech and Natural Language Workshop, the DARPA Spoken Language contractors decided to institute a multi-site data collection paradigm in order to:
• support a common evaluation on speech, natural language and spoken language;
• maximize the amount of data collected;
• provide some diversity in data collection paradigms;
• reduce cost to any one site by sharing...
The Effect of Speech Recognition Accuracy Rates on the Usefulness and Usability of Webcast Archives
2006
Cited by 15 (7 self)
The widespread availability of broadband connections has led to an increase in the use of Internet broadcasting (webcasting). Most webcasts are archived and accessed numerous times retrospectively. In the absence of transcripts of what was said, users have difficulty searching and scanning for specific topics. This research investigates user needs for transcription accuracy in webcast archives, and measures how the quality of transcripts affects user performance in a question-answering task, and how quality affects overall user experience. We tested 48 subjects in a within-subjects design under 4 conditions: perfect transcripts, transcripts with 25% Word Error Rate (WER), transcripts with 45% WER, and no transcript. Our data reveals that speech recognition accuracy linearly influences both user performance and experience, shows that transcripts with 45% WER are unsatisfactory, and suggests that transcripts having a WER of 25% or less would be useful and usable in webcast archives.
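WER is conventionally computed as the minimum number of word substitutions, deletions, and insertions needed to turn the recognizer output into the reference transcript, divided by the number of reference words. A minimal sketch using word-level Levenshtein distance; the function name and example sentences are illustrative, and the reference is assumed to be non-empty.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = minimum edits turning ref[:i] into hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # substitution or match
    return d[len(ref)][len(hyp)] / len(ref)

# One substitution (sat -> sit) and one deletion (the): 2 errors / 6 reference words
print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Because insertions count as errors, WER can exceed 100%. A transcript at 25% WER averages one error per four reference words.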
Adding intelligent help to mixed-initiative spoken dialogue systems
In ACL-02 Companion Volume to the Proceedings of the Conference, page 95, Philadelphia. Association for Computational Linguistics, 2002
Cited by 12 (2 self)
The rapidly expanding voice recognition industry has so far shown a preference for grammar-based language modelling, despite the better overall performance of statistical language modelling. Given that the advantages of the grammar-based approach make it unlikely to be replaced as the primary solution in the near future, it is natural to wonder whether some combination of the two approaches may prove useful. Here, we describe an implemented system that uses statistical language modelling and a decision-tree classifier to provide the user with some feedback when grammar-based recognition fails. Users of this system had more successful interactions than did users of a control system.
A Framework and Toolkit for the Construction of Multimodal Learning Interfaces
1998
Cited by 7 (0 self)
Multimodal human-computer interaction, in which the computer accepts input from multiple channels or modalities, is more flexible, natural, and powerful than unimodal interaction with input from a single modality. Many research studies ([Hauptmann89], [Nakagawa94], [Nishimoto94], [Oviatt97b], [Chu97], to name a few) have reported that the combination of human communication means such as speech, gestures, handwriting, eye movement, etc. enjoys strong preference among users. Unfortunately, the development of multimodal applications is difficult and still suffers from a lack of generality, such that a lot of duplicated effort is wasted when implementing different applications sharing some common aspects. The research presented in this dissertation aims to provide a partial solution to the difficult problem of developing multimodal applications by creating a modular, distributed, and customizable infrastructure to facilitate the construction of such applications. This dissertation contribu...
FeasPar - A Feature Structure Parser Learning to Parse Spoken Language
1996
Cited by 6 (0 self)
We describe and experimentally evaluate a system, FeasPar, that learns to parse spontaneous speech. To train and run FeasPar (Feature Structure Parser), only limited hand-modeled knowledge is required. The FeasPar architecture consists of neural networks and a search. The networks split the incoming sentence into chunks, which are labeled with feature values and chunk relations. Then, the search finds the most probable and consistent feature structure. FeasPar is trained...

