Preliminaries to a Theory of Speech Disfluencies
1994
Cited by 97 (7 self)
This thesis examines disfluencies (e.g., "um", repeated words, and a variety of forms of self-repair) in the spontaneous speech of adult normal speakers of American English. Despite their prevalence, disfluencies have traditionally been viewed as irregular events and have received little attention. The goal of the thesis is to provide evidence that, on the contrary, disfluencies show remarkably regular trends in a number of dimensions. These regularities have consequences for models of human language production; they can also be exploited to improve performance in speech applications. The method includes analysis of over 5000 hand-annotated disfluencies from a database (250,000 words) containing three different styles of spontaneous speech: task-oriented human-computer dialog, task-oriented human-human dialog, and human-human conversation on a prescribed topic. The approach is theory-neutral and strongly data-driven. The annotations correspond to observable characteristics ("features") ...
Integrating Multiple Knowledge Sources For Detection And Correction Of Repairs In Human-Computer Dialog
1992
Cited by 84 (12 self)
We have analyzed 607 sentences of spontaneous human-computer speech data containing repairs, drawn from a total corpus of 10,718 sentences. We present here criteria and techniques for automatically detecting the presence of a repair, its location, and making the appropriate correction. The criteria involve integration of knowledge from several sources: pattern matching, syntactic and semantic analysis, and acoustics.
A Corpus-based study of repair cues in spontaneous speech
Cited by 70 (1 self)
In this paper, acoustic and prosodic cues to such repairs are identified, based on an analysis of a corpus taken from the ARPA Air Travel Information System database, and methods are proposed for exploiting these cues for repair detection, especially the task of modeling word fragments, and repair correction. The relative contributions of these speech-based cues, as well as other text-based repair cues, are examined in a statistical model of repair site detection that achieves a precision rate of 91% and recall of 86% on a prosodically labeled corpus of repair utterances. (This paper, by Nakatani & Hirschberg, appears in the Journal of the Acoustical Society of America, 95 (3), March 1994, pp. 1603-1616.) PACS numbers: 43.72Ja, 43.70.B, 43.70.Bk, 43.70.Fq
Speech repairs, intonational phrases and discourse markers: modeling speakers’ utterances in spoken dialogue
Computational Linguistics, 1999
Cited by 61 (9 self)
Interactive spoken dialogue provides many new challenges for natural language understanding systems. One of the most critical challenges is simply determining the speaker’s intended utterances: both segmenting a speaker’s turn into utterances and determining the intended words in each utterance. Even assuming perfect word recognition, the latter problem is complicated by the occurrence of speech repairs, which occur where speakers go back and change (or repeat) something they just said. The words that are replaced or repeated are no longer part of the intended utterance, and so need to be identified. Segmenting turns and resolving repairs are strongly intertwined with a third task: identifying discourse markers. Because of the interactions, and interactions with POS tagging and speech recognition, we need to address these tasks together and early on in the processing stream. This paper presents a statistical language model in which we redefine the speech recognition problem so that it includes the identification of POS tags, discourse markers, speech repairs and intonational phrases. By solving these simultaneously, we obtain better results on each task than addressing them separately. Our model is able to identify 72% of turn-internal intonational boundaries with a precision of 71%, 97% of discourse markers with 96% precision, and detect and correct 66% of repairs with 74% precision.
Detecting and Correcting Speech Repairs
1994
Cited by 59 (13 self)
Interactive spoken dialog provides many new challenges for spoken language systems. One of the most critical is the prevalence of speech repairs. This paper presents an algorithm that detects and corrects speech repairs based on finding the repair pattern. The repair pattern is built by finding word matches and word replacements, and identifying fragments and editing terms. Rather than using a set of prebuilt templates, we build the pattern on the fly. In a fair test, our method, when combined with a statistical model to filter possible repairs, was successful at detecting and correcting 80% of the repairs, without using prosodic information or a parser.
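As a rough illustration of the "word matches" idea only, the sketch below flags immediate repetitions of a word sequence and deletes the first copy. It is not the algorithm from the paper above: it ignores word replacements, fragments, editing terms, and the statistical filter, and the function names and the `max_len` parameter are hypothetical.

```python
def find_repetition_repairs(words, max_len=3):
    """Return (start, length) pairs where a word sequence is immediately repeated.

    Hypothetical helper: only exact, adjacent repetitions are considered.
    """
    repairs = []
    i = 0
    while i < len(words):
        matched = False
        # Prefer the longest immediate repetition starting at position i.
        for n in range(max_len, 0, -1):
            end = i + 2 * n
            if end <= len(words) and words[i:i + n] == words[i + n:end]:
                repairs.append((i, n))
                i += n  # move past the first copy (the words to be removed)
                matched = True
                break
        if not matched:
            i += 1
    return repairs


def correct(words, repairs):
    """Delete the first copy of each repetition, keeping the repeated words."""
    drop = set()
    for start, n in repairs:
        drop.update(range(start, start + n))
    return [w for k, w in enumerate(words) if k not in drop]


utterance = "show me the show me the flights".split()
found = find_repetition_repairs(utterance)
print(found)
print(" ".join(correct(utterance, found)))
```

A matcher this simple will also delete intentional repetitions (for example "very very"), which is one reason a filtering step like the one the abstract mentions would be needed in practice.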
Conversational Actions and Discourse Situations
Computational Intelligence, 1997
Cited by 53 (14 self)
We use the idea that actions performed in a conversation become part of the common ground as the basis for a model of context that reconciles in a general and systematic fashion the differences between the theories of discourse context used for reference resolution, intention recognition, and dialogue management. We start from the treatment of anaphoric accessibility developed in DRT, and we show first how to obtain a discourse model that, while preserving DRT's basic ideas about referential accessibility, includes information about the occurrence of speech acts and their relations. Next, we show how the different kinds of 'structure' that play a role in conversation (discourse segmentation, turn-taking, and grounding) can be formulated in terms of information about speech acts, and use this same information as the basis for a model of the interpretation of fragmentary input.
Lexical access in aphasic and nonaphasic speakers
Psychological Review, 1997
Cited by 50 (2 self)
An interactive 2-step theory of lexical retrieval was applied to the picture-naming error patterns of aphasic and nonaphasic speakers. The theory uses spreading activation in a lexical network to accomplish the mapping between the conceptual representation of an object and the phonological form of the word naming the object. A model developed from the theory was parameterized to fit normal error patterns. It was then "lesioned" by globally altering its connection weight, decay rates, or both to provide fits to the error patterns of 21 fluent aphasic patients. These fits were then used to derive predictions about the influence of syntactic categories on patient errors, the effect of phonology on semantic errors, error patterns after recovery, and patient performance on a single-word repetition task. The predictions were confirmed. It is argued that simple quantitative alterations to a normal processing model can explain much of the variety among patient patterns in naming. Difficulty in word retrieval is the most pervasive symptom of language breakdown in aphasia. As with other symptoms of brain damage, word retrieval is subject to graceful degradation (Marr, 1982; Rumelhart & McClelland, 1986): Unsuccessful attempts at retrieval generally resemble the target, either in
Predicting Spoken Disfluencies During Human-Computer Interaction
1995
Cited by 47 (6 self)
This research characterizes the spontaneous spoken disfluencies typical of human-computer interaction, and presents a predictive model accounting for their occurrence. Data were collected during three empirical studies in which people spoke or wrote to a highly interactive simulated system as they completed service transactions. The studies involved within-subject factorial designs in which the input modality and presentation format were varied. Spoken disfluency rates during human-computer interaction were documented to be substantially lower than rates typically observed during comparable human-human speech. Two separate factors, both associated with increased planning demands, were statistically related to higher disfluency rates: (1) length of utterance, and (2) lack of structure in the presentation format. Regression techniques demonstrated that a linear model based simply on utterance length accounted for over 77% of the variability in spoken disfluencies. Therefore, design methods ca...
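To make the phrase "a linear model based simply on utterance length accounted for over 77% of the variability" concrete, the sketch below fits a least-squares line and reports R squared, the share of variance the line explains. The data points are invented for illustration and are not taken from the study; `fit_line` and `r_squared` are hypothetical helper names.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def r_squared(xs, ys, slope, intercept):
    """Proportion of the variance in ys explained by the fitted line."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Invented example data: utterance length in words vs. disfluencies per 100 words.
lengths = [2, 5, 8, 12, 15, 18]
rates = [0.1, 0.4, 0.9, 1.1, 1.8, 2.0]

slope, intercept = fit_line(lengths, rates)
print(f"slope={slope:.3f} intercept={intercept:.3f}")
print(f"R^2={r_squared(lengths, rates, slope, intercept):.3f}")
```

An R squared above 0.77 on real data would correspond to the "over 77%" figure quoted in the abstract.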
Speaking while monitoring addressees for understanding
Journal of Memory and Language, 2004
The challenge of spoken language systems: Research directions for the nineties
IEEE Transactions on Speech and Audio Processing, 1995
Cited by 34 (5 self)
This article is based on a February 1992 workshop sponsored by the National Science

