Speech repairs, intonational boundaries and discourse markers: Modeling speakers’ utterances in spoken dialog. Doctoral dissertation (1997)

by P A Heeman
Venue: In preparation

Speech repairs, intonational phrases and discourse markers: modeling speakers’ utterances in spoken dialogue

by Peter A. Heeman, James F. Allen - Computational Linguistics, 1999
Abstract - Cited by 61 (9 self)
Interactive spoken dialogue provides many new challenges for natural language understanding systems. One of the most critical challenges is simply determining the speaker’s intended utterances: both segmenting a speaker’s turn into utterances and determining the intended words in each utterance. Even assuming perfect word recognition, the latter problem is complicated by the occurrence of speech repairs, which occur where speakers go back and change (or repeat) something they just said. The words that are replaced or repeated are no longer part of the intended utterance, and so need to be identified. Segmenting turns and resolving repairs are strongly intertwined with a third task: identifying discourse markers. Because of the interactions, and interactions with POS tagging and speech recognition, we need to address these tasks together and early on in the processing stream. This paper presents a statistical language model in which we redefine the speech recognition problem so that it includes the identification of POS tags, discourse markers, speech repairs and intonational phrases. By solving these simultaneously, we obtain better results on each task than addressing them separately. Our model is able to identify 72% of turn-internal intonational boundaries with a precision of 71%, 97% of discourse markers with 96% precision, and detect and correct 66% of repairs with 74% precision.

Intonational Boundaries, Speech Repairs and Discourse Markers: Modeling Spoken Dialog

by Peter A. Heeman, James F. Allen, 1997
Abstract - Cited by 29 (5 self)
To understand a speaker's turn of a conversation, one needs to segment it into intonational phrases, clean up any speech repairs that might have occurred, and identify discourse markers. In this paper, we argue that these problems must be resolved together, and that they must be resolved early in the processing stream. We put forward a statistical language model that resolves these problems, does POS tagging, and can be used as the language model of a speech recognizer. We find that, by accounting for the interactions between these tasks, the performance on each task improves, as do POS tagging and perplexity.

Identifying Discourse Markers in Spoken Dialog

by Peter A. Heeman, Donna Byron, James F. Allen, 1998
Abstract - Cited by 13 (0 self)
In this paper, we present a method for identifying discourse marker usage in spontaneous speech based on machine learning. Discourse markers are denoted by special POS tags, and thus the process of POS tagging can be used to identify discourse markers. By incorporating POS tagging into language modeling, discourse markers can be identified during speech recognition, in which the timeliness of the information can be used to help predict the following words. We contrast this approach with an alternative machine learning approach proposed by Litman (1996). This paper also argues that discourse markers can be used to help the hearer predict the role that the upcoming utterance plays in the dialog. Thus discourse markers should provide valuable evidence for automatic dialog act prediction. Introduction: Discourse markers are a linguistic device that speakers use to signal how the upcoming unit of speech or text relates to the current discourse state (Schiffrin 1987). Previous ...

POS Tagging versus Classes in Language Modeling

by Peter Heeman, 1998
Abstract - Cited by 12 (1 self)
Language models for speech recognition concentrate solely on recognizing the words that were spoken. In this paper, we advocate redefining the speech recognition problem so that its goal is to find both the best sequence of words and their POS tags, and thus incorporate POS tagging. The use of POS tags allows more sophisticated generalizations than are afforded by using a class-based approach. Furthermore, if we want to incorporate speech repair and intonational phrase modeling into the language model, using POS tags rather than classes gives better performance in this task.

Correction of Disfluencies in Spontaneous Speech using a Noisy-Channel Approach

by Matthias Honal, Tanja Schultz - In Proceedings of the 8th Eurospeech Conference, 2003
Abstract - Cited by 11 (3 self)
In this paper we present a system which automatically corrects disfluencies such as repairs and restarts typically occurring in spontaneously spoken speech. The system is based on a noisy-channel model and its development requires no linguistic knowledge, but only annotated texts. Therefore, it has large potential for rapid deployment and adaptation to new target languages. The experiments were conducted on spontaneously spoken dialogs from the English VERBMOBIL corpus, where a recall of 77.2% and a precision of 90.2% were obtained. To demonstrate the feasibility of rapid adaptation, additional experiments on the spontaneous Mandarin Chinese CallHome corpus were performed, achieving 49.4% recall and 76.8% precision.

POS Tags and Decision Trees for Language Modeling

by Peter A. Heeman - In Proceedings of the Joint SIGDAT Conference on Empirical Methods in Natural Language Processing and Very Large Corpora, 1999
Abstract - Cited by 11 (2 self)
Language models for speech recognition concentrate solely on recognizing the words that were spoken. In this paper, we advocate redefining the speech recognition problem so that its goal is to find both the best sequence of words and their POS tags, and thus incorporate POS tagging. To use POS tags effectively, we use clustering and decision tree algorithms, which allow generalizations between POS tags and words to be effectively used in estimating the probability distributions. We show that our POS model gives a reduction in word error rate and perplexity for the Trains corpus in comparison to word and class-based approaches. By using the Wall Street Journal corpus, we show that this approach scales up when more training data is available.

Incorporating POS Tagging into Language Modeling

by Peter A. Heeman - In Proceedings of the 5th European Conference on Speech Communication and Technology (Eurospeech), 1997
Abstract - Cited by 10 (6 self)
Language models for speech recognition tend to concentrate solely on recognizing the words that were spoken. In this paper, we redefine the speech recognition problem so that its goal is to find both the best sequence of words and their syntactic role (part-of-speech) in the utterance. This is a necessary first step towards tightening the interaction between speech recognition and natural language understanding.

Analyzing and Predicting Patterns of DAMSL Utterance Tags

by Mark G. Core - In Working Notes AAAI Spring Symposium on Applying Machine Learning to Discourse Processing, 1997
Abstract - Cited by 9 (0 self)
We have been annotating TRAINS dialogs with dialog acts in order to produce training data for a dialog act predictor, and to study how language is used in these dialogs. We are using DAMSL dialog acts which consist of 15 independent attributes. For the purposes of this paper, infrequent attributes such as Unintelligible and Self-Talk were set aside to concentrate on the eight major DAMSL tag sets. For five of these eight tag sets, hand-constructed decision trees (based solely on the previous utterance's DAMSL tags) did better than always guessing the most frequent DAMSL tag values. This result suggests that it is possible to automatically build such decision trees, especially if other sources of context are added. Our initial efforts to address our second goal (studying language use in the TRAINS dialogs) consist of measuring DAMSL tag cooccurrences and bigrams. Some interesting patterns have emerged from this simple analysis such as the fact that signaling non-understandi...

Linguistic adaptations in spoken human-computer dialogues -- Empirical studies of user behavior

by Linda Bell, 2003
Abstract - Cited by 9 (1 self)
Abstract not found

Implementing Parser Metarules that Handle Speech Repairs and Other Disruptions

by Mark G. Core, Lenhart K. Schubert - In Proceedings of the 11th Annual International FLAIRS Conference, 1998
Abstract - Cited by 5 (3 self)
Mixed-initiative dialogs often contain interruptions in phrase structure such as repairs and backchannel responses. Phrase structure as traditionally defined does not accommodate such phenomena, so it is not surprising that phrase structure parsers are ill-equipped to handle them. This paper presents metarules that specify how the instantiations of phrase structure rules may be restarted or interrupted, with allowance for interleaved speech. In the case of interleaved speech or backchannel responses, the metarules allow syntactically separate constituents to interleave or to straddle each other. In the case of repairs, the metarules operate on the reparandum (what is being repaired) and alteration (the correction) to build parallel phrase structure trees: one with the reparandum and one with the alteration. Consider the partial utterance "take the ban- um the oranges". The repair metarule would build two VPs, one being "take the ban-" and the other being "take the oranges". The introductio...