Results 1-10 of 62
A Prosodic Analysis of Discourse Segments in Direction-Giving Monologues
- IN 34TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS
, 1996
Cited by 88 (15 self)
This paper reports on corpus-based research into the relationship between intonational variation and discourse structure. We examine the effects of speaking style (read versus spontaneous) and of discourse segmentation method (text-alone versus text-and-speech) on the nature of this relationship. We also compare the acoustic-prosodic features of initial, medial, and final utterances in a discourse segment.
A Semantics of Contrast and Information Structure for Specifying Intonation in Spoken Language Generation
, 1996
Cue phrase classification using machine learning
- Journal of Artificial Intelligence Research
, 1996
Cited by 47 (2 self)
Cue phrases may be used in a discourse sense to explicitly signal discourse structure, but also in a sentential sense to convey semantic rather than structural information. Correctly classifying cue phrases as discourse or sentential is critical in natural language processing systems that exploit discourse structure, e.g., for performing tasks such as anaphora resolution and plan recognition. This paper explores the use of machine learning for classifying cue phrases as discourse or sentential. Two machine learning programs (cgrendel and C4.5) are used to induce classification models from sets of pre-classified cue phrases and their features in text and speech. Machine learning is shown to be an effective technique for not only automating the generation of classification models, but also for improving upon previous results. When compared to manually derived classification models already in the literature, the learned models often perform with higher accuracy and contain new linguistic insights into the data. In addition, the ability to automatically construct classification models makes it easier to comparatively analyze the utility of alternative feature representations of the data. Finally, the ease of retraining makes the learning approach more scalable and flexible than manual methods.
Assigning Phrase Breaks from Part-of-Speech Sequences
- Computer Speech and Language
, 1998
Cited by 39 (2 self)
This paper presents an algorithm for automatically assigning phrase breaks to unrestricted text for use in a text-to-speech synthesizer. Text is first converted into a sequence of part-of-speech tags. Next a Markov model is used to give the most likely sequence of phrase breaks for the input part-of-speech tags. In the Markov model, states represent types of phrase break and the transitions between states represent the likelihoods of sequences of phrase types occurring. The paper reports a variety of experiments investigating part-of-speech tag-sets, Markov model structure and smoothing. The best setup correctly identifies 79% of breaks in the test corpus. © 1998 Academic Press Limited
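The idea in the abstract above (states for phrase-break types, transitions for the likelihood of break sequences, and a search for the most likely break sequence given the part-of-speech tags) can be illustrated with a small Viterbi decoder. This is only a minimal sketch under stated assumptions, not the paper's model: the two state names, the four-tag tag set, every probability, and the function name `assign_breaks` are hypothetical toy choices, and the paper's actual tag sets, model structure, and smoothing are not reproduced here.

```python
import math

# Hypothetical juncture types: is there a phrase break after the word or not?
STATES = ("break", "non-break")

# Toy probabilities chosen for illustration only (NOT taken from the paper).
START = {"break": 0.2, "non-break": 0.8}
TRANS = {
    "break": {"break": 0.1, "non-break": 0.9},
    "non-break": {"break": 0.3, "non-break": 0.7},
}
# P(POS tag | juncture type following the word), over a toy tag set.
EMIT = {
    "break": {"NOUN": 0.6, "VERB": 0.2, "DET": 0.05, "PUNCT": 0.15},
    "non-break": {"NOUN": 0.3, "VERB": 0.3, "DET": 0.35, "PUNCT": 0.05},
}
UNSEEN = 1e-6  # crude floor for tags missing from the toy table


def _emit(state, tag):
    """Log-probability of observing `tag` in `state`."""
    return math.log(EMIT[state].get(tag, UNSEEN))


def assign_breaks(tags):
    """Return the most likely juncture label after each POS tag (Viterbi)."""
    if not tags:
        return []
    # Best log-score of any path ending in each state at the first position.
    scores = {s: math.log(START[s]) + _emit(s, tags[0]) for s in STATES}
    backpointers = []
    for tag in tags[1:]:
        new_scores, pointers = {}, {}
        for s in STATES:
            # Pick the predecessor that maximises the path score into s.
            prev = max(STATES, key=lambda p: scores[p] + math.log(TRANS[p][s]))
            new_scores[s] = scores[prev] + math.log(TRANS[prev][s]) + _emit(s, tag)
            pointers[s] = prev
        scores = new_scores
        backpointers.append(pointers)
    # Trace the best path back from the best final state.
    path = [max(STATES, key=lambda s: scores[s])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path
```

Calling `assign_breaks(["DET", "NOUN", "VERB"])` returns one label per input tag. Working in log space avoids numerical underflow on long tag sequences.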
Discourse Structure in Spoken Language: Studies on Speech Corpora
- In Working Notes of the AAAI Spring Symposium on Empirical Methods in Discourse Interpretation and Generation
, 1995
Cited by 32 (9 self)
A better understanding of the intonational characteristics of spoken discourse may lead to new empirical techniques for identifying discourse structure from speech, as well as new algorithms for enhancing the naturalness of synthetic speech. This paper summarizes results of pilot studies that demonstrate reliable correlations of discourse and speech properties, and reports findings on a new corpus of direction-giving monologues, collected in both spontaneous and read speaking styles. Preliminary analyses of the direction-giving corpus show that the availability of speech significantly affects the reliability of discourse segmentation for a set of trained discourse labelers.
Introduction
This paper reports on ongoing corpus-based research on the intonational characteristics of spoken discourse in American English. The scientific goal of this research is to lay the foundations for a bootstrapping process, in which empirical evidence from spoken language informs us of strengths and weak...
Intonation and Dialogue Context as Constraints for Speech Recognition
- LANGUAGE AND SPEECH
, 1998
Cited by 29 (4 self)
This paper describes a way of using intonation and dialogue context to improve the performance of an automatic speech recognition (ASR) system. Our experiments were run on the DCIEM Maptask corpus, a corpus of spontaneous task-oriented dialogue speech. This corpus has been tagged according to a dialogue analysis scheme that assigns each utterance to one of 12 "move types", such as "acknowledge", "query-yes/no" or "instruct". Most ASR systems use a bigram language model to constrain the possible sequences of words that might be recognised. Here we use a separate bigram language model for each move type. We show that when the "correct" move-specific language model is used for each utterance in the test set, the word error rate of the recogniser drops. Of course
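The notion of a separate bigram language model per move type can be sketched in a few lines. This is an illustrative sketch only, not the paper's system: the class `BigramLM`, the helper names, the four training utterances, and the add-one smoothing are all assumptions made for the example, and scoring a text string under each model stands in for what would really happen inside a recogniser.

```python
import math
from collections import Counter, defaultdict


class BigramLM:
    """Bigram language model with add-one smoothing (an assumed choice)."""

    def __init__(self):
        self.bigrams = Counter()   # counts of (previous word, word)
        self.contexts = Counter()  # counts of previous word
        self.vocab = {"</s>"}

    def train(self, utterance):
        words = ["<s>"] + utterance.split() + ["</s>"]
        self.vocab.update(words[1:])
        for prev, word in zip(words, words[1:]):
            self.bigrams[(prev, word)] += 1
            self.contexts[prev] += 1

    def logprob(self, utterance, vocab_size):
        """Smoothed log-probability of the utterance under this model."""
        words = ["<s>"] + utterance.split() + ["</s>"]
        return sum(
            math.log((self.bigrams[(p, w)] + 1) / (self.contexts[p] + vocab_size))
            for p, w in zip(words, words[1:])
        )


def train_move_models(data):
    """Train one bigram model per move type from (move, utterance) pairs."""
    models = defaultdict(BigramLM)
    for move, utterance in data:
        models[move].train(utterance)
    return dict(models)


# Hypothetical toy data; the move labels come from the abstract, the words do not.
TRAINING = [
    ("acknowledge", "okay"),
    ("acknowledge", "right okay"),
    ("instruct", "go left past the mill"),
    ("instruct", "go down to the bridge"),
]

MODELS = train_move_models(TRAINING)
# Shared vocabulary size so that scores are comparable across models.
VOCAB_SIZE = len(set().union(*(m.vocab for m in MODELS.values())))


def best_move(utterance):
    """Return the move type whose model scores the utterance highest."""
    return max(MODELS, key=lambda mv: MODELS[mv].logprob(utterance, VOCAB_SIZE))
```

With this toy data, `best_move("go left to the bridge")` selects the "instruct" model and `best_move("okay")` selects "acknowledge", which shows how a move-specific model concentrates probability on the word sequences typical of its move type.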
M = Syntax + Prosody: A syntactic-prosodic labelling scheme for large spontaneous speech databases
, 1998
Speech Repairs, Intonational Boundaries and Discourse Markers: Modeling Speakers
- Department of Computer Science, University of Rochester
, 1997
Cited by 24 (8 self)
Peter Heeman was born October 22, 1963, and much to his dismay his parents had already moved away from Toronto. Instead he was born in London, Ontario, where he grew up on a strawberry farm. He attended the University of Waterloo where he received a Bachelors of Mathematics with a joint degree in Pure Mathematics and Computer Science in the spring of 1987. After working two years for a software engineering company, which supposedly used artificial intelligence techniques to automate COBOL and CICS programming, Peter was ready for a change. What better way to wipe the slate clear than by going to graduate school at the University of Toronto, but not without first spending the summer in Europe. After spending two months in countries where he couldn’t speak the language, Peter became fascinated by language, and so decided to give computational linguistics a try.
Consistency in Transcription and Labelling of German Intonation with GToBI
- in Proc. ICSLP
, 1996
Cited by 18 (3 self)
A diverse set of speech data was labelled in three sites by 13 transcribers with differing levels of expertise, using GToBI, a consensus transcription system for German intonation. Overall inter-transcriber consistency suggests that, with training, labellers can acquire sufficient skill with GToBI for large-scale database labelling.

