Results 1 - 10 of 33
A Prosodic Analysis of Discourse Segments in Direction-Giving Monologues
In 34th Annual Meeting of the Association for Computational Linguistics, 1996
Cited by 88 (15 self)
This paper reports on corpus-based research into the relationship between intonational variation and discourse structure. We examine the effects of speaking style (read versus spontaneous) and of discourse segmentation method (text-alone versus text-and-speech) on the nature of this relationship. We also compare the acoustic-prosodic features of initial, medial, and final utterances in a discourse segment.
SpeechSkimmer: A System for Interactively Skimming Recorded Speech
ACM Transactions on Computer Human Interaction, 1997
Cited by 85 (1 self)
Note that the text that appeared in printed journal contains very minor typographic and grammatical corrections that do not appear in this version. SpeechSkimmer:
Some Intonational Characteristics Of Discourse Structure
In Proceedings of the International Conference on Spoken Language Processing, 1992
Cited by 80 (14 self)
This paper reports on a study of the relationship between acoustic-prosodic variation and discourse structure, as determined from an independent model of discourse. We present results of two pilot studies. Our corpus consisted of three AP news stories recorded by a professional speaker. Discourse structure was labeled by subjects either from text alone or from text (with all orthographic markings except sentence-final punctuation removed) and speech, following Grosz & Sidner 1986; average inter-labeler agreement for structural elements varied from 74.3% to 95.1%, depending upon feature. These elements of global structure, together with elements of local structure such as parentheticals and attributive tags, were correlated with variation in intonational and acoustic features such as pitch range, contour, timing, and amplitude. We found statistically significant associations between aspects of pitch range, amplitude, and timing with features of global and local structure both for labelings...
Auto-Summarization of Audio-Video Presentations
1999
Cited by 60 (4 self)
As streaming audio-video technology becomes widespread, there is a dramatic increase in the amount of multimedia content available on the net. Users face a new challenge: how to examine large amounts of multimedia content quickly. One technique that can enable quick overview of multimedia is video summaries; that is, a shorter version assembled by picking important segments from the original. We evaluate three techniques for automatic creation of summaries for online audio-video presentations. These techniques exploit information in the audio signal (e.g., pitch and pause information), knowledge of slide transition points in the presentation, and information about access patterns of previous users. We report a user study that compares automatically generated summaries that are 20%-25% the length of full presentations to author-generated summaries. Users learn from the computer-generated summaries, although less than from authors' summaries. They initially find computer-generated summ...
The Audio Notebook - Paper and Pen Interaction with Structured Speech
2001
Cited by 59 (2 self)
This paper addresses the problem that a listener experiences when attempting to capture information presented during a lecture, meeting, or interview. Listeners must divide their attention between the talker and their notetaking activity. We propose a new device -- the Audio Notebook -- for taking notes and interacting with a speech recording. The Audio Notebook is a combination of a digital audio recorder and paper notebook, all in one device. Audio recordings are structured using two techniques: user structuring based on notetaking activity, and acoustic structuring based on a talker's changes in pitch, pausing, and energy. A field study showed that the interaction techniques enabled a range of usage styles, from detailed review to high speed skimming. The study motivated the addition of phrase detection and topic suggestions to improve access to the audio recordings. Through these audio interaction techniques, the Audio Notebook defines a new approach for navigation in the audio domain.
Combining Multiple Knowledge Sources for Discourse Segmentation
In Proceedings of the 33rd Annual Meeting of the Association for Computational Linguistics, 1995
Cited by 51 (3 self)
We predict discourse segment boundaries from linguistic features of utterances, using a corpus of spoken narratives as data. We present two methods for developing segmentation algorithms from training data: hand tuning and machine learning. When multiple types of features are used, results approach human performance on an independent test set (both methods), and using cross-validation (machine learning).
Intention-Based Segmentation: Human Reliability And Correlation With Linguistic Cues
In Proceedings of the 31st Annual Meeting of the Association for Computational Linguistics, 1993
Cited by 49 (5 self)
Certain spans of utterances in a discourse, referred to here as segments, are widely assumed to form coherent units. Further, the segmental structure of discourse has been claimed to constrain and be constrained by many phenomena. However, there is weak consensus on the nature of segments and the criteria for recognizing or generating them. We present quantitative results of a two-part study using a corpus of spontaneous, narrative monologues. The first part evaluates the statistical reliability of human segmentation of our corpus, where speaker intention is the segmentation criterion. We then use the subjects' segmentations to evaluate the correlation of discourse segmentation with three linguistic cues (referential noun phrases, cue words, and pauses), using information retrieval metrics.
The challenge of spoken language systems: Research directions for the nineties
IEEE Transactions on Speech and Audio Processing, 1995
Cited by 34 (5 self)
Footnote: This article is based on a February 1992 workshop sponsored by the National Science
Retrieval Effectiveness of an Ontology-Based Model for Information Selection
The VLDB Journal, 2004
Cited by 32 (11 self)
Technology in the field of digital media generates huge amounts of non-textual information (audio, video, and images) along with more familiar textual information. The potential for exchange and retrieval of information is vast and daunting. The key problem in achieving efficient and user-friendly retrieval is the development of a search mechanism to guarantee delivery of minimal irrelevant information (high precision) while ensuring relevant information is not overlooked (high recall). The traditional solution employs keyword-based search. The only documents retrieved are

