Prosody-based automatic segmentation of speech into sentences and topics
Speech Communication, 2000
Cited by 137 (41 self)
A crucial step in processing speech audio data for information extraction, topic detection, or browsing/playback is to segment the input into sentence and topic units. Speech segmentation is challenging, since the cues typically present for segmenting text (headers, paragraphs, punctuation) are absent in spoken language. We investigate the use of prosody (information gleaned from the timing and melody of speech) for these tasks. Using decision tree and hidden Markov modeling techniques, we combine prosodic cues with word-based approaches, and evaluate performance on two speech corpora, Broadcast News and Switchboard. Results show that the prosodic model alone performs on par with, or better than, word-based statistical language models, for both true and automatically recognized words in news speech. The prosodic model achieves comparable performance with significantly less training data and requires no hand-labeling of prosodic events. Across tasks and corpora, we obtain a significant improvement over word-only models using a probabilistic combination of prosodic and lexical information. Inspection reveals that the prosodic models capture language-independent boundary indicators described in the literature. Finally, cue usage is task- and corpus-dependent. For example, pause and pitch features are highly informative for segmenting news speech, whereas pause, duration, and word-based cues dominate for natural conversation.
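One simple way to picture a "probabilistic combination of prosodic and lexical information" is linear interpolation of per-boundary posteriors. The sketch below assumes each model already outputs P(boundary | evidence) at every inter-word position; the weight `lam` and the 0.5 decision threshold are hypothetical parameters, and this is an illustration rather than the paper's exact combination method (the abstract also mentions HMM-based integration).

```python
from typing import List


def interpolate_posteriors(p_prosody: List[float], p_lexical: List[float],
                           lam: float = 0.5) -> List[float]:
    """Return lam * P_prosody + (1 - lam) * P_lexical at each inter-word boundary."""
    if len(p_prosody) != len(p_lexical):
        raise ValueError("posterior sequences must have equal length")
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must be in [0, 1]")
    return [lam * a + (1.0 - lam) * b for a, b in zip(p_prosody, p_lexical)]


def decide_boundaries(posteriors: List[float], threshold: float = 0.5) -> List[bool]:
    """Hypothesize a boundary wherever the combined posterior exceeds the threshold."""
    return [p > threshold for p in posteriors]


if __name__ == "__main__":
    # Made-up posteriors for four inter-word positions.
    prosody = [0.9, 0.2, 0.6, 0.1]
    lexical = [0.7, 0.1, 0.3, 0.4]
    combined = interpolate_posteriors(prosody, lexical, lam=0.5)
    print(decide_boundaries(combined))  # [True, False, False, False]
```

In practice `lam` would be tuned on held-out data; the third position shows how a confident prosodic cue can be outvoted by a skeptical lexical model.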
Discourse segmentation of multi-party conversation
In 41st Annual Meeting of ACL, 2003
Cited by 65 (1 self)
We present a domain-independent topic segmentation algorithm for multi-party speech. Our feature-based algorithm combines knowledge about content, using a text-based algorithm as a feature, with knowledge about form, using linguistic and acoustic cues to topic shifts extracted from speech. The algorithm uses automatically induced decision rules to combine the different features. The embedded text-based algorithm builds on lexical cohesion and performs comparably to state-of-the-art algorithms based on lexical information. Combining the two knowledge sources yields a significant error reduction.
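To illustrate what "builds on lexical cohesion" means, here is a simplified, TextTiling-style stand-in (not the paper's embedded algorithm): compare word counts in a window before and after each sentence gap, and treat the least cohesive gaps as candidate topic shifts. The window size and the choice of cosine similarity are assumptions for the sketch.

```python
from collections import Counter
from math import sqrt
from typing import List


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words count vectors."""
    num = sum(a[w] * b[w] for w in set(a) & set(b))
    den = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return num / den if den else 0.0


def cohesion_scores(sentences: List[List[str]], window: int = 2) -> List[float]:
    """Lexical cohesion across each gap; scores[g - 1] is the gap before sentence g."""
    scores = []
    for gap in range(1, len(sentences)):
        left = Counter(w for s in sentences[max(0, gap - window):gap] for w in s)
        right = Counter(w for s in sentences[gap:gap + window] for w in s)
        scores.append(cosine(left, right))
    return scores


def lowest_cohesion_boundaries(scores: List[float], k: int) -> List[int]:
    """Indices of the k sentences that start a new segment at the least cohesive gaps."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i])
    return sorted(i + 1 for i in ranked[:k])


if __name__ == "__main__":
    sents = [["budget", "review", "budget"], ["budget", "numbers"],
             ["hiring", "plan"], ["hiring", "interview", "plan"]]
    print(lowest_cohesion_boundaries(cohesion_scores(sents), k=1))  # [2]
```

In the feature-based setup the abstract describes, a score like this would be one input among the linguistic and acoustic cues, rather than the final decision.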
Meeting Structure Annotation: Data and Tools
In Proceedings of the SIGdial Workshop on Discourse and Dialogue, 2005
Cited by 17 (8 self)
We present a set of annotations of hierarchical topic segmentations and action item subdialogues collected over 65 meetings from the ICSI and ISL meeting corpora, designed to support automatic meeting understanding and analysis. We describe an architecture for representing, annotating, and analyzing multi-party discourse, including an ontology of multimodal discourse, a programming interface for that ontology, and an audiovisual toolkit that facilitates browsing and annotating discourse, as well as visualizing and adjusting features for machine learning tasks.
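A hierarchical topic segmentation is naturally a tree of time spans. The sketch below is a hypothetical data structure (the class `TopicSegment` and its methods are illustrative inventions, not the paper's ontology or programming interface) showing nesting constraints and a lookup of the deepest segment active at a given time.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TopicSegment:
    """Hypothetical node in a hierarchical segmentation of one meeting."""
    label: str
    start: float              # seconds from meeting start
    end: float
    kind: str = "topic"       # e.g. "topic" or "action_item" (assumed labels)
    children: List["TopicSegment"] = field(default_factory=list)

    def add(self, child: "TopicSegment") -> "TopicSegment":
        """Attach a sub-segment, enforcing that it nests inside this one."""
        if not (self.start <= child.start and child.end <= self.end):
            raise ValueError("child segment must nest inside its parent")
        self.children.append(child)
        return child

    def segment_at(self, t: float) -> Optional["TopicSegment"]:
        """Return the deepest segment covering time t, or None if t is outside."""
        if not (self.start <= t < self.end):
            return None
        for child in self.children:
            hit = child.segment_at(t)
            if hit is not None:
                return hit
        return self


if __name__ == "__main__":
    meeting = TopicSegment("meeting", 0.0, 3600.0)
    status = meeting.add(TopicSegment("status report", 300.0, 1800.0))
    status.add(TopicSegment("order parts", 300.0, 900.0, kind="action_item"))
    print(meeting.segment_at(600.0).label)   # order parts
    print(meeting.segment_at(1000.0).label)  # status report
```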
Prosody modeling for automatic speech recognition and understanding
In Proc. Workshop on Mathematical Foundations of Natural Language Modeling, 2002
Cited by 17 (2 self)
This paper summarizes statistical modeling approaches for the use of prosody (the rhythm and melody of speech) in automatic recognition and understanding of speech. We outline effective prosodic feature extraction, model architectures, and techniques to combine prosodic with lexical (word-based) information. We then survey a number of applications of the framework, and give results for automatic sentence segmentation and disfluency detection, topic segmentation, dialog act labeling, and word recognition. Key words: prosody, speech recognition and understanding, hidden Markov models.
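As a rough picture of prosodic feature extraction at word boundaries, the sketch below computes three commonly used cue types from time-aligned words: pause after each word, normalized word duration, and a pitch-reset ratio. The `Word` type, the per-word mean F0 input, and the simple per-stretch z-normalization are assumptions; the paper's actual features and normalizations are not specified here.

```python
import math
from dataclasses import dataclass
from statistics import mean, pstdev
from typing import List


@dataclass
class Word:
    """A recognized or forced-aligned word with start/end times in seconds."""
    text: str
    start: float
    end: float


def pause_after(words: List[Word]) -> List[float]:
    """Silence between each word and the next; the final word gets 0.0 (assumed convention)."""
    pauses = [max(0.0, nxt.start - cur.end) for cur, nxt in zip(words, words[1:])]
    return pauses + [0.0] if words else []


def duration_zscores(words: List[Word]) -> List[float]:
    """Word durations z-normalized over the given stretch (a simplification; real systems
    typically normalize by phone or word identity and by speaker)."""
    durs = [w.end - w.start for w in words]
    if not durs:
        return []
    mu, sd = mean(durs), pstdev(durs)
    return [(d - mu) / sd if sd > 0 else 0.0 for d in durs]


def pitch_reset(mean_f0: List[float]) -> List[float]:
    """Log ratio of the next word's mean F0 to the current word's (a pitch-reset cue)."""
    resets = [math.log(b / a) for a, b in zip(mean_f0, mean_f0[1:])]
    return resets + [0.0] if mean_f0 else []


if __name__ == "__main__":
    ws = [Word("so", 0.0, 0.2), Word("today", 0.25, 0.6), Word("we", 1.1, 1.2)]
    print(pause_after(ws))
    print(duration_zscores(ws))
    print(pitch_reset([200.0, 150.0, 210.0]))
```

A long pause plus an upward pitch reset after "today" is the kind of pattern a boundary classifier would learn to associate with a sentence or topic boundary.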
Context in Multi-lingual Tone and Pitch Accent Recognition
Cited by 12 (6 self)
Tone and intonation play a crucial role across many languages. However, the use and structure of tone varies widely, ranging from lexical tone, which determines word identity, to pitch accent, which signals information status. In this paper, we employ a uniform representation of acoustic features for recognition of both Mandarin tone and English pitch accent. The representation captures both local tone height and shape as well as contextual coarticulatory and phrasal influences. By exploiting multiclass Support Vector Machines as a discriminative classifier, we achieve competitive rates of tone and pitch accent recognition. We further demonstrate the greater importance of modeling preceding local context, which yields up to a 24% reduction in error relative to modeling the following context.
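A minimal sketch of a "local height and shape plus context" representation: each syllable's F0 contour is reduced to its mean (height) and least-squares slope (shape), then the preceding and/or following syllable's features are appended. The two-number summary and zero-padding at utterance edges are assumptions, not the paper's exact feature set. The resulting vectors could be fed to any multiclass SVM implementation (for example scikit-learn's `sklearn.svm.SVC`), which is outside this sketch.

```python
from typing import List, Sequence


def local_tone_features(f0: Sequence[float]) -> List[float]:
    """Height (mean F0) and shape (least-squares slope per frame) of one syllable's contour."""
    n = len(f0)
    if n == 0:
        raise ValueError("empty F0 contour")
    height = sum(f0) / n
    x_bar = (n - 1) / 2
    den = sum((i - x_bar) ** 2 for i in range(n))
    slope = sum((i - x_bar) * (y - height) for i, y in enumerate(f0)) / den if den else 0.0
    return [height, slope]


def add_context(local: Sequence[Sequence[float]], use_prev: bool = True,
                use_next: bool = False) -> List[List[float]]:
    """Append preceding and/or following syllables' features, zero-padding at the edges."""
    dim = len(local[0]) if local else 0
    pad = [0.0] * dim
    out = []
    for i, feats in enumerate(local):
        vec = list(feats)
        if use_prev:
            vec += list(local[i - 1]) if i > 0 else pad
        if use_next:
            vec += list(local[i + 1]) if i + 1 < len(local) else pad
        out.append(vec)
    return out


if __name__ == "__main__":
    syllables = [[100.0, 110.0, 120.0], [130.0, 125.0, 115.0]]  # made-up F0 frames (Hz)
    local = [local_tone_features(s) for s in syllables]
    print(add_context(local, use_prev=True))
```

Comparing `use_prev=True` against `use_next=True` on held-out data is the kind of ablation behind the preceding-versus-following context result reported above.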
Prosody modeling for automatic speech understanding: an overview of recent research at SRI
In Proc. ISCA Tutorial and Research Workshop on Prosody in Speech Recognition and Understanding, 2001
Cited by 6 (0 self)
Prosody has long been studied as an important knowledge source for speech understanding. In recent years there has been a large amount of computational work aimed at prosodic ...
Maximum Entropy Segmentation of Broadcast News
In Proceedings of ICASSP, 2005
Cited by 5 (1 self)
This paper presents an automatic system for structuring and preparing a news broadcast for applications such as speech summarization, browsing, archiving, and information retrieval. This process comprises transcribing the audio using an automatic speech recognizer and subsequently segmenting the text into utterances and topics. A maximum entropy approach is used to build statistical models for both utterance and topic segmentation. The experimental work addresses the effect of three factors on the performance of the topic boundary detector: the information sources used, the quality of the ASR transcripts, and the quality of the utterance boundary detector. The results show that topic segmentation is not severely affected by transcript errors, whereas errors in the utterance segmentation are more devastating.
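For a binary boundary/no-boundary decision, a maximum entropy model is equivalent to logistic regression over the chosen features. The sketch below trains one by batch gradient ascent with an L2 (Gaussian) prior; the feature names such as `long_pause` and `cue_phrase`, and the hyperparameters, are hypothetical, and the paper's actual features and training procedure are not described here.

```python
import math
from typing import Dict, List, Tuple

Features = Dict[str, float]
BIAS = "__bias__"  # hypothetical name for the always-on bias feature


def _with_bias(feats: Features) -> Features:
    return {**feats, BIAS: 1.0}


def prob_boundary(w: Dict[str, float], feats: Features) -> float:
    """P(boundary | features) under a binary maximum entropy (logistic) model."""
    z = sum(w.get(f, 0.0) * v for f, v in _with_bias(feats).items())
    z = max(-30.0, min(30.0, z))  # clip to keep math.exp in range
    return 1.0 / (1.0 + math.exp(-z))


def train_maxent(data: List[Tuple[Features, int]], epochs: int = 200,
                 lr: float = 0.5, l2: float = 0.01) -> Dict[str, float]:
    """Batch gradient ascent on the L2-regularized conditional log-likelihood."""
    w: Dict[str, float] = {}
    for _ in range(epochs):
        grad: Dict[str, float] = {}
        for feats, y in data:
            err = y - prob_boundary(w, feats)  # observed minus expected
            for f, v in _with_bias(feats).items():
                grad[f] = grad.get(f, 0.0) + err * v
        for f in set(w) | set(grad):
            w[f] = w.get(f, 0.0) + lr * (grad.get(f, 0.0) / len(data) - l2 * w.get(f, 0.0))
    return w


if __name__ == "__main__":
    data = [({"long_pause": 1.0}, 1), ({"long_pause": 1.0, "cue_phrase": 1.0}, 1),
            ({"cue_phrase": 1.0}, 0), ({}, 0), ({}, 0)]
    w = train_maxent(data)
    print(round(prob_boundary(w, {"long_pause": 1.0}), 3), round(prob_boundary(w, {}), 3))
```

The same trainer could serve both the utterance and the topic boundary detectors by swapping the feature extractor, which is one way to read the abstract's use of a single maximum entropy approach for both levels.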
Prosody-based Topic Segmentation for Mandarin Broadcast News
Cited by 5 (2 self)
Automatic topic segmentation, the separation of a discourse stream into its constituent stories or topics, is a necessary preprocessing step for applications such as information retrieval, anaphora resolution, and summarization. While significant progress has been made in this area for text sources and for English audio sources, little work has been done on automatic, acoustic feature-based segmentation of other languages. In this paper, we focus on prosody-based topic segmentation of Mandarin Chinese. As a tone language, Mandarin presents special challenges for the applicability of intonation-based techniques, since the pitch contour is also used to establish lexical identity. We demonstrate that intonational cues such as reduced pitch and intensity at topic boundaries, and increased duration and pause, still provide significant contrasts in Mandarin Chinese. We also build a decision tree classifier that, based only on word and local-context prosodic information without reference to term similarity, cue phrases, or sentence-level information, achieves boundary classification accuracy of 89-95.8% on a large standard test set.
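The smallest possible decision tree is a one-split "stump". The sketch below learns a stump over prosodic boundary features by trying midpoint thresholds and keeping the most accurate split; a full decision tree learner would apply this split search recursively. Feature names such as `pause` and `pitch_drop` are hypothetical, and this is not the paper's classifier.

```python
from typing import Dict, List, Tuple

Example = Tuple[Dict[str, float], int]  # (prosodic features, 1 = topic boundary)
Stump = Tuple[str, float, int]          # (feature, threshold, label when above)


def fit_stump(data: List[Example]) -> Stump:
    """Return the single-feature threshold split with the highest training accuracy."""
    best, best_acc = None, -1.0
    for f in sorted({name for x, _ in data for name in x}):
        values = sorted({x.get(f, 0.0) for x, _ in data})
        # Candidate thresholds: midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            for above in (0, 1):
                correct = sum((above if x.get(f, 0.0) > t else 1 - above) == y
                              for x, y in data)
                if correct / len(data) > best_acc:
                    best_acc, best = correct / len(data), (f, t, above)
    if best is None:
        raise ValueError("need at least two distinct feature values")
    return best


def predict(stump: Stump, x: Dict[str, float]) -> int:
    f, t, above = stump
    return above if x.get(f, 0.0) > t else 1 - above


if __name__ == "__main__":
    data = [({"pause": 0.9, "pitch_drop": 0.3}, 1), ({"pause": 1.2, "pitch_drop": 0.1}, 1),
            ({"pause": 0.1, "pitch_drop": 0.2}, 0), ({"pause": 0.05, "pitch_drop": 0.4}, 0)]
    stump = fit_stump(data)
    print(stump[0], predict(stump, {"pause": 0.7}))  # pause 1
```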
Meeting Structure Annotation: Annotations Collected with a General Purpose Toolkit
Cited by 5 (3 self)
We describe a generic set of tools for representing, annotating, and analyzing multi-party discourse, including an ontology of multimodal discourse, a programming interface for that ontology, and NOMOS, a flexible and extensible toolkit for browsing and annotating discourse. We describe applications built using the NOMOS framework to facilitate a real annotation task, as well as to visualize and adjust features for machine learning tasks. We then present a set of hierarchical topic segmentations and action item subdialogues collected over 56 meetings from the ICSI and ISL meeting corpora using our tools. These annotations are designed to support research towards automatic meeting understanding.
Prosody Models for Conversational Speech Recognition
2002
Cited by 4 (1 self)
This paper describes a formal model for incorporating prosody in the speech recognition process, both for improving word recognition directly and for jointly recognizing words and underlying structure. The model includes the possibility of using an intermediate symbolic representation as well as direct conditioning on acoustic correlates. Alternatives for feature extraction are described, together with implications for statistical modeling. Examples of prosody conditioning in spontaneous speech recognition include acoustic model clustering and dynamic pronunciation modeling.
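The two modeling alternatives in the abstract can be written roughly as follows. This is a sketch under our own simplifying assumption that acoustic observations X and prosodic observations F are conditionally independent given the words W; A is a hypothetical sequence of symbolic prosodic labels (e.g., accents and phrase boundaries) and S an underlying structure. It is not necessarily the paper's exact formulation.

```latex
% Word recognition with a prosody term:
\hat{W} = \arg\max_{W} \; P(W)\, p(X \mid W)\, p(F \mid W)

% Intermediate symbolic representation A, marginalized out:
p(F \mid W) = \sum_{A} p(F \mid A, W)\, P(A \mid W)

% Joint recognition of words W and underlying structure S:
(\hat{W}, \hat{S}) = \arg\max_{W, S} \; P(W, S)\, p(X \mid W)\, p(F \mid W, S)
```

Direct conditioning on acoustic correlates corresponds to modeling p(F | W) (or p(F | W, S)) without the intermediate sum over A, which avoids hand-labeled symbolic categories at the cost of a less interpretable model.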

