Modeling Linguistic Segment And Turn Boundaries For N-Best Rescoring Of Spontaneous Speech (1997)

by Andreas Stolcke
Venue: Proc. EUROSPEECH

Documents citing this paper: results 1-9 of 9

SRILM—An extensible language modeling toolkit

by Andreas Stolcke - In Proceedings of the 7th International Conference on Spoken Language Processing (ICSLP 2002), 2002
Cited by 449 (13 self)
SRILM is a collection of C++ libraries, executable programs, and helper scripts designed to allow both production of and experimentation with statistical language models for speech recognition and other applications. SRILM is freely available for noncommercial purposes. The toolkit supports creation and evaluation of a variety of language model types based on N-gram statistics, as well as several related tasks, such as statistical tagging and manipulation of N-best lists and word lattices. This paper summarizes the functionality of the toolkit and discusses its design and implementation, highlighting ease of rapid prototyping, reusability, and combinability of tools.
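
The abstract mentions creating and evaluating N-gram language models. As a rough illustration of what "evaluation" usually means here, the Python sketch below trains a bigram model and computes its perplexity. It is not SRILM code and does not mirror its interface: add-one smoothing is a deliberately simple stand-in for the discounting methods a real toolkit would use, the vocabulary is assumed closed, and all function names are hypothetical.

```python
import math
from collections import Counter


def train_bigram(sentences):
    """Count bigrams and their histories, padding each sentence with <s> and </s>."""
    history_counts = Counter()
    bigram_counts = Counter()
    vocab = {"</s>"}  # tokens the model can predict
    for words in sentences:
        tokens = ["<s>"] + list(words) + ["</s>"]
        vocab.update(words)
        for prev, cur in zip(tokens, tokens[1:]):
            history_counts[prev] += 1
            bigram_counts[(prev, cur)] += 1
    return history_counts, bigram_counts, vocab


def bigram_prob(prev, cur, history_counts, bigram_counts, vocab):
    """P(cur | prev) with add-one (Laplace) smoothing over the vocabulary."""
    return (bigram_counts[(prev, cur)] + 1) / (history_counts[prev] + len(vocab))


def perplexity(sentences, model):
    """Per-token perplexity, counting the end-of-sentence token </s>."""
    history_counts, bigram_counts, vocab = model
    log_prob = 0.0
    n_tokens = 0
    for words in sentences:
        tokens = ["<s>"] + list(words) + ["</s>"]
        for prev, cur in zip(tokens, tokens[1:]):
            log_prob += math.log(bigram_prob(prev, cur, history_counts, bigram_counts, vocab))
            n_tokens += 1
    return math.exp(-log_prob / n_tokens)
```

On the toy training set `[["a", "b"], ["a", "c"]]`, each sentence has probability 0.5 × 1/3 × 0.4 = 1/15 over three predicted tokens, so the perplexity is 15^(1/3), about 2.47.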

Automatic Detection Of Sentence Boundaries And Disfluencies Based On Recognized Words

by Andreas Stolcke, Elizabeth Shriberg, Rebecca Bates, Mari Ostendorf, Dilek Hakkani, Madelaine Plauche, Gokhan Tur, Yu Lu, 1998
Cited by 35 (13 self)
We study the problem of detecting linguistic events at interword boundaries, such as sentence boundaries and disfluency locations, in speech transcribed by an automatic recognizer. Recovering such events is crucial to facilitate speech understanding and other natural language processing tasks. Our approach is based on a combination of prosodic cues modeled by decision trees, and word-based event N-gram language models. Several model combination approaches are investigated. The techniques are evaluated on conversational speech from the Switchboard corpus. Model combination is shown to give a significant win over individual knowledge sources.
1. Introduction: Current automatic speech recognition systems output a string of words. Most natural language understanding systems, however, require structural information such as punctuation, which is present in text but not overtly indicated in spoken language. Similarly, for speech understanding and information extraction, it is important to fi...
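
The abstract says several model combination approaches were investigated without spelling them out. One of the simplest ways to combine two boundary classifiers, assumed here purely for illustration, is to linearly interpolate their per-boundary posterior probabilities and threshold the result. In the Python sketch below, the interpolation weight and the 0.5 threshold are hypothetical parameters that would normally be tuned on held-out data.

```python
def interpolate_posteriors(p_prosody, p_lm, weight=0.5):
    """Per-boundary linear interpolation of two models' event posteriors.

    p_prosody: P(boundary | prosodic features) at each interword position
    p_lm:      P(boundary | words) from an event N-gram model, same positions
    weight:    hypothetical interpolation weight given to the prosodic model
    """
    if len(p_prosody) != len(p_lm):
        raise ValueError("posterior lists must cover the same word boundaries")
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return [weight * a + (1.0 - weight) * b for a, b in zip(p_prosody, p_lm)]


def decide_boundaries(posteriors, threshold=0.5):
    """Mark a boundary wherever the combined posterior exceeds the threshold."""
    return [p > threshold for p in posteriors]
```

With equal weights, posteriors of 0.6 (prosody) and 0.2 (words) at one position combine to 0.4, so no boundary is marked there even though the prosodic model alone would have marked one.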

Switchboard Discourse Language Modeling Project (Final Report)

by Daniel Jurafsky, Carol Van Ess-Dykema (DoD), 1997
Cited by 30 (7 self)
We describe a new approach for statistical modeling and detection of discourse structure for natural conversational speech. Our model is based on 42 'Dialog Acts' (DAs): question, answer, backchannel, agreement, disagreement, apology, etc. We labeled 1155 conversations from the Switchboard (SWBD) database (Godfrey et al. 1992) of human-to-human telephone conversations with these 42 types and trained a Dialog Act detector based on three distinct knowledge sources: sequences of words which characterize a dialog act, prosodic features which characterize a dialog act, and a statistical Discourse Grammar. Our combined detector, although still in preliminary stages, already achieves a 65% Dialog Act detection rate based on acoustic waveforms, and 72% accuracy based on word transcripts. Using this detector to switch among the 42 dialog-act-specific trigram LMs also gave us an encouraging but not statistically significant reduction in SWBD word error.
1. Introduction: The ability to model and...
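
The abstract describes combining per-utterance evidence (words, prosody) with a statistical discourse grammar. A standard way to do this, assumed here rather than taken from the report, is to treat dialog acts as the hidden states of an HMM whose transition probabilities come from a DA bigram grammar, then decode the most likely DA sequence with the Viterbi algorithm. The Python sketch below uses hypothetical labels and probabilities; every probability must be strictly positive because scores are kept in log space, and at least one utterance is required.

```python
import math


def viterbi_dialog_acts(likelihoods, start_prob, trans_prob):
    """Most likely dialog-act sequence under a bigram discourse grammar.

    likelihoods: one dict per utterance, mapping DA -> P(evidence | DA)
    start_prob:  dict DA -> P(DA at the start of the conversation)
    trans_prob:  dict (previous DA, DA) -> P(DA | previous DA)
    """
    acts = list(start_prob)
    # best[t][a]: best log score of any DA path ending in act a at utterance t
    best = [{a: math.log(start_prob[a]) + math.log(likelihoods[0][a]) for a in acts}]
    back = [{}]
    for t in range(1, len(likelihoods)):
        scores, pointers = {}, {}
        for a in acts:
            # Best predecessor of act a, using the discourse grammar P(a | prev).
            prev = max(acts, key=lambda p: best[t - 1][p] + math.log(trans_prob[(p, a)]))
            scores[a] = (best[t - 1][prev] + math.log(trans_prob[(prev, a)])
                         + math.log(likelihoods[t][a]))
            pointers[a] = prev
        best.append(scores)
        back.append(pointers)
    # Trace back from the best-scoring final act.
    path = [max(acts, key=lambda a: best[-1][a])]
    for t in range(len(likelihoods) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]
```

In a two-utterance example where the second utterance's evidence is ambiguous (likelihood 0.3 for both `question` and `answer`), a grammar that strongly prefers `answer` after `question` is what decides the label.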

Automatic Detection Of Semantic Boundaries Based On Acoustic And Lexical Knowledge

by Mauro Cettolo, Daniele Falavigna - In ICSLP, 1998
Cited by 5 (2 self)
In spoken language systems, the segmentation of utterances into coherent linguistic/semantic units is required when modules following the speech recognizer can only process such units one at a time. In this paper, techniques for semantic boundary prediction, based on both acoustic and lexical knowledge, are presented and tested on a corpus of person-to-person dialogues. The best result gives 62.8% recall and 71.8% precision.
1. Introduction: In spoken language systems, the minimal unit of analysis does not necessarily correspond to a full sentence. One possible approach to language processing is to split a given sentence into a sequence of units that can be successively processed by linguistic modules one at a time. The goal of the Semantic Boundary (SB) detector is to locate boundaries inside a sentence in order to obtain such "minimal units". Useful information for SB detection can be extracted either from the waveform of an utterance or from its corresponding word sequence. Some ...
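
The abstract reports boundary detection results as recall and precision. The Python sketch below shows how these are computed when reference and hypothesized boundaries are represented as sets of word-boundary positions (a hypothetical representation chosen for simplicity), together with their harmonic mean, F1. Applied to the reported 71.8% precision and 62.8% recall, the harmonic mean works out to an F1 of about 0.67.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


def boundary_scores(reference, hypothesis):
    """Precision, recall, and F1 for boundary detection.

    reference, hypothesis: sets of word-boundary indices marked as boundaries.
    """
    hits = len(reference & hypothesis)  # correctly detected boundaries
    precision = hits / len(hypothesis) if hypothesis else 0.0
    recall = hits / len(reference) if reference else 0.0
    return precision, recall, f1(precision, recall)
```

For reference boundaries `{2, 5, 9}` and hypothesized boundaries `{2, 5, 7, 10}`, two of four hypotheses are correct (precision 0.5) and two of three reference boundaries are found (recall 2/3), giving F1 = 4/7.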

Prosody and phonetic variability: Lessons learned from acoustic model clustering

by Izhak Shafran, Mari Ostendorf, Richard Wright - in Proc. ISCA Workshop on Prosody in Speech Recognition and Understanding, 2001
Cited by 4 (4 self)
Most research on the use of prosody in automatic speech processing has focused on F0, energy and duration correlates to prosodic structure. However, there are multiple sources of evidence suggesting that there are spectral correlates as well. This paper presents an analysis of prosodically labeled conversational speech data using acoustic parameters and clustering techniques that are standard in speech recognition. We find acoustic differences primarily associated with segment position at prosodic constituent onsets and at prominent syllables. Importantly, phones at fluent vs. disfluent boundaries are frequently placed in different clusters. These differences can be leveraged in a "multiple pronunciation" acoustic model to aid in detecting fluent vs. disfluent prosodic boundaries, and potentially for improving recognition accuracy.
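
The abstract does not name the clustering criterion. A criterion widely used for clustering in speech recognition, assumed here for illustration, is the log-likelihood gain from splitting a set of samples by a yes/no context question, with each side modeled by a single maximum-likelihood Gaussian. The Python sketch below uses one scalar feature per sample and hypothetical context labels such as `onset` and `medial`; real systems use multidimensional acoustic features. A large gain for a question like "is the phone at a prosodic constituent onset?" is the kind of evidence that places those phones in different clusters.

```python
import math


def gaussian_loglik(values):
    """Log-likelihood of values under their own ML-fitted 1-D Gaussian."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    var = max(var, 1e-6)  # variance floor to avoid log(0) for identical values
    # With ML estimates, sum((x - mean)^2) / (2 * var) equals n / 2.
    return -0.5 * n * (math.log(2 * math.pi * var) + 1.0)


def split_gain(samples, question):
    """Log-likelihood gain from splitting samples by a yes/no context question.

    samples:  list of (feature_value, context) pairs
    question: function mapping a context to True/False
    """
    yes = [x for x, ctx in samples if question(ctx)]
    no = [x for x, ctx in samples if not question(ctx)]
    if not yes or not no:
        return 0.0  # the question does not split this set
    parent = [x for x, _ in samples]
    return gaussian_loglik(yes) + gaussian_loglik(no) - gaussian_loglik(parent)
```

A question whose answer does not change the feature distribution yields no gain, so it would not be chosen to split a cluster.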

Summarization of Spoken Language -- Challenges, Methods, and Prospects

by Klaus Zechner, 2002
Cited by 4 (0 self)
Abstract not found

Parse structure and segmentation for improving speech recognition

by William P. McNeill, Jeremy G. Kahn, Dustin L. Hillard, Mari Ostendorf - In Proc. of SLT, 2006
Cited by 3 (1 self)
Separate avenues of prior work have shown that parsing language models lead to improved recognition performance, and that segmentation of speech into sentence-like units has an impact on parser performance. This paper brings these two findings together, showing that segmentation also impacts the quality of a syntax-based language model, such that larger reductions in word error rate are possible when using sentence-like segmentations rather than simple pause-based strategies. Further, we show that the same types of syntactic features used in parse reranking can also be used to reduce word error rate in an N-best rescoring framework. Index Terms: natural languages, speech recognition
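
The abstract describes N-best rescoring in which syntactic features are added to the usual recognizer scores. A common formulation, assumed here, is a log-linear combination: each hypothesis receives a weighted sum of log-domain feature scores, and the highest-scoring hypothesis is selected. The Python sketch below uses hypothetical feature names (`acoustic`, `ngram_lm`, `syntax_lm`) and hand-set weights; it does not reproduce the paper's actual features or weight tuning. Word errors are counted with a standard edit distance, so word error rate is that count divided by the reference length.

```python
from dataclasses import dataclass, field


@dataclass
class Hypothesis:
    words: list
    # Log-domain scores per knowledge source; feature names are hypothetical.
    features: dict = field(default_factory=dict)


def rescore(nbest, weights):
    """Pick the hypothesis with the highest weighted sum of feature scores.

    Features without an entry in weights are ignored (weight 0).
    """
    def total(h):
        return sum(weights.get(name, 0.0) * score for name, score in h.features.items())
    return max(nbest, key=total)


def word_errors(ref, hyp):
    """Word-level Levenshtein distance: substitutions + deletions + insertions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # delete reference word r
                           cur[j - 1] + 1,           # insert hypothesis word h
                           prev[j - 1] + (r != h)))  # substitute, or match at no cost
        prev = cur
    return prev[-1]
```

For example, with reference "the cat sat", a hypothesis "a cat sat" may win on acoustic and N-gram scores alone, while adding a `syntax_lm` score that penalizes it more heavily lets the correct hypothesis win, reducing the error count from 1 to 0.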

Automatic Detection of Contrast for Speech Understanding

by Tong Zhang, Mark Hasegawa-Johnson, Stephen E. Levinson - Proceedings of ICSLP 2004, Jeju Island, South Korea, 2004
Cited by 2 (0 self)
Contrast is a very common phenomenon in spoken language and carries important information for understanding the content and structure of spoken language. In this paper, we propose automatic contrast detection as a step toward better speech understanding. We study the automatic tagging of three specific types of contrast: symmetric contrast, contrastive focus, and contrastive topic. We label words of all three contrast types as contrast (C), and all other words as noncontrast (¬C). The classification of contrast events is based on prosodic, spectral, and part-of-speech (POS) information sources. The different knowledge sources are integrated by a time-delay recursive neural network (TDRNN). The proposed approach was tested on 235 spontaneous utterances comprising 3500 words (samples). Contrast detection was speaker independent. The tests yielded an average classification rate of 87.9%.
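
The abstract combines prosodic, spectral, and POS features in a time-delay recursive neural network. The Python sketch below shows only the input side of such a model, under the assumption that each word is represented by concatenating its per-source feature vectors and that the "time delay" is a fixed window of neighboring words with zero padding at the edges; the network itself is not reproduced. Window sizes and feature values are hypothetical. The classification rate is simply the fraction of words labeled correctly, the measure the abstract reports.

```python
def concat_sources(*sources):
    """Concatenate per-word feature vectors from several knowledge sources."""
    n = len(sources[0])
    if any(len(s) != n for s in sources):
        raise ValueError("every source must supply one vector per word")
    return [sum((list(s[i]) for s in sources), []) for i in range(n)]


def time_delay_windows(frames, left=1, right=1):
    """Stack each word's vector with its neighbors, zero-padded at the edges."""
    if not frames:
        return []
    pad = [0.0] * len(frames[0])
    windows = []
    for i in range(len(frames)):
        window = []
        for j in range(i - left, i + right + 1):
            window.extend(frames[j] if 0 <= j < len(frames) else pad)
        windows.append(window)
    return windows


def classification_rate(labels, predictions):
    """Fraction of words whose predicted label (e.g. C or not-C) is correct."""
    return sum(l == p for l, p in zip(labels, predictions)) / len(labels)
```

With one prosodic and one POS value per word and a window of one word on each side, every word yields a 6-dimensional input vector.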

Automatic Sentence Structure Annotation for Spoken Language Processing

by Dustin Lundring Hillard, 2008
Cited by 1 (0 self)
Increasing amounts of easily available electronic data are precipitating a need for automatic processing that can aid humans in digesting large amounts of data. Speech and video are becoming an increasingly significant portion of on-line information, from news and television broadcasts, to oral histories, on-line lectures, or user-generated content. Automatic processing of audio and video sources requires automatic speech recognition (ASR) in order to provide transcripts. Typical ASR generates only words, without punctuation, capitalization, or further structure. Many techniques available from natural language processing therefore suffer when applied to speech recognition output, because they assume the presence of reliable punctuation and structure. In addition, errors from automatic transcription also degrade the performance of downstream processing such as machine translation, name detection, or information retrieval. We develop approaches for automatically annotating structure in speech, including sentence and sub-sentence segmentation, and then turn towards optimizing ASR and annotation for downstream applications.
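
The abstract notes that typical ASR output lacks punctuation and capitalization. Once a segmenter has decided where sentence boundaries fall, restoring a minimal readable structure is mechanical. The Python sketch below assumes one boolean boundary decision per word (a hypothetical input format) and emits capitalized, period-terminated sentences; it does not model sub-sentence units or any other punctuation.

```python
def annotate_sentences(words, boundary_after):
    """Group words into sentences and add minimal capitalization and punctuation.

    boundary_after[i] is True if a sentence boundary follows words[i]
    (a hypothetical input format produced by some upstream segmenter).
    """
    if len(words) != len(boundary_after):
        raise ValueError("need exactly one boundary decision per word")
    sentences, current = [], []
    for word, is_boundary in zip(words, boundary_after):
        current.append(word)
        if is_boundary:
            sentences.append(current)
            current = []
    if current:  # close a trailing segment even if no boundary was predicted
        sentences.append(current)
    out = []
    for sentence in sentences:
        text = " ".join(sentence)
        out.append(text[:1].upper() + text[1:] + ".")
    return out
```

For the word stream "yeah i know we went there" with boundaries after "yeah" and "know", the output is three sentences: "Yeah.", "I know.", and "We went there."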

Developed at and hosted by The College of Information Sciences and Technology

© 2007-2010 The Pennsylvania State University