Automatic linguistic segmentation of conversational speech
Proc. ICSLP, 1996
Cited by 74 (19 self)
As speech recognition moves toward more unconstrained domains such as conversational speech, we encounter a need to be able to segment (or resegment) waveforms and recognizer output into linguistically meaningful units, such as sentences. Toward this end, we present a simple automatic segmenter of transcripts based on N-gram language modeling. We also study the relevance of several word-level features for segmentation performance. Using only word-level information, we achieve 85% recall and 70% precision on linguistic boundary detection.
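The recall and precision figures quoted above are boundary-level scores. As a minimal sketch of how such scores are commonly computed, assuming boundaries are represented as the word index after which a segment ends (the function name and the toy data are illustrative, not taken from the paper):

```python
def boundary_scores(reference, hypothesis):
    """Return (recall, precision) for two collections of boundary positions.

    A position is the index of the word after which a boundary occurs.
    This representation is an assumption made for the example.
    """
    reference, hypothesis = set(reference), set(hypothesis)
    correct = len(reference & hypothesis)
    recall = correct / len(reference) if reference else 0.0
    precision = correct / len(hypothesis) if hypothesis else 0.0
    return recall, precision


# Toy data (hypothetical): four true boundaries, five proposed ones.
ref = {3, 7, 12, 18}
hyp = {3, 7, 10, 12, 20}
recall, precision = boundary_scores(ref, hyp)
print(f"recall={recall:.2f} precision={precision:.2f}")
```

Recall penalizes missed boundaries and precision penalizes spurious ones, so a segmenter that over-generates boundaries tends to show the pattern reported here: higher recall than precision.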
A prosody-only decision-tree model for disfluency detection
Proc. EUROSPEECH, 1997
Cited by 45 (14 self)
Speech disfluencies (filled pauses, repetitions, repairs, and false starts) are pervasive in spontaneous speech. The ability to detect and correct disfluencies automatically is important for effective natural language understanding, as well as to improve speech models in general. Previous approaches to disfluency detection have relied heavily on lexical information, which makes them less applicable when word recognition is unreliable. We have developed a disfluency detection method using decision tree classifiers that use only local and automatically extracted prosodic features. Because the model doesn’t rely on lexical information, it is widely applicable even when word recognition is unreliable. The model performed significantly better than chance at detecting four disfluency types. It also outperformed a language model in the detection of false starts, given the correct transcription. Combining the prosody model with a specialized language model improved accuracy over either model alone for the detection of false starts. Results suggest that a prosody-only model can aid the automatic detection of disfluencies in spontaneous speech.
Automatic Detection Of Sentence Boundaries And Disfluencies Based On Recognized Words
1998
Cited by 35 (13 self)
We study the problem of detecting linguistic events at interword boundaries, such as sentence boundaries and disfluency locations, in speech transcribed by an automatic recognizer. Recovering such events is crucial to facilitate speech understanding and other natural language processing tasks. Our approach is based on a combination of prosodic cues modeled by decision trees, and word-based event N-gram language models. Several model combination approaches are investigated. The techniques are evaluated on conversational speech from the Switchboard corpus. Model combination is shown to give a significant win over individual knowledge sources.
1. INTRODUCTION
Current automatic speech recognition systems output a string of words. Most natural language understanding systems, however, require structural information such as punctuation, which is present in text but not overtly indicated in spoken language. Similarly, for speech understanding and information extraction, it is important to fi...
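The abstract does not say which model combination approaches were investigated. As a hedged illustration only, one simple way to combine two knowledge sources is linear interpolation of their per-boundary posterior probabilities. The function names, the weight, the threshold, and the probabilities below are all assumptions for the sketch, not values from the paper.

```python
def interpolate(p_prosody, p_lm, weight=0.5):
    """Linearly interpolate two posterior estimates for the same event.

    weight is the share given to the prosodic model (hypothetical default).
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return weight * p_prosody + (1.0 - weight) * p_lm


def detect_events(prosody_posteriors, lm_posteriors, weight=0.5, threshold=0.5):
    """Return the interword boundary indices whose combined posterior
    reaches the threshold."""
    combined = [
        interpolate(pp, pl, weight)
        for pp, pl in zip(prosody_posteriors, lm_posteriors)
    ]
    return [i for i, p in enumerate(combined) if p >= threshold]


# Toy posteriors (hypothetical) for five interword boundaries.
prosody = [0.10, 0.80, 0.30, 0.90, 0.20]
lm = [0.05, 0.40, 0.60, 0.70, 0.10]
print(detect_events(prosody, lm))
```

The interpolation weight would normally be tuned on held-out data; interpolation is only one option, and the paper may use different schemes.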
Detecting structural metadata with decision trees and transformation-based learning
In Proc. of HLT/NAACL, 2004
Cited by 16 (8 self)
The regular occurrence of disfluencies is a distinguishing characteristic of spontaneous speech. Detecting and removing such disfluencies can substantially improve the usefulness of spontaneous speech transcripts. This paper presents a system that detects various types of disfluencies and other structural information with cues obtained from lexical and prosodic information sources. Specifically, combinations of decision trees and language models are used to predict sentence ends and interruption points and, given these events, transformation-based learning is used to detect edit disfluencies and conversational fillers. Results are reported on human and automatic transcripts of conversational telephone speech.
Modeling Linguistic Segment And Turn Boundaries For N-Best Rescoring Of Spontaneous Speech
Proc. EUROSPEECH, 1997
Cited by 12 (2 self)
Language modeling, especially for spontaneous speech, often suffers from a mismatch of utterance segmentations between training and test conditions. In particular, training often uses linguistically-based segments, whereas testing occurs on acoustically determined segments, resulting in degraded performance. We present an N-best rescoring algorithm that removes the effect of segmentation mismatch. Furthermore, we show that explicit language modeling of hidden linguistic segment boundaries is improved by including turn-boundary events in the model.
1. THE SEGMENTATION PROBLEM IN LANGUAGE MODELING
One of the problems encountered in speech recognition on continuous, spontaneous speech is the segmentation of long waveforms. Because current recognizers prefer short waveform segments for best performance and to limit computational resources, conversation-length waveforms are typically pre-segmented using simple acoustic criteria, such as locations of long pauses and turn switches. This crea...
Dependency Language Modeling
1997
Cited by 4 (3 self)
This report summarizes the work of the Dependency Language Modeling group at the 1996 Summer Speech Workshop at the Center for Language and Speech Processing at Johns Hopkins University (WS96). We motivate and describe a novel statistical language model that models the syntactic dependencies between words. The model is formulated in the maximum entropy framework, which expresses statistical constraints on the frequencies of various types of dependencies, as well as the standard N-gram statistics. We describe how this model was applied to the recognition of spontaneous English speech from the Switchboard corpus. Due to implementation constraints, only a reduced version of our model could be tested so far. The model gave a modest improvement over an N-gram baseline model. A by-product of the project is the Maximum Entropy Modeling Toolkit (MEMT), a freely available software package for domain-independent maximum entropy modeling.
1 Introduction
Current state-of-the-art language models for s...
Parse structure and segmentation for improving speech recognition
In Proc. of SLT, 2006
Cited by 3 (1 self)
Separate avenues of prior work have shown that parsing language models lead to improved recognition performance, and that segmentation of speech into sentence-like units has an impact on parser performance. This paper brings these two findings together, showing that segmentation also impacts the quality of a syntax-based language model, such that larger reductions in word error rate are possible when using sentence-like segmentations rather than simple pause-based strategies. Further, we show that the same types of syntactic features used in parse reranking can also be used to reduce word error rate in an N-best rescoring framework. Index Terms: natural languages, speech recognition
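For readers unfamiliar with N-best rescoring, the general idea can be sketched as picking the hypothesis with the best weighted sum of component scores. This is a generic illustration, not the paper's system: the score names (`acoustic`, `ngram`, `syntax`), the weights, and the hypotheses are all hypothetical.

```python
def rescore(nbest, weights):
    """Return the hypothesis with the highest weighted sum of its scores.

    nbest   -- list of dicts with a "words" string and a "scores" dict
               of log-domain scores (higher is better)
    weights -- dict mapping each score name to its weight
    """
    def total(hyp):
        return sum(weights[name] * value for name, value in hyp["scores"].items())

    return max(nbest, key=total)


# Toy 2-best list (hypothetical scores).
nbest = [
    {"words": "i want to go",
     "scores": {"acoustic": -120.0, "ngram": -14.0, "syntax": -3.0}},
    {"words": "i want to goal",
     "scores": {"acoustic": -119.0, "ngram": -16.5, "syntax": -6.0}},
]

# Acoustic score alone prefers the second hypothesis;
# adding language-model and syntactic scores flips the choice.
acoustic_only = {"acoustic": 1.0, "ngram": 0.0, "syntax": 0.0}
with_features = {"acoustic": 1.0, "ngram": 8.0, "syntax": 2.0}
print(rescore(nbest, acoustic_only)["words"])
print(rescore(nbest, with_features)["words"])
```

In such a setup the segmentation matters because any sentence-level feature, syntactic ones included, is computed over whatever unit the hypothesis list covers.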
How far do speakers back up in repairs? A quantitative model
In Proceedings of the International Conference on Spoken Language Processing, 1998
Cited by 2 (0 self)
Speakers frequently retrace one or more words when continuing after a break in fluency. Syntactic principles constrain the points from which speakers retrace; however, syntactic principles do not provide predictions about the relative usage of different allowable retrace points. Such predictions are useful for automatic processing of repairs in speech technology, particularly if they use information readily available to a speech recognizer. We propose a quantitative model that predicts the overall distribution of retrace lengths in a large corpus of spontaneous speech, based only on word position. The model has two components: (1) a constant, position-independent probability for extending a retrace by one more word; and (2) a position-dependent probability to “skip” to the beginning of the sentence. Results have implications for modeling repairs in speech applications and constrain explanatory models in psycholinguistics.
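The abstract names the two components but not their exact parameterization. The following is one possible reading, offered only as a sketch: after each retraced word the speaker either skips to the sentence start (with a probability that may depend on position), stops, or extends by one more word (with a constant probability). The function name, the order of the decisions, and all numeric values are assumptions.

```python
def retrace_length_distribution(n_words, p_extend, p_skip):
    """Distribution over retrace lengths 1..n_words under the assumed process.

    n_words  -- words spoken so far in the sentence (maximum retrace length)
    p_extend -- constant probability of extending the retrace by one word
    p_skip   -- function (k, n_words) -> probability of skipping to the
                sentence start after k words have been retraced
    Returns a list where index k holds P(retrace length == k); index 0 is 0.
    """
    if n_words < 1:
        raise ValueError("n_words must be at least 1")
    dist = [0.0] * (n_words + 1)
    reach = 1.0  # probability mass that has retraced k words and is undecided
    for k in range(1, n_words):
        skip = p_skip(k, n_words)
        dist[n_words] += reach * skip        # jump back to the sentence start
        stay = reach * (1.0 - skip)
        dist[k] += stay * (1.0 - p_extend)   # stop after k words
        reach = stay * p_extend              # extend by one more word
    dist[n_words] += reach                   # extended all the way back
    return dist


# Toy parameters (hypothetical): constant skip probability of 0.1.
dist = retrace_length_distribution(5, 0.3, lambda k, n: 0.1)
print([round(p, 4) for p in dist])
```

With a small extension probability, most of the mass falls on one-word retraces, and the skip term adds extra mass at the sentence-initial position.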
Extracting clauses for spoken language understanding in conversational systems
In Proc. internat. conf. spoken, 2002
Cited by 1 (0 self)
Spontaneous human utterances in the context of human-human and human-machine dialogs are rife with dysfluencies and speech repairs. Furthermore, when recognized using a speech recognizer, these utterances produce a sequence of words with no identification of clausal units. Such long strings of words combined with speech errors pose a difficult problem for spoken language parsing and understanding. In this paper, we address the issue of editing speech repairs as well as segmenting user utterances into clause units with a view to parsing and understanding spoken language utterances. We present generative and discriminative models for this task and report evaluation results on the human-human conversations obtained from the Switchboard corpus.

