Edit Detection and Parsing for Transcribed Speech
- In Proc. NAACL, 2001
Cited by 42 (5 self)
We present a simple architecture for parsing transcribed speech in which an edited-word detector first removes such words from the sentence string, and then a standard statistical parser trained on transcribed speech parses the remaining words. The edit detector achieves a misclassification rate on edited words of 2.2%. (The NULL-model, which marks everything as not edited, has an error rate of 5.9%.) To evaluate our parsing results we introduce a new evaluation metric, the purpose of which is to make evaluation of a parse tree relatively indifferent to the exact tree position of EDITED nodes. By this metric the parser achieves 85.3% precision and 86.5% recall.
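As a rough illustration of the comparison in this abstract, the sketch below computes a per-word misclassification rate for an edit detector and for a NULL-model that marks every word as not edited. It assumes the rate is counted over all words, which the abstract does not spell out; the function names and the toy labels are hypothetical and are not taken from the paper.

```python
def misclassification_rate(gold, predicted):
    """Fraction of words whose edited/not-edited label differs from the gold label."""
    if len(gold) != len(predicted):
        raise ValueError("label sequences must have the same length")
    if not gold:
        raise ValueError("label sequences must not be empty")
    errors = sum(1 for g, p in zip(gold, predicted) if g != p)
    return errors / len(gold)


def null_model(num_words):
    """Baseline that marks every word as not edited (False)."""
    return [False] * num_words


# Hypothetical toy data: True means the word belongs to an edited region.
gold = [False, False, True, True, False, False, False, False, False, False]
# Hypothetical detector output that misses one of the two edited words.
detector = [False, False, True, False, False, False, False, False, False, False]

null_rate = misclassification_rate(gold, null_model(len(gold)))
detector_rate = misclassification_rate(gold, detector)
print(f"NULL-model error rate: {null_rate:.1%}")
print(f"Detector error rate:   {detector_rate:.1%}")
```

Under this per-word reading, the NULL-model's error rate is simply the share of words that are edited, which is why it serves as a natural floor for comparison.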
Automatic summarization of open-domain multiparty dialogues in diverse genres
- Computational Linguistics, 2002
Cited by 30 (0 self)
Automatic summarization of open-domain spoken dialogues is a relatively new research area. This article introduces the task and the challenges involved and motivates and presents an approach for obtaining automatic-extract summaries for human transcripts of multiparty dialogues of four different genres, without any restriction on domain. We address the following issues, which are intrinsic to spoken-dialogue summarization and typically can be ignored when summarizing written text such as news wire data: (1) detection and removal of speech disfluencies; (2) detection and insertion of sentence boundaries; and (3) detection and linking of cross-speaker information units (question-answer pairs). A system evaluation is performed using a corpus of 23 dialogue excerpts with an average duration of about 10 minutes, comprising 80 topical segments and about 47,000 words total. The corpus was manually annotated for relevant text spans by six human annotators. The global evaluation shows that for the two more informal genres, our summarization system using dialogue-specific components significantly outperforms two baselines: (1) a maximum-marginal-relevance ranking algorithm using TF*IDF term weighting, and (2) a LEAD baseline that extracts the first n words from a text.
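The two baselines named in this abstract can be sketched roughly as follows. This is a minimal illustration, not the article's implementation: the smoothed IDF formula, the use of the summed segment vectors as the MMR query, the `lam` trade-off value, and all function names are assumptions made for the example.

```python
import math
from collections import Counter


def lead_baseline(words, n):
    """LEAD baseline: extract the first n words of a text."""
    return words[:n]


def tfidf_vectors(segments):
    """Sparse TF*IDF vectors, one dict per tokenized segment (smoothed IDF is an assumption)."""
    n = len(segments)
    df = Counter(term for seg in segments for term in set(seg))
    vectors = []
    for seg in segments:
        tf = Counter(seg)
        vectors.append(
            {t: c * (math.log((1 + n) / (1 + df[t])) + 1.0) for t, c in tf.items()}
        )
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def mmr_rank(segments, k, lam=0.7):
    """Greedy maximum-marginal-relevance selection of k segment indices."""
    vectors = tfidf_vectors(segments)
    # Assumption: the "query" is the sum of all segment vectors.
    query = Counter()
    for vec in vectors:
        query.update(vec)
    selected = []
    candidates = list(range(len(segments)))
    while candidates and len(selected) < k:
        def score(i):
            relevance = cosine(vectors[i], query)
            redundancy = max(
                (cosine(vectors[i], vectors[j]) for j in selected), default=0.0
            )
            return lam * relevance - (1.0 - lam) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


# Hypothetical toy input: two identical segments and one distinct segment.
segments = [
    ["flight", "to", "boston"],
    ["flight", "to", "boston"],
    ["hotel", "in", "denver"],
]
print(mmr_rank(segments, 2))
print(lead_baseline(["so", "um", "we", "should", "start"], 3))
```

The redundancy term is what separates MMR from plain relevance ranking: once one of the duplicate segments has been chosen, the other is penalized and the distinct segment is preferred.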
A TAG-based noisy channel model of speech repairs
- In Proc. Assoc. for Computational Linguistics, 2004
Cited by 24 (8 self)
This paper describes a noisy channel model of speech repairs, which can identify and correct repairs in speech transcripts. A syntactic parser is used as the source model, and a novel type of TAG-based transducer is the channel model. The use of TAG is motivated by the intuition that the reparandum is a "rough copy" of the repair. The model is trained and tested on the Switchboard disfluency-annotated corpus.
Automatic disfluency identification in conversational speech using multiple knowledge sources
- In Proc. Eurospeech, 2003
Cited by 18 (2 self)
Disfluencies occur frequently in spontaneous speech. Detection and correction of disfluencies can make automatic speech recognition transcripts more readable for human readers, and can aid downstream processing by machine. This work investigates a number of knowledge sources for disfluency detection, including acoustic-prosodic features, a language model (LM) to account for repetition patterns, a part-of-speech (POS) based LM, and rule-based knowledge. Different components are designed for different purposes in the system. Results show that detection of disfluency interruption points is best achieved by a combination of prosodic cues, word-based cues, and POS-based cues. The onset of a disfluency to be removed, in contrast, is best found using knowledge-based rules. Finally, specific disfluency types can be aided by the modeling of word patterns.
To `errrr' is human: ecology and acoustics of speech disfluencies
- Journal of the International Phonetic Association, 2001
Cited by 16 (1 self)
... This paper aims to promote `disfluency awareness' especially in the field of phonetics which has much to offer in the way of increasing our understanding of these phenomena. Two broad claims are made, based on analyses of disfluencies in different corpora of spontaneous American English speech. First, an Ecology Claim suggests that disfluencies are related to aspects of the speaking environments in which they arise. The claim is supported by evidence from task effects, location analyses, speaker effects and sociolinguistic effects. Second, an Acoustics Claim argues that disfluency has consequences for phonetic and prosodic aspects of speech that are not represented in the speech patterns of laboratory speech. Such effects include modifications in segment durations, intonation, voice quality, vowel quality and coarticulation patterns. The ecological and acoustic evidence provide insights about human language production in real-world contexts. Such evidence can also guide methods for the processing of spontaneous speech in automatic speech recognition applications.
Spontaneous speech: How people really talk and why engineers should care
- In Proc. European Conf. on Speech Communication and Technology (Eurospeech), 2005
Cited by 15 (0 self)
Spontaneous conversation is optimized for human-human communication, but differs in some important ways from the types of speech for which human language technology is often developed. This overview describes four fundamental properties of spontaneous speech that present challenges for spoken language applications because they violate assumptions often applied in automatic processing technology.
Automatic Detection of Nonreferential It in Spoken Multi-Party Dialog
- 2006
Cited by 14 (1 self)
We present an implemented machine learning system for the automatic detection of nonreferential it in spoken dialog.
A lexically-driven algorithm for disfluency detection
- In Proc. of HLT/NAACL, 2004
Cited by 13 (1 self)
This paper describes a transformation-based learning approach to disfluency detection in speech transcripts using primarily lexical features. Our method produces comparable results to two other systems that make heavy use of prosodic features, thus demonstrating that reasonable performance can be achieved without extensive prosodic cues. In addition, we show that it is possible to facilitate the identification of less frequently disfluent discourse markers by taking speaker style into account.
Resolving It, This, and That in Unrestricted Multi-Party Dialog
Cited by 10 (0 self)
We present an implemented system for the resolution of it, this, and that in transcribed multi-party dialog. The system handles NP-anaphoric as well as discourse-deictic anaphors, i.e. pronouns with VP antecedents. Selectional preferences for NP or VP antecedents are determined on the basis of corpus counts. Our results show that the system performs significantly better than a recency-based baseline.

