The Use Of Prosody In A Combined System For Punctuation Generation And Speech Recognition
- In Proc. EUROSPEECH, 2001
- Cited by 28 (7 self)
In this paper, we discuss a combined system for punctuation generation and speech recognition. This system incorporates prosodic information with acoustic and language model information. Experiments are conducted for both the reference transcriptions and speech recogniser outputs. For the reference transcription case, prosodic information is shown to be more useful than language model information. When these information sources are combined, we can obtain an F-measure of up to 0.7830 for punctuation recognition. A few straightforward...
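The F-measure quoted in this abstract can be illustrated with a minimal sketch. The abstract does not give the paper's exact scoring procedure, so this assumes the standard balanced F-measure (the harmonic mean of precision and recall) computed over word slots that are already aligned between reference and hypothesis. The function name `f_measure` and the label encoding (an empty string for "no punctuation") are assumptions made for illustration only.

```python
def f_measure(reference, hypothesis, beta=1.0):
    """Balanced F-measure over aligned punctuation slots.

    reference, hypothesis: equal-length lists with one label per word slot,
    where '' means "no punctuation after this word" (assumed encoding).
    """
    # A slot counts as correct only if the reference has a mark there
    # and the hypothesis produced the same mark.
    correct = sum(1 for r, h in zip(reference, hypothesis) if r and r == h)
    n_ref = sum(1 for r in reference if r)   # marks in the reference
    n_hyp = sum(1 for h in hypothesis if h)  # marks the system produced

    precision = correct / n_hyp if n_hyp else 0.0
    recall = correct / n_ref if n_ref else 0.0
    if precision + recall == 0:
        return 0.0

    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Toy example (not data from the paper): one spurious full stop is inserted.
reference = ["", ",", "", ".", ""]
hypothesis = ["", ",", ".", ".", ""]
score = f_measure(reference, hypothesis)
```

In the toy example, precision is 2/3 and recall is 1, so the score is 0.8. Scoring speech recogniser output would additionally require aligning the recognised words with the reference words, which this sketch does not attempt.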
Punctuation annotation using statistical prosody models
- In Proc. ISCA Workshop on Prosody in Speech Recognition and Understanding, 2001
- Cited by 26 (2 self)
This paper is about the development of statistical models of prosodic features to generate linguistic meta-data for spoken language. In particular, we are concerned with automatically punctuating the output of a broadcast news speech recogniser. We present a statistical finite state model that combines prosodic, linguistic and punctuation class features. Experimental results are presented using the Hub–4 Broadcast News corpus, and in the light of our results we discuss the issue of a suitable method of evaluating the present task.
Model Adaptation for Sentence Segmentation from Speech
- In Proc. IEEE/ACL Workshop on Spoken Language Technology (SLT), 2006
- Cited by 11 (8 self)
This paper analyzes various methods to adapt sentence segmentation models trained on conversational telephone speech (CTS) to meeting style conversations. The sentence segmentation model trained using a large amount of CTS data is used to improve the performance when various amounts of meeting data are available. We test the sentence segmentation performance on both reference and speech-to-text (STT) conditions on the ICSI MRDA Meeting Corpus using the Switchboard CTS Corpus as the out-of-domain data. Results show that the sentence segmentation performance is significantly improved by the adapted classification model compared to the one obtained by using in-domain data only, independently of the amount of in-domain data used: 17.5% and 8.4% relative error reductions with only 1,000 and 3,000 in-domain sentences, respectively, and 3.7% relative error reduction with all in-domain data of 80,000 words.
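For readers unfamiliar with the metric, relative error reduction is conventionally the drop in error rate divided by the baseline error rate. The sketch below assumes that standard definition; the abstract does not state the underlying absolute error rates, so the figures in the example are hypothetical values chosen only to show the arithmetic.

```python
def relative_error_reduction(baseline_error, adapted_error):
    """Fraction of the baseline error removed by the adapted model."""
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return (baseline_error - adapted_error) / baseline_error


# Hypothetical error rates, NOT taken from the paper: a baseline of 40%
# reduced to 33% corresponds to a 17.5% relative error reduction.
rer = relative_error_reduction(0.40, 0.33)
```

Note that a relative reduction says nothing about the absolute error level: the same 17.5% could describe a drop from 40% to 33% or from 4% to 3.3%.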
Automatic Headline Generation for Newspaper Stories
- In the Proceedings of the ACL Workshop on Automatic Summarization/Document Understanding Conference (DUC), 2002
- Cited by 9 (1 self)
In this paper we propose a novel application of Hidden Markov Models to automatic generation of informative headlines for English texts. We propose four decoding parameters to make the headlines appear more like Headlinese, the language of informative newspaper headlines. We also allow for morphological variation in words between headline and story English. Informal and formal evaluations indicate that our approach produces informative headlines, mimicking a Headlinese style generated by humans.
Multi-stage compaction approach to broadcast news summarisation
- In Proceedings of Eurospeech, 2005
- Cited by 8 (1 self)
This paper presents a fully automatic, multi-stage compaction approach to broadcast news summarisation, targeting transcripts from automatic speech recognition (ASR) systems. It employs a network of multi-layer perceptrons to remove incorrectly transcribed words based on confidence scores, and to select significant chunks at multiple stages based on tf.idf scores and named entity frequency. The resulting summaries are assessed using a combination of a cross-comprehension test and a fluency test, and finally compared with an automatic evaluation scheme. The experimental results show the approach can produce summaries with good information content.
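The tf.idf scores mentioned in this abstract can be illustrated with a minimal sketch. It assumes the textbook formulation (raw term count multiplied by the log of inverse document frequency) and a simple additive chunk score. The paper's actual weighting, its multi-layer perceptron selection stages, and its use of named entity frequency are not reproduced here, and the helper names `tfidf_scores` and `chunk_score` are hypothetical.

```python
import math
from collections import Counter


def tfidf_scores(documents):
    """Return one {term: tf * idf} dict per document.

    documents: list of token lists. idf is log(N / df), so a term that
    occurs in every document receives weight 0.
    """
    n_docs = len(documents)
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))  # count each term once per document

    scores = []
    for tokens in documents:
        term_freq = Counter(tokens)
        scores.append({
            term: count * math.log(n_docs / doc_freq[term])
            for term, count in term_freq.items()
        })
    return scores


def chunk_score(chunk_tokens, term_weights):
    """Score a chunk as the sum of the tf.idf weights of its tokens."""
    return sum(term_weights.get(token, 0.0) for token in chunk_tokens)


# Toy stories (not data from the paper).
stories = [["storm", "hits", "coast"], ["storm", "warning"]]
weights = tfidf_scores(stories)
score = chunk_score(["storm", "coast"], weights[0])
```

In the toy example "storm" appears in both stories and so contributes nothing, while "coast" carries the whole chunk score. This is the behaviour that makes tf.idf useful for picking out story-specific content.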
The ICSI-SRI-UW metadata extraction system
- In Proc. Int. Conf. Spoken Lang. Process.
- Cited by 8 (2 self)
Both human and automatic processing of speech require recognizing more than just the words. We describe a state-of-the-art system for automatic detection of “metadata” (information beyond the words) in both broadcast news and spontaneous telephone conversations, developed as part of the DARPA EARS Rich Transcription program. System tasks include sentence boundary detection, filler word detection, and detection/correction of disfluencies. To achieve best performance, we combine information from different types of language models (based on words, part-of-speech classes, and automatically induced classes) with information from a prosodic classifier. The prosodic classifier employs bagging and ensemble approaches to better estimate posterior probabilities. We use confusion networks to improve robustness to speech recognition errors. Most recently, we have investigated a maximum entropy approach for the sentence boundary detection task, yielding a gain over our standard HMM approach. We report results for these techniques on the official NIST Rich Transcription metadata tasks.
A Study in Machine Learning from Imbalanced Data for Sentence Boundary Detection in Speech
- Computer Speech and Language, 2006
- Cited by 8 (4 self)
Enriching speech recognition output with sentence boundaries improves its human readability and enables further processing by downstream language processing modules. We have constructed a hidden Markov model (HMM) system to detect sentence boundaries that uses both prosodic and textual information. Since there are more nonsentence boundaries than sentence boundaries in the data, the prosody model, which is implemented as a decision tree classifier, must be constructed to effectively learn from the imbalanced data distribution. To address this problem, we investigate a variety of sampling approaches and a bagging scheme. A pilot study was carried out to select methods to apply to the full NIST sentence boundary evaluation task across two corpora (conversational telephone speech and broadcast news speech), using both human transcriptions and recognition output. In the pilot study, when classification error rate is the performance measure, using the original training set achieves the best performance among the sampling methods, and an ensemble of multiple classifiers from different downsampled training sets achieves
Restoring Punctuation and Capitalization in Transcribed Speech
- Cited by 6 (0 self)
Adding punctuation and capitalization greatly improves the readability of automatic speech transcripts. We discuss an approach for performing both tasks in a single pass using a purely text-based n-gram language model. We study the effect on performance of varying the n-gram order (from n = 3 to n = 6) and the amount of training data (from 58 million to 55 billion tokens). Our results show that using larger training data sets consistently improves performance, while increasing the n-gram order does not help nearly as much. Index Terms: speech recognition, punctuation, capitalization.
Maximum Entropy Segmentation of Broadcast News
- In Proceedings of ICASSP, 2005
- Cited by 5 (1 self)
This paper presents an automatic system for structuring and preparing a news broadcast for applications such as speech summarization, browsing, archiving and information retrieval. This process comprises transcribing the audio using an automatic speech recognizer and subsequently segmenting the text into utterances and topics. A maximum entropy approach is used to build statistical models for both utterance and topic segmentation. The experimental work addresses the effect on performance of the topic boundary detector of three factors: the information sources used, the quality of the ASR transcripts, and the quality of the utterance boundary detector. The results show that the topic segmentation is not affected severely by transcript errors, whereas errors in the utterance segmentation are more devastating.

