Speech-to-Text and Speech-to-Speech Summarization of Spontaneous Speech
- IEEE TRANS. ON SPEECH AND AUDIO PROCESSING, 2004
Cited by 42 (3 self)
This paper presents techniques for speech-to-text and speech-to-speech automatic summarization based on speech unit extraction and concatenation. For the former case, a two-stage summarization method consisting of important sentence extraction and word-based sentence compaction is investigated. Sentence and word units which maximize the weighted sum of linguistic likelihood, amount of information, confidence measure, and grammatical likelihood of concatenated units are extracted from the speech recognition results and concatenated to produce summaries. For the latter case, sentences, words, and between-filler units are investigated as units to be extracted from the original speech. These methods are applied to the summarization of unrestricted-domain spontaneous presentations and evaluated by objective and subjective measures. It was confirmed that the proposed methods are effective in spontaneous speech summarization.
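The weighted-sum extraction criterion described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `Unit` class, the weight values, and the `extract` helper are hypothetical names introduced here, and the sketch scores each unit independently, whereas the abstract describes a grammatical likelihood defined over concatenated units.

```python
from dataclasses import dataclass


@dataclass
class Unit:
    """One candidate sentence or word unit with four per-unit scores.

    All four fields are illustrative stand-ins for the scores named in
    the abstract; how they are computed is outside this sketch.
    """
    text: str
    linguistic: float   # linguistic likelihood
    information: float  # amount of information
    confidence: float   # recognizer confidence measure
    grammatical: float  # grammatical likelihood (simplified to per-unit)


def weighted_score(unit, weights):
    """Weighted sum of the four scores; `weights` is a 4-tuple."""
    w_l, w_i, w_c, w_g = weights
    return (w_l * unit.linguistic + w_i * unit.information
            + w_c * unit.confidence + w_g * unit.grammatical)


def extract(units, weights, keep):
    """Keep the `keep` highest-scoring units, restored to original order."""
    ranked = sorted(range(len(units)),
                    key=lambda i: weighted_score(units[i], weights),
                    reverse=True)
    chosen = sorted(ranked[:keep])  # concatenate in spoken order
    return [units[i].text for i in chosen]
```

Calling `extract(units, (1.0, 1.0, 1.0, 1.0), keep=2)` returns the two best-scoring units in their original order. In practice the weights would be tuned on held-out data.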
Automatic Summarization of Broadcast News Using Structural Features
- Proceedings of Eurospeech 2003
Cited by 34 (6 self)
We present a method for summarizing broadcast news that is not affected by word errors in an automatic speech recognition transcription, using information about the structure of the news program. We construct a directed graphical model to represent the probability distribution and dependencies among the structural features which we train by finding the values of parameters of the conditional probability tables. We then rank segments of the test set and extract the highest ranked ones as a summary. We present the procedure and preliminary test results.
Automatic Generation of Concise Summaries of Spoken Dialogues in Unrestricted Domains
- In Proc. ACM SIGIR, 2001
Cited by 33 (0 self)
Automatic summarization of open domain spoken dialogues is a new research area. This paper introduces the task, the challenges involved, and presents an approach to obtain automatic extract summaries for multi-party dialogues of four different genres, without any restriction on domain. We address the following issues which are intrinsic to spoken dialogue summarization and typically can be ignored when summarizing written text such as newswire data: (i) detection and removal of speech disfluencies; (ii) detection and insertion of sentence boundaries; (iii) detection and linking of cross-speaker information units (question-answer pairs). A global system evaluation using a corpus of 23 relevance annotated dialogues containing 80 topical segments shows that for the two more informal genres, our summarization system using dialogue specific components significantly outperforms a baseline using TFIDF term weighting with maximum marginal relevance ranking (MMR).
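The baseline named in this abstract, TFIDF term weighting with maximum marginal relevance (MMR) ranking, can be sketched as follows. This is an illustrative sketch only, not the paper's system: the tokenization by whitespace, the smoothed IDF formula, the `query` argument, and the default `lam` value are all assumptions made here.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (a dict) per tokenized document."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        # Assumed IDF variant: log(n / df) + 1, so no weight is zero.
        vectors.append({t: c * (math.log(n / df[t]) + 1.0)
                        for t, c in tf.items()})
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mmr_select(segments, query, k, lam=0.7):
    """Pick up to k segment indices, trading relevance against redundancy."""
    docs = [s.lower().split() for s in segments]
    vecs = tfidf_vectors(docs + [query.lower().split()])
    seg_vecs, q_vec = vecs[:-1], vecs[-1]
    selected = []
    candidates = list(range(len(segments)))
    while candidates and len(selected) < k:
        def mmr(i):
            relevance = cosine(seg_vecs[i], q_vec)
            redundancy = max((cosine(seg_vecs[i], seg_vecs[j])
                              for j in selected), default=0.0)
            return lam * relevance - (1.0 - lam) * redundancy
        best = max(candidates, key=mmr)
        selected.append(best)
        candidates.remove(best)
    return selected
```

With `lam` close to 1 the ranking reduces to plain relevance ordering; lower values penalize segments that repeat content already selected, which is why a verbatim repeat of a chosen segment is skipped in favor of a new one.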
Automatic music classification and summarization
- IEEE TRANS. SPEECH AUD. PROCESSING, 2005
Cited by 27 (2 self)
Automatic music classification and summarization are very useful for music indexing, content-based music retrieval and on-line music distribution, but it is a challenge to extract the most common and salient themes from unstructured raw music data. In this paper, we propose effective algorithms to automatically classify and summarize music content. Support vector machines are applied to classify music into pure music and vocal music by learning from training data. For pure music and vocal music, a number of features are extracted to characterize the music content, respectively. Based on the calculated features, a clustering algorithm is applied to structure the music content. Finally, a music summary is created based on the clustering results and domain knowledge related to pure and vocal music. Support vector machine learning shows better performance in music classification than traditional Euclidean distance methods and hidden Markov model methods. Listening tests are conducted to evaluate the quality of summarization. The experiments on different genres of pure and vocal music illustrate that the results of summarization are significant and effective.
Automatic summarization of voicemail messages using lexical and prosodic features
- ACM Transactions on Speech and Language Processing, 2005
Cited by 22 (3 self)
This article presents trainable methods for extracting principal content words from voicemail messages. The short text summaries generated are suitable for mobile messaging applications. The system uses a set of classifiers to identify the summary words, with each word described by a vector of lexical and prosodic features. We use an ROC-based algorithm, Parcel, to select input features (and classifiers). We have performed a series of objective and subjective evaluations using unseen data from two different speech recognition systems as well as human transcriptions of voicemail speech.
Speech Recognition Performance On A Voicemail Transcription Task
- In Proc. IEEE ICASSP, 1999
Cited by 22 (1 self)
In this paper we describe a new testbed for developing speech recognition algorithms: the ARPA-sponsored VoiceMail transcription task, analogous to other tasks such as the Switchboard, CallHome [1] and the Hub 4 tasks [2] which are currently used by speech recognition researchers. As the name indicates, the task involves the transcription of voicemail conversations. Voicemail represents a very large volume of real-world speech data, which is, however, not particularly well represented in existing databases. For instance, the Switchboard and CallHome databases contain telephone conversations between two humans, representing telephone-bandwidth spontaneous speech; the Hub 4 database contains radio broadcasts which represent different kinds of speech data such as spontaneous speech from a well-trained speaker, conversations between two humans possibly over the telephone, etc. The Voicemail database, on the other hand, also represents telephone-bandwidth spontaneous speech; however, the difference with respect to the Switchboard and CallHome tasks is that the interaction is not between two humans, but rather between a human and a machine. Consequently, the speech is expected to be a little more formal in its nature, without the problems of cross-talk, barge-in, etc. This eliminates some of the variables and provides more controlled conditions, enabling one to concentrate on the aspects of spontaneous speech and the effects of the telephone channel. In this paper, we will describe the modality of collection of the speech data, and some algorithmic techniques that were devised based on this data. We will also describe the initial results of transcription performance on this task.
Information Extraction from Voicemail
- In Proceedings of the Conference of the Association for Computational Linguistics (ACL), 2001
Cited by 17 (1 self)
In this paper we address the problem of extracting key pieces of information from voicemail messages, such as the identity and phone number of the caller.
The Role of Prosody in a Voicemail Summarization System
- In Proc. ISCA Workshop on Prosody in Speech Recognition and Understanding, 2001
Cited by 15 (8 self)
When a speaker leaves a voicemail message there are prosodic cues that emphasize the important points in the message, in addition to lexical content. In this paper we compare and visualize the relative contribution of these two types of features within a voicemail summarization system. We describe the system's ability to generate summaries of two test sets, having trained and validated using 700 messages from the IBM Voicemail corpus. Results measuring the quality of summary artifacts show that combined lexical and prosodic features are at least as robust as combined lexical features alone across all operating conditions.
Advances in automatic speech summarization
- In Proceedings of the 7th European Conference on Speech Communication and Technology, 2001
Cited by 15 (2 self)
Speech summarization technology, which extracts important information and removes irrelevant information from speech, is expected to play an important role in building speech archives and improving the efficiency of spoken document retrieval. However, speech summarization has a number of significant challenges that distinguish it from general text summarization. Fundamental problems with speech summarization include speech recognition errors, disfluencies, and difficulties of sentence segmentation. Typical speech summarization systems consist of speech recognition, sentence segmentation, sentence extraction, and sentence compaction components. Most research up to now has focused on sentence extraction, using LSA (Latent Semantic Analysis), MMR (Maximal Marginal Relevance), or feature-based approaches, among which no decisive method has yet been found. Proper sentence segmentation is also essential to achieve good summarization performance. How to objectively evaluate speech summarization results is also an important issue. Several measures, including families of SumACCY and ROUGE measures, have been proposed, and correlation analyses between subjective and objective evaluation scores have been performed. Although these measures are useful for ranking various summarization methods, they do not correlate well with human evaluations, especially when spontaneous speech is targeted.
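As a concrete illustration of the ROUGE family mentioned in this abstract, the following is a minimal sketch of ROUGE-N recall against a single reference summary. It assumes lowercased whitespace tokenization and omits the stemming, stopword handling, and multi-reference aggregation that full implementations offer; the function names are chosen here for illustration.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n])
                   for i in range(len(tokens) - n + 1))


def rouge_n_recall(candidate, reference, n=1):
    """Fraction of reference n-grams that also occur in the candidate.

    Matches are clipped: an n-gram appearing twice in the reference
    counts twice only if the candidate also contains it twice.
    """
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    overlap = sum(min(count, cand[gram]) for gram, count in ref.items())
    return overlap / total
```

For example, the candidate "the cat sat" scored against the reference "the cat sat on the mat" recovers three of the six reference unigrams and two of the five reference bigrams.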
Extractive Summarization of Voicemail using Lexical and Prosodic Feature Subset Selection
- In Proc. Eurospeech, 2001
Cited by 13 (8 self)
This paper presents a novel data-driven approach to summarizing spoken audio transcripts utilizing lexical and prosodic features. The former are obtained from a speech recognizer and the latter are extracted automatically from speech waveforms. We employ a feature subset selection algorithm, based on ROC curves, which examines different combinations of features at different target operating conditions. The approach is evaluated on the IBM Voicemail corpus, demonstrating that it is possible and desirable to avoid complete commitment to a single best classifier or feature set.