Indexing and Retrieval of Broadcast News
- Speech Communication
, 2000
Cited by 22 (6 self)
This paper describes a spoken document retrieval (SDR) system for British and North American Broadcast News. The system is based on a connectionist large vocabulary speech recognizer and a probabilistic information retrieval system. We discuss the development of a realtime Broadcast News speech recognizer, and its integration into an SDR system. Two advances were made for this task: automatic segmentation and statistical query expansion using a secondary corpus. Precision and recall results using the Text Retrieval Conference (TREC) SDR evaluation infrastructure are reported throughout the paper, and we discuss the application of these developments to a large scale SDR task based on an archive of British English broadcast news.
Keywords: Spoken Document Retrieval; Information Retrieval; Broadcast Speech; Large Vocabulary Speech Recognition.
1 Introduction
Retrieval of audio segments according to their content is a challenging and significant problem. It has been estimated th...
Automatic Generation of Concise Summaries of Spoken Dialogues in Unrestricted Domains
- In Proc. ACM SIGIR
, 2001
Cited by 16 (0 self)
Automatic summarization of open domain spoken dialogues is a new research area. This paper introduces the task, the challenges involved, and presents an approach to obtain automatic extract summaries for multi-party dialogues of four different genres, without any restriction on domain. We address the following issues which are intrinsic to spoken dialogue summarization and typically can be ignored when summarizing written text such as newswire data: (i) detection and removal of speech disfluencies; (ii) detection and insertion of sentence boundaries; (iii) detection and linking of cross-speaker information units (question-answer pairs). A global system evaluation using a corpus of 23 relevance annotated dialogues containing 80 topical segments shows that for the two more informal genres, our summarization system using dialogue specific components significantly outperforms a baseline using TFIDF term weighting with maximum marginal relevance ranking (MMR).
DIASUMM: Flexible Summarization of Spontaneous Dialogues in Unrestricted Domains
, 2000
Cited by 12 (3 self)
In this paper, we present a summarization system for spontaneous dialogues which consists of a novel multi-stage architecture. It is specifically aimed at addressing issues related to the nature of the texts being spoken vs. written and being dialogical vs. monological. The system is embedded in a graphical user interface and was developed and tested on transcripts of recorded telephone conversations in English and Spanish (CALLHOME).
Advances in automatic speech summarization
- In Proceedings of the 7th European Conference on Speech Communication and Technology
, 2001
Cited by 9 (1 self)
Speech summarization technology, which extracts important information and removes irrelevant information from speech, is expected to play an important role in building speech archives and improving the efficiency of spoken document retrieval. However, speech summarization has a number of significant challenges that distinguish it from general text summarization. Fundamental problems with speech summarization include speech recognition errors, disfluencies, and difficulties of sentence segmentation. Typical speech summarization systems consist of speech recognition, sentence segmentation, sentence extraction, and sentence compaction components. Most research up to now has focused on sentence extraction, using LSA (Latent Semantic Analysis), MMR (Maximal Marginal Relevance), or feature-based approaches, among which no decisive method has yet been found. Proper sentence segmentation is also essential to achieve good summarization performance. How to objectively evaluate speech summarization results is also an important issue. Several measures, including families of SumACCY and ROUGE measures, have been proposed, and correlation analyses between subjective and objective evaluation scores have been performed. Although these measures are useful for ranking various summarization methods, they do not correlate well with human evaluations, especially when spontaneous speech is targeted.
Speech-to-Text and Speech-to-Speech Summarization of Spontaneous Speech
- IEEE TRANS. ON SPEECH AND AUDIO PROCESSING
, 2004
Cited by 7 (1 self)
This paper presents techniques for speech-to-text and speech-to-speech automatic summarization based on speech unit extraction and concatenation. For the former case, a two-stage summarization method consisting of important sentence extraction and word-based sentence compaction is investigated. Sentence and word units which maximize the weighted sum of linguistic likelihood, amount of information, confidence measure, and grammatical likelihood of concatenated units are extracted from the speech recognition results and concatenated to produce summaries. For the latter case, sentences, words, and between-filler units are investigated as units to be extracted from the original speech. These methods are applied to the summarization of unrestricted-domain spontaneous presentations and evaluated by objective and subjective measures. It was confirmed that the proposed methods are effective in spontaneous speech summarization.
Summarization of Spoken Language -- Challenges, Methods, and Prospects
- JANUARY 2002 JAMIL ANWAR, M.M.AWAIS, SHAHID MASUD, AND SHAFAY SHAMAIL AUTOMATIC ARABIC SPEECH SEGMENTATION SYSTEM
, 2002
Thematic Indexing of Spoken Documents by Using Self-Organizing Maps
- RR 00-5, IDIAP
, 2000
Cited by 3 (1 self)
A method is presented to provide a useful searchable index for spoken audio documents. The task differs from the traditional (text) document indexing, because large audio databases are decoded by automatic speech recognition and decoding errors occur frequently. The idea in this paper is to take advantage of the large size of the database and select the best index terms for each document with the help of the other documents close to it using a semantic vector space. First, the audio stream is converted into a text stream by a speech recognizer. Then the text of each story is represented by a document vector which is the normalized sum of the word vectors in the story. A large collection of document vectors is used to train a self-organizing map to find the clusters and latent semantic structures in the collection. Because the news stories are quite short and include speech recognition errors, the idea of smoothing the document vectors using the thematic clusters determined by the self-...
Indexing Audio Documents by using Latent Semantic Analysis and SOM
- Kohonen Maps
, 1999
This paper describes an important application for state-of-the-art automatic speech recognition, natural language processing and information retrieval systems. Methods for enhancing the indexing of spoken documents by using latent semantic analysis and self-organizing maps are presented, motivated and tested. The idea is to extract extra information from the structure of the document collection and use it for more accurate indexing by generating new index terms and stochastic index weights. Indexing methods are evaluated for two broadcast news databases (one French and one English) using the average document perplexity defined in this paper and test queries analyzed by human experts.
The THISL SDR System at TREC-8
- Proc. of the 8th Text Retrieval Conference TREC-8, Nov 1999. Martine Adda-Decker, Gilles Adda
This paper describes the participation of the THISL group at the TREC-8 Spoken Document Retrieval (SDR) track. The THISL SDR system consists of the realtime version of the ABBOT large vocabulary speech recognition system and the THISLIR text retrieval system. The TREC-8 evaluation assessed SDR performance on a corpus of 500 hours of broadcast news material collected over a five month period. The main test condition involved retrieval of stories defined by manual segmentation of the corpus in which non-news material, such as commercials, was excluded. An optional test condition required retrieval of the same stories from the unsegmented audio stream. The THISL SDR system participated in both test conditions. The results show that a system such as THISL can produce respectable information retrieval performance on a realistically-sized corpus of unsegmented audio material.
1. INTRODUCTION
The TREC-8 test collection was obtained from the TDT2 corpus and consisted of 902 shows (...
SPEAKER IDENTITY INDEXING IN AUDIO-VISUAL DOCUMENTS
The identity of persons in audiovisual documents represents very important semantic information for content-based indexing and retrieval. The task of speaker’s identity detection can be carried out by exploiting data elements resulting from different modalities (text, image and audio). In this article, we propose an approach for speaker identity indexing in broadcast news using audio content. After a speaker segmentation phase, an identity is given to speech segments by applying linguistic patterns to their transcription from speech recognition. Three types of patterns are used to predict the speaker in the previous, current and next speech segments. Predictions are then propagated to other segments by similarity at the acoustic level. Evaluations have been conducted on part of the TREC 2003 corpus: a speaker identity could be assigned to 53% of the annotated corpus with an 82% precision.

