An overview of audio information retrieval
1999
Cited by 112 (1 self)
The problem of audio information retrieval is familiar to anyone who has returned from vacation to find an answering machine full of messages. While there is not yet an “AltaVista” for the audio data type, many workers are finding ways to automatically locate, index, and browse audio using recent advances in speech recognition and machine listening. This paper reviews the state of the art in audio information retrieval, and presents recent advances in automatic speech recognition, word spotting, speaker and music identification, and audio similarity with a view towards making audio less “opaque”. A special section addresses intelligent interfaces for navigating and browsing audio and multimedia documents, using automatically derived information to go beyond the tape recorder metaphor.
Retrieving Spoken Documents by Combining Multiple Index Sources
1996
Cited by 48 (4 self)
This paper presents domain-independent methods of spoken document retrieval. Both a continuous-speech large vocabulary recognition system and a phone-lattice word spotter are used to locate index units within an experimental corpus of voice messages. Possible index terms are nearly unconstrained; terms not in a 20,000 word recognition system vocabulary can be identified by the word spotter at search time. Though either system alone can yield respectable retrieval performance, the two methods are complementary and work best in combination. Different ways of combining them are investigated, and it is shown that the best of these can increase retrieval average precision for a speaker-independent retrieval system to 85% of that achieved for full-text transcriptions of the test documents.
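The division of labor described in this abstract can be illustrated with a minimal sketch: query terms found in the recognizer vocabulary are looked up in the recognizer's index, and the rest are handed to the word spotter at search time. The function names, the per-document score dictionaries, and the linear weighting are all assumptions made for illustration; the abstract does not say which combination schemes the authors investigated.

```python
def route_query_terms(query_terms, recognizer_vocab):
    """Split query terms into in-vocabulary and out-of-vocabulary lists.

    In-vocabulary terms can be searched in the recognizer transcripts;
    out-of-vocabulary terms are left for the word spotter.
    """
    in_vocab, out_of_vocab = [], []
    for term in query_terms:
        normalized = term.lower()
        if normalized in recognizer_vocab:
            in_vocab.append(normalized)
        else:
            out_of_vocab.append(normalized)
    return in_vocab, out_of_vocab


def combine_scores(recognizer_scores, spotter_scores, weight=0.5):
    """Hypothetical linear combination of per-document scores.

    `weight` is an illustrative parameter, not a value from the paper.
    Documents missing from one source contribute 0.0 for that source.
    """
    doc_ids = set(recognizer_scores) | set(spotter_scores)
    return {
        doc_id: weight * recognizer_scores.get(doc_id, 0.0)
        + (1.0 - weight) * spotter_scores.get(doc_id, 0.0)
        for doc_id in doc_ids
    }


if __name__ == "__main__":
    vocab = {"meeting", "project"}
    print(route_query_terms(["Meeting", "Olivetti"], vocab))
    print(combine_scores({"msg1": 1.0}, {"msg1": 0.0, "msg2": 1.0}))
```

A weighted sum is only one of many possible ways to merge two ranked sources; it is used here because it is the simplest to read.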
Open-Vocabulary Speech Indexing for Voice and Video Mail Retrieval
1996
Cited by 39 (2 self)
This paper presents recent work on a multimedia retrieval project at Cambridge University and Olivetti Research Limited (ORL). We present novel techniques that allow extremely rapid audio indexing, at rates approaching several thousand times real time. Unlike other methods, these techniques do not depend on a fixed vocabulary recognition system or on keywords that must be known well in advance. Using statistical methods developed for text, these indexing techniques allow rapid and efficient retrieval and browsing of audio and video documents. This paper presents the project background, the indexing and retrieval techniques, and a video mail retrieval application incorporating content-based audio indexing, retrieval, and browsing.
Experiments in Spoken Queries for Document Retrieval
1997
Cited by 25 (0 self)
We report the results of three experiments using the errorful output of a large vocabulary continuous speech recognition (LVCSR) system as the input to a statistical information retrieval (IR) system. Our goal is to allow a user to speak, rather than type, query terms into an IR engine and still obtain relevant documents. The purpose of these experiments is to test whether IR systems are robust to errors in the query terms introduced by the speech recognizer. If the correctly recognized words in the search query outweigh the misinformation from the incorrectly recognized words, the relevant documents will still be retrieved. This paper presents evidence that speech-driven IR can be effective, although with reduced precision. We also find that longer spoken queries produce higher-precision retrieval than shorter queries. For queries containing many (50-60) search terms and a recognizer word error rate (WER) of 27.9%, the precision at 30 documents retrieved is degraded by only 11.1%. F...
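The "precision at 30 documents retrieved" measure quoted in this abstract is the fraction of the top 30 ranked documents that are relevant. The sketch below shows that computation along with a relative-degradation helper. The function names are illustrative, and the abstract does not state whether its 11.1% figure is a relative or an absolute drop, so the relative form used here is an assumption.

```python
def precision_at_k(ranked_doc_ids, relevant_doc_ids, k=30):
    """Fraction of the top-k ranked documents that are relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = ranked_doc_ids[:k]
    hits = sum(1 for doc_id in top_k if doc_id in relevant_doc_ids)
    # Dividing by k (not len(top_k)) penalizes rankings shorter than k.
    return hits / k


def relative_degradation(baseline_precision, degraded_precision):
    """Relative drop from a baseline (e.g. typed queries) to a degraded
    condition (e.g. recognized spoken queries). Assumed interpretation."""
    if baseline_precision <= 0:
        raise ValueError("baseline precision must be positive")
    return (baseline_precision - degraded_precision) / baseline_precision


if __name__ == "__main__":
    ranking = ["d1", "d2", "d3", "d4"]
    relevant = {"d1", "d3"}
    print(precision_at_k(ranking, relevant, k=4))
    print(relative_degradation(0.5, 0.4))
```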
Indexing and Retrieval of Broadcast News
Speech Communication, 2000
Cited by 22 (6 self)
This paper describes a spoken document retrieval (SDR) system for British and North American Broadcast News. The system is based on a connectionist large vocabulary speech recognizer and a probabilistic information retrieval system. We discuss the development of a realtime Broadcast News speech recognizer, and its integration into an SDR system. Two advances were made for this task: automatic segmentation and statistical query expansion using a secondary corpus. Precision and recall results using the Text Retrieval Conference (TREC) SDR evaluation infrastructure are reported throughout the paper, and we discuss the application of these developments to a large scale SDR task based on an archive of British English broadcast news.
Keywords: Spoken Document Retrieval; Information Retrieval; Broadcast Speech; Large Vocabulary Speech Recognition.
1 Introduction
Retrieval of audio segments according to their content is a challenging and significant problem. It has been estimated th...
Speaker Identification Based Text To Audio Alignment For An Audio Retrieval System
1997
Cited by 14 (0 self)
We report on an audio retrieval system which lets Internet users efficiently access a large audio database containing recordings of the proceedings of the United States House of Representatives. The audio has been temporally aligned to text transcripts of the proceedings (which are manually generated by the U.S. Government) using a novel method based on speaker identification. Speaker sequence and approximate timing information are extracted from the text transcript and used to constrain a Viterbi alignment of speaker models to the observed audio. Speakers are modeled by computing Gaussian statistics of cepstral coefficients extracted from samples of each person's speech. The speaker identification is used to locate speaker transition points in the audio, which are then linked to corresponding speaker transitions in the text transcript. The alignment system has been successfully integrated into a World Wide Web based search and browse system as an experimental service on the Internet.
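The alignment idea in this abstract can be sketched as a left-to-right Viterbi pass: the transcript supplies the order of speakers, each speaker has a Gaussian model, and the search finds the frame indices where the audio most plausibly switches from one speaker to the next. This is a simplified illustration, not the authors' implementation. It assumes diagonal-covariance Gaussians, ignores the approximate timing constraints mentioned in the abstract, and uses hypothetical names throughout.

```python
import math


def gaussian_loglik(frame, mean, var):
    """Log-density of a diagonal-covariance Gaussian at one feature frame."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (x - m) ** 2 / v)
        for x, m, v in zip(frame, mean, var)
    )


def align_speakers(frames, speaker_sequence, models):
    """Return frame indices where the speaker changes.

    frames: list of feature vectors (e.g. cepstral coefficients).
    speaker_sequence: speaker names in transcript order.
    models: dict mapping speaker name -> (mean vector, variance vector).
    Requires len(frames) >= len(speaker_sequence).
    """
    n_frames, n_states = len(frames), len(speaker_sequence)
    neg_inf = float("-inf")
    emit = [
        [gaussian_loglik(f, *models[s]) for s in speaker_sequence]
        for f in frames
    ]
    score = [[neg_inf] * n_states for _ in range(n_frames)]
    back = [[0] * n_states for _ in range(n_frames)]
    score[0][0] = emit[0][0]  # the path must start with the first speaker
    for t in range(1, n_frames):
        for j in range(n_states):
            stay = score[t - 1][j]
            advance = score[t - 1][j - 1] if j > 0 else neg_inf
            if advance > stay:
                best, back[t][j] = advance, j - 1
            else:
                best, back[t][j] = stay, j
            score[t][j] = best + emit[t][j]
    # Backtrace from the last speaker at the final frame.
    path = [n_states - 1]
    for t in range(n_frames - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return [t for t in range(1, n_frames) if path[t] != path[t - 1]]


if __name__ == "__main__":
    models = {"A": ([0.0], [1.0]), "B": ([5.0], [1.0])}
    frames = [[0.1], [-0.2], [0.0], [5.1], [4.9]]
    print(align_speakers(frames, ["A", "B"], models))
```

Forcing the path through the speakers in transcript order is what lets the detected transition points be linked back to the corresponding transitions in the text.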
Experimental Results in Audio Indexing
In Proc. DARPA 1997 Speech Recognition Workshop, 1997
Cited by 2 (0 self)
In this paper we describe the IBM Audio-Indexing System and present some experimental results on the performance of the system on an audio indexing task.
1. Introduction
In today's information technology age we encounter large quantities of information, both audio and video, in our daily lives and there is a great need for efficient ways of searching and retrieving relevant information. The goal of an audio-indexing system is to provide the capability of searching and browsing through audio content. The system is formed by integrating information retrieval methods with large vocabulary continuous speech recognition. In this paper we describe the IBM audio indexing system and present some experimental results on a simple audio indexing task. The simplest way of searching through speech is by locating potential search keys through wordspotting. Wordspotting, however, is computationally expensive and therefore ceases to be practical for large-scale applications. A more efficient method w...
Integration of a Large Text and Audio Corpus Using Speaker Identification
1997
Cited by 1 (0 self)
We report on an audio retrieval system which lets Internet users efficiently access a large text and audio corpus containing the transcripts and recordings of the proceedings of the United States House of Representatives. The audio has been temporally aligned to corresponding text transcripts (which are manually generated by the U.S. Government) using an automatic method based on speaker identification. This system is an example of using digital storage and structured media to make a large multimedia archive easily accessible.
Introduction
In the United States, the text of proceedings of the two houses of the Congress has long been published in the Congressional Record. No systematic effort has been made, however, to record audio from the floor of the House and Senate. In 1995, the non-profit Internet Multicasting Service (IMS) began sending out live streaming audio to the Internet and making complete digital audio recordings of the proceedings on computer disks. The challenge was to ...

