Results 1-7 of 7
An overview of audio information retrieval
, 1999
Cited by 112 (1 self)
The problem of audio information retrieval is familiar to anyone who has returned from vacation to find an answering machine full of messages. While there is not yet an “AltaVista” for the audio data type, many workers are finding ways to automatically locate, index, and browse audio using recent advances in speech recognition and machine listening. This paper reviews the state of the art in audio information retrieval, and presents recent advances in automatic speech recognition, word spotting, speaker and music identification, and audio similarity with a view towards making audio less “opaque”. A special section addresses intelligent interfaces for navigating and browsing audio and multimedia documents, using automatically derived information to go beyond the tape recorder metaphor.
Informedia: News-on-Demand Multimedia Information Acquisition and Retrieval
- Intelligent Multimedia Information Retrieval
, 1997
Cited by 75 (6 self)
In theory, speech recognition technology can make any spoken words in video or audio media subject to text indexing, search and retrieval. This article describes the News-on-Demand application created within the Informedia Digital Video Library project and discusses how speech recognition is used for transcript creation from video, time alignment of closed-captioned transcripts, a speech query interface, and audio paragraph segmentation. Our results show that speech recognition accuracy varies dramatically depending on the quality and type of data used, but the system is quite useable with only moderate speech recognition accuracy.
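The abstract lists time alignment of closed-captioned transcripts but does not say how it is done. As an illustration only, here is a minimal sketch assuming the recognizer emits (word, start time in seconds) pairs and the captions are untimed text: matching word runs are found with Python's standard `difflib.SequenceMatcher`, and recognizer timestamps are copied onto the caption words they match. `align_captions` is a hypothetical name, and `difflib` is a stand-in technique, not the Informedia implementation.

```python
from difflib import SequenceMatcher


def align_captions(caption_words, recognized):
    """Copy recognizer timestamps onto matching caption words.

    Hypothetical helper for illustration (not Informedia's method).
    caption_words: list of untimed caption words.
    recognized: list of (word, start_seconds) pairs from a recognizer.
    Returns a list of (caption_word, start_seconds or None).
    """
    cap_lower = [w.lower() for w in caption_words]
    rec_lower = [w.lower() for w, _ in recognized]
    times = [None] * len(caption_words)
    # autojunk=False: do not discard frequent words such as "the"
    matcher = SequenceMatcher(a=cap_lower, b=rec_lower, autojunk=False)
    for block in matcher.get_matching_blocks():
        # Each block is a run of block.size identical words;
        # the final sentinel block has size 0 and is skipped naturally.
        for k in range(block.size):
            times[block.a + k] = recognized[block.b + k][1]
    return list(zip(caption_words, times))


captions = ["The", "president", "spoke", "today", "in", "Washington"]
recognized = [("the", 0.0), ("president", 0.3), ("spoke", 0.9),
              ("to", 1.4), ("day", 1.5), ("in", 1.8), ("washington", 2.0)]
for word, start in align_captions(captions, recognized):
    print(word, start)
```

Here "today" is misrecognized as "to day", so it stays untimed (`None`) while its neighbours are anchored; a fuller system might interpolate times for such gaps from the surrounding anchored words.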
Error-responsive feedback mechanisms for speech recognizers
, 1997
Cited by 37 (4 self)
This thesis is about modeling, analyzing, and predicting errorful behavior in large vocabulary continuous speech recognition systems. Because today's state-of-the-art recognizers are not designed to be situated naturally in an error feedback loop, they are ill-positioned for inclusion in multi-modal interfaces, multi-media databases, and other interesting applications. I make improvements to the current approach to predicting and analyzing error behaviors, which is currently based only on the measurement of word error rate. The speech recognizer's functionality is extended to include confidence annotations, which are “meta-level” markings that indicate how certain the recognizer is that it has decoded its input correctly. This is accomplished by feeding externally defined error conditions back to the recognizer. Error feedback enables the construction of statistical models that map measurements of the recognizer's internal states and behaviors to externally defined error conditions.
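For reference, the word error rate mentioned above is conventionally computed as the word-level edit distance (substitutions + deletions + insertions) between the recognizer hypothesis and a reference transcript, divided by the number of reference words. The sketch below shows that standard definition; it is not taken from the thesis, and `word_error_rate` is a hypothetical name. It assumes whitespace tokenization and case-sensitive word matching.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Standard WER: word-level Levenshtein distance / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # dist[i][j] = minimum edits turning ref[:i] into hyp[:j]
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dist[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
                dist[i - 1][j - 1] + cost,  # substitution or match
            )
    return dist[len(ref)][len(hyp)] / len(ref)


# One deleted word over six reference words, i.e. 1/6
print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

Because insertions count as errors, WER can exceed 1.0, which is one reason a single aggregate figure says little about where or why a recognizer fails.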
Story Segmentation and Detection of Commercials in Broadcast News Video
- Proceedings of Advances in Digital Libraries Conference
, 1998
Cited by 34 (0 self)
The Informedia Digital Library Project [Wactlar96] allows full content indexing and retrieval of text, audio and video material. Segmentation is an integral process in the Informedia digital video library. The success of the Informedia project hinges on two critical assumptions: that we can extract sufficiently accurate speech recognition transcripts from the broadcast audio and that we can segment the broadcast into video paragraphs, or stories, that are useful for information retrieval. In previous papers [Hauptmann97, Witbrock97, Witbrock98], we have shown that speech recognition is sufficient for information retrieval of pre-segmented video news stories. In this paper we address the issue of segmentation and demonstrate that a fully automatic system can extract story boundaries using available audio, video and closed-captioning cues. The story segmentation step for the Informedia Digital Video Library splits full-length news broadcasts into individual news stories. During this phas...
A Survey on Video Indexing
- Journal of Visual Communication and Image Representation
, 1996
Cited by 23 (0 self)
Extracting information from the ever-growing stream of multimedia data is becoming increasingly difficult. One of the main reasons lies in the unstructured way multimedia data are usually presented. Audio-visual material represents a large part of current multimedia material and can be structured in meaningful ways due to the nature of visual communication. This paper surveys several approaches and algorithms that have been recently developed to help in automatically structuring audio-visual data, both for annotation and access.
Cheating with Imperfect Transcripts
, 1996
Cited by 10 (0 self)
Most speech recognition systems try to reconstruct a word sequence given an acoustic input, using prior information about the language being spoken. In some cases, there is more information available to the decoder than simply the acoustics. When decoding a television news broadcast, for example, the closed-caption information that is often recorded for hearing-impaired viewers may also be available. While these captions are generally not completely accurate transcriptions, they can be considered to be a strong hint as to what was actually spoken.
Speech Recognition in the Informedia™ Digital Library: Uses and Limitations
- In Proc. 7th International Conference on Tools with AI
, 1995
Cited by 1 (0 self)
In principle, speech recognition technology can make any spoken data useful for library indexing and retrieval. This paper describes the Informedia Digital Video Library project and discusses how speech recognition is used for transcript creation from video, alignment with hand-generated transcripts, query interface and audio paragraph segmentation. The results show that speech recognition accuracy varies dramatically depending on the quality and type of data used. Our information retrieval experiments also show that reasonable recall and precision can be obtained with moderate speech recognition accuracy. Finally we discuss some active areas of speech research relevant to the digital video library problem.

