Results 1 - 10 of 55
An overview of audio information retrieval
, 1999
Cited by 112 (1 self)
The problem of audio information retrieval is familiar to anyone who has returned from vacation to find an answering machine full of messages. While there is not yet an “AltaVista” for the audio data type, many workers are finding ways to automatically locate, index, and browse audio using recent advances in speech recognition and machine listening. This paper reviews the state of the art in audio information retrieval, and presents recent advances in automatic speech recognition, word spotting, speaker and music identification, and audio similarity with a view towards making audio less “opaque”. A special section addresses intelligent interfaces for navigating and browsing audio and multimedia documents, using automatically derived information to go beyond the tape recorder metaphor.
MARSYAS: A framework for audio analysis
, 2000
Cited by 89 (16 self)
Existing audio tools handle the increasing amount of computer audio data inadequately. The typical tape-recorder paradigm for audio interfaces is inflexible and time consuming, especially for large data sets. On the other hand, completely automatic audio analysis and annotation is impossible using current techniques.
Auto-Summarization of Audio-Video Presentations
, 1999
Cited by 60 (4 self)
As streaming audio-video technology becomes widespread, there is a dramatic increase in the amount of multimedia content available on the net. Users face a new challenge: How to examine large amounts of multimedia content quickly. One technique that can enable quick overview of multimedia is video summaries; that is, a shorter version assembled by picking important segments from the original. We evaluate three techniques for automatic creation of summaries for online audio-video presentations. These techniques exploit information in the audio signal (e.g., pitch and pause information), knowledge of slide transition points in the presentation, and information about access patterns of previous users. We report a user study that compares automatically generated summaries that are 20%-25% the length of full presentations to author-generated summaries. Users learn from the computer-generated summaries, although less than from authors' summaries. They initially find computer-generated summ...
The Audio Notebook - Paper and Pen Interaction with Structured Speech
, 2001
Cited by 59 (2 self)
This paper addresses the problem that a listener experiences when attempting to capture information presented during a lecture, meeting, or interview. Listeners must divide their attention between the talker and their notetaking activity. We propose a new device -- the Audio Notebook -- for taking notes and interacting with a speech recording. The Audio Notebook is a combination of a digital audio recorder and paper notebook, all in one device. Audio recordings are structured using two techniques: user structuring based on notetaking activity, and acoustic structuring based on a talker's changes in pitch, pausing, and energy. A field study showed that the interaction techniques enabled a range of usage styles, from detailed review to high speed skimming. The study motivated the addition of phrase detection and topic suggestions to improve access to the audio recordings. Through these audio interaction techniques, the Audio Notebook defines a new approach for navigation in the audio domain.
Advances in automatic meeting record creation and access
- in Proc. IEEE ICASSP
, 2001
Cited by 52 (6 self)
Oral communication is transient but many important decisions, social contracts and fact findings are first carried out in an oral setup, documented in written form and later retrieved. At Carnegie Mellon University's Interactive Systems Laboratories we have been experimenting with the documentation of meetings. This paper summarizes part of the progress that we have made in this test bed, specifically on the question of automatic transcription using LVCSR, information access using non-keyword based methods, summarization and user interfaces. The system is capable to automatically construct a searchable and browsable audiovisual database of meetings and provide access to these records.
Multifeature Audio Segmentation For Browsing And Annotation
- In Proc. 1999 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, WASPAA99
, 1999
Cited by 45 (8 self)
Indexing and content-based retrieval are necessary to handle the large amounts of audio and multimedia data that is becoming available on the web and elsewhere. Since manual indexing using existing audio editors is extremely time consuming, a number of automatic content analysis systems have been proposed. Most of these systems rely on speech recognition techniques to create text indices. On the other hand, very few systems have been proposed for automatic indexing of music and general audio. Typically these systems rely on classification and similarity-retrieval techniques and work in restricted audio domains. A somewhat
SUEDE: A Wizard of Oz Prototyping Tool for Speech User Interfaces
, 2000
Cited by 45 (7 self)
Speech-based user interfaces are growing in popularity. Unfortunately, the technology expertise required to build speech UIs precludes many individuals from participating in the speech interface design process. Furthermore, the time and knowledge costs of building even simple speech systems make it difficult for designers to iteratively design speech UIs. SUEDE, the speech interface prototyping tool we describe in this paper, allows designers to rapidly create prompt/response speech interfaces. It offers an electronically supported Wizard of Oz (WOz) technique that captures test data, allowing designers to analyze the interface after testing. This informal tool enables speech user interface designers, even non-experts, to quickly create, test, and analyze speech user interface prototypes.
Designing Presentations for On-Demand Viewing
, 2000
Cited by 32 (6 self)
Streaming digital video is becoming increasingly widespread. How should video presentations be designed for web access? How is video accessed and used online? We examined detailed behavior patterns of more than 9000 users of a large corpus of professionally prepared presentations. We find that as many people are accessing the talks on demand as attend live, but online access patterns differ markedly from live attendance. People watch less overall and they utilize the ability to skip to different parts of a talk. In designing presentations that will be viewed later on demand, speakers should emphasize key points early in the talk and early within each slide, use slide titles that are meaningful outside the flow of the talk, and reveal as much structure as possible in the slide titles. The results also provide guidance for those developing tools for on-demand multimedia authoring and use.
Keywords: Video on-demand, streaming media, digital library
Browsing Digital Video
, 1999
Cited by 28 (3 self)
Video in digital format coupled with digital/programmable playback devices presents opportunities for significantly enhancing the user's viewing experience. For example, time compression can shorten the viewing length of a video and shot boundary frames can provide a visual index into the content. Such features have primarily been evaluated in isolation with a narrow set of video content types. We investigated as well as implemented the design of a software video browsing application that combines many such features. In addition, we evaluated its use in watching six different video content types and present the resulting data for analysis and discussion. The participants in the evaluation found the browser to be useful and effective for watching the different types of video in a limited amount of time. Also, the results show that both the experience of using the browser and value of each feature varies depending on the content type.
An intelligent media browser using automatic multimodal analysis
- ACM Multimedia
, 1998
Cited by 23 (2 self)
Many techniques can extract information from a multimedia stream, such as speaker identity or shot boundaries. We present a browser that uses this information to navigate through stored media. Because automatically-derived information is not wholly reliable, it is transformed into a time-dependent “confidence score.” When presented graphically, confidence scores enable users to make informed decisions about regions of interest in the media, so that non-interesting areas may be skipped. Additionally, index points may be determined automatically for easy navigation, selection, editing, and annotation and will support analysis types other than the speaker identification and shot detection used here.
Keywords: Content-based retrieval, video, speaker identification, automatic analysis, visualization, skimming

