A Critique and Improvement of an Evaluation Metric for Text Segmentation
- COMPUTATIONAL LINGUISTICS
, 2002
Auto-Summarization of Audio-Video Presentations
, 1999
Cited by 60 (4 self)
As streaming audio-video technology becomes widespread, there is a dramatic increase in the amount of multimedia content available on the net. Users face a new challenge: how to examine large amounts of multimedia content quickly. One technique that can enable a quick overview of multimedia is the video summary; that is, a shorter version assembled by picking important segments from the original. We evaluate three techniques for automatic creation of summaries for online audio-video presentations. These techniques exploit information in the audio signal (e.g., pitch and pause information), knowledge of slide transition points in the presentation, and information about access patterns of previous users. We report a user study that compares automatically generated summaries that are 20%-25% the length of full presentations to author-generated summaries. Users learn from the computer-generated summaries, although less than from authors' summaries. They initially find computer-generated summ...
Learning query-class dependent weights in automatic video retrieval
- In Proceedings of the 12th annual ACM international conference on Multimedia
, 2004
Cited by 46 (13 self)
Combining retrieval results from multiple modalities plays a crucial role in video retrieval systems, especially automatic video retrieval systems without any user feedback or query expansion. However, most current systems only use query-independent combination or rely on explicit user weighting. In this work, we propose using query-class dependent weights within a hierarchical mixture-of-experts framework to combine multiple retrieval results. We first classify each user query into one of four predefined categories and then aggregate the retrieval results with query-class associated weights, which can be learned efficiently from the development data and generalized easily to unseen queries. Our experimental results demonstrate that performance with query-class dependent weights can considerably surpass that with query-independent weights.
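The two-step idea in this abstract (classify the query, then fuse per-modality scores with weights tied to that class) can be sketched as below. This is only a minimal illustration, not the authors' method: the class names, the weight values, the modality names, and the toy `classify_query` heuristic are all assumptions made for the example, and the sketch uses two classes rather than the paper's four. In the paper the weights are learned from development data rather than set by hand.

```python
# Hypothetical per-class modality weights (illustrative values only).
CLASS_WEIGHTS = {
    "named_person": {"text": 0.7, "image": 0.1, "face": 0.2},
    "general": {"text": 0.3, "image": 0.7, "face": 0.0},
}


def classify_query(query):
    """Toy stand-in for a query classifier (assumption, not from the paper).

    Treats any capitalised word after the first as a sign of a person name.
    """
    words = query.split()
    has_name = any(w[0].isupper() for w in words[1:])
    return "named_person" if has_name else "general"


def combine_scores(query, modality_scores):
    """Rank shots by a weighted sum of modality scores.

    modality_scores maps modality name -> {shot_id: score}.
    The weights are chosen by the query's class.
    """
    weights = CLASS_WEIGHTS[classify_query(query)]
    combined = {}
    for modality, scores in modality_scores.items():
        w = weights.get(modality, 0.0)  # unknown modalities contribute nothing
        for shot_id, score in scores.items():
            combined[shot_id] = combined.get(shot_id, 0.0) + w * score
    # Highest combined score first.
    return sorted(combined, key=combined.get, reverse=True)


scores = {
    "text": {"s1": 0.9, "s2": 0.2},
    "image": {"s1": 0.1, "s2": 0.8},
    "face": {"s1": 0.0, "s2": 0.0},
}
person_ranking = combine_scores("find Bill Clinton", scores)
general_ranking = combine_scores("find a beach scene", scores)
```

With these made-up numbers the same evidence yields a different ranking for the two queries, which is the point of making the combination depend on the query class.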
Story Segmentation and Detection of Commercials In Broadcast News Video
- Proceedings of Advances in Digital Libraries Conference
, 1998
Cited by 34 (0 self)
The Informedia Digital Library Project [Wactlar96] allows full content indexing and retrieval of text, audio and video material. Segmentation is an integral process in the Informedia digital video library. The success of the Informedia project hinges on two critical assumptions: that we can extract sufficiently accurate speech recognition transcripts from the broadcast audio and that we can segment the broadcast into video paragraphs, or stories, that are useful for information retrieval. In previous papers [Hauptmann97, Witbrock97, Witbrock98], we have shown that speech recognition is sufficient for information retrieval of pre-segmented video news stories. In this paper we address the issue of segmentation and demonstrate that a fully automatic system can extract story boundaries using available audio, video and closed-captioning cues. The story segmentation step for the Informedia Digital Video Library splits full-length news broadcasts into individual news stories. During this phas...
Collages as Dynamic Summaries for News Video
- DIGITAL VIDEO SUMMARIES, INDEXING AND RETRIEVAL
, 2002
Cited by 29 (3 self)
This paper introduces the video collage, a novel, effective interface for browsing and interpreting video collections. The paper discusses how collages are automatically produced, illustrates their use, and evaluates their effectiveness as summaries across news stories. Collages are presentations of text and images derived from multiple video sources, which provide an interactive visualization for a set of video documents, summarizing their contents and providing a navigation aid for further exploration. The dynamic creation of collages is based on user context, e.g., an originating query, coupled with automatic processing to refine the candidate imagery. Named entity identification and common phrase extraction provide descriptive text. The dynamic manipulation of collages allows user-directed browsing and reveals additional detail. The utility of collages as summaries is examined with respect to other published news summaries.
Probabilistic Models for Combining Diverse Knowledge Sources in Multimedia Retrieval
- Ph.D. Thesis
, 2006
Cited by 18 (2 self)
In recent years, the multimedia retrieval community has been gradually shifting its emphasis from analyzing one media source at a time to exploring the opportunities of combining diverse knowledge sources from correlated media types and context. This thesis presents a conditional probabilistic retrieval model as a principled framework to combine diverse knowledge sources. An efficient rank-based learning approach has been developed to explicitly model the ranking relations in the learning process. Under this retrieval framework, we overview and develop a number of state-of-the-art approaches for extracting ranking features from multimedia knowledge sources. To incorporate query information in the combination model, this thesis develops a number of query analysis models that can automatically discover the mixing structure of the query space based on previous retrieval results. To adapt the combination function on a per-query basis, this thesis also presents a probabilistic local context analysis (pLCA) model to automatically leverage additional retrieval sources to improve initial retrieval outputs. All the proposed approaches are evaluated on multimedia retrieval tasks with large-scale video collections as well as meta-search tasks with large-scale text collections.
Comparing Presentation Summaries: Slides vs. Reading vs. Listening
, 1999
Cited by 12 (1 self)
As more audio and video technical presentations go online, it becomes imperative to give users effective summarizing and skimming tools so that they can find the presentation they want and browse through it quickly. In a previous study we reported various automated methods for summarizing the audio-video of presentations, and user response. An open question remained about how well various text- and image-only techniques compare to the audio-video summarizations. This study attempts to fill that gap. This paper reports a user study that compares four possible ways of allowing a user to skim a presentation: 1) PowerPoint slides used by the speaker during the presentation, 2) the text transcript created by professional transcribers from the presentation, 3) the transcript with important points highlighted by the speaker, and 4) an audio-video summary created by the speaker. Results show that although some text-only conditions can match the audio-video summary, users have a preference for audi...
Event Based Video Indexing by Intermodal Collaboration
- Proceedings of First International Workshop on Multimedia Intelligent Storage and Retrieval Management (MISRM'99)
, 1999
Cited by 6 (0 self)
In this paper, we propose event-based video indexing, which is a kind of indexing by semantic content. To achieve this, we exploit the idea of intermodal collaboration, i.e., collaborative processing that takes account of the semantic dependency between multimodal information streams consisting of visual, auditory, and text (closed caption: CC) streams. The proposed method attempts to establish temporal correspondence between the visual and CC streams and to index shots in the visual stream by detecting the events. In experiments on broadcast sports video of American football games, we obtained a recall of 86% and a precision of 74% for indexing accuracy. The results indicate that the method is effective for video indexing.
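For readers unfamiliar with the two figures quoted in this abstract, the sketch below shows how recall and precision are conventionally computed for an indexing task. The shot identifiers are made-up illustrative data, not the paper's results, and the function name `precision_recall` is an assumption introduced for the example.

```python
def precision_recall(detected, relevant):
    """Return (precision, recall) for a set of detected items.

    detected: items the system indexed as events
    relevant: items that truly are events (ground truth)
    """
    detected, relevant = set(detected), set(relevant)
    true_positives = len(detected & relevant)
    # Precision: fraction of detected items that are correct.
    precision = true_positives / len(detected) if detected else 0.0
    # Recall: fraction of true items that were detected.
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical shot IDs, for illustration only.
precision, recall = precision_recall(detected=[1, 2, 3, 4], relevant=[2, 3, 4, 5, 6])
```

A system with higher recall than precision, as reported above, finds most of the true events at the cost of some false detections.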
Automatic scene extraction in motion pictures
- IEEE Transactions on Circuits and Systems for Video Technology
, 2003
Cited by 5 (3 self)
InfoLink: analysis of Dutch broadcast news and cross-media browsing
- Proceedings of IEEE International Conference on Multimedia and Expo, ICME 2005
, 2005
Cited by 5 (3 self)
In this paper, a cross-media browsing demonstrator named InfoLink is described. InfoLink automatically links the content of Dutch broadcast news videos to related information sources in parallel collections containing text and/or video. Automatic segmentation, speech recognition and available meta-data are used to index and link items. The concept is visualised using SMIL-scripts for presenting the streaming broadcast news video and the information links.

