Multimodal Video Indexing: A Review of the State-of-the-art
- Multimedia Tools and Applications, 2003
- Cited by 103 (18 self)
Efficient and effective handling of video documents depends on the availability of indexes. Manual indexing is unfeasible for large video collections. In this paper we survey several methods aiming at automating this time- and resource-consuming process. Good reviews of single-modality video indexing have appeared in the literature. Effective indexing, however, requires a multimodal approach in which either the most appropriate modality is selected or the different modalities are used in a collaborative fashion. Therefore, instead of separately treating the different information sources involved and their specific algorithms, we focus on the similarities and differences between the modalities. To that end we put forward a unifying multimodal framework, which views a video document from the perspective of its author. This framework forms the guiding principle for identifying index types, for which automatic methods are found in the literature. It furthermore forms the basis for categorizing these different methods.
A Critique and Improvement of an Evaluation Metric for Text Segmentation
- Computational Linguistics, 2002
Adjustable Filmstrips and Skims as Abstractions for a Digital Video Library
- IEEE Advances in Digital Libraries Conference, 1999
- Cited by 27 (2 self)
Filmstrips and video skims are two presentation schemes for abstracting information in a digital video segment. Filmstrips present information all at once in a static form, while video skims are played and disclose information temporally. This paper discusses the evolution of the filmstrip and skim interfaces in the Informedia Digital Video Library. Filmstrips are commonly deployed as interfaces for video and image libraries, but we found initial Informedia filmstrips and skims received little use. We discuss the interface considerations motivating the redesign of filmstrips and skims to adjust their presentations dynamically based on user context and preference.
Discovery and fusion of salient multi-modal features towards news story segmentation
- IS&T/SPIE Electronic Imaging, 2004
- Cited by 18 (11 self)
In this paper, we present our new results in news video story segmentation and classification in the context of the TRECVID 2003 video retrieval benchmarking event. We applied and extended the Maximum Entropy statistical model to effectively fuse diverse features from multiple levels and modalities, including visual, audio, and text. We have included various features such as motion, face, music/speech types, prosody, and high-level text segmentation information. The statistical fusion model is used to automatically discover relevant features contributing to the detection of story boundaries. One novel aspect of our method is the use of a feature wrapper to address different types of features: asynchronous, discrete, continuous and delta ones. We also developed several novel features related to prosody. Using the large news video set from the TRECVID 2003 benchmark, we demonstrate satisfactory performance (F1 measures up to 0.76 in ABC news and 0.73 in CNN news), present how these multi-level multi-modal features construct the probabilistic framework, and more importantly observe an interesting opportunity for further improvement.
Visual Digests for News Video Libraries
- Proc. ACM Multimedia '99, 1999
- Cited by 17 (4 self)
The Informedia Digital Video Library contains over 2000 hours of video, growing at a rate of 15 hours per week. A good query engine is not sufficient for information retrieval because often the candidate result sets grow in number as the library grows. Video digests summarize sets of stories from the library, providing users with a visual mechanism for interactive browsing and query refinement. These digests are generated dynamically under the direction of the user based on automatically derived metadata from the video library. Three types of digests are discussed: VIBE digests emphasizing word relationships, timelines showing trends against time, and maps showing geographic correlations. Multiple digests can be combined into a single view or animated into a temporal presentation.
Video Classification Based On HMM Using Text And Faces
- European Signal Processing Conference, 2000
- Cited by 15 (1 self)
Video content classification and retrieval is a necessary tool in the current merging of entertainment and information media. With the advent of broadband networking, every consumer will have video programs available on-line as well as in the traditional distribution channels. Systems that help in content management have to discern between different categories of video in order to provide for fast retrieval. In this paper we present a novel method for video classification based on face and text trajectories. This is based on the observation that in different TV categories there are different face and text trajectory patterns. Face and text tracking is applied to arbitrary video clips to extract face and text trajectories. We used Hidden Markov Models (HMM) to classify a given video clip into predefined categories, e.g., commercial, news, sitcom and soap. Our preliminary experimental results show classification accuracy of over 80% for the HMM method on short video clips. This paper descri...
The Físchlár-News-Stories System: Personalised Access to an Archive of TV News
- RIAO, 2004
- Cited by 9 (5 self)
The “Físchlár” systems are a family of tools for capturing, analysis, indexing, browsing, searching and summarisation of digital video information. Físchlár-News-Stories, described in this paper, is one of those systems, and provides access to a growing archive of broadcast TV news. Físchlár-News-Stories has several notable features including the fact that it automatically records TV news and segments a broadcast news program into stories, eliminating advertisements and credits at the start/end of the broadcast. Físchlár-News-Stories supports access to individual stories via calendar lookup, text search through closed captions, automatically-generated links between related stories, and personalised access using a personalisation and recommender system based on collaborative filtering. Access to individual news stories is supported either by browsing keyframes with synchronised closed captions, or by playback of the recorded video. One strength of the Físchlár-News-Stories system is that it is actually used, in practice, daily, to access news. Several aspects of the Físchlár systems have been published before, but in this paper we give a summary of the Físchlár-News-Stories system in operation by following a scenario in which it is used and also outlining how the underlying system realises the functions it offers.
Classification of Summarized Videos using Hidden Markov Models on Compressed Chromaticity Signatures
- ACM International Conference on Multimedia, 2001
- Cited by 9 (0 self)
As digital libraries and video databases grow, we need methods to assist us in the synthesis and analysis of digital video. Since the information in video databases can be measured in thousands of gigabytes of uncompressed data, tools for efficient summarizing and indexing of video sequences are indispensable. In this paper, we present a method for effective classification of different types of videos that makes use of video summarization in the form of a storyboard of keyframes. To produce the summarization, we first generate a universal basis on which to project a video frame that effectively reduces any video to the same lighting conditions. Each frame is represented by a compressed chromaticity signature. We then set out a multi-stage hierarchical clustering method to efficiently summarize a video. Finally we classify TV videos using a trained hidden Markov model on the compressed chromaticity signatures and also temporal features of videos that are represented by their summaries.
InsightVideo: Towards hierarchical video content organization for efficient browsing, summarization and retrieval
- Cited by 8 (1 self)
Hierarchical video browsing and feature-based video retrieval are two standard methods for accessing video content. Very little research, however, has addressed the benefits of integrating these two methods for more effective and efficient video content access.
Joint Video Scene Segmentation And Classification Based On Hidden Markov Model
- ICME-2000, 2000
- Cited by 7 (0 self)
Video classification and segmentation are fundamental steps for efficient accessing, retrieving and browsing large amount of video data. We have developed a scene classification scheme using a Hidden Markov Model (HMM)-based classifier. By utilizing the temporal behaviors of different scene classes, HMM classifier can effectively classify video segments into one of the predefined scene classes. In this paper, we describe two approaches for joint video classification and segmentation based on HMM, which works by searching for the most likely class transition path utilizing the dynamic programming technique.
1. INTRODUCTION Video classification and segmentation are fundamental steps for efficient accessing, retrieving and browsing large amount of video data. Recently, several research groups have developed algorithms to detect scene change by incorporating audio and visual information. Most of these works [1, 2, 3] are based on some prior scene models, (e.g. dialog, setting, etc.) and accomplish ...
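Note: the abstract above mentions searching for the most likely class transition path with dynamic programming. The sketch below illustrates that general idea with a textbook Viterbi decoder; it is not the authors' implementation. The function name `viterbi`, the two toy classes (0 = news, 1 = commercial), and all probabilities are assumptions made up for illustration.

```python
import math

def viterbi(log_start, log_trans, log_emit):
    """Return (best class path, log-probability of that path).

    log_start[s]    : log P(first segment belongs to class s)
    log_trans[r][s] : log P(class s follows class r)
    log_emit[t][s]  : log-likelihood of segment t's features under class s
    """
    n_states = len(log_start)
    # best[s]: best log-score of any path that ends in class s at the current step
    best = [log_start[s] + log_emit[0][s] for s in range(n_states)]
    back = []  # back[t-1][s]: predecessor of class s at step t on the best path
    for t in range(1, len(log_emit)):
        prev, best, pointers = best, [], []
        for s in range(n_states):
            r = max(range(n_states), key=lambda q: prev[q] + log_trans[q][s])
            pointers.append(r)
            best.append(prev[r] + log_trans[r][s] + log_emit[t][s])
        back.append(pointers)
    # Trace the best final class back to the first segment.
    last = max(range(n_states), key=lambda s: best[s])
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, best[last]

def to_log(rows):
    """Convert a nested list of probabilities to log-probabilities."""
    return [[math.log(p) for p in row] for row in rows]

# Hypothetical toy model: classes tend to persist from one segment to the next.
LOG_START = [math.log(0.6), math.log(0.4)]
LOG_TRANS = to_log([[0.9, 0.1], [0.1, 0.9]])
# Hypothetical per-segment likelihoods for five consecutive video segments.
LOG_EMIT = to_log([[0.9, 0.1], [0.8, 0.2], [0.4, 0.6], [0.1, 0.9], [0.2, 0.8]])

path, score = viterbi(LOG_START, LOG_TRANS, LOG_EMIT)
print(path)  # [0, 0, 1, 1, 1]
```

In this toy run the decoded path changes class between the second and third segments, so one pass yields both a label per segment (classification) and a boundary where the label changes (segmentation).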

