Results 1 - 7 of 7
A factor graph framework for semantic video indexing
- IEEE Transactions on Circuits and Systems for Video Technology
, 2002
Cited by 39 (4 self)
... most challenging research issues in video data management. To go beyond low-level similarity and access video data content by semantics, we need to bridge the gap between the low-level representation and high-level semantics. This is a difficult multimedia understanding problem. We formulate this problem as a probabilistic pattern-recognition problem for modeling semantics in terms of concepts and context. To map low-level features to high-level semantics, we propose probabilistic multimedia objects (multijects). Examples of multijects in movies include explosion, mountain, beach, outdoor, music, etc. Semantic concepts in videos interact and appear in context. To model this interaction explicitly, we propose a network of multijects (multinet). To model the multinet computationally, we propose a factor graph framework that can enforce spatio-temporal constraints. Using probabilistic models for the multijects rocks, sky, snow, water-body, and forestry/greenery, and using a factor graph as the multinet, we demonstrate the application of this framework to semantic video indexing. We demonstrate how detection performance can be significantly improved using the multinet to take inter-conceptual relationships into account. Our experiments using a large video database consisting of clips from several movies and based on a set of five semantic concepts reveal a significant improvement in detection performance of over 22%. We also show how the multinet is extended to take temporal correlation into account. By constructing a dynamic multinet, we show that the detection performance is further enhanced by as much as 12%. With this framework, we show how keyword-based query and semantic filtering are possible for a predetermined set of concepts.
Index Terms: Factor graphs, hidden Markov models, likelihood ratio test, multimedia understanding, probabilistic graphical networks, probability propagation, query by example, query by keywords, ROC curves, semantic video indexing, sum-product algorithm.
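The abstract above names factor graphs and the sum-product algorithm as the machinery behind the multinet. The following is a minimal sketch of that idea, not the authors' implementation: two binary concepts (labelled "sky" and "snow" here for illustration) each have a local evidence factor standing in for a detector output, and one pairwise factor encodes how compatible the two concepts are. All numeric values and function names are assumptions made up for this example.

```python
from itertools import product


def normalize(values):
    """Scale a list of non-negative numbers so that it sums to 1."""
    total = sum(values)
    return [v / total for v in values]


def sum_product_pair(phi_a, phi_b, psi):
    """Marginals of two binary variables A and B on a tree-shaped factor graph.

    phi_a[a] and phi_b[b] are local evidence factors; psi[a][b] is the
    pairwise compatibility factor. On a tree, sum-product is exact.
    """
    # Message passed through the pairwise factor from B to A, and from A to B.
    msg_to_a = [sum(psi[a][b] * phi_b[b] for b in (0, 1)) for a in (0, 1)]
    msg_to_b = [sum(psi[a][b] * phi_a[a] for a in (0, 1)) for b in (0, 1)]
    # Each marginal is the local factor times the incoming message.
    marg_a = normalize([phi_a[a] * msg_to_a[a] for a in (0, 1)])
    marg_b = normalize([phi_b[b] * msg_to_b[b] for b in (0, 1)])
    return marg_a, marg_b


def brute_force_pair(phi_a, phi_b, psi):
    """Same marginals by enumerating the full joint, used as a check."""
    joint = {
        (a, b): phi_a[a] * phi_b[b] * psi[a][b]
        for a, b in product((0, 1), repeat=2)
    }
    z = sum(joint.values())
    marg_a = [sum(joint[(a, b)] for b in (0, 1)) / z for a in (0, 1)]
    marg_b = [sum(joint[(a, b)] for a in (0, 1)) / z for b in (0, 1)]
    return marg_a, marg_b


# Hypothetical detector outputs: index 0 = concept absent, 1 = present.
phi_sky = [0.4, 0.6]
phi_snow = [0.7, 0.3]
# Hypothetical context factor: snow without sky is penalized,
# snow together with sky is favoured.
psi_sky_snow = [[1.0, 0.2],
                [1.0, 2.0]]

marg_sky, marg_snow = sum_product_pair(phi_sky, phi_snow, psi_sky_snow)
print("P(sky present) after context:", marg_sky[1])
print("P(snow present) after context:", marg_snow[1])
```

With these assumed numbers the context factor raises both beliefs above their stand-alone detector scores, which is the kind of inter-conceptual effect the abstract describes. Larger or loopy graphs would need a general message-passing schedule rather than this two-variable special case.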
Semantic indexing of multimedia content using visual, audio and text cues
- EURASIP Journal on Applied Signal Processing
, 2003
Cited by 28 (2 self)
In this paper we present a learning-based approach to semantic indexing of multimedia content using cues derived from audio, visual and text features. We approach the problem by developing a set of statistical models for a predefined lexicon. Novel concepts are then mapped in terms of concepts in the lexicon. To achieve robust detection of concepts, we exploit features from multiple modalities, namely audio, visual and text. Concept representations are modeled using Gaussian Mixture Models (GMMs), Hidden Markov Models (HMMs), and Support Vector Machines (SVMs). Models such as Bayesian Networks and SVMs are used in a late fusion approach to model concepts that are not explicitly modeled in terms of features. Our experiments indicate promise in the proposed classification and fusion methodologies: our proposed fusion scheme achieves more than 10% relative improvement over the best uni-modal concept detector.
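To make the term "late fusion" in the abstract above concrete, here is a minimal sketch. It assumes each modality has already produced a confidence score in [0, 1] for one concept and combines them with a weighted average. This is a deliberately simpler stand-in for the Bayesian-network and SVM fusion models the abstract mentions; the function name, modality keys, scores and weights are all hypothetical.

```python
def late_fusion(scores, weights):
    """Weighted average of per-modality confidence scores.

    scores maps a modality name to a score in [0, 1]; weights maps the
    same names to non-negative weights. Only modalities present in
    scores contribute, so a missing detector is simply skipped.
    """
    total_weight = sum(weights[m] for m in scores)
    if total_weight == 0:
        raise ValueError("at least one modality needs a positive weight")
    return sum(weights[m] * scores[m] for m in scores) / total_weight


# Hypothetical per-modality detector outputs for a single concept.
concept_scores = {"audio": 0.2, "visual": 0.9, "text": 0.7}
# Hypothetical weights, e.g. chosen on a validation set.
modality_weights = {"audio": 1.0, "visual": 2.0, "text": 1.0}

fused = late_fusion(concept_scores, modality_weights)
print("fused score:", fused)
```

The point of fusing at the score level rather than the feature level is that each modality can keep its own model type; a learned fusion model would replace the fixed weights used here.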
Real time repeated video sequence identification
- Computer Vision and Image Understanding
, 2004
A Unified Approach to the Generation of Semantic Cues for Sports Video Annotation
Cited by 1 (0 self)
The use of video and audio features for automated annotation of audio-visual data is becoming widespread. A major limitation of many of the current methods is that the stored indexing features are too low-level: they relate directly to properties of the data. In this work we apply a further stage of processing that associates the feature measurements with real-world objects or events. The outputs, which we call "cues", are combined to enable us to compute directly the probability of the object being present in the scene. An additional advantage of this approach is that the cues from different types of features are presented in a homogeneous way.
A Film Classifier Based on Low-level Visual Features
Cited by 1 (0 self)
We propose an approach to classifying films into genres using low-level features and visual features. Our current domain of study is the movie preview. A movie preview often emphasizes the theme of a film and hence provides suitable information for the classification process. In our approach, we categorize films into three broad categories: action, drama, and thriller films. Four computable video features (average shot length, color variance, motion content and lighting key) and visual features (show and fast moving effects) are combined in our approach to provide useful information for determining the movie category. The experimental results show that visual features carry useful information for film classification. Our approach can also be extended to other potential applications, including the browsing and retrieval of videos on the internet, video-on-demand, and video libraries. Index Terms: Film classifier, movie genre, shot boundary detection, visual feature.
Semantic Film Preview Classification Using Low-Level Computable Features
- In 3rd International Workshop on Multimedia Data and Document Engineering (MDDE-2003)
, 2003
This paper presents a framework for the classification of feature films into genres, based on computable visual cues. The authors view the work as a step towards high-level semantic film interpretation, currently using low-level video features and knowledge of ubiquitous cinematic practices. Our current domain of study is the film preview (the commercial advertisements primarily created to attract audiences). A preview often emphasizes the theme of a film and hence provides suitable information for classification. In our approach, we classify movies into four broad categories: Comedies, Action Films, Dramas or Horror Films. Computable video features are combined in a framework with cinematic principles to provide a mapping to the four high-level semantic classes. An unsupervised clustering technique is used to discover the structure of the mapping between the computed features and each film genre. Through experiments, we demonstrate the structure that exists between low-level features and high-level film genres.

