Results 1 - 10 of 18
Knowledge adaptation for ad hoc multimedia event detection with few exemplars.
In ACM Multimedia, 2012.
"... ABSTRACT Multimedia event detection (MED) has a significant impact on many applications. Though video concept annotation has received much research effort, video event detection remains largely unaddressed. Current research mainly focuses on sports and news event detection or abnormality detection ..."
Abstract
-
Cited by 11 (6 self)
- Add to MetaCart
Abstract: Multimedia event detection (MED) has a significant impact on many applications. Though video concept annotation has received much research effort, video event detection remains largely unaddressed. Current research mainly focuses on sports and news event detection or abnormality detection in surveillance videos. Our research on this topic is capable of detecting more complicated and generic events. Moreover, the practical reality that precisely labeled multimedia content is scarce necessitates studying how to attain respectable detection performance using only limited positive examples. Research addressing these two issues is still in its infancy. In light of this, we explore Ad Hoc MED, which aims to detect complicated and generic events using only a few positive examples. To the best of our knowledge, our work is the first attempt at this topic. As the information from these few positive examples is limited, we propose to infer knowledge from other multimedia resources to facilitate event detection. Experiments are performed on real-world multimedia archives consisting of several challenging events. The results show that our approach outperforms several other detection algorithms; most notably, it outperforms SVM by 43% and 14% in Average Precision when using Gaussian and χ2 kernels, respectively.
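The SVM comparison point named in the abstract is easy to make concrete. Below is a minimal sketch of an SVM baseline with an exponential χ2 kernel, a standard choice for bag-of-words histograms; the toy data, gamma, and dimensionality are illustrative, and this is the baseline, not the authors' knowledge-adaptation method.

```python
# Minimal sketch of an SVM baseline with an exponential chi-squared kernel,
# a common choice for histogram features. This reproduces the comparison
# point named in the abstract, not the authors' knowledge-adaptation method.
import numpy as np
from sklearn.svm import SVC

def chi2_kernel(X, Y, gamma=1.0):
    # k(x, y) = exp(-gamma * sum_i (x_i - y_i)^2 / (x_i + y_i))
    K = np.empty((X.shape[0], Y.shape[0]))
    for i, x in enumerate(X):
        num = (x - Y) ** 2
        denom = np.maximum(x + Y, 1e-10)  # guard against empty histogram bins
        K[i] = np.exp(-gamma * (num / denom).sum(axis=1))
    return K

# Toy bag-of-words histograms standing in for video features.
rng = np.random.default_rng(0)
X_train, y_train = rng.random((40, 64)), np.tile([0, 1], 20)
clf = SVC(kernel=chi2_kernel).fit(X_train, y_train)
scores = clf.decision_function(rng.random((5, 64)))  # ranking scores for test videos
```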
Zero-Example Event Search using MultiModal Pseudo Relevance Feedback
"... We propose a novel method MultiModal Pseudo Relevance Feedback (MMPRF) for event search in video, which re-quires no search examples from the user. Pseudo Relevance Feedback has shown great potential in retrieval tasks, but previous works are limited to unimodal tasks with only a single ranked list. ..."
Abstract
-
Cited by 7 (3 self)
- Add to MetaCart
(Show Context)
Abstract: We propose a novel method, MultiModal Pseudo Relevance Feedback (MMPRF), for event search in video, which requires no search examples from the user. Pseudo relevance feedback has shown great potential in retrieval tasks, but previous works are limited to unimodal tasks with only a single ranked list. To tackle the event search task, which is inherently multimodal, our proposed MMPRF takes advantage of multiple modalities and multiple ranked lists to enhance event search performance in a principled way. The approach is unique in that it leverages not only semantic features but also non-semantic low-level features for event search in the absence of training data. Evaluated on the TRECVID MEDTest dataset, the approach improves the baseline by up to 158% in terms of mean average precision. It also significantly contributes to CMU Team's final …
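As a rough illustration of the pseudo-relevance-feedback loop, the sketch below treats the top-k results from each modality's initial ranked list as pseudo-positives, fits a simple ranker, and re-ranks the collection. All names and the data layout are hypothetical; the paper's MMPRF combines the lists in a more principled way than this plain variant.

```python
# Sketch of multimodal pseudo relevance feedback: top-k items from each
# modality's ranked list become pseudo-positives, random other videos become
# pseudo-negatives, and a simple ranker re-ranks the whole collection.
import numpy as np
from sklearn.linear_model import LogisticRegression

def mmprf_rerank(ranked_lists, features, k=10, n_neg=50, seed=0):
    """ranked_lists: {modality: [video ids, best first]}; features: {id: vector}."""
    rng = np.random.default_rng(seed)
    pseudo_pos = {vid for lst in ranked_lists.values() for vid in lst[:k]}
    all_ids = list(features)
    pseudo_neg = rng.choice([v for v in all_ids if v not in pseudo_pos],
                            size=n_neg, replace=False)
    ids = list(pseudo_pos) + list(pseudo_neg)
    X = np.array([features[v] for v in ids])
    y = np.array([1] * len(pseudo_pos) + [0] * len(pseudo_neg))
    clf = LogisticRegression(max_iter=1000).fit(X, y)
    scores = clf.predict_proba(np.array([features[v] for v in all_ids]))[:, 1]
    return [v for _, v in sorted(zip(scores, all_ids), reverse=True)]
```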
Querying for Video Events by Semantic Signatures from Few Examples
"... We aim to query web video for complex events using only a handful of video query examples, where the standard ap-proach learns a ranker from hundreds of examples. We consider a semantic signature representation, consisting of off-the-shelf concept detectors, to capture the variance in semantic appea ..."
Abstract
-
Cited by 7 (3 self)
- Add to MetaCart
(Show Context)
Abstract: We aim to query web video for complex events using only a handful of video query examples, where the standard approach learns a ranker from hundreds of examples. We consider a semantic signature representation, consisting of off-the-shelf concept detectors, to capture the variance in semantic appearance of events. Since it is unknown what similarity metric and query fusion to use in such an event retrieval setting, we perform three experiments on unconstrained web videos from the TRECVID event detection task. They reveal that retrieval with semantic signatures using normalized correlation as the similarity metric outperforms a low-level bag-of-words alternative, that multiple queries are best combined using late fusion with an average operator, and that event retrieval is preferred over event classification when fewer than eight positive video examples are available.
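The retrieval recipe the experiments single out, semantic signatures compared by normalized correlation with query examples late-fused by averaging, is compact enough to sketch directly; variable names below are illustrative.

```python
# Sketch of the retrieval setup the experiments favor: videos are semantic
# signatures (vectors of concept-detector scores), similarity is normalized
# correlation (Pearson), and multiple query examples are late-fused by
# averaging their per-video similarities.
import numpy as np

def normalized_correlation(a, b):
    a = (a - a.mean()) / (a.std() + 1e-10)
    b = (b - b.mean()) / (b.std() + 1e-10)
    return float(a @ b) / len(a)

def retrieve(query_signatures, collection):
    """query_signatures: list of (d,) arrays; collection: {video id: (d,) array}."""
    ranked = {vid: np.mean([normalized_correlation(q, sig) for q in query_signatures])
              for vid, sig in collection.items()}
    return sorted(ranked.items(), key=lambda kv: kv[1], reverse=True)
```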
E-LAMP: Integration of innovative ideas for multimedia event detection. In Machine Vision and Applications, 2014.
"... Abstract Detecting multimedia events in web videos is an emerging hot research area in the fields of multimedia and computer vision. In this paper, we introduce the core methods and technologies of the framework we developed recently for our Event Labeling through Analytic Media Processing (E-LAMP) ..."
Abstract
-
Cited by 6 (5 self)
- Add to MetaCart
(Show Context)
Abstract: Detecting multimedia events in web videos is an emerging and active research area in multimedia and computer vision. In this paper, we introduce the core methods and technologies of the framework we developed recently for our Event Labeling through Analytic Media Processing (E-LAMP) system to deal with different aspects of the overall event detection problem. More specifically, first, we have developed efficient methods for feature extraction so that we are able to handle large collections of video data with thousands of hours of video. Second, we represent the extracted raw features in a spatial bag-of-words model with more effective tilings, such that the spatial layout of different features and different events is better captured and the overall detection performance improved. Third, departing from the widely used early and late fusion schemes, a novel algorithm is developed to learn a more robust and discriminative intermediate feature representation from multiple features, so that better event models can be built upon it. Finally, to tackle the additional challenge of event detection with only very few positive exemplars, we have developed a novel algorithm which effectively adapts knowledge learnt from auxiliary sources to assist event detection. Both our empirical results and the official evaluation results on TRECVID MED'11 and MED'12 demonstrate the excellent performance of the integration of these ideas.
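The spatial bag-of-words step admits a short illustration: quantized local descriptors are histogrammed per spatial tile and the tile histograms concatenated, so layout information survives pooling. The 2x2 grid and vocabulary below are assumptions, not E-LAMP's actual configuration.

```python
# Hedged sketch of a spatial bag-of-words representation: each local
# descriptor has been quantized to a visual word; words are histogrammed
# per spatial tile and the per-tile histograms are concatenated.
import numpy as np

def spatial_bow(points, words, vocab_size, grid=(2, 2)):
    """points: (n, 2) normalized x,y in [0,1); words: (n,) visual-word ids."""
    tiles_x, tiles_y = grid
    hist = np.zeros((tiles_x * tiles_y, vocab_size))
    tx = np.minimum((points[:, 0] * tiles_x).astype(int), tiles_x - 1)
    ty = np.minimum((points[:, 1] * tiles_y).astype(int), tiles_y - 1)
    for t, w in zip(tx * tiles_y + ty, words):
        hist[t, w] += 1
    # L1-normalize each tile histogram, then concatenate.
    hist /= np.maximum(hist.sum(axis=1, keepdims=True), 1)
    return hist.ravel()
```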
Detection Bank: An Object Detection Based Video Representation for Multimedia Event Recognition
"... While low-level image features have proven to be e↵ective representations for visual recognition tasks such as object recognition and scene classification, they are inadequate to capture complex semantic meaning required to solve highlevel visual tasks such as multimedia event detection and recognit ..."
Abstract
-
Cited by 5 (0 self)
- Add to MetaCart
(Show Context)
Abstract: While low-level image features have proven to be effective representations for visual recognition tasks such as object recognition and scene classification, they are inadequate to capture the complex semantic meaning required to solve high-level visual tasks such as multimedia event detection and recognition. Recognition or retrieval of events and activities can be improved if specific discriminative objects are detected in a video sequence. In this paper, we propose an image representation, called Detection Bank, based on the detection images from a large number of windowed object detectors, where an image is represented by different statistics derived from these detections. This representation is extended to video by aggregating the key-frame-level image representations through mean and max pooling. We empirically show that it captures information complementary to state-of-the-art representations such as Spatial Pyramid Matching and Object Bank. These descriptors combined with our Detection Bank representation significantly outperform any of the representations alone on TRECVID MED 2011 data.
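The video-level aggregation is simple to sketch: per-key-frame detector statistics are pooled over frames with mean and max and concatenated. Taking the max window score per detector as the frame statistic is one plausible choice; the paper derives several statistics from the detection images.

```python
# Minimal sketch of the Detection Bank aggregation step: given per-key-frame
# detector statistics (here, the max window score of each object detector),
# pool over frames with mean and max and concatenate.
import numpy as np

def detection_bank_video(frame_scores):
    """frame_scores: (n_frames, n_detectors) max detector responses per key frame."""
    mean_pool = frame_scores.mean(axis=0)
    max_pool = frame_scores.max(axis=0)
    return np.concatenate([mean_pool, max_pool])  # video-level representation
```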
Feature weighting via optimal thresholding for video analysis. In ICCV.
"... Fusion of multiple features can boost the performance of large-scale visual classification and detection tasks like TRECVID Multimedia Event Detection (MED) competi-tion [1]. In this paper, we propose a novel feature fusion approach, namely Feature Weighting via Optimal Thresh-olding (FWOT) to effec ..."
Abstract
-
Cited by 4 (3 self)
- Add to MetaCart
(Show Context)
Abstract: Fusion of multiple features can boost the performance of large-scale visual classification and detection tasks like the TRECVID Multimedia Event Detection (MED) competition [1]. In this paper, we propose a novel feature fusion approach, namely Feature Weighting via Optimal Thresholding (FWOT), to effectively fuse various features. FWOT learns the weights, thresholding, and smoothing parameters in a joint framework to combine the decision values obtained from all the individual features and the early fusion. To the best of our knowledge, this is the first work to consider the weight and threshold factors of the fusion problem simultaneously. Compared to state-of-the-art fusion algorithms, our approach achieves promising improvements on the HMDB [8] action recognition dataset and the CCV [5] video classification dataset. In addition, experiments on two TRECVID MED 2011 collections show that our approach outperforms the state-of-the-art fusion methods for complex event detection.
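At test time the fused score plausibly takes the form of a weighted sum of thresholded per-feature decision values. The sketch below assumes that form, with a sigmoid standing in for the learned smoothing; learning the weights, thresholds, and smoothing jointly is the paper's contribution and is not shown here.

```python
# Assumed form of an FWOT-style fused score: each feature's decision value
# passes through a smoothed threshold, and the results combine with learned
# weights. The sigmoid is our guess at a differentiable thresholding, not
# necessarily the paper's exact smoothing.
import numpy as np

def fused_score(decisions, w, t, sharpness=10.0):
    """decisions, w, t: (n_features,) decision values, weights, thresholds."""
    smoothed = 1.0 / (1.0 + np.exp(-sharpness * (decisions - t)))  # soft threshold
    return float(w @ smoothed)
```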
Modeling Concept Dependencies for Event Detection
"... Event detection is a recent and challenging task. The aim is to retrieve the relevant videos given an event description. A set of training examples associated with the events are gen-erally provided as well, since retrieving relevant videos from textual queries solely is not feasible. Early attempts ..."
Abstract
-
Cited by 1 (0 self)
- Add to MetaCart
(Show Context)
Abstract: Event detection is a recent and challenging task: the aim is to retrieve the relevant videos given an event description. A set of training examples associated with the events is generally provided as well, since retrieving relevant videos from textual queries alone is not feasible. Early attempts at event detection were based on low-level features. High-level features such as concepts have since been introduced as an alternative, since they provide semantically richer information. In this work, we focus on object-based concepts and exploit their dependencies using a Markov Random Field (MRF) based model for event detection. This enables us to model the likelihood of concepts present in the videos, either individually or pairwise. We propose a method incorporating the strengths of concepts and an MRF-based model for the event detection task, and evaluate our models on the Multimedia Event Detection (MED) dataset from NIST's 2011 TRECVID evaluation, which consists of approximately 45,000 unconstrained videos. This work is beneficial in several respects. First, we address concept-based event detection using a very large number of unconstrained YouTube videos. Second, we introduce the application of MRFs for event detection, which can be further enhanced by incorporating other features or temporal information. Last but not least, we exploit the occurrence and co-occurrence of object-based concepts, which enables us to reveal interactions between such concepts at the video level. Experimental results show that revealing these interactions provides promising event detection results.
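A minimal sketch of the scoring form such a model suggests: a video's event score combines unary terms over individual concept probabilities with pairwise terms over their co-occurrence. The potentials and their estimation below are illustrative, not the paper's exact MRF formulation.

```python
# Hedged sketch of scoring a video with unary (single-concept) and pairwise
# (concept co-occurrence) potentials, the general form the abstract describes.
import numpy as np

def mrf_event_score(concept_probs, unary_w, pairwise_w):
    """concept_probs: (c,) detector outputs; unary_w: (c,); pairwise_w: (c, c)."""
    unary = unary_w @ concept_probs
    # Co-occurrence term: the quadratic form captures pairwise concept interactions.
    pairwise = concept_probs @ pairwise_w @ concept_probs
    return float(unary + pairwise)
```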
Few-Example Video Event Retrieval using Tag Propagation
"... An emerging topic in multimedia retrieval is to detect a com-plex event in video using only a handful of video examples. Di↵erent from existing work which learns a ranker from pos-itive video examples and hundreds of negative examples, we aim to query web video for events using zero or only a few vi ..."
Abstract
-
Cited by 1 (0 self)
- Add to MetaCart
(Show Context)
Abstract: An emerging topic in multimedia retrieval is detecting a complex event in video using only a handful of video examples. Different from existing work, which learns a ranker from positive video examples and hundreds of negative examples, we aim to query web video for events using zero or only a few visual examples. To that end, we propose in this paper a tag-based video retrieval system which propagates tags from a tagged video source to an unlabeled video collection without the need for any training examples. Our algorithm is based on weighted frequency neighbor voting using concept vector similarity. Once tags are propagated to unlabeled video, we can rely on off-the-shelf language models to rank these videos by tag similarity. We study the behavior of our tag-based video event retrieval system by performing three experiments on web videos from the TRECVID multimedia event detection corpus, with zero, one, and multiple query examples, and the system beats a recent alternative.
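The neighbor-voting core is straightforward to sketch: each unlabeled video collects tag votes from its k most similar tagged videos under concept-vector cosine similarity. The paper's frequency weighting is omitted here; this is the plain similarity-weighted variant.

```python
# Sketch of tag propagation by weighted neighbour voting over concept-vector
# similarity: votes from the k most similar tagged videos, weighted by
# cosine similarity, accumulate per tag.
import numpy as np
from collections import defaultdict

def propagate_tags(unlabeled_vec, tagged, k=25):
    """tagged: list of (concept_vector, set_of_tags) from the tagged source."""
    def cos(a, b):
        return float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-10)
    neighbors = sorted(((cos(unlabeled_vec, v), tags) for v, tags in tagged),
                       key=lambda p: p[0], reverse=True)[:k]
    votes = defaultdict(float)
    for sim, tags in neighbors:
        for tag in tags:
            votes[tag] += sim  # similarity-weighted vote
    return sorted(votes.items(), key=lambda kv: kv[1], reverse=True)
```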
Feature Fusion for Efficient Content-Based Video Retrieval
"... Abstract—Content-based video retrieval is a complex task because of the large amount of information in single items and because databases of videos can be very large. In this paper we explore a possible solution for efficient similar item retrieval. In our experiments we combine relevant feature set ..."
Abstract
- Add to MetaCart
(Show Context)
Abstract: Content-based video retrieval is a complex task because of the large amount of information in single items and because databases of videos can be very large. In this paper we explore a possible solution for efficient similar-item retrieval. In our experiments we combine relevant feature sets with a learned Mahalanobis metric while using an efficient nearest neighbor search algorithm. The efficient nearest neighbor algorithms we compare are Locality Sensitive Hashing and vantage point trees. The two options are compared to several baseline systems in a general video retrieval framework. We used three sets of features to test the system: SURF features, color histograms, and topics, the last extracted using a Latent Dirichlet Allocation topic model. We show that fusing the individual feature sets with a learned metric improves performance over the best individual feature set. The feature fusion can be combined with an efficient nearest neighbor search algorithm to reduce the number of exact distance computations with limited impact on retrieval performance. Index Terms: Content-based video retrieval, feature fusion, metric learning, efficient retrieval, nearest neighbor search, locality sensitive hashing, vantage point trees.
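A useful detail behind combining a learned Mahalanobis metric with off-the-shelf nearest-neighbor structures: with M = L^T L, the Mahalanobis distance equals Euclidean distance after projecting by L, so LSH or a vantage point tree built for Euclidean space can index the projected vectors. L below is random for illustration; in the paper it would come from metric learning.

```python
# Sketch of the Mahalanobis-to-Euclidean reduction: d_M(x, y)^2 =
# (x - y)^T L^T L (x - y) = ||Lx - Ly||^2, so any Euclidean nearest-neighbor
# index (LSH, vantage point tree) can be built on the projected vectors.
import numpy as np

rng = np.random.default_rng(0)
d = 16
L = rng.normal(size=(d, d))            # stands in for a learned transform
X = rng.normal(size=(1000, d))         # fused video feature vectors
query = rng.normal(size=d)

proj_X, proj_q = X @ L.T, L @ query    # project into Euclidean space
dists = np.linalg.norm(proj_X - proj_q, axis=1)
nearest = np.argsort(dists)[:10]       # exact top-10; LSH/VP-tree would approximate
```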