E-lamp: integration of innovative ideas for multimedia event detection.
In Machine Vision and Applications, 2014
Cited by 6 (5 self)
Abstract. Detecting multimedia events in web videos is an emerging hot research area in the fields of multimedia and computer vision. In this paper, we introduce the core methods and technologies of the framework we recently developed for our Event Labeling through Analytic Media Processing (E-LAMP) system to deal with different aspects of the overall problem of event detection. First, we have developed efficient methods for feature extraction so that we are able to handle large collections of video data with thousands of hours of videos. Second, we represent the extracted raw features in a spatial bag-of-words model with more effective tilings, so that the spatial layout information of different features and different events is better captured and the overall detection performance improves. Third, in contrast to the widely used early and late fusion schemes, a novel algorithm is developed to learn a more robust and discriminative intermediate feature representation from multiple features so that better event models can be built upon it. Finally, to tackle the additional challenge of event detection with only very few positive exemplars, we have developed a novel algorithm which is able to effectively adapt the knowledge learnt from auxiliary sources to assist the event detection. Both our empirical results and the official evaluation results on TRECVID MED'11 and MED'12 demonstrate the excellent performance of the integration of these ideas.
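To make the spatial bag-of-words idea in the abstract above concrete, here is a minimal sketch. It assumes a plain uniform grid of tiles and already-quantized visual words; the abstract does not say which tilings E-LAMP actually uses, so the function name `spatial_bow` and all of its parameters are illustrative, not the authors' implementation.

```python
def spatial_bow(points, words, vocab_size, width, height, tiles=(2, 2)):
    """Concatenate one visual-word histogram per spatial tile.

    points: (x, y) location of each local feature in the frame.
    words:  visual-word index (0 .. vocab_size - 1) of each feature.
    tiles:  (columns, rows) of a uniform grid; an assumed tiling.
    """
    cols, rows = tiles
    hist = [0.0] * (cols * rows * vocab_size)
    for (x, y), w in zip(points, words):
        # Map the location to a tile, clamping points on the far border.
        c = min(int(x * cols / width), cols - 1)
        r = min(int(y * rows / height), rows - 1)
        hist[(r * cols + c) * vocab_size + w] += 1.0
    total = sum(hist)
    # L1-normalize so frames with different feature counts are comparable.
    return [h / total for h in hist] if total else hist


features = spatial_bow(
    points=[(10, 10), (90, 10), (90, 90)],
    words=[0, 1, 1],
    vocab_size=2, width=100, height=100,
)
```

Unlike a single global histogram, the concatenated vector keeps the two occurrences of word 1 apart because they fall in different tiles.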
Anomaly Detection and Localization in Crowded Scenes
Cited by 5 (0 self)
Abstract. The detection and localization of anomalous behaviors in crowded scenes is considered, and a joint detector of temporal and spatial anomalies is proposed. The proposed detector is based on a video representation that accounts for both appearance and dynamics, using a set of mixture of dynamic textures models. These models are used to implement 1) a center-surround discriminant saliency detector that produces spatial saliency scores, and 2) a model of normal behavior that is learned from training data and produces temporal saliency scores. Spatial and temporal anomaly maps are then defined at multiple spatial scales, by considering the scores of these operators at progressively larger regions of support. The multiscale scores act as potentials of a conditional random field that guarantees global consistency of the anomaly judgments. A data set of densely crowded pedestrian walkways is introduced and used to evaluate the proposed anomaly detector. Experiments on this and other data sets show that the proposed detector achieves state-of-the-art anomaly detection results. Index Terms: video analysis, surveillance, anomaly detection, crowded scene, dynamic texture, center-surround saliency
Exploiting sparse representations for robust analysis of noisy complex video scenes
In Proc. of the 12th European Conf. on Computer Vision, Volume Part VI, 2012
Cited by 4 (0 self)
Abstract. Recent works have shown that, even with simple low-level visual cues, complex behaviors can be extracted automatically from crowded scenes, e.g. those depicting public spaces recorded from video surveillance cameras. However, low-level features such as optical flow or foreground pixels are inherently noisy. In this paper we propose a novel unsupervised learning approach for the analysis of complex scenes which is specifically tailored to cope directly with feature noise and uncertainty. We formalize the task of extracting activity patterns as a matrix factorization problem, considering as reconstruction function the robust Earth Mover's Distance. A sparsity constraint on the computed basis matrix is imposed, filtering out noise and leading to the identification of the most relevant elementary activities in a typical high-level behavior. We further derive an alternating optimization approach to solve the proposed problem efficiently and we show that it reduces to a sequence of linear programs. Finally, we propose to use short trajectory snippets to account for object motion information, as an alternative to the noisy optical flow vectors used in previous works. Experimental results demonstrate that our method yields similar or superior performance to state-of-the-art approaches.
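The robustness that the abstract above attributes to the Earth Mover's Distance can be illustrated with its simplest special case. The sketch below assumes 1-D histograms of equal total mass and unit spacing between adjacent bins, where the distance has a closed form as a running sum. The paper's own ground distance and its linear-programming formulation are not given here, so `emd_1d` is only an illustration.

```python
def emd_1d(p, q):
    """Earth Mover's Distance between two 1-D histograms.

    Assumes equal total mass and unit distance between adjacent bins.
    The surplus carried from bin to bin is accumulated, and each unit
    of carried mass costs one unit per bin boundary it crosses.
    """
    if len(p) != len(q):
        raise ValueError("histograms must have the same number of bins")
    carried = 0.0
    cost = 0.0
    for a, b in zip(p, q):
        carried += a - b
        cost += abs(carried)
    return cost


reference = [1.0, 0.0, 0.0]
near = [0.0, 1.0, 0.0]   # mass shifted by one bin
far = [0.0, 0.0, 1.0]    # mass shifted by two bins

bin_wise_near = sum(abs(a - b) for a, b in zip(reference, near))
bin_wise_far = sum(abs(a - b) for a, b in zip(reference, far))
```

A bin-wise difference gives the same value for `near` and `far`, while `emd_1d` grows with the size of the shift, so a small displacement caused by noisy features is penalized less than a large one.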
A Unified Framework for Event Summarization and Rare Event Detection
Cited by 2 (0 self)
A novel approach for event summarization and rare event detection is proposed. Unlike conventional methods that deal with event summarization and rare event detection independently, we solve them together by transforming the problems into a graph editing framework. In our approach, a video is represented as a graph, in which each node of the graph indicates an event obtained by segmenting the video spatially and temporally, while edges between nodes describe the events related to each other. Based on the degree of relations, edges have different weights. After learning the graph structure, our method edits the graph by merging its subgraphs or pruning its edges. The graph is edited toward minimizing a predefined energy model with the Data-Driven Markov Chain Monte Carlo method. The energy model consists of several parameters that represent causality, frequency, and significance of events. We design a specific energy model utilizing these parameters to satisfy each objective of event summarization and rare event detection. Experimental results show that the proposed approach accurately summarizes a video in a fully unsupervised manner. Moreover, the experiments also demonstrate that the approach is advantageous in detecting the rare transition of events.
Novelty detection in images by sparse representations
In Proceedings of IEEE Symposium on Intelligent Embedded Systems (IES
Cited by 1 (1 self)
Abstract. We address the problem of automatically detecting anomalies in images, i.e., patterns that do not conform to those appearing in a reference training set. This is a very important feature for enabling an intelligent system to autonomously check the validity of acquired data, thus performing a preliminary, automatic diagnosis. We approach this problem in a patch-wise manner, by learning a model to represent patches belonging to a training set of normal images. Here, we consider a model based on sparse representations, and we show that jointly monitoring the sparsity and the reconstruction error of such representation substantially improves the detection performance with respect to other approaches leveraging sparse models. As an illustrative application, we consider the detection of anomalies in scanning electron microscope (SEM) images, which is essential for supervising the production of nanofibrous materials.
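The idea of jointly monitoring sparsity and reconstruction error can be sketched as follows. This is a toy illustration under stated assumptions, not the authors' method: plain matching pursuit stands in for whatever sparse coder the paper uses, the dictionary atoms are assumed to have unit norm, and the two fixed thresholds `max_err` and `max_l1` are hypothetical stand-ins for a decision rule learned from normal patches.

```python
import math


def matching_pursuit(patch, atoms, max_atoms=3, tol=1e-6):
    """Greedy sparse coding of `patch` over unit-norm `atoms`."""
    residual = list(patch)
    coeffs = [0.0] * len(atoms)
    for _ in range(max_atoms):
        dots = [sum(r * a for r, a in zip(residual, atom)) for atom in atoms]
        k = max(range(len(atoms)), key=lambda i: abs(dots[i]))
        if abs(dots[k]) < tol:
            break  # no atom explains the remaining residual
        coeffs[k] += dots[k]
        residual = [r - dots[k] * a for r, a in zip(residual, atoms[k])]
    return coeffs, residual


def indicators(patch, atoms):
    """Return (reconstruction error, l1 norm of the coefficients)."""
    coeffs, residual = matching_pursuit(patch, atoms)
    err = math.sqrt(sum(r * r for r in residual))
    l1 = sum(abs(c) for c in coeffs)
    return err, l1


def is_anomalous(patch, atoms, max_err, max_l1):
    """Flag a patch when either indicator leaves its normal range."""
    err, l1 = indicators(patch, atoms)
    return err > max_err or l1 > max_l1


# Toy dictionary spanning only the first two coordinates.
atoms = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
normal = is_anomalous([2.0, 0.0, 0.0], atoms, max_err=0.5, max_l1=5.0)
bad_fit = is_anomalous([0.0, 0.0, 3.0], atoms, max_err=0.5, max_l1=5.0)
not_sparse = is_anomalous([4.0, 4.0, 0.0], atoms, max_err=0.5, max_l1=5.0)
```

The third patch is reconstructed perfectly, so an error-only detector would accept it; it is flagged only because its coefficients are unusually large, which is the case the joint test is meant to catch.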
Learning Multi-level Sparse Representations
Cited by 1 (1 self)
Bilinear approximation of a matrix is a powerful paradigm of unsupervised learning. In some applications, however, there is a natural hierarchy of concepts that ought to be reflected in the unsupervised analysis. For example, in the neuroscience image sequence considered here, there are the semantic concepts of pixel → neuron → assembly that should find their counterpart in the unsupervised analysis. Driven by this concrete problem, we propose a decomposition of the matrix of observations into a product of more than two sparse matrices, with the rank decreasing from lower to higher levels. In contrast to prior work, we allow for both hierarchical and heterarchical relations of lower-level to higher-level concepts. In addition, we learn the nature of these relations rather than imposing them. Finally, we describe an optimization scheme that optimizes the decomposition over all levels jointly, rather than in a greedy level-by-level fashion. The proposed bilevel SHMF (sparse heterarchical matrix factorization) is the first formalism that makes it possible to simultaneously interpret a calcium imaging sequence in terms of the constituent neurons, their membership in assemblies, and the time courses of both neurons and assemblies. Experiments show that the proposed model fully recovers the structure from difficult synthetic data designed to imitate the experimental data. More importantly, bilevel SHMF yields plausible interpretations of real-world calcium imaging data.
A Stream Algebra for Computer Vision Pipelines
Abstract. Recent interest in developing online computer vision algorithms is spurred in part by a growth of applications capable of generating large volumes of images and videos. These applications are rich sources of images and video streams. Online vision algorithms for managing, processing and analyzing these streams need to rely upon streaming concepts, such as pipelines, to ensure timely and incremental processing of data. This paper is a first attempt at defining a formal stream algebra that provides a mathematical description of vision pipelines and describes the distributed manipulation of image and video streams. We also show how our algebra can effectively describe the vision pipelines of two state-of-the-art techniques.
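As a rough picture of what composing stream operators into a pipeline can look like, the sketch below uses Python generators. The operator names (`smap`, `sfilter`, `window`, `compose`) are hypothetical and are not the operators defined in the paper, whose algebra is not reproduced in this abstract. The sketch only shows incremental, item-by-item processing of a stream.

```python
def smap(f):
    """Operator that applies f to every item of a stream."""
    return lambda stream: (f(x) for x in stream)


def sfilter(pred):
    """Operator that keeps only the items satisfying pred."""
    return lambda stream: (x for x in stream if pred(x))


def window(n):
    """Operator that emits sliding windows of n consecutive items."""
    def op(stream):
        buf = []
        for x in stream:
            buf.append(x)
            if len(buf) == n:
                yield tuple(buf)
                buf.pop(0)
    return op


def compose(*ops):
    """Chain operators left to right into a single pipeline."""
    def pipeline(stream):
        for op in ops:
            stream = op(stream)
        return stream
    return pipeline


# Integers stand in for frames in this toy stream.
pipeline = compose(
    smap(lambda x: x * 2),
    sfilter(lambda x: x % 3 != 0),
    window(2),
)
result = list(pipeline(range(1, 6)))
```

Because every operator is a generator, each item flows through the whole chain before the next one is read, which is the incremental behavior the abstract asks of online vision algorithms.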
Neuromorphic Bayesian Surprise for Far-Range Event Detection
In this paper we address the problem of detecting small, rare events in very high resolution, far-field video streams. Rather than learning color distributions for individual pixels, our method utilizes a uniquely structured network of Bayesian learning units which compute a combined measure of “surprise” across multiple spatial and temporal scales on various visual features. The features used, as well as the learning rules for these units, are derived from recent work in computational neuroscience. We test the system extensively on both real and virtual data, and show that it outperforms a standard foreground/background segmentation approach as well as a standard visual saliency algorithm.
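Bayesian surprise is commonly defined as the Kullback-Leibler divergence between the belief after an observation and the belief before it. The sketch below shows a single learning unit under assumptions that are not taken from the paper: a Gaussian belief over one scalar feature with known observation noise. The paper's actual units, features and multi-scale combination are not described in this abstract.

```python
import math


def update(mean, var, x, noise_var):
    """Conjugate Gaussian update of a belief N(mean, var) after observing x."""
    post_var = 1.0 / (1.0 / var + 1.0 / noise_var)
    post_mean = post_var * (mean / var + x / noise_var)
    return post_mean, post_var


def kl_gauss(m1, v1, m0, v0):
    """KL( N(m1, v1) || N(m0, v0) ) for scalar Gaussians, in nats."""
    return 0.5 * (math.log(v0 / v1) + (v1 + (m1 - m0) ** 2) / v0 - 1.0)


def surprise(mean, var, x, noise_var=1.0):
    """Return (surprise, posterior), where surprise = KL(posterior || prior)."""
    post_mean, post_var = update(mean, var, x, noise_var)
    return kl_gauss(post_mean, post_var, mean, var), (post_mean, post_var)


expected, _ = surprise(0.0, 1.0, x=0.0)    # observation matches the belief
unexpected, _ = surprise(0.0, 1.0, x=4.0)  # observation far from the belief
```

An observation that agrees with the prior still yields a small positive surprise, because the belief becomes more certain. A distant observation shifts the mean and yields a much larger value.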
Panic Detection in Human Crowds using Sparse Coding
© Abhishek Kumar 2012. Recently, the surveillance of human activities has drawn a lot of attention from the research community, and camera-based surveillance is being attempted with the aid of computers. Surveillance is required to detect abnormal or unwanted activities. Such abnormal activities are very infrequent compared to regular activities. At present, surveillance is done manually: operators watch a set of surveillance video screens to discover an abnormal event. This is expensive and prone to error. The limitations of these surveillance systems can be effectively removed if an automated anomaly detection system is designed. With powerful computers, computer vision is being seen as a panacea for surveillance. A computer-vision-aided anomaly detection system will select those video frames which contain an anomaly, and only those selected frames will be used for manual verification.
Long-Range Spatio-Temporal Modeling of Video with Application to Fire Detection
Abstract. We describe a methodology for modeling backgrounds subject to significant variability over time-scales ranging from days to years, where the events of interest exhibit subtle variability relative to the normal mode. The motivating application is fire monitoring from remote stations, where illumination changes spanning the day and the season, meteorological phenomena resembling smoke, and the absence of sufficient training data for the two classes make out-of-the-box classification algorithms ineffective. We exploit low-level descriptors, incorporate explicit modeling of nuisance variability, and learn the residual normal-model variability. Our algorithm achieves state-of-the-art performance not only compared to other anomaly detection schemes, but also compared to human performance, both for untrained and trained operators.