V.: What’s going on? Discovering spatiotemporal dependencies in dynamic scenes
- In: Proc. of the IEEE CVPR, 2010
Cited by 75 (1 self)
We present two novel methods to automatically learn spatio-temporal dependencies of moving agents in complex dynamic scenes. They allow the discovery of temporal rules, such as the right of way between different lanes or typical traffic light sequences. To extract them, sequences of activities need to be learned. While the first method extracts rules based on a learned topic model, the second model, called DDP-HMM, jointly learns co-occurring activities and their time dependencies. To this end we employ Dependent Dirichlet Processes to learn an arbitrary number of infinite Hidden Markov Models. In contrast to previous work, we build on state-of-the-art topic models that automatically infer all parameters, such as the optimal number of HMMs necessary to explain the rules governing a scene. The models are trained offline by Gibbs Sampling using unlabeled training data.
Understanding collective crowd behaviors: learning a mixture model of dynamic pedestrian-agents
- In: Proc. CVPR, 2012
Cited by 23 (6 self)
In this paper, a new Mixture model of Dynamic pedestrian-Agents (MDA) is proposed to learn the collective behavior patterns of pedestrians in crowded scenes. Collective behaviors characterize the intrinsic dynamics of the crowd. Under agent-based modeling, each pedestrian in the crowd is driven by a dynamic pedestrian-agent, which is a linear dynamic system with its initial and termination states reflecting a pedestrian’s belief of the starting point and the destination. The whole crowd is then modeled as a mixture of dynamic pedestrian-agents. Once the model is learned from real data without supervision, MDA can simulate the crowd behaviors. Furthermore, MDA can infer the past behaviors and predict the future behaviors of pedestrians given only partially observed trajectories, and classify different pedestrian behaviors in the scene. The effectiveness of MDA and its applications is demonstrated by qualitative and quantitative experiments on the video surveillance dataset collected from the New York Grand Central Station.
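As a rough illustration of what "a linear dynamic system" driving one pedestrian could look like, the sketch below rolls out a noise-free update z_{t+1} = A z_t + b for a 2-D position. The function name `simulate_agent`, the matrix `A`, the offset `b`, and the absence of noise are all assumptions made for illustration; the abstract does not give the paper's actual parameterization or learning procedure.

```python
def simulate_agent(A, b, start, steps):
    """Roll out z_{t+1} = A z_t + b for a 2-D position z (no noise term).

    A is a 2x2 matrix given as nested lists, b and start are 2-tuples.
    All of these are hypothetical inputs, not parameters from the paper.
    """
    z = (float(start[0]), float(start[1]))
    path = [z]
    for _ in range(steps):
        z = (
            A[0][0] * z[0] + A[0][1] * z[1] + b[0],
            A[1][0] * z[0] + A[1][1] * z[1] + b[1],
        )
        path.append(z)
    return path


# Identity dynamics plus a constant drift: the agent walks in a straight line.
identity = [[1.0, 0.0], [0.0, 1.0]]
trajectory = simulate_agent(identity, (1.0, 0.5), (0.0, 0.0), steps=3)
print(trajectory[-1])  # (3.0, 1.5)
```

A mixture of such agents would hold several (A, b) pairs, one per behavior pattern, and assign each observed trajectory to the pair that explains it best.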
Random field topic model for semantic region analysis in crowded scenes from tracklets
- In: IEEE Conference on Computer Vision and Pattern Recognition, 2011
Cited by 21 (7 self)
In this paper, a Random Field Topic (RFT) model is proposed for semantic region analysis from motions of objects in crowded scenes. Different from existing approaches of learning semantic regions either from optical flows or from complete trajectories, our model assumes that fragments of trajectories (called tracklets) are observed in crowded scenes. It advances the existing Latent Dirichlet Allocation topic model, by integrating the Markov random fields (MRF) as prior to enforce the spatial and temporal coherence between tracklets during the learning process. Two kinds of MRF, pairwise MRF and the forest of randomly spanning trees, are defined. Another contribution of this model is to include sources and sinks as high-level semantic prior, which effectively improves the learning of semantic regions and the clustering of tracklets. Experiments on a large scale data set, which includes 40,000+ tracklets collected from the crowded New York Grand Central station, show that our model outperforms state-of-the-art methods both on qualitative results of learning semantic regions and on quantitative results of clustering tracklets.
Probabilistic latent sequential motifs: Discovering temporal activity patterns in video scenes
- In British Machine Vision Conference (BMVC), 2010
Cited by 17 (8 self)
This paper introduces a novel probabilistic activity modeling approach that mines recurrent sequential patterns from documents given as word-time occurrences. In this model, documents are represented as a mixture of sequential activity motifs (or topics) and their starting occurrences. The novelties are threefold. First, unlike previous approaches where topics only modeled the co-occurrence of words at a given time instant, our topics model the co-occurrence and temporal order in which the words occur within a temporal window. Second, our model accounts for the important case where activities occur concurrently in the document. And third, our method explicitly models with latent variables the starting time of the activities within the documents, which makes it possible to implicitly align the occurrences of the same pattern during the joint inference of the temporal topics and their starting times. The model and its robustness to the presence of noise have been validated on synthetic data. Its effectiveness is also illustrated in video activity analysis from low-level motion features, where the discovered topics capture frequent patterns that implicitly represent typical trajectories of scene objects.
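The document representation described in this abstract (motifs placed at starting times, possibly overlapping) can be pictured with a small forward sketch. It only renders a word-by-time count table from given motifs and start times; it does not perform the inference the paper describes. The function name `render_document`, the `(motif_id, start, weight)` tuple layout, and the toy motif are hypothetical.

```python
def render_document(motifs, occurrences, n_words, doc_len):
    """Build a word-by-time table by pasting motifs at their start times.

    motifs[k][w][rel_t] is the weight of word w at relative time rel_t in
    motif k. occurrences is a list of (motif_id, start, weight) tuples.
    Overlapping occurrences simply add up, which is how concurrent
    activities show up in the table.
    """
    doc = [[0] * doc_len for _ in range(n_words)]
    for motif_id, start, weight in occurrences:
        for w, row in enumerate(motifs[motif_id]):
            for rel_t, value in enumerate(row):
                t = start + rel_t
                if t < doc_len:  # drop the part that runs past the document end
                    doc[w][t] += weight * value
    return doc


# One toy motif: word 0 fires first, word 1 fires one step later.
toy_motifs = [[[1, 0], [0, 1]]]
doc = render_document(toy_motifs, [(0, 0, 1), (0, 2, 2)], n_words=2, doc_len=4)
print(doc)  # [[1, 0, 2, 0], [0, 1, 0, 2]]
```

Inference runs in the opposite direction: given only the table, recover both the motifs and the start times.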
Extracting and locating temporal motifs in video scenes using a hierarchical non parametric bayesian model
- in IEEE Conference on Computer Vision and Pattern Recognition, 2011
Cited by 16 (6 self)
In this paper, we present an unsupervised method for mining activities in videos. From unlabeled video sequences of a scene, our method can automatically recover the recurrent temporal activity patterns (or motifs) and when they occur. Using nonparametric Bayesian methods, we are able to automatically find both the underlying number of motifs and the number of motif occurrences in each document. The model’s robustness is first validated on synthetic data. It is then applied to a large set of video data from state-of-the-art papers. We show that it can effectively recover temporal activities with high semantics for humans and strong temporal information. The model is also used for prediction where it is shown to be as efficient as other approaches. Although illustrated on video sequences, this model can be directly applied to various kinds of time series where multiple activities occur simultaneously.
Incremental activity modelling in multiple disjoint cameras
- IEEE Transactions on Pattern Analysis and Machine Intelligence
Cited by 13 (7 self)
Activity modeling and unusual event detection in a network of cameras is challenging, particularly when the camera views do not overlap. We show that it is possible to detect unusual events in multiple disjoint cameras as context-incoherent patterns through incremental learning of time delayed dependencies between distributed local activities observed within and across camera views. Specifically, we model multicamera activities using a Time Delayed Probabilistic Graphical Model (TD-PGM) with different nodes representing activities in different decomposed regions from different views and the directed links between nodes encoding their time delayed dependencies. To deal with visual context changes, we formulate a novel incremental learning method for modeling time delayed dependencies that change over time. We validate the effectiveness of the proposed approach using a synthetic data set and videos captured from a camera network installed at a busy underground station. Index Terms: unusual event detection, multicamera activity modeling, time delay estimation, incremental structure learning.
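To make the notion of a "time delayed dependency" between two regional activity signals concrete, the sketch below picks the lag that maximizes a plain cross-correlation score. This is a simple stand-in chosen for illustration, not the TD-PGM structure learning described in the abstract; `best_delay` and its arguments are hypothetical names.

```python
def best_delay(source, target, max_lag):
    """Return the lag in [0, max_lag] at which target best follows source.

    The score for each lag is the unnormalized cross-correlation
    sum_t source[t] * target[t + lag]. Ties go to the smallest lag.
    """
    n = min(len(source), len(target))
    best_lag, best_score = 0, float("-inf")
    for lag in range(max_lag + 1):
        score = sum(source[t] * target[t + lag] for t in range(n - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


# Activity in region B repeats the activity in region A two frames later.
region_a = [0, 1, 0, 0, 1, 0, 0, 0]
region_b = [0, 0, 0, 1, 0, 0, 1, 0]
print(best_delay(region_a, region_b, max_lag=3))  # 2
```

In a graph over camera regions, a lag like this would label a directed link from the source region to the target region.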
Earth mover’s prototypes: a convex learning approach for discovering activity patterns in dynamic scenes
- in IEEE Conf. on Computer Vision and Pattern Recognition, 2011
"... Mining behaviors in complex scenes ..."
Bridging the past, present and future: Modeling scene activities from event relationships and global rules
- In CVPR, 2012
Cited by 8 (3 self)
This paper addresses the discovery of activities and learns the underlying processes that govern their occurrences over time in complex surveillance scenes. To this end, we propose a novel topic model that accounts for the two main factors that affect these occurrences: (1) the existence of global scene states that regulate which of the activities can spontaneously occur; (2) local rules that link past activity occurrences to current ones with temporal lags. These complementary factors are mixed in the probabilistic generative process, thanks to the use of a binary random variable that selects for each activity occurrence which one of the above two factors is applicable. All model parameters are efficiently inferred using a collapsed Gibbs sampling inference scheme. Experiments on various datasets from the literature show that the model is able to capture temporal processes at multiple scales: the scene-level first-order Markovian process, and causal relationships amongst activities that can be used to predict which activity can happen after another one, and after what delay, thus providing a rich interpretation of the scene’s dynamical content.
Video parsing for abnormality detection
- In ICCV, 2011
Cited by 8 (1 self)
Detecting abnormalities in video is a challenging problem since the class of all irregular objects and behaviors is infinite and thus no (or by far not enough) abnormal training samples are available. Consequently, a standard setting is to find abnormalities without actually knowing what they are because we have not been shown abnormal examples during training. However, although the training data does not define what an abnormality looks like, the main paradigm in this field is to directly search for individual abnormal local patches or image regions independent of one another. To address this problem we parse video frames by establishing a set of hypotheses that jointly explain all the foreground while, at the same time, trying to find normal training samples that explain the hypotheses. Consequently, we can avoid a direct detection of abnormalities. They are discovered indirectly as those hypotheses which are needed for covering the foreground without finding an explanation by normal samples for themselves. We present a probabilistic model that localizes abnormalities using statistical inference. On the challenging dataset of [15] it outperforms the state-of-the-art by 7% to achieve a frame-based abnormality classification performance of 91% and the localization performance improves by 32% to 76%.
A prototype learning framework using emd: Application to complex scenes analysis
- IEEE Trans. Pattern Anal. Mach. Intell.
Cited by 7 (3 self)
In the last decades, many efforts have been devoted to developing methods for automatic scene understanding in the context of video surveillance applications. This paper presents a novel non-object centric approach for complex scene analysis. Similarly to previous methods, we use low-level cues to identify atomic activities and create clip histograms. Unlike recent work, the task of discovering high-level activity patterns is formulated as a convex prototype learning problem. This problem results in a simple linear program that can be solved efficiently with standard solvers. The main advantage of our approach is that, using the Earth Mover’s Distance (EMD) as the objective function, the similarity among elementary activities is taken into account in the learning phase. To improve scalability we also consider some variants of EMD adopting L1 as ground distance for one and two dimensional, linear and circular histograms. In these cases only the similarity between neighboring atomic activities, corresponding to adjacent histogram bins, is taken into account. Therefore we also propose an automatic strategy for sorting atomic activities. Experimental results on publicly available datasets show that our method compares favorably with state-of-the-art approaches, often outperforming them.
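For one-dimensional histograms of equal total mass, EMD with an L1 ground distance between bins has a well-known closed form: the sum of absolute differences between the two cumulative histograms. For circular histograms, the same sum is taken after subtracting the median of those differences. The sketch below implements both closed forms. It is a standalone illustration of the distance only, not the paper's prototype learning program, and the function names are hypothetical.

```python
from itertools import accumulate
from statistics import median


def _cumulative_diffs(p, q):
    # Difference between the running totals of the two histograms.
    return [a - b for a, b in zip(accumulate(p), accumulate(q))]


def emd_linear(p, q):
    """EMD between equal-mass 1-D histograms, unit cost per bin step."""
    return sum(abs(d) for d in _cumulative_diffs(p, q))


def emd_circular(p, q):
    """Same, but the last bin is adjacent to the first (e.g. directions)."""
    diffs = _cumulative_diffs(p, q)
    shift = median(diffs)  # the median minimizes the sum of absolute deviations
    return sum(abs(d - shift) for d in diffs)


p = [1, 0, 0, 0]
q = [0, 0, 0, 1]
print(emd_linear(p, q))    # 3: the mass travels three bins
print(emd_circular(p, q))  # 1.0: the mass wraps around in one step
```

Because only adjacent bins count as similar here, the order of the bins matters, which is why the abstract mentions a strategy for sorting atomic activities.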