SALIENT MOTION DETECTION IN CROWDED SCENES
Cited by 4 (3 self)
To reduce cognitive overload in CCTV monitoring, it is critical to have an automated way to focus the attention of operators on interesting events taking place in crowded public scenes. We present a global motion saliency detection method based on spectral analysis, which aims to discover and localise interesting regions whose flows are salient relative to the dominant crowd flows. The method is fast and relies neither on prior knowledge specific to a scene nor on any training videos. We demonstrate its potential on public scene videos, with applications in salient action detection, counter-flow detection, and unstable crowd flow detection.
L0 Regularized Stationary Time Estimation for Crowd Group Analysis
Cited by 3 (2 self)
We tackle stationary crowd analysis in this paper, which is as important as modeling mobile groups in crowd scenes and finds many applications in surveillance. Our key contribution is a robust algorithm for estimating how long a foreground pixel has been stationary. This is much more challenging than background subtraction alone, because a failure at a single frame due to local movement of objects, lighting variation, or occlusion can lead to large errors in the stationary time estimate. To achieve robust results, sparsity constraints along the spatial and temporal dimensions are jointly imposed through mixed partials to shape a 3D stationary time map, formulated as an L0 optimization problem. Beyond background subtraction, the method distinguishes among different foreground objects that are close or overlapping in the spatio-temporal space by using a locally shared foreground codebook. The proposed techniques are used to detect four types of stationary group activities and to analyze crowd scene structures. We provide the first public benchmark dataset for stationary time estimation and stationary group analysis.
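A minimal, unregularised baseline makes the difficulty concrete: count, per pixel, how many consecutive frames it has been foreground. The sketch below (all names illustrative) exhibits exactly the failure mode the L0 formulation addresses, since a single missed detection resets the count:

```python
import numpy as np

def stationary_time(fg_masks):
    """Naive stationary-time map: for each pixel, the number of
    consecutive frames (up to the last one) it has been foreground.
    Deliberately omits the spatio-temporal L0 regularisation the
    paper adds to survive single-frame failures."""
    t = np.zeros(fg_masks[0].shape, dtype=int)
    for mask in fg_masks:
        t = np.where(mask, t + 1, 0)   # reset on any background frame
    return t

# Pixel (0, 0) stays foreground for all 5 frames; pixel (1, 1) flickers
# off once (e.g. a brief occlusion), so the naive counter resets --
# the large error the L0 formulation is designed to prevent.
masks = [np.array([[1, 0], [0, 1]], dtype=bool) for _ in range(5)]
masks[2][1, 1] = False
out = stationary_time(masks)
```

Here `out[0, 0]` is 5 frames, but `out[1, 1]` collapses to 2 despite the pixel being occupied for 4 of 5 frames, which is why per-frame background subtraction alone is insufficient.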
From Semi-Supervised to Transfer Counting of Crowds
Cited by 3 (0 self)
Regression-based techniques have shown promising results for people counting in crowded scenes. However, most existing techniques require expensive and laborious data annotation for model training. In this study, we propose to address this problem from three perspectives: (1) instead of exhaustively annotating every single frame, the most informative frames are selected for annotation automatically and actively; (2) rather than learning from only labelled data, the abundant unlabelled data are exploited; (3) labelled data from other scenes are employed to further alleviate the burden of data annotation. All three ideas are implemented in a unified active and semi-supervised regression framework with the ability to perform transfer learning, by exploiting the underlying geometric structure of crowd patterns via manifold analysis. Extensive experiments validate the effectiveness of our approach.
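Perspective (1), active selection of informative frames, can be sketched with a simple geometric proxy: greedy farthest-point sampling in feature space, followed by regression trained on only the selected frames. The paper's actual criterion is derived from manifold analysis; everything below is an illustrative toy with made-up features:

```python
import numpy as np

def select_informative(X, k):
    """Greedy farthest-point selection: a crude geometric stand-in
    for active frame selection (pick frames that are maximally far
    from everything already chosen)."""
    chosen = [0]
    for _ in range(k - 1):
        d = np.min(np.linalg.norm(X[:, None] - X[chosen], axis=2), axis=1)
        chosen.append(int(np.argmax(d)))
    return chosen

# 100 synthetic frames described by a 1-D crowd-density feature; the
# true count is a linear function of it (a toy stand-in for real
# crowd features).
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 1))
y = 3.0 * X[:, 0] + 1.0

idx = select_informative(X, 5)              # annotate only 5 frames
A = np.hstack([X[idx], np.ones((5, 1))])    # ridge regression on them
w = np.linalg.solve(A.T @ A + 1e-6 * np.eye(2), A.T @ y[idx])
pred = np.hstack([X, np.ones((100, 1))]) @ w
err = np.abs(pred - y).mean()
```

With well-spread labelled frames, 5 annotations suffice to predict counts for all 100 frames in this toy setting, which is the economy of annotation the paper targets.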
PROFILING STATIONARY CROWD GROUPS
Cited by 2 (2 self)
Detecting stationary crowd groups and analyzing their behaviors have important applications in crowd video surveillance, but have rarely been studied. The contributions of this paper are twofold. First, a stationary crowd detection algorithm is proposed to estimate the stationary time of foreground pixels. It employs spatial-temporal filtering and motion filtering in order to be robust to noise caused by occlusions and crowd clutter. Second, in order to characterize the emergence and dispersal processes of stationary crowds and their behaviors during the stationary periods, three attributes are proposed for quantitative analysis. These attributes are recognized with a set of proposed crowd descriptors which extract visual features from the results of stationary crowd detection. The effectiveness of the proposed algorithms is shown through experiments on a benchmark dataset.
Index Terms — Stationary crowd detection, stationary crowd analysis, crowd video surveillance
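One simple instance of such spatial-temporal filtering is a sliding-window majority vote over foreground masks, which repairs single-frame dropouts caused by brief occlusions before any stationary time is accumulated. This is only a sketch of the general idea, not the paper's filter; the window size and names are illustrative:

```python
import numpy as np

def temporal_majority_filter(fg_masks, w=3):
    """Temporal majority vote over a sliding window of w frames:
    a pixel is kept foreground at time t if it is foreground in at
    least half of the frames in the window around t."""
    stack = np.stack(fg_masks).astype(int)       # shape (T, H, W)
    T = stack.shape[0]
    out = np.empty_like(stack)
    half = w // 2
    for t in range(T):
        lo, hi = max(0, t - half), min(T, t + half + 1)
        out[t] = (stack[lo:hi].mean(axis=0) >= 0.5).astype(int)
    return out

# A pixel that is foreground throughout except for a one-frame dropout
# at t=2: the 3-frame majority vote repairs the hole, so downstream
# stationary-time estimation is not reset by the glitch.
masks = [np.ones((2, 2), dtype=bool) for _ in range(5)]
masks[2][0, 0] = False
filtered = temporal_majority_filter(masks, w=3)
```

The same voting idea extends to a spatial neighbourhood (a 3D window over x, y, t), which is the usual meaning of spatial-temporal filtering in this context.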
Crowd Counting and Profiling: Methodology and Evaluation
Cited by 1 (1 self)
Video-imagery-based crowd analysis for population profiling and density estimation in public spaces can be a highly effective tool for establishing global situational awareness. Different strategies such as counting by detection and counting by clustering have been proposed, and more recently counting by regression has gained considerable interest due to its feasibility in handling relatively more crowded environments. However, the scenarios studied by existing regression-based techniques are rather diverse in terms of both evaluation data and experimental settings, making it difficult to compare them and draw general conclusions on their effectiveness. In addition, the contributions of individual components in the processing pipeline, such as feature extraction and perspective normalisation, remain unclear and less well studied. This study describes and compares the state-of-the-art methods for video-imagery-based crowd counting, and provides a systematic evaluation of different methods using the same protocol. Moreover, we critically evaluate each processing component to identify potential bottlenecks encountered by existing techniques. Extensive evaluation is conducted on three public scene datasets, including a new shopping centre environment with labelled ground truth for validation. Our study reveals new insights into solving the problem of crowd analysis for population profiling and density estimation, and considers open questions for future studies.
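The counting-by-regression pipeline evaluated in such studies can be sketched in a few lines: extract segment-level features per frame (classically foreground area and edge-pixel counts), then fit a regressor from features to head counts. The synthetic data and feature choice below are illustrative only, and real systems add texture features and perspective normalisation:

```python
import numpy as np

def crowd_features(fg_mask):
    """Two classic segment-level features for counting by regression:
    foreground area and edge (transition) pixel count, plus a bias."""
    m = fg_mask.astype(int)
    area = m.sum()
    edges = np.abs(np.diff(m, axis=0)).sum() + np.abs(np.diff(m, axis=1)).sum()
    return np.array([area, edges, 1.0], dtype=float)

# Synthetic frames: each 'person' is a 2x2 blob, so both features scale
# linearly with the true count and a least-squares fit recovers it.
frames, counts = [], []
for n in range(1, 6):
    m = np.zeros((20, 20), dtype=bool)
    for i in range(n):
        m[2 + 3 * i: 4 + 3 * i, 2:4] = True    # one blob per person
    frames.append(m)
    counts.append(n)

F = np.stack([crowd_features(m) for m in frames])
w, *_ = np.linalg.lstsq(F, np.array(counts, float), rcond=None)
pred = F @ w
```

On real footage the feature-to-count mapping is only approximately linear and varies with perspective, which is precisely why the survey isolates the contribution of each pipeline component.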
Joint Inference of Groups, Events and Human Roles in Aerial Videos
Cited by 1 (0 self)
With the advent of drones, aerial video analysis becomes increasingly important; yet it has received scant attention in the literature. This paper addresses the new problem of parsing low-resolution aerial videos of large spatial areas, in terms of 1) grouping, 2) recognizing events, and 3) assigning roles to people engaged in events. We propose a novel framework aimed at conducting joint inference of the above tasks, as reasoning about each in isolation typically fails in our setting. Given noisy tracklets of people and detections of large objects and scene surfaces (e.g., building, grass), we use a spatiotemporal AND-OR graph to drive our joint inference, using Markov Chain Monte Carlo and dynamic programming. We also introduce a new formalism of spatiotemporal templates characterizing latent sub-events. For evaluation, we have collected and released a new aerial video dataset using a hex-rotor flying over picnic areas rich with group events. Our results demonstrate that we successfully address the above inference tasks under challenging conditions.
Discovery of Shared Semantic Spaces for Multi-Scene Video Query and Summarization (IEEE Transactions on Circuits and Systems for Video Technology)
The growing rate of public space CCTV installations has generated a need for automated methods for exploiting video surveillance data, including scene understanding, query, behaviour annotation and summarization. For this reason, extensive research has been performed on surveillance scene understanding and analysis. However, most studies have considered single scenes, or groups of adjacent scenes. The semantic similarity between different but related scenes (e.g., many different traffic scenes of similar layout) is not generally exploited to improve any automated surveillance tasks and reduce manual effort. Exploiting commonality, and sharing any supervised annotations, between different scenes is however challenging: some scenes are totally unrelated, and thus any information sharing between them would be detrimental, while others may share only a subset of common activities, so information sharing is useful only if it is selective. Moreover, semantically similar activities which should be modelled together and shared across scenes may have quite different pixel-level appearance in each scene. To address these issues we develop a new framework for distributed multiple-scene global understanding that clusters surveillance scenes by their ability to explain each other's behaviours, and further discovers which subset of activities are shared versus scene-specific within each cluster. We show how to use this structured representation of multiple scenes to improve common surveillance tasks, including scene activity understanding, cross-scene query-by-example, behaviour classification with reduced supervised labelling requirements, and video summarization. In each case we demonstrate how our multi-scene model improves on a collection of standard single-scene models and a flat model of all scenes.
Traffic Behavior Recognition Using the Pachinko Allocation Model, 2015
Multi-Source Video Summarisation
Many visual surveillance tasks, e.g. video summarisation, are conventionally accomplished through analysing imagery-based features. Relying solely on visual cues for public surveillance video understanding is unreliable, since visual observations obtained from public space CCTV video data are often not sufficiently trustworthy and events of interest can be subtle. On the other hand, non-visual data sources such as weather reports and traffic sensory signals are readily accessible but have not been explored jointly to complement visual data for video content analysis and summarisation. In this paper, we present a novel unsupervised framework to learn jointly from both visual and independently-drawn non-visual data sources for discovering meaningful latent structure of surveillance video data. In particular, we investigate ways to cope with discrepant dimensions and representations whilst associating these heterogeneous data sources, and derive an effective mechanism to tolerate missing and incomplete data from different sources. We show that the proposed multi-source learning framework not only achieves better video content clustering than state-of-the-art methods, but is also capable of accurately inferring missing non-visual semantics from previously unseen videos. In addition, a comprehensive user study is conducted to validate the quality of video summarisation generated using the proposed multi-source model.
Index Terms — Multi-source data, heterogeneous data, visual surveillance, clustering, event recognition, video summarisation