Results 1 - 10 of 20
Scene segmentation for behaviour correlation
In ECCV, 2008
Cited by 9 (6 self)
This paper presents a novel framework for detecting abnormal pedestrian and vehicle behaviour by modelling cross-correlation among different co-occurring objects, both locally and globally, in a given scene. We address this problem by first segmenting a scene into semantic regions according to how object events occur globally in the scene, and second modelling concurrent correlations among regional object events both locally (within the same region) and globally (across different regions). Instead of tracking objects, the model represents behaviour through the classification of atomic video events, an approach designed to be more suitable for analysing crowded scenes. The proposed system works in an unsupervised manner throughout, using automatic model order selection to estimate its parameters from video data of a scene captured over a brief training period. We demonstrate the effectiveness of this system with experiments on public road traffic data.
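The idea of modelling how object events in different regions co-occur can be illustrated with a deliberately simplified sketch, which is not the paper's model. It counts how often event class i in one region appears in the same clip as event class j in another region, smooths the counts into a joint distribution, and scores a test clip by the negative log-probability of its pair. The two-region setup, the class meanings, and the names `fit_cooccurrence`, `anomaly_score` and `alpha` are hypothetical.

```python
import math
from collections import Counter

def fit_cooccurrence(pairs, n_a, n_b, alpha=1.0):
    """Smoothed joint distribution P(event in region A, event in region B).

    pairs: (class in region A, class in region B) observed in the same clip.
    n_a, n_b: number of event classes per region (hypothetical values).
    alpha: add-alpha smoothing so unseen pairs keep a non-zero probability.
    """
    counts = Counter(pairs)
    total = len(pairs) + alpha * n_a * n_b
    return {(i, j): (counts[(i, j)] + alpha) / total
            for i in range(n_a) for j in range(n_b)}

def anomaly_score(joint, pair):
    """Negative log-probability: rarely co-occurring pairs score high."""
    return -math.log(joint[pair])

# Hypothetical training clips: class 0 in region A tends to co-occur with
# class 1 in region B, and class 1 in A with class 0 in B.
train = [(0, 1)] * 45 + [(1, 0)] * 45 + [(0, 0)] * 2
joint = fit_cooccurrence(train, n_a=2, n_b=2)
print(round(anomaly_score(joint, (0, 1)), 2))  # 0.74, a common pairing
print(round(anomaly_score(joint, (1, 1)), 2))  # 4.56, never seen in training
```

A pair that each region produces normally on its own can still be anomalous when the two occur together, which is the kind of global correlation a purely per-region model would miss.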
Global behaviour inference using probabilistic latent semantic analysis
In British Machine Vision Conference (BMVC), 2008
Cited by 8 (6 self)
We present a novel framework for inferring global behaviour patterns by modelling behaviour correlations in a wide-area scene and detecting anomalies in behaviours occurring both locally and globally. Specifically, we propose a semantic scene segmentation model that decomposes a wide-area scene into regions where behaviours share similar characteristics and are represented as classes of video events bearing similar features. To model behavioural correlations globally, we investigate both a probabilistic Latent Semantic Analysis (pLSA) model and a two-stage hierarchical pLSA model for global behaviour inference and anomaly detection. The proposed framework is validated by experiments using complex crowded outdoor scenes.
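For readers unfamiliar with pLSA, the following is a minimal sketch of plain (single-stage) pLSA fitted by expectation-maximisation, in which P(w|d) = sum over z of P(w|z) P(z|d). It assumes video clips play the role of "documents" and regional event classes the role of "words"; that mapping, the toy counts, and the iteration count are illustrative assumptions rather than the paper's settings, and the two-stage hierarchical variant is not shown. Scoring a new clip for anomaly would additionally require estimating its P(z|d) (so-called folding-in) and thresholding its likelihood.

```python
import random

def normalise(values):
    total = sum(values)
    return [v / total for v in values]

def plsa(counts, n_topics, n_iter=200, seed=0):
    """Fit pLSA, P(w|d) = sum_z P(w|z) P(z|d), by expectation-maximisation.

    counts[d][w] is how often word w occurs in document d.
    Returns (p_w_z, p_z_d) with p_w_z[z][w] = P(w|z), p_z_d[d][z] = P(z|d).
    """
    rng = random.Random(seed)
    n_docs, n_words = len(counts), len(counts[0])
    # Random positive initialisation breaks the symmetry between topics.
    p_w_z = [normalise([rng.random() + 0.1 for _ in range(n_words)])
             for _ in range(n_topics)]
    p_z_d = [normalise([rng.random() + 0.1 for _ in range(n_topics)])
             for _ in range(n_docs)]
    for _ in range(n_iter):
        # A tiny constant keeps every probability strictly positive.
        acc_w_z = [[1e-12] * n_words for _ in range(n_topics)]
        acc_z_d = [[1e-12] * n_topics for _ in range(n_docs)]
        for d in range(n_docs):
            for w in range(n_words):
                n = counts[d][w]
                if n == 0:
                    continue
                # E-step: P(z|d,w) is proportional to P(z|d) P(w|z).
                post = [p_z_d[d][z] * p_w_z[z][w] for z in range(n_topics)]
                s = sum(post)
                for z in range(n_topics):
                    # Expected counts n(d,w) P(z|d,w) feed the M-step.
                    r = n * post[z] / s
                    acc_w_z[z][w] += r
                    acc_z_d[d][z] += r
        # M-step: renormalise the expected counts into distributions.
        p_w_z = [normalise(row) for row in acc_w_z]
        p_z_d = [normalise(row) for row in acc_z_d]
    return p_w_z, p_z_d

# Toy data: clips 0-1 use words 0-1, clips 2-3 use words 2-3.
counts = [[5, 3, 0, 0], [4, 4, 0, 0], [0, 0, 6, 2], [0, 0, 3, 5]]
p_w_z, p_z_d = plsa(counts, n_topics=2)
dominant = [max(range(2), key=lambda z: row[z]) for row in p_z_d]
print(dominant)
```

On this block-structured toy data each pair of clips should end up dominated by its own latent topic, which is the sense in which pLSA groups clips into shared global behaviours.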
Spectral clustering with eigenvector selection
Cited by 6 (0 self)
The task of discovering natural groupings of input patterns, or clustering, is an important aspect of machine learning and pattern analysis. In this paper, we study the widely used spectral clustering algorithm, which clusters data using the eigenvectors of a similarity/affinity matrix derived from a data set. In particular, we aim to solve two critical issues in spectral clustering: (1) how to automatically determine the number of clusters, and (2) how to perform effective clustering given noisy and sparse data. An analysis of the characteristics of the eigenspace is carried out, which shows that (a) not every eigenvector of a data affinity matrix is informative and relevant for clustering; (b) eigenvector selection is critical, because using uninformative/irrelevant eigenvectors could lead to poor clustering results; and (c) the corresponding eigenvalues cannot be used for relevant eigenvector selection given a realistic data set. Motivated by the analysis, a novel spectral clustering algorithm is proposed, which differs from previous approaches in that only informative/relevant eigenvectors are employed for determining the number of clusters and performing clustering. The key element of the proposed algorithm is a simple but effective relevance learning method, which measures the relevance of an eigenvector according to how well it can separate the data set into different clusters. Our algorithm was evaluated using synthetic data sets as well as real-world data sets generated from two challenging visual learning problems. The results demonstrated that our algorithm is able to estimate the number of clusters correctly and reveal natural groupings of the input data/patterns even given sparse and noisy data.
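The idea of scoring each eigenvector by how well it separates the data can be sketched as follows. This is a hedged illustration only: it swaps in a simple bimodality score (the fraction of variance explained by the best two-way split of the eigenvector's entries) for the paper's relevance learning method, and the Gaussian kernel width `sigma`, the cut-off `k_max`, the `threshold`, and the function names are hypothetical.

```python
import numpy as np

def split_score(v):
    """Fraction of variance explained by the best two-way split of a 1-D vector.

    Values near 1 mean the entries fall into two well-separated groups.
    In 1-D the optimal 2-means partition is contiguous after sorting,
    so trying every split point finds it exactly.
    """
    x = np.sort(np.asarray(v, dtype=float))
    mean = x.mean()
    total = np.sum((x - mean) ** 2)
    if total == 0:
        return 0.0
    best = 0.0
    for k in range(1, len(x)):
        left, right = x[:k], x[k:]
        between = (len(left) * (left.mean() - mean) ** 2
                   + len(right) * (right.mean() - mean) ** 2)
        best = max(best, between / total)
    return float(best)

def relevant_eigenvectors(points, sigma=1.0, k_max=5, threshold=0.9):
    """Rank the top eigenvectors of the normalised affinity by split_score."""
    pts = np.asarray(points, dtype=float)
    sq = np.sum((pts[:, None, :] - pts[None, :, :]) ** 2, axis=-1)
    affinity = np.exp(-sq / (2 * sigma ** 2))
    np.fill_diagonal(affinity, 0.0)
    # Normalised affinity D^{-1/2} A D^{-1/2}.
    d_inv_sqrt = 1.0 / np.sqrt(affinity.sum(axis=1))
    norm = affinity * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    eigvals, eigvecs = np.linalg.eigh(norm)
    order = np.argsort(eigvals)[::-1][:k_max]  # largest eigenvalues first
    scores = [split_score(eigvecs[:, i]) for i in order]
    # Ranks (0 = largest eigenvalue) whose eigenvector separates the data.
    selected = [rank for rank, s in enumerate(scores) if s >= threshold]
    return selected, scores

rng = np.random.default_rng(0)
blob_a = rng.normal(0.0, 0.3, size=(20, 2))
blob_b = rng.normal(0.0, 0.3, size=(20, 2)) + 5.0
selected, scores = relevant_eigenvectors(np.vstack([blob_a, blob_b]))
print(selected, [round(s, 2) for s in scores])
```

With two well-separated blobs the two largest eigenvalues are nearly tied, so `eigh` may return any rotation of the corresponding eigenvectors, and which of them actually separates the blobs cannot be read off from the eigenvalues alone.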
Learning spatiotemporal graphs of human activities
In ICCV, 2011
Cited by 4 (0 self)
Complex human activities occurring in videos can be defined in terms of temporal configurations of primitive actions. Prior work typically hand-picks the primitives, their total number, and their temporal relations (e.g., allowing only followed-by), and then only estimates their relative significance for activity recognition. We advance prior work by learning which activity parts and spatiotemporal relations should be captured to represent the activity, and how relevant they are for enabling efficient inference in realistic videos. We represent videos by spatiotemporal graphs, where nodes correspond to multiscale video segments, and edges capture their hierarchical, temporal, and spatial relationships. Access to video segments is provided by our new multiscale segmenter. Given a set of training spatiotemporal graphs, we learn their archetype graph and the probability density functions (pdfs) associated with model nodes and edges. The model adaptively learns relevant video segments and their relations from data, addressing the "what" and the "how." Inference and learning are formulated within the same framework, that of a robust least-squares optimization, which is invariant to arbitrary permutations of nodes in spatiotemporal graphs. The model is used for parsing new videos in terms of detecting and localizing relevant activity parts. We outperform the state of the art on the benchmark Olympic and UT human-interaction datasets, under a favorable complexity-vs.-accuracy trade-off.
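What "least-squares matching that is invariant to node permutations" means can be shown on a toy scale. The sketch below is not the paper's optimisation: it brute-forces every injective assignment of model nodes to observed nodes and keeps the one with the smallest squared error over node descriptors and edge weights. The descriptor format, the scalar edge weights, and the function names are hypothetical, and exhaustive search is only feasible for tiny graphs.

```python
from itertools import permutations

def match_cost(model_nodes, model_edges, nodes, edges, perm):
    """Squared error between the model graph and an observed graph under perm.

    perm[m] is the observed node assigned to model node m.
    Nodes carry feature vectors; edges[(a, b)] carries a scalar relation weight.
    """
    cost = 0.0
    for m, feat in enumerate(model_nodes):
        cost += sum((f - g) ** 2 for f, g in zip(feat, nodes[perm[m]]))
    for (a, b), w in model_edges.items():
        # A missing observed edge is treated as weight 0.
        cost += (w - edges.get((perm[a], perm[b]), 0.0)) ** 2
    return cost

def best_match(model_nodes, model_edges, nodes, edges):
    """Exhaustive search over injective node assignments (tiny graphs only)."""
    return min(permutations(range(len(nodes)), len(model_nodes)),
               key=lambda p: match_cost(model_nodes, model_edges, nodes, edges, p))

# Toy archetype: three segments with 2-D descriptors and two weighted relations.
model_nodes = [(1.0, 0.0), (0.0, 1.0), (0.5, 0.5)]
model_edges = {(0, 1): 1.0, (1, 2): 0.5}
# The same graph observed with its nodes listed in a different order.
nodes = [(0.5, 0.5), (1.0, 0.0), (0.0, 1.0)]
edges = {(1, 2): 1.0, (2, 0): 0.5}
perm = best_match(model_nodes, model_edges, nodes, edges)
print(perm, match_cost(model_nodes, model_edges, nodes, edges, perm))  # (1, 2, 0) 0.0
```

Because the cost is minimised over assignments, relisting the observed nodes in another order leaves the best achievable cost unchanged, which is the permutation invariance the abstract refers to.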
Incremental and Adaptive Abnormal Behaviour Detection
Cited by 2 (0 self)
We develop a novel visual behaviour modelling approach that performs incremental and adaptive model learning for online abnormality detection in a visual surveillance scene. The approach has the following key features that make it advantageous over previous ones: (1) Fully unsupervised learning: both feature extraction for behaviour pattern representation and model construction are carried out without the laborious and unreliable process of data labelling. (2) Robust abnormality detection: using a Likelihood Ratio Test (LRT) for abnormality detection, the proposed approach is robust to noise in behaviour representation. (3) Online and incremental model construction: after being initialised using a small bootstrapping dataset, our behaviour model is learned incrementally whenever a new behaviour pattern is captured. This makes our approach computationally efficient and suitable for real-time applications. (4) Model adaptation to reflect changes in visual context: online model structure adaptation is performed to accommodate changes in the definition of normality/abnormality caused by visual context changes. This caters for the need to reclassify, over time, behaviour initially considered abnormal as normal, and vice versa. These features are not only desirable but also necessary for processing large volumes of unlabelled surveillance video data whose visual context changes over time. The effectiveness and robustness of our approach are demonstrated through experiments using noisy datasets collected from a real-world surveillance scene. The experimental results show that our incremental and adaptive behaviour modelling approach is superior to a conventional batch-mode one in terms of both abnormality detection performance and computational efficiency.
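The combination of a likelihood ratio test with incremental learning can be sketched in one dimension. This is a toy stand-in, not the paper's behaviour model: a single Gaussian "normal" model is updated online with Welford's algorithm and compared against a fixed, broad "abnormal" Gaussian. The class name, the abnormal model's parameters, and the zero threshold are hypothetical, and the context-driven reclassification of item (4) is not shown.

```python
import math

class IncrementalLRTDetector:
    """Toy 1-D detector: an online Gaussian 'normal' model versus a fixed
    broad 'abnormal' model, compared through a log likelihood ratio."""

    def __init__(self, abnormal_mean=0.0, abnormal_std=10.0, threshold=0.0):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations (Welford)
        self.abnormal_mean = abnormal_mean
        self.abnormal_std = abnormal_std
        self.threshold = threshold

    @staticmethod
    def _log_gauss(x, mu, sigma):
        return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)

    def update(self, x):
        # Welford's online update of the mean and variance.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def std(self):
        # Fallback and floor avoid a degenerate zero-variance model early on.
        if self.n < 2:
            return 1.0
        return max(math.sqrt(self.m2 / (self.n - 1)), 1e-3)

    def log_ratio(self, x):
        return (self._log_gauss(x, self.mean, self.std())
                - self._log_gauss(x, self.abnormal_mean, self.abnormal_std))

    def process(self, x):
        """Return True if x is flagged abnormal; otherwise absorb it into the model."""
        abnormal = self.log_ratio(x) < self.threshold
        if not abnormal:
            self.update(x)
        return abnormal

det = IncrementalLRTDetector()
for x in [1.0, 1.2, 0.9, 1.1, 1.05]:  # small bootstrapping set
    det.update(x)
flags = [det.process(x) for x in (1.0, 8.0)]
print(flags)  # [False, True]
```

Only patterns accepted as normal are folded into the model, so each new observation costs a constant-time update rather than a batch re-fit.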
Retrieving Actions in Group Contexts
Cited by 2 (1 self)
We develop methods for action retrieval from surveillance video using contextual feature representations. The novelty of our proposed approach is two-fold. First, we introduce a new feature representation called the action context (AC) descriptor. The AC descriptor encodes information not only about the action of an individual person in the video, but also about the behaviour of other people nearby. This feature representation is inspired by the fact that the context of what other people are doing provides very useful cues for recognizing the actions of each individual. Second, we formulate our problem as a retrieval/ranking task, which differs from previous work on action classification. We develop an action retrieval technique based on rank-SVM, a state-of-the-art approach for solving ranking problems. We apply our proposed approach to two real-world datasets. The first dataset consists of videos of multiple people performing several group activities. The second dataset consists of surveillance videos from a nursing home environment. Our experimental results show the advantage of using contextual information for disambiguating different actions and the benefit of using rank-SVMs instead of regular SVMs for video retrieval problems.
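The core of a linear rank-SVM is the pairwise transform: for every pair where item i should rank above item j, the weight vector w should satisfy w . (x_i - x_j) >= 1. The sketch below trains such a model with plain stochastic subgradient descent on the hinge loss plus L2 regularisation; it is an assumption-laden illustration, not the authors' solver. The 2-D features stand in for AC descriptors, and the hyperparameters and function names are hypothetical.

```python
import random

def train_rank_svm(features, relevance, lam=0.01, lr=0.1, epochs=200, seed=0):
    """Linear rank-SVM via the pairwise transform and subgradient descent.

    For every pair with relevance[i] > relevance[j] we want
    w . (x_i - x_j) >= 1, penalised with a hinge loss plus L2 regularisation.
    """
    dim = len(features[0])
    pairs = [(i, j) for i in range(len(features)) for j in range(len(features))
             if relevance[i] > relevance[j]]
    diffs = [[a - b for a, b in zip(features[i], features[j])] for i, j in pairs]
    w = [0.0] * dim
    rng = random.Random(seed)
    for _ in range(epochs):
        rng.shuffle(diffs)
        for d in diffs:
            margin = sum(wk * dk for wk, dk in zip(w, d))
            for k in range(dim):
                # Hinge subgradient is -d when the margin is violated.
                grad = lam * w[k] - (d[k] if margin < 1 else 0.0)
                w[k] -= lr * grad
    return w

def rank(features, w):
    """Indices of items sorted by decreasing score w . x."""
    scores = [sum(wk * xk for wk, xk in zip(w, x)) for x in features]
    return sorted(range(len(features)), key=lambda i: -scores[i])

# Hypothetical 2-D descriptors; relevance follows the first feature.
features = [[3.0, 0.5], [2.0, 1.0], [1.0, 0.2], [0.0, 0.9]]
relevance = [3, 2, 1, 0]
w = train_rank_svm(features, relevance)
order = rank(features, w)
print(order)  # [0, 1, 2, 3]
```

Unlike a classification SVM, the constraints here only concern relative order, which matches the retrieval framing of returning a ranked list rather than hard labels.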
Visual Human Tracking and Group Activity Analysis: A Video Mining System for Retail Marketing
2007
Automated Detection and Classification of Positive vs. Negative Robot Interactions With Children With Autism Using Distance-Based Features
Cited by 1 (1 self)
Recent feasibility studies involving children with autism spectrum disorders (ASD) interacting with socially assistive robots have shown that some children have positive reactions to robots, while others may have negative reactions. It is unlikely that children with ASD will enjoy any robot 100% of the time. It is therefore important to develop methods for detecting negative child behaviors in order to minimize distress and facilitate effective human-robot interaction. Our past work has shown that negative reactions can be readily identified and classified by a human observer from overhead video data alone, and that an automated position tracker combined with human-determined heuristics can differentiate between the two classes of reactions. This paper describes and validates an improved, non-heuristic method for determining whether a child is interacting positively or negatively with a robot, based on Gaussian mixture models (GMM) and a naive-Bayes classifier of overhead camera observations. The approach achieves a 91.4% accuracy rate in classifying robot interaction, parent interaction, avoidance, and hiding-against-the-wall behaviors, and demonstrates that these classes are sufficient for distinguishing between positive and negative reactions of the child to the robot.
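A naive-Bayes classifier over distance-based features can be sketched as below. It is simplified to one Gaussian per feature and class (a single-component special case of a GMM) and does not reproduce the paper's feature extraction. The two features, the sample values, the class labels `"robot"` and `"wall"`, and the function names are hypothetical.

```python
import math
from collections import defaultdict

def fit_gaussian_nb(samples, labels):
    """Per class: a prior plus an independent Gaussian per feature (naive Bayes)."""
    by_class = defaultdict(list)
    for x, y in zip(samples, labels):
        by_class[y].append(x)
    model = {}
    for y, rows in by_class.items():
        stats = []
        for k in range(len(rows[0])):
            col = [r[k] for r in rows]
            mu = sum(col) / len(col)
            # Small variance floor avoids division by zero.
            var = sum((c - mu) ** 2 for c in col) / len(col) + 1e-6
            stats.append((mu, var))
        model[y] = (len(rows) / len(samples), stats)
    return model

def predict(model, x):
    """Class with the highest log prior plus summed per-feature log likelihoods."""
    def log_post(y):
        prior, stats = model[y]
        lp = math.log(prior)
        for xk, (mu, var) in zip(x, stats):
            lp += -0.5 * math.log(2 * math.pi * var) - (xk - mu) ** 2 / (2 * var)
        return lp
    return max(model, key=log_post)

# Hypothetical features: (child-robot distance, child-wall distance) in metres.
samples = [(0.5, 2.0), (0.7, 1.8), (0.6, 2.2),   # interacting with the robot
           (2.5, 0.2), (2.8, 0.3), (2.6, 0.1)]   # hiding against the wall
labels = ["robot", "robot", "robot", "wall", "wall", "wall"]
model = fit_gaussian_nb(samples, labels)
preds = [predict(model, (0.8, 1.9)), predict(model, (2.4, 0.25))]
print(preds)  # ['robot', 'wall']
```

The "naive" independence assumption lets each distance feature contribute its own log-likelihood term, which keeps training to simple per-class means and variances.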
Scene understanding: perception, multi-sensor fusion, spatio-temporal reasoning and activity recognition
2007
Multi-layered Decomposition of Recurrent Scenes
There is considerable interest in techniques capable of identifying anomalies and unusual events in busy outdoor scenes, e.g. road junctions. Many approaches achieve this by exploiting deviations in spatial appearance from some expected norm accumulated by a model over time. In this work we show that much can be gained from explicitly modelling temporal aspects in detail. Specifically, many traffic junctions are regulated by lights controlled by a timing device of considerable precision, and it is in these situations that we advocate a model which learns periodic spatio-temporal patterns with a view to highlighting anomalous events such as broken-down vehicles, traffic accidents, or pedestrians jaywalking. More specifically, by estimating the autocovariance of self-similarity, used previously in the context of gait recognition, we characterize a scene by identifying a global fundamental period. As our model, we introduce a spatio-temporal grid of histograms built in accordance with some chosen feature. This model is then used to classify objects found in subsequent test data. In particular, we demonstrate the effect of such characterization experimentally by monitoring the bounding box aspect ratio and optical flow field of objects detected at a road traffic junction, enabling our model to discriminate between people and cars sufficiently well to provide useful warnings of adverse behaviour in real time.
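Estimating a fundamental period from the autocovariance of self-similarity can be sketched as follows, under simplifying assumptions. Each frame is summarised by a feature vector, the self-similarity matrix holds negative distances between frames, and its autocovariance along the time axis is computed for each lag; the first local maximum after lag 0 is taken as the period. The synthetic circular features, the peak-picking rule, `max_lag`, and the function names are hypothetical. Real footage would need noise handling that this sketch omits.

```python
import math

def self_similarity(frames):
    """S[i][j] = -distance between the feature vectors of frames i and j."""
    return [[-math.dist(a, b) for b in frames] for a in frames]

def autocovariance(S, mean, lag):
    """Autocovariance of the self-similarity matrix along the time axis."""
    n = len(S)
    total = 0.0
    for i in range(n - lag):
        for j in range(n):
            total += (S[i][j] - mean) * (S[i + lag][j] - mean)
    return total / ((n - lag) * n)

def fundamental_period(frames, max_lag):
    """Smallest lag > 0 at which the autocovariance has a local maximum."""
    S = self_similarity(frames)
    n = len(S)
    mean = sum(map(sum, S)) / (n * n)
    acov = [autocovariance(S, mean, lag) for lag in range(max_lag + 1)]
    for lag in range(1, max_lag):
        if acov[lag] > acov[lag - 1] and acov[lag] >= acov[lag + 1]:
            return lag
    return None

# Synthetic 2-D per-frame features that repeat every 8 frames,
# standing in for a signal-controlled junction's traffic cycle.
period = 8
frames = [(math.cos(2 * math.pi * t / period), math.sin(2 * math.pi * t / period))
          for t in range(48)]
estimated = fundamental_period(frames, max_lag=12)
print(estimated)  # 8
```

Once the period is known, observations can be indexed by phase within the cycle, which is what allows a per-phase model such as the grid of histograms to compare like with like.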

