Unsupervised learning of human action categories using spatial-temporal words
In Proc. BMVC, 2006
Cited by 161 (4 self)
Imagine a video taken on a sunny beach: can a computer automatically tell what is happening in the scene? Can it identify different human activities in the video, such as water surfing, people walking and lying on the beach? Automatically classifying or localizing different actions in video sequences is very useful for a variety of tasks, such as video surveillance, object-level video summarization, video indexing, digital library organization, etc. However, it remains a challenging task for computers to achieve robust action recognition due to cluttered background, camera motion, occlusion, and geometric and photometric variances of objects. For example, in a live video of a skating competition, the skater moves rapidly across the rink, and the camera also moves to follow the skater. With a moving camera, a non-stationary background, and a moving target, few vision algorithms could identify, categorize and
View-invariant modeling and recognition of human actions using grammars
In Workshop on Dynamical Vision (WDV) at ICCV’05, 2005
Cited by 16 (2 self)
In this paper, we represent human actions as short sequences of atomic body poses. The knowledge of body pose is stored only implicitly as a set of silhouettes seen from multiple viewpoints; no explicit 3D poses or body models are used, and individual body parts are not identified. Actions and their constituent atomic poses are extracted from a set of multiview multiperson video sequences by an automatic keyframe selection process, and are used to automatically construct a probabilistic context-free grammar (PCFG). Given a new single viewpoint video, we can parse it to recognize actions and changes in viewpoint simultaneously. Experimental results are provided.
D.: Searching video for complex activities with finite state models
In IEEE Conf. on Computer Vision and Pattern Recognition, 2007
Cited by 7 (2 self)
We describe a method of representing human activities that allows a collection of motions to be queried without examples, using a simple and effective query language. Our approach is based on units of activity at segments of the body that can be composed across space and across the body to produce complex queries. The presence of search units is inferred automatically by tracking the body, lifting the tracks to 3D and comparing to models trained using motion capture data. We show results for a large range of queries applied to a collection of complex motion and activity. Our models of short time scale limb behaviour are built using a labelled motion capture set. We compare with discriminative methods applied to tracker data; our method offers significantly improved performance. We show experimental evidence that our method is robust to view direction and is unaffected by changes of clothing.
Locally Time-Invariant Models of Human Activities using Trajectories on the Grassmannian
In IEEE Conference on Computer Vision and Pattern Recognition, 2009
Cited by 2 (1 self)
Human activity analysis is an important problem in computer vision, with applications in surveillance and in the summarization and indexing of consumer content. Complex human activities are characterized by non-linear dynamics that make learning, inference and recognition hard. In this paper, we consider the problem of modeling and recognizing complex activities which exhibit time-varying dynamics. To this end, we describe activities as outputs of linear dynamic systems (LDS) whose parameters vary with time, or a Time-Varying Linear Dynamic System (TV-LDS). We discuss parameter estimation methods for this class of models by assuming that the parameters are locally time-invariant. We then represent the space of LDS models as a Grassmann manifold, so that the TV-LDS model is defined as a trajectory on the Grassmann manifold. We show how trajectories on the Grassmannian can be characterized using appropriate distance metrics and statistical methods that reflect the underlying geometry of the manifold. This results in more expressive and powerful models for complex human activities. We demonstrate the strength of the framework for activity-based summarization of long videos and recognition of complex human actions on two datasets.
Trajectory-based Representation of Human Actions
In Lecture Notes on Artificial Intelligence, Spec. Vol. AI for Human Computing, 2007
Cited by 1 (1 self)
This work addresses the problem of human action recognition by introducing a representation of a human action as a collection of short trajectories that are extracted in areas of the scene with a significant amount of visual activity. The trajectories are extracted by an auxiliary particle filtering tracking scheme that is initialized at points that are considered salient both in space and time. The spatiotemporal salient points are detected by measuring the variations in the information content of pixel neighborhoods in space and time. We implement an online background estimation algorithm in order to deal with inadequate localization of the salient points on the moving parts in the scene, and to improve the overall performance of the particle filter tracking scheme. We use a variant of the Longest Common Subsequence algorithm (LCSS) to compare different sets of trajectories corresponding to different actions. We use Relevance Vector Machines (RVM) to address the classification problem. We propose new kernels for use by the RVM, which are specifically tailored to the proposed representation of short trajectories. The basis of these kernels is the modified LCSS distance of the previous step. We present results on real image sequences from a small database depicting people performing 12 aerobic exercises.
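The abstract above compares trajectories with a variant of LCSS but does not spell the variant out. As a rough illustration only, the sketch below uses the common thresholded formulation for 2D trajectories, which is an assumption and not necessarily the authors' modified distance. The names `lcss_length` and `lcss_distance` and the parameters `eps` (spatial tolerance) and `delta` (maximum index offset) are hypothetical.

```python
from typing import List, Optional, Tuple

Point = Tuple[float, float]


def lcss_length(a: List[Point], b: List[Point],
                eps: float = 1.0, delta: Optional[int] = None) -> int:
    """Length of the longest common subsequence of two 2D trajectories.

    Two samples match when both coordinates differ by at most `eps` and,
    if `delta` is given, their indices differ by at most `delta`.
    (Assumed textbook formulation, not the paper's variant.)
    """
    n, m = len(a), len(b)
    # table[i][j] holds the LCSS length of a[:i] and b[:j].
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            close_in_space = (abs(a[i - 1][0] - b[j - 1][0]) <= eps
                              and abs(a[i - 1][1] - b[j - 1][1]) <= eps)
            close_in_time = delta is None or abs(i - j) <= delta
            if close_in_space and close_in_time:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table[n][m]


def lcss_distance(a: List[Point], b: List[Point],
                  eps: float = 1.0, delta: Optional[int] = None) -> float:
    """Distance in [0, 1]: 0 when the shorter trajectory is fully matched."""
    if not a or not b:
        return 1.0
    return 1.0 - lcss_length(a, b, eps, delta) / min(len(a), len(b))
```

A distance of this kind could then feed a kernel for a classifier; how the paper builds its RVM kernels from the modified LCSS distance is not stated in the abstract, so that step is left out here.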
Spike Train Driven Dynamical Models for Human Actions
We investigate dynamical models of human motion that can support both synthesis and analysis tasks. Unlike coarser discriminative models that work well when action classes are nicely separated, we seek models that have fine-scale representational power and can therefore model subtle differences in the way an action is performed. To this end, we model an observed action as an (unknown) linear time-invariant dynamical model of relatively small order, driven by a sparse bounded input signal. Our motivating intuition is that the time-invariant dynamics will capture the unchanging physical characteristics of an actor, while the inputs used to excite the system will correspond to a causal signature of the action being performed. We show that our model has sufficient representational power to closely approximate large classes of non-stationary actions with significantly reduced complexity. We also show that temporal statistics of the inferred input sequences can be compared in order to recognize actions and detect transitions between them.
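To make the idea of a low-order linear time-invariant model driven by a sparse input more concrete, here is a minimal sketch. It assumes a scalar autoregressive form y[t] = c0*y[t-1] + c1*y[t-2] + ... + u[t] with known coefficients, which is a simplification: the abstract treats the dynamics as unknown and does not describe its estimation procedure. The function names `simulate` and `recover_inputs` and the example coefficients are hypothetical.

```python
from typing import List


def simulate(coeffs: List[float], inputs: List[float]) -> List[float]:
    """Run y[t] = sum_k coeffs[k] * y[t-1-k] + inputs[t] from a zero initial state."""
    y: List[float] = []
    for t, u in enumerate(inputs):
        past = sum(c * y[t - 1 - k] for k, c in enumerate(coeffs) if t - 1 - k >= 0)
        y.append(past + u)
    return y


def recover_inputs(coeffs: List[float], outputs: List[float]) -> List[float]:
    """Invert the recursion: u[t] = y[t] - sum_k coeffs[k] * y[t-1-k].

    Assumes the coefficients are already known, which the paper does not.
    """
    u: List[float] = []
    for t, y_t in enumerate(outputs):
        past = sum(c * outputs[t - 1 - k] for k, c in enumerate(coeffs) if t - 1 - k >= 0)
        u.append(y_t - past)
    return u


# Hypothetical stable second-order dynamics excited by two sparse impulses.
example_coeffs = [1.5, -0.7]
example_inputs = [1.0, 0.0, 0.0, 0.0, 0.0, -0.5, 0.0, 0.0]
example_outputs = simulate(example_coeffs, example_inputs)
example_recovered = recover_inputs(example_coeffs, example_outputs)
```

The output is dense even though the input has only two non-zero samples, which is the sense in which a sparse input sequence can act as a compact signature of the motion.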
Computer Vision and Image Understanding 113 (2009) 353–371
Journal homepage: www.elsevier.com/locate/cviu

