Human Activity Recognition with Metric Learning

by Du Tran, Alexander Sorokin
Results 1 - 10 of 16

Social Signal Processing: Survey of an Emerging Domain

by Alessandro Vinciarelli, Maja Pantic, Hervé Bourlard, 2008
Cited by 32 (10 self)
The ability to understand and manage social signals of a person we are communicating with is the core of social intelligence. Social intelligence is a facet of human intelligence that has been argued to be indispensable and perhaps the most important for success in life. This paper argues that next-generation computing needs to include the essence of social intelligence – the ability to recognize human social signals and social behaviours like turn taking, politeness, and disagreement – in order to become more effective and more efficient. Although each one of us understands the importance of social signals in everyday life situations, and in spite of recent advances in machine analysis of relevant behavioural cues like blinks, smiles, crossed arms, laughter, and similar, design and development of automated systems for Social Signal Processing (SSP) are rather difficult. This paper surveys the past efforts in solving these problems by a computer, it summarizes the relevant findings in social psychology, and it proposes a set of recommendations for enabling the development of the next generation of socially-aware computing.

Histograms of oriented optical flow and binet-cauchy kernels on nonlinear dynamical systems for the recognition of human actions

by Rizwan Chaudhry, Avinash Ravichandran, Gregory Hager, René Vidal - In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2009
Cited by 8 (0 self)
System theoretic approaches to action recognition model the dynamics of a scene with linear dynamical systems (LDSs) and perform classification using metrics on the space of LDSs, e.g. Binet-Cauchy kernels. However, such approaches are only applicable to time series data living in a Euclidean space, e.g. joint trajectories extracted from motion capture data or feature point trajectories extracted from video. Much of the success of recent object recognition techniques relies on the use of more complex feature descriptors, such as SIFT descriptors or HOG descriptors, which are essentially histograms. Since histograms live in a non-Euclidean space, we can no longer model their temporal evolution with LDSs, nor can we classify them using a metric for LDSs. In this paper, we propose to represent each frame of a video using a histogram of oriented optical flow (HOOF) and to recognize human actions by classifying HOOF time-series. For this purpose, we propose a generalization of the Binet-Cauchy kernels to nonlinear dynamical systems (NLDS) whose output lives in a non-Euclidean space, e.g. the space of histograms. This can be achieved by using kernels defined on the original non-Euclidean space, leading to a well-defined metric for NLDSs. We use these kernels for the classification of actions in video sequences using (HOOF) as the output of the NLDS. We evaluate our approach to recognition of human actions in several scenarios and achieve encouraging results.

Stabilizing motion tracking using retrieved motion priors

by Andreas Baak, Bodo Rosenhahn, Meinard Müller, Hans-Peter Seidel - In IEEE ICCV, 2009
Cited by 6 (2 self)
In this paper, we introduce a novel iterative motion tracking framework that combines 3D tracking techniques with motion retrieval for stabilizing markerless human motion capturing. The basic idea is to start human tracking without prior knowledge about the performed actions. The resulting 3D motion sequences, which may be corrupted due to tracking errors, are locally classified according to available motion categories. Depending on the classification result, a retrieval system supplies suitable motion priors, which are then used to regularize and stabilize the tracking in the next iteration step. Experiments with the HumanEVA-II benchmark show that tracking and classification are remarkably improved after few iterations.

Making Action Recognition Robust to Occlusions and Viewpoint Changes

by Daniel Weinland, Mustafa Özuysal, Pascal Fua
Cited by 6 (0 self)
Most state-of-the-art approaches to action recognition rely on global representations either by concatenating local information in a long descriptor vector or by computing a single location independent histogram. This limits their performance in presence of occlusions and when running on multiple viewpoints. We propose a novel approach to providing robustness to both occlusions and viewpoint changes that yields significant improvements over existing techniques. At its heart is a local partitioning and hierarchical classification of the 3D Histogram of Oriented Gradients (HOG) descriptor to represent sequences of images that have been concatenated into a data volume. We achieve robustness to occlusions and viewpoint changes by combining training data from all viewpoints to train classifiers that estimate action labels independently over sets of HOG blocks. A top level classifier combines these local labels into a global action class decision.

Cross-View Action Recognition via View Knowledge Transfer

by Jingen Liu, Mubarak Shah, Benjamin Kuipers, Silvio Savarese
Cited by 3 (1 self)
In this paper, we present a novel approach to recognizing human actions from different views by view knowledge transfer. An action is originally modelled as a bag of visual-words (BoVW), which is sensitive to view changes. We argue that, as opposed to visual words, there exist some higher level features which can be shared across views and enable the connection of action models for different views. To discover these features, we use a bipartite graph to model two view-dependent vocabularies, then apply bipartite graph partitioning to co-cluster two vocabularies into visual-word clusters called bilingual-words (i.e., high-level features), which can bridge the semantic gap across view-dependent vocabularies. Consequently, we can transfer a BoVW action model into a bag-of-bilingual-words (BoBW) model, which is more discriminative in the presence of view changes. We tested our approach on the IXMAS data set and obtained very promising results. Moreover, to further fuse view knowledge from multiple views, we apply a Locally Weighted Ensemble scheme to dynamically weight transferred models based on the local distribution structure around each test example. This process can further improve the average recognition rate by about 7%.

The action similarity labeling challenge

by Orit Kliper-Gross, Tal Hassner, Lior Wolf - IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2011
Cited by 3 (3 self)
Recognizing actions in videos is rapidly becoming a topic of much research. To facilitate the development of methods for action recognition, several video collections, along with benchmark protocols, have previously been proposed. In this paper we present a novel video database, the “Action Similarity LAbeliNg” (ASLAN) database, along with benchmark protocols. The ASLAN set includes thousands of videos collected from the web, in over 400 complex action classes. Our benchmark protocols focus on action similarity (same/not-same), rather than action classification, and testing is performed on never-before-seen actions. We propose this data set and benchmark as a means for gaining a more principled understanding of what makes actions different or similar, rather than learning the properties of particular action classes. We present baseline results on our benchmark, and compare them to human performance. To promote further study of action similarity techniques, we make the ASLAN database, benchmarks, and descriptor encodings publicly available to the research community.

Human Action Recognition from a Single Clip per Action

by Weilong Yang, Yang Wang, Greg Mori
Cited by 2 (1 self)
Learning-based approaches for human action recognition often rely on large training sets. Most of these approaches do not perform well when only a few training samples are available. In this paper, we consider the problem of human action recognition from a single clip per action. Each clip contains at most 25 frames. Using a patch-based motion descriptor and matching scheme, we can achieve promising results on three different action datasets with a single clip as the template. Our results are comparable to previously published results using much larger training sets. We also present a method for learning a transferable distance function for these patches. The transferable distance function learning extracts generic knowledge of patch weighting from previous training sets, and can be applied to videos of new actions without further learning. Our experimental results show that the transferable distance function learning not only improves the recognition accuracy of the single clip action recognition, but also significantly enhances the efficiency of the matching scheme.

Group Action Recognition Using Space-Time Interest Points

by Qingdi Wei, Xiaoqin Zhang, Yu Kong, Weiming Hu, Haibin Ling
Cited by 1 (1 self)
Group action recognition is a challenging task in computer vision due to the large complexity induced by multiple motion patterns. This paper aims at analyzing group actions in video clips containing several activities. We combine the probability summation framework with the space-time (ST) interest points for this task. First, ST interest points are extracted from video clips to form the feature space. Then we use k-means for feature clustering and build a compact representation, which is then used for group action classification. The proposed approach has been applied to classification tasks including four classes: badminton, tennis, basketball, and soccer videos. The experimental results demonstrate the advantages of the proposed approach.

Joint Segmentation and Classification of Human Actions in Video

by Minh Hoai, Zhen-Zhong Lan, Fernando De la Torre
Cited by 1 (0 self)
Automatic video segmentation and action recognition has been a long-standing problem in computer vision. Much work in the literature treats video segmentation and action recognition as two independent problems; while segmentation is often done without a temporal model of the activity, action recognition is usually performed on pre-segmented clips. In this paper we propose a novel method that avoids the limitations of the above approaches by jointly performing video segmentation and action recognition. Unlike standard approaches based on extensions of dynamic Bayesian networks, our method is based on a discriminative temporal extension of the spatial bag-of-words model that has been very popular in object recognition. The classification is performed robustly within a multi-class SVM framework whereas the inference over the segments is done efficiently with dynamic programming. Experimental results on honeybee, Weizmann, and Hollywood datasets illustrate the benefits of our approach compared to state-of-the-art methods.

Surveillance Event Detection

by Mert Dikmen, Huazhong Ning, Dennis J. Lin, Liangliang Cao, Vuong Le, Shen-Fu Tsai, Kai-Hsiang Lin, Zhen Li, Jianchao Yang, Thomas S. Huang, Fengjun Lv, Wei Xu, Ming Yang, Kai Yu, Zhao Zhao, Guangyu Zhu, Yihong Gong, 2009
We have developed and evaluated three generalized systems for event detection. The first system is a simple brute force search method, where each space-time location in the video is evaluated by a binary decision rule on whether it contains the event or not. The second system is built on top of a head tracker to avoid costly brute force searching. The decision stage is a combination of state of the art feature extractors and classifiers. Our third system has a probabilistic framework. From the observations, the poses of the people are estimated and used to determine the presence of an event. Finally we introduce two ad-hoc methods that were designed to specifically detect OpposingFlow and TakePicture events. The results are promising as we are able to get good results on several event categories, while for all events we have gained valuable insights and experience.