## Robust 3D action recognition with random occupancy patterns. (2012)

Venue: ECCV

Citations: 48 (2 self)

### Citations

3495 | A decision-theoretic generalization of on-line learning and an application to boosting. Journal of computer and system sciences 55(1
- Freund, Schapire
- 1997
Citation Context ...ion, 4D subvolumes are employed. 2 Related Work The Haar wavelet-like features have been successfully applied in [5] for face detection. A boosted classifier is learned by applying AdaBoost algorithm [6] on a very large pool of weak classifiers. The weak classifier pool is constructed based on the features extracted from the rectangles at ...

1887 | Robust real-time face detection
- Viola, Jones
- 2004
Citation Context ...oposed method. The 3D subvolumes are shown for illustration purpose. In the implementation, 4D subvolumes are employed. 2 Related Work The Haar wavelet-like features have been successfully applied in [5] for face detection. A boosted classifier is learned by applying AdaBoost algorithm [6] on a very large pool of weak classifiers. The weak classifier pool is constructed based ...

972 | Regularization and variable selection via the elastic net
- Zou, Hastie
- 2005
Citation Context ...ifier is faster if the number of the selected features is smaller. Second, learning a sparse classification function is less prone to over-fitting if only limited amount of training data is available [19]. For each training data sample $x_i$, $N_f$ ROP features are extracted: $h^i_j$, $j = 1, \cdots, N_f$, and the response is predicted by a linear function $y_i = \sum_{j=1}^{N_f} w_j h^i_j$ (6) ...

819 | Space-time interest points
- Laptev, Lindeberg
- 2003
Citation Context ...ata and the rest of them as test data, which is difficult because of the larger variations across the same actions performed by different subjects. Our method is also compared with the STIP features [21], which is a state-of-the-art local feature designed for action recognition from videos. The local spatio-temporal features do not work well for depth data because there is little texture in depth map...

568 | Real-time human pose recognition in parts from a single depth image
- Shotton, Fitzgibbon, et al.
Citation Context ...and depth images are insensitive to changes in lighting conditions. In this paper, we consider the problem of action recognition from depth sequences. Although skeleton tracking algorithm proposed in [1] is very robust for depth sequences when little occlusion occurs, it can produce inaccurate results or even fails when serious occlusion occurs. Moreover, the skeleton tracking is unavailable for huma...

263 | Shape quantization and recognition with randomized trees.
- Amit, Geman
- 1997
Citation Context ...v model. The proposed ROP feature is much simpler and more computationally efficient than Haar-like features, while achieving similar performances in depth datasets. Randomization has been applied in [8] and [9] to address this problem. [9] employs a random forest to learn discriminative features that are extracted either from a patch or from a pair of patches for fine-grained image categorization. [...

213 | The sample complexity of pattern classification with neural networks: the size of the weights is more important than the size of the network. Information Theory,
- Bartlett
- 1998
Citation Context ...ighted sampling scheme is more effective than the uniform sampling scheme. Moreover, the proposed scheme does not suffer from overfitting even when the number of the sampled subvolumes is very large. [23] gives an intuitive proof of the generalization ability of classifier of the randomly generated features. The depth sequences are downsampled into different resolutions, and we explore the relationshi...

121 | Action Recognition from Arbitrary Views using 3D Exemplars
- Weinland, Boyer, et al.
- 2007
Citation Context ...l for 2D images, when the data becomes 3D or 4D, the number of possible rectangles becomes so large that enumerating them or performing AdaBoost algorithm on them becomes computationally prohibitive. [7] utilizes 3D occupancy features and models the dynamics by an exemplar-based hidden Markov model. The proposed ROP feature is much simpler and more computationally efficient than Haar-like features, wh...

102 | Mining actionlet ensemble for action recognition with depth cameras.
- Wang, Liu, et al.
- 2012
Citation Context ...ve been made to develop features for action recognition in depth data. [12] represents each depth frame as a bag of 3D points on the human silhouette, and utilizes HMM to model the temporal dynamics. [13] uses relative skeleton position and local occupancy patterns to model the human-object interaction, and developed Fourier Temporal Pyramid to characterize temporal dynamics. [14] also applies spatio-...

76 | Efficient regression of general-activity human poses from depth images.
- Girshick, Shotton, et al.
- 2011
Citation Context ... the pixels in the region R as the separability score of R. The probability that a subvolume R is sampled should be proportional to its separability score $J_R$, that is, $P_{R\,\text{sampled}} \propto J_R = \frac{1}{N_R} \sum_{p \in R} J_p$ (4) where $N_R$ is the number of pixels in the subvolume R. We can uniformly draw a subvolume, and accept the subvolume with probability $P_{R\,\text{accept}} = \frac{W_x W_y W_z W_t}{\sum_{p \in V} J_p} J_R$ (5) Note that $P_{R\,\text{uniform}} P_{R\,\text{accept}}$ = P...
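
The weighted sampling quoted in this context can be illustrated with a minimal rejection-sampling sketch. It is not the authors' implementation. It assumes a small 4D score volume `J` stored as nested lists, and it accepts a candidate with probability `J_R / max(J_p)`, a standard normalization chosen here so the acceptance probability never exceeds 1, in place of the normalization quoted in (5). The names `mean_score` and `sample_subvolume` are hypothetical.

```python
import random

def mean_score(J, lo, hi):
    """Average the per-pixel scores J_p over the inclusive subvolume [lo, hi]."""
    total, count = 0.0, 0
    for x in range(lo[0], hi[0] + 1):
        for y in range(lo[1], hi[1] + 1):
            for z in range(lo[2], hi[2] + 1):
                for t in range(lo[3], hi[3] + 1):
                    total += J[x][y][z][t]
                    count += 1
    return total / count

def sample_subvolume(J, shape, rng, max_tries=10000):
    """Draw one subvolume with probability proportional to its mean score J_R.

    Assumes all scores are non-negative and at least one is positive.
    """
    j_max = max(J[x][y][z][t]
                for x in range(shape[0]) for y in range(shape[1])
                for z in range(shape[2]) for t in range(shape[3]))
    for _ in range(max_tries):
        lo, hi = [], []
        for w in shape:
            # Draw two corner coordinates uniformly, then order them.
            a, b = rng.randrange(w), rng.randrange(w)
            lo.append(min(a, b))
            hi.append(max(a, b))
        # Assumed acceptance rule: J_R / max_p J_p, which is always <= 1.
        if rng.random() < mean_score(J, lo, hi) / j_max:
            return tuple(lo), tuple(hi)
    raise RuntimeError("no subvolume accepted within max_tries")
```

Because a mean over a region cannot exceed the largest per-pixel score, the acceptance test is a valid probability, and candidates that cover highly separable pixels are kept more often than those that do not.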

66 | Combining randomization and discrimination for fine-grained image categorization.
- Yao, Khosla, et al.
- 2011
Citation Context ... The proposed ROP feature is much simpler and more computationally efficient than Haar-like features, while achieving similar performances in depth datasets. Randomization has been applied in [8] and [9] to address this problem. [9] employs a random forest to learn discriminative features that are extracted either from a patch or from a pair of patches for fine-grained image categorization. [8] appli...

60 | 3D convolutional neural networks for human action recognition. PAMI
- Ji, Xu, et al.
- 2013
Citation Context ...xture in depth maps. Another method we compare with is the convolutional network. We have implemented a 4-dimensional convolutional network by extending the three-dimensional convolutional network of [22]. Finally we compare with a Support Vector Machine classifier on the raw features consisting of the pixels on all the locations. Although the Support Vector Machine performs surprisingly well on our d...

52 | A data-driven approach for real-time full body pose reconstruction from a depth camera.
- Baak, Muller, et al.
- 2011
Citation Context ...ing data. A large separability measure means that these classes have small within-class scatter and large between-class scatter, and the class separability measure J can be defined as $J = \frac{\operatorname{tr}(S_B)}{\operatorname{tr}(S_W)}$ (3) Denote V as the 4D volume of a depth sequence. For each pixel $p \in V$, we define a neighborhood subvolume centered at p, and extract the 8 Haar feature values from this neighborhood subvolume. These 8...

47 | Weighted sums of random kitchen sinks: Replacing minimization with randomization in learning.
- Rahimi, Recht
- 2009
Citation Context ...zation in dealing with this problem. We exploit randomization to perform action recognition from depth sequences, in which the data is sparse and the number of the possible subvolumes is much larger. [10] and [11] also employ randomization in the learning process. These approaches randomly map the data to features with a linear function whose weights and biases are uniformly sampled. Their empirical a...

43 | Extreme learning machines: a survey
- Huang, Wang, et al.
- 2011
Citation Context ... dealing with this problem. We exploit randomization to perform action recognition from depth sequences, in which the data is sparse and the number of the possible subvolumes is much larger. [10] and [11] also employ randomization in the learning process. These approaches randomly map the data to features with a linear function whose weights and biases are uniformly sampled. Their empirical and theore...

36 | Eigenjoints-based action recognition using naive-bayes-nearest-neighbor.
- Yang, Tian
- 2012
Citation Context ...Temporal Pyramid to characterize temporal dynamics. [14] also applies spatio-temporal occupancy patterns, but all the cells in the grid have the same size, and the number of cells is empirically set. [15] proposes a dimension-reduced skeleton feature, and [16] developed a histogram of gradient feature over depth motion maps. Instead of carefully developing good features, this paper tries to learn semi...

22 | Recognizing actions using depth motion maps-based histograms of oriented gradients.
- Yang, Zhang, et al.
- 2012
Citation Context ... also applies spatio-temporal occupancy patterns, but all the cells in the grid have the same size, and the number of cells is empirically set. [15] proposes a dimension-reduced skeleton feature, and [16] developed a histogram of gradient feature over depth motion maps. Instead of carefully developing good features, this paper tries to learn semi-local features automatically from the data, and we show...

21 | Kinecting the dots: Particle based scene flow from depth sensors
- Hadfield, Bowden
- 2011
Citation Context ...1, z1, t1]. A normal subvolume has the property that $x_0 \le x_1$, $y_0 \le y_1$, $z_0 \le z_1$, and $t_0 \le t_1$, and the subvolume is the set of points $\{[x, y, z, t] : x_0 \le x \le x_1,\ y_0 \le y \le y_1,\ z_0 \le z \le z_1,\ t_0 \le t \le t_1\}$ (2) Our sampling space consists of all the subvolumes $[x_0, y_0, z_0, t_0] \sim [x_1, y_1, z_1, t_1]$ where $x_0, x_1 \in \{1, 2, \cdots, W_x\}$, $y_0, y_1 \in \{1, 2, \cdots, W_y\}$, $z_0, z_1 \in \{1, 2, \cdots, W_z\}$, $t_0, t_1 \in \{1, 2, \cdots,$...

20 | STOP: Space-time occupancy patterns for 3D action recognition from depth map sequences.
- Vieira, Nascimento, et al.
- 2012
Citation Context ...e temporal dynamics. [13] uses relative skeleton position and local occupancy patterns to model the human-object interaction, and developed Fourier Temporal Pyramid to characterize temporal dynamics. [14] also applies spatio-temporal occupancy patterns, but all the cells in the grid have the same size, and the number of cells is empirically set. [15] proposes a dimension-reduced skeleton feature, and ...

11 | A real-time system for dynamic hand gesture recognition with a depth sensor. In:
- Kurakin, Zhang, et al.
- 2012
Citation Context ...n different sampling methods, and the relationship between the resolution of data and the classification accuracy for SVM and the proposed sampling method. 6.2 Gesture3D Dataset The Gesture3D dataset [24] is a hand gesture dataset of depth sequences captured by a depth camera. This dataset contains a subset of gestures defined by American Sign Language (ASL). There are 12 gestures in the dataset: bath...

3 | A note on the computation of high-dimensional integral images.
- Tapia
- 2011
Citation Context ...tion function: $\delta(x) = \frac{1}{1+e^{x}}$. This feature is able to capture the occupancy pattern of a 4D subvolume. Moreover, it can be computed in constant complexity with the high dimensional integral images [17]. As shown in Fig. 1, we extract ROP features from the subvolumes with different sizes and at different locations. However, the number of possible simple features is so large that we are not able to e...
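
The constant-time claim in this context rests on the integral-image idea, which a short sketch can make concrete. This is an illustration under stated assumptions, not the method of [17]: the volume is a dict keyed by coordinate tuples with every coordinate present, and the helper names `integral_image` and `box_sum` are hypothetical.

```python
from itertools import product

def integral_image(vol, shape):
    """Return S with S[p] = sum of vol[q] over all q <= p componentwise."""
    S = dict(vol)
    for axis in range(len(shape)):
        # product() yields coordinates in lexicographic order, so the
        # predecessor along `axis` is always finalized before p is updated.
        for p in product(*(range(w) for w in shape)):
            if p[axis] > 0:
                prev = p[:axis] + (p[axis] - 1,) + p[axis + 1:]
                S[p] += S[prev]
    return S

def box_sum(S, lo, hi):
    """Sum over the inclusive box [lo, hi] using 2^d corner lookups."""
    d = len(lo)
    total = 0
    for picks in product((0, 1), repeat=d):
        corner = tuple(hi[i] if picks[i] else lo[i] - 1 for i in range(d))
        if min(corner) < 0:
            continue  # a prefix ending before index 0 is empty and adds zero
        # Inclusion-exclusion: one sign flip per coordinate taken at lo - 1.
        total += (-1) ** (d - sum(picks)) * S[corner]
    return total
```

Once `S` is built, each query touches at most 16 entries in 4D regardless of how large the subvolume is, which is what makes occupancy sums over many sampled subvolumes cheap.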

2 | Action recognition based on a bag of 3D points. In: Human Communicative Behavior Analysis Workshop (in conjunction with CVPR)
- Li, Zhang, et al.
- 2010
Citation Context ...Furthermore, we propose a weighted sampling technique that is more effective than uniform sampling. Recently, a lot of efforts have been made to develop features for action recognition in depth data. [12] represents each depth frame as a bag of 3D points on the human silhouette, and utilizes HMM to model the temporal dynamics. [13] uses relative skeleton position and local occupancy patterns to model ...

1 | Learning Kernel Parameters by using Class Separability Measure. In:
- Wang, Chan
- 2002
Citation Context ...n the rejection sampling, which samples the discriminative subvolumes with high probability. To characterize how discriminative a subvolume is, we employ the scatter matrix class separability measure [18]. The scatter matrices include Within-class scatter matrix ($S_W$), Between-class scatter matrix ($S_B$), and Total scatter matrix ($S_T$). They are defined as $S_W = \sum_{i=1}^{c} \sum_{j=1}^{n_i} (h_{i,j} - m_i)(h_{i,j} - m_i)^T$, $S_B =$...

1 | SPArse Modeling Software (SPAMS). http://www.di.ens.fr/willow/SPAMS
- Julien Mairal
Citation Context ...s been shown that if the number of features $N_f$ is much larger than that of the training data $n$, which is the case of this paper, Elastic-Net regularization works particularly well [19]. SPAMS toolbox [20] is employed to numerically solve this optimization problem. The selected feature $f$ is obtained by discarding the features $x_j$ with corresponding $w_j$ less than a given threshold and multiplying the rest...