Results 1 - 10 of 39
Visual Tracking Decomposition
 in CVPR
, 2010
Cited by 34 (2 self)
We propose a novel tracking algorithm that works robustly in challenging scenarios in which several kinds of appearance and motion changes of an object occur at the same time. Our algorithm is based on a visual tracking decomposition scheme for the efficient design of observation and motion models as well as trackers. In our scheme, the observation model is decomposed into multiple basic observation models that are constructed by sparse principal component analysis (SPCA) of a set of feature templates. Each basic observation model covers a specific appearance of the object. The motion model is likewise represented by a combination of multiple basic motion models, each of which covers a different type of motion. The multiple basic trackers are then designed by associating the basic observation models with the basic motion models, so that each specific tracker takes charge of a certain change in the object. All basic trackers are then integrated into one compound tracker through an interactive Markov Chain Monte Carlo (IMCMC) framework in which the basic trackers communicate with one another interactively while running in parallel. By exchanging information with the others, each tracker further improves its performance, which in turn improves the overall tracking performance. Experimental results show that our method tracks the object accurately and reliably in realistic videos where the appearance and motion change drastically over time.
A fast and incremental method for loop-closure detection using bags of visual words
 Conditionally accepted for publication in IEEE Transactions on Robotics, Special Issue on Visual SLAM
, 2008
Cited by 32 (5 self)
In robotic applications of visual simultaneous localization and mapping techniques, loop-closure detection and global localization are two issues that require the capacity to recognize a previously visited place from current camera measurements. We present an online method that makes it possible to detect when an image comes from an already perceived scene using local shape and color information. Our approach extends the bag-of-words method used in image classification to incremental conditions and relies on Bayesian filtering to estimate loop-closure probability. We demonstrate the efficiency of our solution by real-time loop-closure detection under strong perceptual aliasing conditions in both indoor and outdoor image sequences taken with a handheld camera. Index Terms: Loop-closure detection, localization, SLAM.
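The Bayesian filtering step mentioned in this abstract can be illustrated with a generic discrete Bayes filter. This is only a minimal sketch: the abstract does not give the paper's transition model or its bag-of-words likelihood, so the `stay_prob` parameter, the uniform diffusion, and the toy likelihood values are assumptions made for illustration.

```python
def bayes_filter_step(prior, likelihood, stay_prob=0.9):
    """One predict/update cycle of a discrete Bayes filter.

    prior[i]      -- belief that the current image closes a loop with place i
    likelihood[i] -- how well the current image matches place i (hypothetical
                     values here; the paper derives them from visual words)
    stay_prob     -- assumed probability that the hypothesis is unchanged
                     between consecutive images (illustrative choice)
    """
    n = len(prior)
    total = sum(prior)
    # Predict: keep most of the mass in place, diffuse the rest uniformly.
    predicted = [stay_prob * p + (1.0 - stay_prob) * total / n for p in prior]
    # Update: weight each hypothesis by the observation likelihood.
    unnormalised = [l * p for l, p in zip(likelihood, predicted)]
    norm = sum(unnormalised)
    return [u / norm for u in unnormalised]


# Three candidate places, no prior preference, image matches place 1 best.
belief = bayes_filter_step([1 / 3, 1 / 3, 1 / 3], [0.1, 0.8, 0.1])
best_place = max(range(len(belief)), key=belief.__getitem__)
```

Repeating the step over consecutive images lets consistent evidence accumulate, which is the usual reason for filtering rather than thresholding each image's match score on its own.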
Human action recognition using distribution of oriented rectangular patches
 In: Workshop on Human Motion
, 2007
Cited by 24 (5 self)
We describe a “bag-of-rectangles” method for representing and recognizing human actions in videos. In this method, each human pose in an action sequence is represented by oriented rectangular patches extracted over the whole body. Spatial oriented histograms are then formed to represent the distribution of these rectangular patches. To carry the information from the spatial domain described by the bag-of-rectangles descriptor to the temporal domain for action recognition, four different methods are proposed: (i) frame-by-frame voting, which recognizes the actions by matching the descriptors of each frame; (ii) global histogramming, which extends the Motion Energy Image idea proposed by Bobick and Davis to rectangular patches; (iii) a classifier-based approach using SVMs; and (iv) an adaptation of Dynamic Time Warping to the temporal representation of the descriptor. Detailed experiments are carried out on the action dataset of Blank et al. High success rates (100%) show that, with a very simple and compact representation, we can achieve robust recognition of human actions compared to complex representations.
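Of the four temporal methods listed, Dynamic Time Warping is the easiest to show in a few lines. The sketch below is the textbook DTW recurrence, not the paper's adaptation of it. The two-bin per-frame descriptors and the L1 frame distance are hypothetical stand-ins for the bag-of-rectangles histograms.

```python
import math


def dtw_distance(seq_a, seq_b, dist):
    """Classic dynamic time warping cost between two sequences of frames."""
    n, m = len(seq_a), len(seq_b)
    # cost[i][j] = minimal accumulated cost of aligning seq_a[:i] with seq_b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(seq_a[i - 1], seq_b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # repeat a frame of seq_b
                                 cost[i][j - 1],      # repeat a frame of seq_a
                                 cost[i - 1][j - 1])  # advance both sequences
    return cost[n][m]


def l1(u, v):
    """L1 distance between two per-frame descriptors (illustrative choice)."""
    return sum(abs(x - y) for x, y in zip(u, v))


# Toy descriptors: the same action performed slowly and quickly, plus a
# different action with the poses in reverse order.
slow = [[1, 0], [1, 0], [0, 1], [0, 1]]
fast = [[1, 0], [0, 1]]
other = [[0, 1], [1, 0]]

same_action_cost = dtw_distance(slow, fast, l1)    # 0.0: only the speed differs
diff_action_cost = dtw_distance(slow, other, l1)   # 6.0: the poses do not line up
```

The warping absorbs the difference in execution speed, so a slow and a fast performance of the same action align at zero cost while the reordered sequence does not.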
Approximate earth mover’s distance in linear time
Cited by 18 (0 self)
The earth mover’s distance (EMD) [16] is an important perceptually meaningful metric for comparing histograms, but it suffers from high (O(N^3 log N)) computational complexity. We present a novel linear time algorithm for approximating the EMD for low dimensional histograms using the sum of absolute values of the weighted wavelet coefficients of the difference histogram. EMD computation is a special case of the Kantorovich-Rubinstein transshipment problem, and we exploit the Hölder continuity constraint in its dual form to convert it into a simple optimization problem with an explicit solution in the wavelet domain. We prove that the resulting wavelet EMD metric is equivalent to EMD, i.e. the ratio of the two is bounded. We also provide estimates for the bounds. The weighted wavelet transform can be computed in time linear in the number of histogram bins, while the comparison is about as fast as for normal Euclidean distance or χ² statistic. We experimentally show that wavelet EMD is a good approximation to EMD, has similar performance, but requires much less computation.
Spectral-Driven Isometry-Invariant Matching of 3D Shapes
, 2009
Cited by 13 (1 self)
This paper presents a matching method for 3D shapes, which comprises a new technique for surface sampling and two algorithms for matching 3D shapes based on point-based statistical shape descriptors. Our sampling technique is based on critical points of the eigenfunctions related to the smaller eigenvalues of the Laplace-Beltrami operator. These critical points are invariant to isometries and are used as anchor points of a sampling technique, which extends the farthest point sampling by using statistical criteria for controlling the density and number of reference points. Once a set of reference points has been computed, for each of them we construct a point-based statistical descriptor (PSSD, for short) of the input surface. This descriptor incorporates an approximation of the geodesic shape distribution and other geometric information describing the surface at that point. Then, the dissimilarity between two surfaces is computed by comparing the corresponding sets of PSSDs with bipartite graph matching or measuring the L1-distance between the reordered feature vectors of a proximity graph. Here, the reordering is given by the Fiedler vector of a Laplacian matrix
A linear time histogram metric for improved SIFT matching
 In ECCV
Cited by 12 (5 self)
We present a new metric between histograms such as SIFT descriptors and a linear time algorithm for its computation. It is common practice to use the L2 metric for comparing SIFT descriptors. This practice assumes that SIFT bins are aligned, an assumption which is often not correct due to quantization, distortion, occlusion, etc. First, we present a new Earth Mover’s Distance (EMD) variant. We show that it is a metric (unlike the original EMD [1], which is a metric only for normalized histograms). Moreover, it is a natural extension of the L1 metric. Second, we propose a linear time algorithm for the computation of the EMD variant, with a robust ground distance for oriented gradients. Finally, extensive experimental results on the Mikolajczyk and Schmid dataset [2] show that our method outperforms state-of-the-art distances.
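The bin-alignment problem described here shows up clearly on a toy example. The sketch below does not implement the paper's EMD variant or its robust ground distance. It uses the classical closed form for EMD between equal-mass 1D histograms with ground distance |i - j| (the sum of absolute differences of prefix sums), and the three histograms are made up for illustration.

```python
from itertools import accumulate


def l2(p, q):
    """Bin-to-bin Euclidean distance: compares only bins with the same index."""
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5


def emd_1d(p, q):
    """EMD between equal-mass 1D histograms with ground distance |i - j|.

    In this special case the EMD equals the sum of absolute differences of
    the prefix sums, which takes time linear in the number of bins.
    """
    running_surplus = accumulate(a - b for a, b in zip(p, q))
    return sum(abs(s) for s in running_surplus)


ref = [0, 4, 0, 0, 0]
near = [0, 0, 4, 0, 0]   # the same mass, shifted by one bin
far = [0, 0, 0, 0, 4]    # the same mass, shifted by three bins

l2_near, l2_far = l2(ref, near), l2(ref, far)            # identical values
emd_near, emd_far = emd_1d(ref, near), emd_1d(ref, far)  # 4 and 12
```

L2 cannot tell the small shift from the large one, whereas the cross-bin distance grows with the size of the shift.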
Viewpoint manifolds for action recognition
 J. Image Video Process.
, 2009
Cited by 12 (1 self)
Researchers are increasingly interested in providing video-based, view-invariant action recognition for human motion. Addressing this problem will lead to more accurate modeling and analysis of the type of unconstrained video commonly collected in the areas of athletics and medicine. Previous viewpoint-invariant methods use multiple cameras in both the training and testing phases of action recognition or require storing many examples of a single action from multiple viewpoints. In this paper, we present a framework for learning a compact representation of primitive actions (e.g., walk, punch, kick, sit) that can be used for video obtained from a single camera for simultaneous action recognition and viewpoint estimation. Using our method, which models the low-dimensional structure of these actions relative to viewpoint, we show recognition rates on a publicly available data set previously only achieved using multiple simultaneous views.
Interactive learning of visual topological navigation
Cited by 11 (7 self)
We present a topological navigation system that is able to visually recognize the different rooms of an apartment and guide a robot between them. Specifically tailored for small entertainment robots, the system relies on vision only and learns its navigation capabilities incrementally by interacting with a user. This continuous learning strategy makes the system particularly adaptable to changes in environmental lighting and structure. From the computer vision point of view, the system uses a purely appearance-based image representation called bag of visual words, without any metric information. This representation was adapted to the incremental context of robotics and supplemented by active perception to enhance performance. Empirical validation on real robots and on the publicly available INDECS image database is presented.
The Quadratic-Chi Histogram Distance Family
Cited by 11 (2 self)
We present a new histogram distance family, the Quadratic-Chi (QC). QC members are Quadratic-Form distances with a cross-bin χ²-like normalization. The cross-bin χ²-like normalization reduces the effect of large bins having undue influence. Normalization was shown to be helpful in many cases, where the χ² histogram distance outperformed the L2 norm. However, χ² is sensitive to quantization effects, such as those caused by light changes, shape deformations, etc. The Quadratic-Form part of QC members takes care of cross-bin relationships (e.g. red and orange), alleviating the quantization problem. We present two new cross-bin histogram distance properties, Similarity-Matrix-Quantization-Invariance and Sparseness-Invariance, and show that QC distances have these properties. We also show experimentally that they boost performance. The computation time complexity of QC distances is linear in the number of non-zero entries in the bin-similarity matrix and histograms, and the computation can easily be parallelized. We present results for image retrieval using the Scale Invariant Feature Transform (SIFT) and color image descriptors. In addition, we present results for shape classification using Shape Context (SC) and Inner Distance Shape Context (IDSC). We show that the new QC members outperform state-of-the-art distances for these tasks, while having a short running time. The experimental results show that both the cross-bin property and the normalization are important.
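A quadratic-form distance with a cross-bin χ²-like normalization can be sketched as follows. The abstract does not state the formula, so the form used here is an assumption: each bin difference is divided by (Σ_c (P_c + Q_c) A_ci)^m before entering the quadratic form with similarity matrix A, and 0/0 is treated as 0. The exponent default `m=0.5` and both similarity matrices are illustrative choices.

```python
def quadratic_chi(p, q, sim, m=0.5):
    """Quadratic-form distance with a cross-bin chi-square-like normaliser.

    p, q -- non-negative histograms of equal length
    sim  -- bin-similarity matrix (sim[i][j] in [0, 1], sim[i][i] == 1)
    m    -- normalisation exponent (assumed form, see the lead-in)
    """
    n = len(p)
    # Normaliser per bin: mass of both histograms in bins similar to bin i.
    z = [sum((p[c] + q[c]) * sim[c][i] for c in range(n)) ** m for i in range(n)]
    # Normalised differences; an empty neighbourhood contributes 0 (0/0 := 0).
    d = [(p[i] - q[i]) / z[i] if z[i] != 0 else 0.0 for i in range(n)]
    total = sum(d[i] * sim[i][j] * d[j] for i in range(n) for j in range(n))
    return max(total, 0.0) ** 0.5  # guard against tiny negative round-off


identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
# Bins 0 and 1 are declared half-similar (think "red" and "orange").
cross_bin = [[1, 0.5, 0], [0.5, 1, 0], [0, 0, 1]]

p = [1, 0, 0]
q = [0, 1, 0]
bin_to_bin = quadratic_chi(p, q, identity)    # sqrt(2), no cross-bin credit
with_cross = quadratic_chi(p, q, cross_bin)   # smaller: neighbouring bins match
```

With the identity matrix the sketch reduces to a bin-to-bin χ²-style distance. Declaring two bins similar lowers the distance between histograms whose mass fell on either side of that bin boundary, which is the quantization effect the abstract refers to.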
Isometry-invariant matching of point set surfaces
 In Proc. of the Eurographics workshop on 3D object retrieval
, 2008
Cited by 10 (1 self)
Shape deformations preserving the intrinsic properties of a surface are called isometries. An isometry deforms a surface without tearing or stretching it, and preserves geodesic distances. We present a technique for matching point set surfaces, which is invariant with respect to isometries. A set of reference points, evenly distributed on the point set surface, is sampled by farthest point sampling. The geodesic distance between reference points is normalized and stored in a geodesic distance matrix. Each row of the matrix yields a histogram of its elements. The set of histograms of the rows of a distance matrix is taken as a descriptor of the shape of the surface. The dissimilarity between two point set surfaces is computed by matching the corresponding sets of histograms with bipartite graph matching. This is an effective method for classifying and recognizing objects deformed with isometric transformations, e.g., nonrigid and articulated objects in different postures.
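The first stages of the pipeline described above (farthest point sampling, a normalized distance matrix, one histogram per row) can be sketched as follows. The sketch makes one significant substitution: it uses Euclidean distance as a stand-in for geodesic distance, so unlike the method in the abstract it is not isometry-invariant. The bin count, the starting index, and the toy point set are also assumptions, and the bipartite matching step is omitted.

```python
import math


def farthest_point_sampling(points, k, start=0):
    """Greedily pick k reference indices, each farthest from those chosen so far."""
    chosen = [start]
    # min_dist[i] = distance from point i to its nearest chosen reference point
    min_dist = [math.dist(p, points[start]) for p in points]
    while len(chosen) < k:
        nxt = max(range(len(points)), key=min_dist.__getitem__)
        chosen.append(nxt)
        for i, p in enumerate(points):
            min_dist[i] = min(min_dist[i], math.dist(p, points[nxt]))
    return chosen


def row_histograms(points, refs, bins=4):
    """Histogram each row of the normalised reference-to-reference distance matrix."""
    dmat = [[math.dist(points[a], points[b]) for b in refs] for a in refs]
    dmax = max(max(row) for row in dmat) or 1.0  # avoid dividing by zero
    hists = []
    for row in dmat:
        hist = [0] * bins
        for d in row:
            # Distances are normalised to [0, 1]; the maximum goes in the last bin.
            hist[min(int(d / dmax * bins), bins - 1)] += 1
        hists.append(hist)
    return hists


# Toy "surface": four points on a line (Euclidean stand-in for geodesics).
pts = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (10.0, 0.0)]
refs = farthest_point_sampling(pts, 3)   # [0, 3, 2]
descriptor = row_histograms(pts, refs)
```

Each histogram in `descriptor` summarizes how far the other reference points lie from one reference point; the abstract then compares two such sets of histograms with bipartite graph matching.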