Real-Time Tracking of Non-Rigid Objects Using Mean Shift
IEEE CVPR, 2000
A new method for real-time tracking of non-rigid objects seen from a moving camera is proposed. The central computational module is based on the mean shift iterations and finds the most probable target position in the current frame. The dissimilarity between the target model (its color distribution) and the target candidates is expressed by a metric derived from the Bhattacharyya coefficient. The theoretical analysis of the approach shows that it relates to the Bayesian framework while providing a practical, fast and efficient solution. The capability of the tracker to handle in real time partial occlusions, significant clutter, and target scale variations is demonstrated for several image sequences.
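The dissimilarity measure mentioned in this abstract can be illustrated with a minimal sketch. It assumes the target model and the candidate are plain color histograms, and it uses one common form of the derived metric, sqrt(1 - rho), where rho is the Bhattacharyya coefficient. The function names are hypothetical, and the sketch leaves out the kernel weighting and the mean shift iterations of the full tracker.

```python
import math


def normalize(hist):
    """Scale non-negative bin counts so that they sum to 1."""
    total = float(sum(hist))
    if total <= 0.0:
        raise ValueError("histogram must have positive mass")
    return [h / total for h in hist]


def bhattacharyya_coefficient(p, q):
    """Return rho = sum_u sqrt(p_u * q_u) for two discrete distributions."""
    if len(p) != len(q):
        raise ValueError("histograms must have the same number of bins")
    return sum(math.sqrt(pu * qu) for pu, qu in zip(p, q))


def bhattacharyya_distance(p, q):
    """Assumed form of the metric: d = sqrt(1 - rho), clamped at zero."""
    rho = bhattacharyya_coefficient(p, q)
    return math.sqrt(max(0.0, 1.0 - rho))


# Hypothetical 4-bin color histograms for a target model and a candidate.
target = normalize([10, 20, 30, 40])
candidate = normalize([12, 18, 33, 37])
distance = bhattacharyya_distance(target, candidate)
```

A smaller distance means the candidate's color distribution is closer to the target model. A tracker of this kind would look for the candidate position that minimizes it.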
Hierarchical Bayesian Inference in the Visual Cortex
2002
In this paper, we propose a Bayesian theory of hierarchical cortical computation based both on (a) the mathematical and computational ideas of computer vision and pattern theory and on (b) recent neurophysiological experimental evidence. We [1, 2] have proposed that Grenander's pattern theory [3] could potentially model the brain as a generative model in such a way that feedback serves to disambiguate and 'explain away' the earlier representation. The Helmholtz machine [4, 5] was an excellent step towards approximating this proposal, with feedback implementing priors. Its development, however, was rather limited, dealing only with binary images. Moreover, its feedback mechanisms were engaged only during the learning of the feedforward connections but not during perceptual inference, though the Gibbs sampling process for inference can potentially be interpreted as top-down feedback disambiguating low-level representations. Rao and Ballard's predictive coding/Kalman filter model [6] did integrate generative feedback in the perceptual inference process, but it was primarily a linear model and thus severely limited in practical utility. The data-driven Markov Chain Monte Carlo approach of Zhu and colleagues [7, 8] might be the most successful recent application of this proposal in solving real and difficult computer vision problems using generative models, though its connection to the visual cortex has not been explored. Here, we bring in a powerful and widely applicable paradigm from artificial intelligence and computer vision to propose some new ideas about the algorithms of visual cortical processing and the nature of representations in the visual cortex. We will review some of our and others' neurophysiological experimental data to lend support to these ideas.
Implicit Probabilistic Models of Human Motion for Synthesis and Tracking
Hedvig Sidenbladh
In European Conference on Computer Vision, 2002
This paper addresses the problem of probabilistically modeling 3D human motion for synthesis and tracking. Given the high-dimensional nature of human motion, learning an explicit probabilistic model from available training data is currently impractical. Instead we exploit methods from texture synthesis that treat images as representing an implicit empirical distribution. These methods replace the problem of representing the probability of a texture pattern with that of searching the training data for similar instances of that pattern. We extend this idea to temporal data representing 3D human motion with a large database of example motions. To make the method useful in practice, we must address the problem of efficient search in a large training set.
Visual Tracking and Recognition Using Appearance-Adaptive Models in Particle Filters
IEEE Transactions on Image Processing, 2004
We present an approach that incorporates appearance-adaptive models in a particle filter to realize robust visual tracking and recognition algorithms. Tracking requires modeling inter-frame motion and appearance changes, whereas recognition requires modeling appearance changes between frames and gallery images. In conventional tracking algorithms, the appearance model is either fixed or rapidly changing, and the motion model is simply a random walk with fixed noise variance. Also, the number of particles is typically fixed. All these factors make the visual tracker unstable. To stabilize the tracker, we propose the following modifications: an observation model arising from an adaptive appearance model, an adaptive-velocity motion model with adaptive noise variance, and an adaptive number of particles. The adaptive-velocity model is derived using a first-order linear predictor based on the appearance difference between the incoming observation and the previous particle configuration. Occlusion analysis is implemented using robust statistics. Experimental results on tracking visual objects in long outdoor and indoor video sequences demonstrate the effectiveness and robustness of our tracking algorithm. We then perform simultaneous tracking and recognition by embedding them in a particle filter. For recognition purposes, we model the appearance changes between frames and gallery images by constructing the intra- and extra-personal spaces. Accurate recognition is achieved when confronted by pose and view variations.
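The conventional baseline that this abstract contrasts itself with (a random-walk motion model with fixed noise variance and a fixed number of particles) can be sketched as a one-dimensional bootstrap particle filter. This is an illustration only, not the adaptive tracker the abstract proposes. The scalar state, the Gaussian observation likelihood, and all parameter names and values are assumptions made for the example.

```python
import math
import random


def bootstrap_particle_filter(observations, n_particles=500, motion_sigma=0.5,
                              obs_sigma=1.0, init_range=(-10.0, 10.0), seed=0):
    """Track a scalar state; return the posterior-mean estimate per step."""
    rng = random.Random(seed)
    particles = [rng.uniform(*init_range) for _ in range(n_particles)]
    estimates = []
    for z in observations:
        # Predict: random walk with a fixed noise variance.
        particles = [x + rng.gauss(0.0, motion_sigma) for x in particles]
        # Weight: assumed Gaussian likelihood of the observation z.
        weights = [math.exp(-0.5 * ((z - x) / obs_sigma) ** 2)
                   for x in particles]
        total = sum(weights)
        if total == 0.0:
            # All weights underflowed; fall back to uniform weights.
            weights = [1.0 / n_particles] * n_particles
        else:
            weights = [w / total for w in weights]
        estimates.append(sum(w * x for w, x in zip(weights, particles)))
        # Resample: the particle count stays fixed at n_particles.
        particles = rng.choices(particles, weights=weights, k=n_particles)
    return estimates


# Hypothetical usage: a stationary target observed at position 5.0.
estimates = bootstrap_particle_filter([5.0] * 20)
```

The fixed `motion_sigma` and `n_particles` in this sketch are the quantities that the abstract proposes to make adaptive.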
Data fusion for visual tracking with particles
Proc. IEEE, 2004
The effectiveness of probabilistic tracking of objects in image sequences has been revolutionized by the development of particle filtering. Whereas Kalman filters are restricted to Gaussian distributions, particle filters can propagate more general distributions, albeit only approximately. This is of particular benefit in visual tracking because of the inherent ambiguity of the visual world that stems from its richness and complexity. One important advantage of the particle filtering framework is that it allows the information from different measurement sources to be fused in a principled manner. Although this fact has been acknowledged before, it has not been fully exploited within a visual tracking context. Here we introduce generic importance sampling mechanisms for data fusion and discuss them for fusing color with either stereo sound, for teleconferencing, or with motion, for surveillance with a still camera. We show how each of the three cues can be modeled by an appropriate data likelihood function, and how the intermittent cues (sound or motion) are best handled by generating proposal distributions from their likelihood functions. Finally, the effective fusion of the cues by particle filtering is demonstrated on real teleconference and surveillance data. Index Terms: visual tracking, data fusion, particle filters, sound, color, motion.
Monocular Pedestrian Detection: Survey and Experiments
2008
Pedestrian detection is a rapidly evolving area in computer vision with key applications in intelligent vehicles, surveillance and advanced robotics. The objective of this paper is to provide an overview of the current state of the art from both methodological and experimental perspectives. The first part of the paper consists of a survey. We cover the main components of a pedestrian detection system and the underlying models. The second (and larger) part of the paper contains a corresponding experimental study. We consider a diverse set of state-of-the-art systems: wavelet-based AdaBoost cascade [74], HOG/linSVM [11], NN/LRF [75] and combined shape-texture detection [23]. Experiments are performed on an extensive dataset captured onboard a vehicle driving through an urban environment. The dataset includes many thousands of training samples as well as a 27-minute test sequence involving more than 20000 images with annotated pedestrian locations. We consider a generic evaluation setting and one specific to pedestrian detection onboard a vehicle. Results indicate a clear advantage of HOG/linSVM at higher image resolutions and lower processing speeds, and a superiority of the wavelet-based AdaBoost cascade approach at lower image resolutions and (near) real-time processing speeds. The dataset (8.5 GB) is made public for benchmarking purposes.
Visual Tracking Decomposition
In CVPR, 2010
We propose a novel tracking algorithm that can work robustly in challenging scenarios in which several kinds of appearance and motion changes of an object occur at the same time. Our algorithm is based on a visual tracking decomposition scheme for the efficient design of observation and motion models as well as trackers. In our scheme, the observation model is decomposed into multiple basic observation models that are constructed by sparse principal component analysis (SPCA) of a set of feature templates. Each basic observation model covers a specific appearance of the object. The motion model is also represented by the combination of multiple basic motion models, each of which covers a different type of motion. Then the multiple basic trackers are designed by associating the basic observation models and the basic motion models, so that each specific tracker takes charge of a certain change in the object. All basic trackers are then integrated into one compound tracker through an interactive Markov Chain Monte Carlo (IMCMC) framework in which the basic trackers communicate with one another interactively while running in parallel. By exchanging information with the others, each tracker further improves its performance, which increases the overall tracking performance. Experimental results show that our method tracks the object accurately and reliably in realistic videos where the appearance and motion change drastically over time.
People Tracking Using Hybrid Monte Carlo Filtering
2001
Particle filters are used for hidden state estimation with nonlinear dynamical systems. The inference of 3D human motion is a natural application, given the nonlinear dynamics of the body and the nonlinear relation between states and image observations. However, the application of particle filters has been limited to cases where the number of state variables is relatively small, because the number of samples needed with high-dimensional problems can be prohibitive. We describe a filter that uses hybrid Monte Carlo (HMC) to obtain samples in high-dimensional spaces. It uses multiple Markov chains that use posterior gradients to rapidly explore the state space, yielding fair samples from the posterior. We find that the HMC filter is several thousand times faster than a conventional particle filter on a 28D people tracking problem.
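The way hybrid Monte Carlo uses posterior gradients to move through the state space can be illustrated with a single-chain sketch on a one-dimensional target. This is a generic HMC sampler with leapfrog integration, not the multi-chain filter described in the abstract. The Gaussian target, the step size, the trajectory length, and the function names are all assumptions chosen for the example.

```python
import math
import random


def hmc_step(x, log_prob, grad_log_prob, rng, step_size=0.2, n_leapfrog=10):
    """One HMC transition: leapfrog simulation plus a Metropolis test."""
    p = rng.gauss(0.0, 1.0)  # resample the auxiliary momentum
    current_h = -log_prob(x) + 0.5 * p * p
    x_new, p_new = x, p
    # Leapfrog: half momentum step, alternating full steps, half step.
    p_new += 0.5 * step_size * grad_log_prob(x_new)
    for i in range(n_leapfrog):
        x_new += step_size * p_new
        if i < n_leapfrog - 1:
            p_new += step_size * grad_log_prob(x_new)
    p_new += 0.5 * step_size * grad_log_prob(x_new)
    proposed_h = -log_prob(x_new) + 0.5 * p_new * p_new
    # Accept with probability min(1, exp(current_h - proposed_h)).
    if rng.random() < math.exp(min(0.0, current_h - proposed_h)):
        return x_new
    return x


def hmc_sample(log_prob, grad_log_prob, n_samples, x0=0.0, seed=0):
    """Run one Markov chain and return every visited state."""
    rng = random.Random(seed)
    x, samples = x0, []
    for _ in range(n_samples):
        x = hmc_step(x, log_prob, grad_log_prob, rng)
        samples.append(x)
    return samples


# Assumed toy posterior: a unit-variance Gaussian centered at 3.0.
samples = hmc_sample(lambda x: -0.5 * (x - 3.0) ** 2,
                     lambda x: -(x - 3.0), n_samples=2000)
```

In a tracking setting, the state would be a high-dimensional pose vector and the gradient would come from the image likelihood. Both are outside the scope of this sketch.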
Capturing Natural Hand Articulation
In ICCV, 2001
Vision-based motion capturing of hand articulation is a challenging task, since hand motion has many degrees of freedom. Model-based approaches could address this problem by searching in a high-dimensional hand state space and matching projections of a hand model against image observations. However, such a search is highly inefficient due to the curse of dimensionality. Fortunately, natural hand articulation is highly constrained, which largely reduces the dimensionality of the hand state space. This paper presents a model-based method to capture hand articulation by learning natural hand constraints. Our study shows that natural hand articulation lies in a lower-dimensional configuration space characterized by a union of linear manifolds spanned by a set of basis configurations. By integrating hand motion constraints, an efficient articulated motion-capturing algorithm is proposed based on sequential Monte Carlo techniques. Our experiments show that this algorithm is robust and accurate for tracking natural hand movements. This algorithm is easy to extend to other articulated motion capturing tasks.