## CONDENSATION-Conditional Density Propagation for Visual Tracking (1998)

### BibTeX

```bibtex
@MISC{Michael98condensation-conditionaldensity,
  author = {Michael Isard and Andrew Blake},
  title  = {CONDENSATION-Conditional Density Propagation for Visual Tracking},
  year   = {1998}
}
```

### Abstract

The problem of tracking curves in dense visual clutter is challenging. Kalman filtering is inadequate because it is based on Gaussian densities which, being unimodal, cannot represent simultaneous alternative hypotheses. The Condensation algorithm uses "factored sampling", previously applied to the interpretation of static images, in which the probability distribution of possible interpretations is represented by a randomly generated set. Condensation uses learned dynamical models, together with visual observations, to propagate the random set over time. The result is highly robust tracking of agile motion. Notwithstanding the use of stochastic methods, the algorithm runs in near real-time.

### Tracking Curves in Clutter

The purpose of this paper is to establish a stochastic framework for tracking curves in visual clutter, using a sampling algorithm. The approach is rooted in ideas from statistics, control theory and computer vision. The problem is to track outlines and features of foreground objects, modelled as curves, as they move in substantial clutter, and to do it at, or close to, video frame-rate. This is challenging because elements in the background clutter may mimic parts of foreground features. In the most severe case of camouflage, the background may consist of objects similar to the foreground object, for instance when a person is moving past a crowd. Our approach aims to dissolve the resulting ambiguity by applying probabilistic models of object shape and motion to analyse the video-stream. The degree of generality of these models is pitched carefully: sufficiently specific for effective disambiguation but sufficiently general to be broadly applicable over entire classes of foreground objects.

### Modelling Shape and Motion

Effective methods have arisen in computer vision for modelling shape and motion.
When suitable geometric models of a moving object are available, they can be matched effectively to image data, though usually at considerable computational cost. Finally, prior probability densities can be defined over the curves.

### Kalman Filters and Data-Association

Spatio-temporal estimation, the tracking of shape and position over time, has been dealt with thoroughly by Kalman filtering in the relatively clutter-free case, in which $p(x_t)$ can satisfactorily be modelled as Gaussian, and this can be applied to curves.

### Temporal Propagation of Conditional Densities

The Kalman filter, as a recursive linear estimator, is a special case, applying only to Gaussian densities, of a more general probability density propagation process. In continuous time this can be described in terms of diffusion, governed by a "Fokker-Planck" equation.

Figure 2 depicts probability density propagation as it occurs over a discrete time-step. There are three phases: drift due to the deterministic component of object dynamics; diffusion due to the random component; and reactive reinforcement due to observations. The random component of the dynamical model leads to spreading (increasing uncertainty), while the deterministic component causes the density function to drift bodily. The effect of an external observation $z_t$ is to superimpose a reactive effect on the diffusion, in which the density tends to peak in the vicinity of observations. In clutter, there are typically several competing observations, and these tend to encourage a non-Gaussian state-density.

The Condensation algorithm is designed to address this more general situation. It has the striking property that, generality notwithstanding, it is a considerably simpler algorithm than the Kalman filter. Moreover, despite its use of random sampling, which is often thought to be computationally inefficient, the Condensation algorithm runs in near real-time.
This is because tracking over time maintains relatively tight distributions for shape at successive time-steps, particularly so given the availability of accurate, learned models of shape and motion.

### Discrete-Time Propagation of State Density

For computational purposes, the propagation process must be set out in terms of discrete time $t$. The state of the modelled object at time $t$ is denoted $x_t$ and its history is $X_t = \{x_1, \ldots, x_t\}$. Similarly, the set of image features at time $t$ is $z_t$ with history $Z_t = \{z_1, \ldots, z_t\}$. Note that no functional assumptions (linearity, Gaussianity, unimodality) are made about densities in the general treatment, though particular choices will be made in due course in order to demonstrate the approach.

### Stochastic Dynamics

A somewhat general assumption is made for the probabilistic framework: that the object dynamics form a temporal Markov chain, so that

$$p(x_t \mid X_{t-1}) = p(x_t \mid x_{t-1})$$

and the new state is conditioned directly only on the immediately preceding state, independent of the earlier history. This still allows quite general dynamics, including stochastic difference equations of arbitrary order; we use second-order models, and details are given later. The dynamics are entirely determined, therefore, by the form of the conditional density $p(x_t \mid x_{t-1})$. For instance,

$$p(x_t \mid x_{t-1}) = \frac{1}{\sqrt{2\pi}} \exp\left(-\tfrac{1}{2}(x_t - x_{t-1} - 1)^2\right)$$

represents a one-dimensional random walk (discrete diffusion) whose step length is a standard normal variate, superimposed on a rightward drift at unit speed. Of course, for realistic problems, the state $x$ is multidimensional and the density is more complex (and, in the applications presented later, learned from training sequences).

### Measurement

Observations $z_t$ are assumed to be independent, both mutually and with respect to the dynamical process.
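As a concrete illustration of such a conditional density, the drifting random walk can be simulated directly. A minimal Python sketch (the function name and step count are illustrative, not from the paper) draws successive states $x_t = x_{t-1} + 1 + w_t$ with $w_t$ a standard normal variate:

```python
import numpy as np

def propagate(x_prev, rng):
    """One step of the one-dimensional random walk with unit rightward
    drift: x_t = x_{t-1} + 1 + w_t, where w_t ~ N(0, 1)."""
    return x_prev + 1.0 + rng.standard_normal()

# Simulate a trajectory: after T steps the expected displacement is T.
rng = np.random.default_rng(0)
x = 0.0
for _ in range(1000):
    x = propagate(x, rng)
drift_per_step = x / 1000   # should be close to 1
```

Sampling from $p(x_t \mid x_{t-1})$ in this forward fashion is all the Condensation algorithm requires of the dynamical model; the density never needs to be evaluated in closed form during prediction.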
This is expressed probabilistically as follows:

$$p(Z_{t-1}, x_t \mid X_{t-1}) = p(x_t \mid x_{t-1})\, p(Z_{t-1} \mid X_{t-1}).$$

Note that integrating over $x_t$ implies the mutual conditional independence of observations:

$$p(Z_t \mid X_t) = \prod_{i=1}^{t} p(z_i \mid x_i).$$

The observation process is therefore defined by specifying the conditional density $p(z_t \mid x_t)$ at each time $t$, and later, in computational examples, we take this to be a time-independent function $p(z \mid x)$. Suffice it to say for now that, in clutter, the observation density is multi-modal. Details will be given in Section 6.

### Propagation

Given a continuous-valued Markov chain with independent observations, the conditional state-density $p_t$ at time $t$ is defined by

$$p_t(x_t) \equiv p(x_t \mid Z_t).$$

This represents all information about the state at time $t$ that is deducible from the entire data-stream up to that time. The rule for propagation of state density over time is

$$p(x_t \mid Z_t) = k_t\, p(z_t \mid x_t)\, p(x_t \mid Z_{t-1}),$$

where

$$p(x_t \mid Z_{t-1}) = \int_{x_{t-1}} p(x_t \mid x_{t-1})\, p(x_{t-1} \mid Z_{t-1})\, dx_{t-1}$$

and $k_t$ is a normalisation constant that does not depend on $x_t$. The validity of the rule is proved in the appendix. The propagation rule should be interpreted simply as the equivalent, for the time-varying case, of Bayes' rule for inferring posterior state density from data. The effective prior $p(x_t \mid Z_{t-1})$ is actually a prediction taken from the posterior $p(x_{t-1} \mid Z_{t-1})$ of the previous time-step, onto which is superimposed one time-step of the dynamical model (Fokker-Planck drift plus diffusion, as in Figure 2).

### Factored Sampling

This section first describes the factored sampling algorithm, dealing with non-Gaussian observations in single images. Factored sampling is then extended in the following section to deal with temporal image sequences. A standard problem in statistical pattern recognition is to find an object parameterised as $x$ with prior $p(x)$, using data $z$ from a single image. The posterior density $p(x \mid z)$ represents all the knowledge about $x$ that is deducible from the data. It can be evaluated, in principle, by applying Bayes' rule

$$p(x \mid z) = k\, p(z \mid x)\, p(x),$$

where $k$ is a normalisation constant that is independent of $x$.
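On a discretised state-space, the propagation rule reduces to a matrix-vector product (prediction) followed by a pointwise Bayes correction. The following Python sketch illustrates one propagation step; the grid size, densities, and function name are illustrative assumptions, not taken from the paper:

```python
import numpy as np

def propagate_density(p_prev, transition, likelihood):
    """One discrete-time propagation step on a state grid:
    prediction:  p(x_t | Z_{t-1}) = sum_j p(x_t | x_{t-1}=j) p(x_{t-1}=j | Z_{t-1})
    correction:  p(x_t | Z_t) = k_t p(z_t | x_t) p(x_t | Z_{t-1}).
    transition[i, j] = p(x_t = i | x_{t-1} = j); each column sums to one."""
    prediction = transition @ p_prev
    posterior = likelihood * prediction
    return posterior / posterior.sum()   # k_t absorbed by the normalisation

# Illustrative use: unit rightward drift with Gaussian diffusion on a 40-cell grid.
K = 40
i = np.arange(K)
transition = np.exp(-0.5 * (i[:, None] - i[None, :] - 1.0) ** 2)
transition /= transition.sum(axis=0)        # normalise each column
p_prev = np.zeros(K); p_prev[10] = 1.0      # posterior at t-1: certain at cell 10
likelihood = np.exp(-0.5 * (i - 11) ** 2)   # observation near cell 11
posterior = propagate_density(p_prev, transition, likelihood)
```

Grid methods like this scale poorly with state dimension, which is precisely the motivation for the sampling representation introduced next.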
In cases where $p(z \mid x)$ is sufficiently complex that $p(x \mid z)$ cannot be evaluated simply in closed form, iterative sampling techniques can be used.

Figure 3 illustrates factored sampling: a set of points $s^{(n)}$, the centres of the blobs in the figure, is sampled randomly from a prior density $p(x)$. Each sample is assigned a weight $\pi_n$ (depicted by blob area) in proportion to the value of the observation density $p(z \mid x = s^{(n)})$. The weighted point-set then serves as a representation of the posterior density $p(x \mid z)$, suitable for sampling. The one-dimensional case illustrated there extends naturally to the practical case that the density is defined over several position and shape variables.

In the factored sampling algorithm, a sample-set $\{s^{(1)}, \ldots, s^{(N)}\}$ is generated from the prior density $p(x)$, and an index $n \in \{1, \ldots, N\}$ is chosen with probability $\pi_n$, where

$$\pi_n = \frac{p_z(s^{(n)})}{\sum_{j=1}^{N} p_z(s^{(j)})}, \qquad p_z(x) = p(z \mid x),$$

the conditional observation density. The value $x = s^{(n)}$ chosen in this fashion has a distribution which approximates the posterior $p(x \mid z)$ increasingly accurately as $N$ increases. Note that posterior mean properties $E[g(x) \mid z]$ can be generated directly from the samples $\{s^{(n)}\}$ by weighting with $p_z(x)$ to give

$$E[g(x) \mid z] \approx \frac{\sum_{n=1}^{N} g(s^{(n)})\, p_z(s^{(n)})}{\sum_{n=1}^{N} p_z(s^{(n)})}.$$

For example, the mean can be estimated using $g(x) = x$. For low-dimensional parameterisations, as in this paper, standard direct methods can be used to sample from Gaussian priors.

### The CONDENSATION Algorithm

The Condensation algorithm is based on factored sampling, but extended to apply iteratively to successive images in a sequence. The same sampling strategy has been developed elsewhere. Given that the process at each time-step is a self-contained iteration of factored sampling, the output of an iteration will be a weighted, time-stamped sample-set, denoted $\{s_t^{(n)}, n = 1, \ldots, N\}$ with weights $\pi_t^{(n)}$, representing approximately the conditional state-density $p(x_t \mid Z_t)$ at time $t$. How is this sample-set obtained?
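Factored sampling for a single image can be sketched in a few lines of Python. The Gaussian prior and likelihood below are illustrative assumptions chosen so that the true posterior mean ($z/2$) is known in closed form; in the paper, $p(z \mid x)$ is the multi-modal clutter density of Section 6:

```python
import numpy as np

def factored_sampling(sample_prior, p_z, z, n, rng):
    """Draw s^(1..N) from the prior p(x) and weight each sample in
    proportion to p_z(s^(n)) = p(z | x = s^(n)); the weighted set
    represents the posterior p(x | z)."""
    s = sample_prior(n, rng)
    w = p_z(z, s)
    return s, w / w.sum()

# Illustrative use: prior N(0, 1) and likelihood N(z; x, 1), for which
# the exact posterior is N(z/2, 1/2).
rng = np.random.default_rng(1)
z = 1.0
s, pi = factored_sampling(lambda n, r: r.standard_normal(n),
                          lambda z, s: np.exp(-0.5 * (z - s) ** 2),
                          z, 100_000, rng)
mean_est = np.sum(pi * s)   # E[x | z], approaches z/2 as N grows
```

The weighted mean computed on the last line is exactly the estimator $E[g(x) \mid z]$ above with $g(x) = x$.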
Clearly, the process must begin with a prior density, and the effective prior for time-step $t$ should be $p(x_t \mid Z_{t-1})$. This prior is of course multi-modal in general and no functional representation of it is available. It is derived from the sample-set representation $\{(s_{t-1}^{(n)}, \pi_{t-1}^{(n)})\}$, the output from the previous time-step, to which the prediction step must then be applied. The iterative process as applied to sample-sets is depicted in the accompanying figure.

The aim is to maintain, at successive time-steps, sample-sets of fixed size $N$, so that the algorithm can be guaranteed to run within a given computational resource. The first operation therefore is to sample (with replacement) $N$ times from the set $\{s_{t-1}^{(n)}\}$, choosing a given element with probability $\pi_{t-1}^{(n)}$. Some elements, especially those with high weights, may be chosen several times, leading to identical copies of elements in the new set. Others with relatively low weights may not be chosen at all. Each element chosen from the new set is now subjected to the predictive steps. First, an element undergoes drift and, since this is deterministic, identical elements in the new set undergo the same drift; this is apparent in the figure. Each element then undergoes the random diffusion step, so that identical copies separate. Finally, the observation density $p(z_t \mid x_t)$ is evaluated at each new sample position to generate the weights $\pi_t^{(n)}$, yielding the sample-set representation $\{(s_t^{(n)}, \pi_t^{(n)})\}$ of state-density for time $t$.

One of the striking properties of the Condensation algorithm is its simplicity, compared with the Kalman filter, despite its generality. Largely, this is due to the absence of the Riccati equation, which appears in the Kalman filter for the propagation of covariance. The Riccati equation is relatively complex computationally, but is not required in the Condensation algorithm, which instead deals with variability by sampling, involving the repeated computation of a relatively simple propagation formula.
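The select-drift-diffuse-measure cycle can be sketched as one Python function. This is a toy one-dimensional tracker; the unit drift, noise scales, and Gaussian observation density are illustrative assumptions standing in for the paper's learned models:

```python
import numpy as np

def condensation_step(s, pi, z, rng, sigma_diff=0.5, sigma_obs=0.3):
    """One Condensation iteration for a one-dimensional state:
    1. select:  resample N times with replacement, probability pi^(n);
    2. drift:   deterministic part of the dynamics (here, unit drift);
    3. diffuse: add random noise, so identical copies separate;
    4. measure: weight each sample by the observation density p(z_t | x_t)."""
    n = len(s)
    s = s[rng.choice(n, size=n, p=pi)]              # select
    s = s + 1.0                                     # drift
    s = s + sigma_diff * rng.standard_normal(n)     # diffuse
    w = np.exp(-0.5 * ((z - s) / sigma_obs) ** 2)   # measure
    return s, w / w.sum()
```

Iterating this step with observations generated near the true object position keeps the weighted mean $\sum_n \pi^{(n)} s^{(n)}$ close to the target, and the fixed sample-set size $N$ bounds the cost of every iteration.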
### Stochastic Dynamical Models for Curve Motion

In order to apply the Condensation algorithm, which is general, to tracking curves in image-streams, specific probability densities must be established both for the dynamics of the object and for the observation process. In the examples described here, $x$ is a linear parameterisation of the curve, and allowed transformations of the curve are represented by linear transformations of $x$. The Condensation algorithm does not necessarily demand a linear parameterisation, though linearity is an attraction for another reason: the availability of algorithms to learn object dynamics. The algorithm could also be used, in principle, with nonlinear parameterised kinematics, for instance representing an articulated hand in terms of joint angles.

### Linear Parameterisations of Splines for Tracking

We represent the state of a tracked object following methods established for tracking using a Kalman filter, modelling the outline as a parametric B-spline curve

$$r(s) = \left(B(s)^T Q^x,\; B(s)^T Q^y\right), \qquad 0 \le s \le L,$$

where $B(s) = (B_1(s), \ldots, B_{N_B}(s))^T$ is a vector of B-spline basis functions, $Q^x$ and $Q^y$ are vectors of B-spline control-point coordinates, and $L$ is the number of spans. It is usually desirable to restrict the curve to a low-dimensional shape-space, defined by

$$\begin{pmatrix} Q^x \\ Q^y \end{pmatrix} = W x + \begin{pmatrix} \bar{Q}^x \\ \bar{Q}^y \end{pmatrix},$$

where the matrix $W$ has rank $N_X$, considerably lower than the $2N_B$ degrees of freedom of the unconstrained spline. Typically the shape-space may allow affine deformations of the template shape $\bar{Q}$, or more generally a space of rigid and non-rigid deformations. The space is constructed by applying an appropriate combination of three methods to build a $W$-matrix:

1. determining analytically combinations of contours derived from one or more views of a template;
2. capturing sequences of key frames of the object in characteristic poses;
3. performing principal components analysis on a set of outlines of the deforming object.

### Dynamical Model

Exploiting earlier work on dynamical modelling, object dynamics are modelled as a second-order process, written in terms of the state-vector

$$\mathcal{X}_t = \begin{pmatrix} x_{t-1} \\ x_t \end{pmatrix}$$

as

$$\mathcal{X}_t - \bar{\mathcal{X}} = A\,(\mathcal{X}_{t-1} - \bar{\mathcal{X}}) + B\, w_t,$$

where $w_t$ are independent vectors of independent standard normal variables, $\bar{\mathcal{X}}$ is the mean value of the state, and $A$, $B$ are matrices representing the deterministic and stochastic components of the dynamical model, respectively.
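As a sketch of an affine shape-space of the kind mentioned above, a $W$-matrix for a template with control points $(\bar{Q}^x, \bar{Q}^y)$ can be assembled from six columns: two translations and four affine deformations of the template. The function name and the particular column ordering here are illustrative assumptions, not prescribed by the paper:

```python
import numpy as np

def affine_shape_space(qx_bar, qy_bar):
    """Build W for a planar-affine shape-space (6-dimensional x),
    mapping shape parameters to control-point displacements:
    (Q^x; Q^y) = W x + (Qbar^x; Qbar^y)."""
    nb = len(qx_bar)
    one, zero = np.ones(nb), np.zeros(nb)
    return np.column_stack([
        np.concatenate([one, zero]),      # translation in x
        np.concatenate([zero, one]),      # translation in y
        np.concatenate([qx_bar, zero]),   # horizontal scaling
        np.concatenate([zero, qy_bar]),   # vertical scaling
        np.concatenate([qy_bar, zero]),   # shear / rotation component
        np.concatenate([zero, qx_bar]),   # shear / rotation component
    ])
```

For a generic (non-degenerate) template the six columns are linearly independent, so $W$ has rank $N_X = 6$, far below the $2N_B$ degrees of freedom of the unconstrained spline.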
The system is a set of damped oscillators, whose modes, natural frequencies and damping constants are determined by $A$, driven by random accelerations coupled into the dynamics via $B$ from the noise term $B w_t$. While it is possible to set sensible defaults for $A$, $\bar{\mathcal{X}}$ and $B$, it is more satisfactory and effective to estimate them from input data taken while the object performs typical motions. Methods for doing this via Maximum Likelihood Estimation are essential to the work described here and are described fully elsewhere. The learned dynamical models are therefore appropriate for use in the Condensation algorithm.

### Initial Conditions

Initial conditions for tracking can be determined by specifying the prior density $p(x_0)$; if this is Gaussian, direct sampling can be used to initialise the Condensation algorithm. Alternatively, it is possible simply to allow the density $p(x_t)$ to settle to a steady state $p(x_\infty)$ in the absence of object measurements. Provided the learned dynamics are stable (free of undamped oscillations), a unique steady state exists; it is Gaussian, with parameters that can be computed by iterating the Riccati equation. (Background clutter, if present, will modify and bias this envelope to some extent.) Then, as soon as the foreground object arrives and is measured, the density $p(x_t)$ begins to evolve appropriately.

### Observation Model

The observation process defined by $p(z_t \mid x_t)$ is assumed here to be stationary in time (though the Condensation algorithm does not necessarily demand this), so a static function $p(z \mid x)$ needs to be specified. As yet we have no capability to estimate it from data, though that would be ideal, so some reasonable assumptions must be made. First, a measurement model for one-dimensional data with clutter is suggested.
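The second-order model can be simulated directly. The Python sketch below uses a one-dimensional damped oscillator written in companion form; the coefficients $\rho$, $\theta$ and the noise scale are illustrative assumptions standing in for learned values of $A$ and $B$:

```python
import numpy as np

def simulate(x_bar, A, B, x0, n_steps, rng):
    """Iterate X_t - X_bar = A (X_{t-1} - X_bar) + B w_t, with w_t a
    vector of independent standard normal variables."""
    x = np.array(x0, dtype=float)
    traj = [x.copy()]
    for _ in range(n_steps):
        w = rng.standard_normal(B.shape[1])
        x = x_bar + A @ (x - x_bar) + B @ w
        traj.append(x.copy())
    return np.array(traj)

# Damped oscillator in companion form: the position p_t follows
# p_t = 2 rho cos(theta) p_{t-1} - rho^2 p_{t-2} + noise, stable for rho < 1.
rho, theta = 0.9, 0.3
A = np.array([[0.0, 1.0],
              [-rho**2, 2 * rho * np.cos(theta)]])
B = np.array([[0.0], [0.05]])   # random accelerations enter via B
x_bar = np.zeros(2)
traj = simulate(x_bar, A, B, x0=[1.0, 1.0], n_steps=2000,
                rng=np.random.default_rng(3))
```

Because the poles $\rho e^{\pm i\theta}$ lie inside the unit circle, the trajectory settles into a stationary Gaussian envelope around $\bar{\mathcal{X}}$, illustrating the steady state $p(x_\infty)$ used for initialisation.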
Then an extension is proposed for two-dimensional observations, which is also used later in computational experiments.

### One-Dimensional Observations in Clutter

In one dimension, observations reduce to a set of scalar positions $z = (z_1, z_2, \ldots, z_M)$ and the observation density has the form $p(z \mid x)$, where $x$ is one-dimensional position. The multiplicity of measurements reflects the presence of clutter: either one of the events $\phi_m$, that the $m$-th measurement is the true one, occurs, or else the target object is not visible, with probability $q = 1 - \sum_m P(\phi_m)$. Such reasoning about clutter and false alarms is commonly used in target tracking. A reasonable functional form for the density can be obtained by making some specific assumptions: that $P(\phi_m) = p$ for all $m$, that the clutter is a Poisson process along the line with spatial density $\lambda$, and that any true target measurement is unbiased and normally distributed with standard deviation $\sigma$. This leads to

$$p(z \mid x) \propto 1 + \frac{1}{\sqrt{2\pi}\,\sigma\alpha} \sum_m \exp\left(-\frac{\nu_m^2}{2\sigma^2}\right),$$

where $\alpha = q\lambda$ and $\nu_m = z_m - x$, and is illustrated in the accompanying figure. The number $M$ of measurements is not itself informative here, for instance because $x$ is assumed to lie always within the image window; in that case, by Bayes' theorem, the event $\psi_M$, that exactly $M$ features are detected, provides no additional information about the position $x$. (If $x$ is allowed also to fall outside the image window, then the event $\psi_M$ is informative: a value of $M$ well above the mean value for the background clutter enhances the probability that $x$ lies within the window.)

### Two-Dimensional Observations

In a two-dimensional image, the set of observations $z$ is, in principle, the entire set of features visible in the image. However, an important aspect of earlier systems in achieving real-time performance has been the restriction of measurements to a sparse set of lines normal to the tracked curve. The observation density $p(z \mid x)$ in two dimensions describes the distribution of a (linearly) parameterised image curve $z(s)$, given a hypothetical shape in the form of a curve $r(s)$, $0 \le s \le 1$, represented by a shape parameter $x$. The two-dimensional density can be derived as an extension of the one-dimensional case.
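The one-dimensional clutter density above is straightforward to evaluate: a uniform clutter "floor" plus a Gaussian peak at each feature position. A Python sketch (the parameter values in the comment are illustrative assumptions):

```python
import numpy as np

def observation_density_1d(z, x, sigma, alpha):
    """Unnormalised p(z | x) ∝ 1 + (1 / (sqrt(2 pi) sigma alpha))
    * sum_m exp(-nu_m^2 / (2 sigma^2)), with nu_m = z_m - x:
    a constant clutter floor plus a Gaussian peak at each feature.
    z: array of feature positions; x: hypothesised target position;
    sigma: measurement noise scale; alpha = q * lambda."""
    nu = np.asarray(z, dtype=float) - x
    peaks = np.exp(-nu ** 2 / (2 * sigma ** 2)).sum()
    return 1.0 + peaks / (np.sqrt(2 * np.pi) * sigma * alpha)
```

Evaluated over a range of hypotheses $x$, this density is multi-modal, with one peak per feature; this multi-modality is exactly what defeats the unimodal Kalman filter and motivates the sample-set representation.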
It is assumed that a mapping $g(s)$ is known that associates each point $z(s)$ on the image curve with a point $r(g(s))$ on the shape. In practice, this mapping is set up by tracing normals from the curve $r$. Note that $g(s)$ is not necessarily injective, because $z(s)$ includes clutter as well as foreground features. Next, the one-dimensional density above is approximated in a more amenable form that neglects the possibility of more than one feature lying inside the search interval:

$$p(z \mid x) \propto \exp\left(-\frac{1}{2\sigma^2}\min(\nu_1^2, \mu^2)\right),$$

where $\mu = \sigma\sqrt{2\log\bigl(1/(\sqrt{2\pi}\,\alpha\sigma)\bigr)}$ is a spatial scale constant, and $\nu_1$ is the $\nu_m$ with smallest magnitude, representing the feature lying closest to the hypothesised position $x$.

Figure 8 shows the observation process: the thick line is a hypothesised shape, represented as a parametric spline curve; the spines are curve normals, along which high-contrast features (white crosses) are sought.

A natural extension to two dimensions is then

$$p(z \mid x) \propto \exp\left(-\int_0^1 \frac{1}{2r}\,\min\left(\|z_1(s) - r(s)\|^2,\; \mu^2\right)\, ds\right),$$

in which $r$ is a variance constant and $z_1(s)$ is the closest associated feature to $r(s)$. Note that the constant of proportionality ("partition function") $Z(x)$ is an unknown function. We make the assumption that the variation of $Z$ with $x$ is slow compared with the other term in the density, so that $Z$ can be treated as constant. It remains to establish whether this assumption is justified. The observation density can be computed via a discrete approximation, the simplest being

$$p(z \mid x) \propto \exp\left(-\sum_{m=1}^{M} \frac{1}{2rM}\,\min\left(\|z_1(s_m) - r(s_m)\|^2,\; \mu^2\right)\right),$$

where $s_m = m/M$. This is simply the product of one-dimensional densities of the form above, with $\sigma = \sqrt{rM}$, evaluated independently along $M$ curve normals, as in Figure 8.

### Applying the CONDENSATION Algorithm to Video-Streams

Four examples are shown here of the practical efficacy of the Condensation algorithm. Movie (MPEG) versions of some results are available on the web at http://www.robots.ox.ac.uk/~ab/.
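The discrete approximation is cheap to compute once the nearest feature along each of the $M$ normals has been found. A Python sketch (the function name and parameter values are illustrative assumptions):

```python
import numpy as np

def observation_density_2d(nu1, mu, r):
    """Unnormalised p(z | x) ∝ exp( - sum_m min(nu1_m^2, mu^2) / (2 r M) ),
    where nu1_m is the distance along the m-th curve normal from the
    hypothesised point r(s_m) to the nearest image feature z_1(s_m),
    mu is the spatial scale constant and r the variance constant."""
    nu1 = np.asarray(nu1, dtype=float)
    m = len(nu1)
    return np.exp(-np.minimum(nu1 ** 2, mu ** 2).sum() / (2 * r * m))
```

A hypothesis whose normals all find nearby features scores higher than one whose normals find only distant clutter, and the truncation at $\mu$ means a single far-off outlier cannot veto an otherwise good fit.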
### Tracking a Multi-Modal Distribution

The ability of the Condensation algorithm to represent multi-modal distributions was tested using a 70-frame (2.8 s) sequence of a cluttered room containing three people, each facing the camera.