Real-Time Tracking of Non-Rigid Objects using Mean Shift
- IEEE CVPR 2000
, 2000
Cited by 424 (16 self)
A new method for real-time tracking of non-rigid objects seen from a moving camera is proposed. The central computational module is based on the mean shift iterations and finds the most probable target position in the current frame. The dissimilarity between the target model (its color distribution) and the target candidates is expressed by a metric derived from the Bhattacharyya coefficient. The theoretical analysis of the approach shows that it relates to the Bayesian framework while providing a practical, fast and efficient solution. The capability of the tracker to handle in real-time partial occlusions, significant clutter, and target scale variations, is demonstrated for several image sequences.
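As a rough illustration of the similarity measure named in this abstract, the sketch below computes the Bhattacharyya coefficient between two normalized color histograms. The abstract does not give the exact metric, so the distance sqrt(1 - rho) is an assumption based on the commonly used form. The function names and the toy histograms are hypothetical.

```python
import math


def normalize(hist):
    """Scale non-negative bin counts so that they sum to 1."""
    total = float(sum(hist))
    if total <= 0.0:
        raise ValueError("histogram must contain at least one positive count")
    return [h / total for h in hist]


def bhattacharyya_coefficient(p, q):
    """Return rho = sum_u sqrt(p_u * q_u) for two discrete distributions."""
    if len(p) != len(q):
        raise ValueError("histograms must have the same number of bins")
    return sum(math.sqrt(pu * qu) for pu, qu in zip(p, q))


def bhattacharyya_distance(p, q):
    """Assumed form of the derived metric: sqrt(1 - rho)."""
    rho = bhattacharyya_coefficient(p, q)
    # Clamp to guard against rho exceeding 1 by floating-point rounding.
    return math.sqrt(max(0.0, 1.0 - rho))


# Hypothetical 3-bin color histograms for a target model and a candidate.
target = normalize([1, 1, 2])
candidate = normalize([2, 1, 1])
similarity = bhattacharyya_coefficient(target, candidate)
```

Identical distributions give a coefficient of 1 and a distance of 0, while distributions with no overlapping bins give a coefficient of 0 and a distance of 1. A tracker of this kind would look for the candidate position that maximizes the coefficient.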
The Variable Bandwidth Mean Shift and Data-Driven Scale Selection
- in Proc. 8th Intl. Conf. on Computer Vision
, 2001
Cited by 73 (9 self)
We present two solutions for the scale selection problem in computer vision. The first one is completely nonparametric and is based on the adaptive estimation of the normalized density gradient. Employing the sample point estimator, we define the Variable Bandwidth Mean Shift, prove its convergence, and show its superiority over the fixed bandwidth procedure. The second technique has a semiparametric nature and imposes a local structure on the data to extract reliable scale information. The local scale of the underlying density is taken as the bandwidth which maximizes the magnitude of the normalized mean shift vector. Both estimators provide practical tools for autonomous image and quasi real-time video analysis and several examples are shown to illustrate their effectiveness.
1 Motivation for Variable Bandwidth
The efficacy of Mean Shift analysis has been demonstrated in computer vision problems such as tracking and segmentation in [5, 6]. However, one of the limitations of the mean shift procedure as defined in these papers is that it involves the specification of a scale parameter. While the results obtained appear satisfactory, when the local characteristics of the feature space differ significantly across data, it is difficult to find an optimal global bandwidth for the mean shift procedure. In this paper we address the issue of locally adapting the bandwidth. We also study an alternative approach for data-driven scale selection which imposes a local structure on the data. The proposed solutions are tested in the framework of quasi real-time video analysis. We review first the intrinsic limitations of the fixed bandwidth density estimation methods. Then, two of the most popular variable bandwidth estimators, the balloon and the sample point, are introduced and...
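To make the idea of a per-sample bandwidth concrete, here is a minimal one-dimensional sketch of a mean shift iteration in which every data point carries its own bandwidth. It assumes a Gaussian kernel and the commonly stated sample point weighting of h_i^-(d+2), with d = 1. The abstract does not spell out the update, so treat this as an illustration and not as the authors' implementation. The function name and the way bandwidths are supplied are hypothetical.

```python
import math


def variable_bandwidth_mean_shift(points, bandwidths, start,
                                  tol=1e-9, max_iter=500):
    """Seek a density mode in 1-D using one bandwidth per sample.

    points     -- sample locations x_i
    bandwidths -- one positive bandwidth h_i per sample (hypothetical input;
                  choosing them is outside the scope of this sketch)
    start      -- initial position of the iteration
    """
    if len(points) != len(bandwidths):
        raise ValueError("need exactly one bandwidth per point")
    x = float(start)
    for _ in range(max_iter):
        num = 0.0
        den = 0.0
        for xi, hi in zip(points, bandwidths):
            # Gaussian weight scaled by h_i**-3, i.e. h_i**-(d+2) with d = 1.
            w = math.exp(-0.5 * ((x - xi) / hi) ** 2) / hi ** 3
            num += w * xi
            den += w
        if den == 0.0:
            # All weights underflowed: no sample is close enough to move x.
            break
        new_x = num / den
        if abs(new_x - x) < tol:
            return new_x
        x = new_x
    return x


# With equal bandwidths this reduces to the fixed-bandwidth procedure.
mode = variable_bandwidth_mean_shift([-1.0, 0.0, 1.0], [1.0, 1.0, 1.0], 0.3)
```

Each step moves the current position to a weighted average of the samples, so the iterate climbs toward a nearby mode of the estimated density. Samples with small bandwidths receive larger weights near their own location, which is what lets the estimate adapt to local structure.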
Robust Detection and Tracking of Human Faces with an Active Camera
, 2000
Cited by 35 (1 self)
We present an efficient framework for the detection and tracking of human faces with an active camera. The Bhattacharyya coefficient is employed as a similarity measure between the color distribution of the face model and face candidates. The proper derivation of these distributions allows the use of the spatial gradient of the Bhattacharyya coefficient to guide a fast search for the best face candidate. The optimization, which is based on mean shift analysis, requires only a few iterations to converge. Scale changes of the tracked face are handled by exploiting the scale invariance of the similarity measure and the luminance gradient computed on the border of the hypothesized face region. The detection and tracking modules are almost identical, the difference being that the detection involves mean shift optimization with multiple initializations. Our dual-mode implementation of the camera controller determines the pan, tilt, and zoom camera to switch between smooth pursuit and saccadic movements, as a function of the target presence in the fovea region. The resulting system runs in real-time on a standard PC, being robust to partial occlusion, clutter, face scale variations, rotations in depth, and fast changes in subject/camera position.
Efficient Kernel Density Estimation using the Fast Gauss Transform with Applications to Color Modeling and Tracking
- IEEE Transactions on Pattern Analysis and Machine Intelligence
, 2003
Cited by 28 (0 self)
The study of many vision problems is reduced to the estimation of a probability density function from observations. Kernel density estimation techniques are quite general and powerful methods for this problem, but have a significant disadvantage in that they are computationally intensive. In this paper we explore the use of kernel density estimation with the fast Gauss transform (FGT) for problems in vision. The FGT allows the summation of a mixture of M Gaussians at N evaluation points in O(M + N) time as opposed to O(MN) time for a naive evaluation, and can be used to considerably speed up kernel density estimation. We present applications of the technique to problems from image segmentation and tracking, and show that the algorithm allows application of advanced statistical techniques to solve practical vision problems in real time with today’s computers.
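For reference, the naive O(MN) evaluation that this abstract contrasts with the FGT can be sketched as follows. This is only the direct baseline, not the fast Gauss transform itself. It assumes a one-dimensional Gaussian kernel with a single bandwidth h; the function and variable names are hypothetical.

```python
import math


def gaussian_kde_naive(sources, targets, h):
    """Evaluate a 1-D Gaussian kernel density estimate by direct summation.

    Every one of the N targets visits every one of the M sources, so the
    cost is O(M * N); this is the baseline an FGT-style method speeds up.
    """
    if h <= 0.0:
        raise ValueError("bandwidth must be positive")
    if not sources:
        raise ValueError("need at least one source point")
    norm = 1.0 / (len(sources) * h * math.sqrt(2.0 * math.pi))
    densities = []
    for t in targets:
        total = 0.0
        for s in sources:
            total += math.exp(-0.5 * ((t - s) / h) ** 2)
        densities.append(norm * total)
    return densities


# Hypothetical samples and evaluation points.
samples = [0.0, 0.5, 2.0]
grid = [-1.0, 0.0, 1.0, 2.0]
estimate = gaussian_kde_naive(samples, grid, h=0.5)
```

The doubly nested loop is what makes the direct approach expensive when both the number of samples and the number of evaluation points are large, as is typical for per-pixel color models.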
Gamut constrained illuminant estimation
- International Journal of Computer Vision
, 2006
Cited by 23 (0 self)
This paper presents a novel solution to the illuminant estimation problem: the problem of how, given an image of a scene taken under an unknown illuminant, we can recover an estimate of that light. The work is founded on previous gamut mapping solutions to the problem which solve for a scene illuminant by determining the set of diagonal mappings which take image data captured under an unknown light to a gamut of reference colours taken under a known light. Unfortunately a diagonal model is not always a valid model of illumination change and so previous approaches sometimes return a null solution. In addition, previous methods are difficult to implement. We address these problems by recasting the problem as one of illuminant classification: we define a priori a set of plausible lights thus ensuring that a scene illuminant estimate will always be found. A plausible light is represented by the gamut of colours observable under it and the illuminant in an image is classified by determining the plausible light whose gamut is most consistent with the image data. We show that this step (the main computational burden of the algorithm) can be performed simply, quickly, and efficiently by means of a non-negative least-squares optimisation. We report results on a large set of real images which show that it provides excellent illuminant estimation, outperforming previous algorithms.
Diffusions And Confusions In Signal And Image Processing
, 2001
Cited by 22 (4 self)
In this paper we link, through simple examples, three basic approaches for signal and image denoising and segmentation: 1) PDE axiomatics, 2) energy minimization, and 3) adaptive filtering. We show the relation between PDEs that are derived from a master energy functional, i.e. the Polyakov harmonic action, and non-linear filters of robust statistics. This relation gives a simple and intuitive way of understanding geometric differential filters like the Beltrami flow. The relation between PDEs and filters is mediated through the short time kernel.
Mean Shift And Optimal Prediction For Efficient Object Tracking
- International Conference on Image Processing
, 2000
Cited by 19 (0 self)
A new paradigm for the efficient color-based tracking of objects seen from a moving camera is presented. The proposed technique employs the mean shift analysis to derive the target candidate that is the most similar to a given target model, while the prediction of the next target location is computed with a Kalman filter. The dissimilarity between the target model and the target candidates is expressed by a metric based on the Bhattacharyya coefficient. The implementation of the new method achieves real-time performance, being appropriate for a large variety of objects with different color patterns. The resulting tracking, tested on various sequences, is robust to partial occlusion, significant clutter, target scale variations, rotations in depth, and changes in camera position.
Support vector machine based multi-view face detection and recognition
, 2004
Cited by 15 (1 self)
Detecting faces across multiple views is more challenging than in a fixed view, e.g. frontal view, owing to the significant non-linear variation caused by rotation in depth, self-occlusion and self-shadowing. To address this problem, a novel approach is presented in this paper. The view sphere is separated into several small segments. On each segment, a face detector is constructed. We explicitly estimate the pose of an image regardless of whether or not it is a face. A pose estimator is constructed using Support Vector Regression. The pose information is used to choose the appropriate face detector to determine if it is a face. With this pose-estimation based method, considerable computational efficiency is achieved. Meanwhile, the detection accuracy is also improved since each detector is constructed on a small range of views. We developed a novel algorithm for face detection by combining the Eigenface and SVM methods which performs almost as fast as the Eigenface method but with significantly improved speed. Detailed experimental results are presented in this paper including tuning the parameters of the pose estimators and face detectors, performance evaluation, and applications to video based face detection and frontal-view face recognition.
ISee: Perceptual features for image library navigation
, 2002
Cited by 13 (0 self)
To develop more satisfying image navigation systems, we need tools to construct a semantic bridge between the user and the database. In this paper we present an image indexing scheme and a query language, which allow the user to introduce a cognitive dimension to the search. At an abstract level, this approach consists of: 1) learning the "natural language" that humans speak to communicate their semantic experience of images, 2) understanding the relationships between this language and objective measurable image attributes, and then 3) developing the corresponding feature extraction schemes. In our previous work we have conducted a number of subjective experiments in which we asked human subjects to group images, and then explain verbally why they did so [1]. The results of this study indicated that part of the abstraction involved in image interpretation is often driven by semantic categories, which can be broken into more tangible semantic entities, i.e. objective semantic indicators. By analyzing our experimental data, we identified some candidate semantic categories (i.e. portraits, people, crowds, cityscapes, landscapes, etc.), discovered their underlying semantic indicators (i.e. skin, sky, water, object, etc.), and derived important low-level image descriptors accounting for our perception of these indicators. In our recent work we have used these findings to develop a set of image features that match the way humans communicate image meaning, and a "semantic-friendly" query language for browsing and searching diverse collections of images. We have implemented our approach in an Internet search engine, ISee, and tested it on a large number of images. The results we obtained are very promising.
A Method For Color Naming And Description Of Color Composition In Images
- in Proc. IEEE Int. Conf. Image Processing
, 2002
Cited by 6 (0 self)
Color is one of the main visual cues and has been frequently used in image processing, analysis and retrieval. The extraction of high-level color descriptors is an increasingly important problem, as these descriptions often provide a link to image content. When combined with image segmentation, color naming can be used to select objects by color, describe the appearance of the image and even generate semantic annotations. For example, regions labeled as light blue and strong green may represent sky and grass, vivid colors are typically found in man-made objects, and modifiers such as brownish, grayish and dark convey the impression of the atmosphere in the scene. This paper presents a computational model for color categorization, naming and extraction of color composition. In this work we start from the National Bureau of Standards' recommendation for color names [4], and through subjective experiments develop our color vocabulary and syntax. Next, to attach the color name to an arbitrary input color, we design a perceptually based color naming metric. Finally, we extend the method and develop a scheme for extracting the color composition of a complex image. The algorithm follows the relevant neurophysiological findings and studies on human color categorization. In testing the method, the known color regions in different color spaces were identified accurately, the color names assigned to randomly selected colors agreed with human judgments, and the color composition extracted from natural images was consistent with human observations.

