Mean shift: A robust approach toward feature space analysis
 In PAMI
, 2002
Cited by 1461 (34 self)
A general nonparametric technique is proposed for the analysis of a complex multimodal feature space and to delineate arbitrarily shaped clusters in it. The basic computational module of the technique is an old pattern recognition procedure, the mean shift. We prove for discrete data the convergence of a recursive mean shift procedure to the nearest stationary point of the underlying density function and thus its utility in detecting the modes of the density. The equivalence of the mean shift procedure to the Nadaraya–Watson estimator from kernel regression and the robust M-estimators of location is also established. Algorithms for two low-level vision tasks, discontinuity-preserving smoothing and image segmentation, are described as applications. In these algorithms the only user-set parameter is the resolution of the analysis, and either gray-level or color images are accepted as input. Extensive experimental results illustrate their excellent performance.
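As a rough illustration of the recursive mean shift procedure described in this abstract, the sketch below moves a starting point to the kernel-weighted mean of the data until the shift becomes negligible. The Gaussian kernel, the function name `mean_shift_mode`, the tolerance, and the toy points are all assumptions made for this example and are not taken from the paper; `bandwidth` plays the role of the resolution parameter.

```python
import math

def mean_shift_mode(points, start, bandwidth, tol=1e-6, max_iter=500):
    """Follow mean shift steps from `start` until the shift drops below `tol`.

    Assumes a Gaussian kernel; `bandwidth` is the resolution of the analysis.
    """
    x = list(start)
    for _ in range(max_iter):
        # Kernel weight of every data point relative to the current estimate.
        weights = [
            math.exp(-sum((pi - xi) ** 2 for pi, xi in zip(p, x)) / (2 * bandwidth ** 2))
            for p in points
        ]
        total = sum(weights)
        # The new estimate is the weighted mean of the data.
        new_x = [
            sum(w * p[d] for w, p in zip(weights, points)) / total
            for d in range(len(x))
        ]
        shift = math.dist(new_x, x)
        x = new_x
        if shift < tol:
            break
    return x

# Hypothetical toy data: two well-separated groups in the plane.
points = [(0.0, 0.1), (0.2, -0.1), (-0.1, 0.0), (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
modes = [mean_shift_mode(points, p, bandwidth=1.0) for p in points]
```

Points whose trajectories end at (nearly) the same location can then be grouped into one cluster, which is how mode seeking yields a clustering without fixing the number of clusters in advance.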
Image analogies
, 2001
Cited by 353 (8 self)
Figure 1: An image analogy. Our problem is to compute a new “analogous” image B′ that relates to B in “the same way” as A′ relates to A. Here, A, A′, and B are inputs to our algorithm, and B′ is the output. The full-size images are shown in Figures 10 and 11. This paper describes a new framework for processing images by example, called “image analogies.” The framework involves two stages: a design phase, in which a pair of images, with one image purported to be a “filtered” version of the other, is presented as “training data”; and an application phase, in which the learned filter is applied to some new target image in order to create an “analogous” filtered result. Image analogies are based on a simple multiscale autoregression, inspired primarily by recent results in texture synthesis. By choosing different types of source image pairs as input, the framework supports a wide variety of “image filter” effects, including traditional image filters, such as blurring or embossing; improved texture synthesis, in which some textures are synthesized with higher quality than by previous approaches; super-resolution, in which a higher-resolution image is inferred from a low-resolution source; texture transfer, in which images are “texturized” with some arbitrary source texture; artistic filters, in which various drawing and painting styles are synthesized based on scanned real-world examples; and texture-by-numbers, in which realistic scenes, composed of a variety of textures, are created using a simple painting interface.
Implicit Probabilistic Models of Human Motion for Synthesis and Tracking (Hedvig Sidenbladh)
 In European Conference on Computer Vision
, 2002
Cited by 167 (4 self)
This paper addresses the problem of probabilistically modeling 3D human motion for synthesis and tracking. Given the high-dimensional nature of human motion, learning an explicit probabilistic model from available training data is currently impractical. Instead we exploit methods from texture synthesis that treat images as representing an implicit empirical distribution. These methods replace the problem of representing the probability of a texture pattern with that of searching the training data for similar instances of that pattern. We extend this idea to temporal data representing 3D human motion with a large database of example motions. To make the method useful in practice, we must address the problem of efficient search in a large training set.
Statistical Models for Images: Compression, Restoration and Synthesis
 In 31st Asilomar Conf on Signals, Systems and Computers
, 1997
Cited by 138 (33 self)
In this paper, we examine the problem of decomposing digitized images, through linear and/or nonlinear transformations, into statistically independent components. The classical approach to such a problem is Principal Components Analysis (PCA), also known as the Karhunen-Loève (KL) or Hotelling transform. This is a linear transform that removes second-order dependencies between input pixels. The most well-known description of image statistics is that their power spectra take the form of a power law [e.g., 20, 11, 24]. Coupled with a constraint of translation invariance, this suggests that the Fourier transform is an appropriate PCA representation. Fourier and related representations are widely used in image processing applications.
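To make the statement that PCA "removes second-order dependencies" concrete, here is a minimal two-variable sketch: it rotates centered data onto its principal axes and leaves the covariance between the rotated coordinates at (numerically) zero. The helper names and the sample values are hypothetical, and the closed-form rotation angle applies only to this two-dimensional case.

```python
import math

def covariance(xs, ys):
    """Population covariance of two equally long sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

def pca_rotate_2d(xs, ys):
    """Project centered 2-D data onto its principal axes (major axis first)."""
    cxx, cyy, cxy = covariance(xs, xs), covariance(ys, ys), covariance(xs, ys)
    # Angle of the major principal axis of the 2x2 covariance matrix.
    theta = 0.5 * math.atan2(2 * cxy, cxx - cyy)
    c, s = math.cos(theta), math.sin(theta)
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    us = [c * (x - mx) + s * (y - my) for x, y in zip(xs, ys)]
    vs = [-s * (x - mx) + c * (y - my) for x, y in zip(xs, ys)]
    return us, vs

# Hypothetical, strongly correlated pair of "pixel" values.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.1, 1.9, 3.2, 3.8, 5.0]
us, vs = pca_rotate_2d(xs, ys)
residual = covariance(us, vs)  # second-order dependency left after the transform
```

The transform only decorrelates the data; any higher-order dependencies between the components are left untouched.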
Distribution Free Decomposition of Multivariate Data
 Pattern Analysis and Applications
, 1998
Cited by 64 (16 self)
We present a practical approach to nonparametric cluster analysis of large data sets. The number of clusters and the cluster centers are automatically derived by mode seeking with the mean shift procedure on a reduced set of points randomly selected from the data. The cluster boundaries are delineated using a k-nearest neighbor technique. The proposed algorithm is stable and efficient, a 10,000-point data set being decomposed in only a few seconds. Complex clustering examples and applications are discussed, and convergence of the gradient ascent mean shift procedure is demonstrated for arbitrary distribution and cardinality of the data. Keywords: Nonparametric cluster analysis, mode seeking, gradient density estimation, mean shift procedure, convergence, range searching. 1 Introduction In image understanding the feature spaces derived from real data most often have a complex structure and a priori information to guide the analysis may not be available. The significant features whose ...
Computer Identification of Musical Instruments Using Pattern Recognition With Cepstral Coefficients as Features
, 1997
Cited by 57 (0 self)
Cepstral coefficients based on a constant-Q transform have been calculated for 28 short (12 s) oboe sounds and 52 short saxophone sounds. These were used as features in a pattern analysis to determine for each of these sounds comprising the test set whether it belongs to the oboe or to the sax class. The training set consisted of longer sounds of 1 minute or more for each of the instruments. A k-means algorithm was used to calculate clusters for the training data, and Gaussian probability density functions were formed from the mean and variance of each of the clusters. Each member of the test set was then analyzed to determine the probability that it belonged to each of the two classes; and a Bayes decision rule was invoked to assign it to one of the classes. Results have been extremely good and are compared to a human perception experiment identifying a subset of these same sounds.
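The pipeline in this abstract (k-means clusters per class, a Gaussian density per cluster, then a Bayes decision rule) can be sketched roughly as follows. This is a simplified illustration and not the paper's implementation: it uses a single scalar feature instead of cepstral coefficient vectors, and the function names, the variance floor, the equal priors, and the training values are all invented for the example.

```python
import math
from statistics import mean, pvariance

def kmeans_1d(values, k, iters=20):
    """Plain Lloyd iterations in 1-D; centers start at evenly spaced sorted values."""
    ordered = sorted(values)
    centers = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        centers = [mean(c) if c else centers[i] for i, c in enumerate(clusters)]
    return [c for c in clusters if c]

def fit_class(values, k):
    """One Gaussian (weight, mean, variance) per k-means cluster of a class."""
    model = []
    for cluster in kmeans_1d(values, k):
        var = max(pvariance(cluster), 1e-3)  # floor is an assumption to avoid zero variance
        model.append((len(cluster) / len(values), mean(cluster), var))
    return model

def likelihood(x, model):
    """Class-conditional density of x as a weighted sum of the cluster Gaussians."""
    return sum(
        w * math.exp(-(x - m) ** 2 / (2 * v)) / math.sqrt(2 * math.pi * v)
        for w, m, v in model
    )

def classify(x, models, priors):
    """Bayes decision rule: pick the class with the largest prior * likelihood."""
    return max(models, key=lambda label: priors[label] * likelihood(x, models[label]))

# Hypothetical scalar training features for the two instrument classes.
training = {
    "oboe": [1.0, 1.2, 0.9, 3.0, 3.1, 2.9],
    "sax": [6.0, 6.2, 5.9, 8.0, 8.1, 7.9],
}
models = {label: fit_class(vals, k=2) for label, vals in training.items()}
priors = {"oboe": 0.5, "sax": 0.5}
predictions = [classify(x, models, priors) for x in (1.1, 7.9)]
```

With vector-valued features, the per-cluster variance would become a covariance matrix (or a vector of per-dimension variances), but the decision rule stays the same.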
A Society of Models for Video and Image Libraries
, 1996
Cited by 53 (0 self)
The average person with a computer will soon have access to the world's collections of digital video and images. However, unlike text which can be alphabetized or numbers which can be ordered, image and video has no general language to aid in its organization. Although tools which can "see" and "understand" the content of imagery are still in their infancy, they are now at the point where they can provide substantial assistance to users in navigating through visual media. This paper describes new tools based on "vision texture" for modeling image and video. The focus of this research is the use of a society of low-level models for performing relatively high-level tasks, such as retrieval and annotation of image and video libraries. This paper surveys our recent and present research in this fast-growing area. 1 Introduction: Vision Texture Suppose you have a set of vacation photos of Paris and the surrounding countryside, and you accidentally drop them on the floor. They get out of or...
A cluster-based statistical model for object detection
 In Proc. IEEE Conference on Computer Vision and Pattern Recognition
, 1999
Cited by 34 (0 self)
This paper presents an approach to object detection which is based on recent work in statistical models for texture synthesis and recognition [7, 4, 23, 17]. Our method follows the texture recognition work of De Bonet and Viola [4]. We use feature vectors which capture the joint occurrence of local features at multiple resolutions. The distribution of feature vectors for a set of training images of an object class is estimated by clustering the data and then forming a mixture-of-Gaussians model. The mixture model is further refined by determining which clusters are the most discriminative for the class and retaining only those clusters. After the model is learned, test images are classified by computing the likelihood of their feature vectors with respect to the model. We present promising results in applying our technique to face detection and car detection.
Efficient clustering and matching for object class recognition
 In Proc. BMVC
, 2006
Cited by 27 (3 self)
In this paper we address the problem of building object class representations based on local features and fast matching in a large database. We propose an efficient algorithm for hierarchical agglomerative clustering. We examine different agglomerative and partitional clustering strategies and compare the quality of obtained clusters. Our combination of partitional-agglomerative clustering gives significant improvement in terms of efficiency while maintaining the same quality of clusters. We also propose a method for building data structures for fast matching in high-dimensional feature spaces. These improvements allow us to deal with large sets of training data typically used in recognition of multiple object classes.
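For readers unfamiliar with hierarchical agglomerative clustering, the sketch below shows the naive baseline: start with one cluster per point and repeatedly merge the closest pair. It is not the efficient algorithm proposed in this paper; the average-linkage criterion, the function name `agglomerate`, and the sample points are assumptions chosen for illustration, and the cubic running time is exactly the kind of cost an efficient variant would aim to avoid.

```python
import math

def agglomerate(points, num_clusters):
    """Naive average-linkage agglomerative clustering, roughly O(n^3); illustration only."""
    clusters = [[p] for p in points]

    def linkage(a, b):
        # Average pairwise Euclidean distance between two clusters.
        return sum(math.dist(p, q) for p in a for q in b) / (len(a) * len(b))

    while len(clusters) > num_clusters:
        # Find the closest pair of clusters and merge them.
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

# Hypothetical 2-D feature vectors forming two obvious groups.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
clusters = agglomerate(points, num_clusters=2)
```

Stopping at a fixed number of clusters is one simple choice; a distance threshold on the linkage value is a common alternative.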
A Parametric Texture Model Based on Joint . . .
 International Journal of Computer Vision
, 2000
Cited by 26 (0 self)
We present a universal statistical model for texture images in the context of an overcomplete complex wavelet transform. The model is parameterized by a set of statistics computed on pairs of coefficients corresponding to basis functions at adjacent spatial locations, orientations, and scales. We develop an efficient algorithm for synthesizing random images subject to these constraints, by iteratively projecting onto the set of images satisfying each constraint, and we use this to test the perceptual validity of the model. In particular, we demonstrate the necessity of subgroups of the parameter set by showing examples of texture synthesis that fail when those parameters are removed from the set. We also demonstrate the power of our model by successfully synthesizing examples drawn from a diverse collection of artificial and natural textures.