Results 1 - 10
of
510
Rapid object detection using a boosted cascade of simple features
- ACCEPTED CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION 2001
, 2001
"... This paper describes a machine learning approach for visual object detection which is capable of processing images extremely rapidly and achieving high detection rates. This work is distinguished by three key contributions. The first is the introduction of a new image representation called the "Inte ..."
Abstract
-
Cited by 1371 (6 self)
- Add to MetaCart
This paper describes a machine learning approach for visual object detection which is capable of processing images extremely rapidly and achieving high detection rates. This work is distinguished by three key contributions. The first is the introduction of a new image representation called the "Integral Image" which allows the features used by our detector to be computed very quickly. The second is a learning algorithm, based on AdaBoost, which selects a small number of critical visual features from a larger set and yields extremely efficient classifiers[6]. The third contribution is a method for combining increasingly more complex classifiers in a "cascade" which allows background regions of the image to be quickly discarded while spending more computation on promising object-like regions. The cascade can be viewed as an object specific focus-of-attention mechanism which unlike previous approaches provides statistical guarantees that discarded regions are unlikely to contain the object of interest. In the domain of face detection the system yields detection rates comparable to the best previous systems. Used in real-time applications, the detector runs at 15 frames per second without resorting to image differencing or skin color detection.
Geodesic Active Contours
, 1997
"... A novel scheme for the detection of object boundaries is presented. The technique is based on active contours evolving in time according to intrinsic geometric measures of the image. The evolving contours naturally split and merge, allowing the simultaneous detection of several objects and both in ..."
Abstract
-
Cited by 799 (41 self)
- Add to MetaCart
A novel scheme for the detection of object boundaries is presented. The technique is based on active contours evolving in time according to intrinsic geometric measures of the image. The evolving contours naturally split and merge, allowing the simultaneous detection of several objects and both interior and exterior boundaries. The proposed approach is based on the relation between active contours and the computation of geodesics or minimal distance curves. The minimal distance curve lays in a Riemannian space whose metric is defined by the image content. This geodesic approach for object segmentation allows to connect classical "snakes" based on energy minimization and geometric active contours based on the theory of curve evolution. Previous models of geometric active contours are improved, allowing stable boundary detection when their gradients suffer from large variations, including gaps. Formal results concerning existence, uniqueness, stability, and correctness of the evolution are presented as well. The scheme was implemented using an efficient algorithm for curve evolution. Experimental results of applying the scheme to real images including objects with holes and medical data imagery demonstrate its power. The results may be extended to 3D object segmentation as well.
A PERFORMANCE EVALUATION OF LOCAL DESCRIPTORS
, 2005
"... In this paper we compare the performance of descriptors computed for local interest regions, as for example extracted by the Harris-Affine detector [32]. Many different descriptors have been proposed in the literature. However, it is unclear which descriptors are more appropriate and how their perfo ..."
Abstract
-
Cited by 774 (24 self)
- Add to MetaCart
In this paper we compare the performance of descriptors computed for local interest regions, as for example extracted by the Harris-Affine detector [32]. Many different descriptors have been proposed in the literature. However, it is unclear which descriptors are more appropriate and how their performance depends on the interest region detector. The descriptors should be distinctive and at the same time robust to changes in viewing conditions as well as to errors of the detector. Our evaluation uses as criterion recall with respect to precision and is carried out for different image transformations. We compare shape context [3], steerable filters [12], PCA-SIFT [19], differential invariants [20], spin images [21], SIFT [26], complex filters [37], moment invariants [43], and cross-correlation for different types of interest regions. We also propose an extension of the SIFT descriptor, and show that it outperforms the original method. Furthermore, we observe that the ranking of the descriptors is mostly independent of the interest region detector and that the SIFT based descriptors perform best. Moments and steerable filters show the best performance among the low dimensional descriptors.
An affine invariant interest point detector
- In Proceedings of the 7th European Conference on Computer Vision
, 2002
"... Abstract. This paper presents a novel approach for detecting affine invariant interest points. Our method can deal with significant affine transformations including large scale changes. Such transformations introduce significant changes in the point location as well as in the scale and the shape of ..."
Abstract
-
Cited by 670 (39 self)
- Add to MetaCart
Abstract. This paper presents a novel approach for detecting affine invariant interest points. Our method can deal with significant affine transformations including large scale changes. Such transformations introduce significant changes in the point location as well as in the scale and the shape of the neighbourhood of an interest point. Our approach allows to solve for these problems simultaneously. It is based on three key ideas: 1) The second moment matrix computed in a point can be used to normalize a region in an affine invariant way (skew and stretch). 2) The scale of the local structure is indicated by local extrema of normalized derivatives over scale. 3) An affine-adapted Harris detector determines the location of interest points. A multi-scale version of this detector is used for initialization. An iterative algorithm then modifies location, scale and neighbourhood of each point and converges to affine invariant points. For matching and recognition, the image is characterized by a set of affine invariant points; the affine transformation associated with each point allows the computation of an affine invariant descriptor which is also invariant to affine illumination changes. A quantitative comparison of our detector with existing ones shows a significant improvement in the presence of large affine deformations. Experimental results for wide baseline matching show an excellent performance in the presence of large perspective transformations including significant scale changes. Results for recognition are very good for a database with more than 5000 images.
Robust Real-time Object Detection
- International Journal of Computer Vision
, 2001
"... This paper describes a visual object detection framework that is capable of processing images extremely rapidly while achieving high detection rates. There are three key contributions. The first is the introduction of a new image representation called the “Integral Image ” which allows the features ..."
Abstract
-
Cited by 570 (4 self)
- Add to MetaCart
This paper describes a visual object detection framework that is capable of processing images extremely rapidly while achieving high detection rates. There are three key contributions. The first is the introduction of a new image representation called the “Integral Image ” which allows the features used by our detector to be computed very quickly. The second is a learning algorithm, based on AdaBoost, which selects a small number of critical visual features and yields extremely efficient classifiers [6]. The third contribution is a method for combining classifiers in a “cascade ” which allows background regions of the image to be quickly discarded while spending more computation on promising object-like regions. A set of experiments in the domain of face detection are presented. The system yields face detection performace comparable to the best previous systems [18, 13, 16, 12, 1]. Implemented on a conventional desktop, face detection proceeds at 15 frames per second. 1.
Shiftable Multi-scale Transforms
, 1992
"... Orthogonal wavelet transforms have recently become a popular representation for multiscale signal and image analysis. One of the major drawbacks of these representations is their lack of translation invariance: the content of wavelet subbands is unstable under translations of the input signal. Wavel ..."
Abstract
-
Cited by 365 (34 self)
- Add to MetaCart
Orthogonal wavelet transforms have recently become a popular representation for multiscale signal and image analysis. One of the major drawbacks of these representations is their lack of translation invariance: the content of wavelet subbands is unstable under translations of the input signal. Wavelet transforms are also unstable with respect to dilations of the input signal, and in two dimensions, rotations of the input signal. We formalize these problems by defining a type of translation invariance that we call "shiftability". In the spatial domain, shiftability corresponds to a lack of aliasing; thus, the conditions under which the property holds are specified by the sampling theorem. Shiftability may also be considered in the context of other domains, particularly orientation and scale. We explore "jointly shiftable" transforms that are simultaneously shiftable in more than one domain. Two examples of jointly shiftable transforms are designed and implemented: a one-dimensional tran...
Pyramid-Based Texture Analysis/Synthesis
, 1995
"... This paper describes a method for synthesizing images that match the texture appearanceof a given digitized sample. This synthesis is completely automatic and requires only the "target" texture as input. It allows generation of as much texture as desired so that any object can be covered. It can be ..."
Abstract
-
Cited by 330 (0 self)
- Add to MetaCart
This paper describes a method for synthesizing images that match the texture appearanceof a given digitized sample. This synthesis is completely automatic and requires only the "target" texture as input. It allows generation of as much texture as desired so that any object can be covered. It can be used to produce solid textures for creating textured 3-d objects without the distortions inherent in texture mapping. It can also be used to synthesize texture mixtures, images that look a bit like each of several digitized samples. The approach is based on a model of human texture perception, and has potential to be a practically useful tool for graphics applications. 1 Introduction Computer renderings of objects with surface texture are more interesting and realistic than those without texture. Texture mapping [15] is a technique for adding the appearance of surface detail by wrapping or projecting a digitized texture image ontoa surface. Digitized textures can be obtained from a variety ...
Pictorial Structures for Object Recognition
- IJCV
, 2003
"... In this paper we present a statistical framework for modeling the appearance of objects. Our work is motivated by the pictorial structure models introduced by Fischler and Elschlager. The basic idea is to model an object by a collection of parts arranged in a deformable configuration. The appearance ..."
Abstract
-
Cited by 305 (13 self)
- Add to MetaCart
In this paper we present a statistical framework for modeling the appearance of objects. Our work is motivated by the pictorial structure models introduced by Fischler and Elschlager. The basic idea is to model an object by a collection of parts arranged in a deformable configuration. The appearance of each part is modeled separately, and the deformable configuration is represented by spring-like connections between pairs of parts. These models allow for qualitative descriptions of visual appearance, and are suitable for generic recognition problems. We use these models to address the problem of detecting an object in an image as well as the problem of learning an object model from training examples, and present efficient algorithms for both these problems. We demonstrate the techniques by learning models that represent faces and human bodies and using the resulting models to locate the corresponding objects in novel images.
Blobworld: Image segmentation using Expectation-Maximization and its application to image querying
- IEEE Transactions on Pattern Analysis and Machine Intelligence
, 1999
"... Retrieving images from large and varied collections using image content as a key is a challenging and important problem. We present a new image representation which provides a transformation from the raw pixel data to a small set of image regions which are coherent in color and texture. This "Blobwo ..."
Abstract
-
Cited by 265 (8 self)
- Add to MetaCart
Retrieving images from large and varied collections using image content as a key is a challenging and important problem. We present a new image representation which provides a transformation from the raw pixel data to a small set of image regions which are coherent in color and texture. This "Blobworld" representation is created by clustering pixels in a joint color-texture-position feature space. The segmentation algorithm is fully automatic and has been run on a collection of 10,000 natural images. We describe a system that uses the Blobworld representation to retrieve images from this collection. An important aspect of the system is that the user is allowed to view the internal representation of the submitted image and the query results. Similar systems do not offer the user this view into the workings of the system; consequently, query results from these systems can be inexplicable, despite the availability of knobs for adjusting the similarity metrics. By finding image regions whi...
Indexing based on scale invariant interest points
- In Proceedings of the 8th International Conference on Computer Vision
, 2001
"... This paper presents a new method for detecting scale invariant interest points. The method is based on two recent results on scale space: 1) Interest points can be adapted to scale and give repeatable results (geometrically stable). 2) Local extrema over scale of normalized derivatives indicate the ..."
Abstract
-
Cited by 245 (24 self)
- Add to MetaCart
This paper presents a new method for detecting scale invariant interest points. The method is based on two recent results on scale space: 1) Interest points can be adapted to scale and give repeatable results (geometrically stable). 2) Local extrema over scale of normalized derivatives indicate the presence of characteristic local structures. Our method first computes a multi-scale representation for the Harris interest point detector. We then select points at which a local measure (the Laplacian) is maximal over scales. This allows a selection of distinctive points for which the characteristic scale is known. These points are invariant to scale, rotation and translation as well as robust to illumination changes and limited changes of viewpoint. For indexing, the image is characterized by a set of scale invariant points; the scale associated with each point allows the computation of a scale invariant descriptor. Our descriptors are, in addition, invariant to image rotation, to affine illumination changes and robust to small perspective deformations. Experimental results for indexing show an excellent performance up to a scale factor of 4 for a database with more than 5000 images. 1

