## Automated image annotation using global features and robust nonparametric density estimation (2005)

Venue: | International ACM Conference on Image and Video Retrieval (CIVR) |

Citations: | 59 - 21 self |

### BibTeX

@INPROCEEDINGS{Yavlinsky05automatedimage,

author = {Alexei Yavlinsky and Edward Schofield and Stefan Rüger},

title = {Automated image annotation using global features and robust nonparametric density estimation},

booktitle = {International ACM Conference on Image and Video Retrieval (CIVR)},

year = {2005},

pages = {507--517}

}

### Abstract

This paper describes a simple framework for automatically annotating images using non-parametric models of distributions of image features. We show that under this framework quite simple image properties such as global colour and texture distributions provide a strong basis for reliably annotating images. We report results on subsets of two photographic libraries, the Corel Photo Archive and the Getty Image Archive. We also show how the popular Earth Mover’s Distance measure can be effectively incorporated within this framework.

### Citations

2361 | Latent Dirichlet allocation
- Blei, Ng, et al.
Citation Context ...Bernoulli distribution thus achieving better results than in [4] and [5]. Other relevant research is that of Blei and Jordan [7], proposing an extension of the Latent Dirichlet Allocation (LDA) model =-=[8]-=-, which assumes that a mixture of latent factors are used to generate words and blob features; the authors then show how the model can be used to assign words to individual blobs. A second way is a si... |

669 | Modeling the shape of the scene: a holistic representation of the spatial envelope
- Oliva, Torralba
Citation Context ... was explored by Oliva and Torralba, who showed that images can be described with basic scene labels such as ‘street’, ‘buildings’ or ‘highways’, using a selection of relevant low-level global filters =-=[9, 10]-=-. They further showed how simple image statistics can be used to infer the presence and absence of objects in the scene [11]. This paper follows the second approach and explores the possibility of usi... |

472 | Applied nonparametric regression
- Härdle
- 1990
Citation Context ...cripts w for the rest of this section to simplify the notation. Here the positive scalar h, called the bandwidth, reflects how wide a kernel is placed over each data point. Under some mild conditions =-=[14]-=-, $\hat{f}$ converges to $f$ in probability as $n \to \infty$. We experiment with two types of kernels. The first is a d-dimensional Gaussian kernel $k_G(t; h) = \prod_{l=1}^{d} \frac{1}{\sqrt{2\pi}\,h_l}\, e^{-\frac{1}{2}\left(\frac{t_l}{h_l}\right)^2}$, (3) where t = x − x... |
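
The Gaussian kernel of equation (3) in the context above can be sketched in a few lines. This is only an illustration, assuming the equation reads as a product of one-dimensional Gaussian densities with one bandwidth per feature; the function name `gaussian_kernel` is hypothetical and does not come from the paper.

```python
import math

def gaussian_kernel(t, h):
    """Product Gaussian kernel over d dimensions (hypothetical helper).

    t: difference vector between a query point and one sample.
    h: per-dimension bandwidths, all strictly positive.
    """
    value = 1.0
    for t_l, h_l in zip(t, h):
        # One-dimensional Gaussian density with standard deviation h_l.
        value *= math.exp(-0.5 * (t_l / h_l) ** 2) / (math.sqrt(2.0 * math.pi) * h_l)
    return value
```

At `t = 0` the value is the product of the per-dimension peak heights, so widening any bandwidth lowers the peak and spreads the mass further from the sample.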

443 | Object recognition as machine translation: Learning a lexicon for a fixed image vocabulary
- Duygulu, Barnard, et al.
- 2002
Citation Context ...n image segmentation algorithm to divide images into a number of irregularly shaped ‘blob’ regions and to operate on these blobs. This has been pursued by several researchers recently. Duygulu et al. =-=[2]-=- created a discrete ‘vocabulary’ of clusters of such blobs across an image collection and applied a model, inspired by machine translation, to translate between the set of blobs comprising an image an... |

340 | Textural Features Corresponding to Visual Perception
- Tamura, Mori, et al.
- 1978
Citation Context ...s. We attempt to model image densities using two simple classes of global image features: the distribution of pixel colour in CIE space, and a subset of perceptual texture features proposed by Tamura =-=[21]-=- and adapted for image retrieval by Howarth and Rüger [22]. For each pixel in the image, we compute CIELab colour values and the coarseness, contrast and directionality texture properties obtained usi... |

331 | Modeling annotated data
- Blei, Jordan
- 2003
Citation Context ...ce blobs with rectangular blocks and model image keywords using a multiple Bernoulli distribution thus achieving better results than in [4] and [5]. Other relevant research is that of Blei and Jordan =-=[7]-=-, proposing an extension of the Latent Dirichlet Allocation (LDA) model [8], which assumes that a mixture of latent factors are used to generate words and blob features; the authors then show how the ... |

318 | Automatic image annotation and retrieval using cross-media relevance models
- Jeon, Lavrenko, et al.
Citation Context ... clusters of such blobs across an image collection and applied a model, inspired by machine translation, to translate between the set of blobs comprising an image and annotation keywords. Jeon et al. =-=[3]-=- recast image annotation into a problem in cross-lingual information retrieval, applying a cross-media relevance model to perform image annotation and ranked retrieval, obtaining better retrieval perf... |

191 | Empirical evaluation of dissimilarity measures for color and texture
- Rubner, Puzicha, et al.
Citation Context ...ping the problem, but this paper considers another way based on comparing image signatures under the Earth Mover’s Distance (EMD) measure [16], which has found several applications in image retrieval =-=[17]-=-. A signature is a representation of clustered data defined as s = {(c1, m1), . . . , (cd, md)}, where, for a cluster i, ci is the cluster’s centroid and mi is the number of points belonging to that c... |

181 | Multiple Bernoulli relevance models for image and video annotation
- Feng, Manmatha, et al.
Citation Context ...ons in an inference network, whereby an unseen image is annotated by instantiating the network with its regions and propagating belief through the network to nodes representing the words. Feng et al. =-=[6]-=- replace blobs with rectangular blocks and model image keywords using a multiple Bernoulli distribution thus achieving better results than in [4] and [5]. Other relevant research is that of Blei and J... |

178 | A model for learning the semantics of pictures
- Lavrenko, Manmatha, et al.
- 2003
Citation Context ...mation retrieval, applying a cross-media relevance model to perform image annotation and ranked retrieval, obtaining better retrieval performance than in the translation model of [2]. Lavrenko et al. =-=[4]-=- adapted the model of [3] to use continuous probability density functions to describe the process of generating blob features, hoping to avoid the loss of information related to quantization; they achi... |

144 | Image-to-word transformation based on dividing and vector quantizing images with words
- Mori, Takahashi, et al.
- 1999
Citation Context ...e helpful for users wishing to search increasingly large collections of unlabelled images available on the web and elsewhere. One of the first attempts at image annotation was reported by Mori et al. =-=[1]-=-, who tiled images into grids of rectangular regions and applied a co-occurrence model to words and low-level features of such tiled image regions. Since then researchers have looked at the problem in ... |

130 | A brief survey of bandwidth selection for density estimation
- Jones, Marron, et al.
- 1996
Citation Context ...class. We shall refer to kE as the EMD kernel throughout the rest of this paper. Several methods have been studied for choosing the optimal bandwidth h for a given kernel and density estimation task. =-=[19]-=- and [20] give a good overview. For this paper we use the simple method of cross-validation, choosing the bandwidth that maximizes performance on a withheld data set. The precise performance measures ... |
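
The selection rule quoted above, picking the bandwidth that does best on withheld data, might look like the following sketch. The paper's actual performance measures are not visible in this excerpt, so mean held-out log-likelihood of a one-dimensional Gaussian kernel estimate is used here as a stand-in; all function names are hypothetical.

```python
import math

def kde_1d(x, samples, h):
    """1-D Gaussian kernel density estimate at x with bandwidth h."""
    norm = len(samples) * h * math.sqrt(2.0 * math.pi)
    return sum(math.exp(-0.5 * ((x - s) / h) ** 2) for s in samples) / norm

def heldout_log_likelihood(train, heldout, h):
    """Mean log density of withheld points (assumed stand-in measure)."""
    # Floor the density so an underflow to 0.0 cannot break math.log.
    return sum(math.log(max(kde_1d(x, train, h), 1e-300)) for x in heldout) / len(heldout)

def select_bandwidth(train, heldout, candidates, score=heldout_log_likelihood):
    """Return the candidate bandwidth with the highest withheld-set score."""
    return max(candidates, key=lambda h: score(train, heldout, h))

best = select_bandwidth([0.0, 1.0, 2.0], [0.5, 1.5], [0.05, 0.5, 10.0])
```

Passing a different `score` callable swaps in another performance measure without changing the search itself.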

77 | The earth mover’s distance is the Mallows distance: some insights from statistics
- Levina, Bickel
- 2001
Citation Context ... and mi is the number of points belonging to that cluster or its mass. Given two such signatures, EMD is defined as the minimum amount of work necessary to transform one signature into the other (see =-=[16, 18]-=- for details). One can create a signature for an image by grouping its colours into k clusters. Rubner et al. [16] report that using EMD on images represented with as few as 8 clusters of CIELab colou... |
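
To make the "minimum amount of work" definition concrete, here is a sketch for a deliberately simplified case: one-dimensional centroids and two signatures of equal total mass, where the optimal transport cost equals the area between the two cumulative mass curves. The paper's signatures use multi-dimensional colour clusters, which need a general transportation solver; the function name `emd_1d` and the division by total mass are assumptions of this sketch.

```python
def emd_1d(sig_a, sig_b):
    """EMD between 1-D signatures [(centroid, mass), ...] of equal total mass."""
    total_a = sum(m for _, m in sig_a)
    total_b = sum(m for _, m in sig_b)
    if total_a <= 0 or abs(total_a - total_b) > 1e-9 * max(total_a, total_b):
        raise ValueError("this sketch needs equal, positive total mass")
    # Mass of A enters with a plus sign, mass of B with a minus sign.
    events = sorted([(c, m) for c, m in sig_a] + [(c, -m) for c, m in sig_b])
    work = 0.0
    carried = 0.0  # surplus of A over B to the left of the current position
    prev = events[0][0]
    for position, delta in events:
        # The surplus must be carried across the gap since the last centroid.
        work += abs(carried) * (position - prev)
        carried += delta
        prev = position
    return work / total_a  # work per unit of mass moved
```

For example, `emd_1d([(0.0, 1.0), (2.0, 1.0)], [(1.0, 2.0)])` moves each unit of mass a distance of 1, giving a normalised cost of 1.0.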

61 | Projection pursuit density estimation
- Friedman, Stuetzle, et al.
- 1984
Citation Context ...$\prod_{l=1}^{d} \frac{1}{\sqrt{2\pi}\,h_l}\, e^{-\frac{1}{2}\left(\frac{t_l}{h_l}\right)^2}$, (3) where $t = x - x^{(i)}$, and where we set each bandwidth parameter $h_l$ by scaling the sample standard deviation of feature $l$ by the same constant $\lambda$. Friedman et al. =-=[15]-=- point out that kernel smoothing may become less effective in high-dimensional spaces due to the problem known as the curse of dimensionality. They examine a projection pursuit method for reducing the... |
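
The bandwidth rule in this context, one shared constant λ multiplied by each feature's sample standard deviation, is simple enough to sketch directly. The name `scaled_bandwidths` is hypothetical, and the n − 1 divisor used by `statistics.stdev` is an assumption, since the excerpt does not say which divisor the authors used.

```python
import statistics

def scaled_bandwidths(samples, lam):
    """Per-feature bandwidths: lam times each feature's sample standard deviation.

    samples: list of equal-length feature vectors (at least two of them).
    lam: the single scaling constant shared by all features.
    """
    d = len(samples[0])
    return [lam * statistics.stdev([x[l] for x in samples]) for l in range(d)]
```

Tying every bandwidth to one constant leaves a single parameter to tune, while still letting features with larger spread receive wider kernels.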

44 | Evaluation of Texture Features for Content-Based Image Retrieval
- Howarth, Rueger
- 2004
Citation Context ...asses of global image features: the distribution of pixel colour in CIE space, and a subset of perceptual texture features proposed by Tamura [21] and adapted for image retrieval by Howarth and Rüger =-=[22]-=-. For each pixel in the image, we compute CIELab colour values and the coarseness, contrast and directionality texture properties obtained using a sliding window. This results in a 6-channel image rep... |

38 | The truth about Corel-evaluation in image retrieval
- Muller, Marchand-Maillet, et al.
- 2002
Citation Context ...lection from an image retrieval point of view. For instance, Müller et al. observed that image retrieval performance can be substantially improved if the right image subset is selected for evaluation =-=[23]-=-. We attempted to build a more realistic dataset for our experiments by downloading 7,560 medium-resolution thumbnails of photographs from the Getty Image Archive website, together with the annotat... |

31 | On the estimation of a probability density and mode
- Parzen
- 1962
Citation Context ...estimator of a distribution function is the empirical distribution function, but it is known that smoothing can improve efficiency for finite samples [12]. ‘Kernel smoothing’, first used by Parzen in =-=[13]-=-, is a general formulation of this. Where x is a vector $(x_1, \ldots, x_d)$ of real-valued image features, we define the kernel estimate of $f_w(x) = f(x|w)$ as $\hat{f}_w(x) = \frac{1}{nC} \sum_{i=1}^{n} k\!\left(\frac{x - x_w^{(i)}}{h}\right)$, (2)... |
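
The kernel estimate of equation (2) averages one kernel per training sample. The sketch below assumes a scalar bandwidth h, a standard Gaussian as the kernel k, and C = h^d as the normalising constant (the excerpt is cut off before C is defined, so that choice is a guess that makes the estimate integrate to one). The function names are hypothetical.

```python
import math

def standard_gaussian(u):
    """Standard d-dimensional Gaussian density at vector u."""
    d = len(u)
    return math.exp(-0.5 * sum(v * v for v in u)) / (2.0 * math.pi) ** (d / 2.0)

def kernel_density(x, samples, h, kernel=standard_gaussian):
    """Kernel estimate (1 / (n C)) * sum_i k((x - x_i) / h).

    samples: feature vectors of the training images for one keyword w.
    h: positive scalar bandwidth.
    """
    n = len(samples)
    c = h ** len(x)  # assumed normalising constant C
    total = 0.0
    for x_i in samples:
        # Rescale the difference to the kernel's unit bandwidth.
        total += kernel([(a - b) / h for a, b in zip(x, x_i)])
    return total / (n * c)
```

Evaluating `kernel_density` once per keyword, each time with that keyword's training samples, gives the class-conditional densities that an annotation step would then compare.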

30 | Bandwidth Selection: Classical or Plug-In
- Loader
- 1999
Citation Context ... shall refer to kE as the EMD kernel throughout the rest of this paper. Several methods have been studied for choosing the optimal bandwidth h for a given kernel and density estimation task. [19] and =-=[20]-=- give a good overview. For this paper we use the simple method of cross-validation, choosing the bandwidth that maximizes performance on a withheld data set. The precise performance measures are descr... |

29 | The truth about Corel-evaluation in image retrieval
- Mueller, Marchand-Maillet, et al.
Citation Context ...lection from an image retrieval point of view. For instance, Müller et al. observed that image retrieval performance can be substantially improved if the right image subset is selected for evaluation =-=[23]-=-. We attempted to build a more realistic dataset for our experiments by downloading 7,560 medium-resolution thumbnails of photographs from the Getty Image Archive website, together with the annotat... |

28 | An inference network approach to image retrieval. Image and video retrieval
- Metzler, Manmatha
- 2004
Citation Context ...process of generating blob features, hoping to avoid the loss of information related to quantization; they achieve substantially better retrieval performance on the same dataset. Metzler and Manmatha =-=[5]-=- likewise segmented training images, connecting them and their annotations in an inference network, whereby an unseen image is annotated by instantiating the network with its regions and propagating b... |

12 | Nonparametric estimation of smooth distribution functions
- Reiss
- 1981
Citation Context ...ric Density Estimation The simplest nonparametric estimator of a distribution function is the empirical distribution function, but it is known that smoothing can improve efficiency for finite samples =-=[12]-=-. ‘Kernel smoothing’, first used by Parzen in [13], is a general formulation of this. Where x is a vector (x1, . . . , xd) of real-valued image features, we define the kernel estimate of fw(x) = f(x|w... |

4 | The earth-mover’s distance as a metric for image retrieval
- Rubner
- 1998
Citation Context ... its most salient characteristics. This is one way of sidestepping the problem, but this paper considers another way based on comparing image signatures under the Earth Mover’s Distance (EMD) measure =-=[16]-=-, which has found several applications in image retrieval [17]. A signature is a representation of clustered data defined as s = {(c1, m1), . . . , (cd, md)}, where, for a cluster i, ci is the cluster... |

2 | Scene-centered representation from spatial envelope descriptors
- Oliva, Torralba
- 2002
Citation Context ... was explored by Oliva and Torralba, who showed that images can be described with basic scene labels such as ‘street’, ‘buildings’ or ‘highways’, using a selection of relevant low-level global filters =-=[9, 10]-=-. They further showed how simple image statistics can be used to infer the presence and absence of objects in the scene [11]. This paper follows the second approach and explores the possibility of usi... |