Matching words and pictures
Journal of Machine Learning Research, 2003
Cited by 391 (33 self)
We present a new approach for modeling multi-modal data sets, focusing on the specific case of segmented images with associated text. Learning the joint distribution of image regions and words has many applications. We consider in detail predicting words associated with whole images (auto-annotation) and corresponding to particular image regions (region naming). Auto-annotation might help organize and access large collections of images. Region naming is a model of object recognition as a process of translating image regions to words, much as one might translate from one language to another. Learning the relationships between image regions and semantic correlates (words) is an interesting example of multi-modal data mining, particularly because it is typically hard to apply data mining techniques to collections of images. We develop a number of models for the joint distribution of image regions and words, including several which explicitly learn the correspondence between regions and words. We study multi-modal and correspondence extensions to Hofmann’s hierarchical clustering/aspect model, a translation model adapted from statistical machine translation (Brown et al.), and a multi-modal extension to mixture of latent Dirichlet allocation
A model for learning the semantics of pictures
In NIPS, 2003
Cited by 127 (6 self)
We propose an approach to learning the semantics of images which allows us to automatically annotate an image with keywords and to retrieve images based on text queries. We do this using a formalism that models the generation of annotated images. We assume that every image is divided into regions, each described by a continuous-valued feature vector. Given a training set of images with annotations, we compute a joint probabilistic model of image features and words which allows us to predict the probability of generating a word given the image regions. This may be used to automatically annotate and retrieve images given a word as a query. Experiments show that our model significantly outperforms the best of the previously reported results on the tasks of automatic image annotation and retrieval.
Supervised learning of semantic classes for image annotation and retrieval
IEEE Transactions on Pattern Analysis and Machine Intelligence, 2007
Cited by 74 (10 self)
A probabilistic formulation for semantic image annotation and retrieval is proposed. Annotation and retrieval are posed as classification problems where each class is defined as the group of database images labeled with a common semantic label. It is shown that, by establishing this one-to-one correspondence between semantic labels and semantic classes, a minimum probability of error annotation and retrieval are feasible with algorithms that are 1) conceptually simple, 2) computationally efficient, and 3) do not require prior semantic segmentation of training images. In particular, images are represented as bags of localized feature vectors, a mixture density estimated for each image, and the mixtures associated with all images annotated with a common semantic label pooled into a density estimate for the corresponding semantic class. This pooling is justified by a multiple instance learning argument and performed efficiently with a hierarchical extension of expectation-maximization. The benefits of the supervised formulation over the more complex, and currently popular, joint modeling of semantic label and visual feature distributions are illustrated through theoretical arguments and extensive experiments. The supervised formulation is shown to achieve higher accuracy than various previously published methods at a fraction of their computational cost. Finally, the proposed method is shown to be fairly robust to parameter tuning. Index Terms: Content-based image retrieval, semantic image annotation and retrieval, weakly supervised learning, multiple instance learning, Gaussian mixtures, expectation-maximization, image segmentation, object recognition.
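The classification view in this abstract can be illustrated with a minimal sketch: each semantic label owns a mixture density, an image is a bag of feature vectors, and labels are ranked by the class-conditional log-likelihood of the bag. The scalar features, the mixture parameters, the uniform class prior, and the function names are all assumptions made for illustration; the hierarchical expectation-maximization step that would estimate the class mixtures is not shown.

```python
import math

def gaussian_pdf(x, mean, var):
    """Density of a one-dimensional Gaussian at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def bag_log_likelihood(features, mixture):
    """Log-likelihood of a bag of scalar features under a Gaussian mixture.

    mixture is a list of (weight, mean, variance) components.
    """
    total = 0.0
    for x in features:
        density = sum(w * gaussian_pdf(x, m, v) for w, m, v in mixture)
        total += math.log(density)
    return total

def rank_labels(features, class_mixtures):
    """Sort labels by decreasing class-conditional log-likelihood (uniform prior assumed)."""
    scores = {label: bag_log_likelihood(features, mix)
              for label, mix in class_mixtures.items()}
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical class densities over a single scalar feature (e.g. a brightness value).
class_mixtures = {
    "sky": [(0.7, 0.8, 0.01), (0.3, 0.6, 0.02)],
    "grass": [(1.0, 0.3, 0.02)],
}

# A test image represented as a bag of three localized feature values.
ranking = rank_labels([0.75, 0.82, 0.65], class_mixtures)
```

Annotation would keep the top few entries of `ranking`; retrieval for a query label would instead sort database images by the same score.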
Automatic Multimedia Cross-modal Correlation Discovery
2004
Cited by 65 (12 self)
Given an image (or video clip, or audio song), how do we automatically assign keywords to it? The general problem is to find correlations across the media in a collection of multimedia objects like video clips, with colors, and/or motion, and/or audio, and/or text scripts. We propose a novel, graph-based approach, "MMG", to discover such cross-modal correlations. Our
Automated image annotation using global features and robust nonparametric density estimation
In International ACM Conference on Image and Video Retrieval (CIVR), 2005
Cited by 48 (21 self)
This paper describes a simple framework for automatically annotating images using non-parametric models of distributions of image features. We show that under this framework quite simple image properties such as global colour and texture distributions provide a strong basis for reliably annotating images. We report results on subsets of two photographic libraries, the Corel Photo Archive and the Getty Image Archive. We also show how the popular Earth Mover’s Distance measure can be effectively incorporated within this framework.
Formulating semantic image annotation as a supervised learning problem
IEEE CVPR, 2005
Cited by 42 (5 self)
We introduce a new method to automatically annotate and retrieve images using a vocabulary of image semantics. The novel contributions include a discriminant formulation of the problem, a multiple instance learning solution that enables the estimation of concept probability distributions without prior image segmentation, and a hierarchical description of the density of each image class that enables very efficient training. Compared to current methods of image annotation and retrieval, the one now proposed has significantly smaller time complexity and better recognition performance. Specifically, its recognition complexity is O(C × R), where C is the number of classes (or image annotations) and R is the number of image regions, while the best results in the literature have complexity O(T × R), where T is the number of training images. Since the number of classes grows substantially more slowly than that of training images, the proposed method scales better during training, and processes test images faster. This is illustrated through comparisons in terms of complexity, time, and recognition performance with current state-of-the-art methods.
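To make the complexity comparison concrete, the following sketch counts region-versus-model evaluations for one test image under the two regimes. The values of C, T, and R are hypothetical and chosen only for illustration; they are not figures reported in the paper.

```python
def annotation_cost(num_models: int, num_regions: int) -> int:
    """Region-versus-model evaluations needed to annotate one test image."""
    return num_models * num_regions

# Hypothetical sizes (assumptions, not taken from the paper).
C = 400    # semantic classes
T = 4500   # training images
R = 100    # regions per test image

per_class_cost = annotation_cost(C, R)   # O(C x R): one density per class
per_image_cost = annotation_cost(T, R)   # O(T x R): one density per training image
speedup = per_image_cost / per_class_cost
```

With these assumed sizes the per-class scheme needs 40,000 evaluations against 450,000, and the gap widens as T grows while C stays nearly fixed.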
Using maximum entropy for automatic image annotation
In Proc. CIVR, 2004
Cited by 42 (1 self)
In this paper, we propose the use of the Maximum Entropy approach for the task of automatic image annotation. Given labeled training data, Maximum Entropy is a statistical technique which allows one to predict the probability of a label given test data. The technique allows relationships between features to be effectively captured and has been successfully applied to a number of language tasks including machine translation. In our case, we view the image annotation task as one where a training data set of images labeled with keywords is provided and we need to automatically label the test images with keywords. To do this, we first represent the image using a language of visterms and then predict the probability of seeing an English word given the set of visterms forming the image. Maximum Entropy allows us to compute the probability and in addition allows for the relationships between visterms to be incorporated. The experimental results show that Maximum Entropy outperforms one of the classical translation models that has been applied to this task and the Cross Media Relevance Model. Since the Maximum Entropy model allows for the use of a large number of predicates to possibly increase performance even further, the Maximum Entropy model is a promising model for the task of automatic image annotation.
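A maximum entropy model of this kind has a log-linear form: the probability of a word given the visterms is a normalized exponential of weighted predicate counts. The sketch below assumes only unigram predicates (a visterm is present and the label is a given word) and uses made-up weights and visterm names; it shows the prediction step, not the training procedure or the predicates used in the paper.

```python
import math

def maxent_word_probs(visterms, weights, vocabulary):
    """Return P(word | visterms) under a log-linear (maximum entropy) model.

    weights[(word, visterm)] is a hypothetical learned weight for the unigram
    predicate "visterm occurs in the image and the label is word".
    """
    present = set(visterms)
    scores = {w: sum(weights.get((w, v), 0.0) for v in present) for w in vocabulary}
    # Subtract the maximum score before exponentiating for numerical stability.
    top = max(scores.values())
    exps = {w: math.exp(s - top) for w, s in scores.items()}
    z = sum(exps.values())
    return {w: e / z for w, e in exps.items()}

# Hypothetical vocabulary, visterm identifiers, and weights.
vocabulary = ["sky", "tiger", "water"]
weights = {("sky", "v1"): 1.2, ("tiger", "v7"): 2.0, ("water", "v1"): 0.4}

probs = maxent_word_probs(["v1", "v7"], weights, vocabulary)
best_word = max(probs, key=probs.get)
```

Relationships between visterms could be added as further predicates, for example one keyed on a pair of co-occurring visterms, without changing the normalization step.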
The effects of segmentation and feature choice in a translation model of object recognition
In IEEE Conf. on Computer Vision and Pattern Recognition, 2003
Cited by 27 (6 self)
We work with a model of object recognition where words must be placed on image regions. This approach means that large scale experiments are relatively easy, so we can evaluate the effects of various early and mid-level vision algorithms on recognition performance. We evaluate various image segmentation algorithms by determining word prediction accuracy for images segmented in various ways and represented by various features. We take the view that good segmentations respect object boundaries, and so word prediction should be better for a better segmentation. However, it is usually very difficult in practice to obtain segmentations that do not break up objects, so most practitioners attempt to merge segments to get better putative object representations. We demonstrate that our paradigm of word prediction easily allows us to predict potentially useful segment merges, even for segments that do not look similar (for example, merging the black and white
Figure 1. Illustration of labeling. Each region is labeled with the maximally probable word, but a probability distribution over all words is available for each region.
Effective automatic image annotation via a coherent language model and active learning
In Proceedings of the 12th annual ACM International Conference on Multimedia (MM’04), 2004
Cited by 25 (0 self)
Image annotations allow users to access a large image database with textual queries. There have been several studies on automatic image annotation utilizing machine learning techniques, which automatically learn statistical models from annotated images and apply them to generate annotations for unseen images. One common problem shared by most previous learning approaches for automatic image annotation is that each annotated word is predicted for an image independently of other annotated words. In this paper, we propose a coherent language model for automatic image annotation that takes into account the word-to-word correlation by estimating a coherent language model for an image. This new approach has two important advantages: 1) it is able to automatically determine the annotation length to improve the accuracy of retrieval results, and 2) it can be used with active learning to significantly reduce the required number of annotated image examples. Empirical studies with the Corel dataset are presented to show the effectiveness of the coherent language model for automatic image annotation.
GCap: Graph-based Automatic Image Captioning
In Proc. of the 4th International Workshop on Multimedia Data and Document Engineering (MDDE 04), in conjunction with Computer Vision Pattern Recognition Conference (CVPR 04), 2004
Cited by 24 (3 self)
Given an image, how do we automatically assign keywords to it? In this paper, we propose a novel, graph-based approach (GCap) which outperforms previously reported methods for automatic image captioning. Moreover, it is fast and scales well, with its training and testing time linear in the data set size. We report auto-captioning experiments on the "standard" Corel image database of 680 MBytes, where GCap outperforms recent, successful auto-captioning methods by up to 10 percentage points in captioning accuracy (50% relative improvement).
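The abstract does not say how candidate words are scored on the graph. One plausible reading, sketched below, links images, their regions, and their caption words in a single graph and ranks words by a random walk with restart from the query image. The graph layout, the node names, the restart probability, and the use of a random walk with restart are all assumptions made for illustration, not details taken from the abstract.

```python
def build_adjacency(edges):
    """Build an undirected adjacency list from (node, node) pairs."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, []).append(b)
        adjacency.setdefault(b, []).append(a)
    return adjacency

def random_walk_with_restart(adjacency, start, restart=0.15, iterations=100):
    """Approximate visiting probabilities of a walk that jumps back to start.

    Every node must have at least one neighbor, which holds for graphs
    produced by build_adjacency.
    """
    score = {node: 0.0 for node in adjacency}
    score[start] = 1.0
    for _ in range(iterations):
        nxt = {node: 0.0 for node in adjacency}
        for node, neighbors in adjacency.items():
            share = (1.0 - restart) * score[node] / len(neighbors)
            for neighbor in neighbors:
                nxt[neighbor] += share
        nxt[start] += restart  # restart mass returns to the query node
        score = nxt
    return score

# Hypothetical graph: two captioned training images and one uncaptioned query.
edges = [
    ("img1", "r1"), ("img1", "r2"), ("img1", "tiger"), ("img1", "grass"),
    ("img2", "r3"), ("img2", "sky"),
    ("query", "rq"),
    ("rq", "r1"),  # assumed nearest-neighbor link between similar regions
]
scores = random_walk_with_restart(build_adjacency(edges), "query")
word_ranking = sorted(["tiger", "grass", "sky"], key=scores.get, reverse=True)
```

Each iteration touches every edge once, so the cost of this sketch grows linearly with the number of edges in the graph.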

