Results 1 - 10 of 367
Matching words and pictures
Journal of Machine Learning Research, 2003
Cited by 391 (33 self)
We present a new approach for modeling multi-modal data sets, focusing on the specific case of segmented images with associated text. Learning the joint distribution of image regions and words has many applications. We consider in detail predicting words associated with whole images (auto-annotation) and corresponding to particular image regions (region naming). Auto-annotation might help organize and access large collections of images. Region naming is a model of object recognition as a process of translating image regions to words, much as one might translate from one language to another. Learning the relationships between image regions and semantic correlates (words) is an interesting example of multi-modal data mining, particularly because it is typically hard to apply data mining techniques to collections of images. We develop a number of models for the joint distribution of image regions and words, including several which explicitly learn the correspondence between regions and words. We study multi-modal and correspondence extensions to Hofmann’s hierarchical clustering/aspect model, a translation model adapted from statistical machine translation (Brown et al.), and a multi-modal extension to mixture of latent Dirichlet allocation
Object Recognition as Machine Translation: Learning a Lexicon for a Fixed Image Vocabulary
2002
Cited by 327 (31 self)
We describe a model of object recognition as machine translation.
Automatic Linguistic Indexing of Pictures by a Statistical Modeling Approach
IEEE Transactions on Pattern Analysis and Machine Intelligence, 2003
Cited by 171 (22 self)
Automatic linguistic indexing of pictures is an important but highly challenging problem for researchers in computer vision and content-based image retrieval. In this paper, we introduce a statistical modeling approach to this problem. Categorized images are used to train a dictionary of hundreds of statistical models each representing a concept. Images of any given concept are regarded as instances of a stochastic process that characterizes the concept. To measure the extent of association between an image and the textual description of a concept, the likelihood of the occurrence of the image based on the characterizing stochastic process is computed. A high likelihood indicates a strong association. In our experimental implementation, we focus on a particular group of stochastic processes, that is, the two-dimensional multiresolution hidden Markov models (2D MHMMs). We implemented and tested our ALIP (Automatic Linguistic Indexing of Pictures) system on a photographic image database of 600 different concepts, each with about 40 training images. The system is evaluated quantitatively using more than 4,600 images outside the training database and compared with a random annotation scheme. Experiments have demonstrated the good accuracy of the system and its high potential in linguistic indexing of photographic images.
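The likelihood-based association described in this abstract can be illustrated with a small sketch. Note the substitution: the paper uses two-dimensional multiresolution hidden Markov models, whereas the code below uses a per-concept diagonal Gaussian over feature vectors purely as a stand-in. The function names (`fit_concept`, `log_likelihood`, `rank_concepts`) and the feature representation are hypothetical.

```python
import numpy as np

def fit_concept(features):
    """Fit a stand-in concept model: per-dimension mean and variance.

    This is NOT a 2D MHMM; it is a diagonal Gaussian used only to
    illustrate ranking concepts by likelihood.
    """
    features = np.asarray(features, dtype=float)
    # A small floor on the variance avoids division by zero.
    return features.mean(axis=0), features.var(axis=0) + 1e-6

def log_likelihood(features, model):
    """Total log-likelihood of an image's feature vectors under one model."""
    mean, var = model
    x = np.asarray(features, dtype=float)
    ll = -0.5 * (np.log(2.0 * np.pi * var) + (x - mean) ** 2 / var)
    return float(ll.sum())

def rank_concepts(features, models):
    """Return concept names sorted from highest to lowest likelihood."""
    scores = {name: log_likelihood(features, m) for name, m in models.items()}
    return sorted(scores, key=scores.get, reverse=True)
```

A higher log-likelihood under a concept's model is read as a stronger association between the image and that concept's textual description, so the top-ranked names would serve as candidate annotations.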
Image Categorization by Learning and Reasoning with Regions
Journal of Machine Learning Research, 2004
Cited by 98 (7 self)
Designing computer programs to automatically categorize images using low-level features is a challenging research topic in computer vision. In this paper, we present a new learning technique, which extends Multiple-Instance Learning (MIL), and its application to the problem of region-based image categorization. Images are viewed as bags, each of which contains a number of instances corresponding to regions obtained from image segmentation. The standard MIL problem assumes that a bag is labeled positive if at least one of its instances is positive; otherwise, the bag is negative.
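The standard MIL assumption stated in this abstract (a bag is positive if at least one of its instances is positive) can be written down in a few lines. This sketch covers only that standard rule, not the extension the paper proposes; the function names and the score threshold are illustrative assumptions.

```python
from typing import Sequence

def bag_label(instance_labels: Sequence[bool]) -> bool:
    """Standard MIL rule: positive iff at least one instance is positive."""
    return any(instance_labels)

def bag_label_from_scores(scores: Sequence[float], threshold: float = 0.0) -> bool:
    """Same rule applied to real-valued instance scores.

    An instance counts as positive when its score exceeds `threshold`
    (a hypothetical parameter); an empty bag is labeled negative.
    """
    return any(s > threshold for s in scores)
```

In the region-based setting, each image is a bag and each segmented region is an instance, so an image would be labeled positive under this rule as soon as one region scores as positive.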
Integral histogram: A fast way to extract histograms in Cartesian spaces
In Proc. IEEE Conf. on Computer Vision and Pattern Recognition, 2005
Cited by 87 (6 self)
We present a novel method, which we refer to as an integral histogram, to compute the histograms of all possible target regions in a Cartesian data space. Our method has three distinct advantages: 1- It is computationally superior to the conventional approach. The integral histogram method makes it possible to employ even an exhaustive search process in real-time, which was impractical before. 2- It can be extended to higher data dimensions, uniform and non-uniform bin formations, and multiple target scales without sacrificing its computational advantages. 3- It enables the description of high level histogram features. We exploit the spatial arrangement of data points, and recursively propagate an aggregated histogram by starting from the origin and traversing through the remaining points along either a scan-line or a wave-front. At each step, we update a single bin using the values of the integral histogram at the previously visited neighboring data points. After the integral histogram is propagated, the histogram of any target region can be computed easily by using simple arithmetic operations.
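A minimal sketch of the scan-line propagation for the two-dimensional case may help make this concrete. It assumes each pixel has already been quantized to an integer bin index; the function names, the array layout, and the zero-padded border are choices made for this illustration rather than details taken from the paper.

```python
import numpy as np

def integral_histogram(bin_index, n_bins):
    """Propagate an integral histogram along scan-lines.

    bin_index: 2D integer array, each entry in [0, n_bins).
    Returns an array of shape (h + 1, w + 1, n_bins); entry [y, x]
    holds the histogram of rows 0..y-1 and columns 0..x-1.
    """
    h, w = bin_index.shape
    ih = np.zeros((h + 1, w + 1, n_bins), dtype=np.int64)
    for y in range(h):
        for x in range(w):
            # Combine the three previously visited neighbors ...
            ih[y + 1, x + 1] = ih[y, x + 1] + ih[y + 1, x] - ih[y, x]
            # ... then update the single bin of the current pixel.
            ih[y + 1, x + 1, bin_index[y, x]] += 1
    return ih

def region_histogram(ih, top, left, bottom, right):
    """Histogram of rows top..bottom-1 and columns left..right-1."""
    return ih[bottom, right] - ih[top, right] - ih[bottom, left] + ih[top, left]
```

Once `ih` is built, any rectangular region costs four lookups and three vector additions or subtractions, independent of the region's size, which is what makes an exhaustive search over many candidate regions affordable.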
Texture classification: Are filter banks necessary?
IEEE Computer Society Conference on Computer Vision and Pattern Recognition
Cited by 80 (8 self)
We question the role that large scale filter banks have traditionally played in texture classification. It is demonstrated that textures can be classified using the joint distribution of intensity values over extremely compact neighbourhoods (starting from as small as 3 × 3 pixels square), and that this outperforms classification using filter banks with large support. We develop a novel texton based representation which is suited to modelling this joint neighbourhood distribution for MRFs. The representation is learnt from training images, and then used to classify novel images (with unknown viewpoint and lighting) into texture classes. The power of the method is demonstrated by classifying over 2800 images of all 61 textures present in the Columbia-Utrecht database. The classification performance surpasses that of recent state-of-the-art filter bank based classifiers such as Leung & Malik [IJCV 01], Cula & Dana [CVPR 01], and Varma & Zisserman [ECCV 02].
Image Change Detection Algorithms: A Systematic Survey
IEEE Transactions on Image Processing, 2005
Cited by 64 (0 self)
Detecting regions of change in multiple images of the same scene taken at different times is of widespread interest due to a large number of applications in diverse disciplines, including remote sensing, surveillance, medical diagnosis and treatment, civil infrastructure, and underwater sensing. This paper presents a systematic survey of the common processing steps and core decision rules in modern change detection algorithms, including significance and hypothesis testing, predictive models, the shading model, and background modeling. We also discuss important preprocessing methods, approaches to enforcing the consistency of the change mask, and principles for evaluating and comparing the performance of change detection algorithms. It is hoped that our classification of algorithms into a relatively small number of categories will provide useful guidance to the algorithm designer.
3D Object modeling and recognition using local affine-invariant image descriptors and multi-view spatial constraints
International Journal of Computer Vision, 2006
Cited by 58 (11 self)
This article introduces a novel representation for three-dimensional (3D) objects in terms of local affine-invariant descriptors of their images and the spatial relationships between the corresponding surface patches. Geometric constraints associated with different views of the same patches under affine projection are combined with a normalized representation of their appearance to guide matching and reconstruction, allowing the acquisition of true 3D affine and Euclidean models from multiple unregistered images, as well as their recognition in photographs taken from arbitrary viewpoints. The proposed approach does not require a separate segmentation stage, and it is applicable to highly cluttered scenes. Modeling and recognition results are presented.
Fast Features for Face Authentication under Illumination Direction Changes
Pattern Recognition Letters, 2003
Cited by 57 (22 self)
In this letter we propose a face feature extraction technique which utilizes polynomial coefficients derived from 2D Discrete Cosine Transform (DCT) coefficients obtained from horizontally and vertically neighbouring blocks. Face authentication results on the VidTIMIT database suggest that the proposed feature set is superior (in terms of robustness to illumination changes and discrimination ability) to features extracted using four popular methods: Principal Component Analysis (PCA), PCA with histogram equalization pre-processing, 2D DCT and 2D Gabor wavelets; the results also suggest that histogram equalization pre-processing increases the error rate and offers no help against illumination changes. Moreover, the proposed feature set is over 80 times faster to compute than features based on Gabor wavelets. Further experiments on the Weizmann database also show that the proposed approach is more robust than 2D Gabor wavelets and 2D DCT coefficients.
Learning depth from single monocular images
In NIPS 18, 2005
Cited by 55 (19 self)
We consider the task of depth estimation from a single monocular image. We take a supervised learning approach to this problem, in which we begin by collecting a training set of monocular images (of unstructured outdoor environments which include forests, trees, buildings, etc.) and their corresponding ground-truth depthmaps. Then, we apply supervised learning to predict the depthmap as a function of the image. Depth estimation is a challenging problem, since local features alone are insufficient to estimate depth at a point, and one needs to consider the global context of the image. Our model uses a discriminatively-trained Markov Random Field (MRF) that incorporates multiscale local- and global-image features, and models both depths at individual points as well as the relation between depths at different points. We show that, even on unstructured scenes, our algorithm is frequently able to recover fairly accurate depthmaps.

