Dynamic topic models
In ICML, 2006
Cited by 245 (15 self)
Scientists need new tools to explore and browse large collections of scholarly literature. Thanks to organizations such as JSTOR, which scan and index the original bound archives of many journals, modern scientists can search digital libraries spanning hundreds of years. A scientist, suddenly
Learning object categories from Google’s image search
In ICCV, 2005
Cited by 154 (11 self)
Current approaches to object category recognition require datasets of training images to be manually prepared, with varying degrees of supervision. We present an approach that can learn an object category from just its name, by utilizing the raw output of image search engines available on the Internet. We develop a new model, TSI-pLSA, which extends pLSA (as applied to visual words) to include spatial information in a translation and scale invariant manner. Our approach can handle the high intra-class variability and large proportion of unrelated images returned by search engines. We evaluate the models on standard test sets, showing performance competitive with existing methods trained on hand prepared datasets.
Learning hierarchical models of scenes, objects, and parts
In IEEE Intl. Conf. on Computer Vision, 2005
Cited by 104 (11 self)
We describe a hierarchical probabilistic model for the detection and recognition of objects in cluttered, natural scenes. The model is based on a set of parts which describe the expected appearance and position, in an object-centered coordinate frame, of features detected by a low-level interest operator. Each object category then has its own distribution over these parts, which are shared between objects. We learn the parameters of this model via a Gibbs sampler which uses the graphical model’s structure to analytically average over many parameters. Applied to a database of images of isolated objects, the sharing of parts among objects improves detection accuracy when few training examples are available. We also extend this hierarchical framework to scenes containing multiple objects.
The pyramid match kernel: Efficient learning with sets of features
Journal of Machine Learning Research, 2007
Cited by 55 (6 self)
In numerous domains it is useful to represent a single example by the set of the local features or parts that comprise it. However, this representation poses a challenge to many conventional machine learning techniques, since sets may vary in cardinality and elements lack a meaningful ordering. Kernel methods can learn complex functions, but a kernel over unordered set inputs must somehow solve for correspondences—generally a computationally expensive task that becomes impractical for large set sizes. We present a new fast kernel function called the pyramid match that measures partial match similarity in time linear in the number of features. The pyramid match maps unordered feature sets to multi-resolution histograms and computes a weighted histogram intersection in order to find implicit correspondences based on the finest resolution histogram cell where a matched pair first appears. We show the pyramid match yields a Mercer kernel, and we prove bounds on its error relative to the optimal partial matching cost. We demonstrate our algorithm on both classification and regression tasks, including object recognition, 3-D human pose inference, and time of publication estimation for documents, and we show that the proposed method is accurate and significantly more efficient than current approaches.
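The pyramid match described in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes bins of side 2^i at level i with weight 1/2^i, omits the normalization and any grid shifting a full version might use, and the names `histogram`, `intersection`, and `pyramid_match` are hypothetical.

```python
from collections import Counter


def histogram(points, side):
    """Count points per grid cell, for cubic cells of the given side length."""
    return Counter(tuple(int(c // side) for c in p) for p in points)


def intersection(h1, h2):
    """Histogram intersection: sum of the per-cell minimum counts."""
    return sum(min(n, h2[cell]) for cell, n in h1.items())


def pyramid_match(xs, ys, levels):
    """Unnormalized pyramid match score between two point sets.

    Assumption: level i uses cells of side 2**i, and matches that first
    appear at level i are weighted by 1 / 2**i, so matches found in finer
    cells count more than matches found only in coarse cells.
    """
    score = 0.0
    prev = 0  # matches already counted at finer levels
    for i in range(levels):
        side = 2 ** i
        cur = intersection(histogram(xs, side), histogram(ys, side))
        score += (cur - prev) / side  # only newly formed matches
        prev = cur
    return score


if __name__ == "__main__":
    a = [(0, 0), (5, 5)]
    b = [(0, 0), (6, 6)]
    print(pyramid_match(a, b, levels=4))
```

Each level costs one pass over the points of both sets, which is where the linear-time behaviour mentioned in the abstract comes from; the sets may have different sizes, in which case only the smaller set can be fully matched.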
Describing visual scenes using transformed dirichlet processes
Advances in Neural Information Processing Systems 18, 2005
Cited by 47 (6 self)
Motivated by the problem of learning to detect and recognize objects with minimal supervision, we develop a hierarchical probabilistic model for the spatial structure of visual scenes. In contrast with most existing models, our approach captures the intrinsic uncertainty in the number and identity of objects depicted in a given image. Our scene model is based on the transformed Dirichlet process (TDP), a novel extension of the hierarchical DP in which a set of stochastically transformed mixture components are shared between multiple groups of data. For visual scenes, mixture components describe the spatial structure of visual features in an object-centered coordinate frame, while transformations model the object positions in a particular image. Learning and inference in the TDP, which has many potential applications beyond computer vision, is based on an empirically effective Gibbs sampler. Applied to a dataset of partially labeled street scenes, we show that the TDP’s inclusion of spatial structure improves detection performance, and allows unsupervised discovery of object categories.
Spatially coherent latent topic model for concurrent object segmentation and classification
In Proceedings of IEEE International Conference on Computer Vision, 2007
Cited by 42 (3 self)
We present a novel generative model for simultaneously recognizing and segmenting object and scene classes. Our model is inspired by the traditional bag of words representation of texts and images as well as a number of related generative models, including probabilistic Latent Semantic Analysis (pLSA) and Latent Dirichlet Allocation (LDA). A major drawback of the pLSA and LDA models is the assumption that each patch in the image is independently generated given its corresponding latent topic. While such a representation provides an efficient computational method, it lacks the power to describe visually coherent images and scenes. Instead, we propose a spatially coherent latent topic model (Spatial-LTM). Spatial-LTM represents an image containing objects in a hierarchical way by oversegmented image regions of homogeneous appearance and the salient image patches within the regions. Only a single latent topic is assigned to the image patches within each region, enforcing the spatial coherency of the model. This idea gives rise to the following merits of Spatial-LTM: (1) Spatial-LTM provides a unified representation for spatially coherent bag of words topic models; (2) Spatial-LTM can simultaneously segment and classify objects, even in the case of occlusion and multiple instances; and (3) Spatial-LTM can be trained either unsupervised or supervised, as well as when partial object labels are provided. We verify the success of our model in a number of segmentation and classification experiments.
Unsupervised learning of categories from sets of partially matching image features
In CVPR, 2006
Cited by 34 (5 self)
We present a method to automatically learn object categories from unlabeled images. Each image is represented by an unordered set of local features, and all sets are embedded into a space where they cluster according to their partial-match feature correspondences. After efficiently computing the pairwise affinities between the input images in this space, a spectral clustering technique is used to recover the primary groupings among the images. We introduce an efficient means of refining these groupings according to intra-cluster statistics over the subsets of features selected by the partial matches between the images, and based on an optional, variable amount of user supervision. We compute the consistent subsets of feature correspondences within a grouping to infer category feature masks. The output of the algorithm is a partition of the data into a set of learned categories, and a set of classifiers trained from these ranked partitions that can recognize the categories in novel images.
Simultaneous Image Classification and Annotation
Cited by 33 (2 self)
Image classification and annotation are important problems in computer vision, but are rarely considered together. Intuitively, annotations provide evidence for the class label, and the class label provides evidence for annotations. For example, an image of class highway is more likely to be annotated with the words “road,” “car,” and “traffic” than the words “fish,” “boat,” and “scuba.” In this paper, we develop a new probabilistic model for jointly modeling the image, its class label, and its annotations. Our model treats the class label as a global description of the image, and treats annotation terms as local descriptions of parts of the image. Its underlying probabilistic assumptions naturally integrate these two sources of information. We derive approximate inference and estimation algorithms based on variational methods, as well as efficient approximations for classifying and annotating new images. We examine the performance of our model on two real-world image data sets, illustrating that a single model provides competitive annotation performance, and superior classification performance.
A visual vocabulary for flower classification
In CVPR, 2006
Cited by 31 (1 self)
We investigate to what extent ‘bag of visual words’ models can be used to distinguish categories which have significant visual similarity. To this end we develop and optimize a nearest neighbour classifier architecture, which is evaluated on a very challenging database of flower images. The flower categories are chosen to be indistinguishable on colour alone (for example), and have considerable variation in shape, scale, and viewpoint. We demonstrate that by developing a visual vocabulary that explicitly represents the various aspects (colour, shape, and texture) that distinguish one flower from another, we can overcome the ambiguities that exist between flower categories. The novelty lies in the vocabulary used for each aspect, and how these vocabularies are combined into a final classifier. The various stages of the classifier (vocabulary selection and combination) are each optimized on a validation set. Results are presented on a dataset of 1360 images consisting of 17 flower species. It is shown that excellent performance can be achieved, far surpassing standard baseline algorithms using (for example) colour cues alone.
Using dependent regions for object categorization in a generative framework
In CVPR, 2006
Cited by 31 (1 self)
“Bag of words” models have enjoyed much attention and achieved good performance in recent studies of object categorization. In most of these works, local patches are modeled as basic building blocks of an image, analogous to words in text documents. In most previous works using the “bag of words” models (e.g. [4, 20, 7]), the local patches are assumed to be independent of each other. In this paper, we relax the independence assumption and explicitly model the inter-dependency of the local regions. Similarly to previous work, we represent images as a collection of patches, each of which belongs to a latent “theme” that is shared across images as well as categories. We learn the theme distributions and patch distributions over the themes in a hierarchical structure [22]. In particular, we introduce a linkage structure over the latent themes to encode the dependencies of the patches. This structure enforces the semantic connections among the patches by facilitating better clustering of the themes. As a result, our models for object categories tend to be more discriminative than the ones obtained under the independent patch assumption. We show highly competitive categorization results on both the Caltech 4 and Caltech 101 object category datasets. By examining the distributions of the latent themes for each object category, we construct an object taxonomy using the 101 object classes from the Caltech 101 dataset.

