Scene Modeling Using Co-Clustering
Cited by 15 (3 self)
In this paper, we propose a novel approach for scene modeling. The proposed method is able to automatically discover the intermediate semantic concepts. We utilize the Maximization of Mutual Information (MMI) co-clustering approach to discover clusters of semantic concepts, which we call intermediate concepts. Each intermediate concept corresponds to a cluster of visterms in the Bag of Visterms (BOV) paradigm for scene classification. MMI co-clustering results in fewer but meaningful clusters. Unlike k-means, which is used to cluster image patches based on their appearances in BOV, MMI co-clustering can group the visterms which are highly correlated to some concept. Unlike probabilistic Latent Semantic Analysis (pLSA), which can be considered as one-sided soft clustering, MMI co-clustering simultaneously clusters visterms and images, so it is able to boost both clusterings. In addition, MMI co-clustering is an unsupervised method. We have extensively tested our proposed approach on two challenging datasets: the fifteen scene categories and the LSCOM dataset, and promising results are obtained.
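As a rough illustration of the quantity that MMI co-clustering is named after, the sketch below computes the mutual information of a small visterm-by-image count table and shows how merging rows (visterms) changes it. The toy table and the function names `mutual_information` and `merge_rows` are hypothetical, and this is not the authors' algorithm; it only shows why some merges are cheaper than others under a mutual-information criterion.

```python
import math

def mutual_information(counts):
    """I(X;Y) in nats for a joint count table (rows: visterms, cols: images)."""
    total = sum(sum(row) for row in counts)
    row_sums = [sum(row) for row in counts]
    col_sums = [sum(col) for col in zip(*counts)]
    mi = 0.0
    for i, row in enumerate(counts):
        for j, c in enumerate(row):
            if c > 0:  # zero cells contribute nothing to the sum
                mi += (c / total) * math.log(c * total / (row_sums[i] * col_sums[j]))
    return mi

def merge_rows(counts, i, j):
    """Return a new table in which rows i and j are summed into one cluster."""
    merged = [a + b for a, b in zip(counts[i], counts[j])]
    rest = [row for k, row in enumerate(counts) if k not in (i, j)]
    return [merged] + rest

# Hypothetical toy table: visterms 0 and 1 occur only in image 0,
# visterm 2 occurs only in image 1.
table = [[4, 0], [2, 0], [0, 3]]
print(mutual_information(table))
print(mutual_information(merge_rows(table, 0, 1)))  # same image profile
print(mutual_information(merge_rows(table, 1, 2)))  # different image profiles
```

Merging visterms 0 and 1, which have the same distribution over images, leaves the mutual information unchanged up to floating-point error, while merging visterms 1 and 2 reduces it.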
Context-based object-class recognition and retrieval by generalized correlograms
- PAMI, in press (on-line at IEEE web site), 2006
Cited by 13 (1 self)
We present a novel approach for retrieval of object categories based on a novel type of image representation: the Generalized Correlogram (GC). In our image representation, the object is described as a constellation of GCs, where each one encodes information about some local part and the spatial relations from this part to others (that is, the part’s context). We show how such a representation can be used with fast procedures that learn the object category with weak supervision and efficiently match the model of the object against large collections of images. In the learning stage, we show that, by integrating our representation with Boosting, the system is able to obtain a compact model that is represented by very few features, where each feature conveys key properties about the object’s parts and their spatial arrangement. In the matching step, we propose direct procedures that exploit our representation for efficiently considering spatial coherence between the matching of local parts. Combined with an appropriate data organization such as Inverted Files, we show that thousands of images can be evaluated efficiently. The framework has been applied to different standard databases, and we show that our results compare favorably with state-of-the-art methods in both computational cost and accuracy. Index Terms: Object recognition, retrieval, Boosting, spatial pattern, contextual information.
3D Model based Object Class Detection in An Arbitrary View
Cited by 12 (0 self)
In this paper, a novel object class detection method based on 3D object modeling is presented. Instead of using a complicated mechanism for relating multiple 2D training views, the proposed method establishes spatial connections between these views by mapping them directly to the surface of a 3D model. The 3D shape of an object is reconstructed by using a homographic framework from a set of model views around the object and is represented by a volume consisting of binary slices. Features are computed in each 2D model view and mapped to the 3D shape model using the same homographic framework. To generalize the model for object class detection, features from supplemental views are also considered. A codebook is constructed from all of these features and then a 3D feature model is built. Given a 2D test image, correspondences between the 3D feature model and the testing view are identified by matching the detected features. Based on the 3D locations of the corresponding features, several hypotheses of viewing planes can be made. The one with the highest confidence is then used to detect the object using feature location matching. Performance of the proposed method has been evaluated by using the PASCAL VOC challenge dataset, and promising results are demonstrated.
Fast Nearest Neighbor Retrieval for Bregman Divergences
Cited by 10 (1 self)
We present a data structure enabling efficient nearest neighbor (NN) retrieval for Bregman divergences. The family of Bregman divergences includes many popular dissimilarity measures including KL-divergence (relative entropy), Mahalanobis distance, and Itakura-Saito divergence. These divergences present a challenge for efficient NN retrieval because they are not, in general, metrics, for which most NN data structures are designed. The data structure introduced in this work shares the same basic structure as the popular metric ball tree, but employs convexity properties of Bregman divergences in place of the triangle inequality. Experiments demonstrate speedups over brute-force search of up to several orders of magnitude.
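To make the family concrete, the sketch below evaluates the standard definition D_f(x, y) = f(x) - f(y) - <grad f(y), x - y> for three convex generators and shows that the result is generally not symmetric. The helper names and the example vectors are assumptions made for illustration; the code is not the data structure from the paper, only the divergence it is built around.

```python
import math

def bregman(f, grad_f, x, y):
    """Bregman divergence D_f(x, y) for a differentiable convex function f."""
    inner = sum(g * (xi - yi) for g, xi, yi in zip(grad_f(y), x, y))
    return f(x) - f(y) - inner

# f(x) = 0.5 * ||x||^2 gives half the squared Euclidean distance.
def sq_norm(x):
    return 0.5 * sum(v * v for v in x)

def sq_norm_grad(x):
    return list(x)

# Negative entropy gives the KL-divergence when x and y each sum to 1.
def neg_entropy(x):
    return sum(v * math.log(v) for v in x)

def neg_entropy_grad(x):
    return [math.log(v) + 1.0 for v in x]

# Burg entropy gives the Itakura-Saito divergence.
def burg(x):
    return -sum(math.log(v) for v in x)

def burg_grad(x):
    return [-1.0 / v for v in x]

# Hypothetical strictly positive probability vectors.
p = [0.7, 0.2, 0.1]
q = [0.2, 0.3, 0.5]
print(bregman(neg_entropy, neg_entropy_grad, p, q))  # KL(p || q)
print(bregman(neg_entropy, neg_entropy_grad, q, p))  # KL(q || p), a different value
print(bregman(burg, burg_grad, p, q))                # Itakura-Saito
print(bregman(sq_norm, sq_norm_grad, p, q))          # symmetric special case
```

The asymmetry of the KL case is one reason structures that rely on the triangle inequality and symmetry of a metric do not apply directly.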
Context-aware saliency detection
- In IEEE Conf. on Computer Vision and Pattern Recognition, 2010
Cited by 10 (1 self)
We propose a new type of saliency – context-aware saliency – which aims at detecting the image regions that represent the scene. This definition differs from previous definitions whose goal is to either identify fixation points or detect the dominant object. In accordance with our saliency definition, we present a detection algorithm which is based on four principles observed in the psychological literature. The benefits of the proposed approach are evaluated in two applications where the context of the dominant objects is just as essential as the objects themselves. In image retargeting, we demonstrate that using our saliency prevents distortions in the important regions. In summarization, we show that our saliency helps to produce compact, appealing, and informative summaries.
A linear time histogram metric for improved SIFT matching
- In ECCV
Cited by 8 (4 self)
We present a new metric between histograms such as SIFT descriptors and a linear time algorithm for its computation. It is common practice to use the L2 metric for comparing SIFT descriptors. This practice assumes that SIFT bins are aligned, an assumption which is often not correct due to quantization, distortion, occlusion, etc. In this paper we present a new Earth Mover’s Distance (EMD) variant. We show that it is a metric (unlike the original EMD [1], which is a metric only for normalized histograms). Moreover, it is a natural extension of the L1 metric. Second, we propose a linear time algorithm for the computation of the EMD variant, with a robust ground distance for oriented gradients. Finally, extensive experimental results on the Mikolajczyk and Schmid dataset [2] show that our method outperforms state-of-the-art distances.
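The bin-alignment issue can be seen on a toy example. The sketch below compares the L2 distance with the classic one-dimensional EMD for equal-mass histograms, computed in linear time from running differences with ground distance |i - j|. It is not the EMD variant proposed in the paper (which handles unnormalized histograms and uses a robust ground distance for orientations); the histograms and function names are hypothetical.

```python
import math

def l2(h, g):
    """Bin-to-bin Euclidean distance between two histograms."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(h, g)))

def emd_1d(h, g):
    """Classic 1D EMD for histograms of equal total mass, ground distance |i - j|.

    The mass carried past each bin boundary is the running sum of differences,
    so one pass over the bins is enough.
    """
    carried = 0.0
    work = 0.0
    for a, b in zip(h, g):
        carried += a - b
        work += abs(carried)
    return work

# Hypothetical histograms: the same unit mass shifted by one bin or by three bins.
base = [1, 0, 0, 0]
near = [0, 1, 0, 0]
far = [0, 0, 0, 1]

print(l2(base, near), l2(base, far))          # identical bin-to-bin distances
print(emd_1d(base, near), emd_1d(base, far))  # cross-bin distance grows with the shift
```

L2 treats a one-bin shift and a three-bin shift as equally different, whereas a cross-bin distance such as EMD ranks the small shift as closer.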
Scene Classification with Low-dimensional Semantic Spaces and Weak Supervision
Cited by 8 (4 self)
A novel approach to scene categorization is proposed. Similar to previous works of [11, 15, 3, 12], we introduce an intermediate space, based on a low dimensional semantic “theme” image representation. However, instead of learning the themes in an unsupervised manner, they are learned with weak supervision, from casual image annotations. Each theme induces a probability density on the space of low-level features, and images are represented as vectors of posterior theme probabilities. This enables an image to be associated with multiple themes, even when there are no multiple associations in the training labels. An implementation is presented and compared to various existing algorithms, on benchmark datasets. It is shown that the proposed low dimensional representation correlates well with human scene understanding, and is able to learn theme co-occurrences without explicit training. It is also shown to outperform unsupervised latent-space methods, with much smaller training complexity, and to achieve performance close to the state-of-the-art methods, which rely on much higher-dimensional image representations. Finally, a study of the effect of dimensionality on the classification performance is presented, indicating that the dimensionality of theme space grows sub-linearly with the number of scene categories.
Learning class-specific affinities for image labelling
- In CVPR, 2008
Cited by 7 (1 self)
Spectral clustering and eigenvector-based methods have become increasingly popular in segmentation and recognition. Although the choice of the pairwise similarity metric (or affinities) greatly influences the quality of the results, this choice is typically specified outside the learning framework. In this paper, we present an algorithm to learn class-specific similarity functions. Mapping our problem into a Conditional Random Fields (CRF) framework enables us to pose the task of learning affinities as parameter learning in undirected graphical models. There are two significant advances over previous work. First, we learn the affinity between a pair of data-points as a function of a pairwise feature and (in contrast with previous approaches) the classes to which these two data-points were mapped, allowing us to work with a richer class of affinities. Second, our formulation provides a principled probabilistic interpretation for learning all of the parameters that define these affinities. Using ground truth segmentations and labellings for training, we learn the parameters with the greatest discriminative power (in an MLE sense) on the training data. We demonstrate the power of this learning algorithm in the setting of joint segmentation and recognition of object classes. Specifically, even with very simple appearance features, the proposed method achieves state-of-the-art performance on standard datasets.
Learning Semantic Visual Vocabularies Using Diffusion Distance
Cited by 6 (4 self)
In this paper, we propose a novel approach for learning a generic visual vocabulary. We use diffusion maps to automatically learn a semantic visual vocabulary from abundant quantized midlevel features. Each midlevel feature is represented by the vector of pointwise mutual information (PMI). In this midlevel feature space, we believe the features produced by similar sources must lie on a certain manifold. To capture the intrinsic geometric relations between features, we measure their dissimilarity using diffusion distance. The underlying idea is to embed the midlevel features into a semantic lower-dimensional space. Our goal is to construct a compact yet discriminative semantic visual vocabulary. Although the conventional approach using k-means is good for vocabulary construction, its performance is sensitive to the size of the visual vocabulary. In addition, the learnt visual words are not semantically meaningful since the clustering criterion is based on appearance similarity only. Our proposed approach can effectively overcome these problems by capturing the semantic and geometric relations of the feature space using diffusion maps. Unlike some of the supervised vocabulary construction approaches, and the unsupervised methods such as pLSA and LDA, diffusion maps can capture the local intrinsic geometric relations between the midlevel feature points on the manifold. We have tested our approach on the KTH action dataset, our own YouTube action dataset, and the fifteen scene dataset, and have obtained very promising results.
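As a minimal sketch of the PMI representation mentioned above, the code below turns a feature-by-image co-occurrence count table into one PMI vector per feature, using PMI(x, y) = log(p(x, y) / (p(x) p(y))). The count table, the function name `pmi_vectors`, and the choice to map zero counts to 0 rather than negative infinity are assumptions made for illustration; the abstract does not specify these details.

```python
import math

def pmi_vectors(counts):
    """Return one PMI vector per row (feature) of a co-occurrence count table.

    counts[i][j] is how often feature i occurs in image j.
    Assumption: every row and column has at least one nonzero count.
    """
    total = sum(sum(row) for row in counts)
    row_sums = [sum(row) for row in counts]
    col_sums = [sum(col) for col in zip(*counts)]
    vectors = []
    for i, row in enumerate(counts):
        vec = []
        for j, c in enumerate(row):
            if c == 0:
                vec.append(0.0)  # assumption: clamp log(0) to 0 for unseen pairs
            else:
                vec.append(math.log(c * total / (row_sums[i] * col_sums[j])))
        vectors.append(vec)
    return vectors

# Hypothetical counts: three midlevel features observed across three images.
counts = [
    [8, 1, 1],
    [7, 2, 1],
    [1, 1, 8],
]
for vec in pmi_vectors(counts):
    print([round(v, 3) for v in vec])
```

Features 0 and 1 co-occur with the same images and so receive similar PMI vectors, which is the kind of relation a subsequent embedding step could exploit.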

