Learning globally-consistent local distance functions for shape-based image retrieval and classification
In ICCV, 2007
Cited by 135 (3 self)
We address the problem of visual category recognition by learning an image-to-image distance function that attempts to satisfy the following property: the distance between images from the same category should be less than the distance between images from different categories. We use patch-based feature vectors common in object recognition work as a basis for our image-to-image distance functions. Our large-margin formulation for learning the distance functions is similar to formulations used in the machine learning literature on distance metric learning; however, we differ in that we learn local distance functions (a different parameterized function for every image of our training set), whereas typically a single global distance function is learned. This was a novel approach first introduced in Frome, Singer, & Malik, NIPS 2006. In that work we learned the local distance functions independently, and the outputs of these functions could not be compared at test time without the use of additional heuristics or training. Here we introduce a different approach that has the advantage that it learns distance functions that are globally consistent in that they can be directly compared for purposes of retrieval and classification. The outputs of the learning algorithm are weights assigned to the image features, which is intuitively appealing in the computer vision setting: some features are more salient than others, and which are more salient depends on the category, or image, being considered. We train and test using the Caltech 101 object recognition benchmark. Using fifteen training images per category, we achieved a mean recognition rate of 63.2% and
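The property stated in this abstract (same-category images should be closer than different-category images) can be illustrated with a small sketch. It assumes a local distance is a weighted sum of per-feature elementary distances and that a violated constraint is penalised with a hinge loss; the function names, the margin value, and the hinge form are illustrative assumptions, not details taken from the abstract.

```python
def local_distance(weights, elementary):
    """Weighted sum of per-feature elementary distances (assumed form)."""
    return sum(w * d for w, d in zip(weights, elementary))


def triplet_hinge(weights, d_same, d_diff, margin=1.0):
    """Hinge penalty for one triplet (hypothetical helper).

    d_same: elementary distances from the focal image to a same-category image.
    d_diff: elementary distances from the focal image to a different-category image.
    The penalty is zero when the different-category image is farther by at
    least `margin`.
    """
    return max(0.0, margin + local_distance(weights, d_same)
               - local_distance(weights, d_diff))
```

A learner in this style would adjust the weights so that the penalty is zero for as many triplets as possible.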
The pyramid match kernel: Efficient learning with sets of features
Journal of Machine Learning Research, 2007
Cited by 123 (9 self)
In numerous domains it is useful to represent a single example by the set of the local features or parts that comprise it. However, this representation poses a challenge to many conventional machine learning techniques, since sets may vary in cardinality and elements lack a meaningful ordering. Kernel methods can learn complex functions, but a kernel over unordered set inputs must somehow solve for correspondences—generally a computationally expensive task that becomes impractical for large set sizes. We present a new fast kernel function called the pyramid match that measures partial match similarity in time linear in the number of features. The pyramid match maps unordered feature sets to multiresolution histograms and computes a weighted histogram intersection in order to find implicit correspondences based on the finest resolution histogram cell where a matched pair first appears. We show the pyramid match yields a Mercer kernel, and we prove bounds on its error relative to the optimal partial matching cost. We demonstrate our algorithm on both classification and regression tasks, including object recognition, 3D human pose inference, and time of publication estimation for documents, and we show that the proposed method is accurate and significantly more efficient than current approaches.
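A simplified sketch of the pyramid match idea described above: feature sets are binned into histograms at successively coarser resolutions, and matches that first appear at level i are counted by histogram intersection and down-weighted. The uniform grid, the doubling cell side, and the 1/2^i weights are assumptions chosen for illustration; the sketch omits normalisation and is not the authors' implementation.

```python
from collections import Counter


def histogram(points, side):
    """Count points per grid cell for cells of the given side length."""
    return Counter(tuple(int(c // side) for c in p) for p in points)


def intersection(h1, h2):
    """Histogram intersection: number of points matched within shared cells."""
    return sum(min(n, h2[cell]) for cell, n in h1.items())


def pyramid_match(xs, ys, levels):
    """Weighted count of matches that first appear at each pyramid level."""
    score, prev = 0.0, 0
    for i in range(levels):
        side = 2 ** i  # cell side doubles at each coarser level
        cur = intersection(histogram(xs, side), histogram(ys, side))
        score += (cur - prev) / side  # new matches, weighted by 1/2^i
        prev = cur
    return score
```

Each level costs time linear in the number of features, and the two sets may have different sizes, so unmatched features simply contribute nothing to the score.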
Active learning with Gaussian processes for object categorization
In ICCV, 2007
Cited by 88 (14 self)
Discriminative methods for visual object category recognition are typically non-probabilistic, predicting class labels but not directly providing an estimate of uncertainty. Gaussian Processes (GPs) are powerful regression techniques with explicit uncertainty models; we show here how Gaussian Processes with covariance functions defined based on a Pyramid Match Kernel (PMK) can be used for probabilistic object category recognition. The uncertainty model provided by GPs offers confidence estimates at test points, and naturally allows for an active learning paradigm in which points are optimally selected for interactive labeling. We derive a novel active category learning method based on our probabilistic regression model, and show that a significant boost in classification performance is possible, especially when the amount of training data for a category is ultimately very small.
Pyramid match hashing: Sublinear time indexing over partial correspondences
In CVPR, 2007
Cited by 38 (6 self)
Matching local features across images is often useful when comparing or recognizing objects or scenes, and efficient techniques for obtaining image-to-image correspondences have been developed [6, 4, 11]. However, given a query image, searching a very large image database with such measures remains impractical. We introduce a sublinear time randomized hashing algorithm for indexing sets of feature vectors under their partial correspondences. We develop an efficient embedding function for the normalized partial matching similarity between sets, and show how to exploit random hyperplane properties to construct hash functions that satisfy locality-sensitive constraints. The result is a bounded approximate similarity search algorithm that finds (1 + ε)-approximate nearest neighbor images in O(N^{1/(1+ε)}) time for a database containing N images represented by (varying numbers of) local features. By design the indexing is robust to outlier features, as it favors strong one-to-one matchings but does not penalize for additional distant features. We demonstrate our approach applied to image retrieval for images represented by sets of local appearance features, and show that searching over correspondences is now scalable to large image databases.
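The random hyperplane construction mentioned in this abstract can be sketched as follows. Each hash bit records which side of a random hyperplane a vector falls on, so vectors separated by a small angle tend to share most bits. The sketch hashes a plain vector; the embedding of a feature set that the paper applies before hashing is not shown, and all names and parameters here are illustrative assumptions.

```python
import random


def make_hyperplanes(dim, bits, seed=0):
    """Draw `bits` random Gaussian normal vectors in `dim` dimensions."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(bits)]


def hash_bits(vec, planes):
    """One bit per hyperplane: 1 if the vector lies on its non-negative side."""
    return tuple(
        1 if sum(r * v for r, v in zip(plane, vec)) >= 0 else 0
        for plane in planes
    )


def hamming(a, b):
    """Number of positions where two bit tuples differ."""
    return sum(x != y for x, y in zip(a, b))
```

Because the bits depend only on direction, rescaling a vector leaves its hash unchanged, while the Hamming distance between two hashes grows with the angle between the vectors.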
Large-scale multimodal semantic concept detection for consumer video
In MIR Workshop, ACM Multimedia, 2007
Cited by 36 (13 self)
In this paper we present a systematic study of automatic classification of consumer videos into a large set of diverse semantic concept classes, which have been carefully selected based on user studies and extensively annotated over 1300+ videos from real users. Our goals are to assess the state of the art of multimedia analytics (including both audio and visual analysis) in consumer video classification and to discover new research opportunities. We investigated several statistical approaches built upon global/local visual features, audio features, and audio-visual combinations. Three multimodal fusion frameworks (ensemble, context fusion, and joint boosting) are also evaluated. Experimental results show that visual and audio models perform best for different sets of concepts. Both provide significant contributions to multimodal fusion, via expansion of the classifier pool for context fusion and the feature bases for feature sharing. The fused multimodal models are shown to significantly reduce the detection errors (compared to single-modality models), resulting in a promising accuracy of 83% over diverse concepts. To the best of our knowledge, this is the first work on systematic investigation of multimodal classification using a large-scale ontology and realistic video corpus.
Multi-label Multiple Kernel Learning
Cited by 27 (7 self)
We present a multi-label multiple kernel learning (MKL) formulation in which the data are embedded into a low-dimensional space directed by the instance-label correlations encoded into a hypergraph. We formulate the problem in the kernel-induced feature space and propose to learn the kernel matrix as a linear combination of a given collection of kernel matrices in the MKL framework. The proposed learning formulation leads to a non-smooth min-max problem, which can be cast into a semi-infinite linear program (SILP). We further propose an approximate formulation with a guaranteed error bound which involves an unconstrained convex optimization problem. In addition, we show that the objective function of the approximate formulation is differentiable with Lipschitz continuous gradient, and hence existing methods can be employed to compute the optimal solution efficiently. We apply the proposed formulation to the automated annotation of Drosophila gene expression pattern images, and promising results have been reported in comparison with representative algorithms.
V-SMART-Join: A Scalable MapReduce Framework for All-Pair Similarity Joins of Multisets and Vectors
Cited by 22 (1 self)
This work proposes V-SMART-Join, a scalable MapReduce-based framework for discovering all pairs of similar entities. The V-SMART-Join framework is applicable to sets, multisets, and vectors. V-SMART-Join is motivated by the observed skew in the underlying distributions of Internet traffic, and is a family of 2-stage algorithms, where the first stage computes and joins the partial results, and the second stage computes the similarity exactly for all candidate pairs. The V-SMART-Join algorithms are very efficient and scalable in the number of entities, as well as their cardinalities. They were up to 30 times faster than the state-of-the-art algorithm, VCL, when compared on a small real dataset. We also established the scalability of the proposed algorithms by running them on a dataset of a realistic size, on which VCL never finished. Experiments were run using real datasets of IPs and cookies, where each IP is represented as a multiset of cookies, and the goal is to discover similar IPs to identify Internet proxies.
Efficiently matching sets of features with random histograms
In ACM Multimedia, 2008
Cited by 17 (0 self)
As the commonly used representation of a feature-rich data object has evolved from a single feature vector to a set of feature vectors, a key challenge in building a content-based search engine for feature-rich data is to match feature sets efficiently. Although substantial progress has been made during the past few years, existing approaches are still inefficient and inflexible for building a search engine for massive datasets. This paper presents a randomized algorithm to embed a set of features into a single high-dimensional vector to simplify the feature-set matching problem. The main idea is to project feature vectors into an auxiliary space using locality-sensitive hashing and to represent a set of features as a histogram in the auxiliary space. A histogram is simply a high-dimensional vector, and efficient similarity measures like L1 and L2 distances can be employed to approximate feature-set distance measures. We evaluated the proposed approach under three different task settings: content-based image search, image object recognition, and near-duplicate video clip detection. The experimental results show that the proposed approach is indeed effective and flexible. It can achieve accuracy comparable to the feature-set matching methods, while requiring significantly less space and time. For object recognition with the Caltech 101 dataset, our method runs 25 times faster to achieve the same precision as the Pyramid Matching Kernel, the state-of-the-art feature-set matching method.
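The main idea of this abstract (hash each feature into a bucket, describe the set by its bucket histogram, then compare histograms with L1) can be sketched as below. The particular hash, a quantised random projection with bucket width `width`, is an assumption chosen for illustration; the abstract does not say which locality-sensitive hash family the paper uses, and the function names are hypothetical.

```python
import random
from collections import Counter


def make_hash(dim, width, seed=0):
    """Return a bucket function: quantised random projection (assumed hash)."""
    rng = random.Random(seed)
    a = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    b = rng.uniform(0.0, width)

    def h(x):
        return int((sum(ai * xi for ai, xi in zip(a, x)) + b) // width)

    return h


def set_histogram(features, h):
    """Represent a feature set as a histogram over hash buckets."""
    return Counter(h(x) for x in features)


def l1_distance(h1, h2):
    """L1 distance between two sparse histograms."""
    return sum(abs(h1[k] - h2[k]) for k in set(h1) | set(h2))
```

The histogram ignores the order of features within a set, and each extra unmatched feature shifts the L1 distance by exactly one, which is what makes a fixed-length vector comparison a usable stand-in for set matching.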
Earth Mover Distance over High-Dimensional Spaces
Electronic Colloquium on Computational Complexity, Report No. 48 (2007)
Cited by 17 (9 self)
The Earth Mover Distance (EMD) between two equal-size sets of points in R^d is defined to be the minimum cost of a bipartite matching between the two pointsets. It is a natural metric for comparing sets of features, and as such, it has received significant interest in computer vision. Motivated by recent developments in that area, we address computational problems involving EMD over high-dimensional pointsets. A natural approach is to embed the EMD metric into ℓ1, and use the algorithms designed for the latter space. However, Khot and Naor [KN05] show that any embedding of EMD over the d-dimensional Hamming cube into ℓ1 must incur a distortion Ω(d), thus practically losing all distance information. We circumvent this roadblock by focusing on sets with cardinalities upper-bounded by a parameter s, and achieve a distortion of only O(log s · log d). Since in applications the feature sets have bounded size, the resulting distortion is much smaller than the Ω(d) lower bound. Our approach is quite general and easily extends to EMD over R^d. We then provide a strong lower bound on the multi-round communication complexity of estimating EMD, which in particular strengthens the known non-embeddability result of [KN05]. Our bound exhibits a smooth tradeoff between approximation and communication, and for example implies that every algorithm that estimates EMD using constant-size sketches can only achieve Ω(log s) approximation.
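The definition at the start of this abstract (EMD as the minimum cost of a bipartite matching between two equal-size pointsets) can be made concrete with a brute-force sketch. It tries every matching, so it is only practical for a handful of points, and it assumes the L1 distance as the cost of matching two points; neither choice reflects the algorithms studied in the report.

```python
from itertools import permutations


def l1(p, q):
    """L1 (Manhattan) distance between two points (assumed ground distance)."""
    return sum(abs(a - b) for a, b in zip(p, q))


def emd_equal_size(A, B):
    """Minimum total cost over all perfect matchings between A and B.

    Brute force over permutations: exponential time, for illustration only.
    """
    if len(A) != len(B):
        raise ValueError("this sketch requires equal-size pointsets")
    return min(
        sum(l1(a, B[j]) for a, j in zip(A, perm))
        for perm in permutations(range(len(B)))
    )
```

For example, matching the points (0, 0) and (1, 0) against (1, 1) and (0, 1) costs 2 when each point is paired with the one directly above it, and 4 for the crossed pairing, so the EMD is 2.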