Results 1 - 10
of
18
The pyramid match kernel: Efficient learning with sets of features
- Journal of Machine Learning Research
, 2007
"... In numerous domains it is useful to represent a single example by the set of the local features or parts that comprise it. However, this representation poses a challenge to many conventional machine learning techniques, since sets may vary in cardinality and elements lack a meaningful ordering. Kern ..."
Abstract
-
Cited by 55 (6 self)
- Add to MetaCart
In numerous domains it is useful to represent a single example by the set of the local features or parts that comprise it. However, this representation poses a challenge to many conventional machine learning techniques, since sets may vary in cardinality and elements lack a meaningful ordering. Kernel methods can learn complex functions, but a kernel over unordered set inputs must somehow solve for correspondences—generally a computationally expensive task that becomes impractical for large set sizes. We present a new fast kernel function called the pyramid match that measures partial match similarity in time linear in the number of features. The pyramid match maps unordered feature sets to multi-resolution histograms and computes a weighted histogram intersection in order to find implicit correspondences based on the finest resolution histogram cell where a matched pair first appears. We show the pyramid match yields a Mercer kernel, and we prove bounds on its error relative to the optimal partial matching cost. We demonstrate our algorithm on both classification and regression tasks, including object recognition, 3-D human pose inference, and time of publication estimation for documents, and we show that the proposed method is accurate and significantly more efficient than current approaches.
Learning globally-consistent local distance functions for shape-based image retrieval and classification
- In ICCV
, 2007
"... We address the problem of visual category recognition by learning an image-to-image distance function that attempts to satisfy the following property: the distance between images from the same category should be less than the distance between images from different categories. We use patch-based feat ..."
Abstract
-
Cited by 50 (2 self)
- Add to MetaCart
We address the problem of visual category recognition by learning an image-to-image distance function that attempts to satisfy the following property: the distance between images from the same category should be less than the distance between images from different categories. We use patch-based feature vectors common in object recognition work as a basis for our image-to-image distance functions. Our large-margin formulation for learning the distance functions is similar to formulations used in the machine learning literature on distance metric learning, however we differ in that we learn local distance functions— a different parameterized function for every image of our training set—whereas typically a single global distance function is learned. This was a novel approach first introduced in Frome, Singer, & Malik, NIPS 2006. In that work we learned the local distance functions independently, and the outputs of these functions could not be compared at test time without the use of additional heuristics or training. Here we introduce a different approach that has the advantage that it learns distance functions that are globally consistent in that they can be directly compared for purposes of retrieval and classification. The output of the learning algorithm are weights assigned to the image features, which is intuitively appealing in the computer vision setting: some features are more salient than others, and which are more salient depends on the category, or image, being considered. We train and test using the Caltech 101 object recognition benchmark. Using fifteen training images per category, we achieved a mean recognition rate of 63.2 % and
Large-scale multimodal semantic concept detection for consumer video
- in MIR workshop, ACM Multimedia
, 2007
"... In this paper we present a systematic study of automatic classification of consumer videos into a large set of diverse semantic concept classes, which have been carefully selected based on user studies and extensively annotated over 1300+ videos from real users. Our goals are to assess the state of ..."
Abstract
-
Cited by 18 (9 self)
- Add to MetaCart
In this paper we present a systematic study of automatic classification of consumer videos into a large set of diverse semantic concept classes, which have been carefully selected based on user studies and extensively annotated over 1300+ videos from real users. Our goals are to assess the state of the art of multimedia analytics (including both audio and visual analysis) in consumer video classification and to discover new research opportunities. We investigated several statistical approaches built upon global/local visual features, audio features, and audio-visual combinations. Three multi-modal fusion frameworks (ensemble, context fusion, and joint boosting) are also evaluated. Experiment results show that visual and audio models perform best for different sets of concepts. Both provide significant contributions to multimodal fusion, via expansion of the classifier pool for context fusion and the feature bases for feature sharing. The fused multimodal models are shown to significantly reduce the detection errors (compared to single modality models), resulting in a promising accuracy of 83 % over diverse concepts. To the best of our knowledge, this is the first work on systematic investigation of multimodal classification using a large-scale ontology and realistic video corpus.
Pyramid match hashing: Sub-linear time indexing over partial correspondences
- In CVPR
, 2007
"... Matching local features across images is often useful when comparing or recognizing objects or scenes, and efficient techniques for obtaining image-to-image correspondences have been developed [6, 4, 11]. However, given a query image, searching a very large image database with such measures remains ..."
Abstract
-
Cited by 17 (3 self)
- Add to MetaCart
Matching local features across images is often useful when comparing or recognizing objects or scenes, and efficient techniques for obtaining image-to-image correspondences have been developed [6, 4, 11]. However, given a query image, searching a very large image database with such measures remains impractical. We introduce a sublinear time randomized hashing algorithm for indexing sets of feature vectors under their partial correspondences. We develop an efficient embedding function for the normalized partial matching similarity between sets, and show how to exploit random hyperplane properties to construct hash functions that satisfy locality-sensitive constraints. The result is a bounded approximate similarity search algorithm that finds (1 + ɛ)-approximate nearest neighbor images in O(N 1/(1+ɛ) ) time for a database containing N images represented by (varying numbers of) local features. By design the indexing is robust to outlier features, as it favors strong one-to-one matchings but does not penalize for additional distant features. We demonstrate our approach applied to image retrieval for images represented by sets of local appearance features, and show that searching over correspondences is now scalable to large image databases. 1.
Multi-label Multiple Kernel Learning
"... We present a multi-label multiple kernel learning (MKL) formulation in which the data are embedded into a low-dimensional space directed by the instancelabel correlations encoded into a hypergraph. We formulate the problem in the kernel-induced feature space and propose to learn the kernel matrix as ..."
Abstract
-
Cited by 11 (3 self)
- Add to MetaCart
We present a multi-label multiple kernel learning (MKL) formulation in which the data are embedded into a low-dimensional space directed by the instancelabel correlations encoded into a hypergraph. We formulate the problem in the kernel-induced feature space and propose to learn the kernel matrix as a linear combination of a given collection of kernel matrices in the MKL framework. The proposed learning formulation leads to a non-smooth min-max problem, which can be cast into a semi-infinite linear program (SILP). We further propose an approximate formulation with a guaranteed error bound which involves an unconstrained convex optimization problem. In addition, we show that the objective function of the approximate formulation is differentiable with Lipschitz continuous gradient, and hence existing methods can be employed to compute the optimal solution efficiently. We apply the proposed formulation to the automated annotation of Drosophila gene expression pattern images, and promising results have been reported in comparison with representative algorithms. 1
Earth Mover Distance over High-Dimensional Spaces
- ELECTRONIC COLLOQUIUM ON COMPUTATIONAL COMPLEXITY, REPORT NO. 48 (2007)
, 2007
"... The Earth Mover Distance (EMD) between two equal-size sets of points in R d is defined to be the minimum cost of a bipartite matching between the two pointsets. It is a natural metric for comparing sets of features, and as such, it has received significant interest in computer vision. Motivated by r ..."
Abstract
-
Cited by 9 (7 self)
- Add to MetaCart
The Earth Mover Distance (EMD) between two equal-size sets of points in R d is defined to be the minimum cost of a bipartite matching between the two pointsets. It is a natural metric for comparing sets of features, and as such, it has received significant interest in computer vision. Motivated by recent developments in that area, we address computational problems involving EMD over high-dimensional pointsets. A natural approach is to embed the EMD metric into ℓ1, and use the algorithms designed for the latter space. However, Khot and Naor [KN05] show that any embedding of EMD over the d-dimensional Hamming cube into ℓ1 must incur a distortion Ω(d), thus practically losing all distance information. We circumvent this roadblock by focusing on sets with cardinalities upper-bounded by a parameter s, and achieve a distortion of only O(log s · log d). Since in applications the feature sets have bounded size, the resulting distortion is much smaller than the Ω(d) lower bound. Our approach is quite general and easily extends to EMD over R d. We then provide a strong lower bound on the multi-round communication complexity of estimating EMD, which in particular strengthens the known non-embeddability result of [KN05]. Our bound exhibits a smooth tradeoff between approximation and communication, and for example implies that every algorithm that estimates EMD using constant size sketches can only achieve Ω(log s) approximation.
A.: High-Dimensional Feature Matching: Employing the Concept of Meaningful Nearest Neighbors
- In: ICCV (2007
"... Matching of high-dimensional features using nearest neighbors search is an important part of image matching methods which are based on local invariant features. In this work we highlight effects pertinent to high-dimensional spaces that are significant for matching, yet have not been explicitly acco ..."
Abstract
-
Cited by 8 (1 self)
- Add to MetaCart
Matching of high-dimensional features using nearest neighbors search is an important part of image matching methods which are based on local invariant features. In this work we highlight effects pertinent to high-dimensional spaces that are significant for matching, yet have not been explicitly accounted for in previous work. In our approach, we require every nearest neighbor to be meaningful, that is, sufficiently close to a query feature such that it is an outlier to a background feature distribution. We estimate the background feature distribution from the extended neighborhood of a query feature given by its k nearest neighbors. Based on the concept of meaningful nearest neighbors, we develop a novel high-dimensional feature matching method and evaluate its performance by conducting image matching on two challenging image data sets. A superior performance in terms of accuracy is shown in comparison to several state-of-the-art approaches. Additionally, to make search for k nearest neighbors more efficient, we develop a novel approximate nearest neighbors search method based on sparse coding with an overcomplete basis set that provides a ten-fold speed-up over an exhaustive search even for high dimensional spaces and retains excellent approximation to an exact nearest neighbors search. 1.
Efficiently matching sets of features with random histograms
- in ACM Multimedia
, 2008
"... As the commonly used representation of a feature-rich data object has evolved from a single feature vector to a set of feature vectors, a key challenge in building a content-based search engine for feature-rich data is to match feature-sets efficiently. Although substantial progress has been made du ..."
Abstract
-
Cited by 6 (0 self)
- Add to MetaCart
As the commonly used representation of a feature-rich data object has evolved from a single feature vector to a set of feature vectors, a key challenge in building a content-based search engine for feature-rich data is to match feature-sets efficiently. Although substantial progress has been made during the past few years, existing approaches are still inefficient and inflexible for building a search engine for massive datasets. This paper presents a randomized algorithm to embed a set of features into a single high-dimensional vector to simplify the feature-set matching problem. The main idea is to project feature vectors into an auxiliary space using locality sensitive hashing and to represent a set of features as a histogram in the auxiliary space. A histogram is simply a high dimensional vector, and efficient similarity measures like L1 and L2 distances can be employed to approximate feature-set distance measures. We evaluated the proposed approach under three different task settings, i.e. content-based image search, image object recognition and near-duplicate video clip detection. The experimental results show that the proposed approach is indeed effective and flexible. It can achieve accuracy comparable to the feature-set matching methods, while requiring significantly less space and time. For object recognition with Caltech 101 dataset, our method runs 25 times faster to achieve the same precision as Pyramid Matching Kernel, the state-of-the-art feature-set matching method.
Combining brain computer interfaces with vision for object categorization
- In CVPR
, 2008
"... Human-aided computing proposes using information measured directly from the human brain in order to perform useful tasks. In this paper, we extend this idea by fusing computer vision-based processing and processing done by the human brain in order to build more effective object categorization system ..."
Abstract
-
Cited by 4 (1 self)
- Add to MetaCart
Human-aided computing proposes using information measured directly from the human brain in order to perform useful tasks. In this paper, we extend this idea by fusing computer vision-based processing and processing done by the human brain in order to build more effective object categorization systems. Specifically, we use an electroencephalograph (EEG) device to measure the subconscious cognitive processing that occurs in the brain as users see images, even when they are not trying to explicitly classify them. We present a novel framework that combines a discriminative visual category recognition system based on the Pyramid Match Kernel (PMK) with information derived from EEG measurements as users view images. We propose a fast convex kernel alignment algorithm to effectively combine the two sources of information. Our approach is validated with experiments using real-world data, where we show significant gains in classification accuracy. We analyze the properties of this information fusion method by examining the relative contributions of the two modalities, the errors arising from each source, and the stability of the combination in repeated experiments. 1.

