Results 1-10 of 316
Semi-Supervised Learning Literature Survey
, 2006
Cited by 782 (8 self)
We review the literature on semi-supervised learning, which is an area in machine learning and, more generally, artificial intelligence. There has been a whole
spectrum of interesting ideas on how to learn from both labeled and unlabeled data, i.e. semi-supervised learning. This document is a chapter excerpt from the author’s
doctoral thesis (Zhu, 2005). However, the author plans to update the online version frequently to incorporate the latest developments in the field. Please obtain the latest
version at http://www.cs.wisc.edu/~jerryzhu/pub/ssl_survey.pdf
Local features and kernels for classification of texture and object categories: a comprehensive study
 International Journal of Computer Vision
, 2007
Cited by 653 (34 self)
Recently, methods based on local image features have shown promise for texture and object recognition tasks. This paper presents a large-scale evaluation of an approach that represents images as distributions (signatures or histograms) of features extracted from a sparse set of keypoint locations and learns a Support Vector Machine classifier with kernels based on two effective measures for comparing distributions, the Earth Mover’s Distance and the χ² distance. We first evaluate the performance of our approach with different keypoint detectors and descriptors, as well as different kernels and classifiers. We then conduct a comparative evaluation with several state-of-the-art recognition methods on four texture and five object databases. On most of these databases, our implementation exceeds the best reported results and achieves comparable performance on the rest. Finally, we investigate the influence of background correlations on recognition performance via extensive tests on the PASCAL database, for which ground-truth object localization information is available. Our experiments demonstrate that image representations based on distributions of local features are surprisingly effective for classification of texture and object images under challenging real-world conditions, including significant intra-class variations and substantial background clutter.
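To make the χ² comparison of feature histograms concrete, here is a minimal sketch. It assumes the common form D(u, w) = ½ Σ (u_i − w_i)² / (u_i + w_i) and an exponential kernel exp(−D / scale); the function names and the `scale` parameter are illustrative choices, not taken from the abstract.

```python
import math

def chi2_distance(h1, h2):
    """Chi-square distance between two non-negative histograms of equal length."""
    if len(h1) != len(h2):
        raise ValueError("histograms must have the same number of bins")
    total = 0.0
    for a, b in zip(h1, h2):
        s = a + b
        if s > 0:  # bins that are empty in both histograms contribute nothing
            total += (a - b) ** 2 / s
    return 0.5 * total

def chi2_kernel(h1, h2, scale=1.0):
    """Exponential kernel built on the chi-square distance (scale is an assumed parameter)."""
    return math.exp(-chi2_distance(h1, h2) / scale)

same = chi2_distance([0.5, 0.5], [0.5, 0.5])
disjoint = chi2_distance([1.0, 0.0], [0.0, 1.0])
```

Identical histograms give distance 0 and kernel value 1; histograms with disjoint support give the largest distance for normalized inputs. A kernel matrix built this way could be passed to any SVM implementation that accepts precomputed kernels.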
Spectral hashing
, 2009
Cited by 284 (4 self)
Semantic hashing [1] seeks compact binary codes of datapoints so that the Hamming distance between codewords correlates with semantic similarity. In this paper, we show that the problem of finding a best code for a given dataset is closely related to the problem of graph partitioning and can be shown to be NP-hard. By relaxing the original problem, we obtain a spectral method whose solutions are simply a subset of thresholded eigenvectors of the graph Laplacian. By utilizing recent results on convergence of graph Laplacian eigenvectors to the Laplace-Beltrami eigenfunctions of manifolds, we show how to efficiently calculate the code of a novel datapoint. Taken together, both learning the code and applying it to a novel point are extremely simple. Our experiments show that our codes outperform the state of the art.
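The relaxed step described above, thresholding eigenvectors of a graph Laplacian, can be sketched as follows. This is only an illustration on training points: the Gaussian affinity, the bandwidth `sigma`, and the function name are assumptions, and the sketch does not cover the paper's procedure for coding a novel datapoint.

```python
import numpy as np

def spectral_codes(X, n_bits, sigma=1.0):
    """Binary codes from thresholded graph-Laplacian eigenvectors (training points only)."""
    X = np.asarray(X, dtype=float)
    # Gaussian affinities between all pairs of points (sigma is an assumed bandwidth).
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    W = np.exp(-sq_dists / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    # Unnormalized graph Laplacian L = D - W.
    L = np.diag(W.sum(axis=1)) - W
    # eigh returns eigenvalues in ascending order; column 0 is the trivial constant vector.
    _, vecs = np.linalg.eigh(L)
    chosen = vecs[:, 1:1 + n_bits]
    # Threshold each chosen eigenvector at zero to obtain one bit per eigenvector.
    return (chosen > 0).astype(np.uint8)

# Two well-separated groups of 1-D points.
points = np.array([[0.0], [0.1], [0.2], [5.0], [5.1], [5.2]])
codes = spectral_codes(points, n_bits=1)
```

With one bit, the points in each group share a code and the two groups receive different codes, which is the graph-partitioning behaviour the abstract alludes to. Which group gets 0 and which gets 1 depends on the arbitrary sign of the eigenvector.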
On the Nyström Method for Approximating a Gram Matrix for Improved Kernel-Based Learning
 JOURNAL OF MACHINE LEARNING RESEARCH
, 2005
Cited by 188 (11 self)
A problem for many kernel-based methods is that the amount of computation required to find the solution scales as $O(n^3)$, where $n$ is the number of training examples. We develop and analyze an algorithm to compute an easily-interpretable low-rank approximation to an $n \times n$ Gram matrix $G$ such that computations of interest may be performed more rapidly. The approximation is of the form $\tilde{G}_k = C W_k^{+} C^{T}$, where $C$ is a matrix consisting of a small number $c$ of columns of $G$ and $W_k$ is the best rank-$k$ approximation to $W$, the matrix formed by the intersection between those $c$ columns of $G$ and the corresponding $c$ rows of $G$. An important aspect of the algorithm is the probability distribution used to randomly sample the columns; we will use a judiciously-chosen and data-dependent nonuniform probability distribution. Let $\|\cdot\|_2$ and $\|\cdot\|_F$ denote the spectral norm and the Frobenius norm, respectively, of a matrix, and let $G_k$ be the best rank-$k$ approximation to $G$. We prove that by choosing $O(k/\epsilon^4)$ columns, $\|G - C W_k^{+} C^{T}\|_\xi \le \|G - G_k\|_\xi + \epsilon \sum_{i=1}^{n} G_{ii}^2$, both in expectation and with high probability, for both $\xi = 2, F$, and for all $k : 0 \le k \le \mathrm{rank}(W)$. This approximation can be computed using $O(n)$ additional space and time, after making two passes over the data from external storage. The relationships between this algorithm, other related matrix decompositions, and the Nyström method from integral equation theory are discussed.
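The structure of this approximation, built from sampled columns C, their intersection block W, and a pseudo-inverse of the best rank-k approximation to W, can be sketched as below. This is a simplified illustration, not the paper's algorithm: the columns are passed in explicitly rather than drawn from the data-dependent nonuniform distribution, no column rescaling is applied, and the function name and tolerance are assumptions.

```python
import numpy as np

def nystrom_approx(G, cols, k):
    """Return C @ pinv(W_k) @ C.T for a symmetric PSD matrix G and chosen column indices."""
    if k < 1:
        raise ValueError("k must be at least 1")
    G = np.asarray(G, dtype=float)
    C = G[:, cols]                # n x c block of selected columns
    W = G[np.ix_(cols, cols)]     # c x c intersection of those columns and rows
    # Best rank-k approximation of the symmetric block W via its eigendecomposition.
    vals, vecs = np.linalg.eigh(W)
    order = np.argsort(vals)[::-1][:k]
    vals_k, vecs_k = vals[order], vecs[:, order]
    # Pseudo-inverse of W_k: invert only eigenvalues that are numerically nonzero.
    keep = vals_k > 1e-12 * max(vals_k.max(), 1.0)
    W_k_pinv = (vecs_k[:, keep] / vals_k[keep]) @ vecs_k[:, keep].T
    return C @ W_k_pinv @ C.T

# A rank-2 Gram matrix built from six 2-D points.
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0],
              [1.0, -1.0], [2.0, 1.0], [0.0, 2.0]])
G = X @ X.T
G_approx = nystrom_approx(G, cols=[0, 1, 2], k=2)
```

Because G here has rank 2 and the selected block W also has rank 2, the approximation reproduces G up to rounding. For a general Gram matrix the result is only an approximation, and its quality depends on how the columns are chosen.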
RELATIVE-ERROR CUR MATRIX DECOMPOSITIONS
 SIAM J. MATRIX ANAL. APPL
, 2008
Cited by 86 (17 self)
Many data analysis applications deal with large matrices and involve approximating the matrix using a small number of “components.” Typically, these components are linear combinations of the rows and columns of the matrix, and are thus difficult to interpret in terms of the original features of the input data. In this paper, we propose and study matrix approximations that are explicitly expressed in terms of a small number of columns and/or rows of the data matrix, and thereby more amenable to interpretation in terms of the original data. Our main algorithmic results are two randomized algorithms which take as input an $m \times n$ matrix $A$ and a rank parameter $k$. In our first algorithm, $C$ is chosen, and we let $A' = CC^{+}A$, where $C^{+}$ is the Moore–Penrose generalized inverse of $C$. In our second algorithm $C$, $U$, $R$ are chosen, and we let $A' = CUR$. ($C$ and $R$ are matrices that consist of actual columns and rows, respectively, of $A$, and $U$ is a generalized inverse of their intersection.) For each algorithm, we show that with probability at least $1 - \delta$, $\|A - A'\|_F \le (1 + \epsilon)\,\|A - A_k\|_F$, where $A_k$ is the “best” rank-$k$ approximation provided by truncating the SVD of $A$, and where $\|X\|_F$ is the Frobenius norm of the matrix $X$. The number of columns of $C$ and rows of $R$ is a low-degree polynomial in $k$, $1/\epsilon$, and $\log(1/\delta)$. Both the Numerical Linear Algebra community and the Theoretical Computer Science community have studied variants ...
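The two forms of approximation named above, projecting onto chosen columns and combining chosen columns and rows through a generalized inverse of their intersection, can be sketched as follows. The sketch takes the column and row indices as given. The randomized selection that the relative-error guarantee depends on is not implemented, and the function names are assumptions.

```python
import numpy as np

def column_projection(A, cols):
    """Approximate A by C @ pinv(C) @ A, where C holds the chosen columns of A."""
    A = np.asarray(A, dtype=float)
    C = A[:, cols]
    return C @ np.linalg.pinv(C) @ A

def cur_approx(A, cols, rows):
    """Approximate A by C @ U @ R, with U the pseudo-inverse of the row/column intersection."""
    A = np.asarray(A, dtype=float)
    C = A[:, cols]                             # actual columns of A
    R = A[rows, :]                             # actual rows of A
    U = np.linalg.pinv(A[np.ix_(rows, cols)])  # generalized inverse of the intersection
    return C @ U @ R

# A 5 x 4 matrix of rank 2, written as a product of two thin factors.
left = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0], [1.0, 2.0]])
right = np.array([[1.0, 0.0, 1.0, 2.0], [0.0, 1.0, 1.0, 1.0]])
A = left @ right
A_cols = column_projection(A, cols=[0, 1])
A_cur = cur_approx(A, cols=[0, 1], rows=[0, 1])
```

Since this A has rank 2 and the chosen columns and rows each span its column and row space, both approximations recover A up to rounding. For matrices that are only approximately low-rank, the error depends on which columns and rows are selected.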
Semantic hierarchies for visual object recognition
 In Proc. IEEE Conf. Computer Vision and Pattern Recognition
, 2007
Cited by 79 (0 self)
In this paper we propose to use lexical semantic networks to extend the state-of-the-art object recognition techniques. We use the semantics of image labels to integrate prior knowledge about inter-class relationships into the visual appearance learning. We show how to build and train a semantic hierarchy of discriminative classifiers and how to use it to perform object detection. We evaluate how our approach influences the classification accuracy and speed on the PASCAL VOC challenge 2006 dataset, a set of challenging real-world images. We also demonstrate additional features that become available to object recognition due to the extension with semantic inference tools: we can classify high-level categories, such as animals, and we can train part detectors, for example a window detector, by pure inference in the semantic network.
A Bayesian, exemplar-based approach to hierarchical shape matching
 IEEE Trans. Pattern Anal. Mach. Intell
Cited by 74 (8 self)
This paper presents a novel probabilistic approach to hierarchical, exemplar-based shape matching. No feature correspondence is needed among exemplars, just a suitable pairwise similarity measure. The approach uses a template tree to efficiently represent and match the variety of shape exemplars. The tree is generated offline by a bottom-up clustering approach using stochastic optimization. Online matching involves a simultaneous coarse-to-fine approach over the template tree and over the transformation parameters. The main contribution of this paper is a Bayesian model to estimate the a posteriori probability of the object class, after a certain match at a node of the tree. This model takes into account object scale and saliency and allows for a principled setting of the matching thresholds such that unpromising paths in the tree traversal process are eliminated early on. The proposed approach was tested in a variety of application domains. Here, results are presented on one of the more challenging domains: real-time pedestrian detection from a moving vehicle. A significant speedup is obtained when comparing the proposed probabilistic matching approach with a manually tuned nonprobabilistic variant, both utilizing the same template tree structure. Index Terms: Hierarchical shape matching, chamfer distance, Bayesian models.
Names and faces in the news
 In Proc. CVPR
, 2004
Cited by 72 (2 self)
We show that quite good face clustering is possible for a dataset of inaccurately and ambiguously labelled face images. Our dataset is 44,773 face images, obtained by applying a face finder to approximately half a million captioned news images. This dataset is more realistic than usual face recognition datasets, because it contains faces captured “in the wild” in a variety of configurations with respect to the camera, taking a variety of expressions, and under illumination of widely varying color. Each face image is associated with a set of names, automatically extracted from the associated caption. Many, but not all, such sets contain the correct name. We cluster face images in appropriate discriminant coordinates. We use a clustering procedure to break ambiguities in labelling and identify incorrectly labelled faces. A merging procedure then identifies variants of names that refer to the same individual. The resulting representation can be used to label faces in news images or to organize news pictures by individuals present. An alternative view of our procedure is as a process that cleans up noisy supervised data. We demonstrate how to use entropy measures to evaluate such procedures.
Maximum margin clustering made practical.
 IEEE Transactions on Neural Networks
, 2009
Learning spatiotemporal graphs of human activities
 In ICCV
, 2011
Cited by 64 (0 self)
Complex human activities occurring in videos can be defined in terms of temporal configurations of primitive actions. Prior work typically hand-picks the primitives, their total number, and temporal relations (e.g., allow only followed-by), and then only estimates their relative significance for activity recognition. We advance prior work by learning what activity parts and their spatiotemporal relations should be captured to represent the activity, and how relevant they are for enabling efficient inference in realistic videos. We represent videos by spatiotemporal graphs, where nodes correspond to multiscale video segments, and edges capture their hierarchical, temporal, and spatial relationships. Access to video segments is provided by our new, multiscale segmenter. Given a set of training spatiotemporal graphs, we learn their archetype graph, and pdfs associated with model nodes and edges. The model adaptively learns relevant video segments and their relations from data, addressing the “what” and “how.” Inference and learning are formulated within the same framework – that of a robust, least-squares optimization – which is invariant to arbitrary permutations of nodes in spatiotemporal graphs. The model is used for parsing new videos in terms of detecting and localizing relevant activity parts. We outperform the state of the art on benchmark Olympic and UT human-interaction datasets, under a favorable complexity-vs.-accuracy tradeoff.