Results 1-10 of 49
Semi-supervised learning in gigantic image collections
In Advances in Neural Information Processing Systems 22, 2009
Cited by 73 (3 self)
With the advent of the Internet it is now possible to collect hundreds of millions of images. These images come with varying degrees of label information. “Clean labels” can be manually obtained on a small fraction, “noisy labels” may be extracted automatically from surrounding text, while for most images there are no labels at all. Semi-supervised learning is a principled framework for combining these different label sources. However, it scales polynomially with the number of images, making it impractical for use on gigantic collections with hundreds of millions of images and thousands of classes. In this paper we show how to utilize recent results in machine learning to obtain highly efficient approximations for semi-supervised learning. Specifically, we use the convergence of the eigenvectors of the normalized graph Laplacian to eigenfunctions of weighted Laplace-Beltrami operators. We combine this with a label sharing framework obtained from WordNet to propagate label information to classes lacking manual annotations. Our algorithm enables us to apply semi-supervised learning to a database of 80 million images with 74 thousand classes.
Parallel Spectral Clustering in Distributed Systems
Cited by 60 (1 self)
Spectral clustering algorithms have been shown to be more effective in finding clusters than some traditional algorithms such as k-means. However, spectral clustering suffers from a scalability problem in both memory use and computational time when the size of a data set is large. To perform clustering on large data sets, we investigate two representative ways of approximating the dense similarity matrix. We compare one approach by sparsifying the matrix with another by the Nyström method. We then pick the strategy of sparsifying the matrix via retaining nearest neighbors and investigate its parallelization. We parallelize both memory use and computation on distributed computers. Through
Revisiting the Nyström method for improved large-scale machine learning
Cited by 31 (5 self)
We reconsider randomized algorithms for the low-rank approximation of SPSD matrices such as Laplacian and kernel matrices that arise in data analysis and machine learning applications. Our main results consist of an empirical evaluation of the performance quality and running time of sampling and projection methods on a diverse suite of SPSD matrices. Our results highlight complementary aspects of sampling versus projection methods, and they point to differences between uniform and nonuniform sampling methods based on leverage scores. We complement our empirical results with a suite of worst-case theoretical bounds for both random sampling and random projection methods. These bounds are qualitatively superior to existing bounds (e.g., improved additive-error bounds for spectral and Frobenius norm error and relative-error bounds for trace norm error).
Sampling Methods for the Nyström Method
Journal of Machine Learning Research
Cited by 26 (2 self)
The Nyström method is an efficient technique to generate low-rank matrix approximations and is used in several large-scale learning applications. A key aspect of this method is the procedure according to which columns are sampled from the original matrix. In this work, we explore the efficacy of a variety of fixed and adaptive sampling schemes. We also propose a family of ensemble-based sampling algorithms for the Nyström method. We report results of extensive experiments that provide a detailed comparison of various fixed and adaptive sampling techniques, and demonstrate the performance improvement associated with the ensemble Nyström method when used in conjunction with either fixed or adaptive sampling schemes. Corroborating these empirical findings, we present a theoretical analysis of the Nyström method, providing novel error bounds guaranteeing a better convergence rate of the ensemble Nyström method in comparison to the standard Nyström method.
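For readers unfamiliar with the technique named in this abstract, the following is a minimal sketch of a rank-k Nyström approximation using the simplest fixed scheme, uniform column sampling without replacement. It is not the authors' code: the helper names (`rbf_kernel`, `nystrom`), the RBF test kernel, the eigenvalue cutoff `1e-10`, and all sizes are assumptions chosen for illustration.

```python
import numpy as np

def rbf_kernel(X, sigma):
    """Dense RBF kernel matrix (illustrative test data, not from the paper)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def nystrom(K, idx, k):
    """Rank-k Nystrom approximation of an SPSD matrix K from the columns in idx."""
    C = K[:, idx]                       # n x l block of sampled columns
    W = K[np.ix_(idx, idx)]             # l x l intersection block
    evals, evecs = np.linalg.eigh(W)    # eigenvalues in ascending order
    order = np.argsort(evals)[::-1][:k] # indices of the k largest eigenvalues
    lam, U = evals[order], evecs[:, order]
    keep = lam > 1e-10 * lam[0]         # assumed cutoff for tiny eigenvalues
    G = C @ (U[:, keep] / np.sqrt(lam[keep]))
    return G @ G.T                      # equals C W_k^+ C^T

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
K = rbf_kernel(X, sigma=2.0)
idx = rng.choice(K.shape[0], size=40, replace=False)  # uniform sampling
K_hat = nystrom(K, idx, k=20)
rel_err = np.linalg.norm(K - K_hat) / np.linalg.norm(K)
print(f"relative Frobenius error: {rel_err:.3f}")
```

Swapping the `rng.choice` line for a different selection rule is where an adaptive or non-uniform sampling scheme would plug in; the rest of the sketch would stay the same.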
On Sampling-based Approximate Spectral Decomposition
2009
Cited by 24 (7 self)
This paper addresses the problem of approximate singular value decomposition of large dense matrices that arises naturally in many machine learning applications. We discuss two recently introduced sampling-based spectral decomposition techniques: the Nyström and the Column-sampling methods. We present a theoretical comparison between the two methods and provide novel insights regarding their suitability for various applications. We then provide experimental results motivated by this theory. Finally, we propose an efficient adaptive sampling technique to select informative columns from the original matrix. This novel technique outperforms standard sampling methods on a variety of datasets.
Ensemble Nyström Method
Cited by 22 (2 self)
A crucial technique for scaling kernel methods to very large data sets reaching or exceeding millions of instances is based on low-rank approximation of kernel matrices. We introduce a new family of algorithms based on mixtures of Nyström approximations, ensemble Nyström algorithms, that yield more accurate low-rank approximations than the standard Nyström method. We give a detailed study of variants of these algorithms based on simple averaging, an exponential weight method, or regression-based methods. We also present a theoretical analysis of these algorithms, including novel error bounds guaranteeing a better convergence rate than the standard Nyström method. Finally, we report results of extensive experiments with several data sets containing up to 1M points demonstrating the significant improvement over the standard Nyström approximation.
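The "mixture of Nyström approximations" idea in this abstract can be illustrated with its simplest variant, uniform averaging. The sketch below is an assumption-laden illustration rather than the authors' implementation: the function names, the disjoint uniform column sampling, the eigenvalue cutoff, and the toy kernel are all hypothetical, and the exponential-weight and regression-based variants are not shown.

```python
import numpy as np

def nystrom(K, idx):
    """Nystrom approximation C W^+ C^T from the columns of K listed in idx."""
    C = K[:, idx]
    W = K[np.ix_(idx, idx)]
    evals, evecs = np.linalg.eigh(W)
    keep = evals > 1e-10 * evals.max()   # assumed cutoff for a stable pseudo-inverse
    G = C @ (evecs[:, keep] / np.sqrt(evals[keep]))
    return G @ G.T

def ensemble_nystrom(K, n_experts, n_cols, rng):
    """Average n_experts Nystrom approximations built on disjoint column samples."""
    cols = rng.choice(K.shape[0], size=n_experts * n_cols, replace=False)
    weights = np.full(n_experts, 1.0 / n_experts)   # simple averaging
    K_ens = np.zeros_like(K)
    for r in range(n_experts):
        idx = cols[r * n_cols:(r + 1) * n_cols]
        K_ens += weights[r] * nystrom(K, idx)
    return K_ens

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))
sq = np.sum(X ** 2, axis=1)
d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
K = np.exp(-d2 / 8.0)                                # toy RBF kernel matrix
K_ens = ensemble_nystrom(K, n_experts=5, n_cols=20, rng=rng)
rel_err = np.linalg.norm(K - K_ens) / np.linalg.norm(K)
print(f"relative Frobenius error: {rel_err:.3f}")
```

Because each expert works on its own column sample, the loop body is independent across experts and could be run in parallel; only the final weighted sum needs all results.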
Improving CUR Matrix Decomposition and the Nyström Approximation via Adaptive Sampling
Cited by 16 (3 self)
The CUR matrix decomposition and the Nyström approximation are two important low-rank matrix approximation techniques. The Nyström method approximates a symmetric positive semidefinite matrix in terms of a small number of its columns, while CUR approximates an arbitrary data matrix by a small number of its columns and rows. Thus, CUR decomposition can be regarded as an extension of the Nyström approximation. In this paper we establish a more general error bound for the adaptive column/row sampling algorithm, based on which we propose more accurate CUR and Nyström algorithms with expected relative-error bounds. The proposed CUR and Nyström algorithms also have low time complexity and can avoid maintaining the whole data matrix in RAM. In addition, we give theoretical analysis for the lower error bounds of the standard Nyström method and the ensemble Nyström method. The main theoretical results established in this paper are novel, and our analysis makes no special assumption on the data matrices.
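To make the column-and-row structure of CUR concrete, here is a small sketch under stated assumptions. It uses plain uniform sampling, not the adaptive sampling proposed in the paper, and it picks the middle factor as U = C^+ A R^+, one common choice that requires access to the full matrix. The function name `cur`, the `rcond` cutoff, and the synthetic rank-5 matrix are hypothetical.

```python
import numpy as np

def cur(A, col_idx, row_idx, rcond=1e-10):
    """CUR factors of A from the chosen columns and rows, with U = C^+ A R^+."""
    C = A[:, col_idx]                    # m x c selected columns
    R = A[row_idx, :]                    # r x n selected rows
    U = np.linalg.pinv(C, rcond=rcond) @ A @ np.linalg.pinv(R, rcond=rcond)
    return C, U, R

rng = np.random.default_rng(0)
# Synthetic matrix of exact rank 5, so 10 columns and 10 rows suffice here.
A = rng.normal(size=(60, 5)) @ rng.normal(size=(5, 40))
col_idx = rng.choice(A.shape[1], size=10, replace=False)
row_idx = rng.choice(A.shape[0], size=10, replace=False)
C, U, R = cur(A, col_idx, row_idx)
A_hat = C @ U @ R
print("max abs error:", np.max(np.abs(A - A_hat)))
```

On an exactly low-rank matrix like this one the reconstruction is accurate up to rounding; on real data the error depends heavily on which columns and rows are selected, which is the question the abstract addresses.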
Inductive Hashing on Manifolds
Cited by 15 (6 self)
Learning-based hashing methods have attracted considerable attention due to their ability to greatly increase the scale at which existing algorithms may operate. Most of these methods are designed to generate binary codes that preserve the Euclidean distance in the original space. Manifold learning techniques, in contrast, are better able to model the intrinsic structure embedded in the original high-dimensional data. The complexity of these models, and the problems with out-of-sample data, have previously rendered them unsuitable for application to large-scale embedding, however. In this work, we consider how to learn compact binary embeddings on their intrinsic manifolds. In order to address the above-mentioned difficulties, we describe an efficient, inductive solution to the out-of-sample data problem, and a process by which nonparametric manifold learning may be used as the basis of a hashing method. Our proposed approach thus allows the development of a range of new hashing techniques exploiting the flexibility of the wide variety of manifold learning approaches available. We particularly show that hashing on the basis of t-SNE [29] outperforms state-of-the-art hashing methods on large-scale benchmark datasets, and is very effective for image classification with very short code lengths.
On landmark selection and sampling in high-dimensional data analysis. arXiv:0906.4582v1 [stat.ML], 2009
Cited by 10 (3 self)
In recent years, the spectral analysis of appropriately defined kernel matrices has emerged as a principled way to extract the low-dimensional structure often prevalent in high-dimensional data. Here we provide an introduction to spectral methods for linear and nonlinear dimension reduction, emphasizing ways to overcome the computational limitations currently faced by practitioners with massive datasets. In particular, a data subsampling or landmark selection process is often employed to construct a kernel based on partial information, followed by an approximate spectral analysis termed the Nyström extension. We provide a quantitative framework to analyse this procedure, and use it to demonstrate algorithmic performance bounds on a range of practical approaches designed to optimize the landmark selection process. We compare the practical implications of these bounds by way of real-world examples drawn from the field of computer vision, whereby low-dimensional manifold structure is shown to emerge from high-dimensional video data streams.