Results 1–10 of 11
Using the Nyström Method to Speed Up Kernel Machines
 Advances in Neural Information Processing Systems 13, 2001
Abstract (Cited by 286, 6 self):
A major problem for kernel-based predictors (such as Support Vector Machines and Gaussian processes) is that the amount of computation required to find the solution scales as O(n^3), where n is the number of training examples. We show that an approximation to the eigendecomposition of the Gram matrix can be computed by the Nyström method (which is used for the numerical solution of eigenproblems). This is achieved by carrying out an eigendecomposition on a smaller system of size m < n, and then expanding the results back up to n dimensions. The computational complexity of a predictor using this approximation is O(m^2 n). We report experiments on the USPS and abalone data sets and show that we can set m ≪ n without any significant decrease in the accuracy of the solution.
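The small-system eigendecomposition and its expansion back to n dimensions can be sketched in a few lines of NumPy (the function and variable names are ours, and the RBF kernel in the test is an illustrative choice, not code from the paper):

```python
import numpy as np

def nystrom_eig(K_nm, K_mm):
    """Approximate the eigendecomposition of the full n x n Gram matrix
    from its n x m block K_nm and m x m landmark block K_mm (Nystrom method)."""
    # Eigendecompose the small m x m system
    vals_m, vecs_m = np.linalg.eigh(K_mm)
    # Keep strictly positive eigenvalues for numerical stability
    pos = vals_m > 1e-12
    vals_m, vecs_m = vals_m[pos], vecs_m[:, pos]
    n, m = K_nm.shape
    # Expand back up to n dimensions: scale eigenvalues by n/m and map
    # the small eigenvectors through K_nm
    vals_n = (n / m) * vals_m
    vecs_n = np.sqrt(m / n) * (K_nm @ vecs_m) / vals_m
    return vals_n, vecs_n
```

Reconstructing U diag(λ) Uᵀ from the returned pair gives exactly the standard rank-m Nyström approximation K_nm K_mm⁻¹ K_nmᵀ of the full Gram matrix.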
On Clusterings: Good, Bad and Spectral
2000
Abstract (Cited by 254, 12 self):
We motivate and develop a natural bicriteria measure for assessing the quality of a clustering which avoids the drawbacks of existing measures. A simple recursive heuristic has polylogarithmic worst-case guarantees under the new measure. The main result of the paper is the analysis of a popular spectral algorithm. One variant of spectral clustering turns out to have effective worst-case guarantees.
Fast Monte Carlo Algorithms for Matrices II: Computing a Low-Rank Approximation to a Matrix
 SIAM Journal on Computing, 2004
Abstract (Cited by 142, 17 self):
matrix A. It is often of interest to find a low-rank approximation to A, i.e., an approximation D* to the matrix A of rank not greater than a specified rank k, where k is much smaller than m and n. Methods such as the Singular Value Decomposition (SVD) may be used to find an approximation to A which is the best in a well-defined sense. These methods require memory and time which are superlinear in m and n; for many applications in which the data sets are very large this is prohibitive. Two simple and intuitive algorithms are presented which, when given an m × n matrix A, compute a description of a low-rank approximation D* to A, and which are qualitatively faster than the SVD. Both algorithms have provable bounds for the error matrix A − D*. For any matrix X, let ||X||_F and ||X||_2 denote its Frobenius norm and its spectral norm, respectively. In the first algorithm, c = O(1) columns of A are randomly chosen. If the m × c matrix C consists of those c columns of A (after appropriate rescaling), then it is shown that from C^T C approximations to the top singular values and corresponding singular vectors may be computed. From the computed singular vectors a description D* of the matrix A may be computed such that rank(D*) ≤ k and such that ||A − D*||_ξ ≤ min over rank-k matrices D of ||A − D||_ξ plus an additive error term holds with high probability for both ξ = 2, F. This algorithm may be implemented without storing the matrix A in Random Access Memory (RAM), provided it can make two passes over the matrix stored in external memory and use O(m + n) additional RAM. The second algorithm is similar, except that it further approximates the matrix C by randomly sampling r = O(1) rows of C to form an r × c matrix W. Thus it has additional error, but it can be implemented in three passes over the matrix using only constant ...
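A sketch of the first algorithm as the abstract describes it, with columns sampled in proportion to their squared norms and rescaled so the sample is unbiased (the function name, the sampling scheme details, and the final projection are our reading of the abstract, not the paper's code):

```python
import numpy as np

def sampled_low_rank(A, k, c, rng):
    """Randomized low-rank approximation by column sampling:
    pick c rescaled columns, take top-k left singular vectors of the
    sample, and project A onto their span."""
    m, n = A.shape
    # Sample c column indices with probability proportional to squared norms
    col_norms = (A ** 2).sum(axis=0)
    p = col_norms / col_norms.sum()
    idx = rng.choice(n, size=c, replace=True, p=p)
    # Rescale so that C C^T is an unbiased estimator of A A^T
    C = A[:, idx] / np.sqrt(c * p[idx])
    # Top-k left singular vectors of C approximate those of A
    U, _, _ = np.linalg.svd(C, full_matrices=False)
    H = U[:, :k]
    # The description of the approximation is H; the approximation itself
    # is the projection of A onto span(H)
    return H @ (H.T @ A)
```

Note that only the m × c sample C is ever decomposed, which is what makes the method qualitatively faster than a full SVD of A.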
The art of uninformed decisions: A primer to property testing
 Science, 2001
Abstract (Cited by 128, 20 self):
Property testing is a new field in computational theory that deals with the information that can be deduced from the input when the number of allowable queries (reads from the input) is significantly smaller than its size.
A Spectral Algorithm for Learning Mixtures of Distributions
 Journal of Computer and System Sciences, 2002
Abstract (Cited by 43, 5 self):
We show that a simple spectral algorithm for learning a mixture of k spherical Gaussians in R^n works remarkably well: it succeeds in identifying the Gaussians assuming essentially the minimum possible separation between their centers that keeps them unique (solving an open problem of [1]). The sample complexity and running time are polynomial in both n and k. The algorithm also works for the more general problem of learning a mixture of "weakly isotropic" distributions (e.g., a mixture of uniform distributions on cubes).
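For intuition, here is a toy version of the spectral step for k = 2 well-separated spherical Gaussians (illustrative only; the paper's algorithm and its separation analysis are more general, and all names here are ours):

```python
import numpy as np

def spectral_split(X):
    """Split points into two groups by projecting onto the top singular
    direction of the centered data matrix (toy k = 2 case)."""
    Xc = X - X.mean(axis=0)
    # The top right singular vector approximates the direction between
    # the two component means
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    proj = Xc @ Vt[0]
    # Points fall on either side of the hyperplane through the mean
    return proj > 0
```

The point of the spectral projection is that it collapses the noise in all directions orthogonal to the line between the centers, so a far smaller separation suffices than in the original high-dimensional space.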
Approximating the Minimum Spanning Tree Weight in Sublinear Time
 In Proceedings of the 28th Annual International Colloquium on Automata, Languages and Programming (ICALP), 2001
Abstract (Cited by 38, 6 self):
We present a probabilistic algorithm that, given a connected graph G (represented by adjacency lists) of average degree d, with edge weights in the set {1,...,w}, and given a parameter 0 < ε < 1/2, estimates in time O(dwε^−2 log(dw/ε)) the weight of the minimum spanning tree of G with a relative error of at most ε. Note that the running time does not depend on the number of vertices in G. We also prove a nearly matching lower bound of Ω(dwε^−2) on the probe and time complexity of any approximation algorithm for MST weight. The essential component of our algorithm is a procedure for estimating in time O(dε^−2 log(d/ε)) the number of connected components of an unweighted graph to within an additive error of εn. (This becomes O(ε^−2 log(1/ε)) for d = O(1).) The time bound is shown to be tight up to within the log(d/ε) factor. Our connected-components algorithm picks O(1/ε^2) vertices in the graph and then grows "local spanning trees" whose sizes are specified by a stochastic process. From the local information collected in this way, the algorithm is able to infer, with high confidence, an estimate of the number of connected components. We then show how estimates on the number of components in various subgraphs of G can be used to estimate the weight of its MST.
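The reduction in the last sentence rests on the identity MST(G) = n − w + Σ_{i=1}^{w−1} c_i for a connected graph with integer weights in {1,...,w}, where c_i is the number of connected components of the subgraph using only edges of weight at most i. A sketch that checks the identity with exact union-find counts (the paper instead estimates each c_i by sampling; names are ours):

```python
def mst_weight_via_components(n, edges, w):
    """Exact MST weight via component counts: n - w + sum of c_i,
    where c_i counts components of the weight-<=-i subgraph.
    Assumes the graph is connected with integer weights in {1,...,w}."""
    def count_components(threshold):
        parent = list(range(n))
        def find(x):
            # find with path halving
            while parent[x] != x:
                parent[x] = parent[parent[x]]
                x = parent[x]
            return x
        comps = n
        for u, v, wt in edges:
            if wt <= threshold:
                ru, rv = find(u), find(v)
                if ru != rv:
                    parent[ru] = rv
                    comps -= 1
        return comps
    return n - w + sum(count_components(i) for i in range(1, w))
```

Replacing each exact count with the sampling-based estimator described above turns this exact identity into the sublinear-time approximation.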
Polynomial Time Approximation Schemes for Geometric k-Clustering
 Journal of the ACM, 2001
Abstract (Cited by 30, 5 self):
The Johnson-Lindenstrauss lemma states that n points in a high-dimensional Hilbert space can be embedded with small distortion of the distances into an O(log n)-dimensional space by applying a random linear transformation. We show that similar (though weaker) properties hold for certain random linear transformations over the Hamming cube. We use these transformations to solve NP-hard clustering problems in the cube as well as in geometric settings. More specifically, we address the following clustering problem. Given n points in a larger set (for example, R^d) endowed with a distance function (for example, L² distance), we would like to partition the data set into k disjoint clusters, each with a "cluster center", so as to minimize the sum over all data points of the distance between the point and the center of the cluster containing the point. The problem is provably NP-hard in some high-dimensional geometric settings, even for k = 2. We give polynomial time approximation schemes for this problem in several settings, including the binary cube {0, 1}^d with Hamming distance, and R^d either with L¹ distance, or with L² distance, or with the square of the L² distance. In all these settings, the best previous results were constant-factor approximation guarantees. We note that our problem is similar in flavor to the k-median problem (and the related facility location problem), which has been considered in graph-theoretic and fixed-dimensional geometric settings, where it becomes hard when k is part of the input. In contrast, we study the problem when k is fixed, but the dimension is part of the input.
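As a sketch, the random linear transformation in the lemma can be realized with a scaled Gaussian matrix, which is one standard choice (the paper's Hamming-cube transformations are different and weaker; the function name is ours):

```python
import numpy as np

def jl_project(X, d_out, rng):
    """Johnson-Lindenstrauss-style random linear map: send each row of X
    to d_out dimensions via a Gaussian matrix scaled by 1/sqrt(d_out),
    which preserves pairwise distances in expectation."""
    d_in = X.shape[1]
    R = rng.standard_normal((d_in, d_out)) / np.sqrt(d_out)
    return X @ R
```

With d_out = O(ε^−2 log n), all pairwise distances among n points are preserved up to a 1 ± ε factor with high probability.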
Observations on the Nyström Method for Gaussian Processes
2002
Abstract (Cited by 18, 2 self):
A number of methods for speeding up Gaussian Process (GP) prediction have been proposed, including the Nyström method of Williams and Seeger (2001). In this paper we focus on two issues: (1) the relationship of the Nyström method to the Subset of Regressors method (Poggio and Girosi, 1990; Luo and Wahba, 1997), and (2) understanding in what circumstances the Nyström approximation would be expected to provide a good approximation to exact GP regression.
Sampling subproblems of heterogeneous MaxCut problems and approximation algorithms
 In Proceedings of the 22nd Annual International Symposium on Theoretical Aspects of Computer Science
Abstract (Cited by 5, 3 self):
Recent work in the analysis of randomized approximation algorithms for NP-hard optimization problems has involved approximating the solution to a problem by the solution of a related subproblem of constant size, where the subproblem is constructed by sampling elements of the original problem uniformly at random. In light of interest in problems with a heterogeneous structure, for which uniform sampling might be expected to yield suboptimal results, we investigate the use of nonuniform sampling probabilities. We develop and analyze an algorithm which uses a novel sampling method to obtain improved bounds for approximating the MaxCut of a graph. In particular, we show that by judicious choice of sampling probabilities one can obtain error bounds that are superior to the ones obtained by uniform sampling, both for unweighted and weighted versions of MaxCut. Of at least as much interest as the results we derive are the techniques we use. The first technique is a method to compute a compressed approximate decomposition of a matrix as the product ...
A polynomial time approximation scheme for metric MIN-BISECTION
 ECCC, 2002
Abstract (Cited by 4, 3 self):
We design a polynomial time approximation scheme (PTAS) for the problem of Metric MIN-BISECTION: dividing a given finite metric space into two halves so as to minimize the sum of distances across that partition. The method of solution depends on a new metric placement partitioning method which could also be of independent interest.