Improved approximation algorithms for large matrices via random projections
- in Proceedings of the 47th Annual IEEE Symposium on Foundations of Computer Science
Cited by 56 (1 self)
Recently several results appeared that show significant reduction in time for matrix multiplication, singular value decomposition, as well as linear (ℓ2) regression, all based on data-dependent random sampling. Our key idea is that low-dimensional embeddings can be used to eliminate data dependence and provide more versatile, linear-time, pass-efficient matrix computation. Our main contribution is summarized as follows.
• Independent of the recent results of Har-Peled and of Deshpande and Vempala, one of the first – and to the best of our knowledge the most efficient – relative-error (1 + ɛ) ‖A − Ak‖F approximation algorithms for the singular value decomposition of an m × n matrix A with M non-zero entries that requires 2 passes over the data and runs in time O((M k/ɛ + (n + m) k²/ɛ²) log(1/δ)).
• The first o(nd²)-time (1 + ɛ) relative-error approximation algorithm for n × d linear (ℓ2) regression.
• A matrix multiplication algorithm that easily applies to implicitly given matrices.
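The regression result above rests on solving a randomly projected, much smaller least-squares problem. The following is a minimal sketch of that idea, not the paper's algorithm: it uses a dense Gaussian projection (an assumption made for brevity, and one that does not achieve the o(nd²) running time), and the names `sketched_least_squares` and `sketch_rows` are hypothetical.

```python
import numpy as np

def sketched_least_squares(A, b, sketch_rows, rng):
    """Approximately solve min_x ||Ax - b||_2 on a randomly projected problem."""
    n = A.shape[0]
    # Dense Gaussian projection (assumed here), scaled so that E[S^T S] = I.
    S = rng.standard_normal((sketch_rows, n)) / np.sqrt(sketch_rows)
    # Solve the much smaller (sketch_rows x d) least-squares problem.
    x, *_ = np.linalg.lstsq(S @ A, S @ b, rcond=None)
    return x

rng = np.random.default_rng(0)
n, d = 2000, 10
A = rng.standard_normal((n, d))
b = A @ rng.standard_normal(d) + 0.1 * rng.standard_normal(n)

x_exact, *_ = np.linalg.lstsq(A, b, rcond=None)
x_sketch = sketched_least_squares(A, b, sketch_rows=400, rng=rng)

# Compare residuals on the ORIGINAL problem.
res_exact = np.linalg.norm(A @ x_exact - b)
res_sketch = np.linalg.norm(A @ x_sketch - b)
ratio = res_sketch / res_exact
print(f"residual ratio (sketched / exact): {ratio:.4f}")
```

Because the exact solution minimizes the residual, the ratio is never below 1; how close it gets to 1 depends on the number of sketch rows relative to d.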
On the Nyström Method for Approximating a Gram Matrix for Improved Kernel-Based Learning
- JOURNAL OF MACHINE LEARNING RESEARCH, 2005
Cited by 56 (6 self)
A problem for many kernel-based methods is that the amount of computation required to find the solution scales as O(n³), where n is the number of training examples. We develop and analyze an algorithm to compute an easily-interpretable low-rank approximation to an n × n Gram matrix G such that computations of interest may be performed more rapidly. The approximation is of the form G̃k = C Wk⁺ Cᵀ, where C is a matrix consisting of a small number c of columns of G and Wk is the best rank-k approximation to W, the matrix formed by the intersection between those c columns of G and the corresponding c rows of G. An important aspect of the algorithm is the probability distribution used to randomly sample the columns; we will use a judiciously-chosen and data-dependent nonuniform probability distribution. Let ‖·‖2 and ‖·‖F denote the spectral norm and the Frobenius norm, respectively, of a matrix, and let Gk be the best rank-k approximation to G. We prove that by choosing O(k/ɛ⁴) columns, ‖G − C Wk⁺ Cᵀ‖ξ ≤ ‖G − Gk‖ξ + ɛ Σi Gii², both in expectation and with high probability, for both ξ = 2, F, and for all k: 0 ≤ k ≤ rank(W). This approximation can be computed using O(n) additional space and time, after making two passes over the data from external storage. The relationships between this algorithm, other related matrix decompositions, and the Nyström method from integral equation theory are discussed.
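The form C Wk⁺ Cᵀ described above can be illustrated with a short NumPy sketch. This is an illustration under stated assumptions rather than the authors' implementation: the sampling probabilities are taken proportional to the squared diagonal entries of G, sampled columns are rescaled by 1/√(c·p_i), k is assumed not to exceed rank(W), and the function name `nystrom_approx` is hypothetical.

```python
import numpy as np

def nystrom_approx(G, c, k, rng):
    """Return C @ pinv(W_k) @ C.T for a symmetric PSD Gram matrix G."""
    n = G.shape[0]
    # Data-dependent nonuniform probabilities (assumed form: p_i ~ G_ii^2).
    diag_sq = np.diag(G) ** 2
    p = diag_sq / diag_sq.sum()
    idx = rng.choice(n, size=c, replace=True, p=p)
    # Rescale each sampled column by 1 / sqrt(c * p_i).
    scale = 1.0 / np.sqrt(c * p[idx])
    C = G[:, idx] * scale                              # n x c
    W = G[np.ix_(idx, idx)] * np.outer(scale, scale)   # c x c intersection
    # Top-k eigenpairs give the best rank-k part of the symmetric PSD W.
    evals, evecs = np.linalg.eigh(W)
    top = np.argsort(evals)[::-1][:k]
    lam, U = evals[top], evecs[:, top]
    Wk_pinv = (U / lam) @ U.T                          # pseudoinverse of W_k
    return C @ Wk_pinv @ C.T

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))
G = X @ X.T                                            # exactly rank-5 Gram matrix
G_approx = nystrom_approx(G, c=40, k=5, rng=rng)
rel_err = np.linalg.norm(G - G_approx, "fro") / np.linalg.norm(G, "fro")
print(f"relative Frobenius error: {rel_err:.2e}")
```

The test matrix is exactly rank 5, so the sampled columns span its range and the approximation is accurate up to rounding; for a full-rank Gram matrix the error would instead be governed by a bound of the kind stated in the abstract.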
Geometric approximation via coresets
- Combinatorial and Computational Geometry, MSRI, 2005
Cited by 47 (7 self)
The paradigm of coresets has recently emerged as a powerful tool for efficiently approximating various extent measures of a point set P. Using this paradigm, one quickly computes a small subset Q of P, called a coreset, that approximates the original set P and then solves the problem on Q using a relatively inefficient algorithm. The solution for Q is then translated to an approximate solution to the original point set P. This paper describes the ways in which this paradigm has been successfully applied to various optimization and extent measure problems.
A randomized algorithm for a tensor-based generalization of the Singular Value Decomposition
- In Linear, 2005
RELATIVE-ERROR CUR MATRIX DECOMPOSITIONS
- SIAM J. MATRIX ANAL. APPL, 2008
Cited by 21 (7 self)
Many data analysis applications deal with large matrices and involve approximating the matrix using a small number of “components.” Typically, these components are linear combinations of the rows and columns of the matrix, and are thus difficult to interpret in terms of the original features of the input data. In this paper, we propose and study matrix approximations that are explicitly expressed in terms of a small number of columns and/or rows of the data matrix, and thereby more amenable to interpretation in terms of the original data. Our main algorithmic results are two randomized algorithms which take as input an m × n matrix A and a rank parameter k. In our first algorithm, C is chosen, and we let A′ = CC⁺A, where C⁺ is the Moore–Penrose generalized inverse of C. In our second algorithm C, U, R are chosen, and we let A′ = CUR. (C and R are matrices that consist of actual columns and rows, respectively, of A, and U is a generalized inverse of their intersection.) For each algorithm, we show that with probability at least 1 − δ, ‖A − A′‖F ≤ (1 + ɛ) ‖A − Ak‖F, where Ak is the “best” rank-k approximation provided by truncating the SVD of A, and where ‖X‖F is the Frobenius norm of the matrix X. The number of columns of C and rows of R is a low-degree polynomial in k, 1/ɛ, and log(1/δ). Both the Numerical Linear Algebra community and the Theoretical Computer Science community have studied variants
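To make the shape of the A′ = CUR construction concrete, here is a minimal NumPy sketch. It is a simplification and not the paper's algorithm: columns and rows are drawn uniformly without replacement (the abstract does not state the sampling distribution), U is taken as the Moore–Penrose pseudoinverse of the intersection, and the name `cur_uniform` and the `rcond` cutoff are assumptions.

```python
import numpy as np

def cur_uniform(A, c, r, rng):
    """Build C (actual columns), R (actual rows), and U = pinv(intersection)."""
    m, n = A.shape
    cols = rng.choice(n, size=c, replace=False)   # uniform sampling: an assumption
    rows = rng.choice(m, size=r, replace=False)
    C = A[:, cols]                                # m x c
    R = A[rows, :]                                # r x n
    W = A[np.ix_(rows, cols)]                     # r x c intersection of C and R
    # Explicit cutoff so that rounding-level singular values are not inverted.
    U = np.linalg.pinv(W, rcond=1e-10)            # c x r
    return C, U, R

rng = np.random.default_rng(0)
A = rng.standard_normal((60, 4)) @ rng.standard_normal((4, 50))   # exactly rank 4
C, U, R = cur_uniform(A, c=10, r=10, rng=rng)
rel_err = np.linalg.norm(A - C @ U @ R, "fro") / np.linalg.norm(A, "fro")
print(f"relative Frobenius error: {rel_err:.2e}")
```

On this exactly rank-4 input the intersection has the same rank as A, so CUR reproduces A up to rounding. On noisy data, uniform sampling carries no relative-error guarantee of the kind claimed in the abstract.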
Subspace sampling and relative-error matrix approximation: Column-based methods
- In Proc. of the 10th RANDOM, 2006
Cited by 14 (5 self)
Given an m × n matrix A and an integer k less than the rank of A, the “best” rank-k approximation to A that minimizes the error with respect to the Frobenius norm is Ak, which is obtained by projecting A on the top k left singular vectors of A. While Ak is routinely used in data analysis, it is difficult to interpret and understand it in terms of the original data, namely the columns and rows of A. For example, these columns and rows often come from some application domain, whereas the singular vectors are linear combinations of (up to all) the columns or rows of A. We address the problem of obtaining low-rank approximations that are directly interpretable in terms of the original columns or rows of A. Our main results are two polynomial-time randomized algorithms that take as input a matrix A and return as output a matrix C, consisting of a “small” (i.e., a low-degree polynomial in k, 1/ɛ, and log(1/δ)) number of actual columns of A such that ‖A − CC⁺A‖F ≤ (1 + ɛ) ‖A − Ak‖F with probability at least 1 − δ. Our algorithms are simple, and they take time of the order of the time needed to compute the top k right singular vectors of A. In addition, they sample the columns of A via the method of “subspace sampling,” so-named since the sampling probabilities depend on the lengths of the rows of the top singular vectors and since they ensure that we capture entirely a certain subspace of interest.
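The “subspace sampling” idea above can be sketched as follows. This is an illustration under assumptions, not the paper's exact procedure: the probabilities are taken proportional to the squared lengths of the rows of the n × k matrix of top right singular vectors, columns are drawn with replacement, no rescaling is applied (the projection CC⁺A depends only on the span of C), and the name `subspace_sample_columns` is hypothetical.

```python
import numpy as np

def subspace_sample_columns(A, k, c, rng):
    """Sample c columns of A with probabilities from the top-k right singular vectors."""
    _, _, Vt = np.linalg.svd(A, full_matrices=False)
    # Squared length of row j of V_k (equivalently column j of V_k^T).
    lev = np.sum(Vt[:k, :] ** 2, axis=0)
    p = lev / lev.sum()
    cols = rng.choice(A.shape[1], size=c, replace=True, p=p)
    return A[:, cols], p

rng = np.random.default_rng(0)
m, n, k = 100, 80, 5
A = rng.standard_normal((m, k)) @ rng.standard_normal((k, n))
A += 0.01 * rng.standard_normal((m, n))           # low rank plus small noise

C, p = subspace_sample_columns(A, k=k, c=40, rng=rng)

# Error of projecting A onto the span of the sampled columns: ||A - C C^+ A||_F.
# An explicit cutoff guards against duplicated columns in C.
proj = C @ (np.linalg.pinv(C, rcond=1e-10) @ A)
err = np.linalg.norm(A - proj, "fro")

# Best rank-k error from the trailing singular values of A.
s = np.linalg.svd(A, compute_uv=False)
best = np.sqrt(np.sum(s[k:] ** 2))
ratio = err / best
print(f"||A - CC+A||_F / ||A - A_k||_F = {ratio:.3f}")
```

Since C may contain more than k columns, CC⁺A can have rank above k, so the ratio is not bounded below by 1.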
Coresets for weighted facilities and their applications
- In Proceedings of the 47th Annual IEEE Symposium on Foundations of Computer Science (FOCS’06), 2006
Cited by 13 (5 self)
We develop efficient (1 + ε)-approximation algorithms for generalized facility location problems. Such facilities are not restricted to being points in R^d, and can represent more complex structures such as linear facilities (lines in R^d, j-dimensional flats), etc. We introduce coresets for weighted (point) facilities. These prove to be useful for such generalized facility location problems, and we provide efficient algorithms for their construction. Applications include: k-mean and k-median generalizations, i.e., find k lines that minimize the sum (or sum of squares) of the distances from each input point to its nearest line. Other applications are generalizations of linear regression problems to multiple regression lines, new SVD/PCA generalizations, and many more. The results significantly improve on previous work, which deals efficiently only with special cases. Open source code for the algorithms in this paper is also available.
A randomized algorithm for the approximation of matrices
- In review. Yale CS research report YALEU/DCS/RR-1361, 2006
Cited by 12 (3 self)
Given an m × n matrix A and a positive integer k, we describe a randomized procedure for the approximation of A with a matrix Z of rank k. The procedure relies on applying Aᵀ to a collection of l random vectors, where l is an integer equal to or slightly greater than k; the scheme is efficient whenever A and Aᵀ can be applied rapidly to arbitrary vectors. The discrepancy between A and Z is of the same order as √(lm) times the (k + 1)st greatest singular value σk+1 of A, with negligible probability of even moderately large deviations. The actual estimates derived in the paper are fairly complicated, but are simpler when l − k is a fixed small nonnegative integer. For example, according to one of our estimates for l − k = 20, the probability that the spectral norm ‖A − Z‖ is greater than 10 √((k + 20) m) σk+1 is less than 10⁻¹⁷. The paper contains a number of estimates for ‖A − Z‖, including several that are stronger (but more detailed) than the preceding example; some of the estimates are effectively independent of m. Thus, given a matrix A of limited numerical rank, such that both A and Aᵀ can be applied rapidly to arbitrary vectors, the scheme provides a simple, efficient means for constructing an accurate approximation to a singular value decomposition of A. Furthermore, the algorithm presented here operates reliably independently of the structure of the matrix A. The results are illustrated via several numerical examples.
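The procedure described above (apply Aᵀ to l random vectors, then work in the resulting small subspace) can be sketched in NumPy roughly as follows. This is a sketch in the spirit of the abstract under stated assumptions, not the report's exact algorithm: the random vectors are Gaussian, the basis is obtained by QR, and the name `randomized_rank_k` and the test parameters are hypothetical.

```python
import numpy as np

def randomized_rank_k(A, k, l, rng):
    """Rank-k factors (U, s, Vt) of an approximation Z = U @ diag(s) @ Vt to A."""
    m, n = A.shape
    G = rng.standard_normal((m, l))       # l random vectors (Gaussian: an assumption)
    Y = A.T @ G                           # A^T applied to the random vectors, n x l
    Q, _ = np.linalg.qr(Y)                # orthonormal basis for the sampled row space
    B = A @ Q                             # m x l, so A is approximately B @ Q.T
    U, s, Wt = np.linalg.svd(B, full_matrices=False)
    Vt = Wt @ Q.T                         # l x n
    return U[:, :k], s[:k], Vt[:k, :]     # truncate to rank k

rng = np.random.default_rng(0)
m, n, k, l = 100, 80, 10, 18
# Test matrix with known singular values 1, 1/2, 1/4, ...
U0, _ = np.linalg.qr(rng.standard_normal((m, n)))
V0, _ = np.linalg.qr(rng.standard_normal((n, n)))
s_true = 0.5 ** np.arange(n)
A = (U0 * s_true) @ V0.T

U, s, Vt = randomized_rank_k(A, k, l, rng)
Z = (U * s) @ Vt
err = np.linalg.norm(A - Z, 2)            # spectral norm of the discrepancy
sigma_kp1 = s_true[k]                     # (k + 1)st greatest singular value
print(f"||A - Z||_2 / sigma_(k+1) = {err / sigma_kp1:.3f}")
```

No rank-k matrix can have spectral error below σk+1, so the printed ratio is at least 1; the oversampling l − k controls how close to 1 it typically is.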
An Improved Approximation Algorithm for the Column Subset Selection Problem
Cited by 11 (0 self)
We consider the problem of selecting the “best” subset of exactly k columns from an m × n matrix A. In particular, we present and analyze a novel two-stage algorithm that runs in O(min{mn², m²n}) time and returns as output an m × k matrix C consisting of exactly k columns of A. In the first stage (the randomized stage), the algorithm randomly selects O(k log k) columns according to a judiciously-chosen probability distribution that depends on information in the top-k right singular subspace of A. In the second stage (the deterministic stage), the algorithm applies a deterministic column-selection procedure to select and return exactly k columns from the set of columns selected in the first stage. Let C be the m × k matrix containing those k columns, let PC denote the projection matrix onto the span of those columns, and let Ak denote the “best” rank-k approximation to the matrix A as computed with the singular value decomposition. Then, we prove that ‖A − PCA‖2 ≤ O k 3 4 log 1
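The two-stage structure described above can be sketched as follows. Several pieces are assumptions made for illustration and are not taken from the abstract: stage one samples with probabilities proportional to squared row lengths of the top-k right singular vectors, the constant 4 in the O(k log k) candidate count is arbitrary, and stage two uses a simple greedy column-pivoting routine (`greedy_pivot`, a hypothetical helper) as a stand-in for the paper's deterministic procedure.

```python
import numpy as np

def greedy_pivot(M, k):
    """Pick k column indices of M by greedy QR-style column pivoting."""
    R = np.array(M, dtype=float)
    chosen = []
    for _ in range(k):
        norms = np.sum(R ** 2, axis=0)
        j = int(np.argmax(norms))          # largest remaining column
        chosen.append(j)
        q = R[:, j] / np.sqrt(norms[j])
        R -= np.outer(q, q @ R)            # remove that direction from all columns
    return chosen

def two_stage_column_select(A, k, c, rng):
    """Stage 1: random candidates. Stage 2: deterministic choice of exactly k."""
    _, _, Vt = np.linalg.svd(A, full_matrices=False)
    Vk_t = Vt[:k, :]                       # top-k right singular subspace, k x n
    lev = np.sum(Vk_t ** 2, axis=0)
    p = lev / lev.sum()                    # assumed sampling distribution
    cand = np.unique(rng.choice(A.shape[1], size=c, replace=True, p=p))
    picked = greedy_pivot(Vk_t[:, cand], k)
    return cand[picked]

rng = np.random.default_rng(0)
m, n, k = 100, 80, 5
A = rng.standard_normal((m, k)) @ rng.standard_normal((k, n))
A += 0.01 * rng.standard_normal((m, n))
c = int(np.ceil(4 * k * np.log(k)))        # O(k log k) candidates; constant assumed

cols = two_stage_column_select(A, k, c, rng)
Q, _ = np.linalg.qr(A[:, cols])            # orthonormal basis for span(C)
err = np.linalg.norm(A - Q @ (Q.T @ A), 2) # ||A - P_C A||_2
best = np.linalg.svd(A, compute_uv=False)[k]
ratio = err / best
print(f"selected columns: {sorted(cols.tolist())}, error ratio: {ratio:.2f}")
```

Because P_C A has rank at most k, the ratio cannot fall below 1; how far above 1 it lands depends on how well conditioned the chosen columns are within the top-k subspace.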
Large-Scale Manifold Learning
Cited by 11 (3 self)
This paper examines the problem of extracting low-dimensional manifold structure given millions of high-dimensional face images. Specifically, we address the computational challenges of nonlinear dimensionality reduction via Isomap and Laplacian Eigenmaps, using a graph containing about 18 million nodes and 65 million edges. Since most manifold learning techniques rely on spectral decomposition, we first analyze two approximate spectral decomposition techniques for large dense matrices (Nyström and Column-sampling), providing the first direct theoretical and empirical comparison between these techniques. We next show extensive experiments on learning low-dimensional embeddings for two large face datasets: CMU-PIE (35 thousand faces) and a web dataset (18 million faces). Our comparisons show that the Nyström approximation is superior to the Column-sampling method. Furthermore, approximate Isomap tends to perform better than Laplacian Eigenmaps on both clustering and classification with the labeled CMU-PIE dataset.

