Results 1 - 10 of 53
Finding Structure with Randomness: Probabilistic Algorithms for Constructing Approximate Matrix Decompositions
"... Low-rank matrix approximations, such as the truncated singular value decomposition and the rank-revealing QR decomposition, play a central role in data analysis and scientific computing. This work surveys and extends recent research which demonstrates that randomization offers a powerful tool for ..."
Abstract
-
Cited by 253 (6 self)
Low-rank matrix approximations, such as the truncated singular value decomposition and the rank-revealing QR decomposition, play a central role in data analysis and scientific computing. This work surveys and extends recent research which demonstrates that randomization offers a powerful tool for performing low-rank matrix approximation. These techniques exploit modern computational architectures more fully than classical methods and open the possibility of dealing with truly massive data sets. This paper presents a modular framework for constructing randomized algorithms that compute partial matrix decompositions. These methods use random sampling to identify a subspace that captures most of the action of a matrix. The input matrix is then compressed—either explicitly or implicitly—to this subspace, and the reduced matrix is manipulated deterministically to obtain the desired low-rank factorization. In many cases, this approach beats its classical competitors in terms of accuracy, speed, and robustness. These claims are supported by extensive numerical experiments and a detailed error analysis. The specific benefits of randomized techniques depend on the computational environment. Consider the model problem of finding the k dominant components of the singular value decomposition
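As a rough illustration of the framework this abstract describes (random sampling to identify a subspace capturing most of the action of the matrix, followed by deterministic processing of the compressed matrix), here is a minimal NumPy sketch of a randomized truncated SVD. The Gaussian test matrix and the oversampling parameter p are illustrative choices, not details prescribed by the paper.

```python
import numpy as np

def randomized_svd(A, k, p=10, rng=None):
    """Rank-k SVD via random range finding (a sketch of the two-stage framework)."""
    rng = np.random.default_rng(rng)
    n = A.shape[1]
    # Stage 1 (randomized): sample the range of A with a Gaussian test matrix.
    Omega = rng.standard_normal((n, k + p))
    Q, _ = np.linalg.qr(A @ Omega)          # orthonormal basis for the sampled range
    # Stage 2 (deterministic): compress A to the subspace and factor the small matrix.
    B = Q.T @ A                             # (k + p) x n
    Ub, s, Vt = np.linalg.svd(B, full_matrices=False)
    return (Q @ Ub)[:, :k], s[:k], Vt[:k, :]

# Model problem: the k dominant components of the SVD of a tall matrix.
A = np.random.default_rng(0).standard_normal((2000, 300))
U, s, Vt = randomized_svd(A, k=20)
```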
An Improved Approximation Algorithm for the Column Subset Selection Problem
"... We consider the problem of selecting the “best ” subset of exactly k columns from an m × n matrix A. In particular, we present and analyze a novel two-stage algorithm that runs in O(min{mn 2, m 2 n}) time and returns as output an m × k matrix C consisting of exactly k columns of A. In the first stag ..."
Abstract
-
Cited by 74 (13 self)
We consider the problem of selecting the “best” subset of exactly k columns from an m × n matrix A. In particular, we present and analyze a novel two-stage algorithm that runs in O(min{mn^2, m^2 n}) time and returns as output an m × k matrix C consisting of exactly k columns of A. In the first stage (the randomized stage), the algorithm randomly selects O(k log k) columns according to a judiciously-chosen probability distribution that depends on information in the top-k right singular subspace of A. In the second stage (the deterministic stage), the algorithm applies a deterministic column-selection procedure to select and return exactly k columns from the set of columns selected in the first stage. Let C be the m × k matrix containing those k columns, let P_C denote the projection matrix onto the span of those columns, and let A_k denote the “best” rank-k approximation to the matrix A as computed with the singular value decomposition. Then, we prove that ‖A − P_C A‖_2 ≤ O(k^{3/4} log^{1/2}(k) (n − k)^{1/4}) ‖A − A_k‖_2.
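A minimal sketch of the two-stage structure described above, assuming leverage-style sampling probabilities computed from the top-k right singular vectors and pivoted QR as the deterministic column-selection step; the paper's exact sampling distribution and deterministic subroutine differ in their details.

```python
import numpy as np
from scipy.linalg import qr

def two_stage_css(A, k, c=None, rng=None):
    """Two-stage column subset selection: randomized sampling, then pivoted QR."""
    rng = np.random.default_rng(rng)
    n = A.shape[1]
    c = c if c is not None else int(np.ceil(4 * k * np.log(k + 1)))  # O(k log k) columns
    # Stage 1 (randomized): sample columns with probabilities derived from
    # the top-k right singular subspace of A.
    _, _, Vt = np.linalg.svd(A, full_matrices=False)
    probs = np.sum(Vt[:k, :] ** 2, axis=0) / k                       # sums to 1
    idx = rng.choice(n, size=min(c, n), replace=False, p=probs)
    # Stage 2 (deterministic): pivoted QR on the sampled columns keeps exactly k.
    _, _, piv = qr(A[:, idx], mode='economic', pivoting=True)
    cols = idx[piv[:k]]
    return A[:, cols], cols
```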
Fast Approximation of Matrix Coherence and Statistical Leverage
"... The statistical leverage scores of a matrix A are the squared row-norms of the matrix containing its (top) left singular vectors and the coherence is the largest leverage score. These quantities are of interest in recently-popular problems such as matrix completion and Nyström-based low-rank matrix ..."
Abstract
-
Cited by 53 (11 self)
The statistical leverage scores of a matrix A are the squared row-norms of the matrix containing its (top) left singular vectors and the coherence is the largest leverage score. These quantities are of interest in recently-popular problems such as matrix completion and Nyström-based low-rank matrix approximation as well as in large-scale statistical data analysis applications more generally; moreover, they are of interest since they define the key structural nonuniformity that must be dealt with in developing fast randomized matrix algorithms. Our main result is a randomized algorithm that takes as input an arbitrary n × d matrix A, with n ≫ d, and that returns as output relative-error approximations to all n of the statistical leverage scores. The proposed algorithm runs (under assumptions on the precise values of n and d) in O(nd log n) time, as opposed to the O(nd^2) time required by the naïve algorithm that involves computing an orthogonal basis for the range of A. Our analysis may be viewed in terms of computing a relative-error approximation to an underconstrained least-squares approximation problem, or, relatedly, it may be viewed as an application of Johnson-Lindenstrauss type ideas. Several practically-important extensions of our basic result are also described, including the approximation of so-called cross-leverage scores, the extension of these ideas to matrices with n ≈ d, and the extension to streaming environments.
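The structure of the algorithm can be sketched as follows: sketch A from the left, orthogonalize A against an R factor of the sketch, and estimate the squared row norms of the (approximately orthonormal) result with a small Johnson-Lindenstrauss projection. The Gaussian sketch below stands in for the fast subsampled randomized Hadamard transform that yields the O(nd log n) running time, so this illustrates the structure rather than the speed, and the sketch sizes r1 and r2 are illustrative.

```python
import numpy as np

def approx_leverage_scores(A, eps=0.5, rng=None):
    """Approximate the n statistical leverage scores of a tall n x d matrix A."""
    rng = np.random.default_rng(rng)
    n, d = A.shape
    r1 = int(np.ceil(d * np.log(d + 1) / eps))        # sketch size (illustrative)
    Pi1 = rng.standard_normal((r1, n)) / np.sqrt(r1)  # stand-in for a fast transform
    _, R = np.linalg.qr(Pi1 @ A)                      # d x d factor of the sketch
    # Rows of A @ R^{-1} approximate the rows of the top left singular vectors;
    # a small JL projection Pi2 makes their squared norms cheap to estimate.
    r2 = int(np.ceil(np.log(n) / eps ** 2))
    Pi2 = rng.standard_normal((d, r2)) / np.sqrt(r2)
    X = A @ np.linalg.solve(R, Pi2)                   # n x r2
    return np.sum(X ** 2, axis=1)                     # approximate leverage scores
```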
Revisiting the Nyström method for improved large-scale machine learning
"... We reconsider randomized algorithms for the low-rank approximation of SPSD matrices such as Laplacian and kernel matrices that arise in data analysis and machine learning applications. Our main results consist of an empirical evaluation of the performance quality and running time of sampling and pro ..."
Abstract
-
Cited by 34 (5 self)
We reconsider randomized algorithms for the low-rank approximation of SPSD matrices such as Laplacian and kernel matrices that arise in data analysis and machine learning applications. Our main results consist of an empirical evaluation of the performance quality and running time of sampling and projection methods on a diverse suite of SPSD matrices. Our results highlight complementary aspects of sampling versus projection methods, and they point to differences between uniform and nonuniform sampling methods based on leverage scores. We complement our empirical results with a suite of worst-case theoretical bounds for both random sampling and random projection methods. These bounds are qualitatively superior to existing bounds—e.g., improved additive-error bounds for spectral and Frobenius norm error and relative-error bounds for trace norm error.
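For reference, a minimal sketch of the standard Nyström column-sampling approximation of an SPSD matrix with uniform sampling; the nonuniform (leverage-score based) variants compared in the paper differ only in how the column indices are drawn.

```python
import numpy as np

def nystrom(K, idx):
    """Nystrom approximation K ~ C W^+ C^T from sampled columns idx of an SPSD K."""
    C = K[:, idx]                        # sampled columns
    W = K[np.ix_(idx, idx)]              # intersection block
    return C @ np.linalg.pinv(W) @ C.T

# Example with uniform column sampling on a kernel-like SPSD matrix.
rng = np.random.default_rng(0)
X = rng.standard_normal((500, 20))
K = X @ X.T + 1e-6 * np.eye(500)
idx = rng.choice(500, size=50, replace=False)
print(np.linalg.norm(K - nystrom(K, idx)) / np.linalg.norm(K))
```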
Low-distortion subspace embeddings in input-sparsity time and applications to robust linear regression
, 2012
"... Low-distortion embeddings are critical building blocks for developing random sampling and random projection algo-rithms for common linear algebra problems. We show that, given a matrix A ∈ Rn×d with n d and a p ∈ [1, 2), with a constant probability, we can construct a low-distortion em-bedding matr ..."
Abstract
-
Cited by 26 (4 self)
Low-distortion embeddings are critical building blocks for developing random sampling and random projection algorithms for common linear algebra problems. We show that, given a matrix A ∈ R^{n×d} with n ≫ d and a p ∈ [1, 2), with constant probability we can construct a low-distortion embedding matrix Π ∈ R^{O(poly(d))×n} that embeds A_p, the ℓ_p subspace spanned by A's columns, into (R^{O(poly(d))}, ‖·‖_p); the distortion of our embeddings is only O(poly(d)), and we can compute ΠA in O(nnz(A)) time, i.e., input-sparsity time. Our result generalizes the input-sparsity time ℓ_2 subspace embedding by Clarkson and Woodruff [STOC'13]; and for completeness, we present a simpler and improved analysis of their construction for ℓ_2. These input-sparsity time ℓ_p embeddings are optimal, up to constants, in terms of their running time; and the improved running time propagates to applications such as (1 ± ε)-distortion ℓ_p subspace embedding and relative-error ℓ_p regression. For ℓ_2, we show that a (1 + ε)-approximate solution to the ℓ_2 regression problem specified by the matrix A and a vector b ∈ R^n can be computed in O(nnz(A) + d^3 log(d/ε)/ε^2) time; and for ℓ_p, via a subspace-preserving sampling procedure, we show that a (1 ± ε)-distortion embedding of A_p into R^{O(poly(d))} can be computed in O(nnz(A) · log n) time, and we also show that a (1 + ε)-approximate solution to the ℓ_p regression problem min_{x∈R^d} ‖Ax − b‖_p can be computed in O(nnz(A) · log n + poly(d) log(1/ε)/ε^2) time. Moreover, we can also improve the embedding dimension, or equivalently the sample size, to O(d^{3+p/2} log(1/ε)/ε^2) without increasing the complexity.
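For the ℓ_2 case that this work generalizes, the Clarkson-Woodruff input-sparsity-time embedding can be sketched with a CountSketch-style sparse matrix: each column of Π carries a single random ±1 entry, so ΠA costs O(nnz(A)) operations. The sketch dimension r below is an illustrative choice.

```python
import numpy as np
from scipy import sparse

def countsketch_matrix(r, n, rng=None):
    """Sparse embedding Pi in R^{r x n}: one random +/-1 entry per column."""
    rng = np.random.default_rng(rng)
    rows = rng.integers(0, r, size=n)           # hash each coordinate to a bucket
    signs = rng.choice([-1.0, 1.0], size=n)     # random sign flips
    return sparse.csr_matrix((signs, (rows, np.arange(n))), shape=(r, n))

# Sketch-and-solve for l2 regression: solve the much smaller sketched problem.
rng = np.random.default_rng(0)
n, d, r = 100_000, 20, 2_000
A = rng.standard_normal((n, d))
b = A @ rng.standard_normal(d) + 0.01 * rng.standard_normal(n)
Pi = countsketch_matrix(r, n, rng)
x_approx = np.linalg.lstsq(Pi @ A, Pi @ b, rcond=None)[0]
```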
Simple and deterministic matrix sketching
- CoRR
"... We adapt a well known streaming algorithm for approximating item frequencies to the matrix sketching setting. The algorithm receives the rows of a large matrix A ∈ R n×m one after the other in a streaming fashion. For ℓ = ⌈1/ε ⌉ it maintains a sketch matrix B ∈ R ℓ×m such that for any unit vector x ..."
Abstract
-
Cited by 23 (2 self)
We adapt a well-known streaming algorithm for approximating item frequencies to the matrix sketching setting. The algorithm receives the rows of a large matrix A ∈ R^{n×m} one after the other in a streaming fashion. For ℓ = ⌈1/ε⌉ it maintains a sketch matrix B ∈ R^{ℓ×m} such that for any unit vector x, ‖Ax‖^2 ≥ ‖Bx‖^2 ≥ ‖Ax‖^2 − ε‖A‖_F^2. Sketch updates per row in A require amortized O(mℓ) operations. This gives the first algorithm whose error guarantee decreases in proportion to 1/ℓ using O(mℓ) space; prior algorithms produce bounds proportional to 1/√ℓ. Our experiments corroborate that the faster convergence rate is observed in practice. The presented algorithm also stands out in that it is deterministic, simple to implement, and elementary to prove. Regardless of streaming aspects, the algorithm can be used to compute a (1 + ε′)-approximation to the best rank-k approximation of any matrix A ∈ R^{n×m}. This requires O(mnℓ′) operations and O(mℓ′) space, where ℓ′ =
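A direct sketch of the algorithm follows, recomputing an SVD after every row for clarity; the paper's version zeroes out half of the sketch rows at a time, which is what yields the amortized O(mℓ) update cost, but the shrink-by-the-smallest-squared-singular-value idea is the same.

```python
import numpy as np

def frequent_directions(A, ell):
    """Stream the rows of A (n x m) into an ell x m sketch B whose quadratic form
    tracks that of A up to an additive term on the order of ||A||_F^2 / ell.
    Assumes ell <= m."""
    _, m = A.shape
    B = np.zeros((ell, m))
    for row in A:                         # rows arrive one at a time
        B[-1, :] = row                    # the last sketch row is kept empty
        _, s, Vt = np.linalg.svd(B, full_matrices=False)
        delta = s[-1] ** 2                # smallest squared singular value
        s_shrunk = np.sqrt(np.maximum(s ** 2 - delta, 0.0))
        B = s_shrunk[:, None] * Vt        # shrink; the last row becomes zero again
    return B
```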
Dense fast random projections and Lean Walsh transforms
- In Proceedings of the 12th International Workshop on Randomization and Computation (RANDOM)
, 2008
"... Random projection methods give distributions over k × d matrices such that if a matrix Ψ (chosen according to the distribution) is applied to a finite set of vectors xi ∈ R d the resulting vectors Ψxi ∈ R k approximately preserve the original metric with constant probability. First, we show that any ..."
Abstract
-
Cited by 22 (1 self)
Random projection methods give distributions over k × d matrices such that if a matrix Ψ (chosen according to the distribution) is applied to a finite set of vectors x_i ∈ R^d, the resulting vectors Ψx_i ∈ R^k approximately preserve the original metric with constant probability. First, we show that any matrix (composed with a random ±1 diagonal matrix) is a good random projector for a subset of vectors in R^d. Second, we describe a family of tensor product matrices which we term Lean Walsh. We show that using Lean Walsh matrices as random projections outperforms, in terms of running time, the best known current result (due to Matousek) under comparable assumptions.
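The first claim (any fixed matrix, composed with a random ±1 diagonal, acts as a random projection for suitably spread-out vectors) can be illustrated with a short sketch. A rescaled partial Hadamard matrix plays the role of the fixed matrix here, which is an illustrative stand-in and not the paper's Lean Walsh construction.

```python
import numpy as np
from scipy.linalg import hadamard

def sign_randomized_projection(k, d, rng=None):
    """Psi = H diag(D): a fixed k x d matrix H composed with a random +/-1 diagonal D.
    Requires d to be a power of two for the Hadamard stand-in."""
    rng = np.random.default_rng(rng)
    H = hadamard(d)[:k, :] / np.sqrt(k)       # fixed matrix, rescaled
    D = rng.choice([-1.0, 1.0], size=d)       # random sign diagonal
    return H * D                              # multiplies column j of H by D[j]

# Distances of well-spread vectors are approximately preserved.
d, k = 1024, 64
rng = np.random.default_rng(1)
x, y = rng.standard_normal(d), rng.standard_normal(d)
Psi = sign_randomized_projection(k, d, rng)
print(np.linalg.norm(x - y), np.linalg.norm(Psi @ (x - y)))
```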
Random projections for the nonnegative least-squares problem
- LINEAR ALGEBRA AND ITS APPLICATIONS
, 2009
"... ..."