FINDING STRUCTURE WITH RANDOMNESS: PROBABILISTIC ALGORITHMS FOR CONSTRUCTING APPROXIMATE MATRIX DECOMPOSITIONS
Cited by 253 (6 self)
Low-rank matrix approximations, such as the truncated singular value decomposition and the rank-revealing QR decomposition, play a central role in data analysis and scientific computing. This work surveys and extends recent research which demonstrates that randomization offers a powerful tool for performing low-rank matrix approximation. These techniques exploit modern computational architectures more fully than classical methods and open the possibility of dealing with truly massive data sets. This paper presents a modular framework for constructing randomized algorithms that compute partial matrix decompositions. These methods use random sampling to identify a subspace that captures most of the action of a matrix. The input matrix is then compressed, either explicitly or implicitly, to this subspace, and the reduced matrix is manipulated deterministically to obtain the desired low-rank factorization. In many cases, this approach beats its classical competitors in terms of accuracy, speed, and robustness. These claims are supported by extensive numerical experiments and a detailed error analysis. The specific benefits of randomized techniques depend on the computational environment. Consider the model problem of finding the k dominant components of the singular value decomposition
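The two-step scheme described in this abstract (sample a subspace at random, then factor the compressed matrix deterministically) can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's reference implementation: the function name `randomized_svd`, the Gaussian test matrix, and the `oversample` parameter are assumptions made for the example.

```python
import numpy as np

def randomized_svd(A, k, oversample=10, seed=0):
    """Approximate the k dominant singular triplets of A (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    ell = min(n, k + oversample)             # number of random samples (assumed choice)
    Omega = rng.standard_normal((n, ell))    # random test matrix
    Y = A @ Omega                            # sample the range of A
    Q, _ = np.linalg.qr(Y)                   # orthonormal basis for the sampled subspace
    B = Q.T @ A                              # compress A to that subspace
    U_small, s, Vt = np.linalg.svd(B, full_matrices=False)  # deterministic step
    U = Q @ U_small
    return U[:, :k], s[:k], Vt[:k, :]

# Usage on a matrix of exact rank 5.
rng = np.random.default_rng(1)
A = rng.standard_normal((200, 5)) @ rng.standard_normal((5, 80))
U, s, Vt = randomized_svd(A, k=5)
rel_err = np.linalg.norm(A - (U * s) @ Vt) / np.linalg.norm(A)
```

For a matrix whose rank does not exceed the number of samples, the sampled subspace contains the whole range of the matrix, so the relative error sits at rounding level. For matrices with slowly decaying singular values the error depends on the tail of the spectrum.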
An Improved Approximation Algorithm for the Column Subset Selection Problem
Cited by 74 (13 self)
We consider the problem of selecting the “best” subset of exactly k columns from an m × n matrix A. In particular, we present and analyze a novel two-stage algorithm that runs in $O(\min\{mn^2, m^2 n\})$ time and returns as output an m × k matrix C consisting of exactly k columns of A. In the first stage (the randomized stage), the algorithm randomly selects O(k log k) columns according to a judiciously chosen probability distribution that depends on information in the top-k right singular subspace of A. In the second stage (the deterministic stage), the algorithm applies a deterministic column-selection procedure to select and return exactly k columns from the set of columns selected in the first stage. Let C be the m × k matrix containing those k columns, let $P_C$ denote the projection matrix onto the span of those columns, and let $A_k$ denote the “best” rank-k approximation to the matrix A as computed with the singular value decomposition. Then, we prove that $\|A - P_C A\|_2 \le O(k^{3/4} \log^{1\ldots}\ldots)$
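The two-stage structure can be illustrated with a simplified NumPy sketch. Several choices here are assumptions for illustration and not the paper's exact procedure: the sampling probabilities are the plain leverage scores of the top-k right singular subspace, the deterministic stage is a greedy column-pivoting loop (`greedy_pivots`, a hypothetical helper), and the constant 4 in the candidate count is arbitrary.

```python
import numpy as np

def greedy_pivots(M, k):
    """Choose k column indices of M by greedy column pivoting (Gram-Schmidt)."""
    R = np.array(M, dtype=float)
    picked = []
    for _ in range(k):
        j = int(np.argmax(np.sum(R * R, axis=0)))   # column with largest residual norm
        picked.append(j)
        q = R[:, j] / np.linalg.norm(R[:, j])
        R -= np.outer(q, q @ R)                     # project q out of every column
    return picked

def two_stage_columns(A, k, c, seed=0):
    rng = np.random.default_rng(seed)
    _, _, Vt = np.linalg.svd(A, full_matrices=False)
    Vk_t = Vt[:k, :]                                # k x n, top-k right singular vectors
    p = np.sum(Vk_t * Vk_t, axis=0) / k             # assumed probabilities, sum to 1
    # Stage 1 (randomized): draw c column indices and keep the distinct ones.
    cand = np.unique(rng.choice(A.shape[1], size=c, replace=True, p=p))
    # Stage 2 (deterministic): pick exactly k of the rescaled candidates.
    scaled = Vk_t[:, cand] / np.sqrt(p[cand])
    return cand[greedy_pivots(scaled, k)]

rng = np.random.default_rng(1)
A = rng.standard_normal((40, 60))
k = 5
c = int(np.ceil(4 * k * np.log(k)))                 # O(k log k) candidates
cols = two_stage_columns(A, k, c)
C = A[:, cols]
residual = A - C @ np.linalg.lstsq(C, A, rcond=None)[0]   # A - P_C A
err = np.linalg.norm(residual, 2)
sigma = np.linalg.svd(A, compute_uv=False)
```

Because $P_C A$ has rank at most k, `err` can never fall below `sigma[k]`, the spectral-norm error of the best rank-k approximation; the quality of a column selection is judged by how close it gets to that floor.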
Fast Approximation of Matrix Coherence and Statistical Leverage
Cited by 53 (11 self)
The statistical leverage scores of a matrix A are the squared row-norms of the matrix containing its (top) left singular vectors and the coherence is the largest leverage score. These quantities are of interest in recently popular problems such as matrix completion and Nyström-based low-rank matrix approximation as well as in large-scale statistical data analysis applications more generally; moreover, they are of interest since they define the key structural non-uniformity that must be dealt with in developing fast randomized matrix algorithms. Our main result is a randomized algorithm that takes as input an arbitrary n × d matrix A, with n ≫ d, and that returns as output relative-error approximations to all n of the statistical leverage scores. The proposed algorithm runs (under assumptions on the precise values of n and d) in $O(nd \log n)$ time, as opposed to the $O(nd^2)$ time required by the naïve algorithm that involves computing an orthogonal basis for the range of A. Our analysis may be viewed in terms of computing a relative-error approximation to an underconstrained least-squares approximation problem, or, relatedly, it may be viewed as an application of Johnson-Lindenstrauss type ideas. Several practically important extensions of our basic result are also described, including the approximation of so-called cross-leverage scores, the extension of these ideas to matrices with n ≈ d, and the extension to streaming environments.
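A short NumPy sketch makes the definitions concrete. `leverage_scores_exact` is the naïve $O(nd^2)$ computation through an orthogonal basis. `leverage_scores_sketched` shows only the general idea of approximating the scores from a small sketch; it assumes a dense Gaussian sketching matrix as a stand-in for a fast transform and omits the second projection a fast algorithm would need, so it illustrates accuracy and does not reproduce the stated running time. Both function names are hypothetical.

```python
import numpy as np

def leverage_scores_exact(A):
    """Squared row norms of an orthonormal basis for range(A)."""
    Q, _ = np.linalg.qr(A)
    return np.sum(Q * Q, axis=1)

def leverage_scores_sketched(A, r, seed=0):
    """Approximate scores from an r-row sketch (Gaussian sketch is an assumption)."""
    rng = np.random.default_rng(seed)
    n, d = A.shape
    S = rng.standard_normal((r, n)) / np.sqrt(r)
    _, R = np.linalg.qr(S @ A)                # d x d factor of the small sketch
    X = np.linalg.solve(R.T, A.T).T           # X = A R^{-1}
    return np.sum(X * X, axis=1)

rng = np.random.default_rng(1)
A = rng.standard_normal((2000, 5))
A[:3] *= 20.0                                 # a few rows with high leverage
exact = leverage_scores_exact(A)
approx = leverage_scores_sketched(A, r=200)
coherence = exact.max()                       # largest leverage score
ratio = approx / exact
```

The exact scores lie in [0, 1] and sum to the rank of A, which gives a quick sanity check on any implementation.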
Revisiting the Nyström method for improved large-scale machine learning
Cited by 34 (5 self)
We reconsider randomized algorithms for the low-rank approximation of SPSD matrices such as Laplacian and kernel matrices that arise in data analysis and machine learning applications. Our main results consist of an empirical evaluation of the performance quality and running time of sampling and projection methods on a diverse suite of SPSD matrices. Our results highlight complementary aspects of sampling versus projection methods, and they point to differences between uniform and nonuniform sampling methods based on leverage scores. We complement our empirical results with a suite of worst-case theoretical bounds for both random sampling and random projection methods. These bounds are qualitatively superior to existing bounds, e.g., improved additive-error bounds for spectral and Frobenius norm error and relative-error bounds for trace norm error.
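For reference, the basic Nyström approximation of an SPSD matrix K from a set of sampled columns is $K \approx C W^{+} C^{T}$, where C holds the sampled columns and W is the intersection block. The sketch below uses uniform column sampling; the function name `nystrom`, the linear-kernel test matrix, and the pseudoinverse cutoff `rcond=1e-10` are assumptions for the example.

```python
import numpy as np

def nystrom(K, idx, rcond=1e-10):
    """Nystrom approximation of an SPSD matrix K from the columns in idx."""
    C = K[:, idx]                       # sampled columns
    W = K[np.ix_(idx, idx)]             # intersection block
    return C @ np.linalg.pinv(W, rcond=rcond) @ C.T

rng = np.random.default_rng(0)
X = rng.standard_normal((300, 5))
K = X @ X.T                                      # SPSD with rank 5 (linear kernel)
idx = rng.choice(300, size=20, replace=False)    # uniform sampling without replacement
K_hat = nystrom(K, idx)
rel_err = np.linalg.norm(K - K_hat) / np.linalg.norm(K)
```

When the sampled columns span the range of K, as they do here, the approximation is exact up to rounding. Leverage-score sampling, mentioned in the abstract, would replace the uniform `rng.choice` call with a nonuniform one.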
Low-distortion subspace embeddings in input-sparsity time and applications to robust linear regression
, 2012
Cited by 26 (4 self)
Low-distortion embeddings are critical building blocks for developing random sampling and random projection algorithms for common linear algebra problems. We show that, given a matrix $A \in \mathbb{R}^{n \times d}$ with $n \gg d$ and a $p \in [1, 2)$, with a constant probability, we can construct a low-distortion embedding matrix $\Pi \in \mathbb{R}^{O(\mathrm{poly}(d)) \times n}$ that embeds $\mathcal{A}_p$, the $\ell_p$ subspace spanned by $A$'s columns, into $(\mathbb{R}^{O(\mathrm{poly}(d))}, \|\cdot\|_p)$; the distortion of our embeddings is only $O(\mathrm{poly}(d))$, and we can compute $\Pi A$ in $O(\mathrm{nnz}(A))$ time, i.e., input-sparsity time. Our result generalizes the input-sparsity time $\ell_2$ subspace embedding by Clarkson and Woodruff [STOC'13]; and for completeness, we present a simpler and improved analysis of their construction for $\ell_2$. These input-sparsity time $\ell_p$ embeddings are optimal, up to constants, in terms of their running time; and the improved running time propagates to applications such as $(1 \pm \varepsilon)$-distortion $\ell_p$ subspace embedding and relative-error $\ell_p$ regression. For $\ell_2$, we show that a $(1 + \varepsilon)$-approximate solution to the $\ell_2$ regression problem specified by the matrix $A$ and a vector $b \in \mathbb{R}^n$ can be computed in $O(\mathrm{nnz}(A) + d^3 \log(d/\varepsilon)/\varepsilon^2)$ time; and for $\ell_p$, via a subspace-preserving sampling procedure, we show that a $(1 \pm \varepsilon)$-distortion embedding of $\mathcal{A}_p$ into $\mathbb{R}^{O(\mathrm{poly}(d))}$ can be computed in $O(\mathrm{nnz}(A) \cdot \log n)$ time, and we also show that a $(1 + \varepsilon)$-approximate solution to the $\ell_p$ regression problem $\min_{x \in \mathbb{R}^d} \|Ax - b\|_p$ can be computed in $O(\mathrm{nnz}(A) \cdot \log n + \mathrm{poly}(d) \log(1/\varepsilon)/\varepsilon^2)$ time. Moreover, we can also improve the embedding dimension or equivalently the sample size to $O(d^{3+p/2} \log(1/\varepsilon)/\varepsilon^2)$ without increasing the complexity.
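The $\ell_2$ case gives a feel for why such embeddings cost only one pass over the nonzeros. The sketch below assumes a sparse embedding in which every column of $\Pi$ holds a single random ±1 entry, so $\Pi A$ is formed by adding each signed row of $A$ into one bucket. It covers only $\ell_2$ sketch-and-solve regression, not the $\ell_p$ construction of the paper, and the name `countsketch` and the sizes used are assumptions.

```python
import numpy as np

def countsketch(A, s, seed=0):
    """Return Pi @ A, where each column of Pi has one random +-1 entry."""
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    rows = rng.integers(0, s, size=n)          # bucket assigned to each input row
    signs = rng.choice([-1.0, 1.0], size=n)    # random sign for each input row
    PA = np.zeros((s, A.shape[1]))
    np.add.at(PA, rows, signs[:, None] * A)    # single pass over the rows of A
    return PA, rows, signs

rng = np.random.default_rng(1)
n, d, s = 5000, 4, 400
A = rng.standard_normal((n, d))
b = A @ np.ones(d) + 0.1 * rng.standard_normal(n)

# Sketch [A, b] together, then solve the small s x d least-squares problem.
PAb, rows, signs = countsketch(np.column_stack([A, b]), s)
x_sketch = np.linalg.lstsq(PAb[:, :d], PAb[:, d], rcond=None)[0]
x_exact = np.linalg.lstsq(A, b, rcond=None)[0]
ratio = np.linalg.norm(A @ x_sketch - b) / np.linalg.norm(A @ x_exact - b)
```

Here `ratio` plays the role of the $(1 + \varepsilon)$ factor: it compares the residual of the sketched solution with the optimal residual and is never below 1. With a sparse matrix format the accumulation step would touch only the nonzero entries of $A$.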
Simple and deterministic matrix sketching
CoRR
Cited by 23 (2 self)
We adapt a well-known streaming algorithm for approximating item frequencies to the matrix sketching setting. The algorithm receives the rows of a large matrix $A \in \mathbb{R}^{n \times m}$ one after the other in a streaming fashion. For $\ell = \lceil 1/\varepsilon \rceil$ it maintains a sketch matrix $B \in \mathbb{R}^{\ell \times m}$ such that for any unit vector $x$, $\|Ax\|^2 \ge \|Bx\|^2 \ge \|Ax\|^2 - \varepsilon \|A\|_F^2$. Sketch updates per row in $A$ require amortized $O(m\ell)$ operations. This gives the first algorithm whose error guarantee decreases proportionally to $1/\ell$ using $O(m\ell)$ space. Prior art algorithms produce bounds proportional to $1/\sqrt{\ell}$. Our experiments corroborate that the faster convergence rate is observed in practice. The presented algorithm also stands out in that it is deterministic, simple to implement, and elementary to prove. Regardless of streaming aspects, the algorithm can be used to compute a $(1+\varepsilon')$-approximation to the best rank-k approximation of any matrix $A \in \mathbb{R}^{n \times m}$. This requires $O(mn\ell')$ operations and $O(m\ell')$ space where $\ell' =$
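The sketch-maintenance idea can be shown with a simplified variant, written here as an assumption-laden illustration and not as the paper's exact procedure: each incoming row replaces the (always zero) last row of B, and the squared singular values are then shrunk by the smallest one. This variant recomputes an SVD for every row, so it does not achieve the amortized $O(m\ell)$ update cost; it is meant only to make the guarantee checkable. The function name `frequent_directions` and the requirement $\ell \le m$ are choices made for the example.

```python
import numpy as np

def frequent_directions(A, ell):
    """Maintain an ell x m sketch B of the rows of A (simplified variant)."""
    n, m = A.shape
    if ell > m:
        raise ValueError("this sketch assumes ell <= m")
    B = np.zeros((ell, m))
    for row in A:                                   # rows arrive one at a time
        B[-1] = row                                 # last row is zero at this point
        _, s, Vt = np.linalg.svd(B, full_matrices=False)
        shrunk = np.sqrt(np.maximum(s**2 - s[-1]**2, 0.0))
        B = shrunk[:, None] * Vt                    # last row becomes zero again
    return B

rng = np.random.default_rng(0)
A = rng.standard_normal((300, 20))
ell = 10
B = frequent_directions(A, ell)
gap = A.T @ A - B.T @ B
eig = np.linalg.eigvalsh(gap)                       # eigenvalues in ascending order
bound = np.linalg.norm(A, "fro") ** 2 / ell
```

For any unit vector $x$, $\|Ax\|^2 - \|Bx\|^2 = x^T(A^TA - B^TB)x$, so the two-sided inequality in the abstract corresponds to the eigenvalues in `eig` lying between 0 and `bound`. Each shrink step removes the same amount from all $\ell$ squared singular values, which is what ties the accumulated error to $\|A\|_F^2/\ell$.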
Dense fast random projections and Lean Walsh transforms
In Proceedings of the 12th International Workshop on Randomization and Computation (RANDOM
, 2008
Cited by 22 (1 self)
Random projection methods give distributions over k × d matrices such that if a matrix $\Psi$ (chosen according to the distribution) is applied to a finite set of vectors $x_i \in \mathbb{R}^d$, the resulting vectors $\Psi x_i \in \mathbb{R}^k$ approximately preserve the original metric with constant probability. First, we show that any matrix (composed with a random ±1 diagonal matrix) is a good random projector for a subset of vectors in $\mathbb{R}^d$. Second, we describe a family of tensor product matrices which we term Lean Walsh. We show that using Lean Walsh matrices as random projections outperforms, in terms of running time, the best currently known result (due to Matousek) under comparable assumptions.
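The construction "fixed tensor-product matrix composed with a random ±1 diagonal" can be illustrated as follows. The abstract does not give the seed matrix, so the 3 × 4 seed below (entries of equal modulus, mutually orthogonal rows) is a hypothetical choice, as are the names `SEED`, `lean_walsh`, and `project`. The matrix is formed explicitly for clarity, which forgoes any fast recursive application.

```python
import numpy as np

# Hypothetical 3 x 4 seed: entries of modulus 1/sqrt(3), mutually orthogonal rows.
SEED = np.array([[1,  1,  1,  1],
                 [1, -1,  1, -1],
                 [1,  1, -1, -1]]) / np.sqrt(3)

def lean_walsh(levels):
    """Kronecker (tensor) power of the seed: shape (3**levels, 4**levels)."""
    M = SEED
    for _ in range(levels - 1):
        M = np.kron(SEED, M)
    return M

def project(x, levels, seed=0):
    """Apply a random +-1 diagonal, then the fixed tensor-product matrix."""
    rng = np.random.default_rng(seed)
    signs = rng.choice([-1.0, 1.0], size=x.shape[0])
    return lean_walsh(levels) @ (signs * x)

rng = np.random.default_rng(1)
x = rng.standard_normal(64)
M = lean_walsh(3)          # maps R^64 to R^27
y = project(x, levels=3)
```

Every entry of `M` has the same modulus and its rows stay orthogonal under the Kronecker product. With this scaling the expected squared norm of `y` over the random signs equals the squared norm of `x`; how tightly it concentrates depends on the vector, which is why the abstract speaks of a subset of vectors.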
Random projections for the nonnegative least-squares problem
LINEAR ALGEBRA AND ITS APPLICATIONS
, 2009