Results 1–10 of 58
Fast Approximation of Matrix Coherence and Statistical Leverage
Abstract

Cited by 48 (11 self)
The statistical leverage scores of a matrix A are the squared row norms of the matrix containing its (top) left singular vectors, and the coherence is the largest leverage score. These quantities are of interest in recently popular problems such as matrix completion and Nyström-based low-rank matrix approximation, as well as in large-scale statistical data analysis applications more generally; moreover, they are of interest since they define the key structural nonuniformity that must be dealt with in developing fast randomized matrix algorithms. Our main result is a randomized algorithm that takes as input an arbitrary n × d matrix A, with n ≫ d, and that returns as output relative-error approximations to all n of the statistical leverage scores. The proposed algorithm runs (under assumptions on the precise values of n and d) in O(nd log n) time, as opposed to the O(nd^2) time required by the naïve algorithm that involves computing an orthogonal basis for the range of A. Our analysis may be viewed in terms of computing a relative-error approximation to an underconstrained least-squares approximation problem, or, relatedly, it may be viewed as an application of Johnson–Lindenstrauss type ideas. Several practically important extensions of our basic result are also described, including the approximation of so-called cross-leverage scores, the extension of these ideas to matrices with n ≈ d, and the extension to streaming environments.
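As a concrete illustration of the definition above, here is a minimal NumPy sketch of the naïve O(nd^2) baseline (an orthonormal basis for the range of A via the SVD), not the paper's fast O(nd log n) algorithm:

```python
import numpy as np

def leverage_scores(A):
    # Orthonormal basis for the range of A: the (top) left singular vectors.
    U, _, _ = np.linalg.svd(A, full_matrices=False)
    # Leverage scores are the squared row norms of U.
    return np.sum(U**2, axis=1)

rng = np.random.default_rng(0)
A = rng.standard_normal((1000, 5))          # n >> d
scores = leverage_scores(A)
coherence = scores.max()                    # largest leverage score
# The n scores of a rank-d matrix always sum to d, and each lies in [0, 1].
assert np.isclose(scores.sum(), 5.0)
```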
OSNAP: Faster numerical linear algebra algorithms via sparser subspace embeddings
, 2012
Abstract

Cited by 29 (7 self)
An oblivious subspace embedding (OSE) given some parameters ε, d is a distribution D over matrices Π ∈ R^{m×n} such that for any linear subspace W ⊆ R^n with dim(W) = d it holds that P_{Π∼D}(∀x ∈ W, ‖Πx‖_2 ∈ (1 ± ε)‖x‖_2) > 2/3. We show an OSE exists with m = O(d^2/ε^2) and where every Π in the support of D has exactly s = 1 nonzero entry per column. This improves the previously best known bound in [Clarkson–Woodruff, arXiv abs/1207.6365]. Our quadratic dependence on d is optimal for any OSE with s = 1 [Nelson–Nguyễn, 2012]. We also give two OSEs, which we call Oblivious Sparse Norm-Approximating Projections (OSNAPs), that both allow the parameter settings m = Õ(d/ε^2) and s = polylog(d)/ε, or m = O(d^{1+γ}/ε^2) and s = O(1/ε) for any constant γ > 0. This m is nearly optimal, since m ≥ d is required simply to ensure no nonzero vector of W lands in the kernel of Π. These are the first constructions with m = o(d^2) to have s = o(d). In fact, our OSNAPs are nothing more than the sparse Johnson–Lindenstrauss matrices of [Kane–Nelson, SODA 2012]. Our analyses all yield OSEs that are sampled using either O(1)-wise or O(log d)-wise …
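The s = 1 construction described above can be sketched as follows: each column of Π has a single random sign placed in a uniformly random row (the CountSketch-style sparse embedding). The dimensions and the unbiasedness check below are illustrative, not the paper's analysis:

```python
import numpy as np

def sparse_embedding(m, n, rng):
    # Exactly one nonzero (a random sign) per column of the m x n matrix Pi.
    rows = rng.integers(0, m, size=n)       # hash each column to one row
    signs = rng.choice([-1.0, 1.0], size=n)
    Pi = np.zeros((m, n))
    Pi[rows, np.arange(n)] = signs
    return Pi

rng = np.random.default_rng(1)
n, m = 500, 200
x = rng.standard_normal(n)
# Pi @ x costs only nnz(x) operations, and E[||Pi x||^2] = ||x||^2;
# averaging over independent draws should concentrate near 1.
est = np.mean([np.linalg.norm(sparse_embedding(m, n, rng) @ x)**2
               for _ in range(50)]) / np.linalg.norm(x)**2
assert abs(est - 1.0) < 0.1
```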
Improving CUR Matrix Decomposition and the Nyström Approximation via Adaptive Sampling
Abstract

Cited by 16 (3 self)
The CUR matrix decomposition and the Nyström approximation are two important low-rank matrix approximation techniques. The Nyström method approximates a symmetric positive semidefinite matrix in terms of a small number of its columns, while CUR approximates an arbitrary data matrix by a small number of its columns and rows. Thus, CUR decomposition can be regarded as an extension of the Nyström approximation. In this paper we establish a more general error bound for the adaptive column/row sampling algorithm, based on which we propose more accurate CUR and Nyström algorithms with expected relative-error bounds. The proposed CUR and Nyström algorithms also have low time complexity and can avoid maintaining the whole data matrix in RAM. In addition, we give theoretical analysis for the lower error bounds of the standard Nyström method and the ensemble Nyström method. The main theoretical results established in this paper are novel, and our analysis makes no special assumption on the data matrices.
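A minimal sketch of the standard Nyström approximation the abstract builds on, K ≈ C W^+ C^T with C the sampled columns and W the corresponding principal submatrix (column indices are chosen uniformly here; the paper's adaptive sampling is not reproduced):

```python
import numpy as np

def nystrom(K, idx):
    # C: sampled columns; W: intersection of sampled columns and rows.
    C = K[:, idx]
    W = K[np.ix_(idx, idx)]
    return C @ np.linalg.pinv(W) @ C.T

rng = np.random.default_rng(2)
X = rng.standard_normal((200, 4))
K = X @ X.T                                  # rank-4 SPSD matrix
idx = rng.choice(200, size=10, replace=False)
K_hat = nystrom(K, idx)
# With more sampled columns than rank(K) (generically), Nystrom is exact.
assert np.allclose(K, K_hat, atol=1e-6)
```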
Tail bounds for all eigenvalues of a sum of random matrices
, 2011
Abstract

Cited by 12 (2 self)
This work introduces the minimax Laplace transform method, a modification of the cumulant-based matrix Laplace transform method developed in [Tro11c] that yields both upper and lower bounds on each eigenvalue of a sum of random self-adjoint matrices. This machinery is used to derive eigenvalue analogs of the classical Chernoff, Bennett, and Bernstein bounds. Two examples demonstrate the efficacy of the minimax Laplace transform. The first concerns the effects of column sparsification on the spectrum of a matrix with orthonormal rows. Here, the behavior of the singular values can be described in terms of coherence-like quantities. The second example addresses the question of relative accuracy in the estimation of eigenvalues of the covariance matrix of a random process. Standard results on the convergence of sample covariance matrices provide bounds on the number of samples needed to obtain relative accuracy in the spectral norm, but these results only guarantee relative accuracy in the estimate of the maximum eigenvalue. The minimax Laplace transform argument establishes that if the lowest eigenvalues decay sufficiently fast, Ω(ε^{-2} κ_ℓ^2 ℓ log p) samples, where κ_ℓ = λ_1(C)/λ_ℓ(C), are sufficient to ensure that the dominant ℓ eigenvalues of the covariance matrix of an N(0, C) random vector are estimated to within a factor of 1 ± ε with high probability.
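The covariance-estimation setting of the second example can be checked empirically: with fast eigenvalue decay, the dominant eigenvalues of the sample covariance are accurate in relative terms. Sample sizes and tolerances below are illustrative, not the abstract's Ω(·) bound:

```python
import numpy as np

rng = np.random.default_rng(5)
p = 20
evals_true = 1.0 / np.arange(1, p + 1)**2      # fast eigenvalue decay
C = np.diag(evals_true)
# Draw N(0, C) samples and form the sample covariance matrix.
X = rng.multivariate_normal(np.zeros(p), C, size=5000)
C_hat = X.T @ X / X.shape[0]
evals_hat = np.linalg.eigvalsh(C_hat)[::-1]    # descending order
# The top eigenvalue is estimated to small relative error.
rel_top = abs(evals_hat[0] - evals_true[0]) / evals_true[0]
assert rel_top < 0.2
```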
Sharp analysis of low-rank kernel matrix approximations
 JMLR: Workshop and Conference Proceedings, Vol. 30 (2013), 1–25
, 2013
Abstract

Cited by 10 (0 self)
We consider supervised learning problems within the positive-definite kernel framework, such as kernel ridge regression, kernel logistic regression or the support vector machine. With kernels leading to infinite-dimensional feature spaces, a common practical limiting difficulty is the necessity of computing the kernel matrix, which most frequently leads to algorithms with running time at least quadratic in the number of observations n, i.e., O(n^2). Low-rank approximations of the kernel matrix are often considered as they allow the reduction of running time complexities to O(p^2 n), where p is the rank of the approximation. The practicality of such methods thus depends on the required rank p. In this paper, we show that in the context of kernel ridge regression, for approximations based on a random subset of columns of the original kernel matrix, the rank p may be chosen to be linear in the degrees of freedom associated with the problem, a quantity which is classically used in the statistical analysis of such methods, and is often seen as the implicit number of parameters of nonparametric estimators. This result enables simple algorithms that have sub-quadratic running time complexity, but provably exhibit the same predictive performance as existing algorithms, for any given problem instance, and not only for worst-case situations.
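A minimal sketch of the setting analyzed above: kernel ridge regression where the full kernel matrix is replaced by a rank-p approximation built from p random columns. The data, kernel, and parameters are made up for illustration, and the error tolerance is arbitrary:

```python
import numpy as np

rng = np.random.default_rng(4)
n, p, lam = 300, 40, 1e-2
X = rng.standard_normal((n, 2))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(n)
# RBF kernel matrix on the training points.
D2 = np.sum((X[:, None, :] - X[None, :, :])**2, axis=-1)
K = np.exp(-D2)
# Rank-p Nystrom approximation from p uniformly sampled columns.
idx = rng.choice(n, size=p, replace=False)
C, W = K[:, idx], K[np.ix_(idx, idx)]
K_lr = C @ np.linalg.pinv(W) @ C.T
# (A practical implementation never forms the n x n matrix K_lr; it works
# with the n x p factor C W^{-1/2} to get the O(p^2 n) training time.)
y_approx = K_lr @ np.linalg.solve(K_lr + lam * np.eye(n), y)
y_exact = K @ np.linalg.solve(K + lam * np.eye(n), y)
rel = np.linalg.norm(y_approx - y_exact) / np.linalg.norm(y_exact)
assert rel < 0.5   # predictions close to exact kernel ridge regression
```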
Sketched SVD: Recovering spectral features from compressive measurements. arXiv e-prints
, 2012
Efficient Algorithms and Error Analysis for the Modified Nyström Method
Abstract

Cited by 5 (2 self)
Many kernel methods suffer from high time and space complexities and are thus prohibitive in big-data applications. To tackle the computational challenge, the Nyström method has been extensively used to reduce time and space complexities by sacrificing some accuracy. The Nyström method speeds up computation by constructing an approximation of the kernel matrix using only a few columns of the matrix. Recently, a variant of the Nyström method called the modified Nyström method has demonstrated significant improvement over the standard Nyström method in approximation accuracy, both theoretically and empirically. In this paper, we propose two algorithms that make the modified Nyström method practical. First, we devise a simple column selection algorithm with a provable error bound. Our algorithm is more efficient and easier to implement than, and nearly as accurate as, the state-of-the-art algorithm. Second, with the selected columns at hand, we propose an algorithm that computes the approximation in lower time complexity than the approach in the previous work. Furthermore, we prove that the modified Nyström method is exact under certain conditions, and we establish a lower error bound for the modified Nyström method.
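The modified Nyström method replaces the standard middle matrix W^+ by the Frobenius-optimal U = C^+ K (C^+)^T, so for any fixed set of columns its error can only be smaller. A minimal sketch (column selection here is uniform, not the paper's algorithm):

```python
import numpy as np

def modified_nystrom(K, idx):
    # K ≈ C U C^T with U = C^+ K (C^+)^T, the minimizer of ||K - C U C^T||_F.
    C = K[:, idx]
    Cp = np.linalg.pinv(C)
    return C @ (Cp @ K @ Cp.T) @ C.T

rng = np.random.default_rng(3)
X = rng.standard_normal((120, 3))
D2 = np.sum((X[:, None, :] - X[None, :, :])**2, axis=-1)
K = np.exp(-D2)                              # full-rank RBF kernel matrix
idx = rng.choice(120, size=15, replace=False)
err_mod = np.linalg.norm(K - modified_nystrom(K, idx))
# Standard Nystrom with the same columns, for comparison.
C, W = K[:, idx], K[np.ix_(idx, idx)]
err_std = np.linalg.norm(K - C @ np.linalg.pinv(W) @ C.T)
assert err_mod <= err_std + 1e-8             # optimality of the middle matrix
```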
Sketching as a tool for numerical linear algebra
 Foundations and Trends in Theoretical Computer Science
Matching the universal barrier without paying the costs: Solving linear programs with Õ(√rank) linear system solves
 CoRR
Abstract

Cited by 4 (1 self)
In this paper we present a new algorithm for solving linear programs that requires only Õ(√rank(A) · L) iterations, where A is the constraint matrix of a linear program with m constraints and n variables and L is the bit complexity of the linear program. Each iteration of our method consists of solving Õ(1) linear systems and additional nearly linear time computation. Our method improves upon the previous best iteration bound by a factor of Ω̃((m / rank(A))^{1/4}) for methods with polynomial time computable iterations, and by Ω̃((m / rank(A))^{1/2}) for methods which solve at most Õ(1) linear systems in each iteration. Our method is parallelizable and amenable to linear algebraic techniques for accelerating the linear system solver. As such, up to polylogarithmic factors we either match or improve upon the best previous running times for solving linear programs in both depth and work for different ratios of m and rank(A). Moreover, our method matches up to polylogarithmic factors a theoretical limit established by Nesterov and Nemirovski in 1994 regarding the use of a “universal barrier” for interior point methods, thereby resolving a longstanding open question regarding the running time of polynomial time interior point methods for linear programming.
Subspace Embeddings and ℓp-Regression Using Exponential Random Variables
, 2014
Abstract

Cited by 3 (1 self)
Oblivious low-distortion subspace embeddings are a crucial building block for numerical linear algebra problems. We show for any real p, 1 ≤ p < ∞, given a matrix M ∈ R^{n×d} with n ≫ d, with constant probability we can choose a matrix Π with max(1, n^{1−2/p}) · poly(d) rows and n columns so that simultaneously for all x ∈ R^d, ‖Mx‖_p ≤ ‖ΠMx‖_∞ ≤ poly(d) ‖Mx‖_p. Importantly, ΠM can be computed in the optimal O(nnz(M)) time, where nnz(M) is the number of nonzero entries of M. This generalizes all previous oblivious subspace embeddings, which required p ∈ [1, 2] due to their use of p-stable random variables. Using our matrices Π, we also improve the best known distortion of oblivious subspace embeddings of ℓ_1 into ℓ_1 with Õ(d) target dimension in O(nnz(M)) time from Õ(d^3) to Õ(d^2), which can further be improved to Õ(d^{3/2} log^{1/2} n) if d = Ω(log n), answering a question of Meng and Mahoney (STOC, 2013). We apply our results to ℓ_p-regression, obtaining a (1 + ε)-approximation in O(nnz(M) log n) + poly(d/ε) time, improving the best known poly(d/ε) factors for every p ∈ [1, ∞) \ {2}. If one is just interested in a poly(d) rather than a (1 + ε)-approximation to ℓ_p-regression, a corollary of our results is that for all p ∈ [1, ∞) we can solve the ℓ_p-regression problem without using general convex programming; that is, since our subspace embeds into ℓ_∞, it suffices to solve a linear programming problem. Finally, we give the first protocols for the distributed ℓ_p-regression problem for every p ≥ 1 which are nearly optimal in communication and computation.
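The max-stability property of exponential random variables that underlies these embeddings (shown here for p = 1) can be checked empirically: if e_1, …, e_n are i.i.d. standard exponentials, then max_i |y_i|/e_i is distributed as ‖y‖_1/e for a single exponential e, so its median is ‖y‖_1/ln 2. This is an illustrative sketch of the primitive, not the paper's construction:

```python
import numpy as np

rng = np.random.default_rng(6)
y = rng.standard_normal(50)
# P(max_i |y_i|/e_i <= t) = prod_i exp(-|y_i|/t) = exp(-||y||_1 / t),
# so the estimator max_i |y_i|/e_i has median ||y||_1 / ln 2.
ests = [np.max(np.abs(y) / rng.exponential(size=50)) for _ in range(4001)]
med = np.median(ests)
target = np.linalg.norm(y, 1) / np.log(2)
assert abs(med - target) < 0.2 * np.linalg.norm(y, 1)
```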