Randomized algorithms for matrices and data (2011)

by M W Mahoney

Results 1 - 10 of 60

Fast Approximation of Matrix Coherence and Statistical Leverage

by Petros Drineas, Malik Magdon-ismail, Michael W. Mahoney, David P. Woodruff, Mehryar Mohri
"... The statistical leverage scores of a matrix A are the squared row-norms of the matrix containing its (top) left singular vectors and the coherence is the largest leverage score. These quantities are of interest in recently-popular problems such as matrix completion and Nyström-based low-rank matrix ..."
Abstract - Cited by 53 (11 self) - Add to MetaCart
The statistical leverage scores of a matrix A are the squared row-norms of the matrix containing its (top) left singular vectors and the coherence is the largest leverage score. These quantities are of interest in recently-popular problems such as matrix completion and Nyström-based low-rank matrix approximation as well as in large-scale statistical data analysis applications more generally; moreover, they are of interest since they define the key structural nonuniformity that must be dealt with in developing fast randomized matrix algorithms. Our main result is a randomized algorithm that takes as input an arbitrary n×d matrix A, with n ≫ d, and that returns as output relative-error approximations to all n of the statistical leverage scores. The proposed algorithm runs (under assumptions on the precise values of n and d) in O(nd log n) time, as opposed to the O(nd²) time required by the naïve algorithm that involves computing an orthogonal basis for the range of A. Our analysis may be viewed in terms of computing a relative-error approximation to an underconstrained least-squares approximation problem, or, relatedly, it may be viewed as an application of Johnson-Lindenstrauss type ideas. Several practically-important extensions of our basic result are also described, including the approximation of so-called cross-leverage scores, the extension of these ideas to matrices with n ≈ d, and the extension to streaming environments.
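
To make the definition concrete, the following minimal numpy sketch contrasts the naïve computation (an orthogonal basis via QR, O(nd²)) with the two-sketch estimator outlined in the abstract. It is an illustration only: dense Gaussian sketches are used for simplicity, so it does not achieve the O(nd log n) running time, which requires a structured (Hadamard-type) transform, and the sketch sizes r1 and r2 are assumed defaults rather than the paper's parameter choices.

    import numpy as np

    def exact_leverage_scores(A):
        # Naive route: squared row norms of an orthonormal basis for range(A).
        Q, _ = np.linalg.qr(A)                # O(n d^2)
        return np.sum(Q ** 2, axis=1)

    def approx_leverage_scores(A, r1=None, r2=None, seed=None):
        # Two-sketch estimator (Gaussian sketches stand in for the fast transform).
        n, d = A.shape
        rng = np.random.default_rng(seed)
        r1 = r1 or 4 * d                      # rows of the first sketch (assumed default)
        r2 = r2 or int(8 * np.log(n)) + 1     # columns of the JL sketch (assumed default)
        Pi1 = rng.standard_normal((r1, n)) / np.sqrt(r1)
        _, R = np.linalg.qr(Pi1 @ A)          # R plays the role of the R-factor of A
        Pi2 = rng.standard_normal((d, r2)) / np.sqrt(r2)
        X = A @ np.linalg.solve(R, Pi2)       # n x r2; row norms estimate the scores
        return np.sum(X ** 2, axis=1)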

Citation Context

... development of improved worst-case randomized matrix algorithms that are also amenable to high-quality numerical implementation and that are useful to domain scientists [19, 30, 11, 18, 40, 20]; see [29] for a detailed discussion. The naïve and best previously existing algorithm to compute these scores would compute an orthogonal basis for the dominant part of the spectrum of A, e.g., the basis prov...

OSNAP: Faster numerical linear algebra algorithms via sparser subspace embeddings

by Jelani Nelson, Huy L. Nguyễn , 2012
"... An oblivious subspace embedding (OSE) given some parameters ε, d is a distribution D over matrices Π ∈ R m×n such that for any linear subspace W ⊆ R n with dim(W) = d it holds that PΠ∼D(∀x ∈ W ‖Πx‖2 ∈ (1 ± ε)‖x‖2)> 2/3. We show an OSE exists with m = O(d 2 /ε 2) and where every Π in the support ..."
Abstract - Cited by 32 (7 self) - Add to MetaCart
An oblivious subspace embedding (OSE), given some parameters ε, d, is a distribution D over matrices Π ∈ ℝ^{m×n} such that for any linear subspace W ⊆ ℝ^n with dim(W) = d it holds that Pr_{Π∼D}(∀x ∈ W: ‖Πx‖₂ ∈ (1 ± ε)‖x‖₂) > 2/3. We show an OSE exists with m = O(d²/ε²) and where every Π in the support of D has exactly s = 1 non-zero entries per column. This improves the previously best known bound in [Clarkson-Woodruff, arXiv abs/1207.6365]. Our quadratic dependence on d is optimal for any OSE with s = 1 [Nelson-Nguyễn, 2012]. We also give two OSE's, which we call Oblivious Sparse Norm-Approximating Projections (OSNAPs), that both allow the parameter settings m = Õ(d/ε²) and s = polylog(d)/ε, or m = O(d^{1+γ}/ε²) and s = O(1/ε) for any constant γ > 0. This m is nearly optimal since m ≥ d is required simply to ensure no non-zero vector of W lands in the kernel of Π. These are the first constructions with m = o(d²) to have s = o(d). In fact, our OSNAPs are nothing more than the sparse Johnson-Lindenstrauss matrices of [Kane-Nelson, SODA 2012]. Our analyses all yield OSE's that are sampled using either O(1)-wise or O(log d)-wise independent hash functions.
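
As an illustration of the s = 1 construction above, the following numpy sketch applies a CountSketch-style Π (one random ±1 entry per column) to a matrix A without materializing Π, so the product costs O(nnz(A)) time; taking m on the order of d²/ε² matches the abstract's bound for s = 1, while the OSNAPs instead spread s > 1 scaled nonzeros per column. This is a hedged sketch, not the authors' reference implementation.

    import numpy as np

    def sparse_embed(A, m, seed=None):
        # Apply an implicit m x n embedding with one +/-1 entry per column (s = 1),
        # i.e. Pi @ A is a signed scatter-add of the rows of A.
        n, d = A.shape
        rng = np.random.default_rng(seed)
        rows = rng.integers(0, m, size=n)        # hash each coordinate to a target row
        signs = rng.choice([-1.0, 1.0], size=n)  # independent random signs
        SA = np.zeros((m, d))
        np.add.at(SA, rows, signs[:, None] * A)  # accumulate collisions
        return SA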

Improving CUR Matrix Decomposition and the Nyström Approximation via Adaptive Sampling

by Shusen Wang, Mehryar Mohri
"... The CUR matrix decomposition and the Nyström approximation are two important low-rank matrix approximation techniques. The Nyström method approximates a symmetric positive semidefinite matrix in terms of a small number of its columns, while CUR approximates an arbitrary data matrix by a small number ..."
Abstract - Cited by 17 (4 self) - Add to MetaCart
The CUR matrix decomposition and the Nyström approximation are two important low-rank matrix approximation techniques. The Nyström method approximates a symmetric positive semidefinite matrix in terms of a small number of its columns, while CUR approximates an arbitrary data matrix by a small number of its columns and rows. Thus, CUR decomposition can be regarded as an extension of the Nyström approximation. In this paper we establish a more general error bound for the adaptive column/row sampling algorithm, based on which we propose more accurate CUR and Nyström algorithms with expected relative-error bounds. The proposed CUR and Nyström algorithms also have low time complexity and can avoid maintaining the whole data matrix in RAM. In addition, we give theoretical analysis for the lower error bounds of the standard Nyström method and the ensemble Nyström method. The main theoretical results established in this paper are novel, and our analysis makes no special assumption on the data matrices.
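
For reference, the two approximations discussed above take the following minimal forms in numpy; the index sets are assumed to come from some external sampling scheme (uniform, leverage-score based, or the adaptive sampling analyzed in the paper), which this sketch does not implement.

    import numpy as np

    def nystrom_approx(K, idx):
        # Standard Nystrom: K ~= C W^+ C^T, with C the sampled columns of the
        # symmetric PSD matrix K and W their intersection block.
        C = K[:, idx]
        W = K[np.ix_(idx, idx)]
        return C @ np.linalg.pinv(W) @ C.T

    def cur_approx(A, col_idx, row_idx):
        # CUR: A ~= C U R for an arbitrary matrix A, with sampled columns C,
        # sampled rows R, and intersection matrix U = C^+ A R^+.
        C = A[:, col_idx]
        R = A[row_idx, :]
        return C @ (np.linalg.pinv(C) @ A @ np.linalg.pinv(R)) @ R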

Sharp analysis of low-rank kernel matrix approximations

by Francis Bach - JMLR: WORKSHOP AND CONFERENCE PROCEEDINGS VOL 30 (2013) 1–25 , 2013
"... We consider supervised learning problems within the positive-definite kernel framework, such as kernel ridge regression, kernel logistic regression or the support vector machine. With kernels leading to infinite-dimensional feature spaces, a common practical limiting difficulty is the necessity of c ..."
Abstract - Cited by 13 (1 self) - Add to MetaCart
We consider supervised learning problems within the positive-definite kernel framework, such as kernel ridge regression, kernel logistic regression or the support vector machine. With kernels leading to infinite-dimensional feature spaces, a common practical limiting difficulty is the necessity of computing the kernel matrix, which most frequently leads to algorithms with running time at least quadratic in the number of observations n, i.e., O(n²). Low-rank approximations of the kernel matrix are often considered as they allow the reduction of running time complexities to O(p²n), where p is the rank of the approximation. The practicality of such methods thus depends on the required rank p. In this paper, we show that in the context of kernel ridge regression, for approximations based on a random subset of columns of the original kernel matrix, the rank p may be chosen to be linear in the degrees of freedom associated with the problem, a quantity which is classically used in the statistical analysis of such methods, and is often seen as the implicit number of parameters of non-parametric estimators. This result enables simple algorithms that have sub-quadratic running time complexity, but provably exhibit the same predictive performance as existing algorithms, for any given problem instance, and not only for worst-case situations.
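
A minimal numpy sketch of column-sampled (Nyström-type) kernel ridge regression is shown below, assuming a uniformly sampled index set idx of size p and a regularization parameter lam; choosing p proportional to the degrees of freedom, which is the contribution of the paper, is not implemented here.

    import numpy as np

    def nystrom_krr_fit(K, y, idx, lam):
        # Build p-dimensional Nystrom features from the sampled columns and solve a
        # p x p ridge system, giving O(p^2 n) work instead of operating on the full K.
        n, p = K.shape[0], len(idx)
        C = K[:, idx]                            # n x p sampled columns
        W = K[np.ix_(idx, idx)]                  # p x p intersection block
        evals, evecs = np.linalg.eigh(W)         # small eigendecomposition of W
        evals = np.maximum(evals, 1e-12)
        W_isqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T
        Phi = C @ W_isqrt                        # n x p approximate feature map
        w = np.linalg.solve(Phi.T @ Phi + n * lam * np.eye(p), Phi.T @ y)
        return Phi @ w                           # fitted values on the training points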

Tail bounds for all eigenvalues of a sum of random matrices

by Alex Gittens, Joel A. Tropp , 2011
"... This work introduces the minimax Laplace transform method, a modification of the cumulant-based matrix Laplace transform method developed in [Tro11c] that yields both upper and lower bounds on each eigenvalue of a sum of random self-adjoint matrices. This machinery is used to derive eigenvalue ana ..."
Abstract - Cited by 12 (2 self) - Add to MetaCart
This work introduces the minimax Laplace transform method, a modification of the cumulant-based matrix Laplace transform method developed in [Tro11c] that yields both upper and lower bounds on each eigenvalue of a sum of random self-adjoint matrices. This machinery is used to derive eigenvalue analogs of the classical Chernoff, Bennett, and Bernstein bounds. Two examples demonstrate the efficacy of the minimax Laplace transform. The first concerns the effects of column sparsification on the spectrum of a matrix with orthonormal rows. Here, the behavior of the singular values can be described in terms of coherence-like quantities. The second example addresses the question of relative accuracy in the estimation of eigenvalues of the covariance matrix of a random process. Standard results on the convergence of sample covariance matrices provide bounds on the number of samples needed to obtain relative accuracy in the spectral norm, but these results only guarantee relative accuracy in the estimate of the maximum eigenvalue. The minimax Laplace transform argument establishes that if the lowest eigenvalues decay sufficiently fast, Ω(ε⁻² κ_ℓ² ℓ log p) samples, where κ_ℓ = λ₁(C)/λ_ℓ(C), are sufficient to ensure that the dominant ℓ eigenvalues of the covariance matrix of an N(0, C) random vector are estimated to within a factor of 1 ± ε with high probability.
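
The covariance statement can be checked numerically. The snippet below draws on the order of ε⁻²κ_ℓ²ℓ log p samples from N(0, C) for a covariance with a fast-decaying spectrum and reports the relative errors of the top ℓ sample eigenvalues; the spectrum and the constant in the sample size are arbitrary illustrative choices, not values from the paper.

    import numpy as np

    rng = np.random.default_rng(0)
    p, ell, eps = 50, 5, 0.3
    C = np.diag(1.0 / np.arange(1, p + 1) ** 2)       # rapidly decaying eigenvalues
    kappa = C[0, 0] / C[ell - 1, ell - 1]             # kappa_ell = lambda_1(C) / lambda_ell(C)
    n = int(eps ** -2 * kappa ** 2 * ell * np.log(p)) # sample size suggested by the bound
    X = rng.multivariate_normal(np.zeros(p), C, size=n)
    sample_C = X.T @ X / n
    top_true = np.linalg.eigvalsh(C)[::-1][:ell]
    top_est = np.linalg.eigvalsh(sample_C)[::-1][:ell]
    print(np.abs(top_est - top_true) / top_true)      # relative errors of the dominant eigenvalues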

Sketching as a tool for numerical linear algebra

by David P. Woodruff - Foundations and Trends in Theoretical Computer Science
"... ar ..."
Abstract - Cited by 8 (1 self) - Add to MetaCart
Abstract not found

Citation Context

... While we do discuss this kind of sampling to some extent, our main focus will be on sketching. The reader is encouraged to look at the survey by Mahoney for more details on sampling-based approaches [85]. See also [78] and [31] for state of the art subspace embeddings based on this approach. Returning to Definition 2, the first usage of this in the numerical linear algebra community, to the best of o...

Dimensionality reduction for k-means clustering and low rank approximation

by Michael B. Cohen, Sam Elder, Cameron Musco, Christopher Musco, Madalina Persu , 2014
"... ar ..."
Abstract - Cited by 6 (2 self) - Add to MetaCart
Abstract not found

Sketched SVD: Recovering spectral features from compressive measurements. arXiv e-prints

by Anna C. Gilbert, Jae Young Park, Michael B. Wakin , 2012
"... ar ..."
Abstract - Cited by 6 (3 self) - Add to MetaCart
Abstract not found

Citation Context

...teration. 3.4 Randomized Algorithms for Linear Algebra In a similar vein, there have been a large number of results on what we will refer to as randomized algorithms for linear algebra. The monograph [25] covers a number of these methods and references. There are several lines of work that are closely related to our results. The first involves the spectral analysis of random matrices and the applicati...

Uniform sampling for matrix approximation

by Michael B. Cohen, Yin Tat Lee, Cameron Musco, Christopher Musco, Richard Peng, Aaron Sidford - In Proceedings of the 6th Annual Conference on Innovations in Theoretical Computer Science (ITCS) , 2015
"... ar ..."
Abstract - Cited by 5 (4 self) - Add to MetaCart
Abstract not found

Efficient Algorithms and Error Analysis for the Modified Nyström Method

by Shusen Wang, Zhihua Zhang
"... Many kernel methods suffer from high time and space complexities and are thus prohibitive in big-data applications. To tackle the computation-al challenge, the Nyström method has been ex-tensively used to reduce time and space complex-ities by sacrificing some accuracy. The Nyström method speedups ..."
Abstract - Cited by 5 (2 self) - Add to MetaCart
Many kernel methods suffer from high time and space complexities and are thus prohibitive in big-data applications. To tackle the computational challenge, the Nyström method has been extensively used to reduce time and space complexities by sacrificing some accuracy. The Nyström method speeds up computation by constructing an approximation of the kernel matrix using only a few columns of the matrix. Recently, a variant of the Nyström method called the modified Nyström method has demonstrated significant improvement over the standard Nyström method in approximation accuracy, both theoretically and empirically. In this paper, we propose two algorithms that make the modified Nyström method practical. First, we devise a simple column selection algorithm with a provable error bound. Our algorithm is more efficient and easier to implement than, and nearly as accurate as, the state-of-the-art algorithm. Second, with the selected columns at hand, we propose an algorithm that computes the approximation in lower time complexity than the approach in the previous work. Furthermore, we prove that the modified Nyström method is exact under certain conditions, and we establish a lower error bound for the modified Nyström method.
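
As a point of reference, the modified Nyström method discussed above replaces the standard intersection matrix W⁺ with U = C⁺K(C⁺)ᵀ, the choice that minimizes ‖K − CUCᵀ‖_F for the given columns C. The numpy sketch below shows only this form, computed with a direct pseudoinverse rather than the faster algorithms proposed in the paper; the column indices idx are assumed to come from an external selection procedure such as the one the paper devises.

    import numpy as np

    def modified_nystrom(K, idx):
        # Modified Nystrom: K ~= C U C^T with U = C^+ K (C^+)^T, in contrast to the
        # standard choice U = W^+ built from the intersection block alone.
        C = K[:, idx]
        Cp = np.linalg.pinv(C)
        return C @ (Cp @ K @ Cp.T) @ C.T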

Citation Context

...r et al. (2005); Kumar et al. (2012); Jin et al. (2012), etc. Very recently, Gittens and Mahoney (2013) established the first relative-error bound, which is more interesting than additive-error bounds (Mahoney, 2011). However, the approximation quality cannot be arbitrarily improved by devising a very good sampling technique. As shown t...
