Results 1 - 10 of 86
Finding Structure with Randomness: Probabilistic Algorithms for Constructing Approximate Matrix Decompositions
"... Low-rank matrix approximations, such as the truncated singular value decomposition and the rank-revealing QR decomposition, play a central role in data analysis and scientific computing. This work surveys and extends recent research which demonstrates that randomization offers a powerful tool for ..."
Abstract
-
Cited by 253 (6 self)
- Add to MetaCart
(Show Context)
Low-rank matrix approximations, such as the truncated singular value decomposition and the rank-revealing QR decomposition, play a central role in data analysis and scientific computing. This work surveys and extends recent research which demonstrates that randomization offers a powerful tool for performing low-rank matrix approximation. These techniques exploit modern computational architectures more fully than classical methods and open the possibility of dealing with truly massive data sets. This paper presents a modular framework for constructing randomized algorithms that compute partial matrix decompositions. These methods use random sampling to identify a subspace that captures most of the action of a matrix. The input matrix is then compressed—either explicitly or implicitly—to this subspace, and the reduced matrix is manipulated deterministically to obtain the desired low-rank factorization. In many cases, this approach beats its classical competitors in terms of accuracy, speed, and robustness. These claims are supported by extensive numerical experiments and a detailed error analysis. The specific benefits of randomized techniques depend on the computational environment. Consider the model problem of finding the k dominant components of the singular value decomposition
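To make the pipeline sketched in this abstract concrete, here is a minimal randomized range finder followed by a small deterministic SVD, assuming NumPy; the function name, the oversampling parameter p, and the Gaussian test matrix are illustrative choices, not details taken from the paper.

import numpy as np

def randomized_svd(A, k, p=10, seed=0):
    """Minimal sketch: random sampling finds a subspace capturing most of
    A's action, then a small deterministic SVD is run on the compressed matrix."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    # Stage A: sample the range of A with a Gaussian test matrix.
    Omega = rng.standard_normal((n, k + p))
    Q, _ = np.linalg.qr(A @ Omega)            # orthonormal basis for the sampled range
    # Stage B: compress A to the subspace and factor the small matrix.
    B = Q.T @ A                               # (k+p) x n
    Ub, s, Vt = np.linalg.svd(B, full_matrices=False)
    U = Q @ Ub
    return U[:, :k], s[:k], Vt[:k, :]

# Example: rank-5 approximation of a 1000 x 300 matrix.
A = np.random.default_rng(1).standard_normal((1000, 300))
U, s, Vt = randomized_svd(A, k=5)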
An Improved Approximation Algorithm for the Column Subset Selection Problem
"... We consider the problem of selecting the “best ” subset of exactly k columns from an m × n matrix A. In particular, we present and analyze a novel two-stage algorithm that runs in O(min{mn 2, m 2 n}) time and returns as output an m × k matrix C consisting of exactly k columns of A. In the first stag ..."
Abstract
-
Cited by 74 (13 self)
- Add to MetaCart
(Show Context)
We consider the problem of selecting the “best” subset of exactly k columns from an m × n matrix A. In particular, we present and analyze a novel two-stage algorithm that runs in O(min{mn^2, m^2 n}) time and returns as output an m × k matrix C consisting of exactly k columns of A. In the first stage (the randomized stage), the algorithm randomly selects O(k log k) columns according to a judiciously chosen probability distribution that depends on information in the top-k right singular subspace of A. In the second stage (the deterministic stage), the algorithm applies a deterministic column-selection procedure to select and return exactly k columns from the set of columns selected in the first stage. Let C be the m × k matrix containing those k columns, let P_C denote the projection matrix onto the span of those columns, and let A_k denote the “best” rank-k approximation to the matrix A as computed with the singular value decomposition. Then, we prove that ‖A − P_C A‖_2 ≤ O(k^{3/4} log^{1/2}(k) (n − k)^{1/4}) ‖A − A_k‖_2.
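A hedged sketch of a two-stage column selection of this flavor, assuming NumPy and SciPy; the sampling constant, the uniform mixing term, and the use of pivoted QR as the deterministic stage are illustrative choices rather than the paper's exact procedure.

import numpy as np
from scipy.linalg import qr

def two_stage_column_select(A, k, c=None, seed=0):
    """Stage 1: randomized leverage-score sampling of O(k log k) columns.
    Stage 2: deterministic pivoted QR picks exactly k of the sampled columns."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    c = c or int(np.ceil(4 * k * np.log(k + 1)))       # O(k log k) columns (constant is a placeholder)
    # Stage 1: probabilities from the top-k right singular subspace,
    # mixed with a uniform term so every column has nonzero probability.
    _, _, Vt = np.linalg.svd(A, full_matrices=False)
    lev = np.sum(Vt[:k, :] ** 2, axis=0)               # leverage scores, sum to k
    probs = 0.9 * lev / k + 0.1 / n
    sampled = rng.choice(n, size=min(c, n), replace=False, p=probs)
    # Stage 2: pivoted QR on the sampled columns selects exactly k of them.
    _, _, piv = qr(A[:, sampled], pivoting=True, mode='economic')
    return sampled[piv[:k]]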
Fast Approximation of Matrix Coherence and Statistical Leverage
"... The statistical leverage scores of a matrix A are the squared row-norms of the matrix containing its (top) left singular vectors and the coherence is the largest leverage score. These quantities are of interest in recently-popular problems such as matrix completion and Nyström-based low-rank matrix ..."
Abstract
-
Cited by 53 (11 self)
- Add to MetaCart
(Show Context)
The statistical leverage scores of a matrix A are the squared row-norms of the matrix containing its (top) left singular vectors and the coherence is the largest leverage score. These quantities are of interest in recently-popular problems such as matrix completion and Nyström-based low-rank matrix approximation as well as in large-scale statistical data analysis applications more generally; moreover, they are of interest since they define the key structural nonuniformity that must be dealt with in developing fast randomized matrix algorithms. Our main result is a randomized algorithm that takes as input an arbitrary n × d matrix A, with n ≫ d, and that returns as output relative-error approximations to all n of the statistical leverage scores. The proposed algorithm runs (under assumptions on the precise values of n and d) in O(nd log n) time, as opposed to the O(nd^2) time required by the naïve algorithm that involves computing an orthogonal basis for the range of A. Our analysis may be viewed in terms of computing a relative-error approximation to an underconstrained least-squares approximation problem, or, relatedly, it may be viewed as an application of Johnson–Lindenstrauss-type ideas. Several practically-important extensions of our basic result are also described, including the approximation of so-called cross-leverage scores, the extension of these ideas to matrices with n ≈ d, and the extension to streaming environments.
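The following sketch illustrates the general idea of approximating leverage scores with a sketched QR plus a Johnson-Lindenstrauss projection, assuming NumPy; it uses a dense Gaussian sketch where the paper relies on a structured transform to reach the stated running time, so the sketch sizes here are placeholders.

import numpy as np

def approx_leverage_scores(A, sketch_rows=None, jl_dim=64, seed=0):
    """Sketch A to get a small R factor, then estimate the row norms of
    A @ inv(R) with a JL projection instead of forming a full orthogonal basis."""
    rng = np.random.default_rng(seed)
    n, d = A.shape
    r = sketch_rows or 4 * d
    S = rng.standard_normal((r, n)) / np.sqrt(r)    # Gaussian sketching matrix (placeholder)
    _, R = np.linalg.qr(S @ A)                      # small r x d problem
    Pi = rng.standard_normal((d, jl_dim)) / np.sqrt(jl_dim)
    X = A @ np.linalg.solve(R, Pi)                  # n x jl_dim, approximates A R^{-1} Pi
    return np.sum(X ** 2, axis=1)                   # approximate leverage scores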
Faster least squares approximation
- Numerische Mathematik
"... Least squares approximation is a technique to find an approximate solution to a system of linear equations that has no exact solution. Methods dating back to Gauss and Legendre find a solution in O(nd 2) time, where n is the number of constraints and d is the number of variables. We present two rand ..."
Abstract
-
Cited by 53 (13 self)
- Add to MetaCart
(Show Context)
Least squares approximation is a technique to find an approximate solution to a system of linear equations that has no exact solution. Methods dating back to Gauss and Legendre find a solution in O(nd^2) time, where n is the number of constraints and d is the number of variables. We present two randomized algorithms that provide very accurate relative-error approximations to the solution of a least squares approximation problem more rapidly than existing exact algorithms. Both of our algorithms preprocess the data with a randomized Hadamard transform. One then uniformly randomly samples constraints and solves the smaller problem on those constraints, and the other performs a sparse random projection and solves the smaller problem on those projected coordinates. In both cases, the solution to the smaller problem provides a relative-error approximation to the exact solution and can be computed in o(nd^2) time.
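A minimal sketch-and-solve illustration of the approach described above, assuming NumPy; a Gaussian sketch stands in for the randomized Hadamard transform plus sampling or sparse projection, and the sketch size 4d is an arbitrary illustrative choice.

import numpy as np

def sketched_least_squares(A, b, sketch_rows=None, seed=0):
    """Mix and compress the constraints with a random sketch, then solve the
    smaller least-squares problem; its solution approximates the exact one."""
    rng = np.random.default_rng(seed)
    n, d = A.shape
    r = sketch_rows or 4 * d
    S = rng.standard_normal((r, n)) / np.sqrt(r)
    x_tilde, *_ = np.linalg.lstsq(S @ A, S @ b, rcond=None)
    return x_tilde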
Blendenpik: Supercharging LAPACK's Least-Squares Solver
"... Several innovative random-sampling and random-mixing techniques for solving problems in linear algebra have been proposed in the last decade, but they have not yet made a significant impact on numerical linear algebra. We show that by using an high quality implementation of one of these techniques w ..."
Abstract
-
Cited by 41 (4 self)
- Add to MetaCart
(Show Context)
Several innovative random-sampling and random-mixing techniques for solving problems in linear algebra have been proposed in the last decade, but they have not yet made a significant impact on numerical linear algebra. We show that by using a high-quality implementation of one of these techniques, we obtain a solver that performs extremely well by the traditional yardsticks of numerical linear algebra: it is significantly faster than high-performance implementations of existing state-of-the-art algorithms, and it is numerically backward stable. More specifically, we describe a least-squares solver for dense, highly overdetermined systems that achieves residuals similar to those of direct QR-factorization-based solvers (LAPACK), outperforms LAPACK by large factors, and scales significantly better than any QR-based solver.
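A rough illustration of the sketch-to-precondition idea behind this kind of solver, assuming NumPy and SciPy; the Gaussian sketch, the explicit inverse of R, and the sampling factor are simplifications for readability, not the implementation described in the paper.

import numpy as np
from scipy.sparse.linalg import lsqr

def blendenpik_style_solve(A, b, sample_rows=None, seed=0):
    """Build a right preconditioner R from a randomly sketched subset of the
    rows, then run LSQR on the original (well-conditioned) system."""
    rng = np.random.default_rng(seed)
    n, d = A.shape
    r = sample_rows or 4 * d
    S = rng.standard_normal((r, n)) / np.sqrt(r)     # stands in for FFT-based mixing + sampling
    _, R = np.linalg.qr(S @ A)                       # R acts as a right preconditioner
    # Solve min ||A R^{-1} y - b|| with LSQR, then recover x = R^{-1} y.
    # R^{-1} is formed explicitly only for brevity; a real solver applies it implicitly.
    y = lsqr(A @ np.linalg.inv(R), b)[0]
    return np.linalg.solve(R, y)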
Revisiting the Nyström method for improved large-scale machine learning
"... We reconsider randomized algorithms for the low-rank approximation of SPSD matrices such as Laplacian and kernel matrices that arise in data analysis and machine learning applications. Our main results consist of an empirical evaluation of the performance quality and running time of sampling and pro ..."
Abstract
-
Cited by 34 (5 self)
- Add to MetaCart
(Show Context)
We reconsider randomized algorithms for the low-rank approximation of SPSD matrices such as Laplacian and kernel matrices that arise in data analysis and machine learning applications. Our main results consist of an empirical evaluation of the performance quality and running time of sampling and projection methods on a diverse suite of SPSD matrices. Our results highlight complementary aspects of sampling versus projection methods, and they point to differences between uniform and nonuniform sampling methods based on leverage scores. We complement our empirical results with a suite of worst-case theoretical bounds for both random sampling and random projection methods. These bounds are qualitatively superior to existing bounds (e.g., improved additive-error bounds for spectral and Frobenius norm error and relative-error bounds for trace norm error).
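For reference, a minimal standard Nyström approximation of an SPSD matrix, assuming NumPy; uniform column sampling is shown here, whereas the paper compares uniform sampling, leverage-score sampling, and projection methods.

import numpy as np

def nystrom_approx(K, num_cols, seed=0):
    """Sample columns of an SPSD matrix K uniformly at random and approximate
    K ~= C @ pinv(W) @ C.T, where W is the sampled-by-sampled block."""
    rng = np.random.default_rng(seed)
    n = K.shape[0]
    idx = rng.choice(n, size=num_cols, replace=False)   # uniform column sampling
    C = K[:, idx]                                        # n x c
    W = K[np.ix_(idx, idx)]                              # c x c sampled block
    return C @ np.linalg.pinv(W) @ C.T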
Colibri: Fast Mining of Large Static and Dynamic Graphs
"... Low-rank approximations of the adjacency matrix of a graph are essential in finding patterns (such as communities) and detecting anomalies. Additionally, it is desirable to track the low-rank structure as the graph evolves over time, efficiently and within limited storage. Real graphs typically have ..."
Abstract
-
Cited by 30 (7 self)
- Add to MetaCart
Low-rank approximations of the adjacency matrix of a graph are essential in finding patterns (such as communities) and detecting anomalies. Additionally, it is desirable to track the low-rank structure as the graph evolves over time, efficiently and within limited storage. Real graphs typically have thousands or millions of nodes, but are usually very sparse. However, standard decompositions such as the SVD do not preserve sparsity. This has led to the development of methods such as CUR and CMD, which seek a non-orthogonal basis by sampling the columns and/or rows of the sparse matrix. However, these approaches typically produce overcomplete bases, which wastes both space and time. In this paper we propose the family of Colibri methods to deal with these challenges. Our version for static graphs, Colibri-S, iteratively finds a non-redundant basis, and we prove that it loses no accuracy compared to the best competitors (CUR and CMD), while achieving significant savings in space and time: on real data, Colibri-S requires much less space and is orders of magnitude faster (in proportion to the square of the number of non-redundant columns). Additionally, we propose an efficient update algorithm for dynamic, time-evolving graphs, Colibri-D. Our evaluation on a large, real network traffic dataset shows that Colibri-D is over 100 times faster than the best published competitor (CMD).
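An illustrative, much simplified take on the non-redundant-basis idea, assuming NumPy; the greedy Gram-Schmidt-style scan and the tolerance are placeholders and do not reproduce the actual Colibri-S updates.

import numpy as np

def nonredundant_columns(A, candidate_idx, tol=1e-10):
    """Scan candidate columns and keep only those with a significant component
    outside the span of the columns kept so far, avoiding an overcomplete basis."""
    kept, Q = [], None
    for j in candidate_idx:
        col = np.asarray(A[:, j], dtype=float)
        resid = col if Q is None else col - Q @ (Q.T @ col)
        if np.linalg.norm(resid) > tol * max(np.linalg.norm(col), 1.0):
            kept.append(j)
            q = resid / np.linalg.norm(resid)
            Q = q[:, None] if Q is None else np.column_stack([Q, q])
    return kept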
Sampling Methods for the Nyström Method
- Journal of Machine Learning Research
"... The Nyström method is an efficient technique to generate low-rank matrix approximations and is used in several large-scale learning applications. A key aspect of this method is the procedure according to which columns are sampled from the original matrix. In this work, we explore the efficacy of a v ..."
Abstract
-
Cited by 26 (2 self)
- Add to MetaCart
The Nyström method is an efficient technique to generate low-rank matrix approximations and is used in several large-scale learning applications. A key aspect of this method is the procedure according to which columns are sampled from the original matrix. In this work, we explore the efficacy of a variety of fixed and adaptive sampling schemes. We also propose a family of ensemble-based sampling algorithms for the Nyström method. We report results of extensive experiments that provide a detailed comparison of various fixed and adaptive sampling techniques, and demonstrate the performance improvement associated with the ensemble Nyström method when used in conjunction with either fixed or adaptive sampling schemes. Corroborating these empirical findings, we present a theoretical analysis of the Nyström method, providing novel error bounds guaranteeing a better convergence rate of the ensemble Nyström method in comparison to the standard Nyström method.
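A hedged sketch of the ensemble idea, assuming NumPy: average several independently sampled Nyström approximations with uniform weights; the paper also considers non-uniform expert weights and other fixed and adaptive sampling schemes.

import numpy as np

def ensemble_nystrom(K, num_cols, num_experts=5, seed=0):
    """Average several standard Nystrom approximations of an SPSD matrix K,
    each built from an independent uniform column sample."""
    rng = np.random.default_rng(seed)
    n = K.shape[0]
    approx = np.zeros_like(K, dtype=float)
    for _ in range(num_experts):
        idx = rng.choice(n, size=num_cols, replace=False)
        C, W = K[:, idx], K[np.ix_(idx, idx)]
        approx += C @ np.linalg.pinv(W) @ C.T
    return approx / num_experts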
Unsupervised Feature Selection for Principal Components Analysis [Extended Abstract]
"... Principal Components Analysis (PCA) is the predominant linear dimensionality reduction technique, and has been widely applied on datasets in all scientific domains. We consider, both theoretically and empirically, the topic of unsupervised feature selection for PCA, by leveraging algorithms for the ..."
Abstract
-
Cited by 26 (8 self)
- Add to MetaCart
(Show Context)
Principal Components Analysis (PCA) is the predominant linear dimensionality reduction technique, and has been widely applied to datasets in all scientific domains. We consider, both theoretically and empirically, the topic of unsupervised feature selection for PCA, by leveraging algorithms for the so-called Column Subset Selection Problem (CSSP). In words, the CSSP seeks the “best” subset of exactly k columns from an m × n data matrix A, and has been extensively studied in the Numerical Linear Algebra community. We present a novel two-stage algorithm for the CSSP. From a theoretical perspective, for small to moderate values of k, this algorithm significantly improves upon the best previously-existing results [24, 12] for the CSSP. From an empirical perspective, we evaluate this algorithm as an unsupervised feature selection strategy in three application domains of modern statistical data analysis: finance, document-term data, and genetics. We pay particular attention to how this algorithm may be used to select representative or landmark features from an object-feature matrix in an unsupervised manner. In all three application domains, we are able to identify k landmark features, i.e., columns of the data matrix, that capture nearly the same amount of information as does the subspace that is spanned by the top k “eigenfeatures.”
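A small illustration of CSSP-style unsupervised feature selection, assuming NumPy and SciPy; it uses only a deterministic pivoted-QR stage on the top-k right singular vectors, which is a simplification of the two-stage algorithm described above.

import numpy as np
from scipy.linalg import qr

def pca_landmark_features(X, k):
    """Pick k landmark columns (features) of a samples-by-features data matrix
    via pivoted QR applied to the top-k right singular vectors of the centered data."""
    Xc = X - X.mean(axis=0)                        # center features, as in PCA
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    _, _, piv = qr(Vt[:k, :], pivoting=True, mode='economic')
    return piv[:k]                                 # indices of k landmark features

# Example: pick 5 landmark features from a 200 x 50 data matrix.
X = np.random.default_rng(2).standard_normal((200, 50))
cols = pca_landmark_features(X, k=5)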