Results 1 - 10 of 132
Exact Matrix Completion via Convex Optimization
, 2008
"... We consider a problem of considerable practical interest: the recovery of a data matrix from a sampling of its entries. Suppose that we observe m entries selected uniformly at random from a matrix M. Can we complete the matrix and recover the entries that we have not seen? We show that one can perfe ..."
Cited by 873 (26 self)
We consider a problem of considerable practical interest: the recovery of a data matrix from a sampling of its entries. Suppose that we observe m entries selected uniformly at random from a matrix M. Can we complete the matrix and recover the entries that we have not seen? We show that one can perfectly recover most low-rank matrices from what appears to be an incomplete set of entries. We prove that if the number m of sampled entries obeys m ≥ C n^{1.2} r log n for some positive numerical constant C, then with very high probability, most n × n matrices of rank r can be perfectly recovered by solving a simple convex optimization program. This program finds the matrix with minimum nuclear norm that fits the data. The condition above assumes that the rank is not too large. However, if one replaces the 1.2 exponent with 1.25, then the result holds for all values of the rank. Similar results hold for arbitrary rectangular matrices as well. Our results are connected with the recent literature on compressed sensing, and show that objects other than signals and images can be perfectly reconstructed from very limited information.
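The convex program the abstract refers to (minimize the nuclear norm subject to agreement with the observed entries) is easy to state in an off-the-shelf modeling language. Below is a minimal, illustrative sketch using CVXPY; the matrix size, sampling rate, and solver choice are arbitrary assumptions for the example, and this is not the authors' code.

```python
# Minimal sketch of nuclear-norm matrix completion (illustrative only).
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
n, r = 50, 2
M = rng.standard_normal((n, r)) @ rng.standard_normal((r, n))   # rank-r ground truth

mask = rng.random((n, n)) < 0.4                                  # observe ~40% of entries at random
rows, cols = np.nonzero(mask)

X = cp.Variable((n, n))
problem = cp.Problem(cp.Minimize(cp.normNuc(X)),                 # minimize the nuclear norm...
                     [X[rows, cols] == M[rows, cols]])           # ...subject to the observed entries
problem.solve()

print("relative error:", np.linalg.norm(X.value - M) / np.linalg.norm(M))
```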
FINDING STRUCTURE WITH RANDOMNESS: PROBABILISTIC ALGORITHMS FOR CONSTRUCTING APPROXIMATE MATRIX DECOMPOSITIONS
"... Low-rank matrix approximations, such as the truncated singular value decomposition and the rank-revealing QR decomposition, play a central role in data analysis and scientific computing. This work surveys and extends recent research which demonstrates that randomization offers a powerful tool for ..."
Cited by 253 (6 self)
Low-rank matrix approximations, such as the truncated singular value decomposition and the rank-revealing QR decomposition, play a central role in data analysis and scientific computing. This work surveys and extends recent research which demonstrates that randomization offers a powerful tool for performing low-rank matrix approximation. These techniques exploit modern computational architectures more fully than classical methods and open the possibility of dealing with truly massive data sets. This paper presents a modular framework for constructing randomized algorithms that compute partial matrix decompositions. These methods use random sampling to identify a subspace that captures most of the action of a matrix. The input matrix is then compressed—either explicitly or implicitly—to this subspace, and the reduced matrix is manipulated deterministically to obtain the desired low-rank factorization. In many cases, this approach beats its classical competitors in terms of accuracy, speed, and robustness. These claims are supported by extensive numerical experiments and a detailed error analysis. The specific benefits of randomized techniques depend on the computational environment. Consider the model problem of finding the k dominant components of the singular value decomposition of an m × n matrix.
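The basic pattern described here (random test matrix, orthonormal basis for the sampled range, deterministic factorization of the compressed matrix) is short enough to write down. The sketch below is one plain NumPy rendering of that pattern, with an arbitrary oversampling parameter and no power iterations or adaptive rank estimation; it is meant as an illustration of the framework rather than the paper's recommended implementation.

```python
# Sketch of a basic randomized range-finder plus SVD (illustrative only).
import numpy as np

def randomized_svd(A, k, oversample=10, rng=None):
    rng = np.random.default_rng(rng)
    m, n = A.shape
    Omega = rng.standard_normal((n, k + oversample))  # random test matrix
    Q, _ = np.linalg.qr(A @ Omega)                    # orthonormal basis for the sampled range
    B = Q.T @ A                                       # compress A to that subspace
    U_tilde, s, Vt = np.linalg.svd(B, full_matrices=False)
    return (Q @ U_tilde)[:, :k], s[:k], Vt[:k]        # approximate rank-k SVD factors
```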
Improved approximation algorithms for large matrices via random projections.
- In Proceedings of the 47th Annual IEEE Symposium on Foundations of Computer Science
, 2006
"... ..."
(Show Context)
Graph sparsification by effective resistances
- SIAM J. Comput
"... We present a nearly-linear time algorithm that produces high-quality sparsifiers of weighted graphs. Given as input a weighted graph G = (V, E, w) and a parameter ǫ> 0, we produce a weighted subgraph H = (V, ˜ E, ˜w) of G such that | ˜ E | = O(n log n/ǫ 2) and for all vectors x ∈ R V (1 − ǫ) ∑ ..."
Cited by 143 (9 self)
We present a nearly-linear time algorithm that produces high-quality sparsifiers of weighted graphs. Given as input a weighted graph G = (V, E, w) and a parameter ε > 0, we produce a weighted subgraph H = (V, Ẽ, w̃) of G such that |Ẽ| = O(n log n / ε^2) and, for all vectors x ∈ R^V,

(1 − ε) ∑_{uv ∈ E} (x(u) − x(v))^2 w_{uv} ≤ ∑_{uv ∈ Ẽ} (x(u) − x(v))^2 w̃_{uv} ≤ (1 + ε) ∑_{uv ∈ E} (x(u) − x(v))^2 w_{uv}.   (1)

This improves upon the sparsifiers constructed by Spielman and Teng, which had O(n log^c n) edges for some large constant c, and upon those of Benczúr and Karger, which only satisfied (1) for x ∈ {0, 1}^V. We conjecture the existence of sparsifiers with O(n) edges, noting that these would generalize the notion of expander graphs, which are constant-degree sparsifiers for the complete graph. A key ingredient in our algorithm is a subroutine of independent interest: a nearly-linear time algorithm that builds a data structure from which we can query the approximate effective resistance between any two vertices in a graph in O(log n) time.
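As a rough illustration of the sampling scheme described in the abstract (keep each edge with probability proportional to its weight times its effective resistance, then reweight), the sketch below computes effective resistances from a dense pseudoinverse of the Laplacian rather than the paper's nearly-linear-time data structure, so it is only practical for small graphs; the number of samples and its constant are arbitrary assumptions.

```python
# Illustrative sparsification by effective resistances (dense L^+, small graphs only).
import numpy as np

def sparsify(edges, weights, n, eps, rng=None):
    """edges: list of (u, v) pairs; weights: positive edge weights; n: number of vertices."""
    rng = np.random.default_rng(rng)
    weights = np.asarray(weights, dtype=float)
    L = np.zeros((n, n))                              # weighted graph Laplacian
    for (u, v), w in zip(edges, weights):
        L[u, u] += w; L[v, v] += w
        L[u, v] -= w; L[v, u] -= w
    Lp = np.linalg.pinv(L)
    # Effective resistance of edge (u, v) is (e_u - e_v)^T L^+ (e_u - e_v).
    reff = np.array([Lp[u, u] + Lp[v, v] - 2 * Lp[u, v] for u, v in edges])
    p = weights * reff
    p /= p.sum()
    q = int(np.ceil(8 * n * np.log(n) / eps**2))      # number of samples; constant is illustrative
    counts = rng.multinomial(q, p)
    keep = counts > 0
    new_weights = counts[keep] * weights[keep] / (q * p[keep])   # reweight the sampled edges
    return [e for e, k in zip(edges, keep) if k], new_weights
```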
On the conditioning of random subdictionaries
- Appl. Comput. Harmonic Anal
"... Abstract. An important problem in the theory of sparse approximation is to identify wellconditioned subsets of vectors from a general dictionary. In most cases, current results do not apply unless the number of vectors is smaller than the square root of the ambient dimension, so these bounds are too ..."
Cited by 96 (8 self)
An important problem in the theory of sparse approximation is to identify well-conditioned subsets of vectors from a general dictionary. In most cases, current results do not apply unless the number of vectors is smaller than the square root of the ambient dimension, so these bounds are too weak for many applications. This paper shatters the square-root bottleneck by focusing on random subdictionaries instead of arbitrary subdictionaries. It provides explicit bounds on the extreme singular values of random subdictionaries that hold with overwhelming probability. The results are phrased in terms of the coherence and spectral norm of the dictionary, which capture information about its global geometry. The proofs rely on standard tools from the area of Banach space probability. As an application, the paper shows that the conditioning of a subdictionary is the major obstacle to the uniqueness of sparse representations and the success of ℓ1 minimization techniques for signal recovery. Indeed, if a fixed subdictionary is well conditioned and its cardinality is slightly smaller than the ambient dimension, then a random signal formed from this subdictionary almost surely has no other representation that is equally sparse. Moreover, with overwhelming probability, the maximally sparse representation can be identified via ℓ1 minimization. Note that the results in this paper are not directly comparable with recent work on subdictionaries of random dictionaries.
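A quick numerical experiment (my own illustration, not taken from the paper) makes the setting concrete: draw a unit-norm random dictionary, pick a random subdictionary whose size far exceeds the square root of the ambient dimension, and inspect its extreme singular values alongside the dictionary's coherence. The dimensions below are arbitrary.

```python
# Illustrative check of the extreme singular values of a random subdictionary.
import numpy as np

rng = np.random.default_rng(0)
d, N, k = 256, 1024, 128                    # ambient dim, dictionary size, subdictionary size
Phi = rng.standard_normal((d, N))
Phi /= np.linalg.norm(Phi, axis=0)          # unit-norm columns

coherence = np.abs(Phi.T @ Phi - np.eye(N)).max()   # largest off-diagonal inner product
T = rng.choice(N, size=k, replace=False)            # random subset of columns
s = np.linalg.svd(Phi[:, T], compute_uv=False)
print(f"coherence = {coherence:.3f}, sigma_max = {s[0]:.3f}, sigma_min = {s[-1]:.3f}")
```

Here k far exceeds √d, the regime where coherence-only arguments break down but where the paper's probabilistic bounds still control the extreme singular values of a typical subdictionary.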
Optimal rates of convergence for covariance matrix estimation
- Ann. Statist
, 2010
"... Covariance matrix plays a central role in multivariate statistical analysis. Significant advances have been made recently on developing both theory and methodology for estimating large covariance matrices. However, a minimax theory has yet been developed. In this paper we establish the optimal rates ..."
Cited by 88 (19 self)
The covariance matrix plays a central role in multivariate statistical analysis. Significant advances have been made recently on developing both theory and methodology for estimating large covariance matrices. However, a minimax theory has yet to be developed. In this paper we establish the optimal rates of convergence for estimating the covariance matrix under both the operator norm and the Frobenius norm. It is shown that optimal procedures under the two norms are different, and consequently matrix estimation under the operator norm is fundamentally different from vector estimation. The minimax upper bound is obtained by constructing a special class of tapering estimators and by studying their risk properties. A key step in obtaining the optimal rate of convergence is the derivation of the minimax lower bound. The technical analysis requires new ideas that are quite different from those used in the more conventional function/sequence estimation problems.
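For concreteness, the sketch below implements a tapering estimator of the kind analyzed in the paper: the sample covariance is multiplied entrywise by weights that equal 1 within bandwidth k/2 of the diagonal, decay linearly, and vanish beyond bandwidth k. The bandwidth k is left to the caller; the paper's point is that it must be tuned to the parameter class to attain the optimal rate.

```python
# Sketch of a tapering covariance estimator (bandwidth k is a free parameter here).
import numpy as np

def tapering_estimator(X, k):
    """X: n x p data matrix (rows are observations); k: even tapering bandwidth."""
    S = np.cov(X, rowvar=False)                                   # p x p sample covariance
    p = S.shape[0]
    dist = np.abs(np.subtract.outer(np.arange(p), np.arange(p)))  # |i - j|
    kh = k / 2.0
    # Weight 1 when |i-j| <= k/2, linear decay to 0 at |i-j| >= k.
    W = np.clip(2.0 - dist / kh, 0.0, 1.0)
    return S * W
```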
RELATIVE-ERROR CUR MATRIX DECOMPOSITIONS
- SIAM J. MATRIX ANAL. APPL
, 2008
"... Many data analysis applications deal with large matrices and involve approximating the matrix using a small number of “components.” Typically, these components are linear combinations of the rows and columns of the matrix, and are thus difficult to interpret in terms of the original features of the ..."
Cited by 86 (17 self)
Many data analysis applications deal with large matrices and involve approximating the matrix using a small number of “components.” Typically, these components are linear combinations of the rows and columns of the matrix, and are thus difficult to interpret in terms of the original features of the input data. In this paper, we propose and study matrix approximations that are explicitly expressed in terms of a small number of columns and/or rows of the data matrix, and thereby more amenable to interpretation in terms of the original data. Our main algorithmic results are two randomized algorithms which take as input an m × n matrix A and a rank parameter k. In our first algorithm, C is chosen, and we let A′ = CC⁺A, where C⁺ is the Moore–Penrose generalized inverse of C. In our second algorithm C, U, R are chosen, and we let A′ = CUR. (C and R are matrices that consist of actual columns and rows, respectively, of A, and U is a generalized inverse of their intersection.) For each algorithm, we show that with probability at least 1 − δ, ‖A − A′‖_F ≤ (1 + ε)‖A − A_k‖_F, where A_k is the “best” rank-k approximation provided by truncating the SVD of A, and where ‖X‖_F is the Frobenius norm of the matrix X. The number of columns of C and rows of R is a low-degree polynomial in k, 1/ε, and log(1/δ). Both the Numerical Linear Algebra community and the Theoretical Computer Science community have studied variants
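The two constructions in the abstract are easy to mock up. The sketch below builds A′ = CC⁺A and A′ = CUR from uniformly sampled columns and rows; the paper's relative-error guarantee relies on its particular sampling probabilities, so uniform sampling here is a simplifying assumption for illustration only.

```python
# Illustrative column/row-based approximations (uniform sampling, not the paper's
# sampling scheme that yields the (1 + eps) relative-error bound).
import numpy as np

def cx_approx(A, c, rng=None):
    rng = np.random.default_rng(rng)
    cols = rng.choice(A.shape[1], size=c, replace=False)
    C = A[:, cols]                                 # actual columns of A
    return C @ np.linalg.pinv(C) @ A               # A' = C C^+ A

def cur_approx(A, c, r, rng=None):
    rng = np.random.default_rng(rng)
    cols = rng.choice(A.shape[1], size=c, replace=False)
    rows = rng.choice(A.shape[0], size=r, replace=False)
    C, R = A[:, cols], A[rows, :]
    U = np.linalg.pinv(A[np.ix_(rows, cols)])      # generalized inverse of the intersection
    return C @ U @ R                               # A' = C U R
```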
Multi-View Clustering via Canonical Correlation Analysis
"... Clustering data in high-dimensions is believed to be a hard problem in general. A number of efficient clustering algorithms developed in recent years address this problem by projecting the data into a lower-dimensional subspace, e.g. via Principal Components Analysis (PCA) or random projections, bef ..."
Cited by 76 (6 self)
Clustering data in high dimensions is believed to be a hard problem in general. A number of efficient clustering algorithms developed in recent years address this problem by projecting the data into a lower-dimensional subspace, e.g. via Principal Components Analysis (PCA) or random projections, before clustering. Such techniques typically impose stringent requirements on the separation between the cluster means in order for the algorithm to be successful. Here, we show how using multiple views of the data can relax these stringent requirements. We use Canonical Correlation Analysis (CCA) to project the data in each view to a lower-dimensional subspace. Under the assumption that, conditioned on the cluster label, the views are uncorrelated, we show that the separation conditions required for the algorithm to be successful are rather mild (significantly weaker than those of prior results in the literature). We provide results for mixture of ...

The multi-view approach to learning is one in which we have ‘views’ of the data (sometimes in a rather abstract sense) and, if we understand the underlying relationship between these views, the hope is that this relationship can be used to alleviate the difficulty of a learning problem of interest [BM98, KF07, AZ07]. In this work, we explore how having ‘two views’ of the data makes
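The pipeline sketched in the abstract (project each view with CCA, then run a standard clustering algorithm on the projected data) can be mocked up with scikit-learn; the function below is a simplified illustration under assumed defaults, not the authors' algorithm or its analysis. The choice of projecting to one fewer dimension than the number of clusters is my own default, not prescribed by the abstract.

```python
# Illustrative multi-view clustering: CCA projection followed by k-means.
import numpy as np
from sklearn.cross_decomposition import CCA
from sklearn.cluster import KMeans

def multiview_cluster(X1, X2, n_clusters, n_components=None):
    """X1, X2: two views of the same samples (n_samples x d1 and n_samples x d2)."""
    n_components = n_components or max(1, n_clusters - 1)
    cca = CCA(n_components=n_components)
    Z1, Z2 = cca.fit_transform(X1, X2)              # CCA projections of each view
    # Cluster in the projected space of the first view.
    return KMeans(n_clusters=n_clusters, n_init=10).fit_predict(Z1)
```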