Sampling from large matrices: An approach through geometric functional analysis

by M. Rudelson, R. Vershynin
Venue: J. Assoc. Comput. Mach.

Results 1 - 10 of 132

Exact Matrix Completion via Convex Optimization

by Emmanuel J. Candès, Benjamin Recht, 2008
"... We consider a problem of considerable practical interest: the recovery of a data matrix from a sampling of its entries. Suppose that we observe m entries selected uniformly at random from a matrix M. Can we complete the matrix and recover the entries that we have not seen? We show that one can perfe ..."
Abstract - Cited by 873 (26 self) - Add to MetaCart
We consider a problem of considerable practical interest: the recovery of a data matrix from a sampling of its entries. Suppose that we observe m entries selected uniformly at random from a matrix M. Can we complete the matrix and recover the entries that we have not seen? We show that one can perfectly recover most low-rank matrices from what appears to be an incomplete set of entries. We prove that if the number m of sampled entries obeys m ≥ C n^{1.2} r log n for some positive numerical constant C, then with very high probability, most n × n matrices of rank r can be perfectly recovered by solving a simple convex optimization program. This program finds the matrix with minimum nuclear norm that fits the data. The condition above assumes that the rank is not too large. However, if one replaces the 1.2 exponent with 1.25, then the result holds for all values of the rank. Similar results hold for arbitrary rectangular matrices as well. Our results are connected with the recent literature on compressed sensing, and show that objects other than signals and images can be perfectly reconstructed from very limited information.
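
As a concrete illustration of the convex program described in this abstract, the sketch below recovers a small low-rank matrix by nuclear-norm minimization. The use of cvxpy, the problem sizes, and the variable names are illustrative choices, not taken from the paper.

    import numpy as np
    import cvxpy as cp

    # Illustrative setup: a random rank-r matrix observed on roughly 40% of its entries.
    n, r = 40, 2
    rng = np.random.default_rng(0)
    M = rng.standard_normal((n, r)) @ rng.standard_normal((r, n))
    W = (rng.random((n, n)) < 0.4).astype(float)   # 1 where an entry is observed

    # Minimize the nuclear norm subject to agreeing with the observed entries.
    X = cp.Variable((n, n))
    problem = cp.Problem(cp.Minimize(cp.normNuc(X)),
                         [cp.multiply(W, X) == W * M])
    problem.solve()
    print("relative recovery error:", np.linalg.norm(X.value - M) / np.linalg.norm(M))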

Introduction to the non-asymptotic analysis of random matrices

by Roman Vershynin, 2010
"... ..."
Abstract - Cited by 361 (21 self) - Add to MetaCart
Abstract not found

User-Friendly Tail Bounds for Sums of Random Matrices

by Joel A. Tropp, 2011
"... ..."
Abstract - Cited by 254 (5 self) - Add to MetaCart
Abstract not found

Finding Structure with Randomness: Probabilistic Algorithms for Constructing Approximate Matrix Decompositions

by N. Halko, P. G. Martinsson, J. A. Tropp
"... Low-rank matrix approximations, such as the truncated singular value decomposition and the rank-revealing QR decomposition, play a central role in data analysis and scientific computing. This work surveys and extends recent research which demonstrates that randomization offers a powerful tool for ..."
Abstract - Cited by 253 (6 self) - Add to MetaCart
Low-rank matrix approximations, such as the truncated singular value decomposition and the rank-revealing QR decomposition, play a central role in data analysis and scientific computing. This work surveys and extends recent research which demonstrates that randomization offers a powerful tool for performing low-rank matrix approximation. These techniques exploit modern computational architectures more fully than classical methods and open the possibility of dealing with truly massive data sets. This paper presents a modular framework for constructing randomized algorithms that compute partial matrix decompositions. These methods use random sampling to identify a subspace that captures most of the action of a matrix. The input matrix is then compressed—either explicitly or implicitly—to this subspace, and the reduced matrix is manipulated deterministically to obtain the desired low-rank factorization. In many cases, this approach beats its classical competitors in terms of accuracy, speed, and robustness. These claims are supported by extensive numerical experiments and a detailed error analysis. The specific benefits of randomized techniques depend on the computational environment. Consider the model problem of finding the k dominant components of the singular value decomposition
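
The sample-then-compress recipe in this abstract can be sketched in a few lines of numpy. The function below is a minimal illustration only; the function name, the oversampling parameter, and the use of a Gaussian test matrix are assumptions for the basic scheme, not a reproduction of the paper's algorithms.

    import numpy as np

    def randomized_low_rank(A, k, oversample=10, seed=None):
        """Sample the range of A with a random test matrix, orthonormalize,
        then factor the small compressed matrix deterministically."""
        rng = np.random.default_rng(seed)
        m, n = A.shape
        Omega = rng.standard_normal((n, k + oversample))   # random test matrix
        Q, _ = np.linalg.qr(A @ Omega)                     # basis capturing most of A's action
        B = Q.T @ A                                        # small compressed matrix
        Uhat, s, Vt = np.linalg.svd(B, full_matrices=False)
        return (Q @ Uhat)[:, :k], s[:k], Vt[:k, :]         # approximate rank-k SVD factors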

Citation Context

... the algorithm of [46] requires only a constant number of passes over the data. Rudelson and Vershynin later showed that the same type of column sampling method also yields spectral-norm error bounds [116]. The techniques in their paper have been very influential; their work has found other applications in randomized regression [52], sparse approximation [133], and compressive sampling [19]. Deshpande ...

Improved approximation algorithms for large matrices via random projections.

by T. Sarlos - In Proceedings of the 47th Annual IEEE Symposium on Foundations of Computer Science, 2006
"... ..."
Abstract - Cited by 168 (4 self) - Add to MetaCart
Abstract not found

Citation Context

...oof of inequalities (2) and (3) works unchanged for any matrix S such that |1 − σ_i^2(SU)| = o(1) and U^T S^T S w ≈ U^T w. Thus combining the above with Rudelson’s and Vershynin’s proof of Theorem 1.1 in [49] for bounding the singular values and Lemma 8 in appendix A.2 of [23] for bounding the norm of the approximate matrix product, we have the following claim for sampling ℓ2 regression. Claim 13: Let r > 0...
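
The sketched ℓ2 regression being set up in this excerpt can be illustrated with a tiny sketch-and-solve example. The Gaussian sketching matrix and the problem sizes below are illustrative assumptions, not the sampling matrix analyzed in the paper.

    import numpy as np

    # Solve min_x ||Ax - b|| approximately via the much smaller sketched problem
    # min_x ||S(Ax - b)||, where S compresses the rows of the data.
    rng = np.random.default_rng(1)
    n, d, s = 10_000, 20, 400
    A = rng.standard_normal((n, d))
    b = A @ rng.standard_normal(d) + 0.1 * rng.standard_normal(n)

    S = rng.standard_normal((s, n)) / np.sqrt(s)       # Gaussian sketching matrix
    x_sketch, *_ = np.linalg.lstsq(S @ A, S @ b, rcond=None)
    x_exact, *_ = np.linalg.lstsq(A, b, rcond=None)
    print("relative difference:", np.linalg.norm(x_sketch - x_exact) / np.linalg.norm(x_exact))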

Graph sparsification by effective resistances

by Daniel A. Spielman, Nikhil Srivastava - SIAM J. Comput.
"... We present a nearly-linear time algorithm that produces high-quality sparsifiers of weighted graphs. Given as input a weighted graph G = (V, E, w) and a parameter ǫ> 0, we produce a weighted subgraph H = (V, ˜ E, ˜w) of G such that | ˜ E | = O(n log n/ǫ 2) and for all vectors x ∈ R V (1 − ǫ) ∑ ..."
Abstract - Cited by 143 (9 self) - Add to MetaCart
We present a nearly-linear time algorithm that produces high-quality sparsifiers of weighted graphs. Given as input a weighted graph G = (V, E, w) and a parameter ε > 0, we produce a weighted subgraph H = (V, Ẽ, w̃) of G such that |Ẽ| = O(n log n / ε^2) and for all vectors x ∈ R^V

(1 − ε) ∑_{uv∈E} (x(u) − x(v))^2 w_uv ≤ ∑_{uv∈Ẽ} (x(u) − x(v))^2 w̃_uv ≤ (1 + ε) ∑_{uv∈E} (x(u) − x(v))^2 w_uv.  (1)

This improves upon the sparsifiers constructed by Spielman and Teng, which had O(n log^c n) edges for some large constant c, and upon those of Benczúr and Karger, which only satisfied (1) for x ∈ {0, 1}^V. We conjecture the existence of sparsifiers with O(n) edges, noting that these would generalize the notion of expander graphs, which are constant-degree sparsifiers for the complete graph. A key ingredient in our algorithm is a subroutine of independent interest: a nearly-linear time algorithm that builds a data structure from which we can query the approximate effective resistance between any two vertices in a graph in O(log n) time.
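
As a rough illustration of the sampling rule behind this result, the sketch below computes exact effective resistances from the Laplacian pseudoinverse and samples edges with probability proportional to weight times resistance. This dense computation is for intuition only; it is not the paper's nearly-linear-time data structure, and the helper name and reweighting scheme are illustrative choices.

    import numpy as np

    def resistance_sparsifier(n, edges, weights, q, seed=None):
        """Sample q edges with probability proportional to w_e * R_e and reweight,
        using the exact Laplacian pseudoinverse to obtain effective resistances R_e."""
        rng = np.random.default_rng(seed)
        weights = np.asarray(weights, dtype=float)
        L = np.zeros((n, n))
        for (u, v), w in zip(edges, weights):
            L[u, u] += w; L[v, v] += w
            L[u, v] -= w; L[v, u] -= w
        Lp = np.linalg.pinv(L)
        R = np.array([Lp[u, u] + Lp[v, v] - 2 * Lp[u, v] for u, v in edges])
        p = weights * R
        p /= p.sum()
        new_w = np.zeros(len(edges))
        for i in rng.choice(len(edges), size=q, p=p):
            new_w[i] += weights[i] / (q * p[i])        # unbiased reweighting
        return new_w                                   # nonzero entries are the edges of H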

Citation Context

...e conclude with the conjecture that sparsifiers with O(n) edges exist. 1.2 Prior Work In addition to the graph sparsifiers of [4] and [23], there is a large body of work on sparse [3, 2] and low-rank [13, 2, 22, 10, 9] approximations for general matrices. The algorithms in this literature provide guarantees of the form ‖A − Ã‖_2 ≤ ε, where A is the original matrix and Ã is obtained by entrywise or columnwise samplin...

On the conditioning of random subdictionaries

by Joel A. Tropp - Appl. Comput. Harmonic Anal.
"... Abstract. An important problem in the theory of sparse approximation is to identify wellconditioned subsets of vectors from a general dictionary. In most cases, current results do not apply unless the number of vectors is smaller than the square root of the ambient dimension, so these bounds are too ..."
Abstract - Cited by 96 (8 self) - Add to MetaCart
An important problem in the theory of sparse approximation is to identify well-conditioned subsets of vectors from a general dictionary. In most cases, current results do not apply unless the number of vectors is smaller than the square root of the ambient dimension, so these bounds are too weak for many applications. This paper shatters the square-root bottleneck by focusing on random subdictionaries instead of arbitrary subdictionaries. It provides explicit bounds on the extreme singular values of random subdictionaries that hold with overwhelming probability. The results are phrased in terms of the coherence and spectral norm of the dictionary, which capture information about its global geometry. The proofs rely on standard tools from the area of Banach space probability. As an application, the paper shows that the conditioning of a subdictionary is the major obstacle to the uniqueness of sparse representations and the success of ℓ1 minimization techniques for signal recovery. Indeed, if a fixed subdictionary is well conditioned and its cardinality is slightly smaller than the ambient dimension, then a random signal formed from this subdictionary almost surely has no other representation that is equally sparse. Moreover, with overwhelming probability, the maximally sparse representation can be identified via ℓ1 minimization. Note that the results in this paper are not directly comparable with recent work on subdictionaries of random dictionaries.
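
A quick numerical illustration of the quantity this abstract studies, the extreme singular values of a randomly chosen subdictionary, might look like the following; the random unit-norm dictionary and the sizes here are chosen purely for demonstration.

    import numpy as np

    rng = np.random.default_rng(0)
    d, N, k = 256, 1024, 64                      # ambient dimension, dictionary size, subdictionary size
    Phi = rng.standard_normal((d, N))
    Phi /= np.linalg.norm(Phi, axis=0)           # unit-norm atoms

    idx = rng.choice(N, size=k, replace=False)   # random subdictionary
    s = np.linalg.svd(Phi[:, idx], compute_uv=False)
    print("largest / smallest singular value:", s[0], s[-1])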

Citation Context

...us or irrelevant. In this work, the term “restriction” always refers to a coordinate restriction. 4.2. Key Technical Results. Our most important theorem is adapted from work of Rudelson and Vershynin [RV07], who build essentially on earlier work of Rudelson [Rud99]. This theorem gives information about the spectral norm of a matrix that has been restricted to a random collection of columns. Theorem 8 (S...

Optimal rates of convergence for covariance matrix estimation

by T. Tony Cai, Cun-Hui Zhang, Harrison H. Zhou - Ann. Statist., 2010
"... Covariance matrix plays a central role in multivariate statistical analysis. Significant advances have been made recently on developing both theory and methodology for estimating large covariance matrices. However, a minimax theory has yet been developed. In this paper we establish the optimal rates ..."
Abstract - Cited by 88 (19 self) - Add to MetaCart
The covariance matrix plays a central role in multivariate statistical analysis. Significant advances have been made recently in developing both theory and methodology for estimating large covariance matrices. However, a minimax theory has yet to be developed. In this paper we establish the optimal rates of convergence for estimating the covariance matrix under both the operator norm and the Frobenius norm. It is shown that optimal procedures under the two norms are different and consequently matrix estimation under the operator norm is fundamentally different from vector estimation. The minimax upper bound is obtained by constructing a special class of tapering estimators and by studying their risk properties. A key step in obtaining the optimal rate of convergence is the derivation of the minimax lower bound. The technical analysis requires new ideas that are quite different from those used in the more conventional function/sequence estimation problems.
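
For intuition about the tapering estimators mentioned in this abstract, here is a minimal sketch; the particular weight scheme and the choice of bandwidth k are illustrative assumptions rather than a reproduction of the paper's construction.

    import numpy as np

    def tapering_covariance(X, k):
        """Taper the sample covariance: keep entries near the diagonal, linearly
        down-weight entries between bandwidth k/2 and k, and zero out the rest."""
        S = np.cov(X, rowvar=False)
        p = S.shape[0]
        dist = np.abs(np.subtract.outer(np.arange(p), np.arange(p)))
        w = np.clip(2.0 - 2.0 * dist / k, 0.0, 1.0)    # 1 for |i-j| <= k/2, 0 for |i-j| >= k
        return S * w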

Relative-Error CUR Matrix Decompositions

by Petros Drineas, Michael W. Mahoney, S. Muthukrishnan - SIAM J. Matrix Anal. Appl., 2008
"... Many data analysis applications deal with large matrices and involve approximating the matrix using a small number of “components.” Typically, these components are linear combinations of the rows and columns of the matrix, and are thus difficult to interpret in terms of the original features of the ..."
Abstract - Cited by 86 (17 self) - Add to MetaCart
Many data analysis applications deal with large matrices and involve approximating the matrix using a small number of “components.” Typically, these components are linear combinations of the rows and columns of the matrix, and are thus difficult to interpret in terms of the original features of the input data. In this paper, we propose and study matrix approximations that are explicitly expressed in terms of a small number of columns and/or rows of the data matrix, and thereby more amenable to interpretation in terms of the original data. Our main algorithmic results are two randomized algorithms which take as input an m × n matrix A and a rank parameter k. In our first algorithm, C is chosen, and we let A′ = CC^+ A, where C^+ is the Moore–Penrose generalized inverse of C. In our second algorithm C, U, R are chosen, and we let A′ = CUR. (C and R are matrices that consist of actual columns and rows, respectively, of A, and U is a generalized inverse of their intersection.) For each algorithm, we show that with probability at least 1 − δ, ‖A − A′‖_F ≤ (1 + ε) ‖A − A_k‖_F, where A_k is the “best” rank-k approximation provided by truncating the SVD of A, and where ‖X‖_F is the Frobenius norm of the matrix X. The number of columns of C and rows of R is a low-degree polynomial in k, 1/ε, and log(1/δ). Both the Numerical Linear Algebra community and the Theoretical Computer Science community have studied variants
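
A bare-bones illustration of the CUR shape described in this abstract follows. Note that it selects columns and rows uniformly at random, whereas the paper's relative-error guarantees rely on carefully constructed sampling probabilities; the function name is likewise an illustrative choice.

    import numpy as np

    def cur_sketch(A, c, r, seed=None):
        """Form A ~ C U R from c actual columns and r actual rows of A,
        with U a generalized inverse of their intersection."""
        rng = np.random.default_rng(seed)
        cols = rng.choice(A.shape[1], size=c, replace=False)
        rows = rng.choice(A.shape[0], size=r, replace=False)
        C, R = A[:, cols], A[rows, :]
        U = np.linalg.pinv(A[np.ix_(rows, cols)])    # pseudoinverse of the intersection block
        return C, U, R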

Multi-View Clustering via Canonical Correlation Analysis

by Kamalika Chaudhuri, Sham M. Kakade
"... Clustering data in high-dimensions is believed to be a hard problem in general. A number of efficient clustering algorithms developed in recent years address this problem by projecting the data into a lower-dimensional subspace, e.g. via Principal Components Analysis (PCA) or random projections, bef ..."
Abstract - Cited by 76 (6 self) - Add to MetaCart
Clustering data in high-dimensions is believed to be a hard problem in general. A number of efficient clustering algorithms developed in recent years address this problem by projecting the data into a lower-dimensional subspace, e.g. via Principal Components Analysis (PCA) or random projections, before clustering. Such techniques typically impose stringent requirements on the separation between the cluster means (in order for the algorithm to be successful). Here, we show how using multiple views of the data can relax these stringent requirements. We use Canonical Correlation Analysis (CCA) to project the data in each view to a lower-dimensional subspace. Under the assumption that conditioned on the cluster label the views are uncorrelated, we show that the separation conditions required for the algorithm to be successful are rather mild (significantly weaker than those of prior results in the literature). We provide results for mixture of ... The multi-view approach to learning is one in which we have ‘views’ of the data (sometimes in a rather abstract sense) and, if we understand the underlying relationship between these views, the hope is that this relationship can be used to alleviate the difficulty of a learning problem of interest [BM98, KF07, AZ07]. In this work, we explore how having ‘two views’ of the data makes
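
A minimal sketch of the two-view recipe described above, using scikit-learn's CCA and k-means purely as stand-ins; the component count and the clustering step are illustrative choices, not the paper's algorithm.

    from sklearn.cross_decomposition import CCA
    from sklearn.cluster import KMeans

    def cca_then_cluster(X1, X2, n_clusters):
        """Project each view onto its top canonical directions, then cluster one projection."""
        cca = CCA(n_components=n_clusters)
        Z1, Z2 = cca.fit_transform(X1, X2)          # low-dimensional projections of the two views
        return KMeans(n_clusters=n_clusters, n_init=10).fit_predict(Z1)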

Citation Context

... Vempala, 2008) — the implied separation in their work is rather large and grows with decreasing w_min, the minimum mixing weight. To get our improved sample complexity bounds, we use a result due to (Rudelson & Vershynin, 2007) which may be of independent interest. We stress that our improved results are really due to the multi-view condition. Had we simply combined the data from both views, and applied previous algorithms...
