Results 1 - 10 of 218
The Convex Geometry of Linear Inverse Problems
2010
Cited by 189 (20 self)
In applications throughout science and engineering one is often faced with the challenge of solving an ill-posed inverse problem, where the number of available measurements is smaller than the dimension of the model to be estimated. However, in many practical situations of interest, models are constrained structurally so that they only have a few degrees of freedom relative to their ambient dimension. This paper provides a general framework to convert notions of simplicity into convex penalty functions, resulting in convex optimization solutions to linear, underdetermined inverse problems. The class of simple models considered consists of those formed as the sum of a few atoms from some (possibly infinite) elementary atomic set; examples include well-studied cases such as sparse vectors (e.g., signal processing, statistics) and low-rank matrices (e.g., control, statistics), as well as several others including sums of a few permutation matrices (e.g., ranked elections, multiobject tracking), low-rank tensors (e.g., computer vision, neuroscience), orthogonal matrices (e.g., machine learning), and atomic measures (e.g., system identification). The convex programming formulation is based on minimizing the norm induced by the convex hull of the atomic set; this norm is referred to as the atomic norm.
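As a display-form gloss of the construction in this abstract, the atomic norm and the associated recovery program can be written as follows (a sketch in my own notation; the symbols x, A, Φ, and y are not taken from the listing):

```latex
% Atomic norm induced by an atomic set A (the gauge of conv(A)):
\|x\|_{\mathcal{A}}
  \;=\; \inf\bigl\{\, t > 0 : x \in t\,\mathrm{conv}(\mathcal{A}) \,\bigr\}
  \;=\; \inf\Bigl\{\, \textstyle\sum_{a \in \mathcal{A}} c_a
        \;:\; x = \sum_{a \in \mathcal{A}} c_a\, a,\ c_a \ge 0 \,\Bigr\}.
% Convex recovery program for an underdetermined linear inverse problem y = \Phi x^\star:
\hat{x} \;=\; \arg\min_x \ \|x\|_{\mathcal{A}}
  \quad \text{subject to} \quad \Phi x = y.
```

Choosing the atomic set to be the signed standard basis vectors recovers ℓ1 minimization, and choosing it to be the unit-norm rank-one matrices recovers nuclear norm minimization, matching the sparse-vector and low-rank examples listed in the abstract.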
Structured variable selection with sparsity-inducing norms
2011
Cited by 187 (27 self)
We consider the empirical risk minimization problem for linear supervised learning, with regularization by structured sparsity-inducing norms. These are defined as sums of Euclidean norms on certain subsets of variables, extending the usual ℓ1-norm and the group ℓ1-norm by allowing the subsets to overlap. This leads to a specific set of allowed nonzero patterns for the solutions of such problems. We first explore the relationship between the groups defining the norm and the resulting nonzero patterns, providing both forward and backward algorithms to go back and forth from groups to patterns. This allows the design of norms adapted to specific prior knowledge expressed in terms of nonzero patterns. We also present an efficient active set algorithm, and analyze the consistency of variable selection for least-squares linear regression in low and high-dimensional settings.
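A minimal numpy sketch of the penalty described above, assuming groups are given as index lists; the function name, the optional per-group weights, and the toy example are mine:

```python
import numpy as np

def overlapping_group_norm(w, groups, weights=None):
    """Structured sparsity-inducing norm: a sum of Euclidean norms over
    (possibly overlapping) groups of variables. With singleton groups this
    is the l1 norm; with a partition of the variables it is the group-l1 norm."""
    if weights is None:
        weights = [1.0] * len(groups)
    return sum(d * np.linalg.norm(w[list(g)]) for d, g in zip(weights, groups))

# Toy example with two overlapping groups on three variables:
w = np.array([0.0, 2.0, -1.0])
print(overlapping_group_norm(w, groups=[[0, 1], [1, 2]]))  # 2.0 + sqrt(5)
```

Because the penalty drives entire groups to zero, the zero set of a solution is a union of groups, so the attainable nonzero patterns are the complements of unions of groups; this is the group-to-pattern correspondence the abstract refers to.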
Nuclear norm penalization and optimal rates for noisy low rank matrix completion.
Annals of Statistics, 2011
Cited by 107 (7 self)
This paper deals with the trace regression model where n entries or linear combinations of entries of an unknown m1 × m2 matrix A0 corrupted by noise are observed. We propose a new nuclear norm penalized estimator of A0 and establish a general sharp oracle inequality for this estimator for arbitrary values of n, m1, m2 under the condition of isometry in expectation. This method is then applied to the matrix completion problem. In this case, the estimator admits a simple explicit form and we prove that it satisfies oracle inequalities with faster rates of convergence than in previous works. They are valid, in particular, in the high-dimensional setting m1 m2 ≫ n. We show that the obtained rates are optimal up to logarithmic factors in a minimax sense and also derive, for any fixed matrix A0, a nonminimax lower bound on the rate of convergence of our estimator, which coincides with the upper bound up to a constant factor. Finally, we show that our procedure provides an exact recovery of the rank of A0 with probability close to 1. We also discuss the statistical learning setting where there is no underlying model determined by A0 and the aim is to find the best trace regression model approximating the data.
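If I recall the construction correctly, the simple explicit form mentioned for the matrix completion case is soft-thresholding of the singular values of a rescaled data matrix; below is a rough numpy sketch of the thresholding step only (the rescaling of observed entries is omitted, and the toy data are mine):

```python
import numpy as np

def svd_soft_threshold(M, lam):
    """Proximal operator of lam * (nuclear norm): shrink each singular
    value of M toward zero by lam and truncate at zero."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U * np.maximum(s - lam, 0.0)) @ Vt

# Denoise a noisy rank-one matrix; a sufficiently large lam suppresses the
# noise directions and returns a low-rank estimate.
rng = np.random.default_rng(0)
A0 = np.outer(rng.normal(size=5), rng.normal(size=4))
A_hat = svd_soft_threshold(A0 + 0.1 * rng.normal(size=(5, 4)), lam=0.5)
print(np.linalg.matrix_rank(A_hat, tol=1e-8))
```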
Adaptive forward-backward greedy algorithm for learning sparse representations
IEEE Trans. Inform. Theory, 2011
Cited by 101 (9 self)
Consider linear prediction models where the target function is a sparse linear combination of a set of basis functions. We are interested in the problem of identifying those basis functions with non-zero coefficients and reconstructing the target function from noisy observations. Two heuristics that are widely used in practice are forward and backward greedy algorithms. First, we show that neither idea is adequate. Second, we propose a novel combination that is based on the forward greedy algorithm but takes backward steps adaptively whenever beneficial. We prove strong theoretical results showing that this procedure is effective in learning sparse representations. Experimental results support our theory.
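A rough Python sketch of a forward greedy procedure with adaptive backward steps for least squares, under simplifying assumptions: full refitting at every step, a fixed gain threshold eps, and a backward "budget" of nu times the last forward gain. The parameter names and exact stopping rules here are mine, not the paper's.

```python
import numpy as np

def foba_least_squares(X, y, eps=1e-3, nu=0.5):
    """Adaptive forward-backward greedy selection (simplified sketch).

    Forward: add the feature whose inclusion most reduces the residual sum
    of squares; stop once the best gain falls below eps. Backward: after
    each forward step, drop features as long as the total increase in RSS
    stays below nu times that step's gain.
    """
    n, d = X.shape
    support = []

    def rss(S):
        if not S:
            return float(np.sum(y ** 2))
        beta, *_ = np.linalg.lstsq(X[:, S], y, rcond=None)
        return float(np.sum((y - X[:, S] @ beta) ** 2))

    err = rss(support)
    while len(support) < d:
        # Forward step: evaluate every unused feature and keep the best one.
        candidates = [j for j in range(d) if j not in support]
        errs = [rss(support + [j]) for j in candidates]
        j_best = candidates[int(np.argmin(errs))]
        gain = err - min(errs)
        if gain < eps:
            break
        support.append(j_best)
        err = min(errs)
        # Backward steps, limited by a budget so net progress is preserved.
        budget = nu * gain
        while len(support) > 1:
            del_errs = [rss([k for k in support if k != j]) for j in support]
            i = int(np.argmin(del_errs))
            if del_errs[i] - err < budget:
                budget -= del_errs[i] - err
                err = del_errs[i]
                support.pop(i)
            else:
                break

    support = sorted(support)
    beta_hat = np.linalg.lstsq(X[:, support], y, rcond=None)[0] if support else None
    return support, beta_hat
```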
Minimax rates of estimation for high-dimensional linear regression over ℓq-balls
2009
Cited by 97 (19 self)
Abstract—Consider the high-dimensional linear regression model,where is an observation vector, is a design matrix with, is an unknown regression vector, and is additive Gaussian noise. This paper studies the minimax rates of convergence for estimating in either-loss and-prediction loss, assuming that belongs to an-ball for some.Itisshown that under suitable regularity conditions on the design matrix, the minimax optimal rate in-loss and-prediction loss scales as. The analysis in this paper reveals that conditions on the design matrix enter into the rates for-error and-prediction error in complementary ways in the upper and lower bounds. Our proofs of the lower bounds are information theoretic in nature, based on Fano’s inequality and results on the metric entropy of the balls, whereas our proofs of the upper bounds are constructive, involving direct analysis of least squares over-balls. For the special case, corresponding to models with an exact sparsity constraint, our results show that although computationally efficient-based methods can achieve the minimax rates up to constant factors, they require slightly stronger assumptions on the design matrix than optimal algorithms involving least-squares over the-ball. Index Terms—Compressed sensing, minimax techniques, regression analysis. I.
Estimation of (near) low-rank matrices with noise and high-dimensional scaling
"... We study an instance of high-dimensional statistical inference in which the goal is to use N noisy observations to estimate a matrix Θ ∗ ∈ R k×p that is assumed to be either exactly low rank, or “near ” low-rank, meaning that it can be well-approximated by a matrix with low rank. We consider an M-e ..."
Abstract
-
Cited by 95 (14 self)
- Add to MetaCart
We study an instance of high-dimensional statistical inference in which the goal is to use N noisy observations to estimate a matrix Θ* ∈ R^{k×p} that is assumed to be either exactly low rank, or "near" low-rank, meaning that it can be well-approximated by a matrix with low rank. We consider an M-estimator based on regularization by the trace or nuclear norm over matrices, and analyze its performance under high-dimensional scaling. We provide non-asymptotic bounds on the Frobenius norm error that hold for a general class of noisy observation models, and apply to both exactly low-rank and approximately low-rank matrices. We then illustrate their consequences for a number of specific learning models, including low-rank multivariate or multi-task regression, system identification in vector autoregressive processes, and recovery of low-rank matrices from random projections. Simulations show excellent agreement with the high-dimensional scaling of the error predicted by our theory.
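A proximal-gradient sketch of a trace/nuclear-norm regularized M-estimator for the multi-task regression special case mentioned above (Y ≈ XΘ* plus noise), assuming numpy; the step-size rule, iteration count, and function names are illustrative choices of mine, not the paper's algorithm:

```python
import numpy as np

def nuclear_prox(M, tau):
    """Prox of tau * nuclear norm: soft-threshold the singular values."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

def trace_norm_regression(X, Y, lam, iters=500):
    """Minimize (1/2n)||Y - X Theta||_F^2 + lam * ||Theta||_*
    by proximal gradient descent (ISTA)."""
    n, k = X.shape
    p = Y.shape[1]
    step = n / (np.linalg.norm(X, 2) ** 2)  # 1 / Lipschitz constant of the gradient
    Theta = np.zeros((k, p))
    for _ in range(iters):
        grad = X.T @ (X @ Theta - Y) / n
        Theta = nuclear_prox(Theta - step * grad, step * lam)
    return Theta
```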
Stable principal component pursuit
In Proc. of International Symposium on Information Theory, 2010
Cited by 94 (3 self)
We consider the problem of recovering a target matrix that is a superposition of low-rank and sparse components, from a small set of linear measurements. This problem arises in compressed sensing of structured high-dimensional signals such as videos and hyperspectral images, as well as in the analysis of transformation invariant low-rank structure recovery. We analyze the performance of the natural convex heuristic for solving this problem, under the assumption that measurements are chosen uniformly at random. We prove that this heuristic exactly recovers low-rank and sparse terms, provided the number of observations exceeds the number of intrinsic degrees of freedom of the component signals by a polylogarithmic factor. Our analysis introduces several ideas that may be of independent interest for the more general problem of compressed sensing and decomposing superpositions of multiple structured signals.
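A cvxpy sketch of the natural convex heuristic in the simplest fully observed setting M = L0 + S0 (the paper analyzes random compressive measurements; the choice lam = 1/sqrt(max(m, n)) and the toy data below are my assumptions):

```python
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
m, n, r = 20, 20, 2
L0 = rng.normal(size=(m, r)) @ rng.normal(size=(r, n))      # low-rank component
S0 = (rng.random((m, n)) < 0.05) * rng.normal(size=(m, n))  # sparse component
M = L0 + S0

L = cp.Variable((m, n))
S = cp.Variable((m, n))
lam = 1.0 / np.sqrt(max(m, n))
problem = cp.Problem(cp.Minimize(cp.normNuc(L) + lam * cp.norm1(S)), [L + S == M])
problem.solve()
print("estimated rank of L:", np.linalg.matrix_rank(L.value, tol=1e-6))
```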
Restricted strong convexity and weighted matrix completion: Optimal bounds with noise
2012
Cited by 84 (10 self)
We consider the matrix completion problem under a form of row/column weighted entrywise sampling, including the case of uniform entrywise sampling as a special case. We analyze the associated random observation operator, and prove that with high probability, it satisfies a form of restricted strong convexity with respect to a weighted Frobenius norm. Using this property, we obtain as corollaries a number of error bounds on matrix completion in the weighted Frobenius norm under noisy sampling and for both exact and near low-rank matrices. Our results are based on measures of the "spikiness" and "low-rankness" of matrices that are less restrictive than the incoherence conditions imposed in previous work. Our technique involves an M-estimator that includes controls on both the rank and spikiness of the solution, and we establish non-asymptotic error bounds in weighted Frobenius norm for recovering matrices lying in ℓq-"balls" of bounded spikiness. Using information-theoretic methods, we show that no algorithm can achieve better estimates (up to a logarithmic factor) over these same sets, showing that our conditions on matrices and associated rates are essentially optimal.
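One concrete instance of the kind of M-estimator described above, written in my own notation for the uniform-sampling special case (the weighted version rescales entries by row/column sampling weights, and the exact constants are not taken from the listing): spikiness is controlled through an elementwise bound, rank through the nuclear norm penalty.

```latex
\widehat{M} \;\in\;
  \arg\min_{\substack{M \in \mathbb{R}^{m_1 \times m_2} \\
            \|M\|_{\infty} \,\le\, \alpha/\sqrt{m_1 m_2}}}
  \;\; \frac{1}{2n} \sum_{i=1}^{n} \bigl( y_i - M_{a(i),\,b(i)} \bigr)^2
  \;+\; \lambda \,\|M\|_{*},
```

where (a(i), b(i)) is the row/column index of the i-th observed entry and ‖M‖_∞ denotes the maximum absolute entry.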