Results 1–10 of 39
A unified framework for high-dimensional analysis of M-estimators with decomposable regularizers
Self-concordant analysis for logistic regression
Abstract

Cited by 46 (14 self)
Most of the non-asymptotic theoretical work in regression is carried out for the square loss, where estimators can be obtained through closed-form expressions. In this paper, we use and extend tools from the convex optimization literature, namely self-concordant functions, to provide simple extensions of theoretical results for the square loss to the logistic loss. We apply the extension techniques to logistic regression with regularization by the ℓ2-norm and regularization by the ℓ1-norm, showing that new results for binary classification through logistic regression can be easily derived from corresponding results for least-squares regression.
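As a concrete illustration of the estimator this abstract studies (not of the paper's self-concordant analysis itself), ℓ1-regularized logistic regression can be fit with a generic proximal-gradient loop; the solver, names, and parameter choices below are illustrative assumptions.

```python
import numpy as np

def soft_threshold(v, t):
    """Elementwise soft-thresholding, the proximal operator of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def l1_logistic(X, y, lam, n_iter=2000):
    """Fit l1-penalized logistic regression by proximal gradient descent.

    Minimizes (1/m) * sum_i log(1 + exp(-y_i * x_i^T w)) + lam * ||w||_1
    with labels y in {-1, +1}. A generic solver, not the paper's method.
    """
    m, n = X.shape
    # step = 1/L, with L = ||X||_2^2 / (4m) a Lipschitz bound on the gradient
    step = 4.0 * m / (np.linalg.norm(X, 2) ** 2)
    w = np.zeros(n)
    for _ in range(n_iter):
        margins = y * (X @ w)
        grad = -(X.T @ (y / (1.0 + np.exp(margins)))) / m
        w = soft_threshold(w - step * grad, step * lam)
    return w
```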
Robust 1-bit compressed sensing and sparse logistic regression: A convex programming approach. Preprint. Available at http://arxiv.org/abs/1202.1212
Abstract

Cited by 44 (4 self)
This paper develops theoretical results regarding noisy 1-bit compressed sensing and sparse binomial regression. We demonstrate that a single convex program gives an accurate estimate of the signal, or coefficient vector, for both of these models. We show that an s-sparse signal in R^n can be accurately estimated from m = O(s log(n/s)) single-bit measurements using a simple convex program. This remains true even if each measurement bit is flipped with probability nearly 1/2. Worst-case (adversarial) noise can also be accounted for, and uniform results that hold for all sparse inputs are derived as well. In the terminology of sparse logistic regression, we show that O(s log(2n/s)) Bernoulli trials are sufficient to estimate a coefficient vector in R^n which is approximately s-sparse. Moreover, the same convex program works for virtually all generalized linear models, in which the link function may be unknown. To our knowledge, these are the first results that tie together the theory of sparse logistic regression and 1-bit compressed sensing. Our results apply to general signal structures aside from sparsity; one only needs to know the size of the set K where signals reside. The size is given by the mean width of K, a computable quantity whose square serves as a robust extension of the dimension.
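One common form of the single convex program described above maximizes the linear objective <A^T y, x> over {||x||_2 <= 1, ||x||_1 <= sqrt(s)}; the sketch below is an assumed illustration of that form, not code from the paper. For a linear objective over this set the maximizer is a normalized soft-thresholding of g = A^T y, with the threshold found by bisection.

```python
import numpy as np

def soft(v, t):
    """Elementwise soft-thresholding."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def one_bit_estimate(A, y, s):
    """Maximize <A^T y, x> subject to ||x||_2 <= 1 and ||x||_1 <= sqrt(s)."""
    g = A.T @ y
    t = np.sqrt(s)
    x = g / np.linalg.norm(g)
    if np.sum(np.abs(x)) <= t:          # l1 constraint already satisfied
        return x
    lo, hi = 0.0, np.max(np.abs(g))
    best = x
    for _ in range(60):                  # bisection on the threshold
        tau = 0.5 * (lo + hi)
        v = soft(g, tau)
        nrm = np.linalg.norm(v)
        if np.sum(np.abs(v)) <= t * nrm:  # feasible: try a smaller threshold
            best, hi = v / nrm, tau
        else:                             # infeasible: raise the threshold
            lo = tau
    return best
```

The estimate recovers the signal's direction only (the 1-bit measurements carry no magnitude information), which matches the unit-norm constraint.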
Recovering sparse signals with a certain family of nonconvex penalties and DC programming
 IEEE TRANSACTIONS ON SIGNAL PROCESSING
Abstract

Cited by 42 (7 self)
This paper considers the problem of recovering a sparse signal representation according to a signal dictionary. The problem can be formalized as a penalized least-squares problem in which sparsity is usually induced by an ℓ1-norm penalty on the coefficients. Such an approach, known as the Lasso or Basis Pursuit Denoising, has been shown to perform reasonably well in some situations. However, it has also been proved that nonconvex penalties, like the pseudo-ℓq-norm with q < 1 or the SCAD penalty, are able to recover sparsity more efficiently than the Lasso. Several algorithms have been proposed for solving the resulting nonconvex least-squares problem. This paper proposes a generic algorithm to address such a sparsity-recovery problem for a class of nonconvex penalties. Our main contribution is a methodology based on an iterative algorithm which solves a convex weighted Lasso problem at each iteration. It relies on the family of nonconvex penalties that can be decomposed as a difference of convex functions, which allows us to apply difference-of-convex programming, a generic and principled way of solving nonsmooth, nonconvex optimization problems. We also show that several algorithms in the literature dealing with nonconvex penalties are particular instances of our algorithm. Experimental results demonstrate the effectiveness of the proposed generic framework compared to existing algorithms, including iterative reweighted least-squares methods.
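A minimal sketch of the iterative scheme the abstract describes, under assumed specifics: a nonconvex log penalty lam*log(1 + |x|/theta) whose concave part is linearized at each step (the DC idea), so each outer iteration is a convex weighted Lasso, here solved by plain ISTA. All parameter choices are illustrative.

```python
import numpy as np

def soft(v, t):
    """Elementwise soft-thresholding."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def weighted_lasso_ista(A, b, w, x0, step, n_iter=1000):
    """Solve min 0.5*||Ax - b||^2 + sum_i w_i*|x_i| by proximal gradient."""
    x = x0.copy()
    for _ in range(n_iter):
        x = soft(x - step * (A.T @ (A @ x - b)), step * w)
    return x

def dc_reweighted_lasso(A, b, lam, theta=0.1, n_outer=5):
    """Reweighted Lasso for the log penalty, via DC linearization.

    The weights at each outer step are the penalty's slopes at the current
    iterate: w_i = lam / (theta + |x_i|), so large coefficients are
    penalized less than under the plain Lasso.
    """
    step = 1.0 / np.linalg.norm(A, 2) ** 2
    x = np.zeros(A.shape[1])
    for _ in range(n_outer):
        w = lam / (theta + np.abs(x))   # slope of lam*log(1 + |x|/theta)
        x = weighted_lasso_ista(A, b, w, x, step)
    return x
```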
Greedy sparsityconstrained optimization
 In Conference Record of the Forty-Fifth Asilomar Conference on Signals, Systems and Computers (ASILOMAR), IEEE, 2011
Abstract

Cited by 20 (4 self)
Finding optimal sparse solutions to estimation problems, particularly in underdetermined regimes, has recently gained much attention. Most of the existing literature studies linear models in which the squared error is used as the measure of discrepancy to be minimized. However, in many applications discrepancy is measured in more general forms, such as the log-likelihood. Regularization by the ℓ1-norm has been shown to induce sparse solutions, but their sparsity level can be merely suboptimal. In this paper we present a greedy algorithm, dubbed Gradient Support Pursuit (GraSP), for sparsity-constrained optimization. Quantifiable guarantees are provided for GraSP when cost functions have the “Stable Hessian Property”.
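The GraSP iteration for a generic smooth cost can be sketched as below; the squared-error instantiation and all names and parameter choices are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np

def grasp(grad, argmin_over, n, s, n_iter=10):
    """Generic Gradient Support Pursuit sketch for a smooth cost.

    grad(x)        -> gradient of the cost at x
    argmin_over(T) -> cost minimizer restricted to coordinates in T
    """
    x = np.zeros(n)
    for _ in range(n_iter):
        z = grad(x)
        Z = np.argsort(np.abs(z))[-2 * s:]            # top-2s gradient coords
        T = np.union1d(Z, np.flatnonzero(x)).astype(int)
        b = argmin_over(T)                            # solve over merged support
        keep = np.argsort(np.abs(b))[-s:]             # prune back to s terms
        x = np.zeros(n)
        x[keep] = b[keep]
    return x

def squared_error_pieces(A, y):
    """The cost 0.5*||Ax - y||^2 as a (grad, restricted argmin) pair."""
    def grad(x):
        return A.T @ (A @ x - y)
    def argmin_over(T):
        b = np.zeros(A.shape[1])
        b[T] = np.linalg.lstsq(A[:, T], y, rcond=None)[0]
        return b
    return grad, argmin_over
```

With the squared-error pieces this reduces to a CoSaMP-style recovery; swapping in a log-likelihood gradient and restricted maximizer covers the more general costs the abstract mentions.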
Sparsistent learning of varyingcoefficient models with structural changes.
 In Advances in Neural Information Processing Systems (NIPS), 2009
Abstract

Cited by 19 (1 self)
Estimating the changing structure of a varying-coefficient varying-structure (VCVS) model remains an important and open problem in dynamic system modelling, which includes learning trajectories of stock prices or uncovering the topology of an evolving gene network. In this paper, we investigate sparsistent learning of a subfamily of this model: piecewise-constant VCVS models. We analyze two main issues in this problem: inferring the time points where structural changes occur, and estimating the model structure (i.e., model selection) on each of the constant segments. We propose a two-stage adaptive procedure, which first identifies the jump points of structural changes and then identifies the covariates relevant to the response on each of the segments. We provide an asymptotic analysis of the procedure, showing that with increasing sample size, number of structural changes, and number of variables, the true model can be consistently selected. We demonstrate the performance of the method on synthetic data and apply it to a brain-computer interface dataset. We also consider how the method applies to structure estimation of time-varying probabilistic graphical models.
On Time Varying Undirected Graphs
Abstract

Cited by 10 (1 self)
The time-varying multivariate Gaussian distribution and the undirected graph associated with it, as introduced in Zhou et al. (2008), provide a useful statistical framework for modeling complex dynamic networks. In many application domains, it is of high importance to estimate the graph structure of the model consistently for the purpose of scientific discovery. In this paper, we show that under suitable technical conditions, the structure of the undirected graphical model can be consistently estimated in the high-dimensional setting, where the dimensionality of the model is allowed to diverge with the sample size. Model selection consistency is shown for the procedure proposed in Zhou et al. (2008) and for the modified neighborhood selection procedure of Meinshausen and Bühlmann (2006).
Dimension reduction and variable selection in case-control studies via regularized likelihood optimization
Abstract

Cited by 7 (2 self)
Dimension reduction and variable selection are performed routinely in case-control studies, but the literature on the theoretical aspects of the resulting estimates is scarce. We contribute to this literature by studying estimators obtained via ℓ1-penalized likelihood optimization. We show that the optimizers of the ℓ1-penalized retrospective likelihood coincide with the optimizers of the ℓ1-penalized prospective likelihood. This extends the results of Prentice and Pyke (1979), obtained for non-regularized likelihoods. We establish both the sup-norm consistency of the odds ratio, after model selection, and the consistency of subset selection of our estimators. The novelty of our theoretical results consists in the study of these properties under the case-control sampling scheme. Our results hold for selection performed over a large collection of candidate variables, whose cardinality is allowed to depend on, and be greater than, the sample size. We complement our theoretical results with a novel approach to determining data-driven tuning parameters, based on the bisection method. The resulting procedure offers significant computational savings compared with grid-search-based methods. All our numerical experiments strongly support our theoretical findings.
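The appeal of bisection-based tuning over a grid search can be illustrated generically. The selection criterion used here, hitting a target number of selected variables, is a hypothetical stand-in for the paper's actual data-driven rule; it assumes only that the size of the selected set is nonincreasing in the penalty level.

```python
import numpy as np

def bisect_lambda(n_selected, target, lam_lo=1e-4, lam_hi=10.0, n_iter=40):
    """Bisection on a regularization parameter.

    n_selected(lam) -> number of variables selected at penalty level lam,
    assumed nonincreasing in lam. Needs O(log(1/eps)) fits rather than one
    fit per grid point.
    """
    for _ in range(n_iter):
        lam = 0.5 * (lam_lo + lam_hi)
        k = n_selected(lam)
        if k > target:
            lam_lo = lam      # too many variables: penalize more
        elif k < target:
            lam_hi = lam      # too few variables: penalize less
        else:
            return lam
    return 0.5 * (lam_lo + lam_hi)
```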
On model selection consistency of M-estimators with geometrically decomposable penalties
 In Advances in Neural Information Processing Systems, 2013
Abstract

Cited by 6 (1 self)
Penalized M-estimators are used in diverse areas of science and engineering to fit high-dimensional models with some low-dimensional structure. Often, the penalties are geometrically decomposable, i.e. they can be expressed as a sum of support functions over convex sets. We generalize the notion of irrepresentability to geometrically decomposable penalties and develop a general framework for establishing consistency and model selection consistency of M-estimators with such penalties. We then use this framework to derive results for some special cases of interest in bioinformatics and statistical learning.