Results 1–10 of 39
A unified framework for high-dimensional analysis of M-estimators with decomposable regularizers
Self-concordant analysis for logistic regression
Abstract

Cited by 46 (14 self)
Most of the non-asymptotic theoretical work in regression is carried out for the square loss, where estimators can be obtained through closed-form expressions. In this paper, we use and extend tools from the convex optimization literature, namely self-concordant functions, to provide simple extensions of theoretical results for the square loss to the logistic loss. We apply the extension techniques to logistic regression with regularization by the ℓ2-norm and regularization by the ℓ1-norm, showing that new results for binary classification through logistic regression can be easily derived from corresponding results for least-squares regression.
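As a concrete illustration of the estimator this abstract studies (not of the paper's self-concordant analysis itself), ℓ1-regularized logistic regression can be fit with a generic proximal-gradient loop; the solver, names, and parameter choices below are illustrative assumptions.

```python
import numpy as np

def soft_threshold(v, t):
    """Elementwise soft-thresholding, the proximal operator of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def l1_logistic(X, y, lam, n_iter=2000):
    """Fit l1-penalized logistic regression by proximal gradient descent.

    Minimizes (1/m) * sum_i log(1 + exp(-y_i * x_i^T w)) + lam * ||w||_1
    with labels y in {-1, +1}. A generic solver, not the paper's method.
    """
    m, n = X.shape
    # step = 1/L, with L = ||X||_2^2 / (4m) a Lipschitz bound on the gradient
    step = 4.0 * m / (np.linalg.norm(X, 2) ** 2)
    w = np.zeros(n)
    for _ in range(n_iter):
        margins = y * (X @ w)
        grad = -(X.T @ (y / (1.0 + np.exp(margins)))) / m
        w = soft_threshold(w - step * grad, step * lam)
    return w
```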
Robust 1-bit compressed sensing and sparse logistic regression: A convex programming approach. Preprint. Available at http://arxiv.org/abs/1202.1212
Abstract

Cited by 44 (4 self)
This paper develops theoretical results regarding noisy 1-bit compressed sensing and sparse binomial regression. We demonstrate that a single convex program gives an accurate estimate of the signal, or coefficient vector, for both of these models. We show that an s-sparse signal in R^n can be accurately estimated from m = O(s log(n/s)) single-bit measurements using a simple convex program. This remains true even if each measurement bit is flipped with probability nearly 1/2. Worst-case (adversarial) noise can also be accounted for, and uniform results that hold for all sparse inputs are derived as well. In the terminology of sparse logistic regression, we show that O(s log(2n/s)) Bernoulli trials are sufficient to estimate a coefficient vector in R^n which is approximately s-sparse. Moreover, the same convex program works for virtually all generalized linear models, in which the link function may be unknown. To our knowledge, these are the first results that tie together the theory of sparse logistic regression and 1-bit compressed sensing. Our results apply to general signal structures aside from sparsity; one only needs to know the size of the set K where signals reside. The size is given by the mean width of K, a computable quantity whose square serves as a robust extension of the dimension.
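One common form of the single convex program described above maximizes the linear objective <A^T y, x> over {||x||_2 <= 1, ||x||_1 <= sqrt(s)}; the sketch below is an assumed illustration of that form, not code from the paper. For a linear objective over this set the maximizer is a normalized soft-thresholding of g = A^T y, with the threshold found by bisection.

```python
import numpy as np

def soft(v, t):
    """Elementwise soft-thresholding."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def one_bit_estimate(A, y, s):
    """Maximize <A^T y, x> subject to ||x||_2 <= 1 and ||x||_1 <= sqrt(s)."""
    g = A.T @ y
    t = np.sqrt(s)
    x = g / np.linalg.norm(g)
    if np.sum(np.abs(x)) <= t:          # l1 constraint already satisfied
        return x
    lo, hi = 0.0, np.max(np.abs(g))
    best = x
    for _ in range(60):                  # bisection on the threshold
        tau = 0.5 * (lo + hi)
        v = soft(g, tau)
        nrm = np.linalg.norm(v)
        if np.sum(np.abs(v)) <= t * nrm:  # feasible: try a smaller threshold
            best, hi = v / nrm, tau
        else:                             # infeasible: raise the threshold
            lo = tau
    return best
```

The estimate recovers the signal's direction only (the 1-bit measurements carry no magnitude information), which matches the unit-norm constraint.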
Recovering sparse signals with a certain family of nonconvex penalties and DC programming
 IEEE TRANSACTIONS ON SIGNAL PROCESSING
Abstract

Cited by 42 (7 self)
This paper considers the problem of recovering a sparse signal representation according to a signal dictionary. The problem can be formalized as a penalized least-squares problem in which sparsity is usually induced by an ℓ1-norm penalty on the coefficients. Such an approach, known as the Lasso or Basis Pursuit Denoising, has been shown to perform reasonably well in some situations. However, it has also been proved that nonconvex penalties, like the pseudo-ℓq-norm with q < 1 or the SCAD penalty, are able to recover sparsity more efficiently than the Lasso. Several algorithms have been proposed for solving the resulting nonconvex least-squares problem. This paper proposes a generic algorithm to address such a sparsity-recovery problem for a class of nonconvex penalties. Our main contribution is a methodology based on an iterative algorithm which solves a convex weighted Lasso problem at each iteration. It relies on the family of nonconvex penalties that can be decomposed as a difference of convex functions, which allows us to apply difference-of-convex programming, a generic and principled way of solving nonsmooth, nonconvex optimization problems. We also show that several algorithms in the literature dealing with nonconvex penalties are particular instances of our algorithm. Experimental results demonstrate the effectiveness of the proposed generic framework compared to existing algorithms, including iterative reweighted least-squares methods.
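A minimal sketch of the iterative scheme the abstract describes, under assumed specifics: a nonconvex log penalty lam*log(1 + |x|/theta) whose concave part is linearized at each step (the DC idea), so each outer iteration is a convex weighted Lasso, here solved by plain ISTA. All parameter choices are illustrative.

```python
import numpy as np

def soft(v, t):
    """Elementwise soft-thresholding."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def weighted_lasso_ista(A, b, w, x0, step, n_iter=1000):
    """Solve min 0.5*||Ax - b||^2 + sum_i w_i*|x_i| by proximal gradient."""
    x = x0.copy()
    for _ in range(n_iter):
        x = soft(x - step * (A.T @ (A @ x - b)), step * w)
    return x

def dc_reweighted_lasso(A, b, lam, theta=0.1, n_outer=5):
    """Reweighted Lasso for the log penalty, via DC linearization.

    The weights at each outer step are the penalty's slopes at the current
    iterate: w_i = lam / (theta + |x_i|), so large coefficients are
    penalized less than under the plain Lasso.
    """
    step = 1.0 / np.linalg.norm(A, 2) ** 2
    x = np.zeros(A.shape[1])
    for _ in range(n_outer):
        w = lam / (theta + np.abs(x))   # slope of lam*log(1 + |x|/theta)
        x = weighted_lasso_ista(A, b, w, x, step)
    return x
```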
Greedy sparsityconstrained optimization
 In Conference Record of the Forty-Fifth Asilomar Conference on Signals, Systems and Computers (ASILOMAR), IEEE, 2011
Abstract

Cited by 20 (4 self)
Finding optimal sparse solutions to estimation problems, particularly in underdetermined regimes, has recently gained much attention. Most of the existing literature studies linear models in which the squared error is used as the measure of discrepancy to be minimized. However, in many applications discrepancy is measured in more general forms, such as the log-likelihood. Regularization by the ℓ1-norm has been shown to induce sparse solutions, but their sparsity level can be merely suboptimal. In this paper we present a greedy algorithm, dubbed Gradient Support Pursuit (GraSP), for sparsity-constrained optimization. Quantifiable guarantees are provided for GraSP when cost functions have the “Stable Hessian Property”.
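The GraSP iteration for a generic smooth cost can be sketched as below; the squared-error instantiation and all names and parameter choices are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np

def grasp(grad, argmin_over, n, s, n_iter=10):
    """Generic Gradient Support Pursuit sketch for a smooth cost.

    grad(x)        -> gradient of the cost at x
    argmin_over(T) -> cost minimizer restricted to coordinates in T
    """
    x = np.zeros(n)
    for _ in range(n_iter):
        z = grad(x)
        Z = np.argsort(np.abs(z))[-2 * s:]            # top-2s gradient coords
        T = np.union1d(Z, np.flatnonzero(x)).astype(int)
        b = argmin_over(T)                            # solve over merged support
        keep = np.argsort(np.abs(b))[-s:]             # prune back to s terms
        x = np.zeros(n)
        x[keep] = b[keep]
    return x

def squared_error_pieces(A, y):
    """The cost 0.5*||Ax - y||^2 as a (grad, restricted argmin) pair."""
    def grad(x):
        return A.T @ (A @ x - y)
    def argmin_over(T):
        b = np.zeros(A.shape[1])
        b[T] = np.linalg.lstsq(A[:, T], y, rcond=None)[0]
        return b
    return grad, argmin_over
```

With the squared-error pieces this reduces to a CoSaMP-style recovery; swapping in a log-likelihood gradient and restricted maximizer covers the more general costs the abstract mentions.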
Sparsistent learning of varyingcoefficient models with structural changes.
 In Advances in Neural Information Processing Systems (NIPS), 2009
Abstract

Cited by 19 (1 self)
Estimating the changing structure of a varying-coefficient varying-structure (VCVS) model remains an important and open problem in dynamic system modelling, which includes learning trajectories of stock prices or uncovering the topology of an evolving gene network. In this paper, we investigate sparsistent learning of a subfamily of this model: piecewise-constant VCVS models. We analyze two main issues in this problem: inferring the time points where structural changes occur, and estimating the model structure (i.e., model selection) on each of the constant segments. We propose a two-stage adaptive procedure, which first identifies the jump points of structural changes and then identifies the covariates relevant to the response on each of the segments. We provide an asymptotic analysis of the procedure, showing that with increasing sample size, number of structural changes, and number of variables, the true model can be consistently selected. We demonstrate the performance of the method on synthetic data and apply it to a brain-computer interface dataset. We also consider how the method applies to structure estimation of time-varying probabilistic graphical models.
On Time Varying Undirected Graphs
Abstract

Cited by 10 (1 self)
The time-varying multivariate Gaussian distribution and the undirected graph associated with it, as introduced in Zhou et al. (2008), provide a useful statistical framework for modeling complex dynamic networks. In many application domains, it is of high importance to estimate the graph structure of the model consistently for the purpose of scientific discovery. In this paper, we show that under suitable technical conditions, the structure of the undirected graphical model can be consistently estimated in the high-dimensional setting, where the dimensionality of the model is allowed to diverge with the sample size. Model selection consistency is shown for the procedure proposed in Zhou et al. (2008) and for the modified neighborhood selection procedure of Meinshausen and Bühlmann (2006).
Dimension reduction and variable selection in case-control studies via regularized likelihood optimization
Abstract

Cited by 7 (2 self)
Dimension reduction and variable selection are performed routinely in case-control studies, but the literature on the theoretical aspects of the resulting estimates is scarce. We contribute to this literature by studying estimators obtained via ℓ1-penalized likelihood optimization. We show that the optimizers of the ℓ1-penalized retrospective likelihood coincide with the optimizers of the ℓ1-penalized prospective likelihood. This extends the results of Prentice and Pyke (1979), obtained for non-regularized likelihoods. We establish both the sup-norm consistency of the odds ratio, after model selection, and the consistency of subset selection of our estimators. The novelty of our theoretical results consists in the study of these properties under the case-control sampling scheme. Our results hold for selection performed over a large collection of candidate variables, whose cardinality is allowed to depend on, and be greater than, the sample size. We complement our theoretical results with a novel approach to determining data-driven tuning parameters, based on the bisection method. The resulting procedure offers significant computational savings compared with grid-search-based methods. All our numerical experiments strongly support our theoretical findings.
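The appeal of bisection-based tuning over a grid search can be illustrated generically. The selection criterion used here, hitting a target number of selected variables, is a hypothetical stand-in for the paper's actual data-driven rule; it assumes only that the size of the selected set is nonincreasing in the penalty level.

```python
import numpy as np

def bisect_lambda(n_selected, target, lam_lo=1e-4, lam_hi=10.0, n_iter=40):
    """Bisection on a regularization parameter.

    n_selected(lam) -> number of variables selected at penalty level lam,
    assumed nonincreasing in lam. Needs O(log(1/eps)) fits rather than one
    fit per grid point.
    """
    for _ in range(n_iter):
        lam = 0.5 * (lam_lo + lam_hi)
        k = n_selected(lam)
        if k > target:
            lam_lo = lam      # too many variables: penalize more
        elif k < target:
            lam_hi = lam      # too few variables: penalize less
        else:
            return lam
    return 0.5 * (lam_lo + lam_hi)
```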
On model selection consistency of M-estimators with geometrically decomposable penalties
 In Advances in Neural Information Processing Systems, 2013
Abstract

Cited by 6 (1 self)
Penalized M-estimators are used in diverse areas of science and engineering to fit high-dimensional models with some low-dimensional structure. Often, the penalties are geometrically decomposable, i.e. they can be expressed as a sum of support functions over convex sets. We generalize the notion of irrepresentability to geometrically decomposable penalties and develop a general framework for establishing consistency and model selection consistency of M-estimators with such penalties. We then use this framework to derive results for some special cases of interest in bioinformatics and statistical learning.