Honest variable selection in linear and logistic regression models via ℓ1 and ℓ1 + ℓ2 penalization. (2008)

by Florentina Bunea
Venue: Electronic Journal of Statistics

Results 1 - 10 of 39

A unified framework for high-dimensional analysis of M-estimators with decomposable regularizers

by Sahand Negahban, Pradeep Ravikumar, Martin J. Wainwright, Bin Yu
"... ..."
Abstract - Cited by 218 (32 self) - Add to MetaCart
Abstract not found

Self-concordant analysis for logistic regression

by Francis Bach
"... Most of the non-asymptotic theoretical work in regression is carried out for the square loss, where estimators can be obtained through closed-form expressions. In this paper, we use and extend tools from the convex optimization literature, namely self-concordant functions, to provide simple extensio ..."
Abstract - Cited by 46 (14 self) - Add to MetaCart
Most of the non-asymptotic theoretical work in regression is carried out for the square loss, where estimators can be obtained through closed-form expressions. In this paper, we use and extend tools from the convex optimization literature, namely self-concordant functions, to provide simple extensions of theoretical results for the square loss to the logistic loss. We apply the extension techniques to logistic regression with regularization by the ℓ2-norm and regularization by the ℓ1-norm, showing that new results for binary classification through logistic regression can be easily derived from corresponding results for least-squares regression.

Citation Context

...ions have to be introduced regarding second and third derivatives; this makes the derived results much more complicated than the ones for closed-form estimators [3, 4, 5]. A similar situation occurs in convex optimization, for the study of Newton’s method for obtaining solutions of unconstrained optimization problems. It is known to be locally quadratically convergent...
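
For reference, the ℓ1-regularized logistic regression estimator studied in this line of work takes the following standard form (a sketch in common notation, not necessarily the exact normalization used in the paper):

$$
\hat{\beta} \in \arg\min_{\beta \in \mathbb{R}^p} \; \frac{1}{n}\sum_{i=1}^{n} \log\!\bigl(1 + \exp(-y_i\, x_i^{\top}\beta)\bigr) \; + \; \lambda \lVert \beta \rVert_1, \qquad y_i \in \{-1, +1\},
$$

with the ℓ2-regularized variant obtained by replacing $\lambda\lVert\beta\rVert_1$ with $\tfrac{\lambda}{2}\lVert\beta\rVert_2^2$.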

Robust 1-bit compressed sensing and sparse logistic regression: A convex programming approach. Preprint. Available at http://arxiv.org/abs/1202.1212

by Yaniv Plan, Roman Vershynin
"... Abstract. This paper develops theoretical results regarding noisy 1-bit compressed sensing and sparse binomial regression. Wedemonstrate thatasingle convexprogram gives anaccurate estimate of the signal, or coefficient vector, for both of these models. We show that an s-sparse signal in R n can be a ..."
Abstract - Cited by 44 (4 self) - Add to MetaCart
Abstract. This paper develops theoretical results regarding noisy 1-bit compressed sensing and sparse binomial regression. We demonstrate that a single convex program gives an accurate estimate of the signal, or coefficient vector, for both of these models. We show that an s-sparse signal in R^n can be accurately estimated from m = O(s log(n/s)) single-bit measurements using a simple convex program. This remains true even if each measurement bit is flipped with probability nearly 1/2. Worst-case (adversarial) noise can also be accounted for, and uniform results that hold for all sparse inputs are derived as well. In the terminology of sparse logistic regression, we show that O(s log(2n/s)) Bernoulli trials are sufficient to estimate a coefficient vector in R^n which is approximately s-sparse. Moreover, the same convex program works for virtually all generalized linear models, in which the link function may be unknown. To our knowledge, these are the first results that tie together the theory of sparse logistic regression to 1-bit compressed sensing. Our results apply to general signal structures aside from sparsity; one only needs to know the size of the set K where signals reside. The size is given by the mean width of K, a computable quantity whose square serves as a robust extension of the dimension.

Citation Context

...reted as the generalized linear model, so our results can be readily used for various problems in sparse binomial regression. Some of the recent work in sparse binomial regression includes the papers [21, 6, 27, 2, 25, 20, 15]. Let us point to the most directly comparable results. In [2, 6, 15, 21] the authors propose to estimate the coefficient vector (which in our notation is x) by minimizing the negative log-likelihood ...
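
A minimal sketch of the kind of single convex program the abstract describes: correlate the 1-bit measurements with a candidate signal and maximize over an ℓ1/ℓ2-constrained set. The constraint set and scaling below, and the use of cvxpy as a generic solver, are assumptions for illustration, not the paper's exact formulation.

```python
# Sketch: estimate an approximately s-sparse x in R^n from 1-bit measurements
# y_i = sign(<a_i, x>), via a single convex program. The feasible set used here
# (||x||_1 <= sqrt(s), ||x||_2 <= 1) is an illustrative assumption.
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
n, m, s = 200, 400, 5                        # dimension, measurements, sparsity

x_true = np.zeros(n)
x_true[rng.choice(n, s, replace=False)] = rng.standard_normal(s)
x_true /= np.linalg.norm(x_true)

A = rng.standard_normal((m, n))              # Gaussian measurement vectors a_i
y = np.sign(A @ x_true)                      # 1-bit (sign) measurements

x = cp.Variable(n)
objective = cp.Maximize(cp.sum(cp.multiply(y, A @ x)))   # correlate y with <a_i, x>
constraints = [cp.norm(x, 1) <= np.sqrt(s), cp.norm(x, 2) <= 1]
cp.Problem(objective, constraints).solve()

x_hat = x.value / np.linalg.norm(x.value)
print("estimation error:", np.linalg.norm(x_hat - x_true))
```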

Recovering sparse signals with a certain family of non-convex penalties and DC programming

by Gilles Gasso, Alain Rakotomamonjy, Stéphane Canu - IEEE Transactions on Signal Processing
"... This paper considers the problem of recovering a sparse signal representation according to a signal dictionary. This problem could be formalized as a penalized least-squares problem in which sparsity is usually induced by a ℓ1-norm penalty on the coefficients. Such an approach known as the Lasso or ..."
Abstract - Cited by 42 (7 self) - Add to MetaCart
This paper considers the problem of recovering a sparse signal representation according to a signal dictionary. This problem could be formalized as a penalized least-squares problem in which sparsity is usually induced by an ℓ1-norm penalty on the coefficients. Such an approach, known as the Lasso or Basis Pursuit Denoising, has been shown to perform reasonably well in some situations. However, it was also proved that non-convex penalties like the pseudo ℓq-norm with q < 1 or the SCAD penalty are able to recover sparsity in a more efficient way than the Lasso. Several algorithms have been proposed for solving the resulting non-convex least-squares problem. This paper proposes a generic algorithm to address such a sparsity recovery problem for some class of non-convex penalties. Our main contribution is that the proposed methodology is based on an iterative algorithm which solves at each iteration a convex weighted Lasso problem. It relies on the family of non-convex penalties which can be decomposed as a difference of convex functions. This allows us to apply difference of convex functions programming, which is a generic and principled way of solving non-smooth and non-convex optimization problems. We also show that several algorithms in the literature dealing with non-convex penalties are particular instances of our algorithm. Experimental results demonstrate the effectiveness of the proposed generic framework compared to existing algorithms, including iterative reweighted least-squares methods.
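
A minimal sketch of the iterative scheme the abstract describes: the non-convex penalty is handled by repeatedly solving a convex weighted Lasso, with the weights coming from linearizing the concave part of the penalty. The log-sum penalty and the plain ISTA inner solver used here are illustrative assumptions, not the authors' exact choices.

```python
# Outer loop: DC/reweighting; inner loop: convex weighted Lasso solved by ISTA.
import numpy as np

def weighted_lasso_ista(X, y, weights, n_iter=500):
    """Minimize 0.5*||y - X b||^2 + sum_j weights[j]*|b_j| by ISTA."""
    L = np.linalg.norm(X, 2) ** 2              # Lipschitz constant of the gradient
    b = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad = X.T @ (X @ b - y)
        z = b - grad / L
        b = np.sign(z) * np.maximum(np.abs(z) - weights / L, 0.0)   # soft-threshold
    return b

def dc_reweighted_lasso(X, y, lam=0.1, eps=0.1, n_outer=10):
    """Weights w_j = lam / (eps + |b_j|), i.e. the log-sum penalty linearized."""
    p = X.shape[1]
    w = np.full(p, lam / eps)                   # first pass = ordinary Lasso
    b = np.zeros(p)
    for _ in range(n_outer):
        b = weighted_lasso_ista(X, y, w)
        w = lam / (eps + np.abs(b))             # re-linearize the concave part
    return b

# Tiny usage example
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 100))
b_true = np.zeros(100); b_true[:3] = [2.0, -1.5, 1.0]
y = X @ b_true + 0.05 * rng.standard_normal(50)
print(np.nonzero(np.abs(dc_reweighted_lasso(X, y)) > 1e-3)[0])
```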

Robust Lasso with missing and grossly corrupted observations

by Nam H. Nguyen, Trac D. Tran, 2011
"... ..."
Abstract - Cited by 21 (1 self) - Add to MetaCart
Abstract not found

Greedy sparsity-constrained optimization

by Sohail Bahmani, Petros Boufounos, Bhiksha Raj - in Conference Record of the Forty Fifth Asilomar Conference on Signals, Systems and Computers (ASILOMAR), IEEE, 2011
"... Abstract—Finding optimal sparse solutions to estimation problems, particularly in underdetermined regimes has recently gained much attention. Most existing literature study linear models in which the squared error is used as the measure of discrepancy to be minimized. However, in many applications d ..."
Abstract - Cited by 20 (4 self) - Add to MetaCart
Abstract: Finding optimal sparse solutions to estimation problems, particularly in underdetermined regimes, has recently gained much attention. Most of the existing literature studies linear models in which the squared error is used as the measure of discrepancy to be minimized. However, in many applications discrepancy is measured in more general forms such as the log-likelihood. Regularization by the ℓ1-norm has been shown to induce sparse solutions, but their sparsity level can be merely suboptimal. In this paper we present a greedy algorithm, dubbed Gradient Support Pursuit (GraSP), for sparsity-constrained optimization. Quantifiable guarantees are provided for GraSP when cost functions have the “Stable Hessian Property”.
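
A minimal numpy sketch of the greedy scheme the abstract names Gradient Support Pursuit: pick the 2s coordinates with the largest gradient magnitude, merge them with the current support, minimize the loss restricted to that merged support, then prune back to s coordinates. The logistic loss and the crude fixed-step inner solver are assumptions made here for illustration, not the authors' exact algorithm.

```python
import numpy as np

def logistic_grad(X, y, b):
    """Gradient of the mean logistic loss, labels y in {-1, +1}."""
    return -X.T @ (y / (1.0 + np.exp(y * (X @ b)))) / len(y)

def grasp(X, y, s, n_outer=20, inner_steps=100, lr=1.0):
    p = X.shape[1]
    b = np.zeros(p)
    for _ in range(n_outer):
        g = logistic_grad(X, y, b)
        Z = np.argsort(-np.abs(g))[:2 * s]                 # largest gradient entries
        T = np.union1d(Z, np.nonzero(b)[0]).astype(int)    # merge with current support
        bT = b[T].copy()
        for _ in range(inner_steps):                       # restricted minimization
            bT -= lr * logistic_grad(X[:, T], y, bT)
        b = np.zeros(p)
        b[T] = bT
        keep = np.argsort(-np.abs(b))[:s]                  # prune to the s largest
        mask = np.zeros(p, dtype=bool); mask[keep] = True
        b[~mask] = 0.0
    return b
```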

Sparsistent learning of varying-coefficient models with structural changes.

by Mladen Kolar, Le Song, Eric P. Xing - Advances in Neural Information Processing Systems (NIPS), 2009
"... Abstract To estimate the changing structure of a varying-coefficient varying-structure (VCVS) model remains an important and open problem in dynamic system modelling, which includes learning trajectories of stock prices, or uncovering the topology of an evolving gene network. In this paper, we inve ..."
Abstract - Cited by 19 (1 self) - Add to MetaCart
Abstract: Estimating the changing structure of a varying-coefficient varying-structure (VCVS) model remains an important and open problem in dynamic system modelling, which includes learning trajectories of stock prices or uncovering the topology of an evolving gene network. In this paper, we investigate sparsistent learning of a sub-family of this model: piecewise-constant VCVS models. We analyze two main issues in this problem: inferring time points where structural changes occur, and estimating model structure (i.e., model selection) on each of the constant segments. We propose a two-stage adaptive procedure, which first identifies jump points of structural changes and then identifies the covariates relevant to the response on each of the segments. We provide an asymptotic analysis of the procedure, showing that with increasing sample size, number of structural changes, and number of variables, the true model can be consistently selected. We demonstrate the performance of the method on synthetic data and apply it to a brain-computer interface dataset. We also consider how this applies to structure estimation of time-varying probabilistic graphical models.

Citation Context

... A2: We assume there is a constant 0 < d ≤ 1 such that P( max_{k ∈ S_{B_j}, l ≠ k} |σ^j_{kl}| ≤ d / |S_{B_j}| ) = 1. (15) Assumption A2 is a mild version of the mutual coherence condition used in [7], which is necessary for identification of the relevant covariates in each segment. Let γ̂_j, j = 1, ..., B̂_n, denote the Lasso estimates for each segment obtained by minimizing (8). Theorem 2. Let A2 be...
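
A sketch of the second stage described in the abstract above: given estimated jump points, a separate Lasso is fit on each constant segment to select the covariates relevant there. The jump points are taken as given, and scikit-learn's Lasso is an implementation assumption rather than the authors' code.

```python
import numpy as np
from sklearn.linear_model import Lasso

def per_segment_lasso(X, y, jump_points, alpha=0.05):
    """X: (T, p) time-ordered design, y: (T,) responses,
    jump_points: sorted indices where a new segment starts (excluding 0)."""
    boundaries = [0] + list(jump_points) + [len(y)]
    estimates = []
    for start, end in zip(boundaries[:-1], boundaries[1:]):
        model = Lasso(alpha=alpha).fit(X[start:end], y[start:end])
        estimates.append(model.coef_)          # per-segment coefficient estimate
    return estimates
```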

On Time Varying Undirected Graphs

by Mladen Kolar, Eric P. Xing
"... The time-varying multivariate Gaussian distribution and the undirected graph associated with it, as introduced in Zhou et al. (2008), provide a useful statistical framework for modeling complex dynamic networks. In many application domains, it is of high importance to estimate the graph structure of ..."
Abstract - Cited by 10 (1 self) - Add to MetaCart
The time-varying multivariate Gaussian distribution and the undirected graph associated with it, as introduced in Zhou et al. (2008), provide a useful statistical framework for modeling complex dynamic networks. In many application domains, it is of high importance to estimate the graph structure of the model consistently for the purpose of scientific discovery. In this paper, we show that under suitable technical conditions, the structure of the undirected graphical model can be consistently estimated in the high dimensional setting, when the dimensionality of the model is allowed to diverge with the sample size. The model selection consistency is shown for the procedure proposed in Zhou et al. (2008) and for the modified neighborhood selection procedure of Meinshausen and Bühlmann (2006).

Dimension reduction and variable selection in case-control studies via regularized likelihood optimization

by Florentina Bunea, Adrian Barbu
"... Abstract. Dimension reduction and variable selection are performed routinely in case-control studies, but the literature on the theoretical aspects of the resulting estimates is scarce. We bring our contribution to this literature by studying estimators obtained via ℓ1 penalized likelihood optimizat ..."
Abstract - Cited by 7 (2 self) - Add to MetaCart
Abstract. Dimension reduction and variable selection are performed routinely in case-control studies, but the literature on the theoretical aspects of the resulting estimates is scarce. We contribute to this literature by studying estimators obtained via ℓ1 penalized likelihood optimization. We show that the optimizers of the ℓ1 penalized retrospective likelihood coincide with the optimizers of the ℓ1 penalized prospective likelihood. This extends the results of Prentice and Pyke (1979), obtained for non-regularized likelihoods. We establish both the sup-norm consistency of the odds ratio, after model selection, and the consistency of subset selection of our estimators. The novelty of our theoretical results consists in the study of these properties under the case-control sampling scheme. Our results hold for selection performed over a large collection of candidate variables, with cardinality allowed to depend on, and be greater than, the sample size. We complement our theoretical results with a novel approach to determining data-driven tuning parameters, based on the bisection method. The resulting procedure offers significant computational savings when compared with grid-search-based methods. All our numerical experiments strongly support our theoretical findings.
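
The abstract does not spell out the bisection criterion, so the sketch below only illustrates the general idea: the number of variables selected by an ℓ1-penalized logistic fit is roughly monotone in the penalty level, so bisection over the penalty reaches a target model size in a logarithmic number of fits rather than a full grid search. The target-size criterion and the scikit-learn solver are assumptions made here for illustration, not the paper's procedure.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def n_selected(X, y, lam):
    """Number of nonzero coefficients in an l1-penalized logistic fit at penalty lam."""
    model = LogisticRegression(penalty="l1", C=1.0 / lam, solver="liblinear",
                               max_iter=1000).fit(X, y)
    return int(np.sum(np.abs(model.coef_) > 1e-8))

def bisect_lambda(X, y, target_size, lam_lo=1e-4, lam_hi=1e3, n_steps=20):
    """Bisection (on a log scale) for a penalty whose model has ~target_size variables."""
    for _ in range(n_steps):
        lam = np.sqrt(lam_lo * lam_hi)
        if n_selected(X, y, lam) > target_size:
            lam_lo = lam                       # too many variables: penalize more
        else:
            lam_hi = lam                       # too few variables: penalize less
    return np.sqrt(lam_lo * lam_hi)
```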

On model selection consistency of M-estimators with geometrically decomposable penalties

by Jason D. Lee, Yuekai Sun, Jonathan E. Taylor - Advances in Neural Information Processing Systems, 2013
"... Penalized M-estimators are used in diverse areas of science and engineering to fit high-dimensional models with some low-dimensional structure. Often, the penal-ties are geometrically decomposable, i.e. can be expressed as a sum of support functions over convex sets. We generalize the notion of irre ..."
Abstract - Cited by 6 (1 self) - Add to MetaCart
Penalized M-estimators are used in diverse areas of science and engineering to fit high-dimensional models with some low-dimensional structure. Often, the penalties are geometrically decomposable, i.e. can be expressed as a sum of support functions over convex sets. We generalize the notion of irrepresentable to geometrically decomposable penalties and develop a general framework for establishing consistency and model selection consistency of M-estimators with such penalties. We then use this framework to derive results for some special cases of interest in bioinformatics and statistical learning.

Citation Context

...sistency. The model selection consistency of penalized M-estimators has also been extensively studied. The most commonly studied problems are (i) the lasso [30, 26], (ii) GLM’s with the lasso penalty [4, 19, 28], (iii) covariance estimation [15, 12, 20] and (more generally) structure learning [6, 14]. There are also general results concerning M-estimators with sparsity inducing penalties [29, 16, 11, 22, 8, ...
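
As a reading of the abstract's key notion, a geometrically decomposable penalty is a sum of support functions of convex sets. A sketch of the general form (the specific sets are application-dependent and not quoted from the paper):

$$
\rho(\theta) = \sum_{i} h_{C_i}(\theta), \qquad h_{C}(\theta) := \sup_{u \in C} \langle u, \theta \rangle, \qquad \hat{\theta} \in \arg\min_{\theta} \; \ell(\theta) + \lambda\, \rho(\theta).
$$

For instance, the lasso penalty is itself a support function: ‖θ‖₁ = sup{⟨u, θ⟩ : ‖u‖∞ ≤ 1}.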
