Results 1–10 of 28
Computational methods for sparse solution of linear inverse problems
, 2009
"... The goal of sparse approximation problems is to represent a target signal approximately as a linear combination of a few elementary signals drawn from a fixed collection. This paper surveys the major practical algorithms for sparse approximation. Specific attention is paid to computational issues, ..."
Abstract

Cited by 62 (0 self)
The goal of sparse approximation problems is to represent a target signal approximately as a linear combination of a few elementary signals drawn from a fixed collection. This paper surveys the major practical algorithms for sparse approximation. Specific attention is paid to computational issues, to the circumstances in which individual methods tend to perform well, and to the theoretical guarantees available. Many fundamental questions in electrical engineering, statistics, and applied mathematics can be posed as sparse approximation problems, making these algorithms versatile and relevant to a wealth of applications.
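For illustration, one of the greedy methods such surveys cover, Orthogonal Matching Pursuit, can be sketched in a few lines of NumPy (a generic textbook version under illustrative names, not code from this paper):

```python
import numpy as np

def omp(A, y, k):
    """Orthogonal Matching Pursuit: greedily pick up to k atoms of A to approximate y.

    A: (m, n) dictionary whose columns are the elementary signals;
    y: (m,) target signal; k: sparsity level.
    Returns a coefficient vector x with at most k nonzeros.
    """
    m, n = A.shape
    residual = y.astype(float).copy()
    support = []
    x = np.zeros(n)
    for _ in range(k):
        # pick the atom most correlated with the current residual
        j = int(np.argmax(np.abs(A.T @ residual)))
        if j not in support:
            support.append(j)
        # re-fit y on all chosen atoms by least squares (the "orthogonal" step)
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef
    x[support] = coef
    return x
```

On an orthonormal dictionary the method recovers the largest-magnitude coefficients exactly; for general dictionaries its guarantees depend on coherence conditions of the kind the survey discusses.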
Lange K: Genome-wide Association Analysis by Lasso Penalized Logistic Regression
 Bioinformatics
"... Motivation: In ordinary regression, imposition of a lasso penalty makes continuous model selection straightforward. Lasso penalized regression is particularly advantageous when the number of predictors far exceeds the number of observations. Method: The present paper evaluates the performance of las ..."
Abstract

Cited by 23 (2 self)
Motivation: In ordinary regression, imposition of a lasso penalty makes continuous model selection straightforward. Lasso penalized regression is particularly advantageous when the number of predictors far exceeds the number of observations. Method: The present paper evaluates the performance of lasso penalized logistic regression in case-control disease gene mapping with a large number of SNP (single-nucleotide polymorphism) predictors. The strength of the lasso penalty can be tuned to select a predetermined number of the most relevant SNPs and other predictors. For a given value of the tuning constant, the penalized likelihood is quickly maximized by cyclic coordinate ascent. Once the most potent marginal predictors are identified, their two-way and higher-order interactions can also be examined by lasso penalized logistic regression. Results: This strategy is tested on both simulated and real data. Our findings on coeliac disease replicate the previous single-SNP results and shed light on possible interactions among the SNPs. Availability: The software discussed is available in Mendel 9.0.
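The cyclic coordinate update for a lasso-penalized logistic likelihood can be sketched as below. This is a generic majorize-minimize variant (minimizing the negative penalized log-likelihood, and using the 1/4 curvature bound of the logistic loss), offered as an illustration rather than the authors' Mendel implementation:

```python
import numpy as np

def soft_threshold(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_logistic_cd(X, y, lam, n_sweeps=200):
    """Cyclic coordinate descent for lasso-penalized logistic regression.

    Minimizes mean log(1 + exp(-y_i * x_i @ beta)) + lam * ||beta||_1,
    with labels y in {-1, +1}. Each coordinate step uses the curvature
    bound 1/4 of the logistic loss, so every update has a closed
    soft-thresholded form and the objective decreases monotonically.
    """
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(n_sweeps):
        margins = y * (X @ beta)
        for j in range(p):
            s = 1.0 / (1.0 + np.exp(margins))      # sigmoid(-margin_i)
            g = -np.mean(y * X[:, j] * s)           # gradient in coordinate j
            h = 0.25 * np.mean(X[:, j] ** 2)        # curvature upper bound
            new = soft_threshold(h * beta[j] - g, lam) / h
            margins += y * X[:, j] * (new - beta[j])  # keep margins in sync
            beta[j] = new
    return beta
```

Tuning lam up drives more coefficients exactly to zero, which is how the penalty strength selects a predetermined number of SNPs.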
Accelerated block-coordinate relaxation for regularized optimization
 SIAM J. Optim
"... Abstract. We discuss minimization of a smooth function to which is added a separable regularization function that induces structure in the solution. A blockcoordinate relaxation approach with proximal linearized subproblems yields convergence to critical points, while identification of the optimal ..."
Abstract

Cited by 17 (3 self)
We discuss minimization of a smooth function to which is added a separable regularization function that induces structure in the solution. A block-coordinate relaxation approach with proximal linearized subproblems yields convergence to critical points, while identification of the optimal manifold (under a nondegeneracy condition) allows acceleration techniques to be applied on a reduced space. The work is motivated by experience with an algorithm for regularized logistic regression, and computational results for the algorithm on problems of this type are presented. Key words: regularized optimization, block coordinate relaxation, active manifold identification. AMS subject classifications: 49K40, 49M27, 90C31.
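A minimal sketch of such a proximal linearized block step, specialized to an ℓ1 regularizer (the fixed step size, cyclic block order, and function names are illustrative assumptions, not the paper's algorithm):

```python
import numpy as np

def soft(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def block_prox_descent(grad_f, x0, blocks, lam, step, n_iters=100):
    """Block-coordinate relaxation with proximal linearized subproblems.

    Target: min f(x) + lam * ||x||_1 with f smooth. At each iteration one
    block b of variables is updated by solving
        min_d  g @ d + (1 / (2*step)) ||d||^2 + lam ||x_b + d||_1,
    the linearization of f plus a proximal term plus the regularizer; for
    the l1 penalty this has the closed-form soft-thresholding solution.
    """
    x = x0.copy()
    for it in range(n_iters):
        b = blocks[it % len(blocks)]        # cycle through the blocks
        g = grad_f(x)[b]                    # gradient of f restricted to block b
        x[b] = soft(x[b] - step * g, step * lam)
    return x
```

The separable regularizer is what makes the block subproblem decouple; the acceleration in the paper happens after the active manifold (the zero pattern here) has been identified.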
Lifted coordinate descent for learning with trace-norm regularization
 AISTATS
, 2012
"... We consider the minimization of a smooth loss with tracenorm regularization, which is a natural objective in multiclass and multitask learning. Even though the problem is convex, existing approaches rely on optimizing a nonconvex variational bound, which is not guaranteed to converge, or repeated ..."
Abstract

Cited by 8 (1 self)
We consider the minimization of a smooth loss with trace-norm regularization, which is a natural objective in multi-class and multi-task learning. Even though the problem is convex, existing approaches either rely on optimizing a nonconvex variational bound, which is not guaranteed to converge, or repeatedly perform singular-value decomposition, which prevents scaling beyond moderate matrix sizes. We lift the nonsmooth convex problem into an infinite-dimensional smooth problem and apply coordinate descent to solve it. We prove that our approach converges to the optimum and is competitive with, or outperforms, the state of the art.
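The per-iteration SVD step that such baselines repeat, the proximal operator of the trace norm, can be sketched as follows (a generic illustration of the cost the paper's lifted method is designed to avoid, not the paper's own algorithm):

```python
import numpy as np

def prox_trace_norm(W, t):
    """Proximal operator of t * ||W||_* : soft-threshold the singular values.

    Proximal-gradient baselines for trace-norm regularization apply this
    at every iteration; the full SVD it requires is what prevents scaling
    beyond moderate matrix sizes.
    """
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return U @ np.diag(np.maximum(s - t, 0.0)) @ Vt
```

Shrinking small singular values to exactly zero is what produces the low-rank solutions desired in multi-task learning.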
A proximal method for composite minimization
, 2008
"... Abstract. We consider minimization of functions that are compositions of proxregular functions with smooth vector functions. A wide variety of important optimization problems can be formulated in this way. We describe a subproblem constructed from a linearized approximation to the objective and a r ..."
Abstract

Cited by 7 (2 self)
We consider minimization of functions that are compositions of prox-regular functions with smooth vector functions. A wide variety of important optimization problems can be formulated in this way. We describe a subproblem constructed from a linearized approximation to the objective and a regularization term, investigating the properties of local solutions of this subproblem and showing that they eventually identify a manifold containing the solution of the original problem. We propose an algorithmic framework based on this subproblem and prove a global convergence result.
Manifold Identification of Dual Averaging Methods for Regularized Stochastic Online Learning
"... Iterative methods that take steps in approximate subgradient directions have proved to be useful for stochastic learning problems over large or streaming data sets. When the objective consists of a loss function plus a nonsmooth regularization term, whose purpose is to induce structure (for example, ..."
Abstract

Cited by 5 (3 self)
Iterative methods that take steps in approximate subgradient directions have proved to be useful for stochastic learning problems over large or streaming data sets. When the objective consists of a loss function plus a nonsmooth regularization term, whose purpose is to induce structure (for example, sparsity) in the solution, the solution often lies on a low-dimensional manifold along which the regularizer is smooth. This paper shows that a regularized dual averaging algorithm can identify this manifold with high probability. This observation motivates an algorithmic strategy in which, once a near-optimal manifold is identified, we switch to an algorithm that searches only in this manifold, which typically has much lower intrinsic dimension than the full space, thus converging quickly to a near-optimal point with the desired structure. Computational results are presented to illustrate these claims.
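The closed-form ℓ1 step of regularized dual averaging (in the style of Xiao's RDA, which this line of work builds on) can be sketched as follows; the √t/γ step-size schedule and the function names are assumptions of this illustration:

```python
import numpy as np

def rda_l1(subgrad, w0, lam, gamma, n_steps):
    """l1-regularized dual averaging sketch.

    Keeps a running average of all past (sub)gradients of the loss and takes
    a closed-form step that soft-thresholds that average. Coordinates whose
    average gradient magnitude stays below lam are set exactly to zero, which
    is why the iterates can land on (identify) the sparse optimal manifold.
    """
    w = w0.copy()
    gbar = np.zeros_like(w0)
    for t in range(1, n_steps + 1):
        g = subgrad(w)
        gbar += (g - gbar) / t                      # running average of subgradients
        shrunk = np.sign(gbar) * np.maximum(np.abs(gbar) - lam, 0.0)
        w = -(np.sqrt(t) / gamma) * shrunk          # closed-form dual-averaging step
    return w
```

Note that sparsity here comes from thresholding the *averaged* gradient, not the iterate, which is the property the manifold-identification analysis exploits.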
A coordinate gradient descent method for ℓ1-regularized convex minimization
 Department of Mathematics, National University of Singapore
, 2008
"... In applications such as signal processing and statistics, many problems involve finding sparse solutions to underdetermined linear systems of equations. These problems can be formulated as a structured nonsmooth optimization problems, i.e., the problem of minimizing ℓ1regularized linear least squa ..."
Abstract

Cited by 4 (1 self)
In applications such as signal processing and statistics, many problems involve finding sparse solutions to underdetermined linear systems of equations. These problems can be formulated as structured nonsmooth optimization problems, i.e., the problem of minimizing ℓ1-regularized linear least squares. In this paper, we propose a block coordinate gradient descent method (abbreviated as CGD) to solve the more general ℓ1-regularized convex minimization problem, i.e., the problem of minimizing an ℓ1-regularized convex smooth function. We establish a Q-linear convergence rate for our method when the coordinate block is chosen by a Gauss-Southwell-type rule to ensure sufficient descent. We propose efficient implementations of the CGD method and report numerical results for solving large-scale ℓ1-regularized linear least squares problems arising in compressed sensing and image deconvolution, as well as large-scale ℓ1-regularized logistic regression problems for feature selection in data classification. Comparison with several state-of-the-art algorithms specifically designed for solving large-scale ℓ1-regularized linear least squares or logistic regression problems suggests that an efficiently implemented CGD method may outperform these algorithms despite the fact that the CGD method is not specifically designed to solve these special classes of problems. Key words: coordinate gradient descent, Q-linear convergence, ℓ1-regularization, compressed sensing, image deconvolution, linear least squares, logistic regression, convex optimization.
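A toy version of a Gauss-Southwell-style greedy rule for the ℓ1 case might look like this; the scalar curvature constant H and the single-coordinate, largest-step selection are simplifying assumptions of this sketch (the paper works with coordinate blocks and a Hessian approximation):

```python
import numpy as np

def soft(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def cgd_gauss_southwell(grad_f, x0, lam, H, n_iters=500):
    """Greedy coordinate gradient descent for min f(x) + lam * ||x||_1.

    Each coordinate's candidate step solves a quadratic model of f (with
    curvature H) plus the l1 term; a Gauss-Southwell-type rule then updates
    the coordinate with the largest candidate step, which is what ensures
    sufficient descent at every iteration.
    """
    x = x0.copy()
    for _ in range(n_iters):
        g = grad_f(x)
        d = soft(x - g / H, lam / H) - x        # prox step for every coordinate
        j = int(np.argmax(np.abs(d)))           # greedy rule: biggest step wins
        x[j] += d[j]
    return x
```

Compared with a cyclic sweep, the greedy rule spends its updates where the model predicts the most progress, which is the ingredient behind the Q-linear rate.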
Learning Higher-Order Graph Structure with Features by Structure Penalty
"... In discrete undirected graphical models, the conditional independence of node labels Y is specified by the graph structure. We study the case where there is another input random vector X (e.g. observed features) such that the distribution P(Y  X) is determined by functions of X that characterize th ..."
Abstract

Cited by 4 (2 self)
In discrete undirected graphical models, the conditional independence of node labels Y is specified by the graph structure. We study the case where there is another input random vector X (e.g. observed features) such that the distribution P(Y | X) is determined by functions of X that characterize the (higher-order) interactions among the Y's. The main contribution of this paper is to learn the graph structure and the functions conditioned on X at the same time. We prove that discrete undirected graphical models with feature X are equivalent to multivariate discrete models. The reparameterization of the potential functions in graphical models by conditional log odds ratios of the latter offers advantages in representation of the conditional independence structure. The functional spaces can be flexibly determined by kernels. Additionally, we impose a Structure Lasso (SLasso) penalty on groups of functions to learn the graph structure. These groups with overlaps are designed to enforce hierarchical function selection. In this way, we are able to shrink higher-order interactions to obtain a sparse graph structure.
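The blockwise shrinkage underlying group-lasso-type penalties such as SLasso can be sketched for the simpler disjoint-group case (overlapping groups, which this paper uses to enforce hierarchy, require additional machinery not shown here):

```python
import numpy as np

def group_soft_threshold(v, groups, t):
    """Exact proximal operator of t * sum_g ||v_g||_2 for disjoint groups.

    Each group of coefficients is scaled toward zero as a unit, and any
    group whose Euclidean norm falls below t is zeroed out entirely --
    so whole groups of functions are selected or discarded together.
    """
    out = v.copy()
    for g in groups:
        nrm = np.linalg.norm(v[g])
        scale = max(1.0 - t / nrm, 0.0) if nrm > 0 else 0.0
        out[g] = scale * v[g]
    return out
```

With overlapping groups, zeroing an outer group forces the interactions nested inside it to zero as well, which is the hierarchical selection effect described above.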
Multivariate Bernoulli Distribution Models
, 2012
"... First and most importantly, I would like to express my deepest gratitude toward my advisor Professor Grace Wahba. Her guidance and encouragement through my PhD study into various statistical machine learning methods is the key factor to the success of this dissertation. Grace is a brilliant and pass ..."
Abstract

Cited by 2 (2 self)
First and most importantly, I would like to express my deepest gratitude toward my advisor, Professor Grace Wahba. Her guidance and encouragement through my PhD study into various statistical machine learning methods is the key factor in the success of this dissertation. Grace is a brilliant and passionate statistician, and her insightful ideas in both statistical theory and applications inspire me. It is a great honor and privilege to have had the opportunity to work closely with and learn from her. This work is also the product of collaboration with a number of researchers. In particular, I would like to thank Professor Stephen Wright from the Department of Computer Science for his guidance in computation. Without him, the models proposed in this thesis could not have been solved with efficient optimization techniques. In addition, I am grateful to the other professors on my thesis committee. I benefited from Professor Sündüz Keles's expertise in biostatistics and her valuable ideas in the Thursday group. Professors Peter Qian and Sijian Wang raised perceptive questions and helped greatly improve the thesis. I am also grateful to Professors Karl Rohe and Xinwei Deng for their suggestions for improving this work. I want to thank Xiwen Ma and Shilin Ding for their effort on our collaborative work.
LASSO-Patternsearch Algorithm
, 2008
"... The LASSOPatternsearch Algorithm and its variant the Grouped LASSOPatternsearch Algorithm are proposed to efficiently identify patterns of multiple dichotomous risk factors for outcomes of interest in demographic and genomic studies. The patterns considered are those that arise naturally from the ..."
Abstract

Cited by 1 (0 self)
The LASSO-Patternsearch Algorithm and its variant, the Grouped LASSO-Patternsearch Algorithm, are proposed to efficiently identify patterns of multiple dichotomous risk factors for outcomes of interest in demographic and genomic studies. The patterns considered are those that arise naturally from the log-linear expansion of the multivariate Bernoulli density. Both methods are designed for the case where there is a possibly very large number of candidate patterns but it is believed that only a relatively small number are important. In the LASSO-Patternsearch Algorithm, a LASSO is used to greatly reduce the number of candidate patterns, using a novel computational algorithm that can handle an extremely large number of unknowns simultaneously. The patterns surviving the LASSO are further pruned in the framework of (parametric) generalized linear models. A novel tuning procedure based on the GACV for Bernoulli outcomes, modified to act as a model selector, is used at both steps. We applied the method to myopia data from the population-based Beaver Dam Eye Study, exposing physiologically interesting interacting risk factors.
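The candidate patterns that the LASSO step screens can be illustrated by expanding binary risk factors into main effects and products, i.e. the terms of a log-linear expansion (a toy feature construction; the function name and interface are illustrative, not the paper's software):

```python
import numpy as np
from itertools import combinations

def pattern_features(B, max_order=2):
    """Expand dichotomous risk factors into candidate pattern indicators.

    B is an (n_samples, p) 0/1 matrix. Returns a matrix whose columns are
    the indicators of every pattern up to max_order: single factors first,
    then products B_j * B_k, and so on, together with the index tuples.
    The column count grows combinatorially in p and max_order, which is
    why a LASSO screen over the candidates is needed.
    """
    n, p = B.shape
    cols, names = [], []
    for order in range(1, max_order + 1):
        for idx in combinations(range(p), order):
            cols.append(B[:, list(idx)].prod(axis=1))   # indicator of the joint pattern
            names.append(idx)
    return np.column_stack(cols), names
```

With p factors and max_order = 2 this already yields p + p(p-1)/2 candidate patterns, so the screening stage dominates the pipeline as p grows.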