Results 1–10 of 21
Sparse Inverse Covariance Matrix Estimation Using Quadratic Approximation
Cited by 44 (9 self)

Abstract:
The ℓ1-regularized Gaussian maximum likelihood estimator has been shown to have strong statistical guarantees in recovering a sparse inverse covariance matrix, or alternatively the underlying graph structure of a Gaussian Markov Random Field, from very limited samples. We propose a novel algorithm for solving the resulting optimization problem, which is a regularized log-determinant program. In contrast to other state-of-the-art methods that largely use first-order gradient information, our algorithm is based on Newton's method and employs a quadratic approximation, but with some modifications that leverage the structure of the sparse Gaussian MLE problem. We show that our method is superlinearly convergent, and also present experimental results using synthetic and real application data that demonstrate the considerable improvements in performance of our method when compared to other state-of-the-art methods.
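The objective described here can be written down concretely. Below is a minimal sketch of the ℓ1-regularized log-determinant objective itself; the function name and toy data are illustrative, and this is only the objective being minimized, not the authors' Newton-based algorithm:

```python
import numpy as np

def graphical_lasso_objective(X, S, lam):
    """f(X) = -log det X + tr(S X) + lam * ||X||_1 for positive definite X.

    S is the sample covariance and lam the l1 weight.  (Here the l1 norm
    includes the diagonal; some formulations penalize only off-diagonals.)
    """
    try:
        L = np.linalg.cholesky(X)   # fails iff X is not positive definite
    except np.linalg.LinAlgError:
        return np.inf
    logdet = 2.0 * np.log(np.diag(L)).sum()
    return -logdet + np.trace(S @ X) + lam * np.abs(X).sum()

# tiny 2x2 illustration starting from the identity
S = np.array([[1.0, 0.3],
              [0.3, 1.0]])
print(graphical_lasso_objective(np.eye(2), S, lam=0.1))  # -> 2.2
```

At the identity the log-determinant vanishes, so the value is just tr(S) plus the penalty on the two unit diagonal entries.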
Sparse inverse covariance selection via alternating linearization methods
Cited by 34 (4 self)

Abstract:
Gaussian graphical models are of great interest in statistical learning. Because the conditional independencies between different nodes correspond to zero entries in the inverse covariance matrix of the Gaussian distribution, one can learn the structure of the graph by estimating a sparse inverse covariance matrix from sample data, by solving a convex maximum likelihood problem with an ℓ1-regularization term. In this paper, we propose a first-order method based on an alternating linearization technique that exploits the problem's special structure; in particular, the subproblems solved in each iteration have closed-form solutions. Moreover, our algorithm obtains an ϵ-optimal solution in O(1/ϵ) iterations. Numerical experiments on both synthetic and real data from gene association networks show that a practical version of this algorithm outperforms other competitive algorithms.
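The closed-form subproblems mentioned above can be sketched. Assuming the usual splitting of the objective into a smooth log-determinant part and an ℓ1 part, the two proximal updates look roughly like this (function names are hypothetical; this shows the building blocks, not the authors' full method):

```python
import numpy as np

def prox_logdet(W, S, mu):
    """Closed-form minimizer of -log det X + tr(S X) + ||X - W||_F^2 / (2 mu).

    Setting the gradient to zero gives mu * X^{-1} = X - (W - mu*S); with the
    eigendecomposition W - mu*S = Q diag(d) Q^T, each eigenvalue of X solves
    x^2 - d*x - mu = 0, whose positive root keeps X positive definite.
    """
    d, Q = np.linalg.eigh(W - mu * S)
    x = (d + np.sqrt(d**2 + 4.0 * mu)) / 2.0
    return (Q * x) @ Q.T

def prox_l1(V, t):
    """Closed-form minimizer of t*||Y||_1 + ||Y - V||_F^2 / 2 (soft threshold)."""
    return np.sign(V) * np.maximum(np.abs(V) - t, 0.0)
```

Alternating between updates of this kind, with linearization terms coupling them, is the general shape of the scheme; each step costs one eigendecomposition or one elementwise pass.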
Newton-Like Methods for Sparse Inverse Covariance Estimation
, 2012
Cited by 14 (2 self)

Abstract:
We propose two classes of second-order optimization methods for solving the sparse inverse covariance estimation problem. The first approach, which we call the Newton-LASSO method, minimizes a piecewise quadratic model of the objective function at every iteration to generate a step. We employ the fast iterative shrinkage-thresholding method (FISTA) to solve this subproblem. The second approach, which we call the Orthant-Based Newton method, is a two-phase algorithm that first identifies an orthant face and then minimizes a smooth quadratic approximation of the objective function using the conjugate gradient method. These methods exploit the structure of the Hessian to efficiently compute the search direction and to avoid explicitly storing the Hessian. We show that quasi-Newton methods are also effective in this context, and describe a limited-memory BFGS variant of the orthant-based Newton method. We present numerical results that suggest that all the techniques described in this paper have attractive properties and constitute useful tools for solving the sparse inverse covariance estimation problem. Comparisons with the method implemented in the QUIC software package [1] are presented.
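As a rough illustration of the FISTA building block mentioned above, here is the standard accelerated proximal gradient scheme applied to a generic lasso problem (hypothetical names; the paper applies FISTA to a piecewise quadratic model of the log-determinant objective, not to this plain least-squares form):

```python
import numpy as np

def fista_lasso(A, b, lam, n_iter=200):
    """Generic FISTA for min_x 0.5*||A x - b||^2 + lam*||x||_1."""
    L = np.linalg.norm(A, 2) ** 2          # Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    y, t = x.copy(), 1.0
    for _ in range(n_iter):
        g = A.T @ (A @ y - b)              # gradient of the smooth part at y
        z = y - g / L
        x_new = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # prox step
        t_new = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        y = x_new + ((t - 1.0) / t_new) * (x_new - x)              # momentum
        x, t = x_new, t_new
    return x
```

With A = I the minimizer is simply the soft-thresholded b, which makes a convenient sanity check.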
Penalty Decomposition Methods for ℓ0-Norm Minimization
, 2010
Cited by 8 (2 self)

Abstract:
In this paper we consider general ℓ0-norm minimization problems, that is, problems with the ℓ0-norm appearing in either the objective function or a constraint. In particular, we first reformulate the ℓ0-norm constrained problem as an equivalent rank minimization problem and then apply the penalty decomposition (PD) method proposed in [33] to solve the latter problem. By utilizing the special structures, we then transform all matrix operations of this method into vector operations and obtain a PD method that only involves vector operations. Under some suitable assumptions, we establish that any accumulation point of the sequence generated by the PD method satisfies a first-order optimality condition that is generally stronger than one natural optimality condition. We further extend the PD method to solve the problem with the ℓ0-norm appearing in the objective function. Finally, we test the performance of our PD methods by applying them to compressed sensing, sparse logistic regression and sparse inverse covariance selection. The computational results demonstrate that our methods generally outperform the existing methods in terms of solution quality and/or speed.

Key words: ℓ0-norm minimization, penalty decomposition methods, compressed sensing, sparse logistic regression, sparse inverse covariance selection
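One reason an ℓ0-constrained block update is cheap is that the Euclidean projection onto the ℓ0 ball has a closed form: keep the largest-magnitude entries and zero out the rest. A minimal sketch (the helper name is illustrative, not from the paper):

```python
import numpy as np

def project_l0(v, k):
    """Euclidean projection of v onto {x : ||x||_0 <= k}: keep the k
    largest-magnitude entries and zero out the rest (closed form)."""
    if k >= v.size:
        return v.copy()
    out = np.zeros_like(v)
    idx = np.argpartition(np.abs(v), -k)[-k:]   # indices of the k largest |v_i|
    out[idx] = v[idx]
    return out

print(project_l0(np.array([3.0, -0.5, 2.0, 0.1]), 2))  # -> [3. 0. 2. 0.]
```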
Sparse approximation via penalty decomposition methods
SIAM J. Optim.
, 2013
Cited by 8 (4 self)

Abstract:
In this paper we consider sparse approximation problems, that is, general ℓ0 minimization problems with the ℓ0 "norm" of a vector being part of the constraints or the objective function. In particular, we first study the first-order optimality conditions for these problems. We then propose penalty decomposition (PD) methods for solving them, in which a sequence of penalty subproblems is solved by a block coordinate descent (BCD) method. Under some suitable assumptions, we establish that any accumulation point of the sequence generated by the PD methods satisfies the first-order optimality conditions of the problems. Furthermore, for the problems in which the ℓ0 part is the only nonconvex part, we show that such an accumulation point is a local minimizer of the problems. In addition, we show that any accumulation point of the sequence generated by the BCD method is a block coordinate minimizer of the penalty subproblem. Moreover, for the problems in which the ℓ0 part is the only nonconvex part, we establish that such an accumulation point is a local minimizer of the penalty subproblem. Finally, we test the performance of our PD methods by applying them to sparse logistic regression, sparse inverse covariance selection, and compressed sensing problems. The computational results demonstrate that when solutions of the same cardinality are sought, our approach applied to the ℓ0-based models generally has better solution quality and/or speed than the existing approaches applied to the corresponding ℓ1-based models.

Key words: ℓ0 minimization, penalty decomposition methods, block coordinate descent method, compressed sensing, sparse logistic regression, sparse inverse covariance selection
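A single BCD sweep on one penalty subproblem can be sketched as follows, assuming a least-squares data term for concreteness (the names and the specific penalty form are illustrative, not the paper's exact formulation):

```python
import numpy as np

def bcd_sweep(A, b, y, rho, k):
    """One block coordinate descent sweep on the penalty subproblem
    min_{x,y} 0.5*||A x - b||^2 + (rho/2)*||x - y||^2  s.t. ||y||_0 <= k.

    x-update: an unconstrained regularized least-squares problem (closed form);
    y-update: projection of x onto the l0 ball (keep the k largest entries).
    """
    n = A.shape[1]
    x = np.linalg.solve(A.T @ A + rho * np.eye(n), A.T @ b + rho * y)
    y_new = np.zeros(n)
    idx = np.argpartition(np.abs(x), -k)[-k:] if k < n else np.arange(n)
    y_new[idx] = x[idx]
    return x, y_new
```

The PD method then increases rho between sweeps so that x and y are driven together, which is where the accumulation-point analysis in the abstract applies.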
Large Scale Distributed Sparse Precision Estimation
Cited by 3 (1 self)

Abstract:
We consider the problem of sparse precision matrix estimation in high dimensions using the CLIME estimator, which has several desirable theoretical properties. We present an inexact alternating direction method of multipliers (ADMM) algorithm for CLIME, and establish rates of convergence for both the objective and the optimality conditions. Further, we develop a large-scale distributed framework for the computations, which scales to millions of dimensions and trillions of parameters using hundreds of cores. The proposed framework solves CLIME in column blocks and only involves element-wise operations and parallel matrix multiplications. We evaluate our algorithm on both shared-memory and distributed-memory architectures, which can use block-cyclic distribution of data and parameters to achieve load balance and improve the efficiency in the use of memory hierarchies. Experimental results show that our algorithm is substantially more scalable than state-of-the-art methods and scales almost linearly with the number of cores.
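For concreteness, the CLIME formulation referenced above seeks the minimum-ℓ1 matrix X satisfying the element-wise constraint ‖SX − I‖max ≤ λ, and each column of X enters the constraint only through its own residual, which is what makes column-block parallelism natural. A small hedged helper (name illustrative) that evaluates the objective and checks feasibility:

```python
import numpy as np

def clime_check(X, S, lam):
    """Evaluate a candidate X for CLIME: min ||X||_1 s.t. ||S X - I||_max <= lam.

    Returns (objective, max constraint residual, feasible?).  Note the residual
    decouples over columns: column x_j only appears through S x_j - e_j.
    """
    n = S.shape[0]
    violation = np.abs(S @ X - np.eye(n)).max()
    return np.abs(X).sum(), violation, violation <= lam
```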
Alternating proximal gradient method for convex minimization
Cited by 3 (1 self)

Abstract:
In this paper, we propose an alternating proximal gradient method that solves convex minimization problems with three or more separable blocks in the objective function. Our method is based on the framework of the alternating direction method of multipliers. The main computational effort in each iteration of the proposed method is computing the proximal mappings of the involved convex functions. A global convergence result for the proposed method is established. We show that many interesting problems arising from machine learning, statistics, medical imaging and computer vision can be solved by the proposed method. Numerical results on problems such as latent variable graphical model selection, stable principal component pursuit and compressive principal component pursuit are presented.
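The proximal mappings that dominate each iteration have simple closed forms for the penalties in the applications named above. A sketch of two common ones, the ℓ1 prox and the nuclear-norm prox used in the low-rank problems mentioned (function names are illustrative):

```python
import numpy as np

def prox_l1(V, t):
    """prox of t*||.||_1: entrywise soft thresholding (closed form)."""
    return np.sign(V) * np.maximum(np.abs(V) - t, 0.0)

def prox_nuclear(V, t):
    """prox of t*||.||_* (nuclear norm): soft-threshold the singular values.
    This is the low-rank prox appearing in latent variable model selection
    and the principal component pursuit problems."""
    U, s, Vt = np.linalg.svd(V, full_matrices=False)
    return (U * np.maximum(s - t, 0.0)) @ Vt
```

Both cost an elementwise pass or one SVD, which is why "compute the proximal mappings" is the per-iteration workhorse.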
A Divide-and-Conquer Procedure for Sparse Inverse Covariance Estimation
Cited by 2 (1 self)

Abstract:
We consider the composite log-determinant optimization problem arising from the ℓ1-regularized Gaussian maximum likelihood estimation of a sparse inverse covariance matrix, in a high-dimensional setting with a very large number of variables. Recent work has shown this estimator to have strong statistical guarantees in recovering the true structure of the sparse inverse covariance matrix, or alternatively the underlying graph structure of the corresponding Gaussian Markov Random Field, even in very high-dimensional regimes with a limited number of samples. In this paper, we are concerned with the computational cost of solving the above optimization problem. Our proposed algorithm partitions the problem into smaller subproblems, and uses the solutions of the subproblems to build a good approximation for the original problem. Our key idea for the divide step, which obtains a subproblem partition, is as follows: we first derive a tractable bound on the quality of the approximate solution obtained from solving the subdivided problems. Based on this bound, we propose a clustering algorithm that attempts to minimize it, in order to find effective partitions of the variables. For the conquer step, we use the approximate solution, i.e., the solution resulting from solving the subproblems, as an initial point for solving the original problem, and thereby achieve a much faster computational procedure.
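The conquer step can be sketched as assembling a block-diagonal initial point from the per-cluster solutions. In the sketch below, `solve_block` stands in for whatever sparse inverse covariance solver is run on each submatrix; plain inversion is used purely as a placeholder, not as the paper's subproblem solver:

```python
import numpy as np

def conquer_init(S, clusters, solve_block=np.linalg.inv):
    """Assemble a block-diagonal initial point from per-cluster subproblem
    solutions.  Each cluster is a list of variable indices; the full problem
    is then warm-started from X0."""
    X0 = np.zeros_like(S)
    for idx in clusters:
        ii = np.ix_(idx, idx)
        X0[ii] = solve_block(S[ii])
    return X0
```

When S is exactly block diagonal with respect to the partition, this initial point already inverts S, which is the intuition behind clustering to minimize the approximation bound.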
A PROXIMAL POINT ALGORITHM FOR LOG-DETERMINANT OPTIMIZATION WITH GROUP LASSO REGULARIZATION
Cited by 1 (0 self)

Abstract:
We consider the covariance selection problem where variables are clustered into groups and the inverse covariance matrix is expected to have a blockwise sparse structure. This problem is realized by penalizing the maximum likelihood estimation of the inverse covariance matrix with group Lasso regularization. We propose to solve the resulting log-determinant optimization problem with the classical proximal point algorithm (PPA). At each iteration, as it is difficult to update the primal variables directly, we first solve the dual subproblem by a Newton-CG method and then update the primal variables by explicit formulas based on the computed dual variables. We also propose to accelerate the PPA by an inexact generalized Newton's method when the iterate is close to the solution. Theoretically, we prove that, at the optimal solution, the negative definiteness of the generalized Hessian matrices of the dual objective function is equivalent to the constraint nondegeneracy condition for the primal problem. Global and local convergence results are presented for the proposed PPA. Moreover, based on the augmented Lagrangian function of the dual problem, we derive an alternating direction method (ADM) which is easily implementable and demonstrated to be efficient for some random problems. Numerical results, including comparisons with the ADM, demonstrate that the proposed Newton-CG based PPA is stable, efficient and, in particular, outperforms the ADM, especially when higher accuracy is required.
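The group Lasso penalty referenced above has a simple proximal map: each block is shrunk toward zero and is zeroed out entirely when its norm falls below the threshold, producing the blockwise sparsity pattern described. A hedged sketch (the name and the block-partition encoding are illustrative):

```python
import numpy as np

def prox_group_lasso(V, groups, t):
    """prox of t * sum_g ||V_g||_F over a partition of V into blocks.

    Each block (given as a (rows, cols) index pair) is shrunk by the group
    soft-thresholding formula max(0, 1 - t/||V_g||_F) * V_g, which zeroes
    out whole blocks at once."""
    X = V.copy()
    for rows, cols in groups:
        block = V[np.ix_(rows, cols)]
        nrm = np.linalg.norm(block)
        X[np.ix_(rows, cols)] = 0.0 if nrm <= t else (1.0 - t / nrm) * block
    return X
```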