Results 1–10 of 53
Model selection through sparse maximum likelihood estimation
Journal of Machine Learning Research, 2008
"... We consider the problem of estimating the parameters of a Gaussian or binary distribution in such a way that the resulting undirected graphical model is sparse. Our approach is to solve a maximum likelihood problem with an added ℓ1norm penalty term. The problem as formulated is convex but the memor ..."
Abstract

Cited by 337 (2 self)
We consider the problem of estimating the parameters of a Gaussian or binary distribution in such a way that the resulting undirected graphical model is sparse. Our approach is to solve a maximum likelihood problem with an added ℓ1-norm penalty term. The problem as formulated is convex, but the memory requirements and complexity of existing interior point methods are prohibitive for problems with more than tens of nodes. We present two new algorithms for solving problems with at least a thousand nodes in the Gaussian case. Our first algorithm uses block coordinate descent and can be interpreted as recursive ℓ1-norm penalized regression. Our second algorithm, based on Nesterov’s first-order method, yields a complexity estimate with a better dependence on problem size than existing interior point methods. Using a log-determinant relaxation of the log partition function (Wainwright and Jordan, 2006), we show that these same algorithms can be used to solve an approximate sparse maximum likelihood problem for the binary case. We test our algorithms on synthetic data, as well as on gene expression and senate voting records data.
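The objective this abstract describes, maximizing log det(X) - tr(SX) - rho*||X||_1 over positive-definite precision matrices X given a sample covariance S, can be evaluated in a few lines of NumPy. The sketch below illustrates the objective only (function name and toy data are ours), not the paper's block coordinate or Nesterov-based solvers:

```python
import numpy as np

def penalized_loglik(X, S, rho):
    """Sparse Gaussian MLE objective: log det(X) - tr(S X) - rho * ||X||_1,
    maximized over positive-definite precision matrices X."""
    sign, logdet = np.linalg.slogdet(X)
    assert sign > 0, "X must be positive definite"
    return logdet - np.trace(S @ X) - rho * np.abs(X).sum()

# Toy check: with S = I and rho = 0 the maximizer is X = S^{-1} = I.
S = np.eye(3)
vals = [penalized_loglik(t * np.eye(3), S, 0.0) for t in (0.5, 1.0, 2.0)]
print(max(vals) == vals[1])  # True: t = 1 gives the largest value
```

Setting rho > 0 trades likelihood for sparsity in X, which is what makes the estimated graphical model sparse.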
ℓ1 Trend Filtering
2007
"... The problem of estimating underlying trends in time series data arises in a variety of disciplines. In this paper we propose a variation on HodrickPrescott (HP) filtering, a widely used method for trend estimation. The proposed ℓ1 trend filtering method substitutes a sum of absolute values (i.e., ..."
Abstract

Cited by 46 (6 self)
The problem of estimating underlying trends in time series data arises in a variety of disciplines. In this paper we propose a variation on Hodrick-Prescott (HP) filtering, a widely used method for trend estimation. The proposed ℓ1 trend filtering method substitutes a sum of absolute values (i.e., an ℓ1-norm) for the sum of squares used in HP filtering to penalize variations in the estimated trend. The ℓ1 trend filtering method produces trend estimates that are piecewise linear, and is therefore well suited to analyzing time series with an underlying piecewise linear trend. The kinks, knots, or changes in slope of the estimated trend can be interpreted as abrupt changes or events in the underlying dynamics of the time series. Using specialized interior-point methods, ℓ1 trend filtering can be carried out with not much more effort than HP filtering; in particular, the number of arithmetic operations required grows linearly with the number of data points. We describe the method and some of its basic properties, and give some illustrative examples. We show how the method is related to ℓ1-regularization-based methods in sparse signal recovery and feature selection, and list some extensions of the basic method.
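The substitution the abstract describes can be made concrete: HP filtering penalizes the squared second differences of the trend, while ℓ1 trend filtering penalizes their absolute values, so an exactly linear trend pays zero penalty. A minimal NumPy sketch of the cost (helper names are ours; this is not the paper's interior-point solver):

```python
import numpy as np

def second_difference(n):
    """(n-2) x n second-order difference operator D: (Dx)_i = x_i - 2 x_{i+1} + x_{i+2}."""
    D = np.zeros((n - 2, n))
    for i in range(n - 2):
        D[i, i], D[i, i + 1], D[i, i + 2] = 1.0, -2.0, 1.0
    return D

def trend_objective(x, y, lam):
    """l1 trend filtering cost: (1/2)||y - x||_2^2 + lam * ||D x||_1.
    HP filtering uses ||D x||_2^2 instead, which is why HP trends are smooth
    rather than piecewise linear."""
    D = second_difference(len(y))
    return 0.5 * np.sum((y - x) ** 2) + lam * np.abs(D @ x).sum()

# A linear trend has D x = 0, so it incurs no penalty at all.
x = np.arange(6, dtype=float)
print(np.abs(second_difference(6) @ x).sum())  # 0.0
```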
Estimation of sparse binary pairwise Markov networks using pseudolikelihood
 J
"... We consider the problems of estimating the parameters as well as the structure of binaryvalued Markov networks. For maximizing the penalized loglikelihood, we implement an approximate procedure based on the pseudolikelihood of Besag (1975) and generalize it to a fast exact algorithm. The exact al ..."
Abstract

Cited by 41 (0 self)
We consider the problems of estimating the parameters as well as the structure of binary-valued Markov networks. For maximizing the penalized log-likelihood, we implement an approximate procedure based on the pseudolikelihood of Besag (1975) and generalize it to a fast exact algorithm. The exact algorithm starts with the pseudolikelihood solution and then adjusts the pseudolikelihood criterion so that each additional iteration moves it closer to the exact solution. Our results show that this procedure is faster than the competing exact method proposed by Lee, Ganapathi, and Koller (2006a). However, we also find that the approximate pseudolikelihood method, as well as the approaches of Wainwright et al. (2006), when implemented using the coordinate descent procedure of Friedman, Hastie, and Tibshirani (2008b), are much faster than the exact methods, and only slightly less accurate.
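For a binary pairwise network, Besag's pseudolikelihood replaces the intractable joint likelihood with a product of node-conditional logistic likelihoods. A minimal NumPy sketch, assuming 0/1 variables and a symmetric coupling matrix with zero diagonal (our own parameterization, not the paper's exact implementation):

```python
import numpy as np

def pseudo_loglik(x, Theta, b):
    """Besag (1975) pseudo-log-likelihood for a binary pairwise Markov network
    with 0/1 variables: sum_i log P(x_i | x_rest), where each conditional is
    logistic with linear predictor eta_i = b_i + sum_{j != i} Theta_ij x_j.
    Theta is symmetric with zero diagonal."""
    eta = b + Theta @ x
    return float(np.sum(x * eta - np.log1p(np.exp(eta))))

Theta = np.array([[0.0, 1.0], [1.0, 0.0]])  # positive coupling between the two nodes
b = np.zeros(2)
# Agreement (1,1) scores higher than disagreement (1,0) under positive coupling.
print(pseudo_loglik(np.array([1.0, 1.0]), Theta, b) >
      pseudo_loglik(np.array([1.0, 0.0]), Theta, b))  # True
```

Adding an ℓ1 penalty on the off-diagonal entries of Theta and maximizing this surrogate gives the approximate structure-learning procedure the abstract describes.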
Smooth optimization approach for sparse covariance selection
 SIAM J. Optim
"... In this paper we first study a smooth optimization approach for solving a class of nonsmooth strictly concave maximization problems whose objective functions admit smooth convex minimization reformulations. In particular, we apply Nesterov’s smooth optimization technique [19, 21] to their dual coun ..."
Abstract

Cited by 38 (3 self)
In this paper we first study a smooth optimization approach for solving a class of nonsmooth strictly concave maximization problems whose objective functions admit smooth convex minimization reformulations. In particular, we apply Nesterov’s smooth optimization technique [19, 21] to their dual counterparts, which are smooth convex problems. It is shown that the resulting approach has O(1/√ε) iteration complexity for finding an ε-optimal solution to both primal and dual problems. We then discuss the application of this approach to sparse covariance selection, which is approximately solved as an ℓ1-norm penalized maximum likelihood estimation problem, and also propose a variant of this approach that substantially outperformed the original in our computational experiments. We finally compare the performance of these approaches with other first-order methods, namely, Nesterov’s O(1/ε) smooth approximation scheme and the block-coordinate descent method studied in [9, 15] for sparse covariance selection, on a set of randomly generated instances. The results show that our smooth optimization approach substantially outperforms the first method above, and moreover its variant substantially outperforms both methods.
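Nesterov's first-order method referred to here combines gradient steps with an extrapolation (momentum) sequence, which is what yields the improved iteration complexity on smooth convex problems. A generic NumPy sketch of the scheme on a toy quadratic (not the paper's specific smoothing of the covariance selection dual):

```python
import numpy as np

def nesterov(grad, x0, L, iters=100):
    """Nesterov's first-order method for a smooth convex f with L-Lipschitz
    gradient: gradient steps taken at an extrapolated point y, giving an
    O(1/k^2) objective gap, i.e. O(1/sqrt(eps)) iterations to eps-optimality."""
    x = np.asarray(x0, dtype=float)
    y = x.copy()
    t = 1.0
    for _ in range(iters):
        x_new = y - grad(y) / L
        t_new = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        y = x_new + ((t - 1.0) / t_new) * (x_new - x)  # extrapolation step
        x, t = x_new, t_new
    return x

# Toy quadratic f(x) = 0.5 x^T Q x - b^T x with minimizer Q^{-1} b = [1.0, 0.1].
Q = np.diag([1.0, 10.0])
b = np.array([1.0, 1.0])
x = nesterov(lambda v: Q @ v - b, np.zeros(2), L=10.0, iters=300)
print(x)  # approaches the minimizer [1.0, 0.1]
```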
An inexact interior point method for l1-regularized sparse covariance selection
2010
"... Sparse covariance selection problems can be formulated as logdeterminant (logdet) semidefinite programming (SDP) problems with large numbers of linear constraints. Standard primaldual interiorpoint methods that are based on solving the Schur complement equation would encounter severe computation ..."
Abstract

Cited by 31 (3 self)
Sparse covariance selection problems can be formulated as log-determinant (log-det) semidefinite programming (SDP) problems with large numbers of linear constraints. Standard primal-dual interior-point methods based on solving the Schur complement equation encounter severe computational bottlenecks when applied to these SDPs. In this paper, we consider a customized inexact primal-dual path-following interior-point algorithm for solving large-scale log-det SDP problems arising from sparse covariance selection. Our inexact algorithm solves the large, ill-conditioned linear system of equations at each iteration with a preconditioned iterative solver. By exploiting the structure of sparse covariance selection problems, we are able to design highly effective preconditioners that efficiently solve these large, ill-conditioned linear systems. Numerical experiments on both synthetic and real covariance selection problems show that our algorithm is highly efficient and outperforms other existing algorithms.
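The inexact step described here replaces a direct factorization of the Schur complement system with a preconditioned iterative solver. A minimal NumPy sketch of preconditioned conjugate gradient, with a Jacobi preconditioner on a deliberately ill-conditioned toy system (the real algorithm's preconditioners exploit problem structure; this is only the generic solver pattern):

```python
import numpy as np

def pcg(A, b, M_inv, tol=1e-10, max_iter=200):
    """Preconditioned conjugate gradient for A x = b with A symmetric positive
    definite. M_inv applies an approximate inverse of A; a good preconditioner
    lets the iteration converge despite severe ill-conditioning."""
    x = np.zeros_like(b)
    r = b - A @ x
    z = M_inv(r)
    p = z.copy()
    rz = r @ z
    for _ in range(max_iter):
        Ap = A @ p
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) < tol:
            break
        z = M_inv(r)
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x

# Ill-conditioned diagonal system; Jacobi (diagonal) preconditioning solves it exactly.
d = np.array([1.0, 1e3, 1e6])
A = np.diag(d)
b = np.ones(3)
x = pcg(A, b, lambda r: r / d)
print(np.allclose(A @ x, b))  # True
```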
ALTERNATING DIRECTION METHODS FOR SPARSE COVARIANCE SELECTION
2009
"... The mathematical model of the widelyused sparse covariance selection problem (SCSP) is an NPhard combinatorial problem, whereas it can be well approximately by a convex relaxation problem whose maximum likelihood estimation is penalized by the L1 norm. This convex relaxation problem, however, is ..."
Abstract

Cited by 28 (1 self)
The mathematical model of the widely used sparse covariance selection problem (SCSP) is an NP-hard combinatorial problem, but it can be well approximated by a convex relaxation in which the maximum likelihood estimation is penalized by the L1 norm. This convex relaxation problem, however, is still numerically challenging, especially for large-scale cases. Recently, some efficient first-order methods inspired by Nesterov’s work have been proposed to solve the convex relaxation problem of SCSP. This paper applies the well-known alternating direction method (ADM), which is also a first-order method, to solve the convex relaxation of SCSP. By fully exploiting the separable structure of a simple reformulation of the convex relaxation problem, the ADM approach is very efficient for solving large-scale SCSP. Our preliminary numerical results show that the ADM approach substantially outperforms existing first-order methods for SCSP.
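The separable structure the ADM exploits can be sketched for the standard formulation min_X -log det X + tr(SX) + rho*||X||_1: splitting X = Z gives an X-step with a closed form via eigendecomposition and a Z-step that is elementwise soft-thresholding. A generic NumPy sketch of this well-known splitting (the step size mu and toy data are ours, and this is not necessarily the exact variant benchmarked in the paper):

```python
import numpy as np

def soft_threshold(A, kappa):
    return np.sign(A) * np.maximum(np.abs(A) - kappa, 0.0)

def admm_covsel(S, rho, mu=1.0, iters=200):
    """Generic ADM/ADMM splitting for min_X -log det X + tr(S X) + rho ||X||_1.
    X-step: solve mu*X - X^{-1} = mu*(Z - U) - S in closed form by
    eigendecomposition; Z-step: elementwise soft-thresholding; U: dual update."""
    n = S.shape[0]
    Z = np.eye(n)
    U = np.zeros((n, n))
    for _ in range(iters):
        w, Q = np.linalg.eigh(mu * (Z - U) - S)
        x = (w + np.sqrt(w ** 2 + 4.0 * mu)) / (2.0 * mu)  # positive roots
        X = (Q * x) @ Q.T
        Z = soft_threshold(X + U, rho / mu)  # prox of (rho/mu) * ||.||_1
        U += X - Z
    return Z

S = np.array([[1.0, 0.5], [0.5, 1.0]])
Z = admm_covsel(S, rho=0.1)
print(np.allclose(Z, Z.T))  # the estimate stays symmetric
```

Every per-iteration cost here is one eigendecomposition plus elementwise work, which is the efficiency advantage of first-order splittings over interior-point methods at scale.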
Block-coordinate gradient descent method for linearly constrained nonsmooth separable optimization
Journal of Optimization Theory and Applications, 2009
"... In honor of Professor Paul Tseng, who went missing while on a kayak trip in Jinsha river, China, on August 13, 2009, for his contributions to the theory and algorithms for largescale optimization We consider a class of unconstrained nonsmooth convex optimization problems, in which the objective fun ..."
Abstract

Cited by 26 (3 self)
In honor of Professor Paul Tseng, who went missing while on a kayak trip on the Jinsha river, China, on August 13, 2009, for his contributions to the theory and algorithms for large-scale optimization.

We consider a class of unconstrained nonsmooth convex optimization problems, in which the objective function is the sum of a convex smooth function on an open subset of matrices and a separable convex function on a set of matrices. This problem includes the covariance selection estimation problem, which can be expressed as an ℓ1-penalized maximum likelihood estimation problem. In this paper, we propose a block coordinate gradient descent method (abbreviated as BCGD) for solving this class of nonsmooth separable problems, with the coordinate block chosen by a Gauss-Seidel rule. The method is simple, highly parallelizable, and suited for large-scale problems. We establish global convergence and, under a local Lipschitzian error bound assumption, a linear rate of convergence for this method. For the covariance selection estimation problem, the method can terminate in O(n³/ε) iterations with an ε-optimal solution. We compare the performance of the BCGD method with first-order methods studied in [11, 12] for solving the covariance selection problem on randomly generated instances. Our numerical experience suggests that the BCGD method can be efficient for large-scale covariance selection problems with constraints.

Key words: block coordinate gradient descent, complexity, convex optimization, covariance selection, global convergence, linear rate of convergence, ℓ1-penalization, maximum likelihood estimation
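A Gauss-Seidel coordinate step for an ℓ1-penalized smooth objective has a closed form via soft-thresholding. The sketch below applies it to a simple vector lasso problem as a stand-in for the paper's matrix-valued setting (helper names and data are ours, not the paper's BCGD implementation):

```python
import numpy as np

def soft_threshold(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def cgd_lasso(A, y, lam, iters=100):
    """Gauss-Seidel coordinate descent for min_x 0.5 ||A x - y||^2 + lam ||x||_1.
    Each coordinate step exactly minimizes the objective in x_j with the other
    coordinates held at their freshest values (the Gauss-Seidel rule)."""
    n = A.shape[1]
    x = np.zeros(n)
    col_sq = (A ** 2).sum(axis=0)
    r = y - A @ x  # running residual
    for _ in range(iters):
        for j in range(n):
            r += A[:, j] * x[j]            # remove coordinate j's contribution
            x[j] = soft_threshold(A[:, j] @ r, lam) / col_sq[j]
            r -= A[:, j] * x[j]            # add the updated contribution back
    return x

# With A = I the solution is the elementwise soft-threshold of y.
x = cgd_lasso(np.eye(3), np.array([3.0, 0.5, -2.0]), lam=1.0)
print(x)  # converges to [2, 0, -1]
```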
Industrial Engineering,
"... Recent advances in neuroimaging techniques provide great potentials for effective diagnosis of Alzheimer’s disease (AD), the most common form of dementia. Previous studies have shown that AD is closely related to the alternation in the functional brain network, i.e., the functional connectivity amon ..."
Abstract

Cited by 22 (5 self)
Recent advances in neuroimaging techniques offer great potential for effective diagnosis of Alzheimer’s disease (AD), the most common form of dementia. Previous studies have shown that AD is closely related to alteration of the functional brain network, i.e., the functional connectivity among different brain regions. In this paper, we consider the problem of learning functional brain connectivity from neuroimaging data, which holds great promise for identifying image-based markers that distinguish Normal Controls (NC), patients with Mild Cognitive Impairment (MCI), and patients with AD. More specifically, we study sparse inverse covariance estimation (SICE), also known as exploratory Gaussian graphical modeling, for brain connectivity modeling. In particular, we apply SICE to learn and analyze functional brain connectivity patterns from different subject groups, based on a key property of SICE, called the “monotone property”, that we establish in this paper. Our experimental results on neuroimaging PET data of 42 AD, 116 MCI, and 67 NC subjects reveal several interesting connectivity patterns consistent with literature findings, as well as some new patterns that can aid knowledge discovery for AD.
Adaptive first-order methods for general sparse inverse covariance selection
SIAM J. Matrix Anal. Appl., 2010
"... In this paper, we consider estimating sparse inverse covariance of a Gaussian graphical model whose conditional independence is assumed to be partially known. Similarly as in [5], we formulate it as an l1norm penalized maximum likelihood estimation problem. Further, we propose an algorithm framewor ..."
Abstract

Cited by 21 (2 self)
In this paper, we consider estimating the sparse inverse covariance of a Gaussian graphical model whose conditional independence structure is assumed to be partially known. As in [5], we formulate this as an ℓ1-norm penalized maximum likelihood estimation problem. Further, we propose an algorithmic framework and develop two first-order methods, namely the adaptive spectral projected gradient (ASPG) method and the adaptive Nesterov’s smooth (ANS) method, for solving this estimation problem. Finally, we compare the performance of these two methods on a set of randomly generated instances. Our computational results demonstrate that both methods can solve problems of size at least a thousand with nearly half a million constraints within a reasonable amount of time, and that the ASPG method generally outperforms the ANS method.

Key words: sparse inverse covariance selection, adaptive spectral projected gradient method, adaptive Nesterov’s smooth method
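A spectral projected gradient iteration, the building block behind the ASPG method named here, alternates projected gradient steps with a Barzilai-Borwein (spectral) step size. A generic NumPy sketch on a toy box-constrained quadratic (not the paper's adaptive constraint-handling scheme; the function and data names are ours):

```python
import numpy as np

def spg(grad, project, x0, iters=100):
    """Spectral projected gradient sketch: projected gradient steps whose step
    size alpha is the BB1 spectral estimate (s's / s'y from the last step),
    projecting each iterate back onto the feasible set."""
    x = project(np.asarray(x0, dtype=float))
    g = grad(x)
    alpha = 1.0
    for _ in range(iters):
        x_new = project(x - alpha * g)
        g_new = grad(x_new)
        s, yv = x_new - x, g_new - g
        sy = s @ yv
        alpha = (s @ s) / sy if sy > 1e-12 else 1.0  # BB1 spectral step size
        x, g = x_new, g_new
    return x

# Minimize ||x - c||^2 over the box [0, 1]^2: the solution clips c into the box.
c = np.array([2.0, -0.5])
x = spg(lambda v: 2.0 * (v - c), lambda v: np.clip(v, 0.0, 1.0), np.zeros(2))
print(x)  # converges to [1., 0.]
```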
Sparse Gaussian Graphical Models with Unknown Block Structure
"... Recent work has shown that one can learn the structure of Gaussian Graphical Models by imposing an L1 penalty on the precision matrix, and then using efficient convex optimization methods to find the penalized maximum likelihood estimate. This is similar to performing MAP estimation with a prior tha ..."
Abstract

Cited by 17 (1 self)
Recent work has shown that one can learn the structure of Gaussian graphical models by imposing an L1 penalty on the precision matrix and then using efficient convex optimization methods to find the penalized maximum likelihood estimate. This is similar to performing MAP estimation with a prior that prefers sparse graphs. In this paper, we use the stochastic block model as a prior. This prior prefers graphs that are blockwise sparse but, unlike previous work, does not require that the blocks or groups be specified a priori. The resulting problem is no longer convex, but we devise an efficient variational Bayes algorithm to solve it. We show that our method achieves better test-set likelihood on two different datasets (motion capture and gene expression) than independent L1, and can match the performance of group L1 using manually created groups.