Results 1–10 of 74
Model selection through sparse maximum likelihood estimation
Journal of Machine Learning Research, 2008
"... We consider the problem of estimating the parameters of a Gaussian or binary distribution in such a way that the resulting undirected graphical model is sparse. Our approach is to solve a maximum likelihood problem with an added ℓ1norm penalty term. The problem as formulated is convex but the memor ..."
Abstract

Cited by 158 (1 self)
 Add to MetaCart
We consider the problem of estimating the parameters of a Gaussian or binary distribution in such a way that the resulting undirected graphical model is sparse. Our approach is to solve a maximum likelihood problem with an added ℓ1-norm penalty term. The problem as formulated is convex, but the memory requirements and complexity of existing interior-point methods are prohibitive for problems with more than tens of nodes. We present two new algorithms for solving problems with at least a thousand nodes in the Gaussian case. Our first algorithm uses block coordinate descent and can be interpreted as recursive ℓ1-norm penalized regression. Our second algorithm, based on Nesterov's first-order method, yields a complexity estimate with a better dependence on problem size than existing interior-point methods. Using a log-determinant relaxation of the log partition function (Wainwright and Jordan, 2006), we show that these same algorithms can be used to solve an approximate sparse maximum likelihood problem for the binary case. We test our algorithms on synthetic data, as well as on gene expression and senate voting records data.
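In the Gaussian case, the objective is the ℓ1-penalized log-likelihood max_Θ log det Θ − tr(SΘ) − λ‖Θ‖₁ over positive definite Θ, with S the sample covariance. A minimal sketch of fitting that same objective, with scikit-learn's GraphicalLasso solver standing in for the paper's block coordinate descent and Nesterov-based algorithms (the data and the alpha value are placeholders):

```python
# Sketch: l1-penalized sparse inverse covariance estimation via
# scikit-learn's GraphicalLasso, a stand-in for the paper's solvers.
import numpy as np
from sklearn.covariance import GraphicalLasso

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 20))        # placeholder: 200 samples, 20 nodes

model = GraphicalLasso(alpha=0.1).fit(X)  # alpha = l1 weight (illustrative)
precision = model.precision_              # estimated sparse inverse covariance
graph = np.abs(precision) > 1e-6          # nonzero pattern = estimated edges
print(np.count_nonzero(np.triu(graph, k=1)), "edges selected")
```

The nonzero pattern of the estimated precision matrix is exactly the undirected graphical model the abstract refers to.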
An interior-point method for large-scale ℓ1-regularized logistic regression
Journal of Machine Learning Research, 2007
"... Logistic regression with ℓ1 regularization has been proposed as a promising method for feature selection in classification problems. In this paper we describe an efficient interiorpoint method for solving largescale ℓ1regularized logistic regression problems. Small problems with up to a thousand ..."
Abstract

Cited by 153 (6 self)
 Add to MetaCart
Logistic regression with ℓ1 regularization has been proposed as a promising method for feature selection in classification problems. In this paper we describe an efficient interior-point method for solving large-scale ℓ1-regularized logistic regression problems. Small problems with up to a thousand or so features and examples can be solved in seconds on a PC; medium-sized problems, with tens of thousands of features and examples, can be solved in tens of seconds (assuming some sparsity in the data). A variation on the basic method that uses a preconditioned conjugate gradient method to compute the search step can solve very large problems, with a million features and examples (e.g., the 20 Newsgroups data set), in a few minutes on a PC. Using warm-start techniques, a good approximation of the entire regularization path can be computed much more efficiently than by solving a family of problems independently.
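The objective is min_θ Σ_i log(1 + exp(−y_i θᵀx_i)) + λ‖θ‖₁. A minimal sketch of the feature-selection use case, with scikit-learn's liblinear solver in place of the paper's interior-point method (the data, the value of C, and the solver choice are illustrative assumptions):

```python
# Sketch: l1-regularized logistic regression as a feature selector.
# C is the inverse of the regularization weight; purely illustrative.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.standard_normal((500, 100))                    # placeholder data
y = (X[:, :5].sum(axis=1) + 0.1 * rng.standard_normal(500) > 0).astype(int)

clf = LogisticRegression(penalty="l1", C=0.1, solver="liblinear").fit(X, y)
selected = np.flatnonzero(clf.coef_[0])                # surviving features
print(selected)                                        # ideally close to 0..4
```

Sweeping C over a grid and warm-starting each fit from the previous solution traces the regularization path the abstract mentions.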
Tree-Guided Group Lasso for Multi-Task Regression with Structured Sparsity
"... We consider the problem of learning a sparse multitask regression, where the structure in the outputs can be represented as a tree with leaf nodes as outputs and internal nodes as clusters of the outputs at multiple granularity. Our goal is to recover the common set of relevant inputs for each outp ..."
Abstract

Cited by 63 (9 self)
 Add to MetaCart
We consider the problem of learning a sparse multi-task regression, where the structure in the outputs can be represented as a tree with leaf nodes as outputs and internal nodes as clusters of the outputs at multiple levels of granularity. Our goal is to recover the common set of relevant inputs for each output cluster. Assuming that the tree structure is available as prior knowledge, we formulate this problem as a new multi-task regularized regression called tree-guided group lasso. Our structured regularization is based on a group-lasso penalty, where groups are defined with respect to the tree structure. We describe a systematic weighting scheme for the groups in the penalty such that each output variable is penalized in a balanced manner even if the groups overlap. We present an efficient optimization method that can handle a large-scale problem. Using simulated and yeast datasets, we demonstrate that our method shows superior performance in terms of both prediction errors and recovery of true sparsity patterns compared to other methods for multi-task learning.
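Schematically (the notation here is assumed, not taken from the paper): with outputs Y, inputs X, coefficient matrix B, and, for each node v of the output tree T, a group G_v containing the leaf outputs under v with balancing weight w_v, the tree-guided group lasso solves

    \min_{B} \; \|Y - XB\|_F^2 \; + \; \lambda \sum_{j=1}^{p} \sum_{v \in V(T)} w_v \, \big\| \beta_j^{(G_v)} \big\|_2 ,

where β_j^{(G_v)} collects the coefficients of input j across the outputs in G_v; the ℓ2 norm over each group zeroes out input j for whole subtrees of outputs at once, which is how the tree structure guides the sparsity pattern.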
High-Dimensional Ising Model Selection Using ℓ1-Regularized Logistic Regression
Submitted to The Annals of Statistics
"... We consider the problem of estimating the graph associated with a binary Ising Markov random field. We describe a method based on ℓ1regularized logistic regression, in which the neighborhood of any given node is estimated by performing logistic regression subject to an ℓ1constraint. The method is ..."
Abstract

Cited by 40 (13 self)
 Add to MetaCart
We consider the problem of estimating the graph associated with a binary Ising Markov random field. We describe a method based on ℓ1-regularized logistic regression, in which the neighborhood of any given node is estimated by performing logistic regression subject to an ℓ1-constraint. The method is analyzed under high-dimensional scaling, in which both the number of nodes p and the maximum neighborhood size d are allowed to grow as a function of the number of observations n. Our main results provide sufficient conditions on the triple (n, p, d) and the model parameters for the method to succeed in consistently estimating the neighborhood of every node in the graph simultaneously. With coherence conditions imposed on the population Fisher information matrix, we prove that consistent neighborhood selection can be obtained for sample sizes n = Ω(d^3 log p), with exponentially decaying error. When these same conditions are imposed directly on the sample matrices, we show that a reduced sample size of n = Ω(d^2 log p) suffices for the method to estimate neighborhoods consistently. Although this paper focuses on binary graphical models, we indicate how a generalization of the method would apply to general discrete Markov random fields.
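A minimal sketch of the neighborhood-selection recipe the abstract describes: regress each node on all the others with ℓ1-regularized logistic regression and take the nonzero coefficients as its estimated neighborhood. The AND-symmetrization rule, the penalty level C, and the coefficient threshold below are illustrative assumptions, not prescriptions from the paper's theory:

```python
# Sketch: Ising graph estimation by per-node l1-regularized logistic
# regression (neighborhood selection).
import numpy as np
from sklearn.linear_model import LogisticRegression

def ising_graph(X, C=0.1, tol=1e-6):
    """X: (n_samples, p) array of binary spins in {-1, +1} or {0, 1}."""
    n_samples, p = X.shape
    support = np.zeros((p, p), dtype=bool)
    for s in range(p):
        rest = np.delete(np.arange(p), s)
        clf = LogisticRegression(penalty="l1", C=C, solver="liblinear")
        clf.fit(X[:, rest], (X[:, s] > 0).astype(int))
        support[s, rest] = np.abs(clf.coef_[0]) > tol
    return support & support.T   # AND rule: keep an edge only if both ends agree
```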
Sparse Signal Recovery Using Markov Random Fields
"... Compressive Sensing (CS) combines sampling and compression into a single subNyquist linear measurement process for sparse and compressible signals. In this paper, we extend the theory of CS to include signals that are concisely represented in terms of a graphical model. In particular, we use Markov ..."
Abstract

Cited by 39 (10 self)
 Add to MetaCart
Compressive Sensing (CS) combines sampling and compression into a single sub-Nyquist linear measurement process for sparse and compressible signals. In this paper, we extend the theory of CS to include signals that are concisely represented in terms of a graphical model. In particular, we use Markov Random Fields (MRFs) to represent sparse signals whose nonzero coefficients are clustered. Our new model-based recovery algorithm, dubbed Lattice Matching Pursuit (LaMP), stably recovers MRF-modeled signals using many fewer measurements and computations than current state-of-the-art algorithms.
Reconstruction of Markov random fields from samples: Some easy observations and algorithms
2008
"... Markov random fields are used to model high dimensional distributions in a number of applied areas. Much recent interest has been devoted to the reconstruction of the dependency structure from independent samples from the Markov random fields. We analyze a simple algorithm for reconstructing the und ..."
Abstract

Cited by 34 (1 self)
 Add to MetaCart
Markov random fields are used to model high-dimensional distributions in a number of applied areas. Much recent interest has been devoted to the reconstruction of the dependency structure from independent samples from the Markov random fields. We analyze a simple algorithm for reconstructing the underlying graph defining a Markov random field on n nodes with maximum degree d, given observations. We show that under mild non-degeneracy conditions it reconstructs the generating graph with high probability using Θ(d log n) samples, which is optimal up to a multiplicative constant. Our results appear to be the first for general models that guarantee that the generating model is reconstructed. Furthermore, we provide an explicit O(n^(d+2) log n) running-time bound. In cases where the measure on the graph has correlation decay, the running time is O(n^2 log n) for all fixed d. We also discuss the effect of observing noisy samples and show that as long as the noise level is low, our algorithm is effective. On the other hand, we construct an example where large noise implies non-identifiability even for generic noise and interactions. Finally, we briefly show that in some cases, models with hidden nodes can also be recovered.
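One brute-force reading of this style of exhaustive-search reconstruction, sketched under assumptions (the empirical dependence test, its threshold eps, and the screening rule are mine, not the paper's): keep edge (u, v) only if no conditioning set of size at most d empirically screens off the dependence between x_u and x_v. Enumerating all candidate sets is what drives the n^(d+2)-type running time.

```python
# Sketch: brute-force structure recovery for a binary MRF by
# conditional-independence screening. Test and threshold are illustrative.
import itertools
import numpy as np

def cond_dependent(X, u, v, S, eps=0.05):
    """True if x_u and x_v look empirically dependent given x_S (X binary 0/1)."""
    for assign in itertools.product([0, 1], repeat=len(S)):
        mask = np.all(X[:, S] == np.array(assign), axis=1)
        rates = []
        for b in (0, 1):
            rows = X[mask & (X[:, v] == b)]
            if len(rows) == 0:
                break
            rates.append(rows[:, u].mean())   # empirical P(x_u = 1 | x_S, x_v = b)
        if len(rates) == 2 and abs(rates[0] - rates[1]) > eps:
            return True
    return False

def reconstruct(X, d, eps=0.05):
    """Edge (u, v) survives only if no set S with |S| <= d screens it off."""
    n = X.shape[1]
    edges = set()
    for u, v in itertools.combinations(range(n), 2):
        rest = [w for w in range(n) if w not in (u, v)]
        screened = any(
            not cond_dependent(X, u, v, list(S), eps)
            for k in range(d + 1)
            for S in itertools.combinations(rest, k)
        )
        if not screened:
            edges.add((u, v))
    return edges
```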
Projected Subgradient Methods for Learning Sparse Gaussians
"... Gaussian Markov random fields (GMRFs) are useful in a broad range of applications. In this paper we tackle the problem of learning a sparse GMRF in a highdimensional space. Our approach uses the ℓ1norm as a regularization on the inverse covariance matrix. We utilize a novel projected gradient meth ..."
Abstract

Cited by 31 (0 self)
 Add to MetaCart
Gaussian Markov random fields (GMRFs) are useful in a broad range of applications. In this paper we tackle the problem of learning a sparse GMRF in a high-dimensional space. Our approach uses the ℓ1-norm as a regularization on the inverse covariance matrix. We utilize a novel projected gradient method, which is faster than previous methods in practice and matches the best of them in asymptotic complexity. We also extend the ℓ1-regularized objective to the problem of sparsifying entire blocks within the inverse covariance matrix. Our methods generalize fairly easily to this case, while other methods do not. We demonstrate that our extensions give better generalization performance on two real domains: biological network analysis and a 2D-shape modeling image task.
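As a generic stand-in (this is plain proximal gradient, i.e. ISTA, on the ℓ1-penalized log-determinant objective, not the paper's projected gradient method; the step size, penalty, and eigenvalue floor are assumptions):

```python
# Sketch: proximal-gradient (ISTA) ascent on
#   logdet(T) - tr(S T) - lam * ||T||_1   over inverse covariances T.
# A generic stand-in for the paper's projected gradient method.
import numpy as np

def soft_threshold(A, tau):
    return np.sign(A) * np.maximum(np.abs(A) - tau, 0.0)

def sparse_precision(S, lam=0.1, eta=0.01, iters=500):
    p = S.shape[0]
    T = np.eye(p)
    for _ in range(iters):
        grad = np.linalg.inv(T) - S           # gradient of the smooth part
        T = soft_threshold(T + eta * grad, eta * lam)   # l1 prox step
        w, V = np.linalg.eigh((T + T.T) / 2)  # re-symmetrize, then project
        T = (V * np.maximum(w, 1e-3)) @ V.T   # back onto the PD cone
    return T
```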
Recovering time-varying networks of dependencies in social and biological studies
Proc. Natl. Acad. Sci., 2009
"... A plausible representation of the relational information among entities in dynamic systems such as a living cell or a social community is a stochastic network that is topologically rewiring and semantically evolving over time. While there is a rich literature in modeling static or temporally invaria ..."
Abstract

Cited by 29 (8 self)
 Add to MetaCart
A plausible representation of the relational information among entities in dynamic systems such as a living cell or a social community is a stochastic network that is topologically rewiring and semantically evolving over time. While there is a rich literature on modeling static or temporally invariant networks, little has been done toward recovering the network structure when the networks are not observable in a dynamic context. In this paper, we present a new machine learning method called TESLA, which builds on a temporally smoothed ℓ1-regularized logistic regression formalism that can be cast as a standard convex optimization problem and solved efficiently using generic solvers scalable to large networks. We report promising results on recovering simulated time-varying networks, on reverse-engineering the latent sequence of temporally rewiring political and academic social networks from longitudinal data, and on the evolving gene networks over more than 4,000 genes during the life cycle of Drosophila melanogaster from a microarray time course, at a resolution limited only by sample frequency.
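One plausible form of a temporally smoothed objective (the exact pairing of penalties is an assumption here; TESLA's formulation may differ in detail): for each node, with θ^t its incoming edge weights at epoch t and ℓ the logistic loss over that epoch's snapshots,

    \hat{\theta}^{1:T} = \arg\min_{\theta^1,\dots,\theta^T} \; \sum_{t=1}^{T} \ell(\theta^t; x^t) \; + \; \lambda_1 \sum_{t=1}^{T} \|\theta^t\|_1 \; + \; \lambda_2 \sum_{t=2}^{T} \|\theta^t - \theta^{t-1}\|_1 .

The first penalty enforces sparsity within each epoch; the second discourages the network from rewiring abruptly between adjacent epochs.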
Estimation of sparse binary pairwise Markov networks using pseudolikelihood
Journal of Machine Learning Research, 2009
"... We consider the problems of estimating the parameters as well as the structure of binaryvalued Markov networks. For maximizing the penalized loglikelihood, we implement an approximate procedure based on the pseudolikelihood of Besag (1975) and generalize it to a fast exact algorithm. The exact al ..."
Abstract

Cited by 22 (0 self)
 Add to MetaCart
We consider the problems of estimating the parameters as well as the structure of binary-valued Markov networks. For maximizing the penalized log-likelihood, we implement an approximate procedure based on the pseudolikelihood of Besag (1975) and generalize it to a fast exact algorithm. The exact algorithm starts with the pseudolikelihood solution and then adjusts the pseudolikelihood criterion so that each additional iteration moves it closer to the exact solution. Our results show that this procedure is faster than the competing exact method proposed by Lee, Ganapathi, and Koller (2006a). However, we also find that the approximate pseudolikelihood, as well as the approaches of Wainwright et al. (2006), when implemented using the coordinate descent procedure of Friedman, Hastie, and Tibshirani (2008b), are much faster than the exact methods, and only slightly less accurate.
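For reference, Besag's pseudolikelihood replaces the joint likelihood, whose partition function is intractable for Markov networks, with a product of node-wise conditionals; in ℓ1-penalized form (notation assumed),

    \hat{\theta} = \arg\max_{\theta} \; \sum_{i=1}^{n} \sum_{j=1}^{p} \log p_\theta\!\left(x_{ij} \mid x_{i,\setminus j}\right) \; - \; \lambda \|\theta\|_1 ,

where each conditional p_θ(x_j | x_{\setminus j}) is a logistic regression in the binary pairwise case, so the surrogate objective stays concave and decomposes across nodes.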
On the Convexity of Latent Social Network Inference
"... In many realworld scenarios, it is nearly impossible to collect explicit social network data. In such cases, whole networks must be inferred from underlying observations. Here, we formulate the problem of inferring latent social networks based on network diffusion or disease propagation data. We co ..."
Abstract

Cited by 22 (2 self)
 Add to MetaCart
In many real-world scenarios, it is nearly impossible to collect explicit social network data. In such cases, whole networks must be inferred from underlying observations. Here, we formulate the problem of inferring latent social networks based on network diffusion or disease propagation data. We consider contagions propagating over the edges of an unobserved social network, where we only observe the times when nodes became infected, but not who infected them. Given such node infection times, we then identify the optimal network that best explains the observed data. We present a maximum likelihood approach based on convex programming with an ℓ1-like penalty term that encourages sparsity. Experiments on real and synthetic data reveal that our method near-perfectly recovers the underlying network structure as well as the parameters of the contagion propagation model. Moreover, our approach scales well: it can infer optimal networks on a thousand nodes in a matter of minutes.
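Schematically (the notation is assumed and the cascade likelihood P is left abstract): with A the matrix of pairwise transmission parameters and t^c the observed infection times in cascade c, such a penalized maximum likelihood problem has the form

    \hat{A} = \arg\max_{A \ge 0} \; \sum_{c \in \mathcal{C}} \log P\!\left(\mathbf{t}^{c} \mid A\right) \; - \; \lambda \sum_{i \neq j} A_{ij} ,

where the penalty acts like an ℓ1 norm because A is constrained nonnegative; the paper's title refers to showing that this inference problem is convex in the parameters.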