First-order methods for sparse covariance selection
SIAM Journal on Matrix Analysis and Applications
Cited by 56 (1 self)
Abstract. Given a sample covariance matrix, we solve a maximum likelihood problem penalized by the number of nonzero coefficients in the inverse covariance matrix. Our objective is to find a sparse representation of the sample data and to highlight conditional independence relationships between the sample variables. We first formulate a convex relaxation of this combinatorial problem and then detail two efficient first-order algorithms with low memory requirements to solve large-scale, dense problem instances.
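A standard way to write the penalized problem and its convex relaxation, in our own notation (Σ for the sample covariance, X for the inverse covariance, ρ > 0 for the penalty weight, n for the number of variables; these symbols are assumptions, not taken from the listing). Replacing the cardinality by the elementwise ℓ1 norm gives a concave objective, and the dual is a log-determinant problem over a box:

```latex
\max_{X \succ 0}\; \log\det X - \operatorname{Tr}(\Sigma X) - \rho\,\operatorname{Card}(X)
\quad\longrightarrow\quad
\max_{X \succ 0}\; \log\det X - \operatorname{Tr}(\Sigma X) - \rho\,\|X\|_1,
\qquad \|X\|_1 = \sum_{i,j} |X_{ij}|

% Dual, using \rho\|X\|_1 = \max_{|U_{ij}| \le \rho} \operatorname{Tr}(UX):
\min_{\|U\|_\infty \le \rho}\; -\log\det(\Sigma + U) - n,
\qquad X^\star = (\Sigma + U^\star)^{-1}
```

The dual has only simple elementwise bounds on U, which is what makes cheap, low-memory first-order steps attractive here.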
Convex optimization techniques for fitting sparse Gaussian graphical models
In Proceedings of the 23rd International Conference on Machine Learning, 2006
Cited by 40 (0 self)
We consider the problem of fitting a large-scale covariance matrix to multivariate Gaussian data in such a way that the inverse is sparse, thus providing model selection. Beginning with a dense empirical covariance matrix, we solve a maximum likelihood problem with an ℓ1-norm penalty term added to encourage sparsity in the inverse. For models with tens of nodes, the resulting problem can be solved using standard interior-point algorithms for convex optimization, but these methods scale poorly with problem size. We present two new algorithms aimed at solving problems with a thousand nodes. The first, based on Nesterov’s first-order algorithm, yields a rigorous complexity estimate for the problem, with a much better dependence on problem size than interior-point methods. Our second algorithm uses block coordinate descent, updating the rows/columns of the covariance matrix sequentially. Experiments with genomic data show that our method is able to uncover biologically interpretable connections among genes.
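The row/column block coordinate descent idea can be sketched with NumPy under our own assumptions: we work on the dual problem (maximize log det W subject to |W_ij - S_ij| ≤ ρ, with W the covariance estimate and S the empirical covariance), keep the diagonal at S_ii + ρ, and solve each column subproblem, a box-constrained quadratic program, with plain projected gradient. The names `covsel_bcd` and `box_qp` and the iteration counts are hypothetical; this is an illustration, not the authors' implementation.

```python
import numpy as np

def box_qp(A, s, rho, y0, iters=500):
    """Projected gradient for: minimize y^T A y subject to |y - s| <= rho (elementwise)."""
    step = 1.0 / (2.0 * np.linalg.eigvalsh(A).max())   # 1/L for the gradient 2Ay
    y = y0.copy()                                      # start from the current (feasible) column
    for _ in range(iters):
        y = np.clip(y - step * 2.0 * (A @ y), s - rho, s + rho)
    return y

def covsel_bcd(S, rho, sweeps=20):
    """Block coordinate descent on the dual: maximize log det W with |W - S| <= rho."""
    n = S.shape[0]
    W = S + rho * np.eye(n)                            # feasible, positive definite start
    for _ in range(sweeps):
        for j in range(n):
            idx = np.arange(n) != j                    # boolean mask of the other indices
            A = np.linalg.inv(W[np.ix_(idx, idx)])
            y = box_qp(A, S[idx, j], rho, W[idx, j])
            W[idx, j] = y                              # update row/column j symmetrically
            W[j, idx] = y
    return W

rng = np.random.default_rng(0)
S = np.cov(rng.standard_normal((50, 5)), rowvar=False)   # empirical covariance
W = covsel_bcd(S, rho=0.1)
X = np.linalg.inv(W)                                       # inverse covariance estimate
```

Each column update leaves the rest of W fixed and can only increase log det W (via the Schur complement), so the sweeps are monotone and W stays positive definite; entries of `X` where the box constraint is inactive are only approximately zero in this sketch.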
Projected Subgradient Methods for Learning Sparse Gaussians
Cited by 31 (0 self)
Gaussian Markov random fields (GMRFs) are useful in a broad range of applications. In this paper we tackle the problem of learning a sparse GMRF in a high-dimensional space. Our approach uses the ℓ1-norm as a regularization on the inverse covariance matrix. We utilize a novel projected gradient method, which is faster than previous methods in practice and equal to the best performing of these in asymptotic complexity. We also extend the ℓ1-regularized objective to the problem of sparsifying entire blocks within the inverse covariance matrix. Our methods generalize fairly easily to this case, while other methods do not. We demonstrate that our extensions give better generalization performance on two real domains: biological network analysis and a 2D shape modeling image task.
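A generic projected gradient method on the box-constrained dual can be sketched with NumPy: step along the gradient of log det W (which is W⁻¹), clip back into the box |W - S| ≤ λ, and halve the step until the iterate stays positive definite and the objective does not decrease. This is a sketch under those assumptions; `covsel_projected_gradient` and its backtracking rule are hypothetical and do not reproduce the paper's step-size strategy or its block extension.

```python
import numpy as np

def logdet_if_pd(M):
    """Return log det M if M is positive definite (Cholesky succeeds), else None."""
    try:
        L = np.linalg.cholesky(M)
    except np.linalg.LinAlgError:
        return None
    return 2.0 * np.sum(np.log(np.diag(L)))

def covsel_projected_gradient(S, lam, iters=300):
    """Projected gradient ascent: maximize log det W subject to |W - S| <= lam."""
    n = S.shape[0]
    lo, hi = S - lam, S + lam
    W = S + lam * np.eye(n)                  # feasible start, diagonal at its upper bound
    obj = logdet_if_pd(W)
    for _ in range(iters):
        G = np.linalg.inv(W)
        G = 0.5 * (G + G.T)                  # gradient of log det W, symmetrized
        t = 1.0
        while t > 1e-10:                     # backtracking keeps W positive definite
            W_new = np.clip(W + t * G, lo, hi)   # projection onto the box
            val = logdet_if_pd(W_new)
            if val is not None and val >= obj:
                W, obj = W_new, val
                break
            t *= 0.5
    return W, np.linalg.inv(W)

rng = np.random.default_rng(0)
S = np.cov(rng.standard_normal((50, 5)), rowvar=False)
W, X = covsel_projected_gradient(S, lam=0.1)
```

Because the projection is a simple elementwise clip, each iteration costs one matrix inverse plus a few Cholesky checks, which is the kind of per-step cost that makes projected methods competitive for this problem.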
A robust procedure for Gaussian graphical model search from microarray data with p larger than n
Journal of Machine Learning Research, 2006
Cited by 26 (3 self)
Learning of large-scale networks of interactions from microarray data is an important and challenging problem in bioinformatics. A widely used approach is to assume that the available data constitute a random sample from a multivariate distribution belonging to a Gaussian graphical model. As a consequence, the prime objects of inference are full-order partial correlations, which are partial correlations between two variables given the remaining ones. In the context of microarray data the number of variables exceeds the sample size, and this precludes the application of traditional structure learning procedures because a sampling version of full-order partial correlations does not exist. In this paper we consider limited-order partial correlations, which are partial correlations computed on marginal distributions of manageable size, and provide a set of rules that allow one to assess the usefulness of these quantities to derive the independence structure of the underlying Gaussian graphical model. Furthermore, we introduce a novel structure learning procedure based on a quantity, obtained from limited-order partial correlations, that we call the non-rejection rate. The applicability and usefulness of the procedure are demonstrated on both simulated and real data.
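To make "limited-order partial correlation" and "non-rejection rate" concrete, the sketch below (assuming NumPy) computes the partial correlation of variables i and j given a small set Q from the inverse of the (|Q| + 2)-dimensional covariance submatrix, then reports the fraction of randomly drawn sets Q of size q for which a test of zero partial correlation is not rejected. Assumptions: Fisher's z-test at the 5% level and uniformly random subsets; the function names are hypothetical, and the paper's exact test and sampling scheme may differ.

```python
import math
import numpy as np

def partial_corr(S, i, j, Q):
    """Partial correlation of variables i and j given the set Q, from covariance S."""
    idx = [i, j] + [int(k) for k in Q]
    P = np.linalg.inv(S[np.ix_(idx, idx)])        # precision of the marginal {i, j} ∪ Q
    return -P[0, 1] / math.sqrt(P[0, 0] * P[1, 1])

def non_rejection_rate(X, i, j, q, n_tests=100, z_crit=1.96, seed=0):
    """Fraction of random order-q tests that do NOT reject zero partial correlation."""
    n, p = X.shape
    S = np.cov(X, rowvar=False)
    rng = np.random.default_rng(seed)
    others = [k for k in range(p) if k not in (i, j)]
    not_rejected = 0
    for _ in range(n_tests):
        Q = rng.choice(others, size=q, replace=False)
        r = partial_corr(S, i, j, Q)
        z = 0.5 * math.log((1 + r) / (1 - r)) * math.sqrt(n - q - 3)   # Fisher z statistic
        if abs(z) <= z_crit:
            not_rejected += 1
    return not_rejected / n_tests

# Chain x0 -> x1 -> x2 plus five unrelated variables.
rng = np.random.default_rng(1)
n = 200
x0 = rng.standard_normal(n)
x1 = x0 + 0.3 * rng.standard_normal(n)
x2 = x1 + 0.3 * rng.standard_normal(n)
X = np.column_stack([x0, x1, x2, rng.standard_normal((n, 5))])
S = np.cov(X, rowvar=False)
nrr_01 = non_rejection_rate(X, 0, 1, q=2)   # adjacent pair: dependence survives any Q
nrr_02 = non_rejection_rate(X, 0, 2, q=2)   # non-adjacent pair: independent once x1 is in Q
```

Only small submatrices are ever inverted, so the computation stays defined even when the number of variables exceeds the sample size, as long as q + 3 < n.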
Temporal Causal Modeling with Graphical Granger Methods
In Proceedings of the 13th Int. Conference on Knowledge Discovery and Data Mining, pp. 66-75, Association for Computing Machinery, 2007
Cited by 22 (3 self)
The need for mining causality, beyond mere statistical correlations, for real-world problems has been widely recognized. Many of these applications naturally involve temporal data, which raises the challenge of how best to leverage the temporal information for causal modeling. Recently, graphical modeling with the concept of “Granger causality”, based on the intuition that a cause helps predict its effects in the future, has gained attention in many domains involving time series data analysis. With the surge of interest in model selection methodologies for regression, such as the Lasso, as practical alternatives to solving structural learning of graphical models, the question arises whether and how to combine these two notions into a practically viable approach for temporal causal modeling. In this paper, we examine a host of related ...
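One way to combine Granger causality with the Lasso can be sketched with NumPy: regress each series on the lagged values of all series with an ℓ1 penalty, and declare an edge j → i when any lag of series j receives a non-zero coefficient in the model for series i. The coordinate-descent solver, the lag length and the penalty value are our own illustrative choices; the specific methods examined in the paper are cut off in this listing.

```python
import numpy as np

def lasso_cd(X, y, lam, iters=200):
    """Minimize (1/2n)||y - Xb||^2 + lam*||b||_1 by cyclic coordinate descent."""
    n, d = X.shape
    b = np.zeros(d)
    col_sq = (X ** 2).sum(axis=0) / n
    r = y.astype(float)                            # residual y - Xb (b starts at 0)
    for _ in range(iters):
        for k in range(d):
            if col_sq[k] == 0.0:
                continue
            rho = X[:, k] @ r / n + col_sq[k] * b[k]
            new = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[k]   # soft-threshold
            r += X[:, k] * (b[k] - new)            # keep the residual in sync
            b[k] = new
    return b

def lasso_granger(series, max_lag, lam):
    """Edge (j, i) if some lag of series j has a non-zero Lasso coefficient for series i."""
    T, p = series.shape
    # Row for time t holds x(t-1), ..., x(t-max_lag), each block p columns wide.
    X = np.hstack([series[max_lag - l:T - l] for l in range(1, max_lag + 1)])
    X = X - X.mean(axis=0)
    edges = set()
    for i in range(p):
        y = series[max_lag:, i] - series[max_lag:, i].mean()
        coef = lasso_cd(X, y, lam).reshape(max_lag, p)    # coef[l - 1, j]
        for j in range(p):
            if j != i and np.any(coef[:, j] != 0.0):
                edges.add((j, i))
    return edges

# Series 0 drives series 1 with a one-step delay; series 2 is independent noise.
rng = np.random.default_rng(0)
T = 1000
x0 = rng.standard_normal(T)
x1 = np.zeros(T)
x1[1:] = 0.8 * x0[:-1] + 0.5 * rng.standard_normal(T - 1)
x2 = rng.standard_normal(T)
edges = lasso_granger(np.column_stack([x0, x1, x2]), max_lag=2, lam=0.2)
```

The penalty trades off recall against spurious edges; here it is set well above the sampling noise of the null lag coefficients so that only the planted dependence survives.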
Feature-inclusion stochastic search for Gaussian graphical models
J. Comp. Graph. Statist., 2008
Cited by 20 (3 self)
We describe a serial algorithm called feature-inclusion stochastic search, or FINCS, that uses online estimates of edge-inclusion probabilities to guide Bayesian model determination in Gaussian graphical models. FINCS is compared to MCMC, to Metropolis-based search methods, and to the popular lasso; it is found to be superior along a variety of dimensions, leading to better sets of discovered models, greater speed and stability, and reasonable estimates of edge-inclusion probabilities. We illustrate FINCS on an example involving mutual-fund data, where we compare the model-averaged predictive performance of models discovered with FINCS to those discovered by competing methods. Some key words: Covariance selection; Metropolis algorithm; lasso; Bayesian model selection; hyper-inverse Wishart distribution
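The "online estimates of edge-inclusion probabilities" can be sketched as a posterior-weighted average over the models visited so far. Assumptions: each visited model is stored as a frozenset of edges with its unnormalized log posterior, and `most_promising_addition` is a hypothetical greedy use of the estimates, not FINCS's actual move set.

```python
import math

def edge_inclusion_probs(visited):
    """Posterior-weighted edge-inclusion estimates over the models visited so far.

    `visited` maps each model (a frozenset of edges) to its unnormalized log posterior.
    """
    top = max(visited.values())
    weights = {m: math.exp(s - top) for m, s in visited.items()}   # shift for stability
    total = sum(weights.values())
    probs = {}
    for model, w in weights.items():
        for edge in model:
            probs[edge] = probs.get(edge, 0.0) + w / total
    return probs

def most_promising_addition(model, candidate_edges, probs):
    """Hypothetical greedy guide: the absent edge with the highest estimated probability."""
    absent = [e for e in candidate_edges if e not in model]
    return max(absent, key=lambda e: probs.get(e, 0.0)) if absent else None

visited = {
    frozenset({(0, 1)}): math.log(2.0),
    frozenset({(0, 1), (1, 2)}): 0.0,
    frozenset(): 0.0,
}
probs = edge_inclusion_probs(visited)
nxt = most_promising_addition(frozenset({(0, 1)}), [(0, 1), (1, 2), (0, 2)], probs)
```

Since the estimates are just running sums over visited models, they can be updated incrementally after every new model is scored, which is what lets them steer a serial search.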
Sparse Statistical Modelling in Gene Expression Genomics
2006
Cited by 19 (6 self)
The concept of sparsity is more and more central to practical data analysis and inference with increasingly high-dimensional data. Gene expression genomics is a key example context. As part of a series of projects that has developed Bayesian methodology for large-scale regression, ANOVA and latent factor models, we have extended traditional Bayesian “variable selection” priors and modelling ideas to new hierarchical sparsity priors that are providing substantial practical gains in addressing false discovery and isolating significant gene-specific parameters/effects in highly multivariate studies involving thousands of genes. We discuss and review these developments, in the contexts of multivariate regression, ANOVA and latent factor models for multivariate gene expression data arising in either observational or designed experimental studies. The development includes the use of sparse regression components to provide gene-sample-specific normalisation/correction based on control and housekeeping factors, an important general issue and one that can be critical and critically misleading if ignored in many gene expression studies. Two rich data sets are used to provide context and illustration. The first data set arises from a gene expression experiment designed to investigate the transcriptional response, in terms of responsive gene subsets and their expression signatures, to interventions that upregulate a series of key oncogenes. The second data set is observational: breast cancer tumour-derived data evaluated utilising a sparse latent factor model to define and isolate factors underlying the hugely complex patterns of association in gene expression patterns. We also mention software that implements these and other models and methods in one comprehensive framework.
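For reference, the traditional "variable selection" prior that the abstract describes extending is commonly written as a point mass at zero mixed with a normal slab. In our own notation (assumed, not taken from the abstract), β_gj is the coefficient linking gene g to regressor or factor j, π_j the prior inclusion probability and τ_j the slab variance:

```latex
\beta_{gj} \sim (1 - \pi_j)\,\delta_0(\beta_{gj}) + \pi_j\,\mathrm{N}(\beta_{gj} \mid 0, \tau_j)
```

The hierarchical sparsity priors described above go further by placing a prior on the inclusion probabilities themselves rather than fixing them; their exact form is not given in this abstract.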
Efficient Markov network structure discovery using independence tests
In Proc. SIAM Data Mining, 2006
Cited by 17 (2 self)
We present two algorithms for learning the structure of a Markov network from discrete data: GSMN and GSIMN. Both algorithms use statistical conditional independence tests on data to infer the structure by successively constraining the set of structures consistent with the results of these tests. GSMN is a natural adaptation of the Grow-Shrink algorithm of Margaritis and Thrun for learning the structure of Bayesian networks. GSIMN extends GSMN by additionally exploiting Pearl’s well-known properties of conditional independence relations to infer novel independencies from known independencies, thus avoiding the need to perform these tests. Experiments on artificial and real data sets show GSIMN can yield savings of up to 70% with respect to GSMN, while generating a Markov network with comparable or in several cases considerably improved quality. In addition ...
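The Grow-Shrink idea that GSMN adapts can be sketched in pure Python: for each variable, grow a candidate Markov blanket by adding any variable still dependent on it given the current blanket, then shrink it by removing any member that becomes independent given the rest. For illustration we replace the statistical tests with an exact oracle (graph separation in a known undirected graph); `separated`, `grow_shrink_blanket` and `gsmn_edges` are hypothetical names, and the sketch omits GSIMN's inference of independencies via Pearl's properties.

```python
from collections import deque

def separated(graph, x, y, Z):
    """Oracle test: True if every path from x to y in the undirected graph passes through Z."""
    seen, queue = {x}, deque([x])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v == y:
                return False                 # reached y while avoiding Z: dependent
            if v not in seen and v not in Z:
                seen.add(v)
                queue.append(v)
    return True

def grow_shrink_blanket(variables, x, indep):
    """Estimate the Markov blanket of x using the test indep(x, y, Z)."""
    blanket = []
    changed = True
    while changed:                           # grow: add anything still dependent on x
        changed = False
        for y in variables:
            if y != x and y not in blanket and not indep(x, y, set(blanket)):
                blanket.append(y)
                changed = True
    for y in list(blanket):                  # shrink: drop members made independent by the rest
        if indep(x, y, set(blanket) - {y}):
            blanket.remove(y)
    return set(blanket)

def gsmn_edges(variables, indep):
    """Connect each variable to every member of its estimated Markov blanket."""
    return {frozenset((x, y)) for x in variables
            for y in grow_shrink_blanket(variables, x, indep)}

graph = {0: {1}, 1: {0, 2, 4}, 2: {1, 3}, 3: {2}, 4: {1}}
edges = gsmn_edges(list(graph), lambda x, y, Z: separated(graph, x, y, Z))
```

With real data the oracle becomes a statistical test, so blankets can disagree between the two endpoints of an edge; every call to `indep` is a test that GSIMN tries to avoid by inferring its outcome instead.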
Shotgun stochastic search for “large p” regression
Journal of the American Statistical Association, 2007
Cited by 17 (3 self)
Model search in regression with very large numbers of candidate predictors raises challenges for both model specification and computation, and standard approaches such as Markov chain Monte Carlo (MCMC) and stepwise methods are often infeasible or ineffective. We describe a novel shotgun stochastic search (SSS) approach that explores “interesting” regions of the resulting, very high-dimensional model spaces to quickly identify regions of high posterior probability over models. We describe algorithmic and modeling aspects, priors over the model space that induce sparsity and parsimony over and above the traditional dimension penalization implicit in Bayesian and likelihood analyses, and parallel computation using cluster computers. We discuss an example from gene expression cancer genomics, comparisons with MCMC and other methods, and theoretical and simulation-based aspects of performance characteristics in large-scale regression model search. We also provide software implementing the methods.
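A simplified shotgun stochastic search can be sketched in pure Python: from the current model, score every neighbor reachable by adding, deleting or swapping one predictor, record every model scored, and move to a neighbor sampled with probability proportional to its exponentiated score. Assumptions: the toy score stands in for a real log posterior, sampling is from the pooled neighborhood rather than per move type, and no parallelism is used; all names are hypothetical.

```python
import math
import random

def neighbors(model, p):
    """All models reachable by adding, deleting or swapping a single predictor."""
    out = []
    for j in range(p):
        out.append(model - {j} if j in model else model | {j})   # delete or add j
    for j in model:
        for k in range(p):
            if k not in model:
                out.append((model - {j}) | {k})                  # swap j for k
    return out

def shotgun_search(score, p, iters=100, keep=5, seed=0):
    """Simplified SSS: score whole neighborhoods, keep every scored model, move stochastically."""
    rng = random.Random(seed)
    current = frozenset()
    seen = {current: score(current)}
    for _ in range(iters):
        nbrs = neighbors(current, p)
        for m in nbrs:
            if m not in seen:
                seen[m] = score(m)
        top = max(seen[m] for m in nbrs)
        weights = [math.exp(seen[m] - top) for m in nbrs]   # proportional to exp(score)
        current = rng.choices(nbrs, weights=weights, k=1)[0]
    return sorted(seen.items(), key=lambda kv: kv[1], reverse=True)[:keep]

# Toy stand-in for a log posterior: predictors 0 and 2 form the "true" model.
truth = frozenset({0, 2})
best = shotgun_search(lambda m: -2 * len(m ^ truth), p=5)
```

Scoring whole neighborhoods is what distinguishes this from a single-proposal Metropolis step: every evaluated model is a candidate for the reported top list, and the neighborhood evaluations are independent, so they parallelize naturally.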