Results 1 - 10 of 74
First-order methods for sparse covariance selection
- SIAM Journal on Matrix Analysis and Applications
Cited by 36 (1 self)
Given a sample covariance matrix, we solve a maximum likelihood problem penalized by the number of nonzero coefficients in the inverse covariance matrix. Our objective is to find a sparse representation of the sample data and to highlight conditional independence relationships between the sample variables. We first formulate a convex relaxation of this combinatorial problem, then detail two efficient first-order algorithms with low memory requirements to solve large-scale, dense problem instances.
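The relaxation mentioned in this abstract can be written out as follows. This is a sketch in the form the problem is commonly stated, not a quotation from the paper: the symbols S (sample covariance), X (candidate inverse covariance), and ρ (penalty weight) are notation assumed here, and constant factors in the likelihood are omitted.

```latex
% Cardinality-penalized maximum likelihood (combinatorial):
% Card(X) counts the nonzero entries of X.
\max_{X \succ 0} \; \log\det X - \operatorname{Tr}(S X) - \rho \, \operatorname{Card}(X)

% Convex relaxation: replace the count of nonzeros by the entrywise l1 norm.
\max_{X \succ 0} \; \log\det X - \operatorname{Tr}(S X) - \rho \sum_{i,j} |X_{ij}|
```

The second problem is concave in X, which is what makes first-order methods applicable.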
Convex optimization techniques for fitting sparse Gaussian graphical models
- In Proceedings of the 23rd International Conference on Machine Learning
, 2006
Cited by 31 (0 self)
We consider the problem of fitting a large-scale covariance matrix to multivariate Gaussian data in such a way that the inverse is sparse, thus providing model selection. Beginning with a dense empirical covariance matrix, we solve a maximum likelihood problem with an l1-norm penalty term added to encourage sparsity in the inverse. For models with tens of nodes, the resulting problem can be solved using standard interior-point algorithms for convex optimization, but these methods scale poorly with problem size. We present two new algorithms aimed at solving problems with a thousand nodes. The first, based on Nesterov’s first-order algorithm, yields a rigorous complexity estimate for the problem, with a much better dependence on problem size than interior-point methods. Our second algorithm uses block coordinate descent, updating rows/columns of the covariance matrix sequentially. Experiments with genomic data show that our method is able to uncover biologically interpretable connections among genes.
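To make the penalized objective concrete, here is a minimal Python sketch that only evaluates it for a candidate inverse covariance matrix. It does not implement either of the paper's algorithms. The function name, the parameter rho, and the omission of constant likelihood factors are assumptions made for illustration.

```python
import numpy as np

def penalized_log_likelihood(precision, sample_cov, rho):
    """Evaluate log det(X) - tr(S X) - rho * sum|X_ij| (hypothetical helper).

    Returns -inf when X is not positive definite, since the Gaussian
    log-likelihood is undefined there.
    """
    if np.linalg.eigvalsh(precision).min() <= 0:
        return -np.inf
    _, logdet = np.linalg.slogdet(precision)
    fit = logdet - np.trace(sample_cov @ precision)
    penalty = rho * np.abs(precision).sum()
    return fit - penalty

# Illustrative data (assumed): 200 samples of 4 variables, so S is invertible.
rng = np.random.default_rng(0)
data = rng.standard_normal((200, 4))
S = np.cov(data, rowvar=False)

dense_candidate = np.linalg.inv(S)               # unpenalized ML estimate
diagonal_candidate = np.diag(1.0 / np.diag(S))   # fully sparse off-diagonal

obj_dense = penalized_log_likelihood(dense_candidate, S, rho=0.1)
obj_diag = penalized_log_likelihood(diagonal_candidate, S, rho=0.1)
```

Comparing the two objective values for different rho shows how the penalty trades goodness of fit against the number and size of nonzero entries.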
A robust procedure for Gaussian graphical model search from microarray data with p larger than n
- Journal of Machine Learning Research
, 2006
Cited by 19 (1 self)
Learning of large-scale networks of interactions from microarray data is an important and challenging problem in bioinformatics. A widely used approach is to assume that the available data constitute a random sample from a multivariate distribution belonging to a Gaussian graphical model. As a consequence, the prime objects of inference are full-order partial correlations, that is, partial correlations between two variables given all the remaining ones. In the context of microarray data, the number of variables exceeds the sample size, which precludes the application of traditional structure learning procedures because a sampling version of full-order partial correlations does not exist. In this paper we consider limited-order partial correlations, which are partial correlations computed on marginal distributions of manageable size, and provide a set of rules that allow one to assess the usefulness of these quantities to derive the independence structure of the underlying Gaussian graphical model. Furthermore, we introduce a novel structure learning procedure based on a quantity, obtained from limited-order partial correlations, that we call the non-rejection rate. The applicability and usefulness of the procedure are demonstrated by both simulated and real data.
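The distinction between full-order and limited-order partial correlations can be illustrated with a short Python sketch. It uses the standard formulas (scaled inverse covariance for full order, the three-variable recursion for first order) and does not implement the paper's non-rejection rate. The function names and the simulated data are assumptions.

```python
import numpy as np

def full_order_partial_correlations(cov):
    """Partial correlation of each pair given all remaining variables."""
    precision = np.linalg.inv(cov)  # requires a nonsingular covariance
    d = np.sqrt(np.diag(precision))
    pcor = -precision / np.outer(d, d)
    np.fill_diagonal(pcor, 1.0)
    return pcor

def first_order_partial_correlation(corr, i, j, k):
    """Partial correlation of variables i and j given the single variable k."""
    num = corr[i, j] - corr[i, k] * corr[j, k]
    den = np.sqrt((1 - corr[i, k] ** 2) * (1 - corr[j, k] ** 2))
    return num / den

rng = np.random.default_rng(0)
# n = 5 samples of p = 8 variables: the sample covariance is rank-deficient,
# so it cannot be inverted and full-order partial correlations are unavailable.
X = rng.standard_normal((5, 8))
S = np.cov(X, rowvar=False)
rank_S = np.linalg.matrix_rank(S)  # at most n - 1 = 4, below p = 8

# A limited-order quantity only needs a 3-variable marginal, which is
# still estimable from the same 5 samples.
R = np.corrcoef(X, rowvar=False)
q1 = first_order_partial_correlation(R, 0, 1, 2)
```

With exactly three variables the two functions agree, since conditioning on the one remaining variable is the full-order case.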
Projected Subgradient Methods for Learning Sparse Gaussians
Cited by 19 (0 self)
Gaussian Markov random fields (GMRFs) are useful in a broad range of applications. In this paper we tackle the problem of learning a sparse GMRF in a high-dimensional space. Our approach uses the ℓ1-norm as a regularization on the inverse covariance matrix. We utilize a novel projected gradient method, which is faster than previous methods in practice and equal to the best performing of these in asymptotic complexity. We also extend the ℓ1-regularized objective to the problem of sparsifying entire blocks within the inverse covariance matrix. Our methods generalize fairly easily to this case, while other methods do not. We demonstrate that our extensions give better generalization performance on two real domains: biological network analysis and a 2D-shape modeling image task.
Feature-inclusion stochastic search for Gaussian graphical models
, 2007
Cited by 16 (3 self)
We describe a serial algorithm called feature-inclusion stochastic search, or FINCS, that uses online estimates of edge-inclusion probabilities to inform the process of Bayesian model determination in Gaussian graphical models. FINCS is compared to Metropolis-based search methods and found to be superior along a variety of dimensions, leading to more accurate and less volatile estimates of edge-inclusion probabilities and greater speed in finding good models. Though FINCS is conceived as a method for characterizing model uncertainty in moderate-dimensional problems, we also find that it performs well as a stochastic hill-climber in bigger problems. We illustrate its use on an example involving mutual-fund data, where we compare the model-averaged predictive performance of models discovered with FINCS to those discovered with the Metropolis algorithm.
Understanding the use of unlabelled data in predictive modelling
- Statistical Science
, 2006
Cited by 13 (9 self)
The incorporation of unlabelled data in statistical machine learning methods for prediction, including regression and classification, has demonstrated the potential for improved accuracy in prediction in a number of recent examples. The statistical basis for this semi-supervised analysis does not, however, appear to have been well delineated in the literature to date. Nor, perhaps, are statisticians as fully engaged in the vigorous research in this area of machine learning as might be desired. Much of the theoretical work in the literature has focused, for example, on geometric and structural properties of the unlabelled data in the context of particular algorithms, rather than probabilistic and statistical questions. This paper overviews the fundamental statistical foundations for predictive modelling and the general questions associated with unlabelled data, highlighting the relevance of venerable concepts of sampling design and prior specification. This theory, illustrated with a series of simple but central examples, shows precisely when, why and how unlabelled data matter.
Temporal Causal Modeling with Graphical Granger Methods
Cited by 13 (2 self)
The need for mining causality, beyond mere statistical correlations, for real world problems has been recognized widely. Many of these applications naturally involve temporal data, which raises the challenge of how best to leverage the temporal information for causal modeling. Recently graphical modeling with the concept of “Granger causality”, based on the intuition that a cause helps predict its effects in the future, has gained attention in many domains involving time series data analysis. With the surge of interest in model selection methodologies for regression, such as the Lasso, as practical alternatives to solving structural learning of graphical models, the question arises whether and how to combine these two notions into a practically viable approach for temporal causal modeling. In this paper, we examine a host of related
Sparse Statistical Modelling in Gene Expression Genomics
, 2006
Cited by 12 (5 self)
The concept of sparsity is more and more central to practical data analysis and inference with increasingly high-dimensional data. Gene expression genomics is a key example context. As part of a series of projects that has developed Bayesian methodology for large-scale regression, ANOVA and latent factor models, we have extended traditional Bayesian “variable selection” priors and modelling ideas to new hierarchical sparsity priors that are providing substantial practical gains in addressing false discovery and isolating significant gene-specific parameters/effects in highly multivariate studies involving thousands of genes. We discuss and review these developments, in the contexts of multivariate regression, ANOVA and latent factor models for multivariate gene expression data arising in either observational or designed experimental studies. The development includes the use of sparse regression components to provide gene-sample specific normalisation/correction based on control and housekeeping factors, an important general issue and one that can be critical (and critically misleading if ignored) in many gene expression studies. Two rich data sets are used to provide context and illustration. The first data set arises from a gene expression experiment designed to investigate the transcriptional response, in terms of responsive gene subsets and their expression signatures, to interventions that up-regulate a series of key oncogenes. The second data set is observational, breast cancer tumour-derived data evaluated utilising a sparse latent factor model to define and isolate factors underlying the hugely complex patterns of association in gene expression patterns. We also mention software that implements these and other models and methods in one comprehensive framework.
Efficient Markov network structure discovery using independence tests
- In Proc SIAM Data Mining
, 2006
Cited by 11 (0 self)
We present two algorithms for learning the structure of a Markov network from discrete data: GSMN and GSIMN. Both algorithms use statistical conditional independence tests on data to infer the structure by successively constraining the set of structures consistent with the results of these tests. GSMN is a natural adaptation of the Grow-Shrink algorithm of Margaritis and Thrun for learning the structure of Bayesian networks. GSIMN extends GSMN by additionally exploiting Pearl’s well-known properties of conditional independence relations to infer novel independencies from known independencies, thus avoiding the need to perform these tests. Experiments on artificial and real data sets show GSIMN can yield savings of up to 70% with respect to GSMN, while generating a Markov network with comparable or in several cases considerably improved quality. In addition

