Results 11-20 of 47
Assessing the validity domains of graphical Gaussian models in order to infer relationships among components of complex biological systems
, 2008
"... Abstract. The study of the interactions of cellular components is an essential base step to understand the structure and dynamics of biological networks. So, various methods were recently developed in this purpose. While most of them combine different types of data and ¡em¿a priori¡/em ¿ knowledge, ..."
Abstract

Cited by 4 (0 self)
 Add to MetaCart
Abstract. The study of the interactions of cellular components is an essential first step towards understanding the structure and dynamics of biological networks, and various methods have recently been developed for this purpose. While most of them combine different types of data and a priori knowledge, methods based on Graphical Gaussian Models are capable of learning the network directly from raw data. They model direct links between variables using full-order partial correlations, i.e. partial correlations between two variables given all the remaining ones. Statistical methods exist for estimating these links when the number of observations is larger than the number of variables. However, the rapid advance of technologies that allow genome-wide expression to be measured simultaneously has led to large-scale datasets in which the number of variables far exceeds the number of observations. To get around this dimensionality problem, different strategies and new statistical methods have been proposed. In this study we focus on recently published statistical methods. All rely on the assumption that the number of direct relationships between variables is very small compared with the number of possible relationships, p(p-1)/2. In the biological context, this assumption is not always satisfied over the whole graph, so it is essential to know precisely how each method behaves with respect to the characteristics of the studied object before applying it. For this purpose, we evaluated the validity domain of each method on wide-ranging simulated datasets. We then illustrate our results using recently published biological data.
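The full-order partial correlations this abstract refers to can be read directly off the inverse of the covariance matrix. The sketch below (an illustration of the standard scaling formula, not the paper's code) also shows why the approach needs more observations than variables:

```python
import numpy as np

def full_order_partial_correlations(X):
    """Estimate full-order partial correlations from a data matrix X (n x p).

    Requires n > p so the sample covariance is invertible -- exactly the
    regime the abstract says breaks down for large-scale expression data.
    """
    S = np.cov(X, rowvar=False)      # p x p sample covariance
    K = np.linalg.inv(S)             # precision (concentration) matrix
    d = np.sqrt(np.diag(K))
    P = -K / np.outer(d, d)          # rho_ij = -K_ij / sqrt(K_ii * K_jj)
    np.fill_diagonal(P, 1.0)
    return P

# Toy check: in a chain X0 -> X1 -> X2, the partial correlation between
# X0 and X2 given X1 is near zero even though their marginal correlation
# is large -- the "direct link" reading of partial correlations.
rng = np.random.default_rng(0)
x0 = rng.normal(size=5000)
x1 = x0 + 0.5 * rng.normal(size=5000)
x2 = x1 + 0.5 * rng.normal(size=5000)
X = np.column_stack([x0, x1, x2])
P = full_order_partial_correlations(X)
```

When p approaches n the sample covariance becomes ill-conditioned or singular, which is the dimensionality problem the study evaluates methods against.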
Penalized likelihood methods for estimation of sparse high-dimensional directed acyclic graphs
, 2010
"... Directed acyclic graphs are commonly used to represent causal relationships among random variables in graphical models. Applications of these models arise in the study of physical, as well as biological systems, where directed edges between nodes represent the influence of components of the system o ..."
Abstract

Cited by 3 (3 self)
 Add to MetaCart
Directed acyclic graphs are commonly used to represent causal relationships among random variables in graphical models. Applications of these models arise in the study of physical as well as biological systems, where directed edges between nodes represent the influence of components of the system on each other. Estimation of directed graphs from observational data is computationally NP-hard. In addition, directed graphs with different structures may be indistinguishable based on observations alone. When the nodes exhibit a natural ordering, the problem of estimating directed graphs reduces to the problem of estimating the structure of the network. In this paper, we propose an efficient penalized likelihood method for estimating the adjacency matrix of a directed acyclic graph when the variables inherit a natural ordering. We study variable selection consistency of both the lasso and the adaptive lasso penalties in high-dimensional sparse settings, and propose an error-based choice for selecting the tuning parameter. We show that although the lasso is variable selection consistent only under stringent conditions, the adaptive lasso can consistently estimate the true graph under the usual regularity assumptions. Simulation studies indicate that the correct ordering of the variables becomes less critical in the estimation of high-dimensional sparse networks.
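With a known variable ordering, the penalized-likelihood idea reduces to p - 1 separate lasso regressions of each node on its predecessors. The following is a minimal sketch of that reduction using scikit-learn's plain lasso; the paper's error-based tuning rule and adaptive-lasso weights are not reproduced here:

```python
import numpy as np
from sklearn.linear_model import Lasso

def estimate_dag_adjacency(X, lam=0.1):
    """Given columns of X in a known causal ordering, estimate the
    strictly lower-triangular adjacency matrix by lasso-regressing
    each node on all of its predecessors.
    """
    n, p = X.shape
    A = np.zeros((p, p))
    for j in range(1, p):
        model = Lasso(alpha=lam).fit(X[:, :j], X[:, j])
        A[j, :j] = model.coef_
    return A

# Toy DAG respecting the ordering: edges 0 -> 2 and 2 -> 3.
rng = np.random.default_rng(1)
n = 500
X = np.zeros((n, 5))
X[:, 0] = rng.normal(size=n)
X[:, 1] = rng.normal(size=n)
X[:, 2] = 0.8 * X[:, 0] + rng.normal(size=n)
X[:, 3] = 0.8 * X[:, 2] + rng.normal(size=n)
X[:, 4] = rng.normal(size=n)          # isolated node, no parents
A = estimate_dag_adjacency(X, lam=0.1)
```

The l1 penalty zeroes out the coefficients of non-parents, so the recovered adjacency pattern matches the sparse true graph.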
Learning Gaussian graphical models of gene networks with false discovery rate control
 In 6th European Conference on Evolutionary Computation, Machine Learning and Data Mining in Bioinformatics
"... Abstract. In many cases what matters is not whether a false discovery is made or not but the expected proportion of false discoveries among all the discoveries made, i.e. the socalled false discovery rate (FDR). We present an algorithm aiming at controlling the FDR of edges when learning Gaussian g ..."
Abstract

Cited by 3 (0 self)
 Add to MetaCart
Abstract. In many cases what matters is not whether a false discovery is made but the expected proportion of false discoveries among all the discoveries made, i.e. the so-called false discovery rate (FDR). We present an algorithm aimed at controlling the FDR of edges when learning Gaussian graphical models (GGMs). The algorithm is particularly suitable when dealing with more nodes than samples, e.g. when learning GGMs of gene networks from gene expression data. We illustrate this on the Rosetta compendium [8].
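A standard way to control the FDR over a set of candidate edges is the Benjamini-Hochberg step-up procedure applied to the edges' p-values. The algorithm in the paper is more involved, but the sketch below conveys the basic FDR mechanism it builds on:

```python
import numpy as np

def bh_threshold(pvals, q=0.05):
    """Benjamini-Hochberg step-up: return a boolean mask of rejected
    hypotheses (edges kept) so that the expected proportion of false
    discoveries among them is at most q (under independence).
    """
    p = np.asarray(pvals)
    m = len(p)
    order = np.argsort(p)
    sorted_p = p[order]
    # Compare the k-th smallest p-value against q * k / m.
    below = sorted_p <= q * (np.arange(1, m + 1) / m)
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])   # largest index meeting the bound
        reject[order[:k + 1]] = True
    return reject

# Edge p-values: three clearly present edges among many null edges.
pvals = [0.0001, 0.0004, 0.0009] + [0.2, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
mask = bh_threshold(pvals, q=0.05)
```

Unlike a per-edge significance cutoff, the threshold adapts to the whole collection of tests, which is what makes FDR control attractive when the number of candidate edges is huge.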
High-dimensional sparse covariance estimation via directed acyclic graphs
, 2009
"... We present a graphbased technique for estimating sparse covariance matrices and their inverses from highdimensional data. The method is based on learning a directed acyclic graph (DAG) and estimating parameters of a multivariate Gaussian distribution based on a DAG. For inferring the underlying DA ..."
Abstract

Cited by 3 (1 self)
 Add to MetaCart
We present a graph-based technique for estimating sparse covariance matrices and their inverses from high-dimensional data. The method is based on learning a directed acyclic graph (DAG) and estimating the parameters of a multivariate Gaussian distribution based on that DAG. For inferring the underlying DAG we use the PC-algorithm [27], and for estimating the DAG-based covariance matrix and its inverse we use a Cholesky decomposition approach, which provides a positive (semi-)definite sparse estimate. We present a consistency result in the high-dimensional framework and compare our method with the Glasso [12, 8, 2] on simulated and real data.
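The Cholesky-style construction mentioned above guarantees a positive definite estimate by building the covariance from the DAG's structural equations. A toy illustration, assuming a known 3-node chain rather than the PC-algorithm output the paper uses:

```python
import numpy as np

# Structural equations X = A X + e, with A strictly lower triangular
# (a DAG consistent with the variable ordering) and independent noise e
# with variances on the diagonal of D.
A = np.array([[0.0, 0.0, 0.0],
              [0.7, 0.0, 0.0],    # edge 0 -> 1
              [0.0, 0.5, 0.0]])   # edge 1 -> 2
D = np.diag([1.0, 1.0, 1.0])

I = np.eye(3)
B = I - A
# Implied covariance and its inverse. K = B^T D^{-1} B is a product of
# triangular factors, so it is positive definite and inherits the DAG's
# sparsity (no edge 0-2 in the chain => K[0, 2] == 0).
Sigma = np.linalg.inv(B) @ D @ np.linalg.inv(B).T
K = B.T @ np.linalg.inv(D) @ B
```

Because the inverse is assembled from triangular factors rather than estimated entrywise, positive (semi-)definiteness comes for free, which is the point of the Cholesky route.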
Structure learning with independent non-identically distributed data
"... There are well known algorithms for learning the structure of directed and undirected graphical models from data, but nearly all assume that the data consists of a single i.i.d. sample. In contexts such as fMRI analysis, data may consist of an ensemble of independent samples from a common data gener ..."
Abstract

Cited by 2 (1 self)
 Add to MetaCart
There are well-known algorithms for learning the structure of directed and undirected graphical models from data, but nearly all assume that the data consist of a single i.i.d. sample. In contexts such as fMRI analysis, the data may consist of an ensemble of independent samples from a common data-generating mechanism which need not have identical distributions. Pooling such data can result in a number of well-known statistical problems, so each sample must be analyzed individually, which offers no increase in power from the presence of multiple samples. We show how existing constraint-based methods can be modified to learn structure from the aggregate of such data in a statistically sound manner. The prescribed method is simple to implement and based on existing statistical techniques employed in meta-analysis and other areas, yet works surprisingly well in this context, where there are heightened concerns over issues such as re-testing. We report results for directed models, but the method is just as applicable to undirected models.
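One classical meta-analytic tool of the kind alluded to is Fisher's method, which combines independent p-values for the same conditional-independence test across samples. Whether this matches the authors' exact procedure is not stated in the abstract, so treat this as a generic sketch of the idea:

```python
import math
from scipy import stats

def fisher_combine(pvals):
    """Fisher's method: combine independent p-values for one hypothesis
    via the statistic -2 * sum(log p), which is chi-squared with 2k
    degrees of freedom under the global null.
    """
    stat = -2.0 * sum(math.log(p) for p in pvals)
    return stats.chi2.sf(stat, df=2 * len(pvals))

# Four samples from the same mechanism, each giving only mildly
# suggestive evidence for an edge; combined, the evidence is strong --
# the gain in power over analyzing each sample alone.
combined = fisher_combine([0.08, 0.06, 0.10, 0.07])
```

Each per-sample test stays valid under non-identical distributions as long as the conditional-independence structure is shared, which is what makes this kind of aggregation sound where pooling the raw data is not.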
Automated search for causal relations: Theory and practice
 Heuristics, Probability and Causality: A Tribute to Judea Pearl
, 2010
"... The rapid spread of interest in the last two decades in principled methods of search or estimation of causal relations has been driven in part by technological developments, especially the changing nature of modern data collection and storage techniques, and the increases in the speed and storage ca ..."
Abstract

Cited by 2 (0 self)
 Add to MetaCart
The rapid spread of interest in the last two decades in principled methods of search or estimation of causal relations has been driven in part by technological developments, especially the changing nature of modern data collection and storage techniques, and the increases in the speed and storage capacities of computers. Statistics books from 30 years ...
Machine Learning in cancer research: implications for personalised medicine
"... Abstract. Driven by the growing demand of personalization of medical procedures, databased, computeraided cancer research in human patients is advancing at an accelerating pace, providing a broadening landscape of opportunity for Machine Learning methods. This landscape can be observed from the wi ..."
Abstract

Cited by 1 (0 self)
 Add to MetaCart
Abstract. Driven by the growing demand for personalization of medical procedures, data-based, computer-aided cancer research in human patients is advancing at an accelerating pace, providing a broadening landscape of opportunity for Machine Learning methods. This landscape can be observed from the wide-reaching view of population studies down to genotype-level detail. In this brief paper, we provide a sweeping glimpse, by no means exhaustive, of the state of the art in this field at the different scales of data measurement and analysis.
Graphical models, message-passing algorithms and convex optimization. [Online]. Available: http://www.eecs.berkeley. edu/ ∼wainwrig/Talks/A GraphModel Tutorial.pdf
, 2003
"... Graphical models provide a framework for describing statistical dependencies in (possibly large) collections of random variables. At their core lie various correspondences between the conditional independence properties of a random vector, and the structure of an underlying graph used to represent i ..."
Abstract

Cited by 1 (0 self)
 Add to MetaCart
Graphical models provide a framework for describing statistical dependencies in (possibly large) collections of random variables. At their core lie various correspondences between the conditional independence properties of a random vector and the structure of an underlying graph used to represent its distribution. They ...
Covariance Estimation
, 801
"... Abstract: The paper proposes a method for constructing a sparse estimator for the inverse covariance (concentration) matrix in highdimensional settings. The estimator uses a penalized normal likelihood approach and forces sparsity by using a lassotype penalty. We establish a rate of convergence in ..."
Abstract

Cited by 1 (0 self)
 Add to MetaCart
Abstract: The paper proposes a method for constructing a sparse estimator of the inverse covariance (concentration) matrix in high-dimensional settings. The estimator uses a penalized normal likelihood approach and enforces sparsity via a lasso-type penalty. We establish a rate of convergence in the Frobenius norm as both the data dimension p and the sample size n are allowed to grow, and show that the rate depends explicitly on how sparse the true concentration matrix is. We also show that a correlation-based version of the method exhibits better rates in the operator norm. The estimator is required to be positive definite, but we avoid having to use semidefinite programming by reparameterizing the objective function ...
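A widely available implementation of this penalized-normal-likelihood idea is the graphical lasso. The sketch below uses scikit-learn's GraphicalLasso as a stand-in; the paper's own estimator and its correlation-based variant are not reproduced here:

```python
import numpy as np
from sklearn.covariance import GraphicalLasso

# Simulate from a sparse true concentration matrix: a 4-node chain,
# so only adjacent pairs have nonzero precision entries.
rng = np.random.default_rng(2)
K_true = np.array([[ 2.0, -1.0,  0.0,  0.0],
                   [-1.0,  2.0, -1.0,  0.0],
                   [ 0.0, -1.0,  2.0, -1.0],
                   [ 0.0,  0.0, -1.0,  2.0]])
Sigma_true = np.linalg.inv(K_true)
X = rng.multivariate_normal(np.zeros(4), Sigma_true, size=2000)

# L1-penalized normal likelihood: the lasso-type penalty forces small
# off-diagonal precision entries toward zero.
model = GraphicalLasso(alpha=0.05).fit(X)
K_hat = model.precision_
```

The chain's nonzero entries survive the penalty while the structurally zero long-range entries are shrunk toward zero, illustrating the sparsity the paper's convergence rates depend on.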