Results 1  10
of
167
ARACNE: an algorithm for the reconstruction of gene regulatory networks in a mammalian cellular context
 BMC Bioinformatics
, 2006
"... * To whom correspondence should be addressed ..."
Tractable learning of large bayes net structures from sparse data
, 2004
"... statistics for creating the global Bayes Net. This paper addresses three questions. Is it useful to attempt to learn a Bayesian network structure with hundreds of thousands of nodes? How should such structure search proceed practically? The third question arises out of our approach to the second: ho ..."
Abstract

Cited by 29 (4 self)
 Add to MetaCart
statistics for creating the global Bayes Net. This paper addresses three questions. Is it useful to attempt to learn a Bayesian network structure with hundreds of thousands of nodes? How should such structure search proceed practically? The third question arises out of our approach to the second: how can Frequent Sets (Agrawal et al., 1993), which are extremely popular in the area of descriptive data mining, be turned into a probabilistic model? Large sparse datasets with hundreds of thousands of records and attributes appear in social networks, warehousing, supermarket transactions and web logs. The complexity of structural search made learning of factored probabilistic models on such datasets unfeasible. We propose to use Frequent Sets to significantly speed up the structural search. Unlike previous approaches, we not only cache nway sufficient statistics, but also exploit their local structure. We also present an empirical evaluation of our algorithm applied to several massive datasets.
A robust procedure for gaussian graphical model search from microarray data with p larger than n
 Journal of Machine Learning Research
, 2006
"... Learning of largescale networks of interactions from microarray data is an important and challenging problem in bioinformatics. A widely used approach is to assume that the available data constitute a random sample from a multivariate distribution belonging to a Gaussian graphical model. As a conse ..."
Abstract

Cited by 26 (3 self)
 Add to MetaCart
Learning of largescale networks of interactions from microarray data is an important and challenging problem in bioinformatics. A widely used approach is to assume that the available data constitute a random sample from a multivariate distribution belonging to a Gaussian graphical model. As a consequence, the prime objects of inference are fullorder partial correlations which are partial correlations between two variables given the remaining ones. In the context of microarray data the number of variables exceed the sample size and this precludes the application of traditional structure learning procedures because a sampling version of fullorder partial correlations does not exist. In this paper we consider limitedorder partial correlations, these are partial correlations computed on marginal distributions of manageable size, and provide a set of rules that allow one to assess the usefulness of these quantities to derive the independence structure of the underlying Gaussian graphical model. Furthermore, we introduce a novel structure learning procedure based on a quantity, obtained from limitedorder partial correlations, that we call the nonrejection rate. The applicability and usefulness of the procedure are demonstrated by both simulated and real data.
Generating realistic in silico gene networks for performance assessment of reverse engineering methods
 J. Comput. Biol
"... This eprint is identical in content to the postprint of this article, which is available at www.liebertonline.com/cmb. Related articles are available ..."
Abstract

Cited by 13 (1 self)
 Add to MetaCart
This eprint is identical in content to the postprint of this article, which is available at www.liebertonline.com/cmb. Related articles are available
BeliefPropagation for Weighted bMatchings on Arbitrary Graphs and its Relation to Linear Programs with Integer Solutions
 in arXiv, http://www.arxiv.org/abs/0709.1190v1
, 2007
"... We consider the general problem of finding the minimum weight bmatching on arbitrary graphs. We prove that, whenever the linear programming (LP) relaxation of the problem has no fractional solutions, then the belief propagation (BP) algorithm converges to the correct solution. This result is notabl ..."
Abstract

Cited by 13 (0 self)
 Add to MetaCart
We consider the general problem of finding the minimum weight bmatching on arbitrary graphs. We prove that, whenever the linear programming (LP) relaxation of the problem has no fractional solutions, then the belief propagation (BP) algorithm converges to the correct solution. This result is notable in several regards: (1) It is one of a very small number of proofs showing correctness of BP without any constraint on the graph structure. (2) Instead of showing that BP leads to a PTAS, we give a finite bound for the number of iterations after which BP has converged to the exact solution. (3) Variants of the proof work for both synchronous and asynchronous BP; to the best of our knowledge, it is the first proof of convergence and correctness of an asynchronous BP algorithm for a combinatorial optimization problem. (4) It works for both ordinary bmatchings and the more difficult case of perfect bmatchings. (5) Together with the recent work of Sanghavi, Malioutov and Wilskly [41] they are the first complete proofs showing that tightness of LP implies correctness of BP. 1
Identifying functional modules in proteinprotein interaction networks: an integrated exact approach
 Bioinformatics
, 2008
"... Motivation: With the exponential growth of expression and proteinprotein interaction (PPI) data, the frontier of research in system biology shifts more and more to the integrated analysis of these large datasets. Of particular interest is the identification of functional modules in PPI networks, sha ..."
Abstract

Cited by 12 (1 self)
 Add to MetaCart
Motivation: With the exponential growth of expression and proteinprotein interaction (PPI) data, the frontier of research in system biology shifts more and more to the integrated analysis of these large datasets. Of particular interest is the identification of functional modules in PPI networks, sharing common cellular function beyond the scope of classical pathways, by means of detecting differentially expressed regions in PPI networks. This requires on the one hand an adequate scoring of the nodes in the network to be identified and on the other hand the availability of an effective algorithm to find the maximally scoring network regions. Various heuristic approaches have been proposed in the literature. Results: Here we present the first exact solution for this problem, which is based on integer linear programming and its connection to the wellknown prizecollecting Steiner tree problem from Operations
Entropy Inference and the JamesStein Estimator, with Application to Nonlinear Gene Association Networks
"... We present a procedure for effective estimation of entropy and mutual information from smallsample data, and apply it to the problem of inferring highdimensional gene association networks. Specifically, we develop a JamesSteintype shrinkage estimator, resulting in a procedure that is highly effic ..."
Abstract

Cited by 12 (1 self)
 Add to MetaCart
We present a procedure for effective estimation of entropy and mutual information from smallsample data, and apply it to the problem of inferring highdimensional gene association networks. Specifically, we develop a JamesSteintype shrinkage estimator, resulting in a procedure that is highly efficient statistically as well as computationally. Despite its simplicity, we show that it outperforms eight other entropy estimation procedures across a diverse range of sampling scenarios and datagenerating models, even in cases of severe undersampling. We illustrate the approach by analyzing E. coli gene expression data and computing an entropybased geneassociation network from gene expression data. A computer program is available that implements the proposed shrinkage estimator. Keywords: entropy, shrinkage estimation, JamesStein estimator, “small n, large p ” setting, mutual information, gene association network
Multiple testing and error control in Gaussian graphical model selection
 Statistical Science
"... Abstract. Graphical models provide a framework for exploration of multivariate dependence patterns. The connection between graph and statistical model is made by identifying the vertices of the graph with the observed variables and translating the pattern of edges in the graph into a pattern of cond ..."
Abstract

Cited by 12 (2 self)
 Add to MetaCart
Abstract. Graphical models provide a framework for exploration of multivariate dependence patterns. The connection between graph and statistical model is made by identifying the vertices of the graph with the observed variables and translating the pattern of edges in the graph into a pattern of conditional independences that is imposed on the variables ’ joint distribution. Focusing on Gaussian models, we review classical graphical models. For these models the defining conditional independences are equivalent to vanishing of certain (partial) correlation coefficients associated with individual edges that are absent from the graph. Hence, Gaussian graphical model selection can be performed by multiple testing of hypotheses about vanishing (partial) correlation coefficients. We show and exemplify how this approach allows one to perform model selection while controlling error rates for incorrect edge inclusion. Key words and phrases: Acyclic directed graph, Bayesian network, bidirected graph, chain graph, concentration graph, covariance graph, DAG, graphical model, multiple testing, undirected graph. 1.