Results 1–10 of 48
Growing Bayesian Network Models of Gene Networks from Seed Genes
, 2005
Cited by 34 (5 self)

Abstract
Motivation: For the last few years, Bayesian networks (BNs) have received increasing attention from the computational biology community as models of gene networks, though learning them from gene expression data is problematic: most gene expression databases contain measurements for thousands of genes, but the existing algorithms for learning BNs from data do not scale to such high-dimensional databases. This means that the user has to decide in advance which genes are included in the learning process, typically no more than a few hundred, and which genes are excluded from it. This is not a trivial decision. We propose an alternative approach to overcome this problem. Results: We propose a new algorithm for learning BN models of gene networks from gene expression data. Our algorithm receives a seed gene S and a positive integer R from the user, and returns a BN for those genes that depend on S such that fewer than R other genes mediate the dependency. Our algorithm grows the BN, which initially contains only S, by repeating the following step R+1 times and then pruning some genes: find the parents and children of all the genes in the BN and add them to it. Intuitively, our algorithm provides the user with a window of radius R around S to look at the BN model of a gene network without having to exclude any gene in advance. We prove that our algorithm is correct under the faithfulness assumption. We evaluate our algorithm on simulated and biological data (Rosetta compendium) with satisfactory results.
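The grow-then-prune loop described in this abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the `parents_and_children` oracle is hypothetical, standing in for the conditional-independence tests a real implementation would run on expression data, and `toy_pc` is an invented toy adjacency.

```python
# Sketch of the grow step from the abstract. The parents_and_children
# oracle is a hypothetical stand-in for independence-test-based discovery.
def grow_from_seed(seed, R, parents_and_children):
    """Return the genes within 'radius' R of the seed gene.

    Starting from {seed}, repeat R+1 times: add the parents and
    children of every gene currently in the set.
    """
    genes = {seed}
    for _ in range(R + 1):
        frontier = set()
        for g in genes:
            frontier |= parents_and_children(g)
        genes |= frontier
    return genes

# Toy adjacency standing in for the (unknown) true gene network S - A - B - C.
toy_pc = {"S": {"A"}, "A": {"S", "B"}, "B": {"A", "C"}, "C": {"B"}}
print(sorted(grow_from_seed("S", 1, lambda g: toy_pc.get(g, set()))))  # ['A', 'B', 'S']
```

The abstract's pruning step, which removes genes whose dependency on S is mediated by R or more others, would follow this growth phase.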
Applying dynamic Bayesian networks to perturbed gene expression data
 BMC Bioinformatics
, 2006
Cited by 24 (2 self)

Abstract
Motivation: A central goal of molecular biology is to understand the regulatory mechanisms of gene transcription and protein synthesis. Because of their solid basis in statistics, which allows the stochastic aspects of gene expression and noisy measurements to be handled in a natural way, Bayesian networks appear attractive for inferring gene interaction structure from microarray experiment data. However, the basic formalism has some disadvantages; for example, it is sometimes hard to distinguish between the origin and the object of an interaction. Two kinds of microarray experiments yield data particularly rich in information about the direction of interactions: time series and perturbation experiments. To handle them correctly, the basic formalism must be modified; for example, dynamic Bayesian networks apply to time-series microarray data. Results: We extend the framework of dynamic Bayesian networks to handle perturbations. A new discretization method, specialized for datasets from time-series perturbation experiments, is also introduced. We compare networks inferred from realistic simulation data by our method and by dynamic Bayesian network learning techniques, and conclude that our method substantially improves inference.

1 Introduction

As most genetic regulatory systems involve many components connected through complex networks of interactions, formal methods and computer tools for modeling and simulation are needed. Various formalisms have therefore been proposed to describe genetic regulatory systems, including Boolean networks and their generalizations, ordinary and partial differential equations, stochastic equations and Bayesian networks (see [4] for a review). While differential and stochastic equations describe the biophysical processes at a very refined level of detail and prove useful in simulations of well-studied systems, Bayesian networks appear attractive for inferring the regulatory network structure from gene expression data, because their learning techniques have a solid basis in statistics, allowing the stochastic aspects of gene expression and noisy measurements to be handled in a natural way.
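The reason time-series data resolves edge direction in a dynamic Bayesian network can be shown with a tiny sketch (hypothetical data layout, not the paper's code): expression values at time t become candidate parents only for values at time t+1, so every edge points forward in time.

```python
# Sketch of the time-series set-up dynamic Bayesian networks rely on:
# each consecutive pair of timepoints forms one (parents, children)
# training case, so inferred edges are directed from t to t+1.
def transition_pairs(series):
    """series: list of per-timepoint dicts mapping gene -> expression value."""
    return [(series[t], series[t + 1]) for t in range(len(series) - 1)]

# Hypothetical three-timepoint toy series for two genes.
toy_series = [{"g1": 0.1, "g2": 1.2},
              {"g1": 0.4, "g2": 0.9},
              {"g1": 0.7, "g2": 0.3}]
print(len(transition_pairs(toy_series)))  # 2 transitions from 3 timepoints
```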
Finding Optimal Gene Networks Using Biological Constraints
 Genome Informatics
, 2003
Cited by 19 (4 self)

Abstract
The accurate estimation of gene networks from gene expression measurements is a major challenge in the field of bioinformatics. Since the problem of estimating gene networks is NP-hard and exhibits a search space of super-exponential size, researchers are using heuristic algorithms for this task. However, little can be said about the accuracy of heuristic estimations. To overcome this problem, we present a general approach to reduce the search space to a biologically meaningful subspace and to find optimal solutions within that subspace in linear time. We show the effectiveness of this approach in application to yeast and Bacillus subtilis data.
Improving the scalability of optimal Bayesian network learning with external-memory frontier breadth-first branch and bound search
 In Proceedings of the 27th Conference on Uncertainty in Artificial Intelligence
Cited by 17 (9 self)

Abstract
Previous work has shown that the problem of learning the optimal structure of a Bayesian network can be formulated as a shortest-path problem in a graph and solved using A* search. In this paper, we improve the scalability of this approach by developing a memory-efficient heuristic search algorithm for learning the structure of a Bayesian network. Instead of using A*, we propose a frontier breadth-first branch and bound search that leverages the layered structure of the search graph of this problem, so that no more than two layers of the graph, plus solution reconstruction information, need to be stored in memory at a time. To further improve scalability, the algorithm stores most of the graph in external memory, such as a hard disk, when it does not fit in RAM. Experimental results show that the resulting algorithm solves significantly larger problems than the current state of the art.
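The layered search graph this abstract refers to can be sketched with a toy dynamic program (invented two-variable score table; not the paper's external-memory algorithm): a node is the set of variables already assigned parents, layer k holds the subsets of size k, and the edge U → U ∪ {x} costs the best score x can achieve with parents drawn from U. Expanding layer by layer means only two layers are live at once.

```python
# Minimal in-memory sketch of layered breadth-first search over the
# order graph; best_score(x, U) gives x's best score with parents in U.
def layered_shortest_path(variables, best_score):
    current = {frozenset(): 0.0}          # layer 0: the empty subset
    for _ in range(len(variables)):
        nxt = {}
        for subset, cost in current.items():
            for x in variables:
                if x in subset:
                    continue
                succ = subset | {x}
                c = cost + best_score(x, subset)
                if c < nxt.get(succ, float("inf")):
                    nxt[succ] = c
        current = nxt                     # previous layer can be discarded
    (_, total), = current.items()         # single node: the full variable set
    return total

# Hypothetical best-score table for two variables X and Y (lower is better).
toy = {("X", frozenset()): 5.0, ("X", frozenset("Y")): 1.0,
       ("Y", frozenset()): 2.0, ("Y", frozenset("X")): 2.0}
print(layered_shortest_path(["X", "Y"], lambda x, ps: toy[(x, ps)]))  # 3.0
```

Here the cheapest path places Y first (cost 2.0) and then X with parent {Y} (cost 1.0), matching the intuition that the shortest path encodes the best variable ordering.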
Finding Optimal Bayesian Network Given a Super-Structure
Cited by 17 (0 self)

Abstract
Classical approaches to learning Bayesian network structure from data have disadvantages in terms of complexity and lower accuracy of their results. However, a recent empirical study has shown that a hybrid algorithm markedly improves accuracy and speed: it learns a skeleton with an independence-test (IT) approach and constrains the directed acyclic graphs (DAGs) considered during the search-and-score phase. Subsequently, we formalize the structural constraint by introducing the concept of a super-structure S, an undirected graph that restricts the search to networks whose skeleton is a subgraph of S. We develop a super-structure constrained optimal search (COS): its time complexity is upper-bounded by O(γ_m^n), where γ_m < 2 depends on the maximal degree m of S. Empirically, complexity depends on the average degree m̃, and sparse structures allow larger graphs to be handled. Our algorithm is faster than an unconstrained optimal search by several orders of magnitude, and even finds more accurate results when given a sound super-structure. Practically, S can be approximated by IT approaches; the significance level of the tests controls its sparseness, enabling a trade-off between speed and accuracy. For incomplete super-structures, a greedily post-processed version (COS+) still significantly outperforms other heuristic searches. Keywords: Bayesian networks, structure learning, optimal search, super-structure, connected subset
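The structural constraint at the heart of COS can be illustrated in a few lines (hypothetical toy super-structure; this is the constraint only, not the search itself): with a super-structure S, the candidate parent sets of a variable are subsets of its neighbours in S rather than subsets of all other variables, which is what shrinks the search space from 2^(n-1) per variable to 2^deg.

```python
from itertools import combinations

# Enumerate the parent sets permitted by a super-structure: only
# subsets of x's neighbours in S may serve as parents of x.
def candidate_parent_sets(x, superstructure):
    nbrs = sorted(superstructure[x])
    for k in range(len(nbrs) + 1):
        for ps in combinations(nbrs, k):
            yield frozenset(ps)

S = {"A": {"B"}, "B": {"A", "C"}, "C": {"B"}}          # a path A - B - C
print(sum(1 for _ in candidate_parent_sets("B", S)))   # 4 subsets of {A, C}
print(sum(1 for _ in candidate_parent_sets("A", S)))   # 2 subsets of {B}
```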
Learning Optimal Bayesian Networks: A Shortest Path Perspective
, 2013
Cited by 15 (5 self)

Abstract
In this paper, learning a Bayesian network structure that optimizes a scoring function for a given dataset is viewed as a shortest-path problem in an implicit state-space search graph. This perspective highlights the importance of two research issues: the development of search strategies for solving the shortest-path problem, and the design of heuristic functions for guiding the search. This paper introduces several techniques for addressing these issues. One is an A* search algorithm that learns an optimal Bayesian network structure by searching only the most promising part of the solution space. The others are two heuristic functions. The first heuristic function represents a simple relaxation of the acyclicity constraint of a Bayesian network; although admissible and consistent, it may introduce too much relaxation and result in a loose bound. The second heuristic function reduces the amount of relaxation by avoiding directed cycles within some groups of variables. Empirical results show that these methods constitute a promising approach to learning optimal Bayesian network structures.
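The shortest-path view described above can be sketched with A* over the order graph (toy two-variable score table, all invented for illustration). The simple heuristic gives every unplaced variable its single best achievable score, ignoring acyclicity, so it never overestimates the remaining cost.

```python
import heapq
from itertools import count

# A* sketch: a node is the set of variables already placed; g is the
# accumulated cost; h sums each unplaced variable's best possible score.
def astar_structure_search(variables, best_score, best_anywhere):
    goal = frozenset(variables)
    tie = count()                                       # heap tie-breaker
    frontier = [(sum(best_anywhere[v] for v in goal), 0.0, next(tie), frozenset())]
    best_g = {frozenset(): 0.0}
    while frontier:
        f, g, _, node = heapq.heappop(frontier)
        if node == goal:
            return g
        if g > best_g.get(node, float("inf")):
            continue                                    # stale heap entry
        for x in goal - node:
            succ = node | {x}
            g2 = g + best_score(x, node)
            if g2 < best_g.get(succ, float("inf")):
                best_g[succ] = g2
                h2 = sum(best_anywhere[v] for v in goal - succ)
                heapq.heappush(frontier, (g2 + h2, g2, next(tie), succ))

# Hypothetical score table and its per-variable lower bounds.
toy = {("X", frozenset()): 5.0, ("X", frozenset("Y")): 1.0,
       ("Y", frozenset()): 2.0, ("Y", frozenset("X")): 2.0}
lb = {"X": 1.0, "Y": 2.0}
print(astar_structure_search(["X", "Y"], lambda x, ps: toy[(x, ps)], lb))  # 3.0
```

Because the heuristic is admissible and consistent, the first time the goal node is popped its cost is optimal, which is why A* can skip large parts of the order graph.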
An improved admissible heuristic for learning optimal Bayesian networks
 In Proceedings of the 28th Conference on Uncertainty in Artificial Intelligence (UAI-12)
, 2012
Cited by 9 (3 self)

Abstract
Recently two search algorithms, A* and breadth-first branch and bound (BFBnB), were developed based on a simple admissible heuristic for learning Bayesian network structures that optimize a scoring function. The heuristic represents a relaxation of the learning problem in which each variable chooses optimal parents independently. As a result, the relaxed solution may contain many directed cycles and yield a loose bound. This paper introduces an improved admissible heuristic that avoids directed cycles within small groups of variables. A sparse representation is also introduced to store only the unique optimal parent choices. Empirical results show that the new techniques significantly improved the efficiency and scalability of A* and BFBnB on most of the datasets tested in this paper.
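The idea behind the sparse representation mentioned above can be sketched with a small pruning routine (toy scores, invented for illustration): a parent set is worth storing only if no proper subset scores at least as well, since such a dominated superset can never be the optimal choice for any candidate pool.

```python
# Keep only non-dominated parent sets: drop any set whose proper subset
# achieves an equal or better (lower) score.
def prune_parent_sets(scores):
    kept = {}
    # Visit smaller sets first so dominators are in `kept` before checks.
    for ps, c in sorted(scores.items(), key=lambda kv: (len(kv[0]), kv[1])):
        if all(not (q < ps and kept[q] <= c) for q in kept):
            kept[ps] = c
    return kept

toy = {
    frozenset(): 10.0,
    frozenset("A"): 8.0,
    frozenset("B"): 12.0,      # dominated by the empty set
    frozenset("AB"): 9.0,      # dominated by {A}
}
print(len(prune_parent_sets(toy)))  # 2 of 4 parent sets survive
```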
Evaluating Anytime Algorithms for Learning Optimal Bayesian Networks
 In Proceedings of the 29th Conference on Uncertainty in Artificial Intelligence (UAI-13)
, 2013
Cited by 9 (4 self)

Abstract
Exact algorithms for learning Bayesian networks are guaranteed to find provably optimal networks. However, they may fail in difficult learning tasks because of limited time or memory. In this research we adapt several anytime heuristic search-based algorithms to learn Bayesian networks. These algorithms find high-quality solutions quickly, and continually improve the incumbent solution or prove its optimality before resources are exhausted. Empirical results show that the anytime window A* algorithm usually finds higher-quality, often optimal, networks more quickly than other approaches. The results also show that, surprisingly, although generating networks with few parents per variable are structurally simpler, they are harder to learn than generating networks with more parents per variable.
Optimal search on clustered structural constraint for learning Bayesian network structure
 Journal of Machine Learning Research
Cited by 6 (0 self)

Abstract
We study the problem of learning an optimal Bayesian network in a constrained search space: skeletons are required to be subgraphs of a given undirected graph called the super-structure. The previously derived constrained optimal search (COS) remains limited even for sparse super-structures. To extend its feasibility, we propose to divide the super-structure into several clusters and perform an optimal search on each of them. Further, to ensure acyclicity, we introduce the concept of ancestral constraints (ACs) and derive an optimal algorithm satisfying a given set of ACs. Finally, we theoretically derive the necessary and sufficient sets of ACs to be considered for finding an optimal constrained graph. Empirical evaluations demonstrate that our algorithm can learn optimal Bayesian networks for some graphs containing several hundred vertices, and even for super-structures with a high average degree (up to four), a drastic improvement in feasibility over the previous optimal algorithm. The learnt networks are shown to largely outperform state-of-the-art heuristic algorithms in terms of both score and structural Hamming distance.