Results 1-10 of 13
Finding Optimal Gene Networks Using Biological Constraints
Genome Informatics, 2003
Abstract

Cited by 12 (3 self)
The accurate estimation of gene networks from gene expression measurements is a major challenge in the field of Bioinformatics. Since the problem of estimating gene networks is NP-hard and has a search space of super-exponential size, researchers use heuristic algorithms for this task. However, little can be said about the accuracy of heuristic estimates. To overcome this problem, we present a general approach that reduces the search space to a biologically meaningful subspace and finds optimal solutions within that subspace in linear time. We show the effectiveness of this approach by applying it to yeast and Bacillus subtilis data.
Finding Optimal Bayesian Network Given a Super-Structure
Abstract

Cited by 7 (0 self)
Classical approaches used to learn Bayesian network structure from data have disadvantages in terms of complexity and lower accuracy of their results. However, a recent empirical study has shown that a hybrid algorithm noticeably improves both accuracy and speed: it learns a skeleton with an independence test (IT) approach and uses it to constrain the directed acyclic graphs (DAGs) considered during the search-and-score phase. Building on this, we formalize the structural constraint by introducing the concept of a superstructure S, an undirected graph that restricts the search to networks whose skeleton is a subgraph of S. We develop a superstructure constrained optimal search (COS) whose time complexity is upper bounded by O(γ_m^n), where γ_m < 2 depends on the maximal degree m of S. Empirically, the complexity depends on the average degree m̃, and sparse structures allow larger graphs to be calculated. Our algorithm is faster than an unconstrained optimal search by several orders of magnitude and even finds more accurate results when given a sound superstructure. In practice, S can be approximated by IT approaches; the significance level of the tests controls its sparseness, making it possible to control the trade-off between speed and accuracy. For incomplete superstructures, a greedily post-processed version (COS+) still significantly outperforms other heuristic searches.
Keywords: Bayesian networks, structure learning, optimal search, superstructure, connected subset
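To make the search-space reduction concrete, here is a minimal Python sketch, assuming the superstructure S is stored as a dict mapping each variable to the set of its neighbours. The helper `candidate_parent_sets` and the example graph are hypothetical; the sketch only shows why S shrinks the space of parent sets (a node's parents must be neighbours in S), not the COS algorithm itself.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Optional, Set


def candidate_parent_sets(
    node: str,
    superstructure: Dict[str, Set[str]],
    max_parents: Optional[int] = None,
) -> List[FrozenSet[str]]:
    """List the parent sets of `node` that are compatible with superstructure S.

    If every skeleton must be a subgraph of S, each parent of `node` must be one
    of its neighbours in S, so only subsets of that neighbourhood are allowed.
    """
    neighbours = sorted(superstructure.get(node, set()))
    limit = len(neighbours) if max_parents is None else min(max_parents, len(neighbours))
    return [
        frozenset(combo)
        for k in range(limit + 1)
        for combo in combinations(neighbours, k)
    ]


# Hypothetical chain-shaped superstructure A - B - C - D - E.
S = {
    "A": {"B"},
    "B": {"A", "C"},
    "C": {"B", "D"},
    "D": {"C", "E"},
    "E": {"D"},
}

# Without S, node C could take any subset of the 4 other nodes: 2**4 = 16 sets.
# With S, only subsets of its neighbours {B, D} remain.
print(len(candidate_parent_sets("C", S)))  # 4
```

The saving grows quickly: a node of degree d in S keeps 2**d parent sets instead of 2**(n - 1), which is why sparse superstructures allow larger graphs.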
Applying dynamic Bayesian networks to perturbed gene expression data
BMC Bioinformatics, 2006
Abstract

Cited by 6 (0 self)
Motivation: A central goal of molecular biology is to understand the regulatory mechanisms of gene transcription and protein synthesis. Because their learning techniques have a solid basis in statistics, which lets them deal naturally with the stochastic aspects of gene expression and with noisy measurements, Bayesian networks appear attractive for inferring the structure of gene interactions from microarray data. However, the basic formalism has some disadvantages; for example, it is sometimes hard to distinguish between the source and the target of an interaction. Two kinds of microarray experiments yield data particularly rich in information about the direction of interactions: time series and perturbation experiments. To handle them correctly, the basic formalism must be modified; for example, dynamic Bayesian networks apply to time-series microarray data. Results: We extend the framework of dynamic Bayesian networks to handle perturbations. A new discretization method, specialized for datasets from time-series perturbation experiments, is also introduced. We compare networks inferred from realistic simulation data by our method and by dynamic Bayesian network learning techniques, and conclude that our method substantially improves inference.
1 Introduction
As most genetic regulatory systems involve many components connected through complex networks of interactions, formal methods and computer tools for modeling and simulation are needed. Accordingly, various formalisms have been proposed to describe genetic regulatory systems, including Boolean networks and their generalizations, ordinary and partial differential equations, stochastic equations and Bayesian networks (see [4] for a review).
While differential and stochastic equations describe the biophysical processes at a very refined level of detail and prove useful in simulations of well-studied systems, Bayesian networks appear attractive for inferring the regulatory network structure from gene expression data. The reason is that their learning techniques have a solid basis in statistics, which allows them to deal naturally with the stochastic aspects of gene expression and with noisy measurements.
Optimal search on clustered structural constraint for learning Bayesian network structure
Journal of Machine Learning Research
Abstract

Cited by 4 (0 self)
We study the problem of learning an optimal Bayesian network in a constrained search space, in which skeletons are required to be subgraphs of a given undirected graph called the superstructure. The previously derived constrained optimal search (COS) remains limited even for sparse superstructures. To extend its feasibility, we propose dividing the superstructure into several clusters and performing an optimal search on each of them. Further, to ensure acyclicity, we introduce the concept of ancestral constraints (ACs) and derive an optimal algorithm satisfying a given set of ACs. Finally, we theoretically derive the necessary and sufficient sets of ACs to be considered for finding an optimal constrained graph. Empirical evaluations demonstrate that our algorithm can learn optimal Bayesian networks for some graphs containing several hundred vertices, and even for superstructures with a high average degree (up to four), which is a drastic improvement in feasibility over the previous optimal algorithm. The learned networks are shown to largely outperform state-of-the-art heuristic algorithms in terms of both score and structural Hamming distance.
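Why acyclicity needs special care when clusters are searched separately can be seen with a small check. The Python sketch below uses hypothetical cluster solutions and a hypothetical helper `is_acyclic` built on Kahn's topological-sort algorithm: two subgraphs that are each acyclic can still form a cycle once merged. It illustrates the problem that ancestral constraints address; it does not implement ACs or the clustered search.

```python
from collections import defaultdict, deque
from typing import Iterable, Tuple


def is_acyclic(edges: Iterable[Tuple[str, str]]) -> bool:
    """Return True if the directed graph given by (parent, child) edges is a DAG.

    Kahn's algorithm: repeatedly remove nodes with no incoming edges; if some
    nodes can never be removed, they lie on (or downstream of) a cycle.
    """
    edges = set(edges)
    children = defaultdict(list)
    indegree = defaultdict(int)
    nodes = set()
    for parent, child in edges:
        children[parent].append(child)
        indegree[child] += 1
        nodes.update((parent, child))
    queue = deque(n for n in nodes if indegree[n] == 0)
    removed = 0
    while queue:
        node = queue.popleft()
        removed += 1
        for child in children[node]:
            indegree[child] -= 1
            if indegree[child] == 0:
                queue.append(child)
    return removed == len(nodes)


# Hypothetical optimal subgraphs found independently on two clusters.
cluster_1 = {("A", "B"), ("B", "C")}
cluster_2 = {("C", "D"), ("D", "A")}

print(is_acyclic(cluster_1), is_acyclic(cluster_2))  # True True
print(is_acyclic(cluster_1 | cluster_2))             # False: A -> B -> C -> D -> A
```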
Utilizing evolutionary information and gene expression data for estimating gene networks with Bayesian network models
2005
Abstract

Cited by 3 (0 self)
Since microarray gene expression data do not contain sufficient information for estimating accurate gene networks, other biological information has been considered to improve the estimated networks. Recent studies have revealed that highly conserved proteins that exhibit similar expression patterns in different organisms have almost the same function in each organism. Such conserved proteins are also known to play similar roles in the regulation of genes. This evolutionary information can therefore be used to refine the regulatory relationships among genes that are estimated from gene expression data. We propose a statistical method for estimating gene networks from gene expression data by utilizing evolutionarily conserved relationships between genes. Our method simultaneously estimates the gene networks of two distinct organisms with a Bayesian network model that uses the evolutionary information, so that the gene expression data of one organism helps to estimate the gene network of the other. We show the effectiveness of the method through an analysis of Saccharomyces cerevisiae and Homo sapiens cell cycle gene expression data. Our method successfully estimated gene networks that capture many known relationships as well as several unknown relationships that are likely to be novel. Supplementary information is available at
Increasing Feasibility of Optimal Gene Network Estimation
Genome Informatics, 2004
Abstract

Cited by 1 (0 self)
Disentangling networks of regulation of gene expression is a major challenge in the field of computational biology. Harvesting the information contained in microarray data sets is a promising approach towards this challenge. We propose an algorithm for the optimal estimation of Bayesian networks from microarray data that reduces the CPU time and memory consumption of previous algorithms. We prove that the space complexity can be reduced from O(n ) to O(2 ), and that the expected calculation time can be reduced from O(n ) to O(n ), where n is the number of genes. We make intrinsic use of a limit on the maximal number of regulators of each gene, which has biological as well as statistical justifications. The improvements are significant for some research applications.
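The effect of limiting the number of regulators per gene can be illustrated by counting candidate parent sets. This is a hypothetical helper, `count_parent_sets`, written as a minimal Python sketch; it shows only how the per-gene table shrinks under a limit k and does not reproduce the complexity results stated above.

```python
from math import comb
from typing import Optional


def count_parent_sets(n_genes: int, max_parents: Optional[int] = None) -> int:
    """Number of candidate parent sets for one gene among n_genes genes.

    Without a limit, any subset of the other n_genes - 1 genes is allowed;
    with a limit k, only subsets of size at most k remain.
    """
    others = n_genes - 1
    if max_parents is None:
        return 2 ** others
    return sum(comb(others, k) for k in range(min(max_parents, others) + 1))


print(count_parent_sets(20))                 # 524288
print(count_parent_sets(20, max_parents=3))  # 1160
```

For 20 genes, a limit of three regulators leaves 1 + 19 + 171 + 969 = 1160 parent sets per gene instead of 2**19.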
Methods to Accelerate the Learning of Bayesian Network Structures
Proceedings of the 2007 UK Workshop on Computational Intelligence, 2007
Abstract

Cited by 1 (0 self)
Bayesian networks have become a standard technique for the representation of uncertain knowledge. This paper proposes methods that can accelerate the learning of a Bayesian network structure from a data set. These methods are applicable when learning an equivalence class of Bayesian network structures whilst using a score-and-search strategy. They work by reducing the number of validity tests that need to be performed and by caching the results of validity tests. The experimental results show that the methods improve the performance of algorithms that search through the space of equivalence classes multiple times and that operate on wide data sets. The experiments were performed by sampling data from six standard Bayesian networks and running an ant colony optimization algorithm designed to learn a Bayesian network equivalence class.
Residual Bootstrapping and Median Filtering for Robust Estimation of Gene Networks from Microarray Data
Abstract

Cited by 1 (0 self)
We propose a robust method for estimating gene networks from microarray gene expression data. It is well known that microarray data contain a large amount of noise and some outliers that interfere with the estimation of accurate gene networks. In addition, some relationships between genes are nonlinear, so linear models are not sufficient to capture such complex structure. In this paper, we use the moving boxcel median filter and the residual bootstrap to construct a Bayesian network, in order to achieve robust estimation of gene networks. We conduct Monte Carlo simulations to examine the properties of the proposed method. We also analyze Saccharomyces cerevisiae cell cycle data as a real-data example.
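The median-filtering idea can be sketched as follows. This is a minimal Python illustration under stated assumptions: a centred window of odd length, truncated at the ends of the profile. The helper `moving_median` is hypothetical, and the paper's exact filter definition, window size, and its combination with the residual bootstrap are not reproduced here.

```python
from statistics import median
from typing import List, Sequence


def moving_median(values: Sequence[float], window: int = 3) -> List[float]:
    """Smooth a 1-D expression profile with a centred moving median.

    Each point is replaced by the median of the `window` points centred on it;
    near the ends the window is truncated to the points that exist.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        smoothed.append(float(median(values[lo:hi])))
    return smoothed


# Hypothetical expression profile with one outlying measurement (50.0).
profile = [1.0, 2.0, 50.0, 2.0, 3.0]
print(moving_median(profile))  # [1.5, 2.0, 2.0, 3.0, 2.5]
```

Unlike a moving average, the median discards the single outlier entirely instead of spreading it over its neighbours, which is the robustness property the abstract relies on.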
Parallel Algorithm for Learning Optimal Bayesian Network Structure
Abstract

Cited by 1 (0 self)
We present a parallel algorithm for the score-based optimal structure search of Bayesian networks. This algorithm is based on a dynamic programming (DP) algorithm with O(n·2^n) time and space complexity, which is known to be the fastest algorithm for the optimal structure search of networks with n nodes. The bottleneck of the problem is the memory requirement, and therefore the algorithm is currently applicable to networks of up to a few tens of nodes. While a recently proposed algorithm overcomes this limitation by a space-time trade-off, our proposed algorithm realizes direct parallelization of the original DP algorithm with O(n^σ) time and space overhead, where σ > 0 controls the communication-space trade-off. The overall time and space complexity is O(n^(σ+1)·2^n). This algorithm splits the search space so that the required communication between independent calculations is minimal. Because of this advantage, our algorithm can run on distributed-memory supercomputers. Through computational experiments, we confirmed that our algorithm can run in parallel on up to 256 processors with a parallelization efficiency of 0.74, compared to the original DP algorithm on a single processor. We also demonstrate optimal structure search for a 32-node network without any constraints, which is the largest network search presented in the literature.
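For readers unfamiliar with the underlying DP, the following Python sketch shows the usual two-stage subset recursion for exact score-based search: first the best parent set of each node within every candidate set, then the best choice of sink for every subset of nodes. It is a serial, unoptimized sketch that assumes a decomposable local score supplied by the caller; `optimal_dag` and `toy_score` are hypothetical names, `toy_score` is a stand-in rather than BDe or BIC, and the code does not reproduce the paper's parallel decomposition or its exact complexity bounds.

```python
from typing import Callable, Dict, List, Tuple


def optimal_dag(
    n: int, local_score: Callable[[int, int], float]
) -> Tuple[float, Dict[int, int]]:
    """Exact structure search by dynamic programming over subsets of nodes.

    Nodes are 0..n-1 and node sets are bitmasks. local_score(v, mask) is the
    decomposable score of node v with parent set `mask` (higher is better).
    Returns the best total score and a dict mapping each node to its parent mask.
    """
    full = (1 << n) - 1

    # Stage 1: best[v][c] = (score, mask) of the best parent set of v inside c.
    best: List[Dict[int, Tuple[float, int]]] = []
    for v in range(n):
        table: Dict[int, Tuple[float, int]] = {}
        for c in range(full + 1):  # ascending order: subsets come first
            if c & (1 << v):
                continue  # a node cannot be its own parent
            cand = (local_score(v, c), c)
            rest = c
            while rest:
                bit = rest & -rest  # lowest remaining member of c
                rest ^= bit
                sub = table[c ^ bit]  # best parent set within c minus one node
                if sub[0] > cand[0]:
                    cand = sub
            table[c] = cand
        best.append(table)

    # Stage 2: total[w] = best score of a DAG on node set w, built by choosing a
    # sink v whose parents all lie in w minus v.
    total: Dict[int, float] = {0: 0.0}
    sink: Dict[int, int] = {}
    for w in range(1, full + 1):
        best_val = float("-inf")
        members = w
        while members:
            bit = members & -members
            members ^= bit
            v = bit.bit_length() - 1
            val = total[w ^ bit] + best[v][w ^ bit][0]
            if val > best_val:
                best_val, sink[w] = val, v
        total[w] = best_val

    # Stage 3: peel off sinks to recover each node's parent set.
    parents: Dict[int, int] = {}
    w = full
    while w:
        v = sink[w]
        w ^= 1 << v
        parents[v] = best[v][w][1]
    return total[full], parents


# Hypothetical toy score: lose one point per parent that differs from the
# target DAG 0 -> 1, 0 -> 2, 1 -> 2 (parent sets written as bitmasks).
TARGET = {0: 0b000, 1: 0b001, 2: 0b011}


def toy_score(v: int, mask: int) -> float:
    return -bin(mask ^ TARGET[v]).count("1")


score, parents = optimal_dag(3, toy_score)
print(score, parents)  # 0.0 {2: 3, 1: 1, 0: 0}
```

The tables in stage 1 hold n·2^(n-1) entries in total, which is the memory bottleneck the abstract describes; splitting these tables across processors is the part this sketch leaves out.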
Enumeration of Likely Gene Networks and Network Motif Extraction for Large Gene Networks
2003
Abstract
Introduction
The reliable estimation of gene networks from gene expression measurements is a major challenge in the field of Bioinformatics. Recently, an algorithm for the optimal estimation of small gene networks within the Bayesian network framework was proposed [3]. This algorithm was further extended to allow the enumeration of all optimal networks, as well as suboptimal networks in order of their likelihood [2]. In this work, we show how this result can be applied to enumerate likely gene networks for a large number of genes. Enumerating several of the most likely gene network models, instead of focusing only on the single most likely model, allows one to evaluate the reliability of the estimates. If we can find a partial network that is common to most of the likely network models, we can expect this part to be the most reliable. We call such common parts gene network motifs.
2 Method
Let us start with defining a class of subsets of the set of acyclic dir