Using Bayesian networks to analyze expression data
Journal of Computational Biology, 2000
Cited by 731 (16 self)
DNA hybridization arrays simultaneously measure the expression level for thousands of genes. These measurements provide a “snapshot” of transcription levels within the cell. A major challenge in computational biology is to uncover, from such measurements, gene/protein interactions and key biological features of cellular systems. In this paper, we propose a new framework for discovering interactions between genes based on multiple expression measurements. This framework builds on the use of Bayesian networks for representing statistical dependencies. A Bayesian network is a graph-based model of joint multivariate probability distributions that captures properties of conditional independence between variables. Such models are attractive for their ability to describe complex stochastic processes and because they provide a clear methodology for learning from (noisy) observations. We start by showing how Bayesian networks can describe interactions between genes. We then describe a method for recovering gene interactions from microarray data using tools for learning Bayesian networks. Finally, we demonstrate this method on the S. cerevisiae cell-cycle measurements of Spellman et al. (1998). Key words: gene expression, microarrays, Bayesian methods.
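As an illustration of the statement that a Bayesian network models a joint distribution while capturing conditional independence, here is a minimal sketch. It assumes a hypothetical three-gene chain A -> B -> C with binary expression states; the variable names and all probabilities are invented for illustration and do not come from the paper.

```python
from itertools import product

# Hypothetical three-gene chain A -> B -> C with binary expression states
# (0 = low, 1 = high). All probabilities below are made up for illustration.
p_a = {0: 0.7, 1: 0.3}
p_b_given_a = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}  # p_b_given_a[a][b]
p_c_given_b = {0: {0: 0.6, 1: 0.4}, 1: {0: 0.1, 1: 0.9}}  # p_c_given_b[b][c]


def joint(a, b, c):
    """P(A=a, B=b, C=c) as the product of each node's local conditional."""
    return p_a[a] * p_b_given_a[a][b] * p_c_given_b[b][c]


def prob(**fixed):
    """Probability of a partial assignment, by brute-force summation."""
    total = 0.0
    for a, b, c in product((0, 1), repeat=3):
        assignment = {"a": a, "b": b, "c": c}
        if all(assignment[name] == value for name, value in fixed.items()):
            total += joint(a, b, c)
    return total


def c_given(c, **evidence):
    """P(C=c | evidence), where evidence may fix a and/or b."""
    return prob(c=c, **evidence) / prob(**evidence)
```

In this toy network, `c_given(1, a=0, b=1)` and `c_given(1, a=1, b=1)` coincide, because C is independent of A once B is known, whereas `c_given(1, a=0)` and `c_given(1, a=1)` differ.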
Estimating high-dimensional directed acyclic graphs with the PC-algorithm
Journal of Machine Learning Research, 2005
Cited by 50 (5 self)
We consider the PC-algorithm (Spirtes et al., 2000) for estimating the skeleton and equivalence class of a very high-dimensional directed acyclic graph (DAG) with corresponding Gaussian distribution. The PC-algorithm is computationally feasible and often very fast for sparse problems with many nodes (variables), and it has the attractive property of automatically achieving high computational efficiency as a function of sparseness of the true underlying DAG. We prove uniform consistency of the algorithm for very high-dimensional, sparse DAGs where the number of nodes is allowed to grow quickly with sample size n, as fast as O(n^a) for any 0 < a < ∞. The sparseness assumption is rather minimal, requiring only that the neighborhoods in the DAG are of lower order than sample size n. We also demonstrate the PC-algorithm for simulated data. Keywords: asymptotic consistency, DAG, graphical model, PC-algorithm, skeleton
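The abstract does not spell out the independence test. A common choice for Gaussian data, assumed here, is to test a partial correlation with Fisher's z-transform. The sketch below shows only that single test for a conditioning set of size one; the function names and the numbers in the usage note are hypothetical, and this is not the full PC-algorithm.

```python
import math
from statistics import NormalDist


def partial_corr(r_xy, r_xz, r_yz):
    """First-order partial correlation of X and Y given Z."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz**2) * (1 - r_yz**2))


def fisher_z_independent(r, n, cond_size, alpha=0.01):
    """Return True if zero (partial) correlation is NOT rejected at level alpha.

    r         : sample (partial) correlation, strictly between -1 and 1
    n         : sample size
    cond_size : number of variables in the conditioning set
    """
    z = 0.5 * math.log((1 + r) / (1 - r))            # Fisher z-transform
    statistic = math.sqrt(n - cond_size - 3) * abs(z)
    threshold = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided normal quantile
    return statistic <= threshold
```

For a hypothetical chain X -> Z -> Y with correlations r_xz = r_yz = 0.8 and r_xy = 0.64 at n = 100, the marginal test rejects independence of X and Y, while the test on `partial_corr(0.64, 0.8, 0.8)` with `cond_size=1` does not, so a skeleton search would drop the X-Y edge.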
Learning graphical model structure using L1-regularization paths
In Proceedings of the 21st Conference on Artificial Intelligence (AAAI), 2007
Cited by 21 (1 self)
Sparsity-promoting L1-regularization has recently been successfully used to learn the structure of undirected graphical models. In this paper, we apply this technique to learn the structure of directed graphical models. Specifically, we make three contributions. First, we show how the decomposability of the MDL score, plus the ability to quickly compute entire regularization paths, allows us to efficiently pick the optimal regularization parameter on a per-node basis. Second, we show how to use L1 variable selection to select the Markov blanket, before a DAG search stage. Finally, we show how L1 variable selection can be used inside of an order search algorithm. The effectiveness of these L1-based approaches is compared to current state-of-the-art methods on 10 datasets.
Structure Learning of Bayesian Networks using Constraints
Cited by 18 (1 self)
This paper addresses exact learning of Bayesian network structure from data and expert’s knowledge based on score functions that are decomposable. First, it describes useful properties that strongly reduce the time and memory costs of many known methods such as hill-climbing, dynamic programming and sampling variable orderings. Secondly, a branch and bound algorithm is presented that integrates parameter and structural constraints with data in a way to guarantee global optimality with respect to the score function. It is an anytime procedure because, if stopped, it provides the best current solution and an estimation about how far it is from the global solution. We show empirically the advantages of the properties and the constraints, and the applicability of the algorithm to large data sets (up to one hundred variables) that cannot be handled by other current methods (limited to around 30 variables).
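To make "decomposable score function" concrete, the sketch below computes a BIC-style score as a sum of per-node terms, each depending only on a node and its parent set. It is an illustration under assumptions, not the paper's algorithm: BIC is one example of a decomposable score, the helper names are hypothetical, and the parameter count uses only the parent configurations observed in the data.

```python
import math
from collections import Counter


def local_bic(data, child, parents):
    """BIC contribution of one node given its parent set (discrete data).

    data is a list of dicts mapping variable name -> observed state.
    Assumption: parameters are counted over observed parent configurations.
    """
    n = len(data)
    parent_counts = Counter(tuple(row[p] for p in parents) for row in data)
    joint_counts = Counter(
        (tuple(row[p] for p in parents), row[child]) for row in data
    )
    # Maximised log-likelihood of the child given its parents.
    loglik = sum(
        count * math.log(count / parent_counts[config])
        for (config, _state), count in joint_counts.items()
    )
    child_states = len({row[child] for row in data})
    n_params = (child_states - 1) * len(parent_counts)
    return loglik - 0.5 * math.log(n) * n_params


def bic(data, dag):
    """Score of a whole DAG: the sum of local scores (decomposability).

    dag maps each variable to a tuple of its parents.
    """
    return sum(local_bic(data, node, parents) for node, parents in dag.items())


# Toy data (invented): Y copies X in 36 of 40 rows.
data = (
    [{"X": 0, "Y": 0}] * 18 + [{"X": 0, "Y": 1}] * 2
    + [{"X": 1, "Y": 0}] * 2 + [{"X": 1, "Y": 1}] * 18
)
with_edge = bic(data, {"X": (), "Y": ("X",)})
without_edge = bic(data, {"X": (), "Y": ()})
```

Because only the term for Y differs between the two graphs, a search procedure can rescore a candidate by recomputing a single local score, which is what makes caching and bounding per-node terms possible.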
Efficient Markov network structure discovery using independence tests
In Proc SIAM Data Mining, 2006
Cited by 17 (2 self)
We present two algorithms for learning the structure of a Markov network from discrete data: GSMN and GSIMN. Both algorithms use statistical conditional independence tests on data to infer the structure by successively constraining the set of structures consistent with the results of these tests. GSMN is a natural adaptation of the Grow-Shrink algorithm of Margaritis and Thrun for learning the structure of Bayesian networks. GSIMN extends GSMN by additionally exploiting Pearl’s well-known properties of conditional independence relations to infer novel independencies from known independencies, thus avoiding the need to perform these tests. Experiments on artificial and real data sets show GSIMN can yield savings of up to 70% with respect to GSMN, while generating a Markov network with comparable or in several cases considerably improved quality. In addition
Bayesian structure learning using dynamic programming and MCMC
In UAI, 2007b
Cited by 13 (1 self)
We show how to significantly speed up MCMC sampling of DAG structures by using a powerful non-local proposal based on Koivisto’s dynamic programming (DP) algorithm (11; 10), which computes the exact marginal posterior edge probabilities by analytically summing over orders. Furthermore, we show how sampling in DAG space can avoid subtle biases that are introduced by approaches that work only with orders, such as Koivisto’s DP algorithm and MCMC order samplers (6; 5).
Modelling Activity Global Temporal Dependencies using Time Delayed Probabilistic Graphical Model
Cited by 13 (2 self)
We present a novel approach for detecting global behaviour anomalies in multiple disjoint cameras by learning time-delayed dependencies between activities across camera views. Specifically, we propose to model multi-camera activities using a Time Delayed Probabilistic Graphical Model (TDPGM) with different nodes representing activities in different semantically decomposed regions from different camera views, and the directed links between nodes encoding causal relationships between the activities. A novel two-stage structure learning algorithm is formulated to learn globally optimised time-delayed dependencies. A new cumulative abnormality score is also introduced to replace the conventional log-likelihood score for gaining significantly more robust and reliable real-time anomaly detection. The effectiveness of the proposed approach is validated using a camera network installed at a busy underground station.
A Recursive Method for Structural Learning of Directed Acyclic Graphs
Cited by 4 (1 self)
In this paper, we propose a recursive method for structural learning of directed acyclic graphs (DAGs), in which a problem of structural learning for a large DAG is first decomposed into two problems of structural learning for two small vertex subsets, each of which is then decomposed recursively into two problems of smaller subsets until no subset can be decomposed further. In our approach, the search for separators of a pair of variables in a large DAG is localized to small subsets, and thus the approach can improve the efficiency of searches and the power of statistical tests for structural learning. We show how the recent advances in the learning of undirected graphical models can be employed to facilitate the decomposition. Simulations are given to demonstrate the performance of the proposed method.
Optimal search on clustered structural constraint for learning Bayesian network structure
Journal of Machine Learning Research
Cited by 4 (0 self)
We study the problem of learning an optimal Bayesian network in a constrained search space; skeletons are compelled to be subgraphs of a given undirected graph called the superstructure. The previously derived constrained optimal search (COS) remains limited even for sparse superstructures. To extend its feasibility, we propose to divide the superstructure into several clusters and perform an optimal search on each of them. Further, to ensure acyclicity, we introduce the concept of ancestral constraints (ACs) and derive an optimal algorithm satisfying a given set of ACs. Finally, we theoretically derive the necessary and sufficient sets of ACs to be considered for finding an optimal constrained graph. Empirical evaluations demonstrate that our algorithm can learn optimal Bayesian networks for some graphs containing several hundreds of vertices, and even for superstructures having a high average degree (up to four), which is a drastic improvement in feasibility over the previous optimal algorithm. Learnt networks are shown to largely outperform state-of-the-art heuristic algorithms both in terms of score and structural Hamming distance.
Convergence of Estimation of Distribution Algorithms for Finite Samples
Cited by 3 (0 self)
Estimation of Distribution Algorithms (EDA) have been proposed as an extension of genetic algorithms. Our algorithm FDA assumes that the function to be optimized is additively decomposed (ADF). The interaction graph G_ADF is used to create exact or approximate factorizations of the Boltzmann distribution. Using Gibbs sampling instead of probabilistic logic sampling is investigated. We also discuss the algorithm LFDA, which learns a Bayesian network from data. For both algorithms, estimates of the necessary sample size N to find the optimum are derived. The bounds are based on statistical learning theory and PAC learning. If the assumptions of a factorization theorem are fulfilled, the upper bound of the sample size N of FDA is of order O(n ln n), where n is the size of the problem. The computational complexity per generation is O(N * n). For LFDA a bound cannot be proven because the network learned might be far from optimal. In many applications the optimal network is not necessary for convergence to the global optima. For the 2D Ising model only 60% of the edges of G_ADF need to be contained in the learned graph. Bounds can be obtained for two new learning methods. The first one learns factor graphs instead of Bayesian networks; the second one detects the structure of the function by computing its Walsh or Fourier coefficients. The computational complexity to compute the Walsh coefficients is O(n^2 ln n). The networks computed by FDA and LFDA are analyzed for a set of benchmark functions.
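The idea of reading a function's interaction structure off its Walsh coefficients can be illustrated on a tiny example. The sketch below is a brute-force enumeration over all 2^n inputs, so it is exponential in n and is not the procedure whose cost the abstract quotes; the function `f` and all names are hypothetical.

```python
from itertools import combinations, product


def walsh_coefficients(f, n):
    """Walsh coefficients of f on {0,1}^n, indexed by subsets of variables.

    w_S = 2^{-n} * sum_x f(x) * (-1)^(sum of x_i for i in S)
    """
    points = list(product((0, 1), repeat=n))
    coeffs = {}
    for size in range(n + 1):
        for subset in combinations(range(n), size):
            total = sum(
                f(x) * (-1) ** sum(x[i] for i in subset) for x in points
            )
            coeffs[subset] = total / len(points)
    return coeffs


def f(x):
    """Hypothetical additively decomposed function: x0 and x1 interact, x2 is separate."""
    return x[0] * x[1] + x[2]


w = walsh_coefficients(f, 3)
# Subsets of two or more variables with a non-zero coefficient mark interactions.
interactions = sorted(s for s, c in w.items() if len(s) >= 2 and abs(c) > 1e-12)
```

Here `interactions` contains only the pair `(0, 1)`, matching the single product term in `f`; every coefficient that mixes `x2` with another variable vanishes.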