## Modelling gene expression data using dynamic Bayesian networks (1999)


Citations: 166 (1 self)

### BibTeX

@TECHREPORT{Murphy99modellinggene,
  author      = {Kevin Murphy and Saira Mian},
  title       = {Modelling gene expression data using dynamic {B}ayesian networks},
  institution = {},
  year        = {1999}
}


### Abstract

Recently, there has been much interest in reverse engineering genetic networks from time series data. In this paper, we show that most of the proposed discrete time models, including the boolean network model [Kau93, SS96], the linear model of D'haeseleer et al. [DWFS99], and the nonlinear model of Weaver et al. [WWS99], are all special cases of a general class of models called Dynamic Bayesian Networks (DBNs). The advantages of DBNs include the ability to model stochasticity, to incorporate prior knowledge, and to handle hidden variables and missing data in a principled way. This paper provides a review of techniques for learning DBNs.

Keywords: genetic networks, boolean networks, Bayesian networks, neural networks, reverse engineering, machine learning.

### Citations

9087 |
Elements of Information Theory
- Cover, Thomas
- 1991
Citation Context: ...gorithm [LFS98], and as in the Reconstructability Analysis (RA) or General Systems Theory community [Kli85, Kri86]) this is stated as finding the model in which the sum of the mutual information (MI) [CT91] between each node and its parents is maximal; in Appendix A, we prove that these objective functions are equivalent, in the sense that they rank models in the same order. The trouble is, the ML model...
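The scoring idea in this excerpt (pick the structure that maximizes the sum of mutual information between each node and its parent set) can be sketched for fully observed discrete data. The function and variable names below are illustrative, not from the paper:

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Empirical mutual information I(X; Y) in bits between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum((c / n) * math.log2(c * n / (px[x] * py[y]))
               for (x, y), c in pxy.items())

def mi_score(data, parent_sets):
    """Sum of I(X_i; Pa(X_i)) over all nodes; each parent set is treated as one
    joint variable. `data` is a list of dicts mapping node name -> value."""
    total = 0.0
    for node, parents in parent_sets.items():
        if parents:
            joint = [tuple(row[p] for p in parents) for row in data]
            total += mutual_information([row[node] for row in data], joint)
    return total

# Toy data: Y copies X, Z is unrelated, so the parent set {Y: [X]} scores 1 bit.
data = [{"X": 0, "Y": 0, "Z": 0}, {"X": 1, "Y": 1, "Z": 0},
        {"X": 0, "Y": 0, "Z": 1}, {"X": 1, "Y": 1, "Z": 1}]
```

Ranking candidate parent sets by this score is what makes the MI and maximum-likelihood objectives interchangeable in the sense the excerpt describes.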

7407 |
Probabilistic reasoning in intelligent systems: Networks of plausible inference
- Pearl
- 1988
Citation Context: ...these factors, even if we cannot measure their values. This prior knowledge can be used to constrain the set of possible models we learn. The models we use are called Bayesian (belief) Networks (BNs) [Pea88], which have become the method of choice for representing stochastic models in the UAI (Uncertainty in Artificial Intelligence) community. In Section 2, we explain what BNs are, and show how they gene...

5246 |
Neural Networks for Pattern Recognition
- Bishop
- 1995
Citation Context: ...discrete) state, and discuss their relationship to the linear model of D'haeseleer et al. [DWFS99], the nonlinear model of Weaver et al. [WWS99], and techniques from the neural network literature [Bis95]. 2 Bayesian Networks: BNs are a special case of a more general class called graphical models in which nodes represent random variables, and the lack of arcs represent conditional independence assumpti...

4553 | A tutorial on hidden Markov models and selected applications in speech recognition
- Rabiner
- 1989
Citation Context: ...another random variable whose distribution depends on (and only on) X_t. Hence this graph captures all and only the conditional independence assumptions that are made in a Hidden Markov Model (HMM) [Rab89]. In addition to the graph structure, a BN requires that we specify the Conditional Probability Distribution (CPD) of each node given its parents. In an HMM, we assume that the hidden state variables...

2365 |
Time Series Analysis
- Hamilton
- 1994
Citation Context: ...g this is X_t = W X_{t-1} + μ + ε_t, where ε_t ∼ N(0, Σ) are independent Gaussian noise terms, corresponding to unmodelled influences. This is called a first-order auto-regressive AR(1) time series model [Ham94]. If X_t is a vector containing the expression levels of all the genes at time t, then this corresponds precisely to the model in [DWFS99]. (They do not explicitly mention Gaussian noise, but it is i...
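The AR(1) model this excerpt refers to (next state = weight matrix times current state, plus a bias and Gaussian noise) is easy to simulate directly. The weight matrix below is an arbitrary illustration, not taken from [DWFS99]:

```python
import random

def simulate_ar1(W, mu, noise_std, x0, steps, seed=0):
    """Simulate X_t = W X_{t-1} + mu + eps_t, with eps_t i.i.d. Gaussian noise
    (diagonal covariance noise_std**2 * I, for simplicity)."""
    rng = random.Random(seed)
    x, trajectory = list(x0), [list(x0)]
    for _ in range(steps):
        x = [sum(w * xj for w, xj in zip(row, x)) + m + rng.gauss(0.0, noise_std)
             for row, m in zip(W, mu)]
        trajectory.append(x)
    return trajectory

# Two "genes": gene 0 decays on its own, gene 1 is driven by gene 0.
W = [[0.5, 0.0],
     [0.8, 0.3]]
traj = simulate_ar1(W, mu=[0.0, 0.0], noise_std=0.05, x0=[1.0, 0.0], steps=20)
```

With the noise set to zero the recursion becomes the deterministic linear model; the stochastic version is what makes it a (continuous-state) DBN.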

2040 | Cluster analysis and display of genome-wide expression patterns - Eisen, Spellman, et al. - 1998

1166 |
Algorithmic graph theory and perfect graphs (2nd ed.)
- Golumbic
- 2004
Citation Context: ...d and undirected graphical models, see [Pea88, Whi90, Lau96]. It is interesting to note that much of the theory underlying graphical models involves concepts such as chordal (triangulated) graphs [Gol80], which also arise in other areas of computational biology, such as evolutionary tree construction (perfect phylogenies) and physical mapping (interval graphs)...

1158 |
The Origins of Order. Self-Organization and Selection in Evolution
- Kauffman
- 1993
Citation Context: ...ses are independent [MH97]. Another popular compact representation for CPDs in the UAI community is a decision tree [BFGK96]. This is a stochastic generalization of the concept of canalyzing function [Kau93], popular in the boolean networks field. (A function is canalyzing if at least one of its inputs has the property that, when it takes a specific value, the output of the function is independent of all...
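The canalyzing property defined in this excerpt can be tested exhaustively for small boolean functions. This brute-force checker is a sketch, not code from the boolean-networks literature:

```python
from itertools import product

def is_canalyzing(f, n_inputs):
    """True if some input i, when clamped to some value v, fixes f's output
    regardless of the remaining n_inputs - 1 inputs (Kauffman's definition)."""
    for i in range(n_inputs):
        for v in (0, 1):
            outputs = {f(*(rest[:i] + (v,) + rest[i:]))
                       for rest in product((0, 1), repeat=n_inputs - 1)}
            if len(outputs) == 1:
                return True
    return False

# OR is canalyzing (any input clamped to 1 forces the output to 1);
# XOR is not (no single input ever determines the output).
```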

1151 | Graphical Models
- Lauritzen
- 1996
Citation Context: ...(scalar) node, the inter-slice connections would correspond to the non-zero values in the weight matrix W, and undirected connections within a slice would correspond to non-zero entries in Σ⁻¹ [Lau96]. For example, if W = (1 1 1; 1 0 0; 0 1 1), with rows separated by semicolons, and Σ is diagonal, the model is equivalent to the one in Figure 3(a). Hence, in the case of continuous state models, we can convert back and forth between...

965 |
An Introduction to Bayesian Networks
- Jensen
- 1996
Citation Context: ...m to compute the expected sufficient statistics (or related quantity) for each node. 5.1 Inference in BNs: Inference in Bayesian networks is a huge subject which we will not go into in this paper. See [Jen96] for an introduction to one of the most commonly used algorithms (the junction tree algorithm). [HD94] gives a good cookbook introduction to the junction tree algorithm. [SHJ97] explains how the forwar...

901 | Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids
- Durbin, Eddy, et al.
- 1998
Citation Context: ...models in the UAI (Uncertainty in Artificial Intelligence) community. In Section 2, we explain what BNs are, and show how they generalize the boolean network model [Kau93, SS96], Hidden Markov Models [DEKM98], and other models widely used in the computational biology community. In Sections 3 to 7, we review various techniques for learning BNs from data, and show how REVEAL [LFS98] is a special case of suc...

891 | A tutorial on learning with Bayesian networks
- Heckerman
- 1998
Citation Context: ...tage that they return a distribution over possible models instead of a single best model. Handling priors on model structures, however, is quite complicated, so we do not discuss this issue here (see [Hec96] for a review). Instead, we assume that the goal is to find a single model which maximizes some scoring function (discussed in Section 6.1). We will, however, consider priors on parameters, which are...

866 | An introduction to variational methods for graphical models
- Jordan, Ghahramani, et al.
- 1998
Citation Context: ...discrete) BNs is computationally intractable, so we must use approximate methods. There are many approaches, including sampling methods such as MCMC [Mac98] and variational methods such as mean-field [JGJS98]. DBNs are even more computationally intractable in the sense that, even if the connections between two slices are sparse, correlations can arise over several time steps, thus rendering the unrolled n...

700 |
Exploring the metabolic and genetic control of gene expression on a genomic scale
- DeRisi, Iyer, et al.
- 1997
Citation Context: ...k has 45 independent parameters, and the second has 708. 6.4 Scaling up to large networks: Since there are about n ≈ 6400 genes for yeast (S. cerevisiae), all of which can now be simultaneously measured [DLB97], it is clear that we will have to do some preprocessing to exclude the "irrelevant" genes, and just try to learn the connections between the rest. After all, many of them do not change their expressi...

679 | Approximating Discrete Probability Distributions with Dependence Trees
- Chow, Liu
- 1968
Citation Context: ...(X_i) is independent of the structure of the graph G, we find L(G; D) − L(G′; D) = Σ_i [I(X_i; Pa_G(X_i)) − I(X_i; Pa_{G′}(X_i))]. We note that this theorem is similar to the result in [CL68], who show that the optimal tree-structured MRF is a maximal weight spanning tree (MWST) of the graph in which the weight of an arc between two nodes, X_i and X_j, is I(X_i; X_j). [MJM98] extend...
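The Chow-Liu result quoted here can be sketched directly: compute pairwise mutual information, then take a maximum-weight spanning tree (Kruskal's algorithm with union-find below). Names and data layout are illustrative:

```python
import math
from collections import Counter
from itertools import combinations

def chow_liu_edges(data):
    """Chow-Liu: the optimal tree-structured model is a maximum-weight spanning
    tree where edge (i, j) is weighted by I(X_i; X_j). `data` is a list of
    equal-length tuples of discrete values; returns the chosen edges."""
    n_vars, n = len(data[0]), len(data)

    def mi(i, j):
        ci = Counter(r[i] for r in data)
        cj = Counter(r[j] for r in data)
        cij = Counter((r[i], r[j]) for r in data)
        return sum((c / n) * math.log2(c * n / (ci[x] * cj[y]))
                   for (x, y), c in cij.items())

    # Kruskal on negated weights: greedily add the highest-MI edges acyclically.
    parent = list(range(n_vars))
    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    tree = []
    for i, j in sorted(combinations(range(n_vars), 2), key=lambda e: -mi(*e)):
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            tree.append((i, j))
    return tree
```

On data where two variables are strongly dependent and a third is unrelated, the dependent pair is always the first edge chosen.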

461 | Graphical Models in Applied Multivariate Statistics - Whittaker - 1990

442 | Optimal brain damage
- LeCun, Denker, et al.
- 1990
Citation Context: ...ve examining the change in the error function due to small changes in the values of the weights. This requires computing the Hessian H of the error surface. In the technique of "optimal brain damage" [CDS90], they assume H is diagonal; in the more sophisticated technique of "optimal brain surgeon" [HS93], they remove this assumption. Since we believe (hope!) that the true model is sparse (i.e., most entr...
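With a diagonal Hessian, the pruning step in this excerpt reduces to ranking weights by the saliency H_ii · w_i² / 2 and zeroing the least salient. The sketch below assumes the diagonal Hessian entries are already available; it is an illustration, not LeCun et al.'s code:

```python
def obd_prune(weights, hessian_diag, n_prune):
    """Optimal-brain-damage-style pruning with a diagonal Hessian: deleting
    weight w_i increases the error by roughly H_ii * w_i**2 / 2 (its saliency),
    so zero out the n_prune weights with the smallest saliency."""
    saliency = sorted((h * w * w / 2.0, i)
                      for i, (w, h) in enumerate(zip(weights, hessian_diag)))
    to_zero = {i for _, i in saliency[:n_prune]}
    return [0.0 if i in to_zero else w for i, w in enumerate(weights)]
```

This is one concrete way to impose the sparsity the surrounding text hopes for: weights whose removal barely changes the error are deleted outright.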

302 | Context-specific independence in Bayesian networks
- Boutilier, Friedman, et al.
- 1996
Citation Context: ...[Hen89, Sri93, HB94]. It is also possible to loosen the assumption that all the causes are independent [MH97]. Another popular compact representation for CPDs in the UAI community is a decision tree [BFGK96]. This is a stochastic generalization of the concept of canalyzing function [Kau93], popular in the boolean networks field. (A function is canalyzing if at least one of its inputs has the property tha...

294 | Bucket elimination: A unifying framework for probabilistic inference
- Dechter
- 1996
Citation Context: ...the junction tree algorithm. [SHJ97] explains how the forwards-backwards algorithm is a special case of the junction tree algorithm, and might be a good place to start if you are familiar with HMMs. [Dec98] would be a good place to start if you are familiar with the peeling algorithm (although the junction tree approach is much more efficient for learning). [Mur99] discusses inference in BNs with discre...

269 | Tractable inference for complex stochastic processes - Boyen, Koller - 1998

256 | Operations for learning with graphical models - Buntine - 1994

251 |
Stochastic mechanisms in gene expression
- McAdams, Arkin
- 1997
Citation Context: ...96], all of which are deterministic and fully observable. The fact that our models are stochastic is very important, since it is well known that gene expression is an inherently stochastic phenomenon [MA97]. In addition, even if the underlying system were deterministic, it might appear stochastic due to our inability to perfectly measure all the variables. Hence it is crucial that our learning algorithm...

243 | Learning Bayesian networks with local structure
- Friedman, Goldszmidt
- 1998
Citation Context: ...idden variable, which makes things more complicated (see Section 5). Alternatively, we can use a decision tree [BFGK96], or a table of parent values along with their associated non-zero probabilities [FG96], to represent the CPD. This can increase the number of free parameters gradually, from 1 to 2^k, where k is the number of parents. 5 Known structure, partial observability: When some of the variables...

227 | Learning the structure of dynamic probabilistic networks
- Friedman, Murphy, et al.
- 1998
Citation Context: ...hanging the model structure, until a local maximum is reached. This is called the Structural EM (SEM) algorithm. SEM was successfully used to learn the structure of discrete DBNs with missing data in [FMR98]. 7.1 Inventing new hidden nodes: So far, structure learning has meant finding the right connectivity between pre-existing nodes. A more interesting problem is inventing hidden nodes on demand. Hidden...

227 |
The EM algorithm for graphical association models with missing data
- Lauritzen
- 1995
Citation Context: ...s the number of parents. 5 Known structure, partial observability: When some of the variables are not observed, the likelihood surface becomes multimodal, and we must use iterative methods, such as EM [Lau95] or gradient ascent [BKRK97], to find a local maximum of the ML/MAP function. These algorithms need to use an inference algorithm to compute the expected sufficient statistics (or related quantity) fo...

225 | The Bayesian Structural EM Algorithm
- Friedman
- 1998
Citation Context: ...hical models over undirected ones include the fact that BNs can encode deterministic relationships, and that it is easier to learn BNs (see Section 3) since they are separable models (in the sense of [Fri98]). Hence we shall focus exclusively on BNs in this paper. For a careful study of the relationship between directed and undirected graphical models, see [Pea88, Whi90, Lau96]. It is interesting to...

202 | Identification of genetic networks from a small number of gene expression patterns under the Boolean network model. Pacific Symposium on Biocomputing 4
- Akutsu, Miyano, et al.
- 1999
Citation Context: ...iably learn structure. For deterministic boolean networks, this issue has been addressed from a statistical physics perspective [Her98] and a computational learning theory (combinatorial) perspective [AMK99]. In particular, [AMK99] prove that, if the fan-in is bounded by a constant K, the number of samples needed to identify a boolean network of n nodes is lower bounded by Ω(2^K + K log n) and upper bou...

190 | Object-oriented Bayesian networks - Koller, Pfeffer - 1997

185 | Linear modeling of mRNA expression levels during CNS development and injury
- D’Haeseleer, Wen, et al.
- 1999
Citation Context: ...tic networks from time series data. In this paper, we show that most of the proposed discrete time models --- including the boolean network model [Kau93, SS96], the linear model of D'haeseleer et al. [DWFS99], and the nonlinear model of Weaver et al. [WWS99] --- are all special cases of a general class of models called Dynamic Bayesian Networks (DBNs). The advantages of DBNs include the ability to model s...

181 | Second order derivatives for network pruning: Optimal brain surgeon
- Hassibi, Stork
- 1993
Citation Context: ...is requires computing the Hessian H of the error surface. In the technique of "optimal brain damage" [CDS90], they assume H is diagonal; in the more sophisticated technique of "optimal brain surgeon" [HS93], they remove this assumption. Since we believe (hope!) that the true model is sparse (i.e., most entries in W are 0), we can encode this knowledge (assumption) by using a N(0, 1/α) Gaussian prior...

178 | A guide to the literature on learning probabilistic networks from data - Buntine - 1996

173 | Probabilistic independence networks for hidden markov probability models
- Smyth, Heckerman, et al.
- 1997
Citation Context: ...into in this paper. See [Jen96] for an introduction to one of the most commonly used algorithms (the junction tree algorithm). [HD94] gives a good cookbook introduction to the junction tree algorithm. [SHJ97] explains how the forwards-backwards algorithm is a special case of the junction tree algorithm, and might be a good place to start if you are familiar with HMMs. [Dec98] would be a good place to star...

161 | Adaptive probabilistic networks with hidden variables
- Binder, Russell, et al.
- 1997
Citation Context: ...Known structure, partial observability: When some of the variables are not observed, the likelihood surface becomes multimodal, and we must use iterative methods, such as EM [Lau95] or gradient ascent [BKRK97], to find a local maximum of the ML/MAP function. These algorithms need to use an inference algorithm to compute the expected sufficient statistics (or related quantity) for each node. 5.1 Inference i...

156 | Inference in belief networks: A procedural guide
- Huang, Darwiche
- 1994
Citation Context: ...BNs: Inference in Bayesian networks is a huge subject which we will not go into in this paper. See [Jen96] for an introduction to one of the most commonly used algorithms (the junction tree algorithm). [HD94] gives a good cookbook introduction to the junction tree algorithm. [SHJ97] explains how the forwards-backwards algorithm is a special case of the junction tree algorithm, and might be a good place to...

154 | Probable Networks and Plausible Predictions: A Review of Practical Bayesian Methods for Supervised Neural Networks
- MacKay
- 1986
Citation Context: ...ks better in practice is to approximate their posterior by a Gaussian (or maybe a mixture of Gaussians), find their MAP values, and then plug them in to the above equation: see [Bis95, sec. 10.4] and [Mac95] for details. We can associate a separate hyperparameter α_{i,j} for each weight W_{i,j}, find their MAP values, and use this as a "soft" means of finding which entries of W_{i,j} to keep: this is called A...

152 | Introduction to Monte Carlo Methods
- MacKay
- 1998
Citation Context: ...erved nodes. Exact inference in densely connected (discrete) BNs is computationally intractable, so we must use approximate methods. There are many approaches, including sampling methods such as MCMC [Mac98] and variational methods such as mean-field [JGJS98]. DBNs are even more computationally intractable in the sense that, even if the connections between two slices are sparse, correlations can arise ov...

152 | Large-scale temporal gene expression mapping of central nervous system development - Wen, Fuhrman, et al. - 1998

149 | Gradient calculations for dynamic recurrent neural networks: A survey
- Pearlmutter
- 1995
Citation Context: ...the standard technique is to make a local linear approximation --- this is called the Extended Kalman Filter (EKF), and is similar to techniques developed in the Recurrent Neural Networks literature [Pea95]. (One reason for the widespread success of HMMs with Gaussian outputs is that the discrete hidden variables can be used to approximate arbitrary non-linear dynamics, given enough training data.) In t...

145 | Modeling regulatory networks with weight matrices
- Weaver, Workman, et al.
- 1999
Citation Context: ...we show that most of the proposed discrete time models --- including the boolean network model [Kau93, SS96], the linear model of D'haeseleer et al. [DWFS99], and the nonlinear model of Weaver et al. [WWS99] --- are all special cases of a general class of models called Dynamic Bayesian Networks (DBNs). The advantages of DBNs include the ability to model stochasticity, to incorporate prior knowledge, and...

111 | Some practical issues in constructing belief networks - Henrion - 1989

97 | Architecture of Systems Problem Solving - Klir - 1985

92 | A test case of correlation metric construction of a reaction pathway from measurements - Arkin, Shen, et al. - 1997

92 | Modeling the complexity of genetic networks: Understanding multigenic and pleiotropic regulation, Complexity 1(6) - Somogyi, Sniegoski - 1996

84 | A bayesian approach to causal discovery
- Heckerman, Meek, et al.
- 1997
Citation Context: ...odes which are most commonly connected to the class variable, she found that they were the nodes in the vicinity of the splice junction, and further, that their CPTs encoded the known AG/G pattern. [HMC97] discusses Bayesian techniques for learning causal (static) networks, and [SGS93] discusses constraint based techniques --- but see [Fre97] for a critique of this approach. These techniques have mostl...

76 |
Note on the bias of information estimates
- Miller
- 1955
Citation Context: ...deterministic relationship. For stochastic relationships, we must decide whether the gains in MI produced by a larger parent set is "worth it". The standard approach in the RA community uses the fact [Mil55] that χ²(X; Y) ≈ I(X; Y) · N · ln(4), where N is the number of samples. Hence we can use a χ² test to decide whether an increase in MI is statistically significant. (This also gives us some kind of confiden...
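Miller's approximation quoted here yields a cheap significance test: the statistic N · ln(4) · I(X; Y), with I in bits, is approximately χ²-distributed under independence, so an MI gain is kept only if the statistic clears the χ² critical value. In this sketch the critical value is supplied by the caller rather than computed (e.g. 3.84 for 1 degree of freedom at the 5% level); the function name is illustrative:

```python
import math

def mi_is_significant(mi_bits, n_samples, chi2_critical):
    """Accept an MI gain only if N * ln(4) * I exceeds the supplied chi-square
    critical value (Miller's relation chi^2 ~= I * N * ln 4, I in bits)."""
    statistic = mi_bits * n_samples * math.log(4.0)
    return statistic > chi2_critical

# With 100 samples, a full bit of MI is clearly significant at the dof=1,
# alpha=0.05 threshold of 3.84, while 0.01 bits is not.
```

This is exactly the kind of guard the excerpt describes for deciding whether a larger parent set is "worth it".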

72 | A generalization of the noisy-or model - Srinivas - 1993

67 | Causal independence for probability assessment and inference using Bayesian networks - Heckerman, Breese - 1995

55 |
Comparing biases for minimal network construction with back-propagation
- Hanson, Pratt
- 1989
Citation Context: ...is called a regularizer, and encourages learning of a weight matrix with small values. (Unfortunately, this prior favours many small weights, rather than a few large ones, although this can be fixed [HP89].) This regularization technique is also called weight decay. Note that the use of a regularizer overcomes the problem that W will often be underdetermined, i.e., there will be fewer samples than para...
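The Gaussian-prior regularizer described in this excerpt is ordinary weight decay: maximizing the log posterior adds a term proportional to w to each weight's gradient, shrinking all weights toward zero. A minimal gradient step, with illustrative parameter names:

```python
def weight_decay_step(weights, grads, lr=0.1, decay=0.5):
    """One gradient-descent step with an L2 (weight decay) penalty: a zero-mean
    Gaussian prior N(0, 1/alpha) on each weight contributes decay * w to its
    gradient, so every weight shrinks toward zero on each step."""
    return [w - lr * (g + decay * w) for w, g in zip(weights, grads)]
```

Even with a zero data gradient, each step multiplies a weight by (1 - lr * decay), which is the shrinkage that keeps an underdetermined W from blowing up.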

54 | Decision-theoretic foundations for causal reasoning
- Heckerman, Shachter
- 1995
Citation Context: ...ted arcs are called chain graphs. In a BN, one can intuitively regard an arc from A to B as indicating the fact that A "causes" B. (For a more formal treatment of causality in the context of BNs, see [HS95].) Since evidence can be assigned to any subset of the nodes (i.e., any subset of nodes can be observed), BNs can be used for both causal reasoning (from known causes to unknown effects) and diagnostic...

50 | Asymptotic model selection for directed networks with hidden variables
- Geiger, Heckerman, et al.
- 1996
Citation Context: ...h we can obviously use branch-and-bound). For large n, this is computationally infeasible, so a common approach [footnote: number of free parameters. In a model with hidden variables, it might be less than this [GHM96]. Hence nodes with compact representations of their CPDs will incur a lower penalty, which can allow connections to form which might otherwise have been rejected [FG96].] is to only search up until l...