## Bayesian Networks for Genomic Analysis (2004)

Citations: 4 (1 self)

### BibTeX

@MISC{Sebastiani04bayesiannetworks,

author = {Paola Sebastiani and Maria M. Abad and Marco F. Ramoni},

title = {Bayesian Networks for Genomic Analysis},

year = {2004}

}

### Abstract

Bayesian networks are emerging in the genomic arena as a general modeling tool able to unravel cellular mechanisms, to identify genotypes that confer susceptibility to disease, and to lead to diagnostic models. This chapter reviews the foundations of Bayesian networks and shows their application to the analysis of various types of genomic data, from genomic markers to gene expression data. The examples highlight the potential of this methodology as well as its current limitations, and we describe new research directions that hold the promise to make Bayesian networks a fundamental tool for genome data analysis.

### Citations

7414 |
Probabilistic reasoning in intelligent systems: Networks of plausible inference
- Pearl
- 1988
Citation Context ...s exist to perform this inference when the network variables are all discrete, all continuous and modelled with Gaussian distributions, or the network topology is constrained to particular structures [9, 58, 69]. For general network topologies and non standard distributions, we need to resort to stochastic simulation [12]. Among the several stochastic simulation methods currently available, Gibbs sampling [3...

4119 |
Pattern Classification and Scene Analysis
- Duda, Hart
- 1973
Citation Context ...l Markov property, the joint probability $p(y_{1k}, \ldots, y_{vk}, c_k)$ of class and attributes is factorized as $p(c_k)\,p(y_{1k}, \ldots, y_{vk} \mid c_k)$. The simplest example is known as a Naïve Bayes (NB) classifier [25, 53], and makes the further simplification that the attributes $Y_i$ are conditionally independent given the class $C$ so that $p(y_{1k}, \ldots, y_{vk} \mid c_k) = \prod_i p(y_{ik} \mid c_k)$. Figure 7: The structure of the Naïve Bayes c...

3990 |
Stochastic relaxation, Gibbs distributions and the Bayesian restoration of images
- Geman, Geman
- 1984
Citation Context ...9]. For general network topologies and non standard distributions, we need to resort to stochastic simulation [12]. Among the several stochastic simulation methods currently available, Gibbs sampling [36, 103] is particularly appropriate for Bayesian network reasoning because of its ability to leverage on the graphical decomposition of joint multivariate distributions to improve computational efficiency. G...

2265 |
The Elements of Statistical Learning
- Hastie, Tibshirani, et al.
- 2001
Citation Context ...ximately the same size. Then $K - 1$ sets are used for retraining (or inducing) the network from data that is then tested on the remaining set using monitors or other measures of the predictive accuracy [42]. By repeating this process $K$ times, we derive independent measures of the predictive accuracy of the network induced from data as well as measures of the robustness of the network to sampling variabi...

1733 |
Generalized Linear Models
- McCullagh, Nelder
- 1989
Citation Context ...$\sim \mathrm{Gamma}(\alpha_i, \mu_i(pa(y_i), \beta_i))$, where $\mu_i(pa(y_i), \beta_i)$ is the conditional mean of $Y_i$ and $\mu_i(pa(y_i), \beta_i)^2 / \alpha_i$ is the conditional variance. We use the standard parameterization of generalized linear models [65], in which the mean $\mu_i(pa(y_i), \beta_i)$ is not restricted to be a linear function of the parameters $\beta_{ij}$, but the linearity in the parameters is enforced in the linear predictor $\eta_i$, which is itself related ...

1474 |
Statistical analysis with missing data
- Little, Rubin
- 1987
Citation Context ...mechanisms [85]. We also introduced two approaches to model selection with partially ignorable missing data mechanisms: ignorable imputation and model folding. Contrary to standard imputation schemes [35, 60, 80, 100, 101], ignorable imputation accounts for the missing-data mechanism and produces, asymptotically, a proper imputation model as defined by Rubin [76, 78]. However, the computation effort can be very demandi...

1436 | Bayesian Data Analysis
- Gelman, Carlin, et al.
- 1995
Citation Context ...mechanisms [85]. We also introduced two approaches to model selection with partially ignorable missing data mechanisms: ignorable imputation and model folding. Contrary to standard imputation schemes [35, 60, 80, 100, 101], ignorable imputation accounts for the missing-data mechanism and produces, asymptotically, a proper imputation model as defined by Rubin [76, 78]. However, the computation effort can be very demandi...

1334 |
Local computations with probabilities on graphical structures and their applications to expert systems
- Lauritzen, Spiegelhalter
- 1988
Citation Context ...s exist to perform this inference when the network variables are all discrete, all continuous and modelled with Gaussian distributions, or the network topology is constrained to particular structures [9, 58, 69]. For general network topologies and non standard distributions, we need to resort to stochastic simulation [12]. Among the several stochastic simulation methods currently available, Gibbs sampling [3...

1151 | Graphical Models
- Lauritzen
- 1996
Citation Context ... definitions of conditional independence. The overall list of marginal and conditional independencies represented by the directed acyclic graph is summarized by the local and global Markov properties [57] that are exemplified in Figure 3 using a network of seven variables. The local Markov property states that each node is independent of its non descendant given the parent nodes and leads to a direct ...

1127 | A Bayesian method for the induction of probabilistic networks from data. Machine Learning 9:309–347
- Cooper, Herskovits
- 1992
Citation Context ...or each variable grows exponentially with the number of candidate parents and successful heuristic search procedures (both deterministic and stochastic) have been proposed to render the task feasible [16, 55, 96, 108]. The aim of these heuristic search procedures is to impose some restrictions on the search space to capitalize on the decomposability of the posterior probability of each Bayesian network $M_h$. One sug...

1023 |
Quantitative monitoring of gene expression patterns with a complementary DNA microarray
- Schena, Shalon, et al.
- 1995
Citation Context ...ic platforms, on the other hand, are designed to quantify the expression of the genes encoded by the DNA of a cell, as amount of RNA produced by each single gene. cDNA and oligonucleotide microarrays [61, 81, 83] enable investigators to simultaneously measure the expression of thousands of genes and hold the promise to cast new light onto the regulatory mechanisms of the genome [51]. The ability they offer to...

948 | Learning Bayesian networks: The combination of knowledge and statistical data
- Heckerman, Geiger, et al.
- 1995
Citation Context ...$\prod_k p(y_{ik} \mid pa(y_i)_k, \theta_{hi})\, p(\theta_{hi})\, d\theta_{hi} = \prod_i p(D \mid M_{hi})$, where $p(D \mid M_{hi}) = \int \prod_k p(y_{ik} \mid pa(y_i)_k, \theta_{hi})\, p(\theta_{hi})\, d\theta_{hi}$. By further assuming decomposable network prior probabilities that factorize as $p(M_h) = \prod_i p(M_{hi})$ [44], the posterior probability of a model $M_h$ is the product: $p(M_h \mid D) = \prod_i p(M_{hi} \mid D)$. Here $p(M_{hi} \mid D)$ is the posterior probability weighting the dependency of $Y_i$ on the set of parents specified by the m...

896 |
Multiple Imputation for Nonresponse in Surveys
- Rubin
- 1987
Citation Context ...ion, particularly when the continuous variables are highly skewed. Missing Data The received view of the effect of missing data on statistical inference is based on the approach described by Rubin in [76]. This approach classifies the missing data mechanism as ignorable or not, according to whether the data are missing completely at random (MCAR), missing at random (MAR), or informatively missing (IM)...

794 | Using Bayesian Networks to Analyze Expression Data
- Friedman, Linial, et al.
- 2000
Citation Context ...ul framework to model these different data sources. Bayesian networks have been already applied, by us and others, to the analysis of different types of genomic data, from gene expression microarrays [28, 31, 70, 92, 93] to protein-protein interactions [46] and genotype data [7, 90], and their modular nature makes them easily extensible to the task of modeling these different types of data. However, the application ...

661 |
Probabilistic Networks and Expert Systems
- Cowell, Dawid, et al.
- 1999
Citation Context ...n) vector of covariances between $Y_i$ and each parent $Y_{ij}$. With this parameterization, the prior on $\tau_i$ is usually a hyper-Wishart distribution for the joint variance-covariance matrix of $Y_i, Pa(y_i)$ [17]. The Wishart distribution is the multivariate generalization of a Gamma distribution. An alternative approach is to work directly with the conditional variance of $Y_i$. In this case, we estimate the co...

632 | Bayesian network classifiers
- Friedman, Geiger, et al.
- 1997
Citation Context ...lassifiers have been proposed to relax the assumption that attributes are conditionally independent given the class. Perhaps the most competitive one is the Tree Augmented Naïve Bayes (TAN) classifier [29] in which all the attributes have the class variable as a parent as well as another attribute. To avoid cycles, the attributes have to be ordered and the first attribute does not have other parents be...

530 |
Causation, Prediction and Search
- Spirtes, Glymour, et al.
- 2001
Citation Context ...s testing. Other approaches based on independence tests or variants of the Bayesian metric like the minimum description length (MDL) score or the Bayesian information criterion (BIC) are described in [57, 98, 105]. We suppose to have a set $M = \{M_0, M_1, \ldots, M_g\}$ of Bayesian networks, each network describing an hypothesis on the dependency structure of the random variables $Y_1, \ldots, Y_v$. Our task is to choose one ...

508 |
Bayesian classification (AutoClass): Theory and results
- Cheeseman, Stutz
- 1995
Citation Context ...d form solution for the computation of the marginal likelihood [56]. This property has been applied, for example, to model-based clustering by [74], and it is commonly used in classification problems [10]. However, this restriction can quickly become unrealistic and greatly limit the set of models to explore. As a consequence, common practice is still to discretize continuous variables with possible l...

501 |
Analysis of Incomplete Multivariate Data
- Schafer
- 1997
Citation Context ...mechanisms [85]. We also introduced two approaches to model selection with partially ignorable missing data mechanisms: ignorable imputation and model folding. Contrary to standard imputation schemes [35, 60, 80, 100, 101], ignorable imputation accounts for the missing-data mechanism and produces, asymptotically, a proper imputation model as defined by Rubin [76, 78]. However, the computation effort can be very demandi...

461 |
Graphical Models in Applied Multivariate Statistics
- Whittaker
- 1990
Citation Context ...s this property: the two parent variables are marginally independent, but they become dependent when we condition on their common child. A well known consequence of this fact is the Simpson’s paradox [105] and a typical application in genetics is the dependency structure of genotypes among members of the same family: the genotypes of two parents are independent, assuming random mating, but they become ...

359 | An analysis of Bayesian classifiers
- Langley, Iba, et al.
- 1992
Citation Context ...l Markov property, the joint probability $p(y_{1k}, \ldots, y_{vk}, c_k)$ of class and attributes is factorized as $p(c_k)\,p(y_{1k}, \ldots, y_{vk} \mid c_k)$. The simplest example is known as a Naïve Bayes (NB) classifier [25, 53], and makes the further simplification that the attributes $Y_i$ are conditionally independent given the class $C$ so that $p(y_{1k}, \ldots, y_{vk} \mid c_k) = \prod_i p(y_{ik} \mid c_k)$. Figure 7: The structure of the Naïve Bayes c...

350 |
Inference and missing data
- Rubin
- 1976
Citation Context ...d model folding reconstruct a completion of the incomplete data by taking into account the variables responsible for the missing data. This property is in agreement with the suggestion put forward in [45, 60, 75] that the variables responsible for the missing data should be kept in the model. However, our approach allows us to also evaluate the likelihoods of models that do not depend explicitly on these vari...

338 |
Expression monitoring by hybridization to high-density oligonucleotide arrays
- Lockhart, Dong, et al.
- 1996
Citation Context ...ic platforms, on the other hand, are designed to quantify the expression of the genes encoded by the DNA of a cell, as amount of RNA produced by each single gene. cDNA and oligonucleotide microarrays [61, 81, 83] enable investigators to simultaneously measure the expression of thousands of genes and hold the promise to cast new light onto the regulatory mechanisms of the genome [51]. The ability they offer to...

322 |
Module networks: Identifying regulatory modules and their condition-specific regulators from gene expression data. Nat Genet 34
- Segal, Shapira, et al.
- 2003
Citation Context ...ul framework to model these different data sources. Bayesian networks have been already applied, by us and others, to the analysis of different types of genomic data, from gene expression microarrays [28, 31, 70, 92, 93] to protein-protein interactions [46] and genotype data [7, 90], and their modular nature makes them easily extensible to the task of modeling these different types of data. However, the application ...

289 | Model selection and accounting for model uncertainty in graphical models using Occam’s window
- Madigan, Raftery
- 1994
Citation Context ...ayesware.com), and the R-package Deal [4]. Greedy search can be trapped in local maxima and induce spurious dependency and a variant of this search to limit spurious dependency is stepwise regression [62]. However, there is evidence that the K2 algorithm performs as well as other search algorithms [107]. 2.2.3 Validation The automation of model selection is not without problems and both diagnostic and...

287 |
Gene expression correlates of clinical prostate cancer behavior, Cancer Cell 1
- Singh, Febbo, et al.
- 2002
Citation Context ... specimens of patients undergoing surgery between 1996 and 1997, and controls were 50 normal specimens. The expression profiles were derived with the U95Av2 Affymetrix microarray and are described in [95]. We first analyzed the data with BADGE [88], a program for differential analysis that uses Bayesian model averaging to compute the posterior probability of differential expression. We selected about ...

253 | Probabilistic boolean networks: a rule-based uncertainty model for gene regulatory networks
- Shmulevich, Dougherty, et al.
Citation Context ...eneral framework to integrate multivariate time series of gene products and to represent feed-forward loops and feedback mechanisms [28] that is alternative to other network models of gene regulation [94]. A dynamic Bayesian network is defined by a directed acyclic graph in which nodes continue to represent stochastic variables and arrows represent temporal dependencies that are quantified by probabil...

251 | Bayesian graphical models for discrete data
- Madigan, York
- 1995
Citation Context ... networks a fundamental tool for genomic data analysis. 2 Fundamentals of Bayesian Networks Bayesian networks are a representation formalism at the cutting edge of knowledge discovery and data mining [43, 63, 64]. In this section, we will review the formalism of Bayesian networks and the process of learning them from databases. 2.1 Representation and Reasoning A Bayesian network has two components: a directed...

234 |
Inferring cellular networks using probabilistic graphical models
- Friedman
Citation Context ...ul framework to model these different data sources. Bayesian networks have been already applied, by us and others, to the analysis of different types of genomic data, from gene expression microarrays [28, 31, 70, 92, 93] to protein-protein interactions [46] and genotype data [7, 90], and their modular nature makes them easily extensible to the task of modeling these different types of data. However, the application ...

228 | Learning the Structure of Dynamic Probabilistic Networks
- Friedman, Murphy, et al.
- 1998
Citation Context ...dels are joined by the common ancestors. An example is in Figure 15. The search of each local dependency structure is simplified by the natural ordering imposed on the variables by the temporal frame [32] that constrains the model space of each variable $Y_i$ at time $t$: the set of candidate parents consists of the variables $Y_{i(t-1)}, \ldots, Y_{i(t-p)}$ as well as the variables $Y_{h(t-j)}$ for all $h \neq i$, and ...

228 | Induction of selective Bayesian classifiers
- Langley, Sage
- 1994
Citation Context ...rameters were learned with the Bayesian approach discussed in Section 2.2. Due to the large number of input attributes, we used a filtered version of the wrapped feature selection algorithm described in [54] to increase the predictive accuracy. Column 3 shows the accuracy of the same classifiers that were built by selecting a subset of the genes and shows that accuracy sensibly increases when feature sel...

217 | Being Bayesian about network structure: A Bayesian approach to structure discovery in Bayesian networks
- Friedman, Koller
Citation Context ... 89] and more recently genotype data [90]. Recent results have shown that restricting the search space by imposing an order among the variables yields a more regular space over the network structures [30]. In functional genomics, the determination of this order can be aided by the available information about gene control interactions embedded into known pathways. When the variables represent gene prod...

210 |
Predicting the clinical status of human breast cancer by using gene expression profiles
- West
- 2001
Citation Context ...nd Classification The goal of many studies in genomics medicine is the discovery of a molecular profile for disease diagnosis or prognosis. The molecular profile is typically based on gene expression [38, 68, 104]. Bayesian networks have been used in the past few years as supervised classification models able to discover and represent molecular profiles that characterize a disease [49, 109]. This section descr...

207 |
Construction and Assessment of Classification Rules
- Hand
- 1997
Citation Context ...to define tests for predictive accuracy [17]. In the absence of an independent test set, standard cross validation techniques are typically used to assess the predictive accuracy of one or more nodes [41]. In $K$-fold cross validation, the data are divided into $K$ non-overlapping sets of approximately the same size. Then $K - 1$ sets are used for retraining (or inducing) the network from data that is then t...

206 |
Sequential Updating of Conditional Probabilities on Directed Graphical Structures
- Lauritzen
- 1990
Citation Context ...The full joint distribution is defined by the parameters $\theta_{3jk}$, and the parameters $\theta_{1k}$ and $\theta_{2k}$ that specify the marginal distributions of $Y_1$ and $Y_2$. ...known as global and local parameter independence [97], and are valid only under the assumption the hyperparameters $\alpha_{ijk}$ satisfy the consistency rule $\sum_j \alpha_{ij} = \alpha$ for all $i$ [40, 34]. Symmetric Dirichlet distributions satisfy easily this constraint by sett...

196 | A Bayesian networks approach for predicting protein-protein interactions from genomic data
- Jansen
- 2003
Citation Context ...ayesian networks have been already applied, by us and others, to the analysis of different types of genomic data, from gene expression microarrays [28, 31, 70, 92, 93] to protein-protein interactions [46] and genotype data [7, 90], and their modular nature makes them easily extensible to the task of modeling these different types of data. However, the application of Bayesian networks to genomics requ...

195 |
Multiple Imputation after 18+ Years
- Rubin
- 1996
Citation Context ...norable for parameter estimation, but it is not when data are IM. An important but overlooked issue is whether the missing data mechanism generating data that are MAR is ignorable for model selection [77, 85]. We have shown that this is not the case for the class of graphical models exemplified in Figure 16 [85]. We assume that there is only one variable with missing data (the variable $Y_4$ in the DAG) and ...

189 |
Expert systems and probabilistic network models
- Castillo, Gutiérrez, et al.
- 1997
Citation Context ...s exist to perform this inference when the network variables are all discrete, all continuous and modelled with Gaussian distributions, or the network topology is constrained to particular structures [9, 58, 69]. For general network topologies and non standard distributions, we need to resort to stochastic simulation [12]. Among the several stochastic simulation methods currently available, Gibbs sampling [3...

185 | Modeling gene expression with differential equations
- Chen, He, et al.
- 1999
Citation Context ... of change by linear Gaussian networks [22]. However, the development of similar approximations for non regularly spaced time points and for general, non linear, kinetic equations with feedback loops [11] is an open issue. The further advantage of dynamic Bayesian network is to offer an environment for causal inference with well designed temporal experiments. 5 Research Directions This chapter has ...

185 |
Tools for statistical inference
- Tanner
- 1996

183 | Efficient approximations for the marginal likelihood of Bayesian networks with hidden variables. Machine Learning 29(2/3):181
- Chickering, Heckerman
- 1997
Citation Context ...iety of interactions. Unfortunately, no closed form solution exists to compute the marginal likelihood of these distributions, and we have to resort to computationally demanding approximation methods [14]. The normality assumption on the variables can be relaxed to the more general case that the variables have distributions in the exponential family, and we have introduced the family of GGNs to descri...

163 |
The estimation of probabilities: An essay on modern bayesian methods
- Good
- 1965
Citation Context ...stributions of $Y_1$ and $Y_2$. ...known as global and local parameter independence [97], and are valid only under the assumption the hyperparameters $\alpha_{ijk}$ satisfy the consistency rule $\sum_j \alpha_{ij} = \alpha$ for all $i$ [40, 34]. Symmetric Dirichlet distributions satisfy easily this constraint by setting $\alpha_{ijk} = \alpha/(c_i q_i)$ where $q_i$ is the number of states of the parents of $Y_i$. One advantage of adopting symmetric hyper Dirichlet...

160 |
Analysis of Human Genetic Linkage
- Ott
- 1991
Citation Context ...egression models are that they can be used to assess whether the association between the risk for disease and a particular genotype is confounded by some external factor (such as population admixture [67]) and they can be used to test whether an external factor or a particular genotype is an effect modifier of an association [47]. However, logistic regression models pose three serious limitations: whe...

144 | Propagation of Probabilities, Means and Variances in Mixed Graphical Association Models
- Lauritzen
- 1992
Citation Context ... that discrete variables can only be parent nodes in the network, but cannot be children of any continuous Gaussian node leads to a closed form solution for the computation of the marginal likelihood [56]. This property has been applied, for example, to model-based clustering by [74], and it is commonly used in classification problems [10]. However, this restriction can quickly become unrealistic and ...

133 | Learning equivalence classes of Bayesian network structures
- Chickering
- 1996
Citation Context ... hoc” stochastic methods [96] or Markov Chain Monte Carlo methods [30] can also be used. An alternative approach to limit the search space is to define classes of equivalent directed graphical models [13]. The order imposed on the variables defines a set of candidate parents for each variable $Y_i$ and one way to proceed is to implement an independent model selection for each variable $Y_i$ and then link to...

123 |
Hyper Markov laws in the statistical analysis of decomposable graphical models
- Dawid, Lauritzen
- 1993
Citation Context ...el parameters, the overall likelihood is then given by the product $p(D \mid \theta_h) = \prod_{ik} p(y_{ik} \mid pa(y_i)_k, \theta_h)$. Computational efficiency is gained by using priors for $\Theta_h$ that obey the Directed Hyper-Markov law [21]. Under this assumption, the prior density $p(\theta_h)$ admits the same factorization of the likelihood function, namely $p(\theta_h) = \prod_i p(\theta_{hi})$, where $\theta_{hi}$ is the subset of parameters used to describe the depende...

117 |
Learning Gaussian Networks
- Geiger, Heckerman
- 1994
Citation Context ... of parents-child dependency and then the joint multivariate distribution that is needed for the reasoning algorithms is derived by multiplication. More details are described for example in [105] and [33]. We focus on this second approach and again use the global parameter independence [97] to assign independent prior distributions to each set of parameters $\tau_i, \beta_i$ that quantify the dependency of the v...

115 |
MAPPFinder: using Gene Ontology and GenMAPP to create a global gene-expression profile from microarray data
- Doniger, Salomonis, et al.
- 2003
Citation Context ...xploited for example in [92] to restrict the set of possible dependency structures between genes. This ordering operation can be largely automated by using some available programs, such as MAPPFinder [24] or GenMAPP [18], able to automatically map gene expression data to known pathways. For genes with unknown function, one can use different orders with random restarts. Other search strategies based...

112 |
Conditional independence in statistical theory (with discussion)
- Dawid
- 1979
Citation Context ... the marker, given that only the genotype of the other child is known. Because the probability of $Y_2$ changes according to the value of $Y_1$, the two variables are dependent. The seminal papers by Dawid [19, 20] summarize many important properties and alternative definitions of conditional independence. The overall list of marginal and conditional independencies represented by the directed acyclic graph is s...

112 | Learning limited dependence Bayesian classifiers
- Sahami
- 1996
Citation Context ...of the assumptions made by the NB or the TAN classifiers. Some examples are the $l$-Limited Dependence Bayesian classifier ($l$-LDB) in which the maximum number of parents that an attribute can have is $l$ [79]. Another example is the unrestricted Augmented Naïve Bayes classifier (ANB) in which the number of parents is unlimited but the scoring metric used for learning, the minimum description length criter...