## Using Bayesian networks to analyze expression data (2000)

Venue: Journal of Computational Biology

Citations: 792 (17 self)

### BibTeX

@ARTICLE{Friedman00usingbayesian,
    author = {Nir Friedman and Michal Linial and Iftach Nachman},
    title = {Using Bayesian networks to analyze expression data},
    journal = {Journal of Computational Biology},
    year = {2000},
    volume = {7},
    pages = {601--620}
}

### Abstract

DNA hybridization arrays simultaneously measure the expression level for thousands of genes. These measurements provide a “snapshot” of transcription levels within the cell. A major challenge in computational biology is to uncover, from such measurements, gene/protein interactions and key biological features of cellular systems. In this paper, we propose a new framework for discovering interactions between genes based on multiple expression measurements. This framework builds on the use of Bayesian networks for representing statistical dependencies. A Bayesian network is a graph-based model of joint multivariate probability distributions that captures properties of conditional independence between variables. Such models are attractive for their ability to describe complex stochastic processes and because they provide a clear methodology for learning from (noisy) observations. We start by showing how Bayesian networks can describe interactions between genes. We then describe a method for recovering gene interactions from microarray data using tools for learning Bayesian networks. Finally, we demonstrate this method on the S. cerevisiae cell-cycle measurements of Spellman et al. (1998).

Key words: gene expression, microarrays, Bayesian methods.
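As a small illustration of the conditional-independence idea in the abstract, the following Python sketch builds a three-variable chain A -> B -> C and checks that C depends on A marginally but not once B is observed. The variable names and probability tables are hypothetical and are not taken from the paper.

```python
from itertools import product

# Hypothetical conditional probability tables for a chain A -> B -> C.
# All variables are binary; none of these numbers come from the paper.
P_A = {0: 0.7, 1: 0.3}
P_B_GIVEN_A = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}
P_C_GIVEN_B = {0: {0: 0.6, 1: 0.4}, 1: {0: 0.1, 1: 0.9}}


def joint(a, b, c):
    """Joint probability from the network factorization P(A) P(B|A) P(C|B)."""
    return P_A[a] * P_B_GIVEN_A[a][b] * P_C_GIVEN_B[b][c]


def prob(**fixed):
    """Probability of a partial assignment, by brute-force summation."""
    total = 0.0
    for a, b, c in product((0, 1), repeat=3):
        values = {"a": a, "b": b, "c": c}
        if all(values[name] == value for name, value in fixed.items()):
            total += joint(a, b, c)
    return total


def c_given(c, **evidence):
    """Conditional probability P(C = c | evidence)."""
    return prob(c=c, **evidence) / prob(**evidence)


# C depends on A when nothing else is observed ...
marginal_gap = abs(c_given(1, a=0) - c_given(1, a=1))
# ... but A carries no extra information about C once B is known.
conditional_gap = abs(c_given(1, a=0, b=1) - c_given(1, a=1, b=1))

print(f"|P(C=1|A=0) - P(C=1|A=1)| = {marginal_gap:.3f}")
print(f"|P(C=1|A=0,B=1) - P(C=1|A=1,B=1)| = {conditional_gap:.3f}")
```

The brute-force summation is only practical for a handful of variables; it is used here to keep the independence check explicit rather than to suggest an inference method.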

### Citations

7407 | Probabilistic reasoning in intelligent systems: Networks of plausible inference
- Pearl
- 1988
Citation Context ...the transcriptional program by examining statistical properties of dependence and conditional independence in the data. We base our approach on the well-studied statistical tool of Bayesian networks (Pearl 1988). These networks represent the dependence structure between multiple interacting quantities (e.g., expression levels of different genes). Our approach, probabilistic in nature, is capable of handling... |

4886 | Gapped BLAST and PSI-BLAST: a new generation of protein database search programs
- Altschul, Madden, et al.
- 1997
Citation Context ...list of the top scoring relations can be found in Table 2. Among these, all involving two known genes (10/20) make sense biologically. When one of the ORFs is unknown, careful searches using PsiBlast (Altschul et al. 1997), Pfam (Sonnhammer et al. 1998) and Protomap (Yona et al. 1998) can reveal firm homologies to proteins functionally related to the other gene in the pair. (e.g. YHR143W, which is paired to the endoch... |

3115 | An introduction to the bootstrap
- Efron, Tibshirani
- 1993
Citation Context ...tworks is huge. Instead, we resort to an approximate method in the general spirit of the ideal solution. An effective and relatively simple approach for estimating confidence is the bootstrap method (Efron & Tibshirani 1993). The main idea behind the bootstrap is simple. We generate “perturbed” versions of our original data set, and learn from them. In this way we collect many networks, all of which are fairly reasonabl... |
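The bootstrap idea described in this context can be sketched as follows. This is only an illustration under stated assumptions: `learn_features` is a hypothetical stand-in that flags a pair of variables when their absolute Pearson correlation exceeds a threshold, whereas the paper learns a full Bayesian network from each resampled data set. The toy data, threshold, and number of resamples are likewise made up.

```python
import random


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if degenerate)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / (var_x * var_y) ** 0.5


def learn_features(samples, threshold=0.5):
    """Hypothetical stand-in learner (NOT the paper's structure search).

    Returns the set of variable pairs (i, j) whose absolute correlation
    exceeds the threshold.
    """
    columns = list(zip(*samples))
    n_vars = len(columns)
    return {
        (i, j)
        for i in range(n_vars)
        for j in range(i + 1, n_vars)
        if abs(pearson(columns[i], columns[j])) > threshold
    }


def bootstrap_confidence(samples, n_resamples=200, seed=0):
    """Fraction of resampled data sets in which each feature is found."""
    rng = random.Random(seed)
    counts = {}
    for _ in range(n_resamples):
        # Resample with replacement to get a "perturbed" data set.
        resample = [rng.choice(samples) for _ in samples]
        for feature in learn_features(resample):
            counts[feature] = counts.get(feature, 0) + 1
    return {feature: count / n_resamples for feature, count in counts.items()}


# Toy data: variable 1 tracks variable 0 with noise; variable 2 is independent.
data_rng = random.Random(1)
data = []
for _ in range(60):
    x = data_rng.gauss(0, 1)
    data.append((x, x + data_rng.gauss(0, 0.3), data_rng.gauss(0, 1)))

confidence = bootstrap_confidence(data)
for feature, value in sorted(confidence.items()):
    print(feature, round(value, 2))
```

Features that appear in most resampled runs receive a confidence near 1, while features that appear only sporadically receive a low value; any real use would swap the stand-in learner for an actual network-learning procedure.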

2039 | Cluster analysis and display of genome-wide expression patterns
- Eisen, Spellman, et al.
- 1998
Citation Context ...in use are based on clustering algorithms. These algorithms attempt to locate groups of genes that have similar expression patterns over a set of experiments (Alon et al. 1999, Ben-Dor & Yakhini 1999, Eisen et al. 1998, Michaels et al. 1998, Spellman et al. 1998). Such analysis is useful in discovering genes that are co-regulated. A more ambitious goal for analysis is revealing the structure of the transcriptional ... |

1127 | A Bayesian method for the induction of probabilistic networks from data. Machine Learning 9:309–347
- Cooper, Herskovits
- 1992
Citation Context ...ion that evaluates each network with respect to the training data, and to search for the optimal network according to this score. A commonly used scoring function is the Bayesian scoring metric (see (Cooper & Herskovits 1992, Heckerman et al. 1995) for complete description): $\mathrm{Score}(G : D) = \log P(G \mid D) = \log P(D \mid G) + \log P(G) + C$, where $C$ is a constant independent of $G$ and $P(D \mid G)$ is the marginal likelihood which averages the probability of the data over all possible parameter assig... |

965 | An introduction to Bayesian networks
- Jensen
- 1996
Citation Context ...n of some of the other variables?) or independencies in the domain (e.g., are $X$ and $Y$ independent once we observe $Z$?). The literature contains a suite of algorithms that can answer such queries (see e.g. (Jensen 1996, Pearl 1988)), exploiting the explicit representation of structure in order to answer queries efficiently. 2.3 Equivalence Classes of Bayesian Networks A Bayesian network structure implies a set of i... |

948 | Learning Bayesian networks: The combination of knowledge and statistical data
- Heckerman, Geiger, et al.
- 1995
Citation Context ...hybrid networks of multinomial distributions and conditional Gaussian distributions. (This prior combines earlier works on priors for multinomial networks (Buntine, 1991; Cooper and Herskovits, 1992; Heckerman et al., 1995) and for Gaussian networks (Geiger and Heckerman, 1994).) We refer the reader to Heckerman and Geiger (1995) and Heckerman (1998) for details on these priors. In the analysis of gene expression data,... |

891 | A tutorial on learning with Bayesian networks - Heckerman - 1998 |

868 | Comprehensive identification of cell cycle-regulated genes of the yeast Saccharomyces cerevisiae by microarray hybridization. Mol. Biol. Cell 9:3273–3297
- Spellman, Sherlock, et al.
- 1998
Citation Context ...nism, providing a “genomic” viewpoint on gene expression. As a consequence, this technology facilitates new experimental approaches for understanding gene expression and regulation (Iyer et al. 1999, Spellman et al. 1998). Early microarray experiments examined few samples, and mainly focused on differential display across tissues or conditions of interest. The design of recent experiments focuses on performing a larg... |

778 | Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays
- Alon, Barkai, et al.
- 1999
Citation Context ...om. Most of the analysis tools currently in use are based on clustering algorithms. These algorithms attempt to locate groups of genes that have similar expression patterns over a set of experiments (Alon et al. 1999, Ben-Dor & Yakhini 1999, Eisen et al. 1998, Michaels et al. 1998, Spellman et al. 1998). Such analysis is useful in discovering genes that are co-regulated. A more ambitious goal for analysis is revea... |

700 | Exploring the metabolic and genetic control of gene expression on a genomic scale
- DeRisi, Iyer, et al.
- 1997
Citation Context ...e and Engineering, Hebrew University, Jerusalem, 91904, Israel. ... DNA microarrays, researchers are now able to measure the abundance of thousands of mRNA targets simultaneously (DeRisi et al., 1997; Lockhart et al., 1996; Wen et al., 1998). Unlike classical experiments, where the expression levels of only a few genes were reported, DNA microarray experiments can measure all the genes of an organ... |

632 | Bayesian network classifiers
- Friedman, Geiger, et al.
- 1997
Citation Context ...networks from observations, and computational algorithms to do so are well understood and have been used successfully in many applications (Friedman et al. 1997, Thiesson et al. 1998). [Figure 1: An example of a simple network structure.] Finally, Bayesian networks provide models of causal influence: Although Bayesian networks are mathematically defined strictly in terms of probabilities and conditional indepen... |

529 | Causation, Prediction, and Search
- Spirtes, Glymour, et al.
- 2000
Citation Context ...of probabilities and conditional independence statements, a connection can be made between this characterization and the notion of direct causal influence (Heckerman et al. 1997, Pearl & Verma 1991, Spirtes et al. 1993). The remainder of this paper is organized as follows. In Section 2, we review key concepts of Bayesian networks, learning them from observations, and using them to infer causality. In Section 3, we ... |

465 | Modeling and simulation of genetic regulatory systems: a literature review - de Jong, H. |

354 | Clustering gene expression patterns
- Ben-Dor, Yakhini
- 1999
Citation Context ...). Although we did not use any prior knowledge, we managed to extract many biologically plausible conclusions from this analysis. Our approach is quite different than the clustering approach used by (Ben-Dor & Yakhini 1999, Alon et al. 1999, Eisen et al. 1998, Michaels et al. 1998, Spellman et al. 1998), in that it attempts to learn a much richer structure from the data. Our methods are capable of discovering causal re... |

289 | Model selection and accounting for model uncertainty in graphical models using Occam’s window - Madigan, Raftery - 1994 |

251 | Bayesian graphical models for discrete data
- Madigan, York
- 1995
Citation Context ...ayesian network models. It equally well applies to other models that are learned from gene expression data, such as clustering models. ... Carlo (MCMC) sampling procedure (see Madigan and York, 1995, and Gilks et al., 1996, for a general introduction to MCMC sampling). However, it is not clear how these methods scale up for large domains. Although recent developments in MCMC methods for Bayesian... |

227 | Learning the structure of dynamic probabilistic networks
- Friedman, Murphy, et al.
- 1998
Citation Context ...d for discretization of the expression levels. We note that we can also learn temporal models using a Bayesian network that includes gene expression values in two (or more) consecutive time points (Friedman et al. 1998). We are currently pursuing this issue. [Table 1: List of dominant genes in the ordering relations (top 14 out of 30). Columns: Gene/ORF; Dominance Score; # of descendent genes; notes. First row: YLR183C 551 609 708 Contain...] |

217 | Being Bayesian about network structure: A Bayesian approach to structure discovery in Bayesian networks - Friedman, Koller |

213 | A theory of inferred causation
- Pearl, Verma
- 1991
Citation Context ...d strictly in terms of probabilities and conditional independence statements, a connection can be made between this characterization and the notion of direct causal influence (Heckerman et al. 1997, Pearl & Verma 1991, Spirtes et al. 1993). The remainder of this paper is organized as follows. In Section 2, we review key concepts of Bayesian networks, learning them from observations, and using them to infer causali... |

212 | Minimum complexity density estimation - Barron, Cover - 1991 |

204 | The transcriptional program in response of human fibroblasts to serum. Science 283:83–87
- Iyer, Eisen, et al.
- 1999
Citation Context ...e genes of an organism, providing a “genomic” viewpoint on gene expression. As a consequence, this technology facilitates new experimental approaches for understanding gene expression and regulation (Iyer et al. 1999, Spellman et al. 1998). Early microarray experiments examined few samples, and mainly focused on differential display across tissues or conditions of interest. The design of recent experiments focuse... |

196 | Theory refinement on Bayesian networks - Buntine - 1991 |

189 | Learning Bayesian network structure from massive datasets: the “sparse candidate” algorithm, UAI 29 - Friedman, Nachman, et al. - 1999 |

174 | Graphical Models for Associations between Variables, some of which are Qualitative and some Quantitative. The Annals of Statistics 17(1
- Lauritzen, Wermuth
- 1989
Citation Context ...e are losing information. An alternative to discretization is using (semi)parametric density models for representing conditional probabilities in the networks we learn (e.g., Heckerman & Geiger 1995, Lauritzen & Wermuth 1989, Hoffman & Tresp 1996). However, a bad choice of the parametric family can strongly bias the learning algorithm. We believe that discretization provides a reasonably unbiased approach for dealing wi... |

161 | Learning Bayesian networks is NP-complete
- Chickering
- 1995
Citation Context ...hey assign values to all the variables in . Once the prior is specified and the data is given, learning amounts to finding the structure that maximizes the score. This problem is known to be NP-hard (Chickering 1996), thus we resort to heuristic search. The decomposition of the score is crucial for this optimization problem. A local search procedure that changes one edge at each move can efficiently evaluate the... |

152 | Large-scale temporal gene expression mapping of central nervous system development
- Wen, Fuhrman, et al.
- 1998
Citation Context ...ned organization to a solid surface. By using DNA microarrays, researchers are now able to measure the abundance of thousands of mRNA targets simultaneously (DeRisi et al. 1997, Lockhart et al. 1996, Wen et al. 1998). Unlike classical experiments, where the expression levels of only a few genes were reported, DNA microarray experiments can measure all the genes of an organism, providing a “genomic” viewpoint on ... |

145 | Modeling regulatory networks with weight matrices
- Weaver, Workman, et al.
- 1999
Citation Context ...overing genes that are co-regulated. A more ambitious goal for analysis is revealing the structure of the transcriptional regulation system (Akutsu et al. 1998, Chen et al. 1999, Somogyi et al. 1996, Weaver et al. 1999). This is clearly a hard problem: Mainly since mRNA expression data alone gives only a partial picture that does not reflect key events, such as translation and protein (in)activation, which play a m... |

117 | Learning Gaussian Networks
- Geiger, Heckerman
- 1994
Citation Context ...nditional Gaussian distributions. (This prior combines earlier works on priors for multinomial networks (Buntine, 1991; Cooper and Herskovits, 1992; Heckerman et al., 1995) and for Gaussian networks (Geiger and Heckerman, 1994).) We refer the reader to Heckerman and Geiger (1995) and Heckerman (1998) for details on these priors. In the analysis of gene expression data, we use a small number of samples. Therefore, care shou... |

110 | Causality: Models, Reasoning, and Inference - Pearl - 2000 |

95 | Pfam: multiple sequence alignments and HMM-profiles of protein domains
- Sonnhammer, Eddy, et al.
- 1998
Citation Context ...ions can be found in Table 2. Among these, all involving two known genes (10/20) make sense biologically. When one of the ORFs is unknown, careful searches using PsiBlast (Altschul et al. 1997), Pfam (Sonnhammer et al. 1998) and Protomap (Yona et al. 1998) can reveal firm homologies to proteins functionally related to the other gene in the pair. (e.g. YHR143W, which is paired to the endochitinase CTS1, is related to EGT... |

95 | Reverse engineering of regulatory networks in human B cells - Basso, Margolin, et al. - 2005 |

94 | A transformational characterization of equivalent Bayesian network structures
- Chickering
- 1995
Citation Context ...raphs $G$ and $G'$ are equivalent if $\mathrm{Ind}(G) = \mathrm{Ind}(G')$. This notion of equivalence is crucial, since when we examine observations from a distribution, we often cannot distinguish between equivalent graphs. Results of (Chickering 1995, Pearl & Verma 1991) show that we can characterize equivalence classes of graphs using a simple representation. In particular, these results establish that equivalent graphs have the same underlying ... |

84 | A Bayesian approach to causal discovery
- Heckerman, Meek, et al.
- 1997
Citation Context ...e mathematically defined strictly in terms of probabilities and conditional independence statements, a connection can be made between this characterization and the notion of direct causal influence (Heckerman et al. 1997, Pearl & Verma 1991, Spirtes et al. 1993). The remainder of this paper is organized as follows. In Section 2, we review key concepts of Bayesian networks, learning them from observations, and using t... |

80 | The max-min hill-climbing Bayesian network structure learning algorithm - Tsamardinos, Brown, Aliferis |

69 | A direct link between sister chromatid cohesion and chromosome condensation revealed through the analysis of MCD1 in S. cerevisiae. Cell 91(1
- Guacci, Koshland
- 1997
Citation Context ...CLN2, CDC5, and RAD53 whose functional relation has been established (Cvrckova and Nasmyth, 1993; Drebot et al., 1993). The genes MCD1, RFA2, CDC45, RAD53, CDC5, and POL30 were found to be essential (Guacci et al., 1997). These are clearly key genes in essential cell functions. Some of them are components of prereplication complexes (CDC45, POL30). Others (like RFA2, POL30, and MSH6) are involved in DNA repair. It is k... |

67 | Identification of gene regulatory networks by strategic gene disruptions and gene overexpressions
- Akutsu, Kuhara, et al.
- 1998
Citation Context ...998, Spellman et al. 1998). Such analysis is useful in discovering genes that are co-regulated. A more ambitious goal for analysis is revealing the structure of the transcriptional regulation system (Akutsu et al. 1998, Chen et al. 1999, Somogyi et al. 1996, Weaver et al. 1999). This is clearly a hard problem: Mainly since mRNA expression data alone gives only a partial picture that does not reflect key events, suc... |

63 | Causal Discovery from a Mixture of Experimental and Observational Data
- Cooper, Yoo
- 1999
Citation Context ...em. In the context of gene expression, we should view knockout/overexpressed mutants as such interventions. Thus, we can design methods that deal with mixed forms of data in a principled manner (see Cooper & Yoo 1999 for recent work in this direction). In addition, this theory can provide tools for experimental design, that is, understanding which interventions are deemed most informative to determining the ca... |

50 | On the sample complexity of learning Bayesian networks
- Friedman, Yakhini
- 1996
Citation Context ...ly large number of samples, graph structures that exactly capture all dependencies in the distribution will receive, with high probability, a higher score than all other graphs (Barron & Cover 1991, Friedman & Yakhini 1996, Höffgen 1993). This means that, given a sufficiently large number of instances in large data sets, learning procedures can pinpoint the exact network structure up to the correct equivalence class. H... |

48 | Identifying gene regulatory networks from experimental data
- Chen, Filkov, et al.
- 1999
Citation Context ... 1998). Such analysis is useful in discovering genes that are co-regulated. A more ambitious goal for analysis is revealing the structure of the transcriptional regulation system (Akutsu et al. 1998, Chen et al. 1999, Somogyi et al. 1996, Weaver et al. 1999). This is clearly a hard problem: Mainly since mRNA expression data alone gives only a partial picture that does not reflect key events, such as translation a... |

48 | Data Analysis with Bayesian Networks: A Bootstrap Approach
- Friedman, Goldszmidt, et al.
- 1999
Citation Context ...ilds on two techniques that were motivated by the challenges posed by this domain: a novel search algorithm (Friedman, Nachman, and Pe’er, 1999) and an approach for estimating statistical confidence (Friedman, Goldszmidt, and Wyner, 1999). We applied our methods to the real expression data of Spellman et al. (1998). Although we did not use any prior knowledge, we managed to extract many biologically plausible conclusions from this a... |

44 | Learning Bayesian networks: A unification for discrete and Gaussian domains
- Heckerman, Geiger
- 1995
Citation Context ...sured expression levels we are losing information. An alternative to discretization is using (semi)parametric density models for representing conditional probabilities in the networks we learn (e.g., Heckerman & Geiger 1995, Lauritzen & Wermuth 1989, Hoffman & Tresp 1996). However, a bad choice of the parametric family can strongly bias the learning algorithm. We believe that discretization provides a reasonably unbias... |

36 | Gene clusters and polycistronic transcription in eukaryotes - Blumenthal - 1998 |

35 | From signatures to models: understanding cancer using microarrays - Segal, Friedman, et al. - 2005 |

33 | Learning and robust learning of product distributions
- Höffgen
- 1993
Citation Context ...s, graph structures that exactly capture all dependencies in the distribution will receive, with high probability, a higher score than all other graphs (Barron & Cover 1991, Friedman & Yakhini 1996, Höffgen 1993). This means that, given a sufficiently large number of instances in large data sets, learning procedures can pinpoint the exact network structure up to the correct equivalence class. Heckerman et al... |

29 | The gene expression matrix: Towards the extraction of genetic network architectures. Nonlinear Analysis-Theory Methods
- Somogyi, Fuhrman, et al.
- 1997
Citation Context ...sis is useful in discovering genes that are co-regulated. A more ambitious goal for analysis is revealing the structure of the transcriptional regulation system (Akutsu et al. 1998, Chen et al. 1999, Somogyi et al. 1996, Weaver et al. 1999). This is clearly a hard problem: Mainly since mRNA expression data alone gives only a partial picture that does not reflect key events, such as translation and protein (in)activa... |

29 | Causal inference in the presence of latent variables and selection bias
- Spirtes, Meek, et al.
- 1995
Citation Context ...entral issue is: When can we learn a causal network from observations? This issue received a thorough treatment in the literature (Heckerman et al., 1999; Pearl and Verma, 1991; Spirtes et al., 1993; Spirtes et al., 1999). We briefly review the relevant results for our needs here. For a more detailed treatment of the topic, we refer the reader to Pearl (2000) and to Cooper and Glymour (1999). First it is important to... |

24 | Application of abductive ILP to learning metabolic network inhibition from temporal data - Tamaddoni-Nezhad, Chaleil, et al. - 2006 |

22 | Learning mixtures of Bayesian networks
- Thiesson, Meek, et al.
- 1997
Citation Context ...ions, and computational algorithms to do so are well understood and have been used successfully in many applications (Friedman et al. 1997, Thiesson et al. 1998). [Figure 1: An example of a simple network structure.] Finally, Bayesian networks provide models of causal influence: Although Bayesian networks are mathematically defined strictly in terms of probabilities and conditional independence statements, a co... |

20 | Gaussian Process Networks - Friedman, Nachman - 2000 |