Results 1  10
of
25
Learning Bayesian networks: The combination of knowledge and statistical data
 Machine Learning
, 1995
"... We describe scoring metrics for learning Bayesian networks from a combination of user knowledge and statistical data. We identify two important properties of metrics, which we call event equivalence and parameter modularity. These properties have been mostly ignored, but when combined, greatly simpl ..."
Abstract

Cited by 913 (38 self)
 Add to MetaCart
We describe scoring metrics for learning Bayesian networks from a combination of user knowledge and statistical data. We identify two important properties of metrics, which we call event equivalence and parameter modularity. These properties have been mostly ignored, but when combined, greatly simplify the encoding of a user’s prior knowledge. In particular, a user can express his knowledge—for the most part—as a single prior Bayesian network for the domain. 1
Bayesian Network Classifiers
, 1997
"... Recent work in supervised learning has shown that a surprisingly simple Bayesian classifier with strong assumptions of independence among features, called naive Bayes, is competitive with stateoftheart classifiers such as C4.5. This fact raises the question of whether a classifier with less restr ..."
Abstract

Cited by 587 (22 self)
 Add to MetaCart
Recent work in supervised learning has shown that a surprisingly simple Bayesian classifier with strong assumptions of independence among features, called naive Bayes, is competitive with stateoftheart classifiers such as C4.5. This fact raises the question of whether a classifier with less restrictive assumptions can perform even better. In this paper we evaluate approaches for inducing classifiers from data, based on the theory of learning Bayesian networks. These networks are factored representations of probability distributions that generalize the naive Bayesian classifier and explicitly represent statements about independence. Among these approaches we single out a method we call Tree Augmented Naive Bayes (TAN), which outperforms naive Bayes, yet at the same time maintains the computational simplicity (no search involved) and robustness that characterize naive Bayes. We experimentally tested these approaches, using problems from the University of California at Irvine repository, and compared them to C4.5, naive Bayes, and wrapper methods for feature selection.
A Guide to the Literature on Learning Probabilistic Networks From Data
, 1996
"... This literature review discusses different methods under the general rubric of learning Bayesian networks from data, and includes some overlapping work on more general probabilistic networks. Connections are drawn between the statistical, neural network, and uncertainty communities, and between the ..."
Abstract

Cited by 172 (0 self)
 Add to MetaCart
This literature review discusses different methods under the general rubric of learning Bayesian networks from data, and includes some overlapping work on more general probabilistic networks. Connections are drawn between the statistical, neural network, and uncertainty communities, and between the different methodological communities, such as Bayesian, description length, and classical statistics. Basic concepts for learning and Bayesian networks are introduced and methods are then reviewed. Methods are discussed for learning parameters of a probabilistic network, for learning the structure, and for learning hidden variables. The presentation avoids formal definitions and theorems, as these are plentiful in the literature, and instead illustrates key concepts with simplified examples. Keywords Bayesian networks, graphical models, hidden variables, learning, learning structure, probabilistic networks, knowledge discovery. I. Introduction Probabilistic networks or probabilistic gra...
Learning Bayesian Networks is NPHard
, 1994
"... Algorithms for learning Bayesian networks from data have two components: a scoring metric and a search procedure. The scoring metric computes a score reflecting the goodnessoffit of the structure to the data. The search procedure tries to identify network structures with high scores. Heckerman et ..."
Abstract

Cited by 130 (2 self)
 Add to MetaCart
Algorithms for learning Bayesian networks from data have two components: a scoring metric and a search procedure. The scoring metric computes a score reflecting the goodnessoffit of the structure to the data. The search procedure tries to identify network structures with high scores. Heckerman et al. (1994) introduced a Bayesian metric, called the BDe metric, that computes the relative posterior probability of a network structure given data. They show that the metric has a property desireable for inferring causal structure from data. In this paper, we show that the problem of deciding whether there is a Bayesian networkamong those where each node has at most k parentsthat has a relative posterior probability greater than a given constant is NPcomplete, when the BDe metric is used. 1 Introduction Recently, many researchers have begun to investigate methods for learning Bayesian networks, including Bayesian methods [Cooper and Herskovits, 1991, Buntine, 1991, York 1992, Spiegel...
Learning Bayesian Networks by Genetic Algorithms. A case study in the prediction of survival in malignant skin melanoma
, 1997
"... In this work we introduce a methodology based on Genetic Algorithms for the automatic induction of Bayesian Networks from a file containing cases and variables related to the problem. The methodology is applied to the problem of predicting survival of people after one, three and five years of being ..."
Abstract

Cited by 71 (11 self)
 Add to MetaCart
In this work we introduce a methodology based on Genetic Algorithms for the automatic induction of Bayesian Networks from a file containing cases and variables related to the problem. The methodology is applied to the problem of predicting survival of people after one, three and five years of being diagnosed as having malignant skin melanoma. The accuracy of the obtained model, measured in terms of the percentage of wellclassified subjects, is compared to that obtained by the called NaiveBayes. In both cases, the estimation of the model accuracy is obtained from the 10fold crossvalidation method. 1. Introduction Expert systems, one of the most developed areas in the field of Artificial Intelligence, are computer programs designed to help or replace humans beings in tasks in which the human experience and human knowledge are scarce and unreliable. Although, there are domains in which the tasks can be specifed by logic rules, other domains are characterized by an uncertainty inherent...
Combining microarrays and biological knowledge for estimating gene networks via Bayesian networks
 In Proceedings of the IEEE Computer Society Bioinformatics Conference (CSB 03
, 2003
"... We propose a statistical method for estimating a gene network based on Bayesian networks from microarray gene expression data together with biological knowledge including proteinprotein interactions, proteinDNA interactions, binding site information, existing literature and so on. Unfortunately, m ..."
Abstract

Cited by 55 (5 self)
 Add to MetaCart
We propose a statistical method for estimating a gene network based on Bayesian networks from microarray gene expression data together with biological knowledge including proteinprotein interactions, proteinDNA interactions, binding site information, existing literature and so on. Unfortunately, microarray data do not contain enough information for constructing gene networks accurately in many cases. Our method adds biological knowledge to the estimation method of gene networks under a Bayesian statistical framework, and also controls the tradeoff between microarray information and biological knowledge automatically. We conduct Monte Carlo simulations to show the effectiveness of the proposed method. We analyze Saccharomyces cerevisiae gene expression data as an application. 1.
Learning Bayesian Network Structures by Searching For the Best Ordering With Genetic Algorithms
 IEEE Transactions on Systems, Man and Cybernetics
, 1996
"... In this paper we present a ne_(l n [!ii ' with respect to Bayesian networks con ogy for inducing Bayesian network structures frop3 titute the roblem of the evidence propagation and a database of cases. The methodology is based oap&lll searching for the best ordering of the system vari the problem ..."
Abstract

Cited by 54 (9 self)
 Add to MetaCart
In this paper we present a ne_(l n [!ii ' with respect to Bayesian networks con ogy for inducing Bayesian network structures frop3 titute the roblem of the evidence propagation and a database of cases. The methodology is based oap&lll searching for the best ordering of the system vari the problem of the model search. The problem of shies by means of genetic algorithl{. Since his th_vidence propagation consists of once the vMproblem of finding an optimal ordea. teeuarue}rables are known, the assignment of resembles the traveling salesman p'FolUleh)ve use .... IW. ....... probablhles to the values of the rest of the van genetic operators that were developed for the latter  problem. The quality of a variable ordering is eval ables. Cooper [4] demonstrated that this problem Mated with the algorithm K2. We present empirical results that were obtained with a simulation of the ALARM network.
Learning Bayesian Belief Networks Based on the Minimum Description Length Principle: Basic Properties
, 1996
"... This paper was partially presented at the 9th conference on Uncertainty in Artificial Intelligence, July 1993. ..."
Abstract

Cited by 51 (0 self)
 Add to MetaCart
This paper was partially presented at the 9th conference on Uncertainty in Artificial Intelligence, July 1993.
Asymptotic model selection for directed networks with hidden variables
, 1996
"... We extend the Bayesian Information Criterion (BIC), an asymptotic approximation for the marginal likelihood, to Bayesian networks with hidden variables. This approximation can be used to select models given large samples of data. The standard BIC as well as our extension punishes the complexity of a ..."
Abstract

Cited by 49 (15 self)
 Add to MetaCart
We extend the Bayesian Information Criterion (BIC), an asymptotic approximation for the marginal likelihood, to Bayesian networks with hidden variables. This approximation can be used to select models given large samples of data. The standard BIC as well as our extension punishes the complexity of a model according to the dimension of its parameters. We argue that the dimension of a Bayesian network with hidden variables is the rank of the Jacobian matrix of the transformation between the parameters of the network and the parameters of the observable variables. We compute the dimensions of several networks including the naive Bayes model with a hidden root node. 1
A Scoring Function for Learning Bayesian Networks based on Mutual Information and Conditional Independence Tests
 JOURNAL OF MACHINE LEARNING RESEARCH
, 2006
"... We propose a new scoring function for learning Bayesian networks from data using score search algorithms. This is based on the concept of mutual information and exploits some wellknown properties of this measure in a novel way. Essentially, a statistical independence test based on the chisquare di ..."
Abstract

Cited by 17 (0 self)
 Add to MetaCart
We propose a new scoring function for learning Bayesian networks from data using score search algorithms. This is based on the concept of mutual information and exploits some wellknown properties of this measure in a novel way. Essentially, a statistical independence test based on the chisquare distribution, associated with the mutual information measure, together with a property of additive decomposition of this measure, are combined in order to measure the degree of interaction between each variable and its parent variables in the network. The result is a nonBayesian scoring function called MIT (mutual information tests) which belongs to the family of scores based on information theory. The MIT score also represents a penalization of the KullbackLeibler divergence between the joint probability distributions associated with a candidate network and with the available data set. Detailed results of a complete experimental evaluation of the proposed scoring function and its comparison with the wellknown K2, BDeu and BIC/MDL scores are also presented.