Results 11 – 20 of 83
Learning Bayesian Networks from Data: An Efficient Approach Based on Information Theory
, 1997
Abstract

Cited by 49 (0 self)
This paper addresses the problem of learning Bayesian network structures from data by using an information-theoretic dependency analysis approach. Based on our three-phase construction mechanism, two efficient algorithms have been developed. One of our algorithms deals with a special case where the node ordering is given; it requires only O(N²) CI tests and is correct given that the underlying model is DAG-faithful [Spirtes et al., 1996]. The other algorithm deals with the general case and requires O(N⁴) conditional independence (CI) tests. It is correct given that the underlying model is monotone DAG-faithful (see Section 4.4). A system based on these algorithms has been developed and distributed through the Internet. The empirical results show that our approach is efficient and reliable.

1 Introduction
The Bayesian network is a powerful knowledge representation and reasoning tool under conditions of uncertainty. A Bayesian network is a directed acyclic graph ...
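The dependency-analysis idea behind this approach can be sketched with a minimal example. The snippet below estimates mutual information from samples and declares two variables independent when it falls below a small threshold `eps`; this threshold is a stand-in assumption for illustration, since the paper uses proper statistical CI tests rather than a fixed cutoff.

```python
from collections import Counter
from math import log

def mutual_information(xs, ys):
    """Empirical mutual information I(X;Y) in nats from paired samples."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    # Sum p(x,y) * log( p(x,y) / (p(x) p(y)) ) over observed cells.
    return sum((c / n) * log(c * n / (px[a] * py[b]))
               for (a, b), c in pxy.items())

def independent(xs, ys, eps=0.01):
    # Declare X and Y independent when empirical MI is below eps.
    # (eps is an illustrative stand-in for the paper's CI test.)
    return mutual_information(xs, ys) < eps
```

For two identical balanced binary samples the estimate is log 2 ≈ 0.693, while for a product distribution it is exactly 0, so the threshold separates the two cases cleanly.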
Causal Inference in the Presence of Latent Variables and Selection Bias
 In Proceedings of the Eleventh Conference on Uncertainty in Artificial Intelligence
Abstract

Cited by 49 (8 self)
This paper uses Bayesian network models for that investigation. Bayesian networks, or directed acyclic graph (DAG) models, have proved very useful in representing both causal and statistical hypotheses. The nodes of the graph represent random variables, directed edges represent direct influences, and the topology of the graph encodes statistical constraints. We will consider features of such models that can be determined from data under assumptions that are related to those routinely applied in experimental situations:
Improved learning of Bayesian networks
 Proc. of the Conf. on Uncertainty in Artificial Intelligence
, 2001
Abstract

Cited by 40 (6 self)
Two or more Bayesian network structures are Markov equivalent when the corresponding acyclic digraphs encode the same set of conditional independencies. Therefore, the search space of Bayesian network structures may be organized in equivalence classes, where each of them represents a different set of conditional independencies. The collection of sets of conditional independencies obeys a partial order, the so-called “inclusion order.” This paper discusses in depth the role that the inclusion order plays in learning the structure of Bayesian networks. In particular, this role involves the way a learning algorithm traverses the search space. We introduce a condition for traversal operators, the inclusion boundary condition, which, when it is satisfied, guarantees that the search strategy can avoid local maxima. This is proved under the assumptions that the data is sampled from a probability distribution which is faithful to an acyclic digraph, and the length of the sample is unbounded. The previous discussion leads to the design of a new traversal operator and two new learning algorithms in the context of heuristic search and the Markov Chain Monte Carlo method. We carry out a set of experiments with synthetic and real-world data that show empirically the benefit of striving for the inclusion order when learning Bayesian networks from data.
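Markov equivalence itself has a simple characterization (due to Verma and Pearl): two DAGs are equivalent iff they share the same skeleton and the same v-structures (colliders a → c ← b with a, b non-adjacent). A small sketch, representing a DAG as a set of (parent, child) arcs:

```python
from itertools import combinations

def skeleton(edges):
    """Undirected version of a DAG given as a set of (parent, child) arcs."""
    return {frozenset(e) for e in edges}

def v_structures(edges):
    """Colliders a -> c <- b where a and b are non-adjacent."""
    parents = {}
    for a, b in edges:
        parents.setdefault(b, set()).add(a)
    adj = skeleton(edges)
    vs = set()
    for c, ps in parents.items():
        for a, b in combinations(sorted(ps), 2):
            if frozenset((a, b)) not in adj:
                vs.add((a, c, b))
    return vs

def markov_equivalent(g1, g2):
    # Verma-Pearl criterion: same skeleton and same v-structures.
    return skeleton(g1) == skeleton(g2) and v_structures(g1) == v_structures(g2)
```

For example, the chain a → b → c and the fork a ← b → c are Markov equivalent (both encode a ⟂ c | b), while the collider a → b ← c has the same skeleton but a different v-structure set, so it falls in a different equivalence class.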
Learning Goal Oriented Bayesian Networks for Telecommunications Risk Management
 In Proceedings of the 13th International Conference on Machine Learning
, 1996
Abstract

Cited by 36 (0 self)
This paper discusses issues related to Bayesian network model learning for unbalanced binary classification tasks. In general, the primary focus of current research on Bayesian network learning systems (e.g., K2 and its variants) is on the creation of the Bayesian network structure that fits the database best. It turns out that when applied with a specific purpose in mind, such as classification, the performance of these network models may be very poor. We demonstrate that Bayesian network models should be created to meet the specific goal or purpose intended for the model. We first present a goal-oriented algorithm for constructing Bayesian networks for predicting uncollectibles in telecommunications risk-management datasets. Second, we argue and demonstrate that current Bayesian network learning methods may fail to perform satisfactorily in real-life applications since they do not learn models tailored to a specific goal or purpose. Third, we discuss the performance of "goal-oriented"...
A Hybrid Anytime Algorithm for the Construction of Causal Models From Sparse Data
 In Proceedings of the Fifteenth International Conference on Uncertainty in Artificial Intelligence
, 1999
Abstract

Cited by 35 (3 self)
We present a hybrid constraint-based/Bayesian algorithm for learning causal networks in the presence of sparse data. The algorithm searches the space of equivalence classes of models (essential graphs) using a heuristic based on conventional constraint-based techniques. Each essential graph is then converted into a directed acyclic graph and scored using a Bayesian scoring metric. Two variants
A new approach for learning belief networks using independence criteria
 International Journal of Approximate Reasoning
, 2000
A Scoring Function for Learning Bayesian Networks based on Mutual Information and Conditional Independence Tests
 JOURNAL OF MACHINE LEARNING RESEARCH
, 2006
Abstract

Cited by 31 (0 self)
We propose a new scoring function for learning Bayesian networks from data using score search algorithms. This is based on the concept of mutual information and exploits some well-known properties of this measure in a novel way. Essentially, a statistical independence test based on the chi-square distribution, associated with the mutual information measure, together with a property of additive decomposition of this measure, are combined in order to measure the degree of interaction between each variable and its parent variables in the network. The result is a non-Bayesian scoring function called MIT (mutual information tests) which belongs to the family of scores based on information theory. The MIT score also represents a penalization of the Kullback-Leibler divergence between the joint probability distributions associated with a candidate network and with the available data set. Detailed results of a complete experimental evaluation of the proposed scoring function and its comparison with the well-known K2, BDeu and BIC/MDL scores are also presented.
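The core quantity behind such a score can be sketched as follows: for a node X with parents Pa(X) and N samples, the statistic 2N·I(X; Pa(X)) follows (asymptotically, under independence) a chi-square distribution, so subtracting a chi-square critical value penalizes spurious arcs. This sketch simplifies the paper's MIT score by joining all parents into one compound variable and using a single test at significance 0.05; the function and data layout below are illustrative assumptions, not the paper's implementation.

```python
from collections import Counter
from math import log

# 95% chi-square critical values for small degrees of freedom;
# these play the role of the penalty term in a MIT-style score.
CHI2_95 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488}

def mit_local(data, x, parents, arity):
    """Illustrative local term: 2N * I(X; Pa(X)) minus a chi-square penalty.
    `data` is a list of dicts (one per sample); `arity` maps each variable
    name to its number of states."""
    n = len(data)
    xs = [row[x] for row in data]
    pas = [tuple(row[p] for p in parents) for row in data]
    px, ppa, joint = Counter(xs), Counter(pas), Counter(zip(xs, pas))
    mi = sum((c / n) * log(c * n / (px[a] * ppa[b]))
             for (a, b), c in joint.items())
    q = 1
    for p in parents:
        q *= arity[p]
    # Degrees of freedom for testing X against the compound parent variable.
    df = (arity[x] - 1) * (q - 1)
    return 2 * n * mi - CHI2_95[df]
```

A positive value suggests the arc set is supported by the data (the measured interaction exceeds what independence would produce by chance); a negative value suggests the parents add nothing.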
Searching for Bayesian Network Structures in the Space of Restricted Acyclic Partially Directed Graphs
 Journal of Artificial Intelligence Research
, 2003
Abstract

Cited by 21 (2 self)
Although many algorithms have been designed to construct Bayesian network structures using different approaches and principles, they all employ only two methods: those based on independence criteria, and those based on a scoring function and a search procedure (although some methods combine the two). Within the score+search paradigm, the dominant approach uses local search methods in the space of directed acyclic graphs (DAGs), where the usual choices for defining the elementary modifications (local changes) that can be applied are arc addition, arc deletion, and arc reversal. In this paper, we propose a new local search method that uses a different search space, and which takes account of the concept of equivalence between network structures: restricted acyclic partially directed graphs (RPDAGs). In this way, the number of different configurations of the search space is reduced, thus improving efficiency. Moreover, although the final result must necessarily be a local optimum given the nature of the search method, the topology of the new search space, which avoids making early decisions about the directions of the arcs, may help to find better local optima than those obtained by searching in the DAG space.
Learning Causal Networks from Data: A survey and a new algorithm for recovering possibilistic causal networks
, 1997
Abstract

Cited by 20 (5 self)
Introduction. Reasoning in terms of cause and effect is a strategy that arises in many tasks. For example, diagnosis is usually defined as the task of finding the causes (illnesses) from the observed effects (symptoms). Similarly, prediction can be understood as the description of a future plausible situation where observed effects will be in accordance with the known causal structure of the phenomenon being studied. Causal models are a summary of the knowledge about a phenomenon expressed in terms of causation. Many areas of the applied sciences (econometrics, biomedicine, engineering, etc.) have used such a term to refer to models that yield explanations, allow for prediction and facilitate planning and decision making. Causal reasoning can be viewed as inference guided by a causation theory. That kind of inference can be further specialised into induc ...

# This work has been partially supported by the Spanish Comisión Interministerial de Ciencia y Tecnología, project CICYT TIC96-0878.
Learning Bayesian Networks Using Feature Selection
 in D. Fisher & H. Lenz, eds, Proceedings of the fifth International Workshop on Artificial Intelligence and Statistics, Ft. Lauderdale, FL
, 1995
Abstract

Cited by 18 (2 self)
This paper introduces a novel enhancement for learning Bayesian networks with a bias for small, high-predictive-accuracy networks. The new approach selects a subset of features which maximizes predictive accuracy prior to the network-learning phase. We examine explicitly the effects of two aspects of the algorithm, feature selection and node ordering. Our approach generates networks which are computationally simpler to evaluate and which display predictive accuracy comparable to that of Bayesian networks which model all attributes.

1 INTRODUCTION
Bayesian networks are being increasingly recognized as an important representation for probabilistic reasoning. For many domains, the need to specify the probability distributions for a Bayesian network is considerable, and learning these probabilities from data using an algorithm like K2 [8] could alleviate such specification difficulties. We describe an extension to the Bayesian network learning approaches introduced in K2. Rather than ...
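The "select features before learning" step can be illustrated with a filter-style simplification: rank candidate features by their empirical mutual information with the class and keep the top k. Note this is an assumed stand-in for illustration only; the paper selects the subset that maximizes predictive accuracy, not an MI filter.

```python
from collections import Counter
from math import log

def mutual_information(xs, ys):
    """Empirical mutual information between two discrete samples, in nats."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum((c / n) * log(c * n / (px[a] * py[b]))
               for (a, b), c in pxy.items())

def select_features(rows, cls, k):
    """Keep the k features most informative about the class variable `cls`.
    `rows` is a list of dicts; every dict includes the class variable."""
    labels = [r[cls] for r in rows]
    feats = [f for f in rows[0] if f != cls]
    scored = sorted(feats,
                    key=lambda f: mutual_information([r[f] for r in rows], labels),
                    reverse=True)
    return scored[:k]
```

Only the selected subset would then be handed to the structure-learning phase, yielding the smaller, cheaper-to-evaluate networks the abstract describes.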