Results 1–10 of 22
Factorized normalized maximum likelihood criterion for learning Bayesian network structures
 Submitted for PGM'08
, 2008
"... This paper introduces a new scoring criterion, factorized normalized maximum likelihood, for learning Bayesian network structures. The proposed scoring criterion requires no parameter tuning, and it is decomposable and asymptotically consistent. We compare the new scoring criterion to other scoring ..."
Abstract

Cited by 3 (2 self)
This paper introduces a new scoring criterion, factorized normalized maximum likelihood, for learning Bayesian network structures. The proposed scoring criterion requires no parameter tuning, and it is decomposable and asymptotically consistent. We compare the new scoring criterion to other scoring criteria and describe its practical implementation. Empirical tests confirm its good performance.
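The factorized NML score summarized above can be written compactly. The notation below (local data D_ij for variable i under parent configuration j, counts N_ij, arities r_i) is my paraphrase of the standard presentation, not a quotation from the paper:

```latex
% fNML: each family (variable i, parent configuration j) contributes its own
% normalized maximum likelihood term, so the score is decomposable.
\log P_{\mathrm{fNML}}(D \mid G)
  = \sum_{i=1}^{n} \sum_{j=1}^{q_i}
    \log \frac{P\bigl(D_{ij} \mid \hat{\theta}_{ij}(D_{ij})\bigr)}{\mathcal{C}(N_{ij}, r_i)},
\qquad
\mathcal{C}(N, r)
  = \sum_{k_1 + \dots + k_r = N}
    \frac{N!}{k_1! \cdots k_r!}
    \prod_{\ell=1}^{r} \left(\frac{k_\ell}{N}\right)^{k_\ell}
```

The outer sum over variables is what makes the criterion decomposable, and the absence of any prior hyperparameter is why no tuning is required.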
Learning networks determined by the ratio of prior and data
 In Proceedings of the 26th Conference on Uncertainty in Artificial Intelligence
, 2010
"... Recent reports have described that the equivalent sample size (ESS) in a Dirichlet prior plays an important role in learning Bayesian networks. This paper provides an asymptotic analysis of the marginal likelihood score for a Bayesian network. Results show that the ratio of the ESS and sample size ..."
Abstract

Cited by 3 (0 self)
Recent reports have described that the equivalent sample size (ESS) in a Dirichlet prior plays an important role in learning Bayesian networks. This paper provides an asymptotic analysis of the marginal likelihood score for a Bayesian network. Results show that the ratio of the ESS to the sample size determines the penalty for adding arcs when learning Bayesian networks: the number of arcs increases monotonically as the ESS increases and decreases monotonically as the ESS decreases. Furthermore, the marginal likelihood score provides a unified expression of various score metrics by changing the prior knowledge.
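The marginal likelihood in question is, for complete discrete data with Dirichlet priors, the classic closed-form expression; with the BDeu choice of hyperparameters, the single parameter α is exactly the ESS discussed above. Notation (counts N_ijk, arities r_i, parent-configuration counts q_i) is mine:

```latex
P(D \mid G)
  = \prod_{i=1}^{n} \prod_{j=1}^{q_i}
    \frac{\Gamma(\alpha_{ij})}{\Gamma(\alpha_{ij} + N_{ij})}
    \prod_{k=1}^{r_i}
    \frac{\Gamma(\alpha_{ijk} + N_{ijk})}{\Gamma(\alpha_{ijk})},
\qquad
\alpha_{ijk} = \frac{\alpha}{r_i\, q_i},
\quad
\alpha_{ij} = \sum_{k=1}^{r_i} \alpha_{ijk} = \frac{\alpha}{q_i}
```

Since α enters every pseudo-count α_ijk, scaling it against the sample size shifts the effective penalty on extra parent configurations, which is the effect the abstract analyzes.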
Locally Minimax Optimal Predictive Modeling with Bayesian Networks
"... We propose an informationtheoretic approach for predictive modeling with Bayesian networks. Our approach is based on the minimax optimal Normalized Maximum Likelihood (NML) distribution, motivated by the MDL principle. In particular, we present a parameter learning method which, together with a pre ..."
Abstract

Cited by 2 (1 self)
We propose an information-theoretic approach for predictive modeling with Bayesian networks. Our approach is based on the minimax optimal Normalized Maximum Likelihood (NML) distribution, motivated by the MDL principle. In particular, we present a parameter learning method which, together with a previously introduced NML-based model selection criterion, provides a way to construct highly predictive Bayesian network models from data. The method is parameter-free and robust, unlike the currently popular Bayesian marginal likelihood approach, which has been shown to be sensitive to the choice of prior hyperparameters. Empirical tests show that the proposed method compares favorably with the Bayesian approach in predictive tasks.
One-Shot Learning with Bayesian Networks
"... Humans often make accurate inferences given a single exposure to a novel situation. Some of these inferences can be achieved by discovering and using neardeterministic relationships between attributes. Approaches based on Bayesian networks are good at discovering and using soft probabilistic relati ..."
Abstract

Cited by 1 (0 self)
Humans often make accurate inferences given a single exposure to a novel situation. Some of these inferences can be achieved by discovering and using near-deterministic relationships between attributes. Approaches based on Bayesian networks are good at discovering and using soft probabilistic relationships between attributes, but typically fail to identify and exploit near-deterministic relationships. Here we develop a Bayesian network approach that overcomes this limitation by learning a hyperparameter for each distribution in the network that specifies whether it is non-deterministic or near-deterministic. We apply our approach to one-shot learning problems based on a real-world database of immigration records, and show that it outperforms a more standard Bayesian network approach.
Learning optimal Bayesian networks with heuristic search
 Department of Computer Science and Engineering, Mississippi State University
, 2012
"... Bayesian networks are a widely used graphical model which formalize reasoning under uncertainty. Unfortunately, construction of a Bayesian network by an expert is timeconsuming, and, in some cases, all experts may not agree on the best structure for a problem domain. Additionally, for some complex ..."
Abstract

Cited by 1 (0 self)
Bayesian networks are a widely used graphical model which formalizes reasoning under uncertainty. Unfortunately, construction of a Bayesian network by an expert is time-consuming, and, in some cases, experts may not all agree on the best structure for a problem domain. Additionally, for some complex systems, such as those present in molecular biology, experts with an understanding of the entire domain and of how individual components interact may not exist. In these cases, we must learn the network structure from available data. This dissertation focuses on score-based structure learning. In this context, a scoring function is used to measure the goodness of fit of a structure to the data; the goal is to find the structure which optimizes the scoring function. The first contribution of this dissertation is a shortest-path perspective on the problem of learning optimal Bayesian network structures. This perspective builds on earlier dynamic programming strategies but, as we show, offers much more flexibility. Second, we develop a set of data structures to improve the efficiency of many of the …
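The earlier dynamic programming strategy the dissertation builds on recurses over the subset lattice: the best network over a variable set S is obtained by choosing some variable v as a sink with parents drawn from S \ {v} and recursing on the rest. A minimal sketch of that recurrence follows; the variable names and the toy local scores are illustrative assumptions, not taken from the dissertation (real scores would come from a decomposable criterion such as BDeu or fNML):

```python
from itertools import combinations

# Toy decomposable local scores: local_score[(v, parents)] -> float (higher is better).
variables = (0, 1, 2)
local_score = {}
for v in variables:
    others = tuple(u for u in variables if u != v)
    for r in range(len(others) + 1):
        for ps in combinations(others, r):
            # Made-up values: penalize parent-set size, mildly reward {v-1 mod 3}.
            local_score[(v, ps)] = -len(ps) + (2.0 if ps == ((v - 1) % 3,) else 0.0)

def best_parents(v, candidates):
    """Best-scoring parent set for v drawn from the set `candidates`."""
    best = max(
        (ps for r in range(len(candidates) + 1)
            for ps in combinations(sorted(candidates), r)),
        key=lambda ps: local_score[(v, ps)],
    )
    return best, local_score[(v, best)]

def optimal_network(variables):
    """DP over the subset lattice: M[S] = best total score of a DAG over S."""
    M = {frozenset(): (0.0, [])}  # subset -> (score, list of (v, parents))
    for size in range(1, len(variables) + 1):
        for S in map(frozenset, combinations(variables, size)):
            best = None
            for v in S:  # try v as the sink (last variable) of S
                rest = S - {v}
                ps, s = best_parents(v, rest)
                prev_score, prev_net = M[rest]
                cand = (prev_score + s, prev_net + [(v, ps)])
                if best is None or cand[0] > best[0]:
                    best = cand
            M[S] = best
    return M[frozenset(variables)]

score, network = optimal_network(variables)
print(score, network)
```

The recurrence visits each of the 2^n subsets once, which is exactly the search space that the shortest-path reformulation then prunes with heuristic search.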
On the Robustness of Bayesian Networks to Learning from Non-conjugate Sampling
"... Under local DeRobertis separation measures, the posterior distances between two densities is the same as between the prior densities. Like KullbackLeibler separation they are also additive under factorisation. These two properties allow us to prove that the precise specification of the prior will n ..."
Abstract

Cited by 1 (1 self)
Under local DeRobertis separation measures, the posterior distance between two densities is the same as the distance between the prior densities. Like Kullback-Leibler separation, they are also additive under factorisation. These two properties allow us to prove that the precise specification of the prior will not be critical with respect to the variation distance on the posteriors under the following conditions: the genuine and approximating priors must be similarly rough; the approximating prior must have concentrated on a small ball on the margin of interest, not on the boundary of the probability space; and the approximating prior must have similar or fatter tails than the genuine prior. Robustness then follows for all likelihoods, even ones that are misspecified. Furthermore, the total variation distances can be bounded explicitly by an easy-to-calculate function of the prior local DeRobertis separation measures and simple summary statistics of the functioning posterior. In this paper we apply these results to study the robustness of prior specification when learning Bayesian networks.
The Most Generative Maximum Margin Bayesian Networks
"... *These authors contributed equally to this paper Althoughdiscriminativelearningingraphical models generally improves classification results, the generative semantics of the model are compromised. In this paper, we introduce a novel approach of hybrid generativediscriminative learning for Bayesian ne ..."
Abstract

Cited by 1 (1 self)
*These authors contributed equally to this paper. Although discriminative learning in graphical models generally improves classification results, the generative semantics of the model are compromised. In this paper, we introduce a novel approach of hybrid generative-discriminative learning for Bayesian networks. We use an SVM-type large-margin formulation for discriminative training, introducing a likelihood-weighted ℓ1-norm for the SVM-norm penalization. This simultaneously optimizes the data likelihood and therefore partly maintains the generative character of the model. For many network structures, our method can be formulated as a convex problem, guaranteeing a globally optimal solution. In terms of classification, the resulting models outperform state-of-the-art generative and discriminative learning methods for Bayesian networks, and are comparable with linear and kernelized SVMs. Furthermore, the models achieve likelihoods close to the maximum likelihood solution and show robust behavior in classification experiments with missing features.
AN NML-BASED METHOD FOR LEARNING BAYESIAN NETWORKS
"... Bayesian networks are among most popular model classes for discrete vectorvalued i.i.d data. Currently the most popular model selection criterion for Bayesian networks follows Bayesian paradigm. However, this method has ..."
Abstract
Bayesian networks are among the most popular model classes for discrete vector-valued i.i.d. data. Currently, the most popular model selection criterion for Bayesian networks follows the Bayesian paradigm. However, this method has …
Learning Extended Tree Augmented Naive Structures
"... This work proposes an extended version of the wellknown treeaugmented naive Bayes (TAN) classifier where the structure learning step is performed without requiring features to be connected to the class. Based on a modification of Edmonds ’ algorithm, our structure learning procedure explores a sup ..."
Abstract
This work proposes an extended version of the well-known tree-augmented naive Bayes (TAN) classifier, where the structure learning step is performed without requiring features to be connected to the class. Based on a modification of Edmonds' algorithm, our structure learning procedure explores a superset of the structures that are considered by TAN, yet achieves global optimality of the learning score function in a very efficient way (quadratic in the number of features, the same complexity as learning TANs). We enhance our procedure with a new score function that only takes into account arcs that are relevant to predict the class, as well as an optimization over the equivalent sample size during learning. These ideas may be useful for structure learning of Bayesian networks in general. A range of experiments shows that we obtain models with better prediction accuracy than naive Bayes and TAN, and comparable to the accuracy of the state-of-the-art classifier averaged one-dependence estimator (AODE). We release our implementation of ETAN so that it can be easily installed and run within Weka.
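The baseline TAN step that this work generalizes builds a maximum-weight spanning tree over the features, weighted by conditional mutual information I(X_i; X_j | C). A minimal sketch of that baseline follows; the toy dataset and the use of Prim's algorithm are my illustrative assumptions (ETAN itself replaces this with a modified Edmonds' algorithm over directed structures, which is more involved):

```python
import math
from collections import Counter

# Tiny made-up dataset: last column is the class C, the rest are features.
data = [
    (0, 0, 0, 0), (0, 0, 1, 0), (1, 1, 0, 1), (1, 1, 1, 1),
    (0, 1, 0, 0), (1, 0, 1, 1), (1, 1, 1, 1), (0, 0, 0, 0),
]
n_features = 3

def cond_mutual_info(i, j):
    """Empirical conditional mutual information I(X_i; X_j | C)."""
    n = len(data)
    n_xyc = Counter((r[i], r[j], r[-1]) for r in data)
    n_xc = Counter((r[i], r[-1]) for r in data)
    n_yc = Counter((r[j], r[-1]) for r in data)
    n_c = Counter(r[-1] for r in data)
    mi = 0.0
    for (x, y, c), count in n_xyc.items():
        # (count/n) * log[ p(x,y|c) / (p(x|c) p(y|c)) ] in count form
        mi += (count / n) * math.log(
            (count * n_c[c]) / (n_xc[(x, c)] * n_yc[(y, c)])
        )
    return mi

# Prim's algorithm for a maximum-weight spanning tree over the features.
in_tree = {0}
edges = []
while len(in_tree) < n_features:
    u, v = max(
        ((a, b) for a in in_tree for b in range(n_features) if b not in in_tree),
        key=lambda e: cond_mutual_info(*e),
    )
    edges.append((u, v))  # in the TAN structure, v gets u as its extra parent
    in_tree.add(v)
print(edges)
```

Directing each spanning-tree edge away from an arbitrary root, and adding the class as a parent of every feature, yields the classic TAN structure; ETAN's contribution is to drop the requirement that every feature be connected to the class.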
CALCULATING THE NML DISTRIBUTION FOR TREE-STRUCTURED BAYESIAN NETWORKS
"... We are interested in model class selection. We want to compute a criterion which, given two competing model classes, chooses the better one. When learning Bayesian network structures from sample data, an important issue is how to evaluate the goodness of alternative network structures. Perhaps the m ..."
Abstract
We are interested in model class selection: we want to compute a criterion which, given two competing model classes, chooses the better one. When learning Bayesian network structures from sample data, an important issue is how to evaluate the goodness of alternative network structures. Perhaps the most commonly used model (class) selection criterion is the marginal likelihood, which is obtained by integrating over a prior distribution for the model parameters. However, the problem of determining a reasonable prior for the parameters is a highly controversial issue, and no completely satisfying Bayesian solution has yet been presented in the non-informative setting [1]. The normalized maximum likelihood (NML), based on Rissanen's information-theoretic Minimum Description Length (MDL) methodology [2, …
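What makes the NML distribution hard to compute is its normalizer, the parametric complexity: a sum of maximized likelihoods over all possible datasets. For a single Bernoulli variable it can be enumerated directly; the sketch below (function name and the toy n = 5 are my choices) also checks that the resulting NML probabilities sum to one:

```python
from math import comb

def bernoulli_nml_complexity(n):
    """Parametric complexity C(n) of the Bernoulli model by direct enumeration:
    C(n) = sum_k  C(n, k) * (k/n)^k * ((n-k)/n)^(n-k),  with 0^0 = 1.
    The NML probability of a length-n sequence with k ones is then
    (k/n)^k * ((n-k)/n)^(n-k) / C(n)."""
    total = 0.0
    for k in range(n + 1):
        # Maximized likelihood of any single sequence containing k ones.
        max_lik = (k / n) ** k * ((n - k) / n) ** (n - k)
        total += comb(n, k) * max_lik  # comb(n, k) sequences share that likelihood
    return total

n = 5
C = bernoulli_nml_complexity(n)
total_prob = sum(
    comb(n, k) * (k / n) ** k * ((n - k) / n) ** (n - k) / C
    for k in range(n + 1)
)
print(C, total_prob)
```

This brute-force sum is exponential in the number of variables for general networks, which is why dedicated algorithms for tree-structured networks, as in this paper, are needed.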