Results 21 – 30 of 54
Embedded Bayesian Network Classifiers
1997
Abstract

Cited by 8 (1 self)
Low-dimensional probability models for local distribution functions in a Bayesian network include decision trees, decision graphs, and causal independence models. We describe a new probability model for discrete Bayesian networks, which we call an embedded Bayesian network classifier or EBNC. The model for a node Y given parents X is obtained from a (usually different) Bayesian network for Y and X in which X need not be the parents of Y. We show that an EBNC is a special case of a softmax polynomial regression model. Also, we show how to identify a non-redundant set of parameters for an EBNC, and describe an asymptotic approximation for learning the structure of Bayesian networks that contain EBNCs. Unlike for the decision tree, decision graph, and causal independence models, we are unaware of a semantic justification for the use of EBNCs. Experiments are needed to determine whether the models presented in this paper are useful in practice. Keywords: Bayesian networks, model dimen...
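For context, the softmax regression form that the abstract refers to can be written as follows; here the functions f_y are polynomials in the parent values x, and the particular polynomials induced by an EBNC depend on the structure of the embedded network, which is not specified in this excerpt:

```latex
\[
p(Y = y \mid X = x) \;=\; \frac{\exp f_y(x)}{\sum_{y'} \exp f_{y'}(x)}
\]
```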
Factorized Normalized Maximum Likelihood Criterion for Learning Bayesian Network Structures
2008
Abstract

Cited by 6 (4 self)
This paper introduces a new scoring criterion, factorized normalized maximum likelihood, for learning Bayesian network structures. The proposed scoring criterion requires no parameter tuning, and it is decomposable and asymptotically consistent. We compare the new scoring criterion to other scoring criteria and describe its practical implementation. Empirical tests confirm its good performance.
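As background for this kind of criterion, here is a minimal Python sketch of the normalized maximum likelihood computation for a single multinomial family, assuming the standard definition of the multinomial regret C(K, n) and a known linear-time recurrence C(K+2, n) = C(K+1, n) + (n/K)·C(K, n); the function names are illustrative, not taken from the paper:

```python
from math import comb, log

def multinomial_regret(K, n):
    """Parametric complexity (regret) C(K, n) of a K-state multinomial
    over n observations, via C(K+2, n) = C(K+1, n) + (n/K) * C(K, n)."""
    if n == 0:
        return 1.0
    c_prev = 1.0  # C(1, n) = 1
    # C(2, n) by direct summation over the binomial counts
    # (note that 0**0 evaluates to 1 in Python, as the formula requires)
    c_curr = sum(comb(n, k) * (k / n) ** k * ((n - k) / n) ** (n - k)
                 for k in range(n + 1))
    if K == 1:
        return c_prev
    for j in range(1, K - 1):
        c_prev, c_curr = c_curr, c_curr + (n / j) * c_prev
    return c_curr

def fnml_family_score(counts):
    """Log score of one (node, parent-configuration) family:
    maximized log-likelihood minus the log regret of that family."""
    n = sum(counts)
    loglik = sum(c * log(c / n) for c in counts if c > 0)
    return loglik - log(multinomial_regret(len(counts), n))
```

Summing such family scores over all (node, parent-configuration) pairs gives a network score, which is what makes a factorized criterion decomposable.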
A theoretical study of Y structures for causal discovery
 Proceedings of the Conference on Uncertainty in Artificial Intelligence
2006
Abstract

Cited by 5 (3 self)
Causal discovery from observational data in the presence of unobserved variables is challenging. Identification of so-called Y substructures is a sufficient condition for ascertaining some causal relations in the large sample limit, without the assumption of no hidden common causes. An example of a Y substructure is A → C, B → C, C → D. This paper describes the first asymptotically reliable and computationally feasible score-based search for discrete Y structures that does not assume that there are no unobserved common causes. The search applies to any parameterization of directed acyclic graphs (DAGs) whose scores have the property that any DAG that can represent the distribution beats any DAG that cannot, and that, of two DAGs that both represent the distribution, the one with fewer parameters wins. In this framework there is no need to assign scores to causal structures with unobserved common causes. The paper also describes how the existence of a Y structure shows the presence of an unconfounded causal relation, without assuming that there are no hidden common causes.
Generalizing The Derivation Of The Schwarz Information Criterion
1999
Abstract

Cited by 4 (1 self)
The Schwarz information criterion (SIC, BIC, SBC) is one of the most widely known and used tools in statistical model selection. The criterion was derived by Schwarz (1978) to serve as an asymptotic approximation to a transformation of the Bayesian posterior probability of a candidate model. Although the original derivation assumes that the observed data is independent, identically distributed, and arising from a probability distribution in the regular exponential family, SIC has traditionally been used in a much larger scope of model selection problems. To better justify the widespread applicability of SIC, we derive the criterion in a very general framework: one which does not assume any specific form for the likelihood function, but only requires that it satisfies certain nonrestrictive regularity conditions.
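For reference, the criterion in question is the familiar approximation, with \(\hat{\theta}\) the maximum-likelihood estimate, \(d\) the number of free parameters, and \(n\) the sample size:

```latex
\[
\mathrm{SIC} \;=\; \log L(\hat{\theta}) \;-\; \frac{d}{2}\,\log n
\]
```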
Learning the Bayesian Network Structure: Dirichlet Prior versus Data
Abstract

Cited by 4 (0 self)
In the Bayesian approach to structure learning of graphical models, the equivalent sample size (ESS) in the Dirichlet prior over the model parameters was recently shown to have an important effect on the maximum a posteriori estimate of the Bayesian network structure. In our first contribution, we theoretically analyze the case of large ESS values, which complements previous work: among other results, we find that the presence of an edge in a Bayesian network is favoured over its absence even if both the Dirichlet prior and the data imply independence, as long as the conditional empirical distribution is notably different from uniform. In our second contribution, we focus on realistic ESS values, and provide an analytical approximation to the 'optimal' ESS value in a predictive sense (its accuracy is also validated experimentally): this approximation provides an understanding as to which properties of the data have the main effect in determining the 'optimal' ESS value.
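To make the role of the ESS concrete, here is a minimal Python sketch of the standard BDeu local score, in which the ESS enters only through Dirichlet pseudo-counts spread uniformly over the parameter table; the function name and interface are illustrative, not from the paper:

```python
from math import lgamma

def bdeu_family_score(counts, ess):
    """Log BDeu score for one node given one choice of parents.
    counts[j][k] = number of cases with parent configuration j and
    node state k; ess is the equivalent sample size of the prior."""
    q = len(counts)       # number of parent configurations
    r = len(counts[0])    # number of node states
    a_jk = ess / (q * r)  # per-cell Dirichlet pseudo-count
    a_j = ess / q         # per-row pseudo-count (sum over states)
    score = 0.0
    for row in counts:
        n_j = sum(row)
        score += lgamma(a_j) - lgamma(a_j + n_j)
        score += sum(lgamma(a_jk + n_jk) - lgamma(a_jk) for n_jk in row)
    return score
```

Recomputing such a score for the same node with and without a candidate parent, at several ESS values, illustrates the effect the abstract analyzes: as the ESS grows, the pseudo-counts increasingly dominate the data in the comparison between structures.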
Consistent Estimation of the Basic Neighborhood of Markov Random Fields
2006
Abstract

Cited by 4 (0 self)
For Markov random fields on Z^d with finite state space, we address the statistical estimation of the basic neighborhood, the smallest region that determines the conditional distribution at a site on the condition that the values at all other sites are given. A modification of the Bayesian Information Criterion, replacing likelihood by pseudo-likelihood, is proved to provide strongly consistent estimation from observing a realization of the field on increasing finite regions: the estimated basic neighborhood equals the true one eventually almost surely, without assuming any prior bound on the size of the latter. Stationarity of the Markov field is not required, and phase transition does not affect the results.
Factorized Normalized Maximum Likelihood Criterion for Learning Bayesian Network Structures
Abstract

Cited by 3 (2 self)
This paper introduces a new scoring criterion, factorized normalized maximum likelihood, for learning Bayesian network structures. The proposed scoring criterion requires no parameter tuning, and it is decomposable and asymptotically consistent. We compare the new scoring criterion to other scoring criteria and describe its practical implementation. Empirical tests confirm its good performance.
On the Number of Samples Needed to Learn the Correct Structure of a Bayesian Network
Abstract

Cited by 3 (0 self)
Bayesian Networks (BNs) are useful tools giving a natural and compact representation of joint probability distributions. In many applications one needs to learn a Bayesian Network (BN) from data. In this context, it is important to understand the number of samples needed in order to guarantee successful learning. Previous works have studied the sample complexity of BNs, yet they mainly focused on the requirement that the learned distribution be close to the original distribution which generated the data. In this work, we study a different aspect of the learning task, namely the number of samples needed in order to learn the correct structure of the network. We give both asymptotic results (lower and upper bounds) on the probability of learning a wrong structure, valid in the large sample limit, and experimental results demonstrating the learning behavior for feasible sample sizes.
Learning Locally Minimax Optimal Bayesian Networks
Abstract

Cited by 3 (1 self)
We consider the problem of learning Bayesian network models in a non-informative setting, where the only available information is a set of observational data, and no background knowledge is available. The problem can be divided into two different subtasks: learning the structure of the network (a set of independence relations), and learning the parameters of the model (which fix the probability distribution within the set of all distributions consistent with the chosen structure). There are not many theoretical frameworks that consistently handle both of these problems together, the Bayesian framework being an exception. In this paper we propose an alternative, information-theoretic framework which sidesteps some of the technical problems facing the Bayesian approach. The framework is based on the minimax-optimal Normalized Maximum Likelihood (NML) distribution, which is motivated by the Minimum Description Length (MDL) principle. The resulting model selection criterion is consistent, and it provides a way to construct highly predictive Bayesian network models. Our empirical tests show that the proposed method compares favorably with alternative approaches in both model selection and prediction tasks.