On the Number of Samples Needed to Learn the Correct Structure of a Bayesian Network
Cited by 5 (0 self)
Bayesian Networks (BNs) are useful tools giving a natural and compact representation of joint probability distributions. In many applications one needs to learn a BN from data. In this context, it is important to understand the number of samples needed to guarantee successful learning. Previous work has studied the sample complexity of BNs, but it has mainly focused on the requirement that the learned distribution be close to the original distribution that generated the data. In this work, we study a different aspect of the learning task, namely the number of samples needed to learn the correct structure of the network. We give both asymptotic results (lower and upper bounds) on the probability of learning a wrong structure, valid in the large-sample limit, and experimental results demonstrating the learning behavior for feasible sample sizes.
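To make "the probability of learning a wrong structure" concrete, here is a minimal Monte Carlo sketch. It assumes a two-node binary network whose true structure is X → Y and uses BIC as the structure score; the abstract does not say which learner is analyzed, so BIC is an assumption. Because X → Y and Y → X are Markov equivalent, only edge versus no edge is compared. The function names and the parameter `p_copy` (probability that Y copies X) are hypothetical.

```python
import math
import random


def loglik_counts(counts):
    """Maximized multinomial log-likelihood: sum of n_k * log(n_k / n) over nonzero counts."""
    n = sum(counts)
    return sum(c * math.log(c / n) for c in counts if c > 0)


def bic_scores(data):
    """Return (score of empty graph, score of X -> Y) for binary pairs (x, y).

    Scores are log-likelihood minus the BIC penalty, so higher is better.
    """
    n = len(data)
    x_counts, y_counts = [0, 0], [0, 0]
    xy = [[0, 0], [0, 0]]  # xy[x][y] joint counts
    for x, y in data:
        x_counts[x] += 1
        y_counts[y] += 1
        xy[x][y] += 1
    penalty = 0.5 * math.log(n)  # BIC penalty per free parameter
    # Empty graph: P(X) and P(Y), 2 free parameters.
    empty = loglik_counts(x_counts) + loglik_counts(y_counts) - 2 * penalty
    # X -> Y: P(X), P(Y | X = 0) and P(Y | X = 1), 3 free parameters.
    edge = loglik_counts(x_counts) + sum(loglik_counts(row) for row in xy) - 3 * penalty
    return empty, edge


def wrong_structure_rate(n, p_copy=0.8, trials=500, seed=0):
    """Fraction of simulated data sets of size n on which BIC fails to pick the true edge."""
    rng = random.Random(seed)
    wrong = 0
    for _ in range(trials):
        data = []
        for _ in range(n):
            x = rng.randint(0, 1)
            # Y copies X with probability p_copy, otherwise flips it.
            y = x if rng.random() < p_copy else 1 - x
            data.append((x, y))
        empty, edge = bic_scores(data)
        if empty >= edge:  # ties are counted as errors
            wrong += 1
    return wrong / trials


for n in (10, 50, 200):
    print(n, wrong_structure_rate(n))
```

The printed error rate should fall as n grows, which is the qualitative behavior that large-sample bounds describe; the exact values depend on the seed and on `p_copy`.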
A theoretical study of Y structures for causal discovery
Proceedings of the Conference on Uncertainty in Artificial Intelligence, 2006
Cited by 4 (2 self)
Causal discovery from observational data in the presence of unobserved variables is challenging. Identifying so-called Y substructures is a sufficient condition for ascertaining some causal relations in the large-sample limit, without assuming that there are no hidden common causes. An example of a Y substructure is A → C, B → C, C → D. This paper describes the first asymptotically reliable and computationally feasible score-based search for discrete Y structures that does not assume that there are no unobserved common causes. The search applies to any parameterization of a directed acyclic graph (DAG) whose scores have two properties: any DAG that can represent the distribution beats any DAG that cannot, and of two DAGs that both represent the distribution, the one with fewer parameters wins. In this framework there is no need to assign scores to causal structures with unobserved common causes. The paper also describes how the existence of a Y structure shows the presence of an unconfounded causal relation, without assuming that there are no hidden common causes.
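The independence pattern that makes a Y structure informative can be checked mechanically. The sketch below is a hypothetical helper, not the paper's score-based search: it tests d-separation in a DAG using the moralized ancestral graph criterion and applies it to the example A → C, B → C, C → D. In this DAG, A and B are marginally independent but dependent given the collider C, while C separates both A and B from D.

```python
from collections import deque


def ancestors(parents, nodes):
    """Return `nodes` together with all of their ancestors in the DAG."""
    result = set(nodes)
    stack = list(nodes)
    while stack:
        v = stack.pop()
        for p in parents.get(v, ()):
            if p not in result:
                result.add(p)
                stack.append(p)
    return result


def d_separated(parents, x, y, z):
    """True if x and y are d-separated given the set z.

    Moralized ancestral graph criterion: keep only ancestors of {x, y} and z,
    connect ("marry") parents of a common child, drop edge directions,
    delete z, and check whether x can still reach y.
    """
    keep = ancestors(parents, {x, y} | set(z))
    adj = {v: set() for v in keep}
    for child in keep:
        pas = list(parents.get(child, ()))  # all parents lie in `keep` (ancestral set)
        for p in pas:
            adj[child].add(p)
            adj[p].add(child)
        for i, p in enumerate(pas):
            for q in pas[i + 1:]:
                adj[p].add(q)
                adj[q].add(p)
    blocked = set(z)
    seen = {x}
    queue = deque([x])
    while queue:
        v = queue.popleft()
        if v == y:
            return False
        for w in adj[v]:
            if w not in seen and w not in blocked:
                seen.add(w)
                queue.append(w)
    return True


# The Y structure A -> C, B -> C, C -> D, given as child -> list of parents.
Y = {"C": ["A", "B"], "D": ["C"]}
print(d_separated(Y, "A", "B", []))     # True: A and B marginally independent
print(d_separated(Y, "A", "B", ["C"]))  # False: conditioning on the collider C
print(d_separated(Y, "A", "D", ["C"]))  # True: C separates A from D
```

Nodes without parents can simply be left out of the mapping.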
Factorized Normalized Maximum Likelihood Criterion for Learning Bayesian Network Structures
Cited by 3 (2 self)
This paper introduces a new scoring criterion, factorized normalized maximum likelihood, for learning Bayesian network structures. The proposed scoring criterion requires no parameter tuning, and it is decomposable and asymptotically consistent. We compare the new scoring criterion to other scoring criteria and describe its practical implementation. Empirical tests confirm its good performance.
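Decomposability means the network score is a sum of one local score per node. Below is a minimal sketch of such a local score in the spirit of fNML, assuming the definition as we understand it: for each observed parent configuration j, add the maximized multinomial log-likelihood of the child's values and subtract log C(N_j, r), where C is the normalizer of the multinomial NML distribution for N_j samples over r child states. C is computed with a known linear-time recurrence. The function names and the row-of-dicts data format are hypothetical.

```python
import math
from collections import Counter


def multinomial_regret(n, k):
    """Normalizer C(n, k) of the multinomial NML distribution for n samples over k categories.

    C(n, 2) is summed directly; larger k use the recurrence
    C(n, k + 2) = C(n, k + 1) + (n / k) * C(n, k).
    """
    if n == 0 or k == 1:
        return 1.0
    c2 = 0.0
    for h in range(n + 1):
        # log of binom(n, h) * (h/n)^h * ((n-h)/n)^(n-h), computed in log space.
        log_term = math.lgamma(n + 1) - math.lgamma(h + 1) - math.lgamma(n - h + 1)
        if h > 0:
            log_term += h * math.log(h / n)
        if h < n:
            log_term += (n - h) * math.log((n - h) / n)
        c2 += math.exp(log_term)
    prev, cur = 1.0, c2  # C(n, 1), C(n, 2)
    for j in range(1, k - 1):
        prev, cur = cur, cur + (n / j) * prev  # cur becomes C(n, j + 2)
    return cur


def fnml_local_score(child, parents, rows, r_child):
    """fNML-style local score of `child` given `parents` (log scale, higher is better).

    rows: list of dicts mapping variable name -> discrete state.
    r_child: number of possible states of the child.
    """
    # Joint counts N_jk of (parent configuration j, child state k).
    counts = Counter((tuple(row[p] for p in parents), row[child]) for row in rows)
    n_j = Counter()
    for (j, _), c in counts.items():
        n_j[j] += c
    # Maximized log-likelihood of the child column within each parent configuration.
    score = sum(c * math.log(c / n_j[j]) for (j, _), c in counts.items())
    # Subtract the log NML normalizer once per observed parent configuration.
    score -= sum(math.log(multinomial_regret(n, r_child)) for n in n_j.values())
    return score


rows = [{"X": x, "Y": x} for x in [0, 1] * 10]
print(fnml_local_score("Y", ["X"], rows, 2), fnml_local_score("Y", [], rows, 2))
```

Unobserved parent configurations have N_j = 0 and C(0, r) = 1, so skipping them leaves the score unchanged.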
Generalizing The Derivation Of The Schwarz Information Criterion
1999
Cited by 3 (1 self)
The Schwarz information criterion (SIC, BIC, SBC) is one of the most widely known and used tools in statistical model selection. The criterion was derived by Schwarz (1978) to serve as an asymptotic approximation to a transformation of the Bayesian posterior probability of a candidate model. Although the original derivation assumes that the observed data are independent and identically distributed and arise from a probability distribution in the regular exponential family, SIC has traditionally been used in a much wider range of model selection problems. To better justify the widespread applicability of SIC, we derive the criterion in a very general framework, one that does not assume any specific form for the likelihood function but only requires that it satisfy certain non-restrictive regularity conditions.
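For reference, the textbook form of the criterion is $\mathrm{SIC} = -2\log L(\hat\theta) + k\log n$, where $L(\hat\theta)$ is the maximized likelihood, $k$ the number of free parameters and $n$ the sample size; smaller values are preferred. The sketch below applies it to a toy choice between a normal model with mean fixed at 0 and one with a free mean, assuming i.i.d. data (the classical setting, not the general framework derived in the paper). Function names and data are illustrative.

```python
import math


def gaussian_max_loglik(xs, mean):
    """Normal log-likelihood at the given mean, with the variance set to its MLE."""
    n = len(xs)
    var = sum((x - mean) ** 2 for x in xs) / n
    return -0.5 * n * (math.log(2 * math.pi * var) + 1)


def sic(max_loglik, k, n):
    """Schwarz information criterion -2 log L + k log n (smaller is better)."""
    return -2.0 * max_loglik + k * math.log(n)


xs = [0.9, 1.1, 1.3, 0.7, 1.0, 1.2, 0.8, 1.0]
n = len(xs)
sic_fixed = sic(gaussian_max_loglik(xs, 0.0), 1, n)         # mean fixed at 0; variance free
sic_free = sic(gaussian_max_loglik(xs, sum(xs) / n), 2, n)  # mean and variance free
print(sic_fixed, sic_free)
```

The extra parameter of the free-mean model costs $\log n$ in the criterion, which here is far outweighed by the gain in log-likelihood.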
On the Selection of Markov Random Field Texture Models
1992
Cited by 2 (0 self)
The problem of selecting pair-potentials of finite range for Gibbs random fields is considered as an important step in modelling multi-textured images. In a decision-theoretic setup, the Bayesian procedure is approximated using Laplace's method for the asymptotic expansion of integrals. Certain frequentist properties of the selection procedure are investigated; in particular, its consistency is established regardless of whether the Gibbs random field exhibits a phase transition.
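Laplace's method approximates an integral $\int e^{h(t)}\,dt$ by expanding $h$ to second order around its maximizer $\hat t$, giving $e^{h(\hat t)}\sqrt{2\pi / (-h''(\hat t))}$. As a minimal one-parameter illustration (not the Gibbs random field setting of the paper), the sketch below compares the exact log marginal likelihood of k successes in n Bernoulli trials under a uniform prior with its Laplace approximation. The function names are hypothetical.

```python
import math


def exact_log_marginal(k, n):
    """log of the integral of t^k (1 - t)^(n - k) over [0, 1], i.e. log B(k + 1, n - k + 1)."""
    return math.lgamma(k + 1) + math.lgamma(n - k + 1) - math.lgamma(n + 2)


def laplace_log_marginal(k, n):
    """Laplace approximation to the same integral; assumes 0 < k < n."""
    t_hat = k / n  # maximizer of h(t) = k log t + (n - k) log(1 - t)
    h_max = k * math.log(t_hat) + (n - k) * math.log(1 - t_hat)
    neg_h2 = n / (t_hat * (1 - t_hat))  # -h''(t_hat)
    return h_max + 0.5 * math.log(2 * math.pi / neg_h2)


for n in (10, 100, 1000):
    k = 3 * n // 10
    print(n, exact_log_marginal(k, n) - laplace_log_marginal(k, n))
```

For a fixed success fraction the printed error in the log shrinks roughly like 1/n, which is the sense in which the approximation is asymptotic.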
Identification and likelihood inference for recursive linear models with correlated errors
2007
Cited by 2 (0 self)
In recursive linear models, the multivariate normal joint distribution of all variables exhibits a dependence structure induced by recursive systems of linear structural equations. Such models appear in particular in seemingly unrelated regressions, structural equation modelling, simultaneous equation systems, and Gaussian graphical modelling. We show that recursive linear models that are ‘bow-free’ are well-behaved statistical models, namely, they are everywhere identifiable and form curved exponential families. Here, ‘bow-free’ refers to models satisfying the condition that if a variable x occurs in the structural equation for y, then the errors for x and y are uncorrelated. For the computation of maximum likelihood estimates in ‘bow-free’ recursive linear models, we introduce the Residual Iterative Conditional Fitting (RICF) algorithm. Compared to existing algorithms, RICF is easy to implement, requiring only least-squares computations; it has clear convergence properties and finds parameter estimates in closed form whenever possible.
KEY WORDS: Linear structural equation model; curved exponential family; maximum likelihood estimation; residual iterative conditional fitting; bow-free acyclic path diagrams; BAP.
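The bow-free condition is easy to check from a path diagram. The sketch below assumes a hypothetical encoding: a directed edge (x, y) means x appears in the structural equation for y, and a bidirected edge (x, y) means the errors of x and y are correlated. A model is treated as a bow-free acyclic path diagram (BAP) if its directed part is acyclic (a recursive system) and no pair of variables is joined by both kinds of edge. RICF itself is not implemented here.

```python
def is_acyclic(nodes, directed):
    """Kahn's algorithm: True if the directed edges form a DAG, i.e. a recursive system."""
    indegree = {v: 0 for v in nodes}
    children = {v: [] for v in nodes}
    for x, y in directed:
        children[x].append(y)
        indegree[y] += 1
    ready = [v for v in nodes if indegree[v] == 0]
    visited = 0
    while ready:
        v = ready.pop()
        visited += 1
        for w in children[v]:
            indegree[w] -= 1
            if indegree[w] == 0:
                ready.append(w)
    return visited == len(indegree)


def is_bow_free(directed, bidirected):
    """True if no pair of variables is joined by both x -> y and x <-> y (a 'bow')."""
    correlated = {frozenset(e) for e in bidirected}
    return all(frozenset(e) not in correlated for e in directed)


def is_bap(nodes, directed, bidirected):
    """Bow-free acyclic path diagram: acyclic directed part and no bows."""
    return is_acyclic(nodes, directed) and is_bow_free(directed, bidirected)


nodes = ["x1", "x2", "x3"]
directed = [("x1", "x2"), ("x2", "x3")]
print(is_bap(nodes, directed, [("x1", "x3")]))  # True: errors of x1 and x3 may correlate
print(is_bap(nodes, directed, [("x1", "x2")]))  # False: bow between x1 and x2
```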
Learning the Bayesian Network Structure: Dirichlet Prior versus Data
Cited by 2 (0 self)
In the Bayesian approach to structure learning of graphical models, the equivalent sample size (ESS) in the Dirichlet prior over the model parameters was recently shown to have an important effect on the maximum-a-posteriori estimate of the Bayesian network structure. In our first contribution, we theoretically analyze the case of large ESS values, which complements previous work: among other results, we find that the presence of an edge in a Bayesian network is favoured over its absence even if both the Dirichlet prior and the data imply independence, as long as the conditional empirical distribution is notably different from uniform. In our second contribution, we focus on realistic ESS values and provide an analytical approximation to the ‘optimal’ ESS value in a predictive sense (its accuracy is also validated experimentally); this approximation shows which properties of the data mainly determine the ‘optimal’ ESS value.
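One common way the ESS enters a structure score is the BDeu marginal likelihood, assumed here as the standard instance of a Dirichlet prior governed by a single ESS $\alpha$: each of the $q$ parent configurations gets prior weight $\alpha/q$, split evenly over the $r$ child states. The sketch below computes the BDeu local score so that the effect of $\alpha$ on an edge-versus-no-edge comparison can be explored. The function name and data format are hypothetical, and the paper's analytical approximation to the optimal ESS is not implemented.

```python
import math
from collections import Counter


def bdeu_local_score(child, parents, rows, r_child, q_parents, ess):
    """Log BDeu marginal likelihood of `child` given `parents` (higher is better).

    rows: list of dicts mapping variable name -> discrete state.
    r_child: number of child states; q_parents: number of parent configurations.
    ess: equivalent sample size alpha of the Dirichlet prior.
    """
    a_j = ess / q_parents               # prior weight per parent configuration
    a_jk = ess / (q_parents * r_child)  # prior weight per (configuration, state) cell
    counts = Counter((tuple(row[p] for p in parents), row[child]) for row in rows)
    n_j = Counter()
    for (j, _), c in counts.items():
        n_j[j] += c
    # Configurations and cells with zero counts contribute exactly 0, so they are skipped.
    score = sum(math.lgamma(a_j) - math.lgamma(a_j + n) for n in n_j.values())
    score += sum(math.lgamma(a_jk + c) - math.lgamma(a_jk) for c in counts.values())
    return score


# Toy data: Y is skewed towards 0 and empirically independent of X.
rows = [{"X": x, "Y": y} for x in (0, 1) for y in [0] * 18 + [1] * 2]
for ess in (0.1, 1.0, 10.0, 100.0):
    with_edge = bdeu_local_score("Y", ["X"], rows, 2, 2, ess)
    without = bdeu_local_score("Y", [], rows, 2, 1, ess)
    print(ess, with_edge - without)  # > 0 means the edge X -> Y is preferred
```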
Learning Locally Minimax Optimal Bayesian Networks
Cited by 2 (1 self)
We consider the problem of learning Bayesian network models in a non-informative setting, where the only available information is a set of observational data and no background knowledge is available. The problem can be divided into two subtasks: learning the structure of the network (a set of independence relations), and learning the parameters of the model (which select one probability distribution from the set of all distributions consistent with the chosen structure). Few theoretical frameworks handle both of these problems consistently together; the Bayesian framework is an exception. In this paper we propose an alternative, information-theoretic framework that sidesteps some of the technical problems facing the Bayesian approach. The framework is based on the minimax-optimal Normalized Maximum Likelihood (NML) distribution, which is motivated by the Minimum Description Length (MDL) principle. The resulting model selection criterion is consistent, and it provides a way to construct highly predictive Bayesian network models. Our empirical tests show that the proposed method compares favorably with alternative approaches in both model selection and prediction tasks.
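The NML distribution divides the maximized likelihood of the observed data by the sum of maximized likelihoods over all data sets of the same size. Every data set then has the same regret (the log of that sum), which is what makes NML minimax optimal. A brute-force sketch for a single Bernoulli variable shows the definition directly; it is a toy case, not the Bayesian-network criterion of the paper, and it enumerates all 2^n sequences, so it is only usable for tiny n. Function names are hypothetical.

```python
import itertools
import math


def ml_prob(seq):
    """Maximized Bernoulli likelihood P(seq | theta_hat) with theta_hat = ones / n."""
    n = len(seq)
    ones = sum(seq)
    p = ones / n
    # 0.0 ** 0 == 1.0 in Python, so all-zero and all-one sequences get probability 1.
    return p ** ones * (1 - p) ** (n - ones)


def nml_normalizer(n):
    """Sum of maximized likelihoods over all binary sequences of length n."""
    return sum(ml_prob(s) for s in itertools.product((0, 1), repeat=n))


def nml_prob(seq):
    """NML probability: maximized likelihood divided by the normalizer."""
    return ml_prob(seq) / nml_normalizer(len(seq))


seq = (0, 1, 1, 1)
# The regret log(ml_prob / nml_prob) equals log(nml_normalizer(4)) for every sequence.
print(nml_prob(seq), math.log(ml_prob(seq) / nml_prob(seq)))
```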
Bayesian-Motivated Tests of Function Fit and their Asymptotic Frequentist Properties
The Annals of Statistics, 2004
Cited by 1 (0 self)
We propose and analyze nonparametric tests of the null hypothesis that a function belongs to a specified parametric family. The tests are based on BIC approximations, $\pi_{\mathrm{BIC}}$, to the posterior probability of the null model, and may be carried out in either Bayesian or frequentist fashion. We obtain results on the asymptotic distribution of $\pi_{\mathrm{BIC}}$ under both the null hypothesis and local alternatives. One version of $\pi_{\mathrm{BIC}}$, call it $\pi^*_{\mathrm{BIC}}$, uses a class of models that are orthogonal to each other and grow in number without bound as the sample size $n$ tends to infinity. We show that $\sqrt{n}\,(1 - \pi^*_{\mathrm{BIC}})$ converges in distribution to a stable law under the null hypothesis. We also show that $\pi^*_{\mathrm{BIC}}$ can detect local alternatives converging to the null at the rate $\sqrt{\log n / n}$. A particularly interesting finding is that the power of the $\pi^*_{\mathrm{BIC}}$-based test is asymptotically equal to that of a test based on the maximum of alternative log-likelihoods. Simulation results and an example involving variable star data illustrate desirable features of the proposed tests.
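With equal prior weights on a finite set of candidate models, the BIC approximation to the posterior probability of model $k$ is proportional to $\exp(-\mathrm{BIC}_k/2)$, where $\mathrm{BIC}_k = -2\log L_k + p_k\log n$. The sketch below implements only this generic conversion, taking index 0 as the null model so that the first returned probability plays the role of $\pi_{\mathrm{BIC}}$; the growing family of orthogonal models behind $\pi^*_{\mathrm{BIC}}$ is not modeled. Names are hypothetical.

```python
import math


def bic_posterior_probs(max_logliks, n_params, n):
    """Approximate posterior model probabilities from BIC, assuming equal prior weights.

    BIC_k = -2 * loglik_k + p_k * log(n); weight_k is proportional to exp(-BIC_k / 2).
    """
    bics = [-2.0 * ll + p * math.log(n) for ll, p in zip(max_logliks, n_params)]
    best = min(bics)  # subtract the minimum for numerical stability
    weights = [math.exp(-(b - best) / 2.0) for b in bics]
    total = sum(weights)
    return [w / total for w in weights]


# Null model (1 parameter) versus one alternative (2 parameters), n = 100.
probs = bic_posterior_probs([-100.0, -99.0], [1, 2], 100)
print(probs[0])  # approximate posterior probability of the null model
```

Subtracting the smallest BIC before exponentiating leaves the normalized probabilities unchanged but avoids underflow when log-likelihoods are large.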

