Results 21 - 30 of 55
Quantifier elimination for statistical problems
 In Proceedings of the Fifteenth Conference on Uncertainty in Artificial Intelligence (UAI-99)
, 1999
Abstract

Cited by 8 (0 self)
Recent improvements on Tarski's procedure for quantifier elimination in the first-order theory of real numbers make it feasible to solve small instances of the following problems completely automatically:
1. Listing all equality and inequality constraints implied by a graphical model with hidden variables.
2. Comparing graphical models with hidden variables (i.e., model equivalence, inclusion, and overlap).
3. Answering questions about the identification of a model or portion of a model, and about bounds on quantities derived from a model.
4. Determining whether an independence assertion is implied by a given set of independence assertions.
We discuss the foundations of quantifier elimination and demonstrate its application to these problems.
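As a textbook illustration of what quantifier elimination produces (this example is not taken from the paper), eliminating the existential quantifier over x from a statement about a real quadratic leaves a quantifier-free condition on the coefficients b and c alone:

```latex
\exists x \in \mathbb{R}\;\bigl(x^2 + b\,x + c = 0\bigr) \iff b^2 - 4c \ge 0
```

Roughly speaking, the problems listed above are larger instances of the same pattern: the model parameters (including those of hidden variables) play the role of x, and the quantifier-free result is a set of polynomial equalities and inequalities on the observable quantities.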
On the Number of Samples Needed to Learn the Correct Structure of a Bayesian Network
Abstract

Cited by 6 (0 self)
Bayesian Networks (BNs) are useful tools giving a natural and compact representation of joint probability distributions. In many applications one needs to learn a Bayesian Network (BN) from data. In this context, it is important to understand the number of samples needed to guarantee successful learning. Previous work has studied the sample complexity of BNs, yet it mainly focused on the requirement that the learned distribution be close to the original distribution that generated the data. In this work, we study a different aspect of the learning task, namely the number of samples needed to learn the correct structure of the network. We give both asymptotic results (lower and upper bounds) on the probability of learning a wrong structure, valid in the large sample limit, and experimental results demonstrating the learning behavior for feasible sample sizes.
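To make the question concrete, the sketch below estimates by simulation how often a structure score picks the wrong structure for two binary variables at a given sample size. It is a minimal sketch under assumptions not taken from the paper: the score is BIC, the true distribution has a dependence between X and Y given by hypothetical conditional probabilities `p_y_given_x`, and only edge presence versus absence is compared (X → Y and Y → X are Markov equivalent, so data alone cannot separate them). The helper names are illustrative.

```python
import math
import random


def loglik(counts):
    """Maximized multinomial log-likelihood: sum_k n_k * log(n_k / n)."""
    n = sum(counts)
    return sum(c * math.log(c / n) for c in counts if c > 0)


def bic_prefers_edge(pairs):
    """True if BIC prefers X -> Y over 'X and Y independent' for binary (x, y) samples."""
    n = len(pairs)
    ll_y = loglik([sum(1 for _, y in pairs if y == v) for v in (0, 1)])
    ll_y_given_x = sum(
        loglik([sum(1 for x, y in pairs if x == xv and y == yv) for yv in (0, 1)])
        for xv in (0, 1)
    )
    # X's marginal term is identical in both models and cancels;
    # the edge model has one extra free parameter, costing 0.5 * log(n).
    return ll_y_given_x - ll_y - 0.5 * math.log(n) > 0


def wrong_structure_rate(n, p_y_given_x, trials=500, seed=0):
    """Fraction of simulated data sets of size n on which the edge is missed."""
    rng = random.Random(seed)
    errors = 0
    for _ in range(trials):
        pairs = []
        for _ in range(n):
            x = int(rng.random() < 0.5)               # X ~ Bernoulli(0.5)
            y = int(rng.random() < p_y_given_x[x])    # P(Y = 1 | X = x)
            pairs.append((x, y))
        if not bic_prefers_edge(pairs):
            errors += 1
    return errors / trials
```

Calling `wrong_structure_rate(n, (0.4, 0.6))` for increasing `n` traces an empirical error curve of the kind the abstract's experiments study; weaker dependences need more samples before the edge is reliably found.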
Factorized Normalized Maximum Likelihood Criterion for Learning Bayesian Network Structures
, 2008
Abstract

Cited by 6 (4 self)
This paper introduces a new scoring criterion, factorized normalized maximum likelihood, for learning Bayesian network structures. The proposed scoring criterion requires no parameter tuning, and it is decomposable and asymptotically consistent. We compare the new scoring criterion to other scoring criteria and describe its practical implementation. Empirical tests confirm its good performance.
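The sketch below shows how a factorized NML local score can be computed, assuming fNML is defined as a product, over each variable and each observed parent configuration, of the NML distribution for a multinomial. The normalizing sum C_K(n) uses the known linear-time recurrence C_{K+2}(n) = C_{K+1}(n) + (n/K)·C_K(n), starting from C_1(n) = 1 and a direct sum for C_2(n). Function and parameter names (`rows`, `child`, `parents`, `arity`) are hypothetical, and this is not the authors' implementation.

```python
import math
from collections import Counter, defaultdict


def multinomial_regret(k, n):
    """Normalizing sum C_k(n) of the NML distribution for a k-valued multinomial and n samples."""
    if n == 0 or k == 1:
        return 1.0

    def log_term(h):
        # log of binom(n, h) * (h/n)^h * ((n-h)/n)^(n-h), with 0 * log 0 = 0
        t = math.lgamma(n + 1) - math.lgamma(h + 1) - math.lgamma(n - h + 1)
        if h > 0:
            t += h * math.log(h / n)
        if h < n:
            t += (n - h) * math.log((n - h) / n)
        return t

    # C_1(n) = 1; C_2(n) by direct summation (terms computed in log space to avoid overflow)
    c_prev, c_cur = 1.0, sum(math.exp(log_term(h)) for h in range(n + 1))
    for j in range(1, k - 1):
        # recurrence: C_{j+2}(n) = C_{j+1}(n) + (n / j) * C_j(n)
        c_prev, c_cur = c_cur, c_cur + (n / j) * c_prev
    return c_cur


def fnml_local_score(rows, child, parents, arity):
    """fNML local score (in nats) of column `child` given the columns in `parents`."""
    counts = defaultdict(Counter)  # parent configuration -> counts of child values
    for row in rows:
        counts[tuple(row[p] for p in parents)][row[child]] += 1
    score = 0.0
    for child_counts in counts.values():  # unseen configurations contribute 0
        n_j = sum(child_counts.values())
        log_ml = sum(c * math.log(c / n_j) for c in child_counts.values())
        score += log_ml - math.log(multinomial_regret(arity[child], n_j))
    return score
```

Because the score is decomposable, a full network's score is the sum of `fnml_local_score` over all nodes, which is what lets standard local-search or exact structure-learning procedures reuse cached local scores.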
A theoretical study of Y structures for causal discovery
 Proceedings of the Conference on Uncertainty in Artificial Intelligence
, 2006
Abstract

Cited by 5 (3 self)
Causal discovery from observational data in the presence of unobserved variables is challenging. Identification of so-called Y substructures is a sufficient condition for ascertaining some causal relations in the large sample limit, without the assumption of no hidden common causes. An example of a Y substructure is A → C, B → C, C → D. This paper describes the first asymptotically reliable and computationally feasible score-based search for discrete Y structures that does not assume that there are no unobserved common causes. The search applies to any parameterization of a directed acyclic graph (DAG) whose scores have two properties: any DAG that can represent the distribution beats any DAG that cannot, and of two DAGs that both represent the distribution, the one with fewer parameters wins. In this framework there is no need to assign scores to causal structures with unobserved common causes. The paper also describes how the existence of a Y structure shows the presence of an unconfounded causal relation, without assuming that there are no hidden common causes.
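The following sketch enumerates Y substructures in a DAG given as a list of directed edges. It assumes the usual definition (A → C ← B and C → D, where A and B are nonadjacent and D is adjacent to neither A nor B); the abstract itself only gives the example A → C, B → C, C → D, so the adjacency conditions are an assumption here. The function name is hypothetical.

```python
from itertools import combinations


def find_y_structures(edges):
    """Return (a, b, c, d) tuples with a -> c <- b and c -> d, where a and b are
    nonadjacent and d is adjacent to neither a nor b."""
    parents, children, adjacent = {}, {}, set()
    for u, v in edges:
        parents.setdefault(v, set()).add(u)
        children.setdefault(u, set()).add(v)
        adjacent.add(frozenset((u, v)))

    def adj(u, v):
        return frozenset((u, v)) in adjacent

    found = []
    for c, pa in parents.items():
        # every pair of nonadjacent parents of c forms the collider a -> c <- b
        for a, b in combinations(sorted(pa), 2):
            if adj(a, b):
                continue
            # the "tail" of the Y: a child d of c not adjacent to a or b
            for d in sorted(children.get(c, ())):
                if not adj(a, d) and not adj(b, d):
                    found.append((a, b, c, d))
    return found
```

For the abstract's example, `find_y_structures([("A", "C"), ("B", "C"), ("C", "D")])` returns `[("A", "B", "C", "D")]`; adding an edge A → D removes it, because D is then adjacent to A.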
Generalizing the Derivation of the Schwarz Information Criterion
, 1999
Abstract

Cited by 4 (1 self)
The Schwarz information criterion (SIC, BIC, SBC) is one of the most widely known and used tools in statistical model selection. The criterion was derived by Schwarz (1978) to serve as an asymptotic approximation to a transformation of the Bayesian posterior probability of a candidate model. Although the original derivation assumes that the observed data are independent and identically distributed and arise from a probability distribution in the regular exponential family, SIC has traditionally been used in a much larger scope of model selection problems. To better justify the widespread applicability of SIC, we derive the criterion in a very general framework: one which does not assume any specific form for the likelihood function, but only requires that it satisfies certain nonrestrictive regularity conditions.
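For reference, the criterion is commonly written as SIC = -2 log L̂ + k log n, where L̂ is the maximized likelihood, k the number of free parameters, and n the sample size (smaller is better; this is -2 times Schwarz's log L̂ - (k/2) log n). The sketch below applies it to a hypothetical choice between two Gaussian models, mean fixed at 0 versus mean estimated; the model names and helper functions are illustrative only and do not reflect the paper's generalized derivation.

```python
import math


def gaussian_loglik(xs, mu, var):
    """Log-likelihood of xs under N(mu, var)."""
    n = len(xs)
    return -0.5 * n * math.log(2 * math.pi * var) - sum((x - mu) ** 2 for x in xs) / (2 * var)


def sic(loglik, k, n):
    """Schwarz information criterion: -2 log L + k log n (smaller is better)."""
    return -2.0 * loglik + k * math.log(n)


def compare_mean_models(xs):
    """SIC for M0 (mean 0, variance estimated) and M1 (mean and variance estimated)."""
    n = len(xs)
    var0 = sum(x * x for x in xs) / n          # MLE of variance when the mean is fixed at 0
    mu1 = sum(xs) / n                          # MLE of the mean
    var1 = sum((x - mu1) ** 2 for x in xs) / n # MLE of variance around mu1
    return {
        "M0": sic(gaussian_loglik(xs, 0.0, var0), 1, n),
        "M1": sic(gaussian_loglik(xs, mu1, var1), 2, n),
    }
```

With data clustered far from zero, such as `[9.0, 10.0, 11.0]`, the extra parameter of M1 is easily paid for and M1 gets the smaller SIC.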
Learning the Bayesian Network Structure: Dirichlet Prior versus Data
Abstract

Cited by 4 (0 self)
In the Bayesian approach to structure learning of graphical models, the equivalent sample size (ESS) in the Dirichlet prior over the model parameters was recently shown to have an important effect on the maximum-a-posteriori estimate of the Bayesian network structure. In our first contribution, we theoretically analyze the case of large ESS values, which complements previous work: among other results, we find that the presence of an edge in a Bayesian network is favoured over its absence even if both the Dirichlet prior and the data imply independence, as long as the conditional empirical distribution is notably different from uniform. In our second contribution, we focus on realistic ESS values, and provide an analytical approximation to the ‘optimal’ ESS value in a predictive sense (its accuracy is also validated experimentally): this approximation provides an understanding as to which properties of the data have the main effect determining the ‘optimal’ ESS value.
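To see where the ESS enters, the sketch below computes the log marginal likelihood of one node under a Dirichlet prior that spreads the ESS α uniformly over cells (the BDeu convention, assumed here): α/q per parent configuration and α/(q·r) per cell, where q is the number of parent configurations and r the number of child values. The function name and arguments are hypothetical and not taken from the paper.

```python
import math
from collections import Counter, defaultdict


def bdeu_local_score(rows, child, parents, arity, ess):
    """Log marginal likelihood of column `child` given `parents` under a BDeu prior with ESS `ess`."""
    q = math.prod(arity[p] for p in parents)   # number of parent configurations (1 if no parents)
    r = arity[child]
    a_j, a_jk = ess / q, ess / (q * r)         # prior mass per configuration and per cell
    counts = defaultdict(Counter)              # parent configuration -> counts of child values
    for row in rows:
        counts[tuple(row[p] for p in parents)][row[child]] += 1
    score = 0.0
    # unobserved parent configurations and unobserved child values contribute exactly 0
    for child_counts in counts.values():
        n_j = sum(child_counts.values())
        score += math.lgamma(a_j) - math.lgamma(a_j + n_j)
        score += sum(math.lgamma(a_jk + n) - math.lgamma(a_jk) for n in child_counts.values())
    return score
```

Comparing `bdeu_local_score(rows, 1, (0,), arity, ess)` against `bdeu_local_score(rows, 1, (), arity, ess)` for a range of `ess` values on the same data is a direct way to observe how the preference for the edge 0 → 1 shifts with the prior strength.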
Consistent Estimation of the Basic Neighborhood of Markov Random Fields
, 2006
Abstract

Cited by 4 (0 self)
For Markov random fields on Z^d with finite state space, we address the statistical estimation of the basic neighborhood, the smallest region that determines the conditional distribution at a site on the condition that the values at all other sites are given. A modification of the Bayesian Information Criterion, replacing likelihood by pseudo-likelihood, is proved to provide strongly consistent estimation from observing a realization of the field on increasing finite regions: the estimated basic neighborhood equals the true one eventually almost surely, not assuming any prior bound on the size of the latter. Stationarity of the Markov field is not required, and phase transition does not affect the results.
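The idea of replacing likelihood by pseudo-likelihood inside a BIC-type criterion can be sketched for a 2-D field. This is a minimal sketch under several assumptions that are not from the paper: the field is observed on a torus (periodic boundaries) rather than on growing finite regions, candidate neighborhoods are supplied as lists of (row, column) offsets, and the penalty is a BIC-style (|A|-1)·|A|^|Γ|/2 · log n, which may differ from the paper's exact penalty term. All function names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def log_pseudo_likelihood(grid, offsets):
    """Maximized log pseudo-likelihood of a 2-D field on a torus for a candidate
    neighborhood given as (row, col) offsets."""
    rows, cols = len(grid), len(grid[0])
    counts = defaultdict(Counter)  # neighborhood pattern -> counts of the centre value
    for i in range(rows):
        for j in range(cols):
            pattern = tuple(grid[(i + di) % rows][(j + dj) % cols] for di, dj in offsets)
            counts[pattern][grid[i][j]] += 1
    total = 0.0
    for centre_counts in counts.values():
        n_pat = sum(centre_counts.values())
        total += sum(n * math.log(n / n_pat) for n in centre_counts.values())
    return total


def pic(grid, offsets, alphabet_size):
    """-log pseudo-likelihood plus a hypothetical BIC-style penalty."""
    n_sites = len(grid) * len(grid[0])
    n_params = (alphabet_size - 1) * alphabet_size ** len(offsets)
    return -log_pseudo_likelihood(grid, offsets) + 0.5 * n_params * math.log(n_sites)


def estimate_neighborhood(grid, candidates, alphabet_size):
    """Return the candidate neighborhood with the smallest criterion value."""
    return min(candidates, key=lambda offsets: pic(grid, offsets, alphabet_size))
```

On a 4 × 4 binary checkerboard, for example, the single offset `(0, 1)` determines each site exactly, so `estimate_neighborhood` prefers `[(0, 1)]` over the empty neighborhood despite its larger penalty.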
Factorized Normalized Maximum Likelihood Criterion for Learning Bayesian Network Structures
Abstract

Cited by 3 (2 self)
This paper introduces a new scoring criterion, factorized normalized maximum likelihood, for learning Bayesian network structures. The proposed scoring criterion requires no parameter tuning, and it is decomposable and asymptotically consistent. We compare the new scoring criterion to other scoring criteria and describe its practical implementation. Empirical tests confirm its good performance.
Learning Locally Minimax Optimal Bayesian Networks
Abstract

Cited by 3 (1 self)
We consider the problem of learning Bayesian network models in a non-informative setting, where the only available information is a set of observational data and no background knowledge is available. The problem can be divided into two subtasks: learning the structure of the network (a set of independence relations), and learning the parameters of the model (which fix the probability distribution from the set of all distributions consistent with the chosen structure). Few theoretical frameworks handle both problems together consistently, the Bayesian framework being an exception. In this paper we propose an alternative, information-theoretic framework which sidesteps some of the technical problems facing the Bayesian approach. The framework is based on the minimax-optimal Normalized Maximum Likelihood (NML) distribution, which is motivated by the Minimum Description Length (MDL) principle. The resulting model selection criterion is consistent, and it provides a way to construct highly predictive Bayesian network models. Our empirical tests show that the proposed method compares favorably with alternative approaches in both model selection and prediction tasks.
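The minimax property of NML can be checked by brute force in the simplest case. The sketch below, an illustration for the Bernoulli model only and not the paper's Bayesian network construction, builds the NML distribution over all binary sequences of length n by normalizing the maximized likelihood. Every sequence then has the same regret, log of the normalizing constant, which is what makes NML the minimax-regret (equalizer) distribution. Function names are hypothetical.

```python
import math
from itertools import product


def ml_prob(seq):
    """Maximized Bernoulli likelihood P(seq | theta_hat) with theta_hat = k / n."""
    n, k = len(seq), sum(seq)
    return (k / n) ** k * ((n - k) / n) ** (n - k)  # Python evaluates 0.0 ** 0 as 1.0


def nml_distribution(n):
    """NML distribution over all binary sequences of length n, plus its normalizer C(n)."""
    seqs = list(product((0, 1), repeat=n))
    ml = {s: ml_prob(s) for s in seqs}
    norm = sum(ml.values())  # C(n): sum of maximized likelihoods over all sequences
    return {s: p / norm for s, p in ml.items()}, norm


def regrets(n):
    """Regret log(P_ML(s) / P_NML(s)) for every sequence s; all equal log C(n)."""
    dist, _ = nml_distribution(n)
    return [math.log(ml_prob(s) / p) for s, p in dist.items()]
```

For n = 2 the normalizer is 1 + 0.25 + 0.25 + 1 = 2.5, and `regrets(2)` returns log 2.5 for all four sequences. Enumeration is exponential in n, so this only illustrates the definition; practical NML-based scores rely on closed forms or recurrences for the normalizer.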