Results 11 - 20 of 60
Causal Inference in the Presence of Latent Variables and Selection Bias
 In Proceedings of Eleventh Conference on Uncertainty in Artificial Intelligence
Cited by 37 (5 self)
This paper uses Bayesian network models for that investigation. Bayesian networks, or directed acyclic graph (DAG) models, have proved very useful in representing both causal and statistical hypotheses. The nodes of the graph represent vertices, directed edges represent direct influences, and the topology of the graph encodes statistical constraints. We will consider features of such models that can be determined from data under assumptions that are related to those routinely applied in experimental situations:
Statistical Themes and Lessons for Data Mining
1997
Cited by 35 (3 self)
Data mining is on the interface of Computer Science and Statistics, utilizing advances in both disciplines to make progress in extracting information from large databases. It is an emerging field that has attracted much attention in a very short period of time. This article highlights some statistical themes and lessons that are directly relevant to data mining and attempts to identify opportunities where close cooperation between the statistical and computational communities might reasonably provide synergy for further progress in data analysis.
A Hybrid Anytime Algorithm for the Construction of Causal Models From Sparse Data
 PROCEEDINGS OF THE FIFTEENTH INTERNATIONAL CONFERENCE ON UNCERTAINTY IN ARTIFICIAL INTELLIGENCE, 1999
Cited by 30 (3 self)
We present a hybrid constraint-based/Bayesian algorithm for learning causal networks in the presence of sparse data. The algorithm searches the space of equivalence classes of models (essential graphs) using a heuristic based on conventional constraint-based techniques. Each essential graph is then converted into a directed acyclic graph and scored using a Bayesian scoring metric. Two variants
Parameter priors for directed acyclic graphical models and the characterization of several probability distributions
 MICROSOFT RESEARCH, ADVANCED TECHNOLOGY DIVISION, 1999
Cited by 30 (1 self)
We show that the only parameter prior for complete Gaussian DAG models that satisfies global parameter independence, complete model equivalence, and some weak regularity assumptions, is the normal-Wishart distribution. Our analysis is based on the following new characterization of the Wishart distribution: let $W$ be an $n \times n$, $n \ge 3$, positive-definite symmetric matrix of random variables and $f(W)$ be a pdf of $W$. Then, $f(W)$ is a Wishart distribution if and only if $W_{11} - W_{12} W_{22}^{-1} W_{12}'$ is independent of $\{W_{12}, W_{22}\}$ for every block partitioning
A new approach for learning belief networks using independence criteria
 International Journal of Approximate Reasoning, 2000
A Scoring Function for Learning Bayesian Networks based on Mutual Information and Conditional Independence Tests
 JOURNAL OF MACHINE LEARNING RESEARCH, 2006
Cited by 25 (0 self)
We propose a new scoring function for learning Bayesian networks from data using score search algorithms. This is based on the concept of mutual information and exploits some well-known properties of this measure in a novel way. Essentially, a statistical independence test based on the chi-square distribution, associated with the mutual information measure, together with a property of additive decomposition of this measure, are combined in order to measure the degree of interaction between each variable and its parent variables in the network. The result is a non-Bayesian scoring function called MIT (mutual information tests) which belongs to the family of scores based on information theory. The MIT score also represents a penalization of the Kullback-Leibler divergence between the joint probability distributions associated with a candidate network and with the available data set. Detailed results of a complete experimental evaluation of the proposed scoring function and its comparison with the well-known K2, BDeu and BIC/MDL scores are also presented.
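The link between mutual information and a chi-square independence test mentioned in this abstract can be illustrated with a minimal sketch. It is not the MIT score itself: it only uses the standard fact that, for N paired discrete observations, 2N times the empirical mutual information (in nats) is the G statistic, which is asymptotically chi-square distributed with (r - 1)(c - 1) degrees of freedom under independence. The function names `mutual_information` and `g_statistic` are hypothetical and chosen for illustration.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Empirical mutual information (in nats) between two discrete samples."""
    n = len(xs)
    joint = Counter(zip(xs, ys))  # counts of each (x, y) pair
    px = Counter(xs)              # marginal counts of x
    py = Counter(ys)              # marginal counts of y
    mi = 0.0
    for (x, y), nxy in joint.items():
        # p(x, y) * log(p(x, y) / (p(x) * p(y))), written with raw counts
        mi += (nxy / n) * math.log(nxy * n / (px[x] * py[y]))
    return mi


def g_statistic(xs, ys):
    """Return (2 * N * MI, degrees of freedom) for an independence test.

    Hypothetical helper: under independence the statistic is asymptotically
    chi-square with (r - 1) * (c - 1) degrees of freedom, where r and c are
    the numbers of distinct values observed for each variable.
    """
    n = len(xs)
    stat = 2.0 * n * mutual_information(xs, ys)
    dof = (len(set(xs)) - 1) * (len(set(ys)) - 1)
    return stat, dof
```

To decide independence, the statistic would be compared against a chi-square quantile with the returned degrees of freedom; the quantile lookup is left out here to keep the sketch free of third-party dependencies.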
Likelihoods and Parameter Priors for Bayesian Networks
1995
Cited by 24 (0 self)
We develop simple methods for constructing likelihoods and parameter priors for learning about the parameters and structure of a Bayesian network. In particular, we introduce several assumptions that permit the construction of likelihoods and parameter priors for a large number of Bayesian-network structures from a small set of assessments. The most notable assumption is that of likelihood equivalence, which says that data cannot help to discriminate network structures that encode the same assertions of conditional independence. We describe the constructions that follow from these assumptions, and also present a method for directly computing the marginal likelihood of a random sample with no missing observations. Also, we show how these assumptions lead to a general framework for characterizing parameter priors of multivariate distributions. Keywords: Bayesian network, learning, likelihood equivalence, Dirichlet, normal-Wishart.
Robust independence testing for constraint-based learning of causal structure
 In UAI, 2003
Cited by 16 (1 self)
This paper considers a method that combines ideas from Bayesian learning, Bayesian network inference, and classical hypothesis testing to produce a more reliable and robust test of independence for constraint-based (CB) learning of causal structure. Our method produces a smoothed contingency table $N_{XYZ}$ that can be used with any test of independence that relies on contingency table statistics. $N_{XYZ}$ can be calculated in the same asymptotic time and space required to calculate a standard contingency table, allows the specification of a prior distribution over parameters, and can be calculated when the database is incomplete. We provide theoretical justification for the procedure, and with synthetic data we demonstrate its benefits empirically over both a CB algorithm using the standard contingency table, and over a greedy Bayesian algorithm. We show that, even when used with noninformative priors, it results in better recovery of structural features and it produces networks with smaller KL-divergence, especially as the number of nodes increases or the number of records decreases. Another benefit is the dramatic reduction in the probability that a CB algorithm will stall during the search, providing a remedy for an annoying problem plaguing CB learning when the database is small.
The TETRAD Project: Constraint Based Aids to Causal Model Specification
 MULTIVARIATE BEHAVIORAL RESEARCH
High-Dimensional Gaussian Graphical Model Selection: Walk Summability and Local Separation Criterion
 JOURNAL OF MACHINE LEARNING RESEARCH, 2012
Cited by 15 (3 self)
We consider the problem of high-dimensional Gaussian graphical model selection. We identify a set of graphs for which an efficient estimation algorithm exists, and this algorithm is based on thresholding of empirical conditional covariances. Under a set of transparent conditions, we establish structural consistency (or sparsistency) for the proposed algorithm, when the number of samples $n = \Omega(J_{\min}^{-2} \log p)$, where $p$ is the number of variables and $J_{\min}$ is the minimum (absolute) edge potential of the graphical model. The sufficient conditions for sparsistency are based on the notion of walk-summability of the model and the presence of sparse local vertex separators in the underlying graph. We also derive novel non-asymptotic necessary conditions on the number of samples required for sparsistency.
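The idea of thresholding conditional covariances can be sketched as follows. This is a simplified illustration, not the paper's algorithm: it assumes separator sets of size at most one, takes a covariance matrix as a plain list of lists, and uses a user-supplied threshold. The names `conditional_covariance` and `threshold_edges` are hypothetical. An edge (i, j) is kept only if the absolute conditional covariance stays above the threshold for every candidate separator.

```python
def conditional_covariance(cov, i, j, k=None):
    """Covariance of variables i and j, optionally conditioned on variable k.

    For a single conditioning variable k this is
    cov[i][j] - cov[i][k] * cov[k][j] / cov[k][k].
    """
    if k is None:
        return cov[i][j]
    return cov[i][j] - cov[i][k] * cov[k][j] / cov[k][k]


def threshold_edges(cov, threshold):
    """Estimate an undirected edge set by conditional covariance thresholding.

    Assumption for this sketch: separators are empty or a single variable.
    """
    p = len(cov)
    edges = set()
    for i in range(p):
        for j in range(i + 1, p):
            # Candidate separators: the empty set (None) or any other variable.
            candidates = [None] + [k for k in range(p) if k not in (i, j)]
            smallest = min(
                abs(conditional_covariance(cov, i, j, k)) for k in candidates
            )
            if smallest > threshold:
                edges.add((i, j))
    return edges
```

For example, with the population covariance of a chain X0, X1 = X0 + noise, X2 = X1 + noise (unit-variance noise), `threshold_edges([[1.0, 1.0, 1.0], [1.0, 2.0, 2.0], [1.0, 2.0, 3.0]], 0.1)` returns `{(0, 1), (1, 2)}`, because conditioning on X1 drives the covariance between X0 and X2 to zero.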