Results 11–20 of 68
Improved learning of Bayesian networks
 Proc. of the Conf. on Uncertainty in Artificial Intelligence
, 2001
Abstract

Cited by 40 (6 self)
Two or more Bayesian network structures are Markov equivalent when the corresponding acyclic digraphs encode the same set of conditional independencies. Therefore, the search space of Bayesian network structures may be organized in equivalence classes, where each of them represents a different set of conditional independencies. The collection of sets of conditional independencies obeys a partial order, the so-called “inclusion order.” This paper discusses in depth the role that the inclusion order plays in learning the structure of Bayesian networks. In particular, this role involves the way a learning algorithm traverses the search space. We introduce a condition for traversal operators, the inclusion boundary condition, which, when satisfied, guarantees that the search strategy can avoid local maxima. This is proved under the assumptions that the data are sampled from a probability distribution faithful to an acyclic digraph and that the sample size is unbounded. This discussion leads to the design of a new traversal operator and two new learning algorithms in the context of heuristic search and the Markov chain Monte Carlo method. We carry out a set of experiments with synthetic and real-world data that show empirically the benefit of striving for the inclusion order when learning Bayesian networks from data.
Parameter priors for directed acyclic graphical models and the characterization of several probability distributions
 MICROSOFT RESEARCH, ADVANCED TECHNOLOGY DIVISION
, 1999
Abstract

Cited by 36 (1 self)
We show that the only parameter prior for complete Gaussian DAG models that satisfies global parameter independence, complete model equivalence, and some weak regularity assumptions is the normal-Wishart distribution. Our analysis is based on the following new characterization of the Wishart distribution: let W be an n × n, n ≥ 3, positive-definite symmetric matrix of random variables and f(W) be a pdf of W. Then, f(W) is a Wishart distribution if and only if W11 − W12 W22⁻¹ W12′ is independent of {W12, W22} for every block partitioning of W.
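The independence statement above involves the Schur complement of a block partition of W. A minimal NumPy sketch of that quantity (the matrix size and the partition size k are our choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))
W = A @ A.T + 5 * np.eye(5)   # a positive-definite symmetric matrix

# Block partition W = [[W11, W12], [W12', W22]] with W11 of size k x k.
k = 2
W11 = W[:k, :k]
W12 = W[:k, k:]
W22 = W[k:, k:]

# The quantity from the characterization: W11 - W12 W22^{-1} W12'.
schur = W11 - W12 @ np.linalg.solve(W22, W12.T)
```

The Schur complement is itself symmetric positive-definite, which is what makes the block-wise independence statement well posed.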
Statistical Themes and Lessons for Data Mining
, 1997
Abstract

Cited by 36 (3 self)
Data mining is on the interface of Computer Science and Statistics, utilizing advances in both disciplines to make progress in extracting information from large databases. It is an emerging field that has attracted much attention in a very short period of time. This article highlights some statistical themes and lessons that are directly relevant to data mining and attempts to identify opportunities where close cooperation between the statistical and computational communities might reasonably provide synergy for further progress in data analysis.
A Hybrid Anytime Algorithm for the Construction of Causal Models From Sparse Data
 PROCEEDINGS OF THE FIFTEENTH INTERNATIONAL CONFERENCE ON UNCERTAINTY IN ARTIFICIAL INTELLIGENCE
, 1999
Abstract

Cited by 35 (3 self)
We present a hybrid constraint-based/Bayesian algorithm for learning causal networks in the presence of sparse data. The algorithm searches the space of equivalence classes of models (essential graphs) using a heuristic based on conventional constraint-based techniques. Each essential graph is then converted into a directed acyclic graph and scored using a Bayesian scoring metric. Two variants ...
A new approach for learning belief networks using independence criteria
 International Journal of Approximate Reasoning
, 2000
A Scoring Function for Learning Bayesian Networks based on Mutual Information and Conditional Independence Tests
 JOURNAL OF MACHINE LEARNING RESEARCH
, 2006
Abstract

Cited by 31 (0 self)
We propose a new scoring function for learning Bayesian networks from data using score search algorithms. This is based on the concept of mutual information and exploits some well-known properties of this measure in a novel way. Essentially, a statistical independence test based on the chi-square distribution, associated with the mutual information measure, together with a property of additive decomposition of this measure, are combined in order to measure the degree of interaction between each variable and its parent variables in the network. The result is a non-Bayesian scoring function called MIT (mutual information tests) which belongs to the family of scores based on information theory. The MIT score also represents a penalization of the Kullback-Leibler divergence between the joint probability distributions associated with a candidate network and with the available data set. Detailed results of a complete experimental evaluation of the proposed scoring function and its comparison with the well-known K2, BDeu and BIC/MDL scores are also presented.
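The abstract describes combining a mutual-information statistic with chi-square critical values. A hedged sketch of such a local score for one discrete variable and a single discrete parent (function and parameter names are ours, and the penalty here is a simplified illustration, not the paper's exact MIT definition):

```python
import numpy as np
from scipy.stats import chi2

def mit_local_score(x, pa, alpha=0.05):
    """Sketch: 2*N*MI(X; Pa) minus the chi-square (1 - alpha) quantile.
    A positive value suggests X depends on Pa strongly enough to keep
    the parent; a negative value suggests penalizing the edge."""
    x, pa = np.asarray(x), np.asarray(pa)
    n = len(x)
    xs, x_idx = np.unique(x, return_inverse=True)
    ps, p_idx = np.unique(pa, return_inverse=True)
    counts = np.zeros((len(xs), len(ps)))
    np.add.at(counts, (x_idx, p_idx), 1)      # joint contingency table
    pxy = counts / n
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    with np.errstate(divide="ignore", invalid="ignore"):
        terms = np.where(pxy > 0, pxy * np.log(pxy / (px * py)), 0.0)
    mi = terms.sum()                          # empirical mutual information
    df = (len(xs) - 1) * (len(ps) - 1)        # degrees of freedom of the test
    return 2 * n * mi - chi2.ppf(1 - alpha, df)
```

With a perfectly dependent parent the statistic dominates the penalty; with an independent parent the empirical mutual information is near zero and the score goes negative.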
Likelihoods and Parameter Priors for Bayesian Networks
, 1995
Abstract

Cited by 24 (0 self)
We develop simple methods for constructing likelihoods and parameter priors for learning about the parameters and structure of a Bayesian network. In particular, we introduce several assumptions that permit the construction of likelihoods and parameter priors for a large number of Bayesian-network structures from a small set of assessments. The most notable assumption is that of likelihood equivalence, which says that data cannot help to discriminate network structures that encode the same assertions of conditional independence. We describe the constructions that follow from these assumptions, and also present a method for directly computing the marginal likelihood of a random sample with no missing observations. Also, we show how these assumptions lead to a general framework for characterizing parameter priors of multivariate distributions. Keywords: Bayesian network, learning, likelihood equivalence, Dirichlet, normal-Wishart.
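For complete discrete data, the marginal likelihood mentioned above has a closed form under a Dirichlet parameter prior. A minimal sketch for a single multinomial variable (the symmetric prior and the function name are our simplifying choices):

```python
import numpy as np
from math import lgamma

def dirichlet_marginal_loglik(counts, alpha=1.0):
    """Log marginal likelihood of multinomial counts under a symmetric
    Dirichlet(alpha) prior, using the standard Gamma-function ratio:
    log Gamma(k*a) - log Gamma(k*a + n) + sum_c [log Gamma(a + N_c) - log Gamma(a)]."""
    counts = np.asarray(counts, dtype=float)
    k = counts.size           # number of categories
    n = counts.sum()          # total sample size
    return (lgamma(k * alpha) - lgamma(k * alpha + n)
            + sum(lgamma(alpha + c) - lgamma(alpha) for c in counts))
```

As a sanity check, a single observation of one of two categories under a uniform prior has marginal probability 1/2, and one observation of each category has marginal probability 1/6 (= 1/2 · 1/3).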
The TETRAD Project: Constraint Based Aids to Causal Model Specification
 MULTIVARIATE BEHAVIORAL RESEARCH
High-Dimensional Gaussian Graphical Model Selection: Walk-Summability and Local Separation Criterion
 JOURNAL OF MACHINE LEARNING RESEARCH
, 2012
Abstract

Cited by 17 (3 self)
We consider the problem of high-dimensional Gaussian graphical model selection. We identify a set of graphs for which an efficient estimation algorithm exists, and this algorithm is based on thresholding of empirical conditional covariances. Under a set of transparent conditions, we establish structural consistency (or sparsistency) for the proposed algorithm when the number of samples n = Ω(J_min^−2 log p), where p is the number of variables and J_min is the minimum (absolute) edge potential of the graphical model. The sufficient conditions for sparsistency are based on the notion of walk-summability of the model and the presence of sparse local vertex separators in the underlying graph. We also derive novel non-asymptotic necessary conditions on the number of samples required for sparsistency.
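The estimator described above thresholds conditional covariances over small conditioning sets. A hedged sketch of that idea applied to a known covariance matrix (the parameter names eta and xi, and the use of the exact rather than empirical covariance, are our simplifications):

```python
import itertools
import numpy as np

def cond_cov(Sigma, i, j, S):
    """Conditional covariance Cov(X_i, X_j | X_S) for a Gaussian with
    covariance Sigma, via the Schur-complement formula."""
    if not S:
        return Sigma[i, j]
    S = list(S)
    a = Sigma[np.ix_([i], S)]
    b = Sigma[np.ix_(S, S)]
    c = Sigma[np.ix_(S, [j])]
    return Sigma[i, j] - (a @ np.linalg.solve(b, c))[0, 0]

def select_edges(Sigma, eta, xi):
    """Keep edge (i, j) if the conditional covariance stays above the
    threshold xi for every conditioning set of size at most eta."""
    p = Sigma.shape[0]
    edges = set()
    for i, j in itertools.combinations(range(p), 2):
        others = [k for k in range(p) if k not in (i, j)]
        stat = min(abs(cond_cov(Sigma, i, j, S))
                   for r in range(eta + 1)
                   for S in itertools.combinations(others, r))
        if stat > xi:
            edges.add((i, j))
    return edges
```

For a three-variable Gaussian Markov chain X0 — X1 — X2, conditioning on the middle variable drives the (0, 2) conditional covariance to zero, so only the chain edges survive.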
Robust independence testing for constraintbased learning of causal structure
 In UAI
, 2003
Abstract

Cited by 17 (1 self)
This paper considers a method that combines ideas from Bayesian learning, Bayesian network inference, and classical hypothesis testing to produce a more reliable and robust test of independence for constraint-based (CB) learning of causal structure. Our method produces a smoothed contingency table N_XYZ that can be used with any test of independence that relies on contingency table statistics. N_XYZ can be calculated in the same asymptotic time and space required to calculate a standard contingency table, allows the specification of a prior distribution over parameters, and can be calculated when the database is incomplete. We provide theoretical justification for the procedure, and with synthetic data we demonstrate its benefits empirically over both a CB algorithm using the standard contingency table and a greedy Bayesian algorithm. We show that, even when used with noninformative priors, it results in better recovery of structural features and produces networks with smaller KL-divergence, especially as the number of nodes increases or the number of records decreases. Another benefit is a dramatic reduction in the probability that a CB algorithm will stall during the search, providing a remedy for a problem that plagues CB learning when the database is small.
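A hedged sketch of the smoothing idea behind such a table: add Dirichlet pseudo-counts to the observed contingency table before applying a standard test (the uniform prior and the names here are our simplification, not the paper's exact construction):

```python
import numpy as np

def smoothed_table(x, y, prior_strength=1.0):
    """Build a contingency table for discrete x, y and add a uniform
    Dirichlet pseudo-count (prior_strength split evenly over cells),
    so that no cell is exactly zero and the table reflects a posterior
    expectation rather than raw counts."""
    x, y = np.asarray(x), np.asarray(y)
    xs, xi = np.unique(x, return_inverse=True)
    ys, yi = np.unique(y, return_inverse=True)
    counts = np.zeros((len(xs), len(ys)))
    np.add.at(counts, (xi, yi), 1)            # raw contingency table
    return counts + prior_strength / counts.size
```

The resulting table sums to N plus the prior strength and can be fed to any contingency-table-based independence statistic in place of the raw counts.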