Results 1 - 10 of 46
Optimal Structure Identification with Greedy Search
, 2002
Cited by 162 (1 self)
In this paper we prove the so-called "Meek Conjecture". In particular, we show that if a DAG H is an independence map of another DAG G, then there exists a finite sequence of edge additions and covered edge reversals in G such that (1) after each edge modification H remains an independence map of G and (2) after all modifications G = H.
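The "covered edge reversals" mentioned in this abstract can be illustrated with a minimal sketch. It assumes the standard definition that an edge x -> y is covered when the parents of y are exactly the parents of x plus x itself. The parent-set representation and the function names `is_covered` and `reverse_edge` are hypothetical choices made for illustration, not taken from the paper.

```python
from typing import Dict, Set

# Hypothetical representation: each node maps to the set of its parents.
Parents = Dict[str, Set[str]]


def is_covered(parents: Parents, x: str, y: str) -> bool:
    """Return True if the edge x -> y is covered, i.e. Pa(y) == Pa(x) | {x}."""
    if x not in parents[y]:
        raise ValueError(f"no edge {x} -> {y}")
    return parents[y] == parents[x] | {x}


def reverse_edge(parents: Parents, x: str, y: str) -> Parents:
    """Return a copy of the graph in which x -> y is replaced by y -> x."""
    if x not in parents[y]:
        raise ValueError(f"no edge {x} -> {y}")
    new = {node: set(pa) for node, pa in parents.items()}
    new[y].discard(x)
    new[x].add(y)
    return new


# Toy DAG (assumed for illustration): A -> B, A -> C, B -> C.
dag = {"A": set(), "B": {"A"}, "C": {"A", "B"}}

covered_bc = is_covered(dag, "B", "C")  # Pa(C) = {A, B} equals Pa(B) | {B}
covered_ac = is_covered(dag, "A", "C")  # Pa(C) = {A, B} differs from Pa(A) | {A}
reversed_dag = reverse_edge(dag, "B", "C")
```

In this toy graph only B -> C is covered; reversing it yields another DAG with the same skeleton and no new v-structure, which is why covered edge reversals stay within the same equivalence class.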
Learning Bayesian Networks is NP-Hard
, 1994
Cited by 134 (2 self)
Algorithms for learning Bayesian networks from data have two components: a scoring metric and a search procedure. The scoring metric computes a score reflecting the goodness-of-fit of the structure to the data. The search procedure tries to identify network structures with high scores. Heckerman et al. (1994) introduced a Bayesian metric, called the BDe metric, that computes the relative posterior probability of a network structure given data. They show that the metric has a property desirable for inferring causal structure from data. In this paper, we show that the problem of deciding whether there is a Bayesian network, among those where each node has at most k parents, that has a relative posterior probability greater than a given constant is NP-complete, when the BDe metric is used. 1 Introduction Recently, many researchers have begun to investigate methods for learning Bayesian networks, including Bayesian methods [Cooper and Herskovits, 1991, Buntine, 1991, York 1992, Spiegel...
The max-min hill-climbing Bayesian network structure learning algorithm
 Machine Learning
, 2006
Cited by 78 (7 self)
Abstract. We present a new algorithm for Bayesian network structure learning, called Max-Min Hill-Climbing (MMHC). The algorithm combines ideas from local learning, constraint-based, and search-and-score techniques in a principled and effective way. It first reconstructs the skeleton of a Bayesian network and then performs a Bayesian-scoring greedy hill-climbing search to orient the edges. In our extensive empirical evaluation MMHC outperforms on average and in terms of various metrics several prototypical and state-of-the-art algorithms, namely the PC, Sparse Candidate, Three Phase Dependency Analysis, Optimal Reinsertion, Greedy Equivalence Search, and Greedy Search. These are the first empirical results simultaneously comparing most of the major Bayesian network algorithms against each other. MMHC offers certain theoretical advantages, specifically over the Sparse Candidate algorithm, corroborated by our experiments. MMHC and detailed results of our study are publicly available at
Learning the structure of linear latent variable models
 Journal of Machine Learning Research
, 2006
Cited by 42 (13 self)
We describe anytime search procedures that (1) find disjoint subsets of recorded variables for which the members of each subset are d-separated by a single common unrecorded cause, if such exists; (2) return information about the causal relations among the latent factors so identified. We prove the procedure is pointwise consistent assuming (a) the causal relations can be represented by a directed acyclic graph (DAG) satisfying the Markov Assumption and the Faithfulness Assumption; (b) unrecorded variables are not caused by recorded variables; and (c) dependencies are linear. We compare the procedure with standard approaches over a variety of simulated structures and sample sizes, and illustrate its practical value with brief studies of social science data sets. Finally, we
Improved learning of Bayesian networks
 Proc. of the Conf. on Uncertainty in Artificial Intelligence
, 2001
Cited by 38 (6 self)
Two or more Bayesian network structures are Markov equivalent when the corresponding acyclic digraphs encode the same set of conditional independencies. Therefore, the search space of Bayesian network structures may be organized in equivalence classes, where each of them represents a different set of conditional independencies. The collection of sets of conditional independencies obeys a partial order, the so-called “inclusion order.” This paper discusses in depth the role that the inclusion order plays in learning the structure of Bayesian networks. In particular, this role involves the way a learning algorithm traverses the search space. We introduce a condition for traversal operators, the inclusion boundary condition, which, when it is satisfied, guarantees that the search strategy can avoid local maxima. This is proved under the assumptions that the data is sampled from a probability distribution which is faithful to an acyclic digraph, and the length of the sample is unbounded. The previous discussion leads to the design of a new traversal operator and two new learning algorithms in the context of heuristic search and the Markov Chain Monte Carlo method. We carry out a set of experiments with synthetic and real-world data that show empirically the benefit of striving for the inclusion order when learning Bayesian networks from data.
Temporal Causal Modeling with Graphical Granger Methods
 In Proceedings of the 13th Int. Conference on Knowledge Discovery and Data Mining, 66 – 75: Association for Computing Machinery
, 2007
Cited by 22 (3 self)
The need for mining causality, beyond mere statistical correlations, for real world problems has been recognized widely. Many of these applications naturally involve temporal data, which raises the challenge of how best to leverage the temporal information for causal modeling. Recently graphical modeling with the concept of “Granger causality”, based on the intuition that a cause helps predict its effects in the future, has gained attention in many domains involving time series data analysis. With the surge of interest in model selection methodologies for regression, such as the Lasso, as practical alternatives to solving structural learning of graphical models, the question arises whether and how to combine these two notions into a practically viable approach for temporal causal modeling. In this paper, we examine a host of related
Efficient learning of hierarchical latent class models
 In: Proceedings of the 16th IEEE International Conference on Tools with Artificial Intelligence, Boca
, 2004
Cited by 20 (5 self)
Hierarchical latent class (HLC) models are tree-structured Bayesian networks where leaf nodes are observed while internal nodes are hidden. In earlier work, we have demonstrated in principle the possibility of reconstructing HLC models from data. In this paper, we address the scalability issue and develop a search-based algorithm that can efficiently learn high-quality HLC models for realistic domains. There are three technical contributions: (1) the identification of a set of search operators; (2) the use of improvement in BIC score per unit of increase in model complexity, rather than BIC score itself, for model selection; and (3) the adaptation of structural EM for situations where candidate models contain different variables than the current model. The algorithm was tested on the COIL Challenge 2000 data set and an interesting model was found.
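The second contribution listed in this abstract, selecting by BIC improvement per unit of added complexity rather than by BIC alone, can be sketched as follows. This is only an illustration under stated assumptions: BIC is taken in the higher-is-better convention, model complexity is measured as a count of free parameters, and the `Candidate` class, the function names, and all numbers are hypothetical rather than drawn from the paper.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Candidate:
    name: str
    bic: float       # assumed convention: higher is better
    dimension: int   # assumed complexity measure: number of free parameters


def unit_improvement(current: Candidate, cand: Candidate) -> float:
    """BIC gain divided by the increase in model complexity."""
    extra = cand.dimension - current.dimension
    if extra <= 0:
        raise ValueError("candidate must be more complex than the current model")
    return (cand.bic - current.bic) / extra


def pick_candidate(current: Candidate,
                   candidates: Iterable[Candidate]) -> Optional[Candidate]:
    """Pick the improving candidate with the largest gain per added parameter."""
    improving = [c for c in candidates
                 if c.dimension > current.dimension and c.bic > current.bic]
    if not improving:
        return None
    return max(improving, key=lambda c: unit_improvement(current, c))


# Made-up scores for illustration only.
current = Candidate("current", bic=-1000.0, dimension=10)
candidates = [
    Candidate("big_step", bic=-940.0, dimension=40),    # gain 60 over 30 parameters
    Candidate("small_step", bic=-980.0, dimension=12),  # gain 20 over 2 parameters
]

chosen = pick_candidate(current, candidates)
best_raw_bic = max(candidates, key=lambda c: c.bic)
```

With these made-up numbers the ratio criterion prefers `small_step` (10 BIC units per parameter) while raw BIC would prefer `big_step`, which shows how the two selection rules can disagree.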
On Local Optima in Learning Bayesian Networks
, 2003
Cited by 17 (4 self)
This paper proposes and evaluates the k-greedy equivalence search algorithm (KES) for learning Bayesian networks (BNs) from complete data. The main characteristic of KES is that it allows a trade-off between greediness and randomness, thus exploring different good local optima when run repeatedly. When
The Performance of Bayesian Network Classifiers Constructed Using Different Techniques
 In Working notes of the ECML/PKDD03 workshop on
, 2003
Cited by 8 (0 self)
This paper presents empirical results for classification using Bayesian networks constructed using the K2 Bayesian metric, and compares these results with those of other researchers who have used Bayesian networks constructed using the MDL score and using conditional independence tests. There are significant disparities in these results, which is somewhat paradoxical as it has been shown that the MDL score is asymptotically equivalent to the Bayesian metric, and that structure search based on maximising a score is equivalent to structure search based on conditional independence tests. To resolve this paradox, we analyse the differences in methods used by different researchers to identify the source of the disparities. We conclude that differences in performance are attributable to differences in parameter estimation and structure search heuristics, rather than to differences in the scores/tests used.