Results 1 - 10 of 83
Learning Bayesian Networks is NP-Hard
1994
Cited by 98 (1 self)
Algorithms for learning Bayesian networks from data have two components: a scoring metric and a search procedure. The scoring metric computes a score reflecting the goodness-of-fit of the structure to the data. The search procedure tries to identify network structures with high scores. Heckerman et al. (1994) introduced a Bayesian metric, called the BDe metric, that computes the relative posterior probability of a network structure given data. They show that the metric has a property that is desirable for inferring causal structure from data. In this paper, we show that the problem of deciding whether there is a Bayesian network (among those where each node has at most k parents) that has a relative posterior probability greater than a given constant is NP-complete when the BDe metric is used.
1 Introduction
Recently, many researchers have begun to investigate methods for learning Bayesian networks, including Bayesian methods [Cooper and Herskovits, 1991, Buntine, 1991, York, 1992, Spiegel...
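The scoring side can be made concrete with a minimal Python sketch of the BDeu score. BDeu is a special case of the BDe metric in which the Dirichlet hyperparameters are spread uniformly according to an equivalent sample size. The sketch assumes discrete data given as tuples of integer states, and a structure given as a dict mapping each node to a tuple of its parents. The function names and the default `ess=1.0` are illustrative choices, not taken from the paper. Because the score decomposes into one term per node and its parents, a search procedure only needs to rescore the families it changes.

```python
from collections import Counter
from math import lgamma, prod


def bdeu_local_score(data, child, parents, arity, ess=1.0):
    """Log BDeu score of one family (child given its parents).

    data:  sequence of rows, each a tuple of ints in 0..arity[v]-1
    arity: mapping variable index -> number of states
    ess:   equivalent sample size (illustrative default)
    """
    r = arity[child]
    q = prod(arity[p] for p in parents)  # number of parent configurations (1 if none)
    a_ij = ess / q                       # prior mass per parent configuration
    a_ijk = ess / (q * r)                # prior mass per (configuration, child state)

    # N_ijk: counts of (parent configuration, child state) pairs seen in the data
    n_ijk = Counter((tuple(row[p] for p in parents), row[child]) for row in data)
    n_ij = Counter()
    for (config, _), n in n_ijk.items():
        n_ij[config] += n

    # Unobserved configurations/cells contribute lgamma(a) - lgamma(a) = 0, so skip them
    score = sum(lgamma(a_ij) - lgamma(a_ij + n) for n in n_ij.values())
    score += sum(lgamma(a_ijk + n) - lgamma(a_ijk) for n in n_ijk.values())
    return score


def bdeu_score(data, dag, arity, ess=1.0):
    """Decomposable total score: sum of family scores; dag maps node -> parents."""
    return sum(bdeu_local_score(data, v, tuple(ps), arity, ess) for v, ps in dag.items())
```

On data where the second variable copies the first, the structure with the edge 0 -> 1 scores higher than the empty structure. Reversing that edge leaves the BDeu score unchanged, which reflects the score-equivalence property of this metric.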
The max-min hill-climbing bayesian network structure learning algorithm
- Machine Learning, 2006
Cited by 39 (3 self)
Abstract. We present a new algorithm for Bayesian network structure learning, called Max-Min Hill-Climbing (MMHC). The algorithm combines ideas from local learning, constraint-based, and search-and-score techniques in a principled and effective way. It first reconstructs the skeleton of a Bayesian network and then performs a Bayesian-scoring greedy hill-climbing search to orient the edges. In our extensive empirical evaluation, MMHC outperforms, on average and in terms of various metrics, several prototypical and state-of-the-art algorithms, namely the PC, Sparse Candidate, Three Phase Dependency Analysis, Optimal Reinsertion, Greedy Equivalence Search, and Greedy Search. These are the first empirical results simultaneously comparing most of the major Bayesian network algorithms against each other. MMHC offers certain theoretical advantages, specifically over the Sparse Candidate algorithm, which are corroborated by our experiments. MMHC and detailed results of our study are publicly available at ...
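The second phase described above is a greedy hill-climbing search restricted to the reconstructed skeleton. A minimal Python sketch follows. It assumes that the skeleton from the first phase is already available as a set of undirected node pairs, and that `local_score(child, parents)` is some decomposable score such as BDeu. Both names are hypothetical. The skeleton-reconstruction phase is not shown, and the sketch omits any refinements the published algorithm may add on top of plain greedy search. Only edges present in the skeleton may be added, deleted, or reversed, and every move is checked for acyclicity.

```python
from itertools import permutations


def has_directed_path(parents, src, dst):
    """True if src -> ... -> dst exists; parents maps node -> set of parents."""
    stack, seen = [src], set()
    while stack:
        u = stack.pop()
        if u == dst:
            return True
        if u in seen:
            continue
        seen.add(u)
        # the children of u are the nodes that list u as a parent
        stack.extend(v for v, ps in parents.items() if u in ps)
    return False


def skeleton_hill_climb(nodes, skeleton, local_score, max_iter=1000):
    """Greedy add/delete/reverse search over DAGs whose edges lie in `skeleton`.

    skeleton: set of frozenset({u, v}) pairs that may carry an edge
    local_score(child, frozenset(parents)) -> float  (hypothetical, decomposable)
    """
    parents = {v: set() for v in nodes}
    cache = {}

    def ls(v, ps):
        key = (v, frozenset(ps))
        if key not in cache:
            cache[key] = local_score(v, frozenset(ps))
        return cache[key]

    for _ in range(max_iter):
        best_delta, best_move = 0.0, None
        for u, v in permutations(nodes, 2):
            if frozenset((u, v)) not in skeleton:
                continue
            if u in parents[v]:
                # delete u -> v
                delta = ls(v, parents[v] - {u}) - ls(v, parents[v])
                if delta > best_delta:
                    best_delta, best_move = delta, ("delete", u, v)
                # reverse u -> v: creates a cycle iff another path u ~> v remains
                parents[v].discard(u)
                creates_cycle = has_directed_path(parents, u, v)
                parents[v].add(u)
                if not creates_cycle:
                    delta = (ls(v, parents[v] - {u}) - ls(v, parents[v])
                             + ls(u, parents[u] | {v}) - ls(u, parents[u]))
                    if delta > best_delta:
                        best_delta, best_move = delta, ("reverse", u, v)
            elif v not in parents[u]:
                # add u -> v unless v already reaches u (that would close a cycle)
                if not has_directed_path(parents, v, u):
                    delta = ls(v, parents[v] | {u}) - ls(v, parents[v])
                    if delta > best_delta:
                        best_delta, best_move = delta, ("add", u, v)
        if best_move is None:
            break  # local maximum: no move improves the score
        op, u, v = best_move
        if op in ("delete", "reverse"):
            parents[v].discard(u)
        if op == "reverse":
            parents[u].add(v)
        if op == "add":
            parents[v].add(u)
    return parents
```

Consider a toy score that rewards the parent sets of A -> B -> C, together with the skeleton {A-B, B-C}. Starting from the empty graph, this search adds A -> B, then adds B -> C, and then stops.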
Improved learning of Bayesian networks
- Proc. of the Conf. on Uncertainty in Artificial Intelligence, 2001
Cited by 33 (6 self)
Two or more Bayesian network structures are Markov equivalent when the corresponding acyclic digraphs encode the same set of conditional independencies. Therefore, the search space of Bayesian network structures may be organized in equivalence classes, where each of them represents a different set of conditional independencies. The collection of sets of conditional independencies obeys a partial order, the so-called “inclusion order.” This paper discusses in depth the role that the inclusion order plays in learning the structure of Bayesian networks. In particular, this role involves the way a learning algorithm traverses the search space. We introduce a condition for traversal operators, the inclusion boundary condition, which, when it is satisfied, guarantees that the search strategy can avoid local maxima. This is proved under the assumptions that the data is sampled from a probability distribution which is faithful to an acyclic digraph, and the length of the sample is unbounded. The previous discussion leads to the design of a new traversal operator and two new learning algorithms in the context of heuristic search and the Markov Chain Monte Carlo method. We carry out a set of experiments with synthetic and real-world data that show empirically the benefit of striving for the inclusion order when learning Bayesian networks from data.
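Markov equivalence of two acyclic digraphs can be decided without enumerating their conditional independencies. A standard graphical characterization says they are equivalent exactly when they have the same skeleton and the same v-structures, that is, the same pairs of non-adjacent parents of a common child. Below is a minimal Python sketch, assuming each DAG is a dict mapping every node to a list of its parents. The function names are illustrative.

```python
from itertools import combinations


def skeleton(dag):
    """Undirected adjacencies of a DAG given as {node: list of parents}."""
    return {frozenset((p, c)) for c, ps in dag.items() for p in ps}


def v_structures(dag):
    """Triples a -> c <- b with a, b non-adjacent, keyed as (frozenset({a, b}), c)."""
    adj = skeleton(dag)
    found = set()
    for c, ps in dag.items():
        for a, b in combinations(ps, 2):
            if frozenset((a, b)) not in adj:
                found.add((frozenset((a, b)), c))
    return found


def markov_equivalent(g1, g2):
    """DAGs over the same nodes: same skeleton and same v-structures."""
    return skeleton(g1) == skeleton(g2) and v_structures(g1) == v_structures(g2)
```

A chain, its reversal, and a fork on the same three nodes fall into one equivalence class. The collider A -> B <- C forms a class of its own.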
Estimating high-dimensional directed acyclic graphs with the pc-algorithm
- Journal of Machine Learning Research, 2005
Cited by 31 (4 self)
We consider the PC-algorithm (Spirtes et al., 2000) for estimating the skeleton and equivalence class of a very high-dimensional directed acyclic graph (DAG) with corresponding Gaussian distribution. The PC-algorithm is computationally feasible and often very fast for sparse problems with many nodes (variables), and it has the attractive property of automatically achieving high computational efficiency as a function of the sparseness of the true underlying DAG. We prove uniform consistency of the algorithm for very high-dimensional, sparse DAGs where the number of nodes is allowed to grow quickly with sample size n, as fast as O(n^a) for any 0 < a < ∞. The sparseness assumption is rather minimal, requiring only that the neighborhoods in the DAG are of lower order than sample size n. We also demonstrate the PC-algorithm on simulated data.
Keywords: asymptotic consistency, DAG, graphical model, PC-algorithm, skeleton
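The skeleton phase of the PC-algorithm for Gaussian data works as follows. It starts from the complete undirected graph. It removes an edge i - j as soon as some subset of i's current neighbours, of growing size, renders i and j conditionally independent according to a test on partial correlations; this sketch uses Fisher's z-transform for that test. Below is a minimal Python/NumPy sketch, assuming complete numeric data in an n × p array. The significance level `alpha` and the function names are illustrative. The orientation phase that turns the skeleton into an equivalence class is omitted.

```python
import itertools
import math

import numpy as np


def partial_corr(corr, i, j, cond):
    """Partial correlation of i and j given cond, from a correlation matrix."""
    idx = [i, j, *cond]
    prec = np.linalg.pinv(corr[np.ix_(idx, idx)])  # inverse of the sub-matrix
    return -prec[0, 1] / math.sqrt(prec[0, 0] * prec[1, 1])


def judged_independent(corr, n, i, j, cond, alpha):
    """Fisher z-test: True if H0 'partial correlation is zero' is not rejected."""
    r = min(max(partial_corr(corr, i, j, cond), -0.999999), 0.999999)
    stat = math.sqrt(n - len(cond) - 3) * abs(math.atanh(r))
    p_value = math.erfc(stat / math.sqrt(2))  # two-sided normal tail
    return p_value > alpha


def pc_skeleton(data, alpha=0.01):
    """Return (set of undirected edges, separating sets) for an n x p array."""
    n, p = data.shape
    corr = np.corrcoef(data, rowvar=False)
    adj = {i: set(range(p)) - {i} for i in range(p)}
    sepset = {}
    level = 0  # size of the conditioning sets tried in this round
    while any(len(adj[i]) - 1 >= level for i in range(p)):
        for i, j in itertools.permutations(range(p), 2):
            if j not in adj[i] or len(adj[i]) - 1 < level:
                continue
            for cond in itertools.combinations(sorted(adj[i] - {j}), level):
                if judged_independent(corr, n, i, j, cond, alpha):
                    adj[i].discard(j)
                    adj[j].discard(i)
                    sepset[frozenset((i, j))] = set(cond)
                    break
        level += 1
    edges = {frozenset((i, j)) for i in range(p) for j in adj[i]}
    return edges, sepset
```

On data simulated from a chain X0 -> X1 -> X2 with strong linear effects, one would expect the edge X0 - X2 to be removed with separating set {X1}, while X0 - X1 and X1 - X2 remain.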
Learning the Structure of Linear Latent Variable Models
- Journal of Machine Learning Research 7 (2006) 191-246
Cited by 26 (8 self)
We describe anytime search procedures that (1) find disjoint subsets of recorded variables for which the members of each subset are d-separated by a single common unrecorded cause, if such exists; (2) return information about the causal relations among the latent factors so identified. We prove the procedure is point-wise consistent assuming (a) the causal relations can be represented by a directed acyclic graph (DAG) satisfying the Markov Assumption and the Faithfulness Assumption; (b) unrecorded variables are not caused by recorded variables; and (c) dependencies are linear. We compare the procedure with standard approaches over a variety of simulated structures and sample sizes, and illustrate its practical value with brief studies of social science data sets. Finally, we consider generalizations for non-linear systems.
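Under the linearity assumption, a single common unrecorded cause leaves a checkable trace in the covariances of the recorded variables. Suppose X_i = λ_i L + ε_i with independent errors. Then for any four such indicators, σ_ij σ_kl = σ_ik σ_jl = σ_il σ_jk; these are the vanishing tetrad differences. The abstract does not say which statistical constraints the procedures test, so the following Python/NumPy sketch only illustrates this property. It assumes a linear factor model with implied covariance Σ = Λ Φ Λ^T + Ψ, and the helper names are hypothetical.

```python
import numpy as np


def factor_model_cov(loadings, factor_cov, error_vars):
    """Implied covariance of a linear factor model: L Phi L^T + diag(error_vars)."""
    L = np.asarray(loadings, dtype=float)      # recorded variables x latent factors
    Phi = np.asarray(factor_cov, dtype=float)  # covariance of the latent factors
    return L @ Phi @ L.T + np.diag(error_vars)


def tetrad_differences(cov, i, j, k, l):
    """The three tetrad differences among recorded variables i, j, k, l."""
    s = np.asarray(cov, dtype=float)
    return (
        s[i, j] * s[k, l] - s[i, k] * s[j, l],
        s[i, j] * s[k, l] - s[i, l] * s[j, k],
        s[i, k] * s[j, l] - s[i, l] * s[j, k],
    )


# One latent cause of all four recorded variables: all differences vanish (up to rounding).
one = factor_model_cov([[0.9], [0.7], [1.2], [0.5]], [[1.0]], [1.0] * 4)
print(tetrad_differences(one, 0, 1, 2, 3))

# Two correlated latent causes, of {0, 1} and of {2, 3}: only the third difference
# (s02*s13 - s03*s12) still vanishes.
two = factor_model_cov([[1.0, 0.0], [0.8, 0.0], [0.0, 1.0], [0.0, 0.6]],
                       [[1.0, 0.5], [0.5, 1.0]], [1.0] * 4)
print(tetrad_differences(two, 0, 1, 2, 3))
```

With sample covariances instead of population ones, the differences are only approximately zero, so a practical procedure would need a statistical test rather than an exact comparison.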
Structure learning in random fields for heart motion abnormality detection
- In CVPR, 2008
Cited by 22 (3 self)
Coronary Heart Disease can be diagnosed by assessing the regional motion of the heart walls in ultrasound images of the left ventricle. Even for experts, ultrasound images are difficult to interpret, leading to high intra-observer variability. Previous work indicates that in order to approach this problem, the interactions between the different heart regions and their overall influence on the clinical condition of the heart need to be considered. To do this, we propose a method for jointly learning the structure and parameters of conditional random fields, formulating these tasks as a convex optimization problem. We consider block-L1 regularization for each set of features associated with an edge, and formalize an efficient projection method to find the globally optimal penalized maximum likelihood solution. We perform extensive numerical experiments comparing the presented method with related methods that approach the structure learning problem differently. We verify the robustness of our method on echocardiograms collected in routine clinical practice at one hospital.
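Block-L1 regularization penalizes the sum of the Euclidean norms of the per-edge parameter blocks, so a whole edge can be driven to zero at once. The proximal operator of this penalty, block soft-thresholding, shows why. Each block is shrunk toward zero, and it is set exactly to zero when its norm falls below the threshold. The Python/NumPy sketch below illustrates that operator inside one proximal-gradient step. It is not the projection method the abstract refers to. The dictionary-of-blocks representation and the function names are assumptions made for illustration.

```python
import numpy as np


def block_soft_threshold(blocks, threshold):
    """Proximal operator of threshold * sum_e ||w_e||_2, applied block by block."""
    out = {}
    for edge, w in blocks.items():
        w = np.asarray(w, dtype=float)
        norm = np.linalg.norm(w)
        # shrink the whole block; blocks with norm <= threshold become exactly zero
        scale = max(0.0, 1.0 - threshold / norm) if norm > 0 else 0.0
        out[edge] = scale * w
    return out


def prox_gradient_step(blocks, grads, step, lam):
    """One proximal-gradient step on loss(w) + lam * sum_e ||w_e||_2."""
    moved = {e: np.asarray(w, dtype=float) - step * np.asarray(grads[e], dtype=float)
             for e, w in blocks.items()}
    return block_soft_threshold(moved, step * lam)


def active_edges(blocks):
    """Edges whose parameter block is nonzero, i.e. the selected structure."""
    return {e for e, w in blocks.items() if np.any(w != 0)}
```

For example, with threshold 1 the block [3, 4] (norm 5) is scaled by 0.8 to [2.4, 3.2]. The block [0.3, 0.4] (norm 0.5) is zeroed, so its edge drops out of the learned structure.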
BNT structure learning package: documentation and experiments
- Technical Report (FRE CNRS 2645), Laboratoire PSI, Université et INSA de Rouen, 2004
Cited by 16 (1 self)
Bayesian networks are a formalism for probabilistic reasoning that is increasingly used for classification tasks in data mining. In some situations, the network structure is given by an expert; otherwise, retrieving it from a database is an NP-hard problem, notably because of the complexity of the search space. In the last decade, many methods have been introduced to learn the network structure automatically, either by simplifying the search space (augmented naive Bayes, K2) or by using a heuristic in this search space (greedy search). Most of these methods deal with completely observed data, but some others can deal with incomplete data (SEM, MWST-EM). The Bayes Net Toolbox introduced by [Murphy, 2001a] for Matlab allows us to use Bayesian networks or to learn them. However, this toolbox is not 'state of the art' if we want to perform structural learning, which is why we propose this package.
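MWST-EM, mentioned above, builds on the maximum weight spanning tree (MWST) method for complete data. That method connects the variables by the spanning tree maximizing the sum of pairwise edge weights, here empirical mutual information (one common choice), and then orients the tree away from a chosen root. The sketch below shows that complete-data step in Python. It is not the package's Matlab interface. The function names, the choice of root, and the representation of data as tuples of discrete states are assumptions made for illustration. The EM extension for incomplete data is not shown.

```python
from collections import Counter, deque
from itertools import combinations
from math import log


def mutual_information(data, i, j):
    """Empirical mutual information (in nats) between columns i and j."""
    n = len(data)
    joint = Counter((row[i], row[j]) for row in data)
    left = Counter(row[i] for row in data)
    right = Counter(row[j] for row in data)
    return sum(c / n * log(c * n / (left[a] * right[b]))
               for (a, b), c in joint.items())


def mwst_structure(data, num_vars, root=0):
    """Maximum weight spanning tree over pairwise MI, oriented away from root.

    Returns a dict mapping each variable to its single parent (None for the root).
    """
    edges = sorted(((mutual_information(data, i, j), i, j)
                    for i, j in combinations(range(num_vars), 2)), reverse=True)
    component = list(range(num_vars))  # union-find forest for Kruskal's algorithm

    def find(x):
        while component[x] != x:
            component[x] = component[component[x]]  # path halving
            x = component[x]
        return x

    neighbours = {v: set() for v in range(num_vars)}
    for _, i, j in edges:  # heaviest edges first
        ri, rj = find(i), find(j)
        if ri != rj:  # keep the edge only if it joins two components
            component[ri] = rj
            neighbours[i].add(j)
            neighbours[j].add(i)

    parent = {root: None}
    queue = deque([root])
    while queue:  # breadth-first orientation away from the root
        u = queue.popleft()
        for v in sorted(neighbours[u]):
            if v not in parent:
                parent[v] = u
                queue.append(v)
    return parent
```

The resulting tree can be used directly as a network structure, or as a starting point for a greedy search.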
On Local Optima in Learning Bayesian Networks
2003
Cited by 15 (2 self)
This paper proposes and evaluates the k-greedy equivalence search algorithm (KES) for learning Bayesian networks (BNs) from complete data. The main characteristic of KES is that it allows a trade-off between greediness and randomness, thus exploring different good local optima when run repeatedly. When ...
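A selection rule parameterized by k in [0, 1] can illustrate the trade-off between greediness and randomness. Among the neighbours that improve the score, the rule examines a random fraction k of them and moves to the best one examined. This is a minimal Python sketch, assuming that rule, since the excerpt above does not spell out KES's exact rule. It also assumes abstract `neighbours(state)` and `score(state)` callables, which are hypothetical. In KES itself the states would be equivalence classes of BNs, explored through an equivalence-search neighbourhood that is not implemented here. With k = 1 the search is purely greedy; smaller k lets repeated runs reach different local optima.

```python
import math
import random


def k_greedy_choice(improving, score, k, rng=random):
    """Pick the best of a random subset of the improving neighbours.

    k = 1 examines every improving neighbour (purely greedy);
    k close to 0 examines a single random one (purely random).
    """
    if not improving:
        return None
    if not 0.0 <= k <= 1.0:
        raise ValueError("k must lie in [0, 1]")
    size = max(1, math.ceil(k * len(improving)))
    subset = rng.sample(list(improving), size)
    return max(subset, key=score)


def k_greedy_search(start, neighbours, score, k, rng=random, max_steps=10_000):
    """Move to k-greedy improving neighbours until none improves the score."""
    current = start
    for _ in range(max_steps):
        current_score = score(current)
        improving = [g for g in neighbours(current) if score(g) > current_score]
        nxt = k_greedy_choice(improving, score, k, rng)
        if nxt is None:
            return current  # local optimum under this neighbourhood
        current = nxt
    return current
```

One way to exploit the different local optima is to run the search several times with different random seeds and keep the best end point.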
Markov equivalence for ancestral graphs
2004
Cited by 14 (3 self)
Ancestral graph models can encode conditional independence relations that arise in directed acyclic graph (DAG) models with latent and selection variables. However, for any ancestral graph, there may be several other graphs to which it is Markov equivalent. We state and prove conditions under which two maximal ancestral graphs are Markov equivalent to each other, thereby extending analogous results for DAGs given by other authors. University of Washington Technical Report No. 466.
Graphical models for causal inference
- Royal Economic Society Summer School, Oxford, 2005
(slides)

