Results 1-10 of 42
Optimal Structure Identification with Greedy Search
2002
Cited by 214 (1 self)
In this paper we prove the so-called "Meek Conjecture". In particular, we show that if a DAG H is an independence map of another DAG G, then there exists a finite sequence of edge additions and covered edge reversals in G such that (1) after each edge modification H remains an independence map of G and (2) after all modifications G = H.
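The "covered edge reversals" mentioned in this abstract can be illustrated with a small sketch. It assumes the standard definition, which the listing itself does not spell out: an edge X -> Y is covered when the parents of Y are exactly the parents of X plus X. The dictionary representation and the function names `is_covered` and `reverse_edge` are hypothetical choices for illustration only.

```python
def is_covered(parents, x, y):
    """Return True if the edge x -> y is covered, i.e. Pa(y) == Pa(x) | {x}.

    `parents` maps each node to the set of its parents (assumed representation).
    """
    if x not in parents[y]:
        raise ValueError(f"{x} -> {y} is not an edge of the graph")
    return parents[y] == parents[x] | {x}


def reverse_edge(parents, x, y):
    """Return a copy of the graph in which x -> y is replaced by y -> x."""
    if x not in parents[y]:
        raise ValueError(f"{x} -> {y} is not an edge of the graph")
    reversed_graph = {node: set(ps) for node, ps in parents.items()}
    reversed_graph[y].discard(x)
    reversed_graph[x].add(y)
    return reversed_graph


# Toy DAG: A -> B, A -> C, B -> C.
dag = {"A": set(), "B": {"A"}, "C": {"A", "B"}}
```

In the toy DAG, A -> B and B -> C are covered while A -> C is not, so only the first two could be reversed as steps of such a sequence.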
Learning factor graphs in polynomial time and sample complexity
JMLR, 2006
Cited by 62 (0 self)
We study the computational and sample complexity of parameter and structure learning in graphical models. Our main result shows that the class of factor graphs with bounded degree can be learned in polynomial time and from a polynomial number of training examples, assuming that the data is generated by a network in this class. This result covers both parameter estimation for a known network structure and structure learning. It implies as a corollary that we can learn factor graphs for both Bayesian networks and Markov networks of bounded degree, in polynomial time and sample complexity. Importantly, unlike standard maximum likelihood estimation algorithms, our method does not require inference in the underlying network, and so applies to networks where inference is intractable. We also show that the error of our learned model degrades gracefully when the generating distribution is not a member of the target class of networks. In addition to our main result, we show that the sample complexity of parameter learning in graphical models has an O(1) dependence on the number of variables in the model when using the KL-divergence normalized by the number of variables as the performance criterion.
Exact bayesian structure learning from uncertain interventions
In AI & Statistics, 2007
Cited by 38 (5 self)
We show how to apply the dynamic programming algorithm of Koivisto and Sood [KS04, Koi06], which computes the exact posterior marginal edge probabilities p(G_ij = 1 | D) of a DAG G given data D, to the case where the data is obtained by interventions (experiments). In particular, we consider the case where the targets of the interventions are a priori unknown. We show that it is possible to learn the targets of intervention at the same time as learning the causal structure. We apply our exact technique to a biological data set that had previously been analyzed using MCMC [SPP+05, EW06, WGH06].
On Local Optima in Learning Bayesian Networks
2003
Cited by 25 (11 self)
This paper proposes and evaluates the k-greedy equivalence search algorithm (KES) for learning Bayesian networks (BNs) from complete data. The main characteristic of KES is that it allows a trade-off between greediness and randomness, thus exploring different good local optima when run repeatedly. When
Bayesian structure learning using dynamic programming and MCMC
In UAI, 2007b
Cited by 24 (1 self)
We show how to significantly speed up MCMC sampling of DAG structures by using a powerful nonlocal proposal based on Koivisto’s dynamic programming (DP) algorithm (11; 10), which computes the exact marginal posterior edge probabilities by analytically summing over orders. Furthermore, we show how sampling in DAG space can avoid subtle biases that are introduced by approaches that work only with orders, such as Koivisto’s DP algorithm and MCMC order samplers (6; 5).
Consistent Feature Selection for Pattern Recognition in Polynomial Time
Cited by 11 (1 self)
We analyze two different feature selection problems: finding a minimal feature set optimal for classification (MINIMAL-OPTIMAL) vs. finding all features relevant to the target variable (ALL-RELEVANT). The latter problem is motivated by recent applications within bioinformatics, particularly gene expression analysis. For both problems, we identify classes of data distributions for which there exist consistent, polynomial-time algorithms. We also prove that ALL-RELEVANT is much harder than MINIMAL-OPTIMAL and propose two consistent, polynomial-time algorithms. We argue that the distribution classes considered are reasonable in many practical cases, so that our results simplify feature selection in a wide range of machine learning tasks.
Learning Optimal Bayesian Networks: A Shortest Path Perspective
2013
Cited by 8 (4 self)
In this paper, learning a Bayesian network structure that optimizes a scoring function for a given dataset is viewed as a shortest path problem in an implicit state-space search graph. This perspective highlights the importance of two research issues: the development of search strategies for solving the shortest path problem, and the design of heuristic functions for guiding the search. This paper introduces several techniques for addressing the issues. One is an A* search algorithm that learns an optimal Bayesian network structure by only searching the most promising part of the solution space. The others are mainly two heuristic functions. The first heuristic function represents a simple relaxation of the acyclicity constraint of a Bayesian network. Although admissible and consistent, the heuristic may introduce too much relaxation and result in a loose bound. The second heuristic function reduces the amount of relaxation by avoiding directed cycles within some groups of variables. Empirical results show that these methods constitute a promising approach to learning optimal Bayesian network structures.
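The shortest-path view described in this abstract can be sketched roughly as follows. This is an illustrative reading, not the paper's implementation. It assumes a search graph whose states are sets of already-placed variables, an edge cost equal to the best local score of the newly added variable with parents drawn from the current set, and a heuristic that lets every remaining variable pick its best parents from all other variables (the acyclicity relaxation). Lower scores are assumed to be better. The names `best_score`, `astar_structure_search`, and `toy_score`, and the score table itself, are hypothetical.

```python
import heapq
from itertools import combinations


def best_score(x, candidates, local_score):
    """Lowest local score for x over all parent sets drawn from candidates."""
    pool = sorted(candidates)
    return min(
        local_score(x, frozenset(ps))
        for r in range(len(pool) + 1)
        for ps in combinations(pool, r)
    )


def astar_structure_search(variables, local_score):
    """Return (total score, variable order) of a minimum-score path."""
    variables = frozenset(variables)
    # Relaxed cost: each variable may take parents from all other variables.
    relaxed = {x: best_score(x, variables - {x}, local_score) for x in variables}

    def h(placed):
        return sum(relaxed[x] for x in variables - placed)

    start = frozenset()
    g = {start: 0}
    came_from = {}
    heap = [(h(start), 0, start)]  # (f, tie-breaker, state)
    tie = 1
    closed = set()
    while heap:
        _, _, u = heapq.heappop(heap)
        if u in closed:
            continue
        closed.add(u)
        if u == variables:
            order = []
            while u != start:
                u, x = came_from[u]
                order.append(x)
            return g[variables], order[::-1]
        for x in sorted(variables - u):
            v = u | {x}
            cost = g[u] + best_score(x, u, local_score)
            if v not in closed and cost < g.get(v, float("inf")):
                g[v] = cost
                came_from[v] = (u, x)
                heapq.heappush(heap, (cost + h(v), tie, v))
                tie += 1
    return None


# Hypothetical local scores: listed pairs get the given value,
# everything else costs 10 plus one unit per parent.
table = {
    ("B", frozenset("A")): 6,
    ("A", frozenset("B")): 5,
    ("C", frozenset("B")): 7,
    ("C", frozenset("AB")): 3,
}


def toy_score(x, parent_set):
    return table.get((x, parent_set), 10 + len(parent_set))
```

Calling `astar_structure_search("ABC", toy_score)` searches the subset lattice of the three toy variables. The exhaustive parent-set enumeration in `best_score` is exponential and only suitable for tiny examples.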
J.M.: Learning Multivariate Regression Chain Graphs under Faithfulness
In: Proceedings of the 6th European Workshop on Probabilistic Graphical Models, 2012
Cited by 6 (5 self)
The correctness of our algorithm in the main text lies upon the assumption that p is faithful to some MVR CG. This is a strong requirement that we would like to weaken, e.g. by replacing it with the milder assumption that p satisfies the composition property. Correct algorithms for learning directed and acyclic graphs (a.k.a. Bayesian networks) under the composition property assumption exist (Chickering and Meek, 2002; Nielsen et al., 2003). We have recently developed a correct algorithm for learning LWF CGs under the composition property (Peña et al., 2014). The way in which these algorithms proceed (a.k.a. score+search based approach) is rather different from that of our algorithm (a.k.a. constraint based approach). In a nutshell, they can be seen as consisting of two phases: A first phase that starts from the empty graph H and adds single edges to it until p is Markovian with respect to H, and a second phase that removes single edges from H until p is Markovian with respect to H and p is not Markovian with respect to any CG F such that I(H) ⊆ I(F). The success of the first phase is guaranteed by the composition property assumption, whereas the success of the second phase is guaranteed by the so-called Meek’s conjecture (Meek, 1997). Specifically, given two directed and acyclic graphs F and H such that I(H) ⊆ I(F), Meek’s conjecture states that
assignment is hard for the MDL, AIC, and NML costs
In Proceedings of the 19th Annual Conference on Learning Theory (COLT 2006), 2006, Lecture Notes in Computer Science 4005
Cited by 5 (1 self)
Several hardness results are presented for the parent assignment problem: given m observations of n attributes x_1, ..., x_n, find the best parents for x_n, that is, a subset of the preceding attributes so as to minimize a fixed cost function. This attribute or feature selection task plays an important role, e.g., in structure learning in Bayesian networks, yet little is known about its computational complexity. In this paper we prove that, under the commonly adopted full-multinomial likelihood model, the MDL, BIC, or AIC cost cannot be approximated in polynomial time to a ratio less than 2 unless there exists a polynomial-time algorithm for determining whether a directed graph with n nodes has a dominating set of size log n, a LOGSNP-complete problem for which no polynomial-time algorithm is known; as we also show, it is unlikely that these penalized maximum likelihood costs can be approximated to within any constant ratio. For the NML (normalized maximum likelihood) cost we prove an NP-completeness result. These results both justify the application of existing methods and motivate research on heuristic and super-polynomial-time algorithms.
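To make the parent assignment problem concrete, here is a brute-force sketch under stated assumptions. It uses one common form of the MDL/BIC cost for the full-multinomial model, namely the negative maximized log-likelihood plus (k/2) log m with k free parameters. The paper's exact cost definition may differ. The function names `mdl_cost` and `best_parents`, the row format, and the `arity` mapping are hypothetical. The exhaustive search is exponential in the number of candidate attributes, which is consistent with the hardness results summarized above.

```python
import math
from collections import Counter
from itertools import combinations


def mdl_cost(rows, child, parents, arity):
    """Negative max log-likelihood of `child` given `parents` plus (k/2) log m."""
    m = len(rows)
    joint = Counter((tuple(r[p] for p in parents), r[child]) for r in rows)
    config = Counter(tuple(r[p] for p in parents) for r in rows)
    loglik = sum(n * math.log(n / config[cfg]) for (cfg, _), n in joint.items())
    # Free parameters: one multinomial per parent configuration.
    k = math.prod(arity[p] for p in parents) * (arity[child] - 1)
    return -loglik + 0.5 * k * math.log(m)


def best_parents(rows, child, candidates, arity):
    """Exhaustively search subsets of `candidates`; return (cost, parent tuple)."""
    best = None
    for size in range(len(candidates) + 1):
        for ps in combinations(candidates, size):
            cost = mdl_cost(rows, child, ps, arity)
            if best is None or cost < best[0]:
                best = (cost, ps)
    return best


# Toy data: attribute 2 copies attribute 0; attribute 1 is uninformative.
rows = [(0, 0, 0), (0, 1, 0), (1, 0, 1), (1, 1, 1)] * 2
arity = {0: 2, 1: 2, 2: 2}
```

On the toy data, `best_parents(rows, 2, [0, 1], arity)` selects attribute 0 alone, because adding attribute 1 only increases the penalty term.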