## Learning the structure of Markov logic networks (2005)

Venue: Proceedings of the 22nd International Conference on Machine Learning

Citations: 94 (18 self)

### BibTeX

@INPROCEEDINGS{Kok05learningthe,
  author    = {Stanley Kok and Pedro Domingos},
  title     = {Learning the structure of Markov logic networks},
  booktitle = {Proceedings of the 22nd International Conference on Machine Learning},
  year      = {2005},
  pages     = {441--448},
  publisher = {ACM Press}
}

### Abstract

Markov logic networks (MLNs) combine logic and probability by attaching weights to first-order clauses, and viewing these as templates for features of Markov networks. In this paper we develop an algorithm for learning the structure of MLNs from relational databases, combining ideas from inductive logic programming (ILP) and feature induction in Markov networks. The algorithm performs a beam or shortest-first search of the space of clauses, guided by a weighted pseudo-likelihood measure. This requires computing the optimal weights for each candidate structure, but we show how this can be done efficiently. The algorithm can be used to learn an MLN from scratch, or to refine an existing knowledge base. We have applied it in two real-world domains, and found that it outperforms using off-the-shelf ILP systems to learn the MLN structure, as well as pure ILP, purely probabilistic and purely knowledge-based approaches.
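The clause search described in the abstract can be sketched as a generic beam search. Everything here is a hedged reconstruction, not the paper's implementation: `refine` (the clause-refinement operator) and `score` (standing in for the weighted pseudo-likelihood with optimal weights) are hypothetical callables supplied by the caller.

```python
import heapq

def beam_search(initial_clauses, refine, score, beam_width=5, max_steps=10):
    """Greedy beam search over clause refinements, guided by a score such
    as the weighted pseudo-log-likelihood (WPLL). `refine` maps a clause
    to its candidate refinements (e.g., literal additions); `score`
    evaluates a candidate clause; higher scores are better."""
    best_clause, best_score = None, float("-inf")
    beam = list(initial_clauses)
    for _ in range(max_steps):
        candidates = [c for clause in beam for c in refine(clause)]
        if not candidates:
            break
        scored = [(score(c), c) for c in candidates]
        # Keep only the top `beam_width` refinements for the next round.
        top = heapq.nlargest(beam_width, scored, key=lambda sc: sc[0])
        if top[0][0] <= best_score:
            break  # no refinement improves on the best clause found so far
        best_score, best_clause = top[0]
        beam = [c for _, c in top]
    return best_clause, best_score
```

Shortest-first search differs mainly in expanding shorter clauses before longer ones; the loop structure above only illustrates the beam variant.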

### Citations

953 | Learning Bayesian networks: the combination of knowledge and statistical data
- Heckerman, Geiger, et al.
- 1997
Citation Context: ...ates that differ between the current version of the clause and the original one. (If the clause is new, this is simply its length.) This is similar to the approach used in learning Bayesian networks (Heckerman et al., 1995). Following Richardson and Domingos, we also penalize each weight with a Gaussian prior. A potentially serious problem that arises when evaluating candidate clauses using WPLL is that the optimal (ma...
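The excerpt describes two priors used when scoring candidates: a structure penalty on the number of literals by which a clause differs from its original version (for a new clause, its length), and a Gaussian prior on each weight. A minimal sketch under those two ingredients; `alpha` and `sigma` are hypothetical hyperparameter names not taken from the paper.

```python
def clause_penalty(clause, original, alpha=0.01):
    """Structure prior from the excerpt: penalize the number of literals
    by which the candidate differs from the original clause; for a brand
    new clause (original is None), simply penalize its length."""
    literals = set(clause)
    if original is None:
        diff = len(literals)
    else:
        diff = len(literals ^ set(original))
    return alpha * diff

def gaussian_log_prior(weights, sigma=1.0):
    """Log of an independent zero-mean Gaussian prior on clause weights,
    dropping the constant normalizer (it does not affect the search)."""
    return -sum(w * w for w in weights) / (2 * sigma ** 2)

def candidate_score(wpll, clause, original, weights):
    # Score for comparing candidate structures: WPLL plus the log prior
    # on weights, minus the structure penalty; higher is better.
    return wpll + gaussian_log_prior(weights) - clause_penalty(clause, original)
```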

878 | Learning logical definitions from relations
- Quinlan
- 1990
Citation Context: ...Džeroski, 1994). In the learning from entailment setting, the system searches for clauses that entail all positive examples of some relation (e.g., Friends) and no negative ones. For example, FOIL (Quinlan, 1990) learns each definite clause by starting with the target relation as the head and greedily adding literals to the body. In the learning from interpretations setting, the examples are databases, and t...

643 | On the optimality of the simple Bayesian classifier under zero-one loss
- Domingos, Pazzani
- 1997
Citation Context: ...(AL)), a pure knowledge-based approach (the hand-coded KB (KB)), the combination of CLAUDIEN and the hand-coded KB as described above (KB+CL), and two pure probabilistic approaches (naive Bayes (NB) (Domingos & Pazzani, 1997) and Bayesian networks (BN) (Heckerman et al., 1995)). Notice that ILP learners like FOIL and Aleph are not directly comparable with MLNs (or CLAUDIEN), because they only learn to predict designated ...

608 | Markov logic networks
- Richardson, Domingos
- 2006
Citation Context: ...constraint it is: the higher the weight, the greater the difference in log probability between a world that satisfies the formula and one that does not, other things being equal. Definition 4.1 (Richardson & Domingos, 2004) A Markov logic network L is a set of pairs (Fi, wi), where Fi is a formula in first-order logic and wi is a real number. Together with a finite set of constants C = {c1, c2, . . . , c|C|}, it defin...
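The definition continues (in the cited paper) with the ground Markov network, whose distribution is P(X = x) = (1/Z) exp(Σ_i w_i n_i(x)), where n_i(x) is the number of true groundings of formula F_i in world x. A toy sketch of the weight semantics quoted above: the MLN is a hypothetical list of `(count_fn, weight)` pairs, and Z is brute-forced over a tiny world set, which is only feasible for illustration.

```python
import math

def log_prob(world, mln, worlds):
    """Log-probability of a world under an MLN given as (count_fn, weight)
    pairs, where count_fn(world) returns the number of true groundings of
    that formula in the world. The partition function Z is computed by
    brute-force enumeration of the (tiny) set of possible worlds."""
    def potential(x):
        return math.exp(sum(w * n(x) for n, w in mln))
    z = sum(potential(x) for x in worlds)
    return math.log(potential(world) / z)
```

With a single formula of weight 2.0, a world with one extra true grounding has log-probability exactly 2.0 higher, matching the excerpt's "the higher the weight, the greater the difference in log probability".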

600 | Probability and Statistics - DeGroot - 1975

579 | Inducing features of random fields
- Pietra, Pietra, et al.
- 1997
Citation Context: ...for collective classification. 3. Markov Networks A Markov network (also known as Markov random field) is a model for the joint distribution of a set of variables X = (X1, X2, . . . , Xn) ∈ X (Della Pietra et al., 1997). It is composed of an undirected graph G and a set of potential functions φk. The graph has a node for each variable, and the model has a potential fu...

532 | Learning probabilistic relational models - Getoor, Friedman, et al. - 2001

472 | Shallow parsing with conditional random fields
- Sha, Pereira
- 2003
Citation Context: ...d using iterative scaling (Della Pietra et al., 1997). However, maximizing the likelihood (or posterior) using a quasi-Newton optimization method like L-BFGS has recently been found to be much faster (Sha & Pereira, 2003). Work on learning the structure (i.e., the features) of Markov networks has been relatively sparse to date. Della Pietra et al. (1997) induce conjunctive features by starting with a set of atomic fe...

285 | Statistical analysis of non-lattice data
- Besag
- 1975
Citation Context: ...v chain Monte Carlo inference (Della Pietra et al., 1997), Richardson and Domingos found this to be too slow. Instead, they maximized the pseudo-likelihood of the data, a widely-used alternative (Besag, 1975). If x is a possible world (relational database) and x_l is the l-th ground atom's truth value, the pseudo-log-likelihood of x given weights w is log P*_w(X = x) = Σ_{l=1}^{n} log P_w(X_l = x_l | MB_x(X_l)), whe...
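The pseudo-log-likelihood in the excerpt sums, over ground atoms, the log-probability of each atom's truth value conditioned on its Markov blanket MB_x(X_l). A minimal sketch; `cond_prob` is a hypothetical callable supplying those conditionals, which in an MLN are local computations that avoid the partition function over whole worlds.

```python
import math

def pseudo_log_likelihood(x, cond_prob):
    """Pseudo-log-likelihood of a world x (a list of ground-atom truth
    values): sum over atoms l of log P_w(X_l = x_l | MB_x(X_l)), where
    cond_prob(l, x) returns the probability of atom l's observed value
    given its Markov blanket. Each term is local, which is why this is
    much cheaper to optimize than the true likelihood."""
    return sum(math.log(cond_prob(l, x)) for l in range(len(x)))
```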

261 | Adaptive Duplicate Detection Using Learnable String Similarity Metrics - Bilenko, Mooney - 2003

238 | Logical Foundations of Artificial Intelligence
- Genesereth, Nilsson
- 1987
Citation Context: ...lly, we report our experiments with the new algorithm and discuss their results (Section 6). 2. Logic and ILP A first-order knowledge base (KB) is a set of sentences or formulas in first-order logic (Genesereth & Nilsson, 1987). Formulas are constructed using four types of symbols: constants, variables, functions, and predicates. Constant symbols represent objects in the domain of interest (e.g., people: Anna, Bob, Chris, ...

193 | Efficiently inducing features of conditional random fields - McCallum

189 | Clausal discovery - De Raedt, Dehaspe - 1997

131 | Inductive Logic Programming: Techniques and Applications - Lavrac, Dzeroski - 1997

84 | Discriminative training of Markov logic networks - Singla, Domingos - 2005

77 | Towards Combining Inductive Logic Programming with Bayesian Networks - Kersting, De Raedt - 2001

37 | Maximum entropy modeling with clausal constraints
- Dehaspe
- 1997
Citation Context: ...ed by arbitrary formulas in finite first-order logic, and can compactly represent distributions involving large cliques. Richardson and Domingos used an off-the-shelf ILP system (CLAUDIEN (De Raedt & Dehaspe, 1997)) to learn the structure of MLNs. This is unlikely to give the best results, because CLAUDIEN (like other ILP systems) is designed to simply find clauses that hold with some accuracy and frequency in...

30 | Mining Complex Models from Arbitrarily Large Databases in Constant Time
- Hulten, Domingos
- 2002
Citation Context: ...the chosen clause(s) are the same that would be obtained if we computed the WPLL exactly. This effectively makes the runtime of the WPLL computation independent of the number of predicate groundings (Hulten & Domingos, 2002). At the end of the algorithm we do a final round of weight learning without subsampling. • We use a similar strategy to compute the number of true groundings of a clause, required for the WPLL and i...
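The subsampling idea in the excerpt — test a random subset of groundings and scale up, so cost does not grow with the total number of groundings — can be sketched as below. This is an assumption-laden illustration: the cited approach (Hulten & Domingos, 2002) couples such estimates with statistical guarantees on the induced decisions, which this sketch omits, and `is_true` is a hypothetical predicate for grounding truth.

```python
import random

def estimate_true_groundings(groundings, is_true, sample_size, rng=None):
    """Estimate the number of true groundings of a clause by evaluating
    a uniform random subsample instead of every grounding, then scaling
    the sample fraction back up to the full population size. When the
    population is small enough, just count exactly."""
    rng = rng or random.Random(0)
    n = len(groundings)
    if n <= sample_size:
        return sum(1 for g in groundings if is_true(g))
    sample = rng.sample(groundings, sample_size)
    frac = sum(1 for g in sample if is_true(g)) / sample_size
    return frac * n
```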

16 | Monte-Carlo algorithms for enumeration and reliability problems - Karp, Luby, et al. - 1983

11 | Tractable induction and classification in first-order logic via stochastic matching - Sebag, Rouveirol - 1997

5 | Learning models of relational stochastic processes
- Sanghai, Domingos, et al.
- 2005
Citation Context: ...inutes (Cora) on a standard 2.8 GHz Pentium 4 CPU. We have also applied our algorithms to two time-changing domains, and shown that they outperform pure ILP and purely probabilistic approaches there (Sanghai et al., 2005). In summary, both our algorithms are effective; we recommend using shortest-first search when accuracy is paramount, and beam search when speed is a concern. 7. Conclusion and Future Work Markov log...

1 | The Aleph manual (Tech. rep.)
- Srinivasan
- 2000
Citation Context: ...6.2. Systems We compared nine versions of MLN learning: weight learning applied to the hand-coded KB (MLN(KB)); structure learning using CLAUDIEN, FOIL and Aleph (Srinivasan, 2000) followed by weight learning (respectively MLN(CL), MLN(FO) and MLN(AL)); structure learning using CLAUDIEN with the KB providing the language bias as in Richardson and Domingos (2004), followed by w...