## Learning first-order probabilistic models with combining rules (2005)

### Download Links

- [cs.oregonstate.edu]
- [web.engr.oregonstate.edu]
- [www.cs.orst.edu]
- [www.biostat.wisc.edu]
- [www.machinelearning.org]
- DBLP

### Other Repositories/Bibliography

Venue: Proceedings of the International Conference on Machine Learning

Citations: 25 (11 self)

### BibTeX

@INPROCEEDINGS{Natarajan05learningfirst-order,
  author    = {Sriraam Natarajan and Prasad Tadepalli and Thomas G. Dietterich and Alan Fern},
  title     = {Learning first-order probabilistic models with combining rules},
  booktitle = {Proceedings of the International Conference on Machine Learning},
  year      = {2005}
}

### Abstract

Many real-world domains exhibit rich relational structure and stochasticity, motivating the development of models that combine predicate logic with probabilities. These models describe probabilistic influences between attributes of objects that are related to each other through known domain relationships. To keep these models succinct, each such influence is treated as independent of the others, an assumption known as "independence of causal influences" (ICI). In this paper, we describe a language that consists of quantified conditional influence statements and captures most relational probabilistic models based on directed graphs. The influences due to different statements are combined using a set of combining rules such as Noisy-OR. We motivate and introduce multi-level combining rules, where the lower-level rules combine the influences due to different ground instances of the same statement, and the upper-level rules combine the influences due to different statements. We present algorithms and empirical results for parameter learning in the presence of such combining rules. Specifically, we derive and implement algorithms based on gradient descent and expectation maximization for different combining rules and evaluate them on synthetic data and on a real-world task. The results demonstrate that the algorithms are able to learn both the conditional probability distributions of the influence statements and the parameters of the combining rules.
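As a concrete illustration of the two-level scheme described in the abstract, here is a minimal sketch. The choice of a plain mean at the lower level, a weighted mean at the upper level, and all numeric values are assumptions for illustration, not parameters from the paper; Noisy-OR is included as another common combining rule.

```python
# Hypothetical sketch of multi-level combining rules; all function names
# and numbers are illustrative, not taken from the paper's implementation.

def noisy_or(probs):
    """Combine independent causal influences P(y=1|x_i) with Noisy-OR."""
    p_none = 1.0
    for p in probs:
        p_none *= (1.0 - p)
    return 1.0 - p_none

def mean(probs):
    """Lower-level rule: average over ground instances of one statement."""
    return sum(probs) / len(probs)

def weighted_mean(probs, weights):
    """Upper-level rule: weighted mean over statements (weights sum to 1)."""
    return sum(w * p for w, p in zip(weights, probs))

# Lower level: combine the ground instances of each influence statement.
stmt1 = [0.7, 0.4, 0.9]   # P(y=1|x) for ground instances of statement 1
stmt2 = [0.2, 0.5]        # ... and of statement 2
lower = [mean(stmt1), mean(stmt2)]

# Upper level: combine the per-statement results with learned weights.
p_y = weighted_mean(lower, weights=[0.6, 0.4])
```

Parameter learning in this setting must estimate both the per-statement conditional distributions and, for rules like the weighted mean, the weights themselves.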

### Citations

8080 | Maximum likelihood from incomplete data via the EM algorithm
- Dempster, Laird, et al.
- 1977
Citation Context: ... CPT parameters, whereas it changes most of the weights. 3.4. Expectation-Maximization: Expectation-Maximization (EM) is a popular method to compute maximum likelihood estimates given incomplete data (Dempster et al., 1977). EM iteratively performs two steps: the Expectation step, where the algorithm computes the expected values of the missing data based on the current parameters, and the Maximization step, where the m...

2045 | The Elements of Statistical Learning
- Hastie, Tibshirani, et al.
- 2001
Citation Context: ... the Maximization step, where the maximum likelihood of the parameters is computed based on the current expected values of the data. We adapted the EM algorithm for two-component mixture models from (Hastie et al., 2001). Consider n rules with the same resultant. Accordingly, there will be n distributions that need to be combined via a weighted mean. Let w_i be the weight for rule i, such that Σ_i w_i = 1. Table 1. EM...
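For intuition on the two-component mixture EM mentioned in the context above, here is a minimal sketch of EM for a single mixture weight w, with the two component distributions fixed and known. All names and the toy Bernoulli data are illustrative assumptions, not the paper's algorithm (which handles unobserved per-rule targets).

```python
# Minimal EM sketch for the weight w of a two-component mixture
# p(y) = w * p1(y) + (1 - w) * p2(y), with fixed, known components.

def em_mixture_weight(data, p1, p2, w=0.5, iters=50):
    for _ in range(iters):
        # E-step: responsibility of component 1 for each observation.
        resp = [w * p1(y) / (w * p1(y) + (1 - w) * p2(y)) for y in data]
        # M-step: the new weight is the average responsibility.
        w = sum(resp) / len(resp)
    return w

# Toy example: Bernoulli components with p1(y=1)=0.9 and p2(y=1)=0.1.
p1 = lambda y: 0.9 if y == 1 else 0.1
p2 = lambda y: 0.1 if y == 1 else 0.9
data = [1, 1, 1, 0]  # mostly ones, so w should move toward component 1
w = em_mixture_weight(data, p1, p2)
```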

1054 | Efficient induction of logic programs
- Muggleton, Feng
- 1992
Citation Context: ...esian logic programs (BLPs) (Kersting & Raedt, 2002), probabilistic horn abduction language (Poole, 1993), probabilistic logic programs (PLPs) (Ngo & Haddawy, 1995), stochastic logic programs (SLPs) (Muggleton, 1996), Bayesian logic (BLOG) models (Milch et al., 2004), relational Bayesian ... Appearing in Proceedings of the 22nd International Conference on Machine Learning, Bonn, Germany, 2005. Copyright 2005 by the...

294 | Probabilistic Horn abduction and Bayesian networks
- Poole
- 1993
Citation Context: ...over a set of combining rules to determine the one that best fits the data. The combining rules are also well-explored in other relational probabilistic settings such as Probabilistic Horn Abduction (Poole, 1993) and Bayesian logic programs (Kersting and De Raedt, 2000). Indeed, the ability to tractably compose different influence statements or rules is necessary to build compact models. What is different in...

209 |
Introduction to Statistical Relational Learning
- Getoor, Taskar
- 2007
Citation Context: ...he combining rules. 1. Introduction: New challenging application problems that involve rich relational data and probabilistic influences have led to the development of relational probabilistic models (Getoor and Taskar, 2007). The advantage of these models is that they can succinctly represent probabilistic dependencies between the attributes of different related objects, leading to sample-efficient learning. Models are ...

158 | Adaptive probabilistic networks with hidden variables
- Binder, Koller, et al.
- 1997
Citation Context: ... 3.3. Gradient Descent for Loglikelihood: In the context of probabilistic modeling, it is more common to maximize the log likelihood of the data given the hypothesis (Binder et al., 1997). From the definition of P(y_l|e_l), we can see that this is L = Σ_l log P(y_l|e_l). (8) Taking the derivative of L with respect to P(y|x^i) = θ_{y|x^i} gives ∂L/∂θ_{y|x^i} = Σ_l (1/P(y_l|e_l)) ...
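To make Eq. (8) from the quoted context concrete, here is a hedged one-parameter sketch: gradient ascent on L = Σ_l log P(y_l|e_l) for a single Bernoulli parameter θ = P(y=1|x). The learning rate, toy data, and clipping are illustrative assumptions, not the paper's setup.

```python
import math

def log_likelihood(theta, data):
    """L = sum_l log P(y_l | theta) for Bernoulli observations y_l."""
    return sum(math.log(theta if y == 1 else 1.0 - theta) for y in data)

def grad(theta, data):
    """dL/dtheta: contributes 1/theta for y=1 and -1/(1-theta) for y=0."""
    return sum(1.0 / theta if y == 1 else -1.0 / (1.0 - theta) for y in data)

data = [1, 0, 1, 1]   # toy observations
theta = 0.5
ll_start = log_likelihood(theta, data)
for _ in range(200):
    theta += 0.01 * grad(theta, data)          # ascend the gradient
    theta = min(max(theta, 1e-6), 1.0 - 1e-6)  # keep theta inside (0, 1)
ll_end = log_likelihood(theta, data)
# theta approaches the maximum-likelihood estimate 3/4,
# and the log likelihood increases along the way.
```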

116 | Learning relational probability trees
- Neville, Jensen, et al.
- 2003
Citation Context: ...pose, for example, that we aggregated the temperatures as "degree days above θ degrees": DD(θ) = Σ_i max(0, Temp_i − θ). The appropriate threshold temperature θ might be learned from training data. See (Neville et al., 2003) for an approach to learn such parameters. Learning with combining rules is more difficult, because the individual predicted target variables (e.g., Pop_1, Pop_2, ...) are unobserved, so the probabil...
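The degree-days aggregator quoted above, DD(θ) = Σ_i max(0, Temp_i − θ), is a one-liner to compute; the temperatures and threshold below are made-up illustrative values:

```python
def degree_days(temps, theta):
    """DD(theta) = sum_i max(0, Temp_i - theta)."""
    return sum(max(0.0, t - theta) for t in temps)

temps = [50, 62, 71, 68]            # hypothetical daily temperatures
dd = degree_days(temps, theta=60)   # contributions: 0 + 2 + 11 + 8 = 21
```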

109 | Bayesian logic programs - Kersting, Raedt - 2001

101 | Relational bayesian networks
- Jaeger
- 1997
Citation Context: ... et al., 2004), relational Bayesian networks (RBNs) (Jaeger, 1997), and Markov logic (ML) (Domingos & Richardson, 2004). In most of these models, it is possible to describe a generalized conditional probability distribution in which a particular random variable (th...

89 | JL (2005) TaskTracer: a desktop environment to support multi-tasking knowledge workers
- Dragunov, Dietterich, et al.
Citation Context: ... We employed the two rules that were presented earlier in Figure 3, combined using the weighted mean. As part of the Task Tracer project (Dragunov et al., 2005), we collected data for 500 documents and 6 tasks. The documents were stored in 11 different folders. Each document was manually assigned to a role with respect to each task with which it was associa...

74 | M.: Markov logic: A unifying framework for statistical relational learning. [19
- Domingos, Richardson
Citation Context: ... networks (RBNs) (Jaeger, 1997), and Markov logic (ML) (Domingos & Richardson, 2004). In most of these models, it is possible to describe a generalized conditional probability distribution in which a particular random variable (the "target") is conditioned on a set of parent variabl...

65 | Causal independence for probability assessment and inference using Bayesian networks
- Heckerman, Breese
- 1994
Citation Context: ...he combining rules. 5. Conclusion and Future Work: Combining rules help exploit causal independence in Bayesian networks and make representation and inference more tractable in the propositional case (Heckerman & Breese, 1994). In first order languages, they allow succinct [figure: Average Error vs. Number of Training Examples; curves GDMS, GDMS-True, GDMS-Fixed] ...

46 | Probabilistic models for relational data
- Heckerman, Meek, et al.
- 2004
Citation Context: ...robabilistic models. Several formalisms have been introduced including probabilistic relational models (PRMs) (Getoor et al., 2001), directed acyclic probabilistic entity-relationship (DAPER) models (Heckerman et al., 2004), Bayesian logic programs (BLPs) (Kersting & Raedt, 2002), probabilistic horn abduction language (Poole, 1993), probabilistic logic programs (PLPs) (Ngo & Haddawy, 1995), stochastic logic programs (S...

44 | Learning probabilities for noisy first-order rules
- Koller, Pfeffer
Citation Context: ...meters remains small. In previous work, Koller and Pfeffer developed an expectation maximization (EM) algorithm for learning in the presence of combining rules and missing data in relational context (Koller and Pfeffer, 1997). Kersting and DeRaedt implemented a gradient descent algorithm for the same (Kersting and De Raedt, 2001). In this paper, we generalize and extend the above work to multi-level combining rules. The ...

30 | BLOG: Relational modeling with unknown objects
- Milch, Marthi, et al.
- 2004
Citation Context: ...002), probabilistic horn abduction language (Poole, 1993), probabilistic logic programs (PLPs) (Ngo & Haddawy, 1995), stochastic logic programs (SLPs) (Muggleton, 1996), Bayesian logic (BLOG) models (Milch et al., 2004), relational Bayesian networks (RBNs) (Jaeger, 1997)...

30 | Probabilistic logic programming and Bayesian networks
- Ngo, Haddawy
- 1995
Citation Context: ...elationship (DAPER) models (Heckerman et al., 2004), Bayesian logic programs (BLPs) (Kersting & Raedt, 2002), probabilistic horn abduction language (Poole, 1993), probabilistic logic programs (PLPs) (Ngo & Haddawy, 1995), stochastic logic programs (SLPs) (Muggleton, 1996), Bayesian logic (BLOG) models (Milch et al., 2004), relational Bayesian ...

25 | Logical bayesian networks and their relation to other probabilistic logical models - Fierens, Blockeel, et al. - 2005

24 | Adaptive Bayesian logic programs
- Kersting, Raedt
- 2001
Citation Context: ... algorithm for learning in the presence of combining rules and missing data in relational context (Koller & Pfeffer, 1997). Kersting and DeRaedt implemented a gradient descent algorithm for the same (Kersting & Raedt, 2001). In this paper, we generalize and extend the above work to weighted combinations of individual combining rules. We present algorithms based on both gradient descent and EM. The algorithms are tested...

21 | Efficient computation for the noisy MAX - Díez, Galán - 2003

18 | Learning from sparse data by exploiting monotonicity constraints
- Altendorf, Restificar, et al.
- 2005
Citation Context: ...though in this paper we do not learn with these constraints, we have well-defined semantics of the constraints in FOCIL and learning algorithms for propositional models with monotonicity constraints (Altendorf et al., 2005). 2.2 Combining Rules: The following example illustrates the multiple-parent problem described in the introduction. Consider an intelligent desktop assistant that must predict the folder of a document...

16 | Basic principles of learning Bayesian logic programs, in: Probabilistic Inductive Logic Programming - Theory and Applications
- Kersting, Raedt
- 2008
Citation Context: ...ced including probabilistic relational models (PRMs) (Getoor et al., 2001), directed acyclic probabilistic entity-relationship (DAPER) models (Heckerman et al., 2004), Bayesian logic programs (BLPs) (Kersting & Raedt, 2002), probabilistic horn abduction language (Poole, 1993), probabilistic logic programs (PLPs) (Ngo & Haddawy, 1995), stochastic logic programs (SLPs) (Muggleton, 1996), Bayesian logic (BLOG) models (Mil...

15 | Prl: A probabilistic relational language
- Getoor, Grant
- 2006
Citation Context: ...as: bt(X) | bt(M), bt(F) ← Mother(M,X), Father(F,X). More recently Getoor and Grant proposed the formalism of Probabilistic Relational Language (PRL) (Getoor and Grant, 2006). The main motivation behind this work is to represent the original work on probabilistic relational models (PRMs) (Getoor et al., 2001) in logical notation. While PRMs exclusively use aggregators to...

13 | Parameter learning for relational Bayesian networks
- Jaeger
- 2007
Citation Context: ...ameters of the rules. More recently Jaeger considered a weighted combination or a nested combination of the combining rules and used a gradient ascent algorithm for optimizing the objective function (Jaeger, 2007). This technique has been applied to his formalism of Relational Bayesian Networks (RBNs) (Jaeger, 1997). The RBN is compiled into a likelihood graph that is then used for the various computations that...

7 | Learning probabilistic relational models. Invited contribution to the book Relational Data
- Getoor, Friedman, et al.
- 2001
Citation Context: ...l data have led to the development of algorithms for learning in first-order relational probabilistic models. Several formalisms have been introduced including probabilistic relational models (PRMs) (Getoor et al., 2001), directed acyclic probabilistic entity-relationship (DAPER) models (Heckerman et al., 2004), Bayesian logic programs (BLPs) (Kersting & Raedt, 2002), probabilistic horn abduction language (Poole, 19...

7 | A relational hierarchical model for decision-theoretic assistance
- Natarajan, Tadepalli, et al.
- 2007
Citation Context: ...guages. Extending the SRL languages to dynamic domains with actions and utility makes them much more appropriate for compelling real-world applications. Some work has already begun in this direction (Natarajan et al., 2007). 6. Acknowledgement: The authors gratefully acknowledge support of the Defense Advanced Research Projects Agency under DARPA grant HR0011-04-1-0005. Views and conclusions contained in this document a...

4 | Adaptive Bayesian logic programs. Inductive Logic Programming - Kersting, De Raedt - 2001

2 | Learning probabilities for noisy first-order rules. In: IJCAI
- Koller, Pfeffer
- 1997
Citation Context: ...meters remains small. In previous work, Koller and Pfeffer developed an expectation maximization (EM) algorithm for learning in the presence of combining rules and missing data in relational context (Koller & Pfeffer, 1997). Kersting and DeRaedt implemented a gradient descent algorithm for the same (Kersting & Raedt, 2001). In this paper, we generalize and extend the above work to weighted combinations of individual co...

1 | Maximum likelihood from incomplete data via the EM algorithm
- Dempster, Laird, et al.
- 1977
Citation Context: ...er the final value of y is a 0 or a 1. 3.6 Expectation-Maximization for Weighted Mean: Expectation-Maximization (EM) is a popular method to compute maximum likelihood estimates given incomplete data (Dempster et al., 1977). EM iteratively performs two steps: the Expectation step, where the algorithm computes the expected values of the missing data based on the current parameters, and the Maximization step, where the m...

1 | Conditional random fields: Probabilistic models for segmenting and labeling sequence data - Lafferty, McCallum, et al. - 2001