## RSD: Relational subgroup discovery through first-order feature construction (2002)

Venue: 12th International Conference on Inductive Logic Programming

Citations: 24 (7 self)

### BibTeX

@INPROCEEDINGS{Lavrac02rsd:relational,

author = {Nada Lavrac and Filip Zelezny and Peter Flach},

title = {RSD: Relational subgroup discovery through first-order feature construction},

booktitle = {12th International Conference on Inductive Logic Programming},

year = {2002},

pages = {149--165},

publisher = {Springer}

}

### Abstract

Relational rule learning is typically used to solve classification and prediction tasks. However, it can also be adapted to subgroup discovery. This paper proposes a propositionalization approach to relational subgroup discovery, achieved by appropriately adapting rule learning and first-order feature construction.

### Citations

751 | The CN2 induction algorithm
- Clark, Niblett
- 1989
Citation Context ... subgroups of a given target class. This property indicates that rule learning may be an appropriate approach for solving the task. However, we argue that standard propositional rule learning [6,16] and relational rule learning algorithms [19] are unsuitable for subgroup discovery. The main drawback is the use of the covering algorithm for rule set construction. Only the first few rules induced ...

628 | Inverse entailment and Progol
- Muggleton
Citation Context ...s. Motivated by the need to easily recycle language-bias declarations already present for numerous ILP problems, RSD accepts declarations very similar to those used by the systems Aleph [2] or Progol [17], including variable typing, moding, setting a recall parameter, etc., used to syntactically constrain the set of possible features. For example, a structural predicate declaration in the well-known dom...

473 | Fast discovery of association rules
- Agrawal, Mannila, et al.
- 1996
Citation Context ...application. 1 Introduction Developments in descriptive induction have recently gained much attention. These involve mining of association rules (e.g., the APRIORI association rule learning algorithm [1]), subgroup discovery (e.g., the MIDOS subgroup discovery algorithm [22]), symbolic clustering and other approaches to non-classificatory induction. The methodology presented in this paper can be appl...

375 | Learning decision lists
- Rivest
- 1987
Citation Context ...(p(H|B) − p(H)). 2.5 Probabilistic Classification The induced rules can be ordered or unordered. Ordered rules are interpreted as a decision list [20] in a straightforward manner: when classifying a new example, the rules are sequentially tried and the first rule that covers the example is used for prediction. In the case of unordered rule sets, t...
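The decision-list interpretation described in this context can be sketched in a few lines of Python (an illustrative fragment, not RSD's implementation; representing a rule as a (body-predicate, class) pair is our assumption):

```python
def classify(decision_list, default_class, example):
    """Ordered rules: try them in sequence; the first rule whose
    body covers the example predicts its class."""
    for covers, predicted_class in decision_list:
        if covers(example):
            return predicted_class
    return default_class  # no rule fired

# Hypothetical rules over boolean features f1, f2:
rules = [
    (lambda e: e["f1"] and e["f2"], "east"),
    (lambda e: e["f1"], "west"),
]
print(classify(rules, "east", {"f1": True, "f2": False}))  # prints west
```

Note that rule order matters: swapping the two rules above would make the more general `f1` rule shadow the more specific one.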

325 | Rule induction with CN2: Some recent improvements
- Clark, Boswell
- 1991
Citation Context ... body B is satisfied: Acc(H ← B) = p(H|B). The accuracy measure can be replaced by the weighted relative accuracy, defined in Equation 1. Furthermore, different probability estimates, like the Laplace [4] or the m-estimate [3,7], can be used for estimating the above probability and the probabilities in Equation 1. Additionally, a rule learner can apply a signific...
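The Laplace and m-estimate mentioned here are simple corrections to the relative frequency p(H|B); a minimal sketch, with the formulas as commonly stated in the cited literature (function and argument names are ours):

```python
def laplace_estimate(pos, neg, k=2):
    """Laplace estimate of p(H|B) for a k-class problem:
    (pos + 1) / (pos + neg + k)."""
    return (pos + 1) / (pos + neg + k)

def m_estimate(pos, neg, prior, m):
    """m-estimate: shifts the relative frequency toward the prior p(H),
    with m acting as a virtual sample size."""
    return (pos + m * prior) / (pos + neg + m)

# A rule covering 8 positive and 2 negative examples:
print(laplace_estimate(8, 2))            # 9/12 = 0.75
print(m_estimate(8, 2, prior=0.5, m=4))  # 10/14 ≈ 0.714
```

Both estimates pull extreme frequencies (e.g. 1/1 = 1.0) toward less optimistic values, which matters for rules covering few examples.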

318 | The multi-purpose incremental learning system AQ15 and its testing application to three medical domains
- Michalski, Mozetic, et al.
- 1986
Citation Context ... subgroups of a given target class. This property indicates that rule learning may be an appropriate approach for solving the task. However, we argue that standard propositional rule learning [6,16] and relational rule learning algorithms [19] are unsuitable for subgroup discovery. The main drawback is the use of the covering algorithm for rule set construction. Only the first few rules induced ...

256 | Robust classification for imprecise environments
- Provost, Fawcett
- 2001
Citation Context ...from those of the complete data-set. The subgroups are identified by conjunctions of symbols of pre-generated first-order features. As a by-product, RSD also provides a file containing the mentioned set of features and offers to export a single relation (as a text file) with rows corresponding to individuals... (Footnote 1: ROC stands for Receiver Operating Characteristic [11,18].)

169 | Estimating probabilities: A crucial task in machine learning
- Cestnik
- 1990
Citation Context ...Acc(H ← B) = p(H|B). The accuracy measure can be replaced by the weighted relative accuracy, defined in Equation 1. Furthermore, different probability estimates, like the Laplace [4] or the m-estimate [3,7], can be used for estimating the above probability and the probabilities in Equation 1. Additionally, a rule learner can apply a significance test to the induced...

139 | An algorithm for multi-relational discovery of subgroups
- Wrobel
- 1997
Citation Context ...recently gained much attention. These involve mining of association rules (e.g., the APRIORI association rule learning algorithm [1]), subgroup discovery (e.g., the MIDOS subgroup discovery algorithm [22]), symbolic clustering and other approaches to non-classificatory induction. The methodology presented in this paper can be applied to relational subgroup discovery. As in the MIDOS approach, a subgro...

101 | Propositionalization Approaches to Relational Data Mining
- Kramer, Lavrac, et al.
- 2001
Citation Context ...ibute-value or a class-value to each individual. It is however also possible that individuals are represented by tuples of variables. In our approach to first-order feature construction, described in [9,12,10], local variables referring to parts of individuals are introduced by so-called structural predicates. The only place where nondeterminacy can occur in individual-centered representations is in structu...

92 | Learning decision trees using the area under the ROC curve
- Ferri, Flach, et al.
Citation Context ...e case of ties, we make the appropriate number of steps up and to the right at once, drawing a diagonal line segment. (Footnote 5: A description of this method applied to decision tree induction can be found in [8].) Table 1 (Basic properties of the experimental data): Domain / Individual / No. of examples / No. of classes — KRK / KRK position / 1000 / 2; Trains / Train / 20 / 2; Telecommunicatio...

67 | Rule Evaluation Measures: A Unifying View
- Lavrac, Flach, et al.
- 1999
Citation Context ...d examples (true positives). We use p(H.B) etc. for the corresponding probabilities. We then have that rule accuracy can be expressed as Acc(H ← B) = p(H|B) = p(H.B)/p(B). Weighted relative accuracy [14,21], a reformulation of one of the heuristics used in MIDOS [22], is defined as follows: WRAcc(H ← B) = p(B)·(p(H|B) − p(H)). (1) Weighted relative accuracy consists of two components: generality p(B), an...
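The weighted relative accuracy of Equation 1 is straightforward to compute from coverage counts; a minimal sketch (function and argument names are ours):

```python
def wracc(covered_pos, covered, pos, total):
    """Weighted relative accuracy of a rule H <- B (Equation 1):
    WRAcc = p(B) * (p(H|B) - p(H))."""
    p_b = covered / total                 # generality p(B)
    p_h_given_b = covered_pos / covered   # rule accuracy p(H|B)
    p_h = pos / total                     # prior probability of the target class
    return p_b * (p_h_given_b - p_h)

# A rule covering 40 of 100 examples, 30 of them positive,
# in a data set with 50 positives overall:
print(wracc(30, 40, 50, 100))  # 0.4 * (0.75 - 0.5) = 0.1
```

The generality factor p(B) is what distinguishes WRAcc from plain accuracy: a highly accurate rule covering almost no examples scores near zero, which suits subgroup discovery.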

63 | Feature construction with inductive logic programming: A study of quantitative predictions of biological activity aided by structural attributes
- Srinivasan, King
- 1999
Citation Context ...ifying Features. Motivated by the need to easily recycle language-bias declarations already present for numerous ILP problems, RSD accepts declarations very similar to those used by the systems Aleph [2] or Progol [17], including variable typing, moding, setting a recall parameter, etc., used to syntactically constrain the set of possible features. For example, a structural predicate declaration in the...

42 | 1BC: A first-order Bayesian classifier
- Flach, Lachiche
- 1999
Citation Context ...ibute-value or a class-value to each individual. It is however also possible that individuals are represented by tuples of variables. In our approach to first-order feature construction, described in [9,12,10], local variables referring to parts of individuals are introduced by so-called structural predicates. The only place where nondeterminacy can occur in individual-centered representations is in structu...

40 | Induction in noisy domains
- Clark, Niblett
- 1987
Citation Context ...s of two main procedures: the search procedure that performs search in order to find a single rule, and the control procedure that repeatedly executes the search. In the propositional rule learner CN2 [5,6], for instance, the search procedure performs beam search using classification accuracy of the rule as a heuristic function. The accuracy of rule H ← B is equal to the conditional probability of head ...
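The two procedures described here combine into the classical covering loop; a minimal sketch (passing the search procedure in as a function and representing rules as predicates over examples are both our assumptions):

```python
def covering(examples, find_best_rule):
    """Control procedure: repeatedly run the search procedure,
    then remove the examples covered by the rule it returns."""
    rules, remaining = [], list(examples)
    while remaining:
        rule = find_best_rule(remaining)  # search procedure, e.g. beam search as in CN2
        if rule is None:
            break  # no acceptable rule found for the remaining examples
        rules.append(rule)
        remaining = [e for e in remaining if not rule(e)]
    return rules
```

Because each rule is searched for only on the examples left uncovered by its predecessors, later rules describe ever smaller fragments of the data — the drawback for subgroup discovery that the contexts above point out.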

38 | An Extended Transformation Approach to Inductive Logic Programming
- Lavrač, Flach
- 1999
Citation Context ...ibute-value or a class-value to each individual. It is however also possible that individuals are represented by tuples of variables. In our approach to first-order feature construction, described in [9,12,10], local variables referring to parts of individuals are introduced by so-called structural predicates. The only place where nondeterminacy can occur in individual-centered representations is in structu...

25 | Predictive performance of weighted relative accuracy
- Todorovski, Flach, et al.
- 2000
Citation Context ...d examples (true positives). We use p(H.B) etc. for the corresponding probabilities. We then have that rule accuracy can be expressed as Acc(H ← B) = p(H|B) = p(H.B)/p(B). Weighted relative accuracy [14,21], a reformulation of one of the heuristics used in MIDOS [22], is defined as follows: WRAcc(H ← B) = p(B)·(p(H|B) − p(H)). (1) Weighted relative accuracy consists of two components: generality p(B), an...

14 | Using the m-estimate in rule induction
- Džeroski, Cestnik, et al.
- 1993

14 | A study of relevance for learning in deductive databases
- Lavrac, Gamberger, et al.
- 1999
Citation Context ... complete and consistent hypothesis H′ = H(E, L′), built from the feature set L′ = L \ {l′} that excludes l′. This theorem is the basis of an irrelevant feature elimination algorithm proposed in [15]. Note that usually the term feature is used to denote a positive literal (or a conjunction of positive literals; let us, for the simplicity of the arguments below, assume that a feature is a single p...

13 | Analysing and improving the diagnosis of ischaemic heart disease with machine learning
- Kukar, Kononenko, et al.
- 1999
Citation Context ...from those of the complete data-set. The subgroups are identified by conjunctions of symbols of pre-generated first-order features. As a by-product, RSD also provides a file containing the mentioned set of features and offers to export a single relation (as a text file) with rows corresponding to individuals... (Footnote 1: ROC stands for Receiver Operating Characteristic [11,18].)

6 | A learning system for decision support in telecommunications
- Železný, O
- 2002
Citation Context ...opular ILP data sets: the King-Rook-King illegal chess endgame positions (KRK) and East-West trains. We applied RSD also to a real-life problem in telecommunications. The data (described in detail in [23]) represent incoming calls to an enterprise, which were transferred to a particular person by the telephone receptionist. The company has two rather independent divisions (‘datacomm’, ‘telecomm’) and ...

1 | Learning logical definitions from relations
- Quinlan
- 1990
Citation Context ...s property indicates that rule learning may be an appropriate approach for solving the task. However, we argue that standard propositional rule learning [6,16] and relational rule learning algorithms [19] are unsuitable for subgroup discovery. The main drawback is the use of the covering algorithm for rule set construction. Only the first few rules induced by a covering algorithm may be of interest as...