## Scalability and efficiency in multi-relational data mining

### BibTeX

@MISC{Blockeel_scalabilityand,

author = {Hendrik Blockeel and Michele Sebag},

title = {Scalability and efficiency in multi-relational data mining},

year = {}

}

### Citations

3179 | Genetic programming: on the programming of computers by means of natural selection - Koza - 1992 |

2060 | Genetic algorithms in search, optimization and machine learning - Goldberg - 1989 |

Citation context: ...rantees of optimality. A third strategy implements the stochastic, population-based exploration of the hypothesis space. This strategy is that of evolutionary computation and genetic algorithms (GAs) [31; 7], which crudely mimic the Darwinian principle of survival of the fittest. During each generation, candidate hypotheses are generated by randomly perturbing the current pool of hypotheses; the resulting ...

939 | Evolutionary Algorithms in Theory and Practice - Bäck - 1996 |

Citation context: ...rantees of optimality. A third strategy implements the stochastic, population-based exploration of the hypothesis space. This strategy is that of evolutionary computation and genetic algorithms (GAs) [31; 7], which crudely mimic the Darwinian principle of survival of the fittest. During each generation, candidate hypotheses are generated by randomly perturbing the current pool of hypotheses; the resulting ...

763 | Genetic Programming II: Automatic Discovery of Reusable Programs - Koza - 1994 |

613 | Where the really hard problems are - Cheeseman, Kanefsky, et al. - 1991 |

Citation context: ... θ-subsumption testing and logical querying are equivalent to constraint satisfaction problems [33]. In CSPs, another framework for analyzing the computational complexity has appeared in the nineties [17]. Contrasting with average- and worst-case analysis, this novel framework handles complexity as a random variable depending on the order parameters of complexity (e.g., constraint density and tightnes...

505 | Fast discovery of association rules - Agrawal, Mannila, et al. - 1996 |

Citation context: ...]. In each step, some candidate hypotheses are generated from the current hypotheses using so-called refinement operators. For instance, the construction of Lk+1 candidates from the Lk ones in Apriori [1], constitutes a refinement operator. Along the same lines, many refinement operators in ILP proceed by adding or removing a literal from the current hypothesis. Besides limiting the hypothesis space thr...

361 | Genetic Programming – An introduction; On the automatic evolution of computer programs and its applications - Banzhaf, Francone, et al. - 1998 |

Citation context: ...e precisely Genetic Programming [43]. Genetic Programming extends the principles of genetic algorithms to tree-structured search space, and was specifically designed for optimization in program spaces [44; 8]. It has been used to explore Horn clauses and context-free grammar spaces [84; 68]. It also allows for direct exploration of higher order logic languages, such as Escher [49; 39]. Interestingly, efficie...

272 | Tabled evaluation with delaying for general logic programs - Chen, Warren - 1996 |

Citation context: ...ystems can be applied in practice. We add that current Prolog technology includes methods for storing intermediate results at the fact level (instead of the predicate level): this is known as tabling [18]. Tabling can be considered a lazy version of the materialization mentioned above, and might in some cases be preferable over materialization of complete predicates. The main problem remains the choic...

214 | Query Optimization in Database Systems - Jarke, Koch, et al. - 1984 |

Citation context: ...f relational algebra operations is a well-known method for improving the efficiency of a computation. For instance, when applying consecutive selections it is useful to apply the most selective ones first [38]. Similar techniques can be used to improve the efficiency of clause-example matching [78]. Note, however, an important difference between query execution in relational or deductive databases and in Prolo...

213 | Solving the multiple-instance problem with axis-parallel rectangles - Dietterich, Lathrop, et al. - 1997 |

Citation context: ...stive survey of all approaches beyond the propositional ones in machine learning and data mining. For instance, we did not detail the Multiple Instance Problem paradigm introduced by Dietterich et al. [24; 54] which has been analyzed as intermediate between propositional and fully relational settings [19]. Description logics [16] also constitutes a setting most relevant to data mining in particular relatio...

150 | On the relative expressiveness of description logics and predicate logics. AIJ - Borgida - 1996 |

135 | Discovery of frequent datalog patterns - Dehaspe, Toivonen - 1999 |

Citation context: ... built and converted into predictive rules, and trend analysis was performed on frequent relational patterns found by the ACE-ilProlog implementation of the first order association rule algorithm Warmr [23]. ACE-ilProlog incorporates many of the techniques mentioned here (query packs, data sampling, ...). In a different investigation [79], the same ACE-ilProlog system was applied to a 194MB database o...

132 | Top-down induction of first order logical decision trees - Blockeel, De Raedt - 1998 |

103 | Propositionalization approaches to relational data mining - Kramer, Lavrač, et al. - 2000 |

97 | Controlling the complexity of learning in logic through syntactic and task-oriented models - Kietz, Wrobel - 1992 |

86 | Molecular feature mining in HIV data - Kramer, Raedt, et al. - 2001 |

77 | The Complexity of Acyclic Conjunctive Queries - Gottlob, Leone, et al. - 2001 |

Citation context: ... and is independent from the extension of the clause, will certainly succeed and need not be tested again. Another line of research examines the case of acyclic conjunctive queries. Following Gottlob [32], Horvath and Wrobel [37] discuss how efficiency gains can be obtained by considering only acyclic conjunctive queries, a relatively general subclass of queries for which the matching problem is tractabl...

66 | First order jk-clausal theories are PAC-learnable - Raedt, Džeroski - 1994 |

Citation context: ...centered" representation. The use of individual-centered representations has a number of advantages. First, it has a positive effect on the theoretical learnability of concepts. De Raedt and Dzeroski [21] have obtained positive PAC-learnability results for this setting, and this is mainly due to the assumption that patterns are searched within individuals and that the description of individuals in the...

63 | Improving the efficiency of inductive logic programming through the use of query packs - Blockeel, Dehaspe, et al. |

58 | Logical settings for concept-learning - Raedt - 1997 |

Citation context: ...ete cases (Section 8), and conclude in Section 9. 2. REPRESENTATIONAL ASPECTS A distinction is sometimes made between two paradigms in ILP: learning from entailment, and learning from interpretations [19]. Which one is used, has an effect on efficiency and scalability. This is mainly because they differ with respect to assumptions of locality of relevant information. We do not go into technical details her...

43 | Refining the phase transitions in combinatorial search - Hogg - 1996 |

Citation context: ...ission from A. Giordana and L. Saitta.) Figure 4: Percentage of successful θ-subsumption tests in plane (m, L) over 1,000 pairs (h, e), for N = 100 and n = 10. This experiment confirms the findings of CSPs [35]. The effective complexity landscape depicted in Fig. 3 shows that the θ-subsumption cost is almost always negligible, except in a narrow region termed the phase transition region. The average complex...

42 | Scaling up inductive logic programming by learning from interpretations - Blockeel, Raedt, et al. - 1999 |

Citation context: ...mber of techniques follow this approach. Alternatively, memory-wise scalability can be improved by storing data in internal memory as efficiently as possible. 6.1 Processing Data on Disk Blockeel et al. [14] describe a version of the first order decision tree induction algorithm Tilde that processes an ILP knowledge base without loading it entirely into main memory. The approach is based on the level-wise ...

32 | An efficient subsumption algorithm for inductive logic programming - Kietz, Lübbe - 1994 |

Citation context: .... As this is not possible in general (ILP systems look for connected clauses), a [SIGKDD Explorations. Volume 4, Issue 2 - page 5] relaxed version of decomposability known as k-locality has been defined [41]; the idea is to take advantage of the fact that sets of literals are independent, after the instantiation of some variables has been defined. Example 3. Consider the conjunctive query ? p(X, Y), q(Y,...

29 | A Multi-Relational Decision Tree Learning Algorithm: Implementation and Experiments - Atramentov, Leiva, et al. - 2003 |

Citation context: ... to define a pattern language, where the patterns are graphical query representations ("selection graphs"). The use of selection graphs as patterns has since then been adopted by several other authors [6; 5]. Syntactical biases are often explicitly enforced through search operators (see below). An alternative is to include type constraints into the definition of H [49; 39], and make no restriction about t...

23 | Abstraction and Phase Transitions in Relational Learning - Saitta, Zucker - 2000 |

Citation context: .... For this reason, several optimization heuristics have been developed and will be presented. Last, a theoretical study of θ-subsumption, based on the phase transition paradigm [36] has been achieved [29] and its impact on the scalability of ILP has been examined on artificial problems. These results are briefly summarized and discussed. 4.1 Logical queries and θ-subsumption A (candidate) solution is mo...

22 | An experimental evaluation of coevolutive concept learning - Anglano, Giordana, et al. - 1998 |

Citation context: ...ns might indifferently generalize or specialize the hypotheses, which makes it easier to escape from local optima. GA-based relational learning, such as implemented in Regal [28], Dogma [34] or G-Net [3], usually provides very accurate and predictively efficient hypotheses, at a high computational cost; a few hundred of generations is routinely achieved, generating a few hundred candidate hypotheses eac...

22 | Relational knowledge discovery in databases - Blockeel, Raedt - 1996 |

Citation context: ...APPROACHES The suitability of ILP to mine relational databases has been recognized early on in the history of ILP, and some research in ILP has explicitly focused on the relational database viewpoint [86; 12; 42]. As ILP uses a logical representation, which is different from but largely equivalent to a relational database representation, a natural question is how ILP systems could be adapted to work directly w...

22 | Regal: an integrated system for learning relations using genetic algorithms - Giordana, Saitta - 1993 |

Citation context: ... is that these perturbations might indifferently generalize or specialize the hypotheses, which makes it easier to escape from local optima. GA-based relational learning, such as implemented in Regal [28], Dogma [34] or G-Net [3], usually provides very accurate and predictively efficient hypotheses, at a high computational cost; a few hundred of generations is routinely achieved, generating a few hundred...

21 | Top-down induction of first order logical decision trees - Blockeel, Raedt - 1998 |

Citation context: ...We call this the subdatabase describing Jane. It is shown ... Footnote 1: More explanations and illustrations are given by De Raedt et al. [20] and a constructive definition of this subdatabase is given by Blockeel [11], p. 77-79. Tables: Student (SName, Maj, Min): (jane, math, phil.); Course (CName, Prof, Cred): (calculus, Jones, 4), (algebra, Smith, 3); Follows (SName, CName): (jane, algebra), (jane, calculus). Facts: student(jane, math, phil). course(calculus, jon...

17 | Dlab: A declarative language bias formalism - Dehaspe, Raedt - 1996 |

17 | On tractable queries and constraints - Gottlob, Leone, et al. - 1999 |

Citation context: ...added to the sub-sample), with similar results. 4.3 The phase transition barrier As mentioned earlier on, θ-subsumption testing and logical querying are equivalent to constraint satisfaction problems [33]. In CSPs, another framework for analyzing the computational complexity has appeared in the nineties [17]. Contrasting with average- and worst-case analysis, this novel framework handles complexity as...

16 | Three companions for data mining in first order logic - Raedt, Blockeel, et al. - 2001 |

11 | Analyzing relational learning in the phase transition framework - Giordana, Saitta, et al. - 2000 |

Citation context: ...y specific hypotheses (almost surely subsuming no examples). The phase transition phenomenon that is observed for the θ-subsumption test, has far reaching effects on the behavior of relational learners [30]. Comprehensive experiments on artificial learning problems first show that most learners tend to select hypotheses lying in the phase transition. In retrospect, this should have been expected since this...

11 | DOGMA: A GA-Based Relational Learner - Hekanaho - 1998 |

Citation context: ...se perturbations might indifferently generalize or specialize the hypotheses, which makes it easier to escape from local optima. GA-based relational learning, such as implemented in Regal [28], Dogma [34] or G-Net [3], usually provides very accurate and predictively efficient hypotheses, at a high computational cost; a few hundred of generations is routinely achieved, generating a few hundred candidate h...

10 | A depth controlling strategy for strongly typed evolutionary programming - Kennedy, Giraud-Carrier - 1999 |

Citation context: ...n adopted by several other authors [6; 5]. Syntactical biases are often explicitly enforced through search operators (see below). An alternative is to include type constraints into the definition of H [49; 39], and make no restriction about the search operators. 3.2 Search biases and pruning rules As mentioned in the introduction, ILP systems perform a search through a hypothesis space, generating and eval...

9 | Learning Statistical Models of Relational Data - Getoor - 2001 |

Citation context: ...es could be employed in a relational setting, with equally large efficiency gains to be expected. Some work in this category is presented by Getoor et al. [27]. They define stochastic relational models, which form a probabilistic description of a relational database based on relational Bayesian networks. Further work in this direction seems very promising. 5...

7 | A first-order representation for knowledge discovery and bayesian classification on relational data - Lachiche, Flach - 2000 |

6 | Mining Model Trees: A Multi-Relational Approach - Appice, Ceci, et al. - 2003 |

Citation context: ... to define a pattern language, where the patterns are graphical query representations ("selection graphs"). The use of selection graphs as patterns has since then been adopted by several other authors [6; 5]. Syntactical biases are often explicitly enforced through search operators (see below). An alternative is to include type constraints into the definition of H [49; 39], and make no restriction about t...

5 | Constraint-based learning of long relational concepts - Ales-Bianchetti, Rouveirol, et al. - 2002 |

Citation context: ...he target concept. These experiments on "Needle-in-the-Haystack"-like problems suggest that novel heuristics are required to learn long target concepts [75; 10]. However, these results must be taken with care, for two reasons. First of all, phase transition depicts a global behavior, and does not say anything on a particular case (meaning that simple problem...

5 | Improving the efficiency of inductive logic programming through the use of query packs - Blockeel, Dehaspe, et al. - 2002 |

Citation context: ...ne the database in its original format, the other is to reformat the database, explicating the subdatabases. The latter option is used by some ILP systems that learn from interpretations, such as ACE [15], or those that use a term-based representation [26]. Following Flach and Lachiche's terminology [26], we call these representations "individual-centered", as opposed to the original "predicate-center...

4 | 1BC: a first order Bayesian classifier - Flach, Lachiche - 1999 |

3 | Cautious induction in inductive logic programming - Anthony, Frisch - 1997 |

Citation context: ...of refinement operators per se; these properties have been studied extensively in ILP. For instance, there is no point in generating a given candidate hypothesis more than once (nonredundancy property [4]). Conversely, no potentially relevant hypothesis should be skipped (completeness property). Nienhuys-Cheng and De Wolf [62] provide theoretical foundations for ILP in which refinement operators play a...

3 | ML-SMART: A problem solver for learning from examples. Fundamenta Informaticae - Bergadano, Gemello, et al. |

Citation context: ...teria caused by the amount of data. Beam search is another search strategy; it avoids the limitations of greedy myopic search, by retaining and refining a limited number of the best current hypotheses [9]. The computational cost varies linearly with the beam width. The advantage is that a better learning robustness is obtained through beam search, though there are still no guarantees of optimality. A ...

3 | Multi-relational data mining, using UML for ILP - Knobbe, Siebes, et al. - 2000 |

Citation context: ...APPROACHES The suitability of ILP to mine relational databases has been recognized early on in the history of ILP, and some research in ILP has explicitly focused on the relational database viewpoint [86; 12; 42]. As ILP uses a logical representation, which is different from but largely equivalent to a relational database representation, a natural question is how ILP systems could be adapted to work directly w...

3 | Toward Discovery of Deep and Wide First-Order Structures: A Case Study - Horvath, Wrobel - 2001 |

2 | 1BC: a first order Bayesian classifier - Flach, Lachiche - 1999 |

Citation context: ... to reformat the database, explicating the subdatabases. The latter option is used by some ILP systems that learn from interpretations, such as ACE [15], or those that use a term-based representation [26]. Following Flach and Lachiche's terminology [26], we call these representations "individual-centered", as opposed to the original "predicate-centered" representation. The use of individual-centered r...

1 | High-performance data mining on networks of workstations - Anglano, Giordana, et al. - 1999 |

Citation context: ...tic search is intrinsically parallel (hypotheses are assessed independently of each other), the computational cost was an incentive to develop parallel implementations of GA-based relational learning [2]. The search space explored by GA-based relational learning is usually defined from a template selected by the expert, in the line of DLab-like specificat...

1 | Three companions for data mining in first order logic - Raedt, Blockeel, et al. - 2001 |

Citation context: ...w connected to Jane and therefore possibly relevant for her classification. We call this the subdatabase describing Jane. It is shown ... Footnote 1: More explanations and illustrations are given by De Raedt et al. [20] and a constructive definition of this subdatabase is given by Blockeel [11], p. 77-79. Tables: Student (SName, Maj, Min): (jane, math, phil.); Course (CName, Prof, Cred): (calculus, Jones, 4), (algebra, Smith, 3); Follows (SName, CName...

1 | Towards discovery of deep and wide structures: A case study in the domain of mutagenicity - Horvath, Wrobel - 2001 |

Citation context: ...the extension of the clause, will certainly succeed and need not be tested again. Another line of research examines the case of acyclic conjunctive queries. Following Gottlob [32], Horvath and Wrobel [37] discuss how efficiency gains can be obtained by considering only acyclic conjunctive queries, a relatively general subclass of queries for which the matching problem is tractable. Such classes of querie...

1 | Mining the UK traffic database. Internal SolEUNet report - Krzywania, Struyf, et al. - 2002 |