Results 1–10 of 22
A Perspective on Inductive Logic Programming
Abstract

Cited by 55 (8 self)
The state-of-the-art in inductive logic programming is surveyed by analyzing the approach taken by this field over the past 8 years. The analysis investigates the roles of 1) logic programming and machine learning, 2) theory, techniques and applications, and 3) various technical problems addressed within inductive logic programming.

1 Introduction

The term inductive logic programming was first coined by Stephen Muggleton in 1990 [1]. Inductive logic programming is concerned with the study of inductive machine learning within the representations offered by computational logic. Since 1991, annual international workshops have been organized [28]. This paper is an attempt to analyze the developments within this field. Particular attention is devoted to the relation between inductive logic programming and its neighboring fields such as machine learning, computational logic and data mining, and to the role that theory, techniques and implementations, and applications play. The analysis...
CrossMine: Efficient Classification Across Multiple Database Relations
 In Proc. 2004 Int. Conf. on Data Engineering (ICDE’04), Boston, MA
, 2004
Abstract

Cited by 46 (12 self)
Most of today's structured data is stored in relational databases. Such a database consists of multiple relations which are linked together conceptually via entity-relationship links in the design of relational database schemas. Multi-relational classification can be widely used in many disciplines, such as financial decision making, medical research, and geographical applications. However, most classification approaches only work on a single "flat" data relation. It is usually difficult to convert multiple relations into a single flat relation without either introducing a huge, undesirable "universal relation" or losing essential information. Previous work using Inductive Logic Programming approaches (recently also known as Relational Mining) has proven effective, with high accuracy, in multi-relational classification. Unfortunately, these approaches suffer from poor scalability w.r.t. the number of relations and the number of attributes in databases.
Statistical Relational Learning for Document Mining
, 2003
Abstract

Cited by 36 (5 self)
A major obstacle to fully integrated deployment of statistical learners is the assumption that data sits in a single table, even though most real-world databases have complex relational structures. In this paper, we introduce an integrated approach to building regression models from data stored in relational databases. Potential features are generated by structured search of the space of queries to the database, and then tested for inclusion in a logistic regression. We present experimental results for the task of predicting where scientific papers will be published based on relational data taken from CiteSeer. This data includes word counts in the document, frequently cited authors or papers, co-citations, publication venues of cited papers, word co-occurrences, and word counts in cited or citing documents. Our approach results in classification accuracies superior to those achieved when using classical "flat" features. Our classification task also serves as a "where to publish?" conference/journal recommendation task.
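The query-generated-features-plus-regression pipeline described in this abstract can be sketched in miniature. The tables, the feature queries, and the plain gradient-descent trainer below are all illustrative assumptions for the sake of a runnable example, not the authors' implementation:

```python
import math

# Hypothetical toy relational data: a "papers" table plus a "cites" relation.
papers = {1: "ml", 2: "db", 3: "ml", 4: "db"}          # paper_id -> venue (label)
cites = [(1, 3), (1, 4), (2, 4), (3, 1), (4, 2)]       # (citing, cited) pairs

def feature_vector(pid):
    """Each feature is the answer to a simple query over the relations,
    e.g. 'number of cited papers published at venue X' -- the kind of
    feature a structured search over database queries would propose."""
    cited = [c for (p, c) in cites if p == pid]
    return [
        1.0,                                           # bias term
        float(len(cited)),                             # out-degree of the paper
        float(sum(papers[c] == "ml" for c in cited)),  # cited "ml" papers
    ]

def train_logreg(X, y, lr=0.5, epochs=500):
    """Stochastic gradient ascent on the logistic log-likelihood."""
    w = [0.0] * len(X[0])
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            p = 1.0 / (1.0 + math.exp(-sum(wj * xj for wj, xj in zip(w, xi))))
            w = [wj + lr * (yi - p) * xj for wj, xj in zip(w, xi)]
    return w

X = [feature_vector(p) for p in papers]
y = [1 if v == "ml" else 0 for v in papers.values()]   # predict venue "ml"
w = train_logreg(X, y)
```

In the paper's setting each generated feature would additionally be tested for inclusion before entering the model; here all three are kept for brevity.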
Stochastic Propositionalization of Non-Determinate Background Knowledge
 Proceedings of the 8th International Conference on Inductive Logic Programming, volume 1446 of Lecture Notes in Artificial Intelligence
, 1997
Abstract

Cited by 20 (2 self)
Both propositional and relational learning algorithms require a good representation to perform well in practice. Usually such a representation is either engineered manually by domain experts or derived automatically by means of so-called constructive induction. Inductive Logic Programming (ILP) algorithms place somewhat less of a burden on the data engineering effort, as they allow for a structured, relational representation of background knowledge. In chemical and engineering domains, a common representational device for graph-like structures is the so-called non-determinate relation. Manually engineered features in such domains typically test for or count occurrences of specific substructures having specific properties. However, representations containing non-determinate relations pose a serious efficiency problem for most standard ILP algorithms. Therefore, we have devised a stochastic algorithm to automatically derive features from non-determinate background knowledge. The algorithm conduc...
DOGMA: A GA-based relational learner
 Proceedings of the 8th International Conference on Inductive Logic Programming
, 1998
Abstract

Cited by 10 (1 self)
We describe a GA-based concept learning/theory revision system, DOGMA, and discuss how it can be applied to relational learning. The search for better theories in DOGMA is guided by a novel fitness function that combines the minimal description length and information gain measures. To show the efficacy of the system we compare it to other learners in three relational domains.
Frequent query discovery: a unifying ILP approach to association rule mining
, 1998
Abstract

Cited by 8 (1 self)
Discovery of frequent patterns has been studied in a variety of data mining (DM) settings. In its simplest form, known from association rule mining, the task is to find all frequent itemsets, i.e., to list all combinations of items that are found in a sufficient number of examples. A similar task in spirit, but at the opposite end of the complexity scale, is the Inductive Logic Programming (ILP) approach, where the goal is to discover queries in first-order logic that succeed with respect to a sufficient number of examples. We discuss the relationship of ILP to frequent pattern discovery. On the one hand, our goal is to relate data mining problems to ILP; on the other hand, we want to demonstrate how ILP can be used to solve both existing and new data mining problems. The fundamental task of association rule and frequent set discovery has been extended in various directions, allowing more useful patterns to be discovered. From an ILP viewpoint, however, it can be argued that these settings ar...
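The propositional end of the spectrum this abstract describes, frequent itemset discovery, can be sketched with the classic level-wise (Apriori-style) search; the basket data below is made up for illustration:

```python
def apriori(transactions, min_support):
    """Level-wise frequent-itemset search: a set can only be frequent if
    its subsets are (the anti-monotonicity that the ILP view generalizes
    from itemsets to first-order queries)."""
    transactions = [frozenset(t) for t in transactions]
    items = sorted({i for t in transactions for i in t})

    def support(itemset):
        return sum(itemset <= t for t in transactions)

    frequent = {}
    k_sets = [frozenset([i]) for i in items]
    while k_sets:
        # keep only candidates meeting the support threshold
        k_sets = [s for s in k_sets if support(s) >= min_support]
        frequent.update({s: support(s) for s in k_sets})
        # candidate generation: join frequent k-sets into (k+1)-sets
        k_sets = list({a | b for a in k_sets for b in k_sets
                       if len(a | b) == len(a) + 1})
    return frequent

baskets = [{"bread", "milk"}, {"bread", "butter"},
           {"bread", "milk", "butter"}, {"milk"}]
freq = apriori(baskets, min_support=2)
```

In the ILP generalization, each candidate itemset becomes a first-order query and support becomes the number of examples on which the query succeeds.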
Computational Logic and Machine Learning: A roadmap for Inductive Logic Programming
 Technical Report, J. Stefan Institute
, 1998
Abstract

Cited by 6 (1 self)
Computational logic has already significantly influenced (symbolic) machine learning through the field of inductive logic programming (ILP), which is concerned with the induction of logic programs from examples and background knowledge. In ILP, the shift of attention from program synthesis to knowledge discovery resulted in advanced techniques that are practically applicable for discovering knowledge in relational databases. Machine learning, and ILP in particular, has the potential to influence computational logic by providing an application area full of industrially significant problems, thus providing a challenge for other techniques in computational logic. This paper gives a brief introduction to ILP, presents state-of-the-art ILP techniques for relational knowledge discovery as well as some research and organizational directions for further developments in this area.

1 Introduction

Inductive logic programming (ILP) [35, 39, 29] is a research area that has its backgrounds in induct...
Dimensionality Reduction in ILP: A Call To Arms
Abstract

Cited by 6 (1 self)
The recent rise of Knowledge Discovery in Databases (KDD) has underlined the need for machine learning algorithms to be able to tackle large-scale applications that are currently beyond their scope. One way to address this problem is to use techniques for reducing the dimensionality of the learning problem by reducing the hypothesis space and/or reducing the example space. While research in machine learning has devoted considerable attention to such techniques, they have so far been neglected in ILP research. The purpose of this paper is to motivate research in this area and to present some results on windowing techniques.

1 Introduction

One of the most often heard prejudices against ILP algorithms is that they are only applicable to toy problems and will not scale up to applications of significant size. While it is our firm belief that the order of magnitude of this unspecified "significant size" is monotonically increasing in order to keep the argument alive, it is nevertheless indis...
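The windowing technique mentioned in this abstract can be sketched generically: learn on a small window of examples, test the result on the remainder, fold the misclassified examples into the window, and repeat. The one-feature threshold learner below is a made-up stand-in for an ILP system, used only to keep the sketch self-contained:

```python
import random

def windowing(examples, labels, learn, initial=20, max_rounds=10):
    """Generic windowing loop: learn on a small window, then grow the
    window with the examples the current theory gets wrong, until the
    theory is consistent outside the window (or the round budget runs out)."""
    idx = list(range(len(examples)))
    random.Random(0).shuffle(idx)            # fixed seed for reproducibility
    window = set(idx[:initial])
    model = None
    for _ in range(max_rounds):
        model = learn([examples[i] for i in window],
                      [labels[i] for i in window])
        wrong = [i for i in idx
                 if i not in window and model(examples[i]) != labels[i]]
        if not wrong:
            break
        window.update(wrong)
    return model, window

# Toy learner (an assumption, not an ILP system): a threshold rule "x > t"
# fit by picking the split with the fewest training errors.
def learn_threshold(xs, ys):
    best = min(((t, sum((x > t) != y for x, y in zip(xs, ys))) for t in xs),
               key=lambda p: p[1])[0]
    return lambda x: x > best

xs = list(range(100))
ys = [x > 42 for x in xs]
model, window = windowing(xs, ys, learn_threshold, initial=10)
```

The payoff is that the learner only ever trains on the window, which typically stays far smaller than the full example set.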
A Unifying View of Knowledge Representation for Inductive Learning
 In preparation
, 2000
Abstract

Cited by 4 (4 self)
This paper provides a foundation for inductive learning based on the use of higher-order logic for knowledge representation. In particular, the paper (i) provides a systematic individuals-as-terms approach to knowledge representation for inductive learning, and demonstrates the utility of types and higher-order constructs for this purpose; (ii) gives a systematic way of constructing predicates for use in induced definitions; (iii) widens the applicability of decision-tree algorithms beyond the usual attribute-value setting to the classification of individuals with complex structure; and (iv) shows how to induce definitions which are comprehensible and have predictive power. The paper contains ten illustrative applications involving a variety of types to which a decision-tree learning system is applied. The effectiveness of the approach is further demonstrated by applying the learning system to two larger benchmark applications.

1 Introduction

Inductive learning focuses on tec...
An efficient multi-relational naive Bayesian classifier based on semantic relationship graph
 In Proceedings of the 4th International Workshop on Multi-relational Mining (MRDM ’05)
Abstract

Cited by 4 (0 self)
Classification is one of the most popular data mining tasks, with a wide range of applications, and many algorithms have been proposed to build accurate and scalable classifiers. Most of these algorithms only take a single table as input, whereas in the real world most data are stored in multiple tables and managed by relational database systems. As transferring data from multiple tables into a single one usually causes many problems, the development of multi-relational classification algorithms becomes important and attracts many researchers' interest. Existing work on extending Naïve Bayes to deal with multi-relational data either has to transform data stored in tables into main-memory Prolog facts, or limits the search space to only a small subset of real-world applications. In this work, we aim at solving these problems and building an efficient, accurate Naïve Bayesian classifier that deals with data in multiple tables directly. We propose an algorithm named GraphNB, which upgrades the Naïve Bayesian classifier to deal with multiple tables directly. In order to take advantage of linkage relationships among tables, and to treat different tables linked to the target table differently, a semantic relationship graph is developed to describe the relationships and to avoid unnecessary joins. Furthermore, to improve accuracy, a pruning strategy is given to simplify the graph and avoid examining too many weakly linked tables. Experimental study on both real-world and synthetic databases shows its high efficiency and good accuracy.
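A minimal sketch of the general idea, a naive Bayesian classifier whose features come directly from two linked tables, can be written as follows. The two-table schema, the single join, and all helper names are illustrative assumptions, not the GraphNB algorithm itself (which additionally builds and prunes a semantic relationship graph over many tables):

```python
from collections import Counter, defaultdict

# Hypothetical schema: a target table plus one linked table, joined on
# customer id -- a minimal stand-in for the joins that a semantic
# relationship graph would select as worth following.
customers = {                      # id -> (region, label)
    1: ("north", "good"), 2: ("north", "bad"),
    3: ("south", "good"), 4: ("south", "good"),
}
orders = [(1, "book"), (1, "book"), (2, "toy"),
          (3, "book"), (4, "book"), (4, "toy")]   # (customer_id, item)

def features(cid):
    """Attributes from the target table plus an aggregate over the join."""
    region, _ = customers[cid]
    bought = [item for c, item in orders if c == cid]
    return {"region": region,
            "buys_books": "yes" if "book" in bought else "no"}

def train_nb(ids):
    prior = Counter(customers[c][1] for c in ids)
    counts = defaultdict(Counter)            # (attr, label) -> value counts
    for c in ids:
        label = customers[c][1]
        for attr, val in features(c).items():
            counts[(attr, label)][val] += 1
    return prior, counts

def predict(cid, prior, counts):
    scores = {}
    for label, n in prior.items():
        score = n / sum(prior.values())
        for attr, val in features(cid).items():
            # Laplace-smoothed conditional (each attribute here is binary,
            # hence the +2 in the denominator)
            score *= (counts[(attr, label)][val] + 1) / (n + 2)
        scores[label] = score
    return max(scores, key=scores.get)

prior, counts = train_nb(customers)
```

The point of the relationship-graph machinery is deciding which linked tables contribute features like `buys_books` at all; here that decision is hard-coded.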