Results 1  10
of
100
Discovery of frequent Datalog patterns
, 1999
"... Discovery of frequent patterns has been studied in a variety of data mining settings. In its simplest form, known from association rule mining, the task is to discover all frequent itemsets, i.e., all combinations of items that are found in a sufficient number of examples. The fundamental task of as ..."
Abstract

Cited by 128 (9 self)
 Add to MetaCart
Discovery of frequent patterns has been studied in a variety of data mining settings. In its simplest form, known from association rule mining, the task is to discover all frequent itemsets, i.e., all combinations of items that are found in a sufficient number of examples. The fundamental task of association rule and frequent set discovery has been extended in various directions, allowing more useful patterns to be discovered with special purpose algorithms. We present Warmr, a general purpose inductive logic programming algorithm that addresses frequent query discovery: a very general Datalog formulation of the frequent pattern discovery problem.
Finding Frequent Substructures in Chemical Compounds
, 1998
"... The discovery of the relationships between chemical structure and biological function is central to biological science and medicine. In this paper we apply data mining to the problem of predicting chemical carcinogenicity. This toxicology application was launched at IJCAI'97 as a research challenge ..."
Abstract

Cited by 116 (9 self)
 Add to MetaCart
The discovery of the relationships between chemical structure and biological function is central to biological science and medicine. In this paper we apply data mining to the problem of predicting chemical carcinogenicity. This toxicology application was launched at IJCAI'97 as a research challenge for artificial intelligence. Our approach to the problem is descriptive rather than based on classification; the goal being to find common substructures and properties in chemical compounds, and in this way to contribute to scientific insight. This approach contrasts with previous machine learning research on this problem, which has mainly concentrated on predicting the toxicity of unknown chemicals. Our contribution to the field of data mining is the ability to discover useful frequent patterns that are beyond the complexity of association rules or their known variants. This is vital to the problem, which requires the discovery of patterns that are out of the reach of simple transformations...
MultiInstance Kernels
 In Proc. 19th International Conf. on Machine Learning
, 2002
"... Learning from structured data is becoming increasingly important. However, most prior work on kernel methods has focused on learning from attributevalue data. Only recently, research started investigating kernels for structured data. This paper considers kernels for multiinstance problems  a cla ..."
Abstract

Cited by 112 (3 self)
 Add to MetaCart
Learning from structured data is becoming increasingly important. However, most prior work on kernel methods has focused on learning from attributevalue data. Only recently, research started investigating kernels for structured data. This paper considers kernels for multiinstance problems  a class of concepts on individuals represented by sets. The main result of this paper is a kernel on multiinstance data that can be shown to separate positive and negative sets under natural assumptions. This kernel compares favorably with state of the art multiinstance learning algorithms in an empirical study. Finally, we give some concluding remarks and propose future work that might further improve the results.
Relational Reinforcement Learning
, 2001
"... Relational reinforcement learning is presented, a learning technique that combines reinforcement learning with relational learning or inductive logic programming. Due to the use of a more expressive representation language to represent states, actions and Qfunctions, relational reinforcement learni ..."
Abstract

Cited by 102 (6 self)
 Add to MetaCart
Relational reinforcement learning is presented, a learning technique that combines reinforcement learning with relational learning or inductive logic programming. Due to the use of a more expressive representation language to represent states, actions and Qfunctions, relational reinforcement learning can be potentially applied to a new range of learning tasks. One such task that we investigate is planning in the blocks world, where it is assumed that the effects of the actions are unknown to the agent and the agent has to learn a policy. Within this simple domain we show that relational reinforcement learning solves some existing problems with reinforcement from specific goals pursued and to exploit the results of previous learning phases when addressing new (more complex) situations.
Topdown induction of clustering trees
 In 15th Int’l Conf. on Machine Learning
, 1998
"... An approach to clustering is presented that adapts the basic topdown induction of decision trees method towards clustering. To this aim, it employs the principles of instance based learning. The resulting methodology is implemented in the TIC (Top down Induction of Clustering trees) system for firs ..."
Abstract

Cited by 99 (22 self)
 Add to MetaCart
An approach to clustering is presented that adapts the basic topdown induction of decision trees method towards clustering. To this aim, it employs the principles of instance based learning. The resulting methodology is implemented in the TIC (Top down Induction of Clustering trees) system for first order clustering. The TIC system employs the first order logical decision tree representation of the inductive logic programming system Tilde. Various experiments with TIC are presented, in both propositional and relational domains. 1
Solving the multipleinstance problem: A lazy learning approach
 In Proc. 17th International Conf. on Machine Learning
, 2000
"... As opposed to traditional supervised learning, multipleinstance learning concerns the problem of classifying a bag of instances, given bags that are labeled by a teacher as being overall positive or negative. Current research mainly concentrates on adapting traditional concept learning to solve thi ..."
Abstract

Cited by 70 (3 self)
 Add to MetaCart
As opposed to traditional supervised learning, multipleinstance learning concerns the problem of classifying a bag of instances, given bags that are labeled by a teacher as being overall positive or negative. Current research mainly concentrates on adapting traditional concept learning to solve this problem. In this paper we investigate the use of lazy learning and Hausdorff distance to approach the multipleinstance problem. We present two variants of the Knearest neighbor algorithm, called BayesianKNN and CitationKNN, solving the multipleinstance problem. Experiments on the Drug discovery benchmark data show that both algorithms are competitive with the best ones conceived in the concept learning framework. Further work includes exploring of a combination of lazy and eager multipleinstance problem classifiers. 1.
Scaling up inductive logic programming by learning from interpretations. Data Mining and Knowledge Discovery
 Data Mining and Knowledge Discovery
, 1999
"... Abstract. When comparing inductive logic programming (ILP) and attributevalue learning techniques, there is a tradeoff between expressive power and efficiency. Inductive logic programming techniques are typically more expressive but also less efficient. Therefore, the data sets handled by current ..."
Abstract

Cited by 41 (14 self)
 Add to MetaCart
Abstract. When comparing inductive logic programming (ILP) and attributevalue learning techniques, there is a tradeoff between expressive power and efficiency. Inductive logic programming techniques are typically more expressive but also less efficient. Therefore, the data sets handled by current inductive logic programming systems are small according to general standards within the data mining community. The main source of inefficiency lies in the assumption that several examples may be related to each other, so they cannot be handled independently. Within the learning from interpretations framework for inductive logic programming this assumption is unnecessary, which allows to scale up existing ILP algorithms. In this paper we explain this learning setting in the context of relational databases. We relate the setting to propositional data mining and to the classical ILP setting, and show that learning from interpretations corresponds to learning from multiple relations and thus extends the expressiveness of propositional learning, while maintaining its efficiency to a large extent (which is not the case in the classical ILP setting). As a case study, we present two alternative implementations of the ILP system Tilde (Topdown Induction of Logical DEcision trees): Tildeclassic, which loads all data in main memory, and TildeLDS, which loads the examples one by one. We experimentally compare the implementations, showing TildeLDS can handle large data sets (in the order of 100,000 examples or 100 MB) and indeed scales up linearly in the number of examples.
Discovery of Relational Association Rules
 Relational data mining
, 2000
"... Within KDD, the discovery of frequent patterns has been studied in a variety of settings. In its simplest form, known from association rule mining, the task is to discover all frequent item sets, i.e., all combinations of items that are found in a sufficient number of examples. ..."
Abstract

Cited by 34 (1 self)
 Add to MetaCart
Within KDD, the discovery of frequent patterns has been studied in a variety of settings. In its simplest form, known from association rule mining, the task is to discover all frequent item sets, i.e., all combinations of items that are found in a sufficient number of examples.
A Framework for Defining Distances Between FirstOrder Logic Objects
, 1998
"... this paper we develop a framework for distances between clauses and distances between models. The framework can be parametrised by a measure for the distance between atoms. It takes into account subterms common to distinct atoms of a set of atoms in the measurement of the distance between sets. More ..."
Abstract

Cited by 30 (3 self)
 Add to MetaCart
this paper we develop a framework for distances between clauses and distances between models. The framework can be parametrised by a measure for the distance between atoms. It takes into account subterms common to distinct atoms of a set of atoms in the measurement of the distance between sets. Moreover, for a constant number of variables, the complexity of the distance computation is polynomially bounded by the size of the objects. Initial experiments show that the framework can be the basis of good clustering algorithms. The framework consists of three levels: At the first level one chooses a distance between atoms . The second level upgrades this distance to a distance between sets of atoms. We propose a framework that is a generalisation of three polynomial time computable similarity measures proposed by Eiter and Mannila, and an instance which is a real distance function, computable in polynomial time. We develop also a binary prototype function for sets of points. Prototype fun
Experiments in Predicting Biodegradability
 Applied Artificial Intelligence
, 1999
"... . We present a novel application of inductive logic programming (ILP) in the area of quantitative structureactivity relationships (QSARs). The activity we want to predict is the biodegradability of chemical compounds in water. In particular, the target variable is the halflife in water for aer ..."
Abstract

Cited by 25 (9 self)
 Add to MetaCart
. We present a novel application of inductive logic programming (ILP) in the area of quantitative structureactivity relationships (QSARs). The activity we want to predict is the biodegradability of chemical compounds in water. In particular, the target variable is the halflife in water for aerobic aqueous biodegradation. Structural descriptions of chemicals in terms of atoms and bonds are derived from the chemicals' SMILES encodings. Definition of substructures are used as background knowledge. Predicting biodegradability is essentially a regression problem, but we also consider a discretized version of the target variable. We thus employ a number of relational classification and regression methods on the relational representation and compare these to propositional methods applied to different propositionalisations of the problem. Some expert comments on the induced theories are also given. 1 Introduction The persistence of chemicals in the environment (or to environmen...