Using Bayesian Classifiers to Combine Rules
In Working Notes of MRDM04, 2004
Abstract

Cited by 8 (4 self)
Abstract. One of the most popular techniques for multirelational data mining is Inductive Logic Programming (ILP). Given a set of positive and negative examples, an ILP system ideally finds a logical description of the underlying data model that discriminates the positive examples from the negative examples. However, in multirelational data mining, one often has to deal with erroneous and missing information. ILP systems can still be useful by generating rules that capture the main relationships in the system. An important question is how to combine these rules to form an accurate classifier. An interesting approach to this problem is to use Bayes Net based classifiers. We compare Naïve Bayes, Tree Augmented Naïve Bayes (TAN) and the Sparse Candidate algorithm to a voting classifier. We also show that a full classifier can be implemented as a CLP(BN) program [14], giving some insight on how to pursue further improvements.
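The rule-combination idea in this abstract can be sketched in Python: treat each learned ILP rule as a boolean feature (does the rule fire on an example?) and train a Bernoulli Naïve Bayes classifier over those firings rather than taking a simple majority vote. This is an illustrative sketch under assumed data layout and Laplace smoothing, not the paper's implementation (which also considers TAN and the Sparse Candidate algorithm).

```python
import math
from collections import defaultdict

def train_rule_nb(rule_outputs, labels, alpha=1.0):
    """Bernoulli Naive Bayes over ILP rule firings (illustrative sketch).
    rule_outputs: list of 0/1 vectors, one entry per learned rule.
    labels: class label per example. alpha: Laplace smoothing (assumed)."""
    n_rules = len(rule_outputs[0])
    n_examples = defaultdict(int)                 # class -> example count
    n_fires = defaultdict(lambda: [0] * n_rules)  # class -> per-rule fire counts
    for x, y in zip(rule_outputs, labels):
        n_examples[y] += 1
        for j, v in enumerate(x):
            n_fires[y][j] += v
    total = len(labels)
    model = {}
    for y, count in n_examples.items():
        log_prior = math.log(count / total)
        # Smoothed probability that rule j fires given class y.
        p_fire = [(n_fires[y][j] + alpha) / (count + 2 * alpha)
                  for j in range(n_rules)]
        model[y] = (log_prior, p_fire)
    return model

def classify(model, x):
    """Pick the class maximising log P(class) + sum_j log P(rule_j = x_j | class)."""
    def log_post(item):
        _, (log_prior, p_fire) = item
        return log_prior + sum(math.log(p if v else 1.0 - p)
                               for v, p in zip(x, p_fire))
    return max(model.items(), key=log_post)[0]
```

Unlike unweighted voting, the smoothed per-class firing probabilities let a rule that fires mostly on negative examples count as evidence against the positive class instead of being given equal weight.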
Towards Feature Selection for Disk-Based Multirelational Learners: A Case Study with a Boosting Algorithm
, 2003
Abstract
Feature selection is an important issue for any learning algorithm, since reduced feature sets lead to an improvement in learning time, reduced model complexity and, in many cases, a reduced risk of overfitting. When performing feature selection for RAM-based learning algorithms, we typically assume that the cost of accessing each feature is uniform. In multirelational data mining, especially when data are to be held in a relational database management system (RDBMS), this is no longer the case. The dominant cost in such a setting is the scan of a relation, so the cost of using a feature from a relation that needs to be scanned anyway is comparatively small, whereas the cost of adding a feature from a relation that has not been used before is high. This means that existing work on feature selection using the uniform cost assumption may not be applicable in a disk-based setting. In this paper, we report the results of a case study that extends prior work on multirelational feature selection, in particular, in the context of a boosting algorithm. As shown by our study, using the previously developed strategies on average leads to larger numbers of relations that need to be considered and loaded into memory, and thus higher cost in a disk-based setting. Instead, a simple relation-oriented strategy can be used to minimize the cost of accessing additional relations. We describe experimental results to show how this basic strategy interacts with the feature selection variants proposed previously, and show that significant gains are made even in a main-memory setting.
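The relation-oriented cost model described above can be sketched as a greedy selector that charges a one-time scan cost the first time a feature's home relation is touched, so later picks prefer relations already loaded. The triple layout, scoring, and cost figures are assumptions for illustration, not the paper's algorithm.

```python
def select_features(candidates, budget, scan_cost):
    """Greedy relation-oriented feature selection (illustrative sketch).
    candidates: list of (feature, relation, score) triples.
    budget: number of features to pick.
    scan_cost: relation -> one-time cost of scanning/loading it."""
    loaded, chosen, total_cost = set(), [], 0.0
    remaining = list(candidates)
    while remaining and len(chosen) < budget:
        # A feature from an already-loaded relation pays no extra scan cost,
        # so its effective score is just its raw score.
        best = max(remaining,
                   key=lambda c: c[2] - (0.0 if c[1] in loaded
                                         else scan_cost[c[1]]))
        feature, relation, _ = best
        if relation not in loaded:
            total_cost += scan_cost[relation]  # charged once per relation
            loaded.add(relation)
        chosen.append(feature)
        remaining.remove(best)
    return chosen, total_cost
```

With uniform scan costs this degenerates to ordinary score-ranked selection; with a large cost on untouched relations it clusters the chosen features within relations already in memory, which is the effect the abstract describes.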
Random Relational Rules
Abstract
Exhaustive search in relational learning is generally infeasible; therefore, some form of heuristic search is usually employed, such as in FOIL [1]. On the other hand, so-called stochastic discrimination provides a framework for combining arbitrary numbers of weak classifiers (in this case randomly generated relational ...
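The stochastic-discrimination idea the truncated abstract points at can be sketched: generate many random weak rules, keep only those whose coverage of positive examples exceeds their coverage of negatives, and classify by averaging the kept rules' votes. Everything here (boolean attribute vectors, two-literal random conjunctions) is a hypothetical stand-in for real relational rules.

```python
import random

def random_rule(n_attrs, rng):
    """Hypothetical stand-in for a randomly generated relational rule:
    a conjunction over two randomly chosen boolean attributes."""
    attrs = rng.sample(range(n_attrs), 2)
    return lambda x: all(x[a] for a in attrs)

def coverage(rule, examples):
    """Fraction of examples on which the rule fires."""
    return sum(rule(x) for x in examples) / len(examples)

def generate_rules(pos, neg, n_attrs, n_rules, rng, max_attempts=1000):
    """Keep only weak rules enriched for positives (coverage on pos > on neg)."""
    kept = []
    for _ in range(max_attempts):
        if len(kept) == n_rules:
            break
        r = random_rule(n_attrs, rng)
        if coverage(r, pos) > coverage(r, neg):
            kept.append(r)
    return kept

def vote(rules, x):
    """Fraction of kept rules that fire on x; higher means more positive."""
    return sum(r(x) for r in rules) / len(rules)
```

Because every kept rule covers positives more often than negatives, the averaged vote over the positive class is guaranteed to exceed the average over the negative class on the training data, which is the basic mechanism that lets many individually weak random rules combine into a usable classifier.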