## Overcoming the myopia of inductive learning algorithms with RELIEFF (1997)

Venue: Applied Intelligence

Citations: 38 (12 self)

### BibTeX

```bibtex
@ARTICLE{Kononenko97overcomingthe,
  author  = {Igor Kononenko and Edvard Simec and Marko Robnik-Sikonja},
  title   = {Overcoming the myopia of inductive learning algorithms with RELIEFF},
  journal = {Applied Intelligence},
  year    = {1997},
  volume  = {7},
  pages   = {39--55}
}
```

### Abstract

Current inductive machine learning algorithms typically use greedy search with limited lookahead. This prevents them from detecting significant conditional dependencies between the attributes that describe training objects. Instead of myopic impurity functions and lookahead, we propose to use RELIEFF, an extension of RELIEF developed by Kira and Rendell [10], [11], for heuristic guidance of inductive learning algorithms. We have reimplemented Assistant, a system for top-down induction of decision trees, using RELIEFF as an estimator of attributes at each selection step. The algorithm is tested on several artificial and several real-world problems, and the results are compared with some other well-known machine learning algorithms. Excellent results on artificial data sets and two real-world problems show the advantage of the presented approach to inductive learning.

Keywords: learning from examples, estimating attributes, impurity function, RELIEFF, empirical evaluation

### Citations

3926 |
Classification and Regression Trees
- Breiman, Friedman, et al.
- 1984
Citation Context: ...the current state in the search space has a major role in the greedy search. Current inductive learning algorithms use variants of impurity functions like information gain, gain ratio [25], gini-index [1], distance measure [16], j-measure [30], and MDL [14]. However, all these measures assume that attributes are conditionally independent given the class and therefore in domains with strong conditional de...

741 |
Aha, UCI repository of machine learning databases, in www.ics.uci.edu/~mlearn/MLRepository.html
- Murphy, W
- 1992
Citation Context: ...68.1±1.7% of the classification accuracy for naive Bayes, 64.6±3.5 for backpropagation, and 72.7±1.3 for their rule-based classifier. This data set can be obtained from the Irvine database [21]. KRK1: The problem of legality of King-Rook-King chess endgame positions. The attributes describe the relevant relations between pieces, such as "same rank" and "adjacent file". Originally the data in...

357 |
A practical approach to feature selection
- Kira, Rendell
- 1992
Citation Context: ...dependencies between the attributes that describe training objects. Instead of myopic impurity functions and lookahead, we propose to use RELIEFF, an extension of RELIEF developed by Kira and Rendell [10], [11], for heuristic guidance of inductive learning algorithms. We have reimplemented Assistant, a system for top down induction of decision trees, using RELIEFF as an estimator of attributes at each...

300 | Estimating attributes: Analysis and extensions of RELIEF
- Kononenko
- 1994
Citation Context: ...tead of only one near hit/miss and averages the contribution of all k nearest hits/misses. It was shown that this extension significantly improves the reliability of estimates of attributes' qualities [13]. To overcome the problem of parameter tuning, in all our experiments k was set to 10 which, empirically, gives satisfactory results. In some problems significantly better results can be obtained with...
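The k-nearest hits/misses averaging described in this context can be sketched as follows. This is a simplified, hypothetical reconstruction for numeric two-class data (Manhattan distance, attributes normalized to [0, 1]); the paper's full RELIEFF additionally handles multi-class problems and missing values, which this sketch omits:

```python
import numpy as np

def relieff(X, y, k=10, n_iter=None):
    """Simplified RELIEFF sketch: estimate attribute quality weights by
    averaging differences to the k nearest hits and k nearest misses."""
    n, m = X.shape
    if n_iter is None:
        n_iter = n  # iterate over every training instance once
    w = np.zeros(m)
    # normalize attributes to [0, 1] so differences are comparable
    lo = X.min(axis=0)
    span = X.max(axis=0) - lo
    span = np.where(span == 0, 1.0, span)
    Xn = (X - lo) / span
    for it in range(n_iter):
        i = it % n
        dist = np.abs(Xn - Xn[i]).sum(axis=1)  # Manhattan distance
        dist[i] = np.inf                       # exclude the instance itself
        same = np.where(y == y[i])[0]
        diff = np.where(y != y[i])[0]
        hits = same[np.argsort(dist[same])[:k]]
        misses = diff[np.argsort(dist[diff])[:k]]
        # average contribution of the k nearest hits/misses (k = 10 by
        # default, as in the paper's experiments)
        w -= np.abs(Xn[hits] - Xn[i]).mean(axis=0) / n_iter
        w += np.abs(Xn[misses] - Xn[i]).mean(axis=0) / n_iter
    return w
```

An attribute that separates the classes collects large miss differences and small hit differences, so its weight grows; an irrelevant attribute's hit and miss contributions cancel.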

259 |
The feature selection problem: Traditional methods and a new algorithm
- Kira, Rendell
- 1992
Citation Context: ...encies between the attributes that describe training objects. Instead of myopic impurity functions and lookahead, we propose to use RELIEFF, an extension of RELIEF developed by Kira and Rendell [10], [11], for heuristic guidance of inductive learning algorithms. We have reimplemented Assistant, a system for top down induction of decision trees, using RELIEFF as an estimator of attributes at each selec...

167 |
Estimating probabilities: A crucial task in machine learning
- Cestnik
- 1990
Citation Context: ...herever appropriate, instead of the relative frequency, Assistant-R uses the m-estimate of probabilities, which was shown to often significantly increase the performance of machine learning algorithms [2], [3]. For prior probabilities Laplace's law of succession is used: P_a(X) = (N(X) + 1) / (N + number of possible outcomes) (6), where N is the number of all trials and N(X) the number of trials with the outco...
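Formula (6) and the m-estimate it is related to are small enough to write as helpers; a sketch of the commonly cited forms (function names are mine, not from the paper):

```python
def laplace_estimate(n_x, n, n_outcomes):
    # Laplace's law of succession, formula (6):
    # P_a(X) = (N(X) + 1) / (N + number of possible outcomes)
    return (n_x + 1) / (n + n_outcomes)

def m_estimate(n_x, n, prior, m):
    # Cestnik's m-estimate: blend the relative frequency with a prior
    # probability, with m controlling the weight of the prior:
    # P(X) = (N(X) + m * P_prior(X)) / (N + m)
    return (n_x + m * prior) / (n + m)
```

With no trials at all, the Laplace estimate returns the uniform probability (e.g. 1/2 for two outcomes) rather than an undefined relative frequency, which is why it is preferred for sparse nodes.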

111 |
Learning by Being Told and Learning from Examples: An Experimental Comparison of the Two Methods of Knowledge Acquisition in the Context of Developing an Expert System for Soybean Disease Diagnosis, Int.J. of Policy Analysis and Info
- Michalski, Chilausky
- 1980
Citation Context: ...real world data sets (SOYB, IRIS, and VOTE are obtained from the Irvine database [21], SAT is obtained from the StatLog database [18]): SOYB: The famous soybean data set used by Michalski & Chilausky [17]. IRIS: The well known Fisher's problem of determining the type of iris flower. MESH3, MESH15: The problem of determining the number of elements for each of the edges of an object in the finite element...

71 | The application of inductive logic programming to finite element mesh design
- Dolšak, Muggleton
- 1992
Citation Context: ...wn Fisher's problem of determining the type of iris flower. MESH3, MESH15: The problem of determining the number of elements for each of the edges of an object in the finite element mesh design problem [6]. There are five objects for which experts have constructed appropriate meshes. In each of five experiments one object is used for testing and the other four for learning and the results are averaged...

66 | Inductive and bayesian learning in medical diagnosis
- Kononenko
- 1993
Citation Context: ...In medical data sets, attributes are typically conditionally independent given the class. Therefore, it is not surprising that the naive Bayesian classifier shows clear advantage on these data sets [12]. It is interesting that the performance of the k-NN algorithm is good in these domains, although worse than the performance of the naive Bayesian classifier. The information score (Table 10) for BREA...

61 |
On estimating probabilities in tree pruning
- Cestnik, Bratko
- 1991
Citation Context: ...er appropriate, instead of the relative frequency, Assistant-R uses the m-estimate of probabilities, which was shown to often significantly increase the performance of machine learning algorithms [2], [3]. For prior probabilities Laplace's law of succession is used: P_a(X) = (N(X) + 1) / (N + number of possible outcomes) (6), where N is the number of all trials and N(X) the number of trials with the outcome X...

60 |
Handling noise in Inductive Logic Programming
- Džeroski, Bratko
- 1992
Citation Context: ...s, such as "same rank" and "adjacent file". Originally the data included five sets of 1000 examples (1000 for learning and 4000 for testing) and was used to test Inductive Logic Programming algorithms [7]. The reported classification accuracy is 99.7±0.1%. We used only one set of 1000 examples (i.e. 700 instances for training). KRK2: Same as KRK1 except that the only available attributes are the...

55 |
Learning Decision Rules in Noisy Domains
- Niblett, Bratko
- 1987
Citation Context: ...d: minimal number of training instances, minimal attribute information gain and maximal probability of the majority class in the current node. For postpruning, the method developed by Niblett and Bratko [22] is used, which uses Laplace's law of succession for estimating the expected classification error of the current node committed by pruning/not pruning its subtree. Incomplete data handling: During learni...

49 | Use of contextual information for feature ranking and discretization
- Hong
- 1997
Citation Context: ...gain [1], information gain [9], gain ratio [25], and distance measure [16] estimate that the contribution of A3 is the highest while attributes A1 and A2 are estimated as completely irrelevant. Hong [8] developed a procedure similar to RELIEF for estimating the quality of attributes, where he directly emphasizes the use of contextual information. The difference to RELIEF is that his approach uses on...

45 |
Assistant 86: A knowledge elicitation tool for sophisticated users
- Cestnik, Kononenko, et al.
- 1987
Citation Context: ...neral, relatively efficient, and reliable enough to guide the search in the learning process. In this paper a reimplementation of the Assistant learning algorithm for top down induction of decision trees [4] is described, named Assistant-R. Instead of information gain, Assistant-R uses RELIEFF as a heuristic function for estimating the attributes' quality at each step during the tree generation. Experimen...

17 |
Information based evaluation criterion for classifier’s performance
- Kononenko, Bratko
- 1991
Citation Context: ...problem, where the experimental methodology was dictated by previously published results, as described in Section 5.4. Besides the classification accuracy, we also measured the average information score [15]. This measure eliminates the influence of prior probabilities and appropriately treats probabilistic answers of the classifier. The average information score is defined as: Inf = Σ over the testing instances...
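The information score from [15] is commonly stated per instance as the gain in log-probability of the true class over its prior, averaged over the test set; a sketch under that assumption (the fragment above is truncated, and the function names are mine):

```python
import math

def information_score(prior, posterior):
    # Per-instance score: reward a classifier that raises the predicted
    # probability of the true class above its prior; penalize, symmetrically
    # in the complements, a classifier that lowers it.
    if posterior >= prior:
        return -math.log2(prior) + math.log2(posterior)
    return -(-math.log2(1 - prior) + math.log2(1 - posterior))

def avg_information_score(priors, posteriors):
    # Average over the testing instances, as in the quoted definition.
    scores = [information_score(p, q) for p, q in zip(priors, posteriors)]
    return sum(scores) / len(scores)
```

Unlike accuracy, this measure gives zero credit for merely predicting the majority class at its prior probability, which is what "eliminates the influence of prior probabilities" above refers to.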

16 | I.: SFOIL: Stochastic approach to inductive logic programming
- Pompe, Kovacic, et al.
- 1993
Citation Context: ...he results are averaged. The results reported by Dzeroski [7] for various ILP systems are 12% classification accuracy for FOIL, 22% for mFOIL and 29% for GOLEM, and the result reported by Pompe et al. [23] is 28% for SFOIL. The description of the MESH problem is appropriate for ILP systems. For attribute learners only relations with arity 1 (i.e. attributes) can be used to describe the problem. Note th...

14 |
ID3 Revisited: A distance based criterion for attribute selection
- Mantaras
- 1989
Citation Context: ...the search space has a major role in the greedy search. Current inductive learning algorithms use variants of impurity functions like information gain, gain ratio [25], gini-index [1], distance measure [16], j-measure [30], and MDL [14]. However, all these measures assume that attributes are conditionally independent given the class and therefore in domains with strong conditional dependencies between att...

13 | Linear space induction in first order logic with ReliefF
- Pompe, Kononenko
- 1995
Citation Context: ...with the (semi-)naive Bayesian classifier could be useful. On the other hand, current ILP systems use greedy search techniques and the heuristics that guide the search are myopic. Pompe and Kononenko [24] implemented an adapted version of RELIEFF in the FOIL-like ILP system called ILP-R, and preliminary experiments show similar advantages of this system over other ILP systems as Assistant-R has over As...

12 |
Combinatorial Optimization in Inductive Concept Learning
- Mladenić
- 1993

5 |
General Statistics
- Chase, Bown
- 2000
Citation Context: ...e experimental conditions. To verify the significance of differences we used the one-tailed t-test with α = 0.0005 (99.95% confidence level) and the null hypothesis stating that the difference is zero [5]. All the differences in results having the value of the statistic t above the threshold (t > 3.66) are considered significant. The exception from the above methodology were the experiments in the finite...
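The significance check described above (one-tailed t-test against a zero-difference null, threshold t > 3.66) can be sketched as a paired test over per-run accuracy differences; function names and the paired formulation are my assumptions, since the fragment does not specify the exact t-test variant:

```python
import math

def paired_t_statistic(diffs):
    # t = mean(d) / (s_d / sqrt(n)) for paired differences d between two
    # classifiers measured under the same experimental conditions.
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)  # sample variance
    return mean / math.sqrt(var / n)

def significant(diffs, threshold=3.66):
    # One-tailed test at alpha = 0.0005 using the paper's quoted threshold.
    return paired_t_statistic(diffs) > threshold
```

A consistently positive difference across runs yields a large t even when the absolute gap is small, while a noisy, near-zero difference stays below the threshold.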

3 |
On biases when estimating multivalued attributes
- Kononenko
- 1995
Citation Context: ...r role in the greedy search. Current inductive learning algorithms use variants of impurity functions like information gain, gain ratio [25], gini-index [1], distance measure [16], j-measure [30], and MDL [14]. However, all these measures assume that attributes are conditionally independent given the class and therefore in domains with strong conditional dependencies between attributes the greedy search ha...