Results 1-10 of 23
J.P.: Ranking learning algorithms: Using IBL and meta-learning on accuracy and time results
 Machine Learning
, 2003
Cited by 69 (7 self)
Abstract. We present a meta-learning method to support selection of candidate learning algorithms. It uses a k-Nearest Neighbor algorithm to identify the datasets that are most similar to the one at hand. The distance between datasets is assessed using a relatively small set of data characteristics, which was selected to represent properties that affect algorithm performance. The performance of the candidate algorithms on those datasets is used to generate a recommendation to the user in the form of a ranking. The performance is assessed using a multicriteria evaluation measure that takes not only accuracy, but also time into account. As it is not common in Machine Learning to work with rankings, we had to identify and adapt existing statistical techniques to devise an appropriate evaluation methodology. Using that methodology, we show that the meta-learning method presented leads to significantly better rankings than the baseline ranking method. The evaluation methodology is general and can be adapted to other ranking problems. Although here we have concentrated on ranking classification algorithms, the meta-learning framework presented can provide assistance in the selection of combinations of methods or more complex problem solving strategies.
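The selection scheme described above can be sketched in a few lines. Everything here is illustrative: the meta-features, dataset names, and accuracy figures are invented, and the real method uses a larger, normalized set of data characteristics and a multicriteria measure rather than raw accuracy.

```python
import math

# Hypothetical meta-features per past dataset: [n_examples, n_attributes, n_classes]
meta = {
    "iris":  [150, 4, 3],
    "wine":  [178, 13, 3],
    "glass": [214, 9, 6],
}
# Fabricated accuracy of each candidate algorithm on each past dataset
perf = {
    "iris":  {"c4.5": 0.94, "nb": 0.95, "knn": 0.96},
    "wine":  {"c4.5": 0.91, "nb": 0.97, "knn": 0.95},
    "glass": {"c4.5": 0.67, "nb": 0.49, "knn": 0.65},
}

def recommend(new_features, k=2):
    """Rank algorithms by mean accuracy on the k nearest past datasets."""
    nearest = sorted(meta, key=lambda d: math.dist(meta[d], new_features))[:k]
    algos = perf[nearest[0]].keys()
    score = {a: sum(perf[d][a] for d in nearest) / k for a in algos}
    return sorted(score, key=score.get, reverse=True)

print(recommend([160, 10, 3]))  # best-first ranking of the candidates
```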
Local Cascade Generalization
, 1998
Cited by 54 (1 self)
In a previous work we have presented Cascade Generalization, a new general method for merging classifiers. The basic idea of Cascade Generalization is to sequentially run the set of classifiers, at each step performing an extension of the original data by the insertion of new attributes. The new attributes are derived from the probability class distribution given by a base classifier. This constructive step extends the representational language for the high level classifiers, relaxing their bias. In this paper we extend this work by applying Cascade locally. At each iteration of a divide and conquer algorithm, a reconstruction of the instance space occurs by the addition of new attributes. Each new attribute represents the probability that an example belongs to a class given by a base classifier. We have implemented three Local Generalization Algorithms. The first merges a linear discriminant with a decision tree, the second merges a naive Bayes with a decision tree, and the third mer...
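The constructive step can be sketched as follows; the stub base classifier and its fabricated probabilities are stand-ins (the paper uses real learners such as naive Bayes), but the shape of the transformation, appending the base model's class distribution as new attributes, is the point.

```python
# Stand-in base classifier: fabricates a two-class probability
# distribution from the first feature (illustrative only).
def base_class_probs(x):
    p1 = min(max(x[0] / 10.0, 0.0), 1.0)
    return [p1, 1.0 - p1]

def cascade_extend(dataset):
    """Extend each example with the base classifier's class probabilities,
    enlarging the representation available to the next-level classifier."""
    return [x + base_class_probs(x) for x in dataset]

data = [[2.0, 5.1], [9.0, 0.3]]
extended = cascade_extend(data)
print(extended)  # each row now carries two extra probability attributes
```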
Inducing Oblique Decision Trees with Evolutionary Algorithms
 IEEE Transactions on Evolutionary Computation
, 2003
Cited by 43 (0 self)
This paper illustrates the application of evolutionary algorithms (EAs) to the problem of oblique decision-tree (DT) induction. The objectives are to demonstrate that EAs can find classifiers whose accuracy is competitive with other oblique tree construction methods, and that, at least in some cases, this can be accomplished in a shorter time. We performed experiments with a (1+1) evolution strategy and a simple genetic algorithm on public domain and artificial data sets, and compared the results with three other oblique and one axis-parallel DT algorithms. The empirical results suggest that the EAs quickly find competitive classifiers, and that EAs scale up better than traditional methods to the dimensionality of the domain and the number of instances used in training. In addition, we show that the classification accuracy improves when the trees obtained with the EAs are combined in ensembles, and that sometimes it is possible to build the ensemble of evolutionary trees in less time than a single traditional oblique tree.
Index Terms: Classification, decision trees, ensembles, machine learning, sampling.
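A minimal sketch of the (1+1) evolution strategy on a single oblique split; the data, mutation step size, and iteration budget are invented, not the paper's experimental setup.

```python
import random

random.seed(0)

# Toy 2-D data whose true boundary x + y > 1 is oblique, so no single
# axis-parallel split separates it exactly (illustrative only).
pts = [(random.random(), random.random()) for _ in range(200)]
data = [(p, int(p[0] + p[1] > 1.0)) for p in pts]

def errors(w, b):
    """Misclassifications of the oblique split  w . x > b."""
    return sum(((w[0] * p[0] + w[1] * p[1]) > b) != bool(y) for p, y in data)

# (1+1) evolution strategy: mutate the hyperplane with Gaussian noise,
# keep the mutant whenever it is no worse than the parent.
w, b = [1.0, 0.0], 0.5          # start from an axis-parallel split on x
best = errors(w, b)
for _ in range(500):
    cand_w = [wi + random.gauss(0, 0.1) for wi in w]
    cand_b = b + random.gauss(0, 0.1)
    e = errors(cand_w, cand_b)
    if e <= best:
        w, b, best = cand_w, cand_b, e

print(best, "errors out of", len(data))
```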
A Comparison of Ranking Methods for Classification Algorithm Selection
 In Proceedings of the European Conference on Machine Learning (ECML-2000), to be published
, 2000
Cited by 25 (7 self)
We investigate the problem of using past performance information to select an algorithm for a given classification problem. We present three ranking methods for that purpose: average ranks, success rate ratios and significant wins. We also analyze the problem of evaluating and comparing these methods. The evaluation technique used is based on a leave-one-out procedure. On each iteration, the method generates a ranking using the results obtained by the algorithms on the training datasets. This ranking is then evaluated by calculating its distance from the ideal ranking built using the performance information on the test dataset. The distance measure adopted here, average correlation, is based on Spearman's rank correlation coefficient. To compare ranking methods, a combination of Friedman's test and Dunn's multiple comparison procedure is adopted. When applied to the methods presented here, these tests indicate that the success rate ratios and average ranks methods perfo...
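Spearman's rank correlation coefficient, on which the average-correlation measure is based, is simple to compute for two tie-free rankings:

```python
def spearman(rank_a, rank_b):
    """Spearman's rank correlation between two rankings of n items
    (ranks 1..n, no ties): rho = 1 - 6 * sum(d_i^2) / (n * (n^2 - 1))."""
    n = len(rank_a)
    d2 = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

# Recommended ranking vs. the ideal ranking on the test dataset
# (illustrative rank vectors only).
print(spearman([1, 2, 3, 4], [1, 2, 3, 4]))  # identical rankings -> 1.0
print(spearman([1, 2, 3, 4], [4, 3, 2, 1]))  # reversed rankings -> -1.0
```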
Fast Subsampling Performance Estimates for Classification Algorithm Selection
 Proceedings of the ECML-00 Workshop on Meta-Learning: Building Automatic Advice Strategies for Model Selection and Method Combination
, 2000
Cited by 15 (3 self)
The typical data mining process is characterized by the prospective and iterative application of a variety of different data mining algorithms from an algorithm toolbox. While it would be desirable to check many different algorithms and algorithm combinations for their performance on a database, it is often not feasible because of time and other resource constraints. This paper investigates the effectiveness of simple and fast subsampling strategies for algorithm selection. We show that even such simple strategies perform quite well in many cases and propose to use them as a baseline for comparison with meta-learning and other advanced algorithm selection strategies.
1 Introduction
With the availability of a wide range of different classification algorithms, strategies for selecting the most adequate one in a particular data mining situation become more crucial. Many characteristics of both the learning algorithm and the kind of model generated by the algorithm potentially i...
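A toy illustration of the idea, under invented data and stand-in learners: estimate each algorithm's accuracy from a small random subsample rather than the full database, and use the estimates to choose between them.

```python
import random

random.seed(1)

# Toy labelled data: label is 1 iff the single feature exceeds 0.5
# (an illustrative stand-in for a real database).
data = [(x, int(x > 0.5)) for x in (random.random() for _ in range(1000))]

def majority(train):
    """Baseline learner: always predict the majority class."""
    ones = sum(y for _, y in train)
    label = int(2 * ones >= len(train))
    return lambda x: label

def midpoint(train):
    """Stand-in 'real' learner: threshold halfway between class means."""
    c0 = [x for x, y in train if y == 0]
    c1 = [x for x, y in train if y == 1]
    t = (sum(c0) / len(c0) + sum(c1) / len(c1)) / 2
    return lambda x: int(x > t)

def subsample_estimate(learner, frac=0.1):
    """Train and test on a small random subsample only, as a fast estimate."""
    sub = random.sample(data, int(frac * len(data)))
    half = len(sub) // 2
    model = learner(sub[:half])
    return sum(model(x) == y for x, y in sub[half:]) / (len(sub) - half)

for name, learner in [("majority", majority), ("midpoint", midpoint)]:
    print(name, round(subsample_estimate(learner), 2))
```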
Discriminant Trees
, 1999
Cited by 14 (0 self)
In a previous work, we presented system Ltree, a multivariate tree that combines a decision tree with a linear discriminant by means of constructive induction. We have shown that it performs quite well, in terms of accuracy and learning times, in comparison with other multivariate systems like LMDT, OC1, and CART. In this work, we extend the previous work by using two new discriminant functions: a quadratic discriminant and a logistic discriminant. Using the same architecture as Ltree, we obtain two new multivariate trees, Qtree and LgTree. The three systems have been evaluated on 17 UCI datasets. From the empirical study, we argue that these systems can be seen as a composition of classifiers with low error correlation. From a bias-variance analysis of the error rate, the error reduction of all the systems in comparison to a univariate tree is due to a reduction in both components.
Data Transformation and Model Selection By Experimentation and Meta-Learning
, 1998
Cited by 14 (1 self)
n confidence level, the approach does indeed identify the best possible candidate and errs as expected. The disadvantage of this approach is that it is time consuming, due to the fact that it is necessary to evaluate all algorithms, some of which can be quite slow. Various proposals have been presented on how to speed up this process. One possibility is to preselect some algorithms using certain criteria and then limit the experimentation to this subset. Some people have suggested that we should preferably use algorithms which behave rather differently from one another. One criterion for deciding this is to examine whether the algorithms lead to uncorrelated errors (Ali and Pazzani, 1996). Another possibility is to try to reduce the number of cycles of cross-validation without affecting the reliability of the result. Moore and Lee (1994) have proposed a technique referred to as racing, which makes it possible to terminate the evaluation of those algorithms which appear to
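Racing can be sketched with a Hoeffding-style confidence bound (one common formulation; the exact bound, schedule, and error rates here are our assumption, not Moore and Lee's): algorithms whose lower error bound exceeds the best upper bound are dropped before all folds are run.

```python
import math
import random

random.seed(2)

# Fabricated true error rates for three hypothetical algorithms; each
# cross-validation fold yields a noisy observation of the true error.
true_err = {"A": 0.10, "B": 0.12, "C": 0.35}

def fold_error(algo):
    return min(1.0, max(0.0, random.gauss(true_err[algo], 0.05)))

def race(algos, max_folds=200, delta=0.05):
    """Hoeffding-style racing: after each fold, drop any algorithm whose
    lower confidence bound on error exceeds the best upper bound."""
    scores = {a: [] for a in algos}
    alive = set(algos)
    for n in range(1, max_folds + 1):
        for a in alive:
            scores[a].append(fold_error(a))
        eps = math.sqrt(math.log(2 / delta) / (2 * n))   # Hoeffding radius
        mean = {a: sum(scores[a]) / n for a in alive}
        best_upper = min(mean[a] + eps for a in alive)
        alive = {a for a in alive if mean[a] - eps <= best_upper}
    return alive

survivors = race(["A", "B", "C"])
print(sorted(survivors))  # the clearly worse algorithm is eliminated early
```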
Combining Classifiers by Constructive Induction
, 1998
Cited by 11 (1 self)
Using multiple classifiers for increasing learning accuracy is an active research area. In this paper we present a new general method for merging classifiers. The basic idea of Cascade Generalization is to sequentially run the set of classifiers, at each step performing an extension of the original data set by adding new attributes. The new attributes are derived from the probability class distribution given by a base classifier. This constructive step extends the representational language for the high level classifiers, relaxing their bias. Cascade Generalization produces a single but structured model for the data that combines the model class representation of the base classifiers. We have performed an empirical evaluation of Cascade composition of three well known classifiers: Naive Bayes, Linear Discriminant, and C4.5. Composite models show an increase of performance, sometimes impressive, when compared with the corresponding single models, with significant statistical confidenc...
Hybrid Decision Tree
, 2002
Cited by 8 (2 self)
In this paper, a hybrid learning approach named HDT is proposed. HDT simulates human reasoning by using symbolic learning to do qualitative analysis and using neural learning to do subsequent quantitative analysis. It generates the trunk of a binary hybrid decision tree according to the binary information gain ratio criterion in an instance space defined by only original unordered attributes. If unordered attributes cannot further distinguish training examples falling into a leaf node whose diversity is beyond the diversity threshold, then the node is marked as a dummy node. After all those dummy nodes are marked, a specific feed-forward neural network named FANNC, trained in an instance space defined by only original ordered attributes, is exploited to accomplish the learning task. Moreover, this paper distinguishes three kinds of incremental learning tasks. Two incremental learning procedures designed for example-incremental learning with different storage requirements are provided, which enable HDT to deal gracefully with data sets where new data are frequently appended. Also, a hypothesis-driven constructive induction mechanism is provided, which enables HDT to generate compact concept descriptions.
Ranking Classification Algorithms Based on Relevant Performance Information
 Meta-Learning: Building Automatic Advice Strategies for Model Selection and Method Combination
, 2000
Cited by 5 (1 self)
Given the wide variety of available classification algorithms and the volume of data today's organizations need to analyze, the selection of the right algorithm to use on a new problem is an important issue. In this paper we present zooming, a technique that, for a given dataset, selects relevant past performance information. The selection process is based on the distance between the dataset at hand and other datasets processed in the past. The distance is calculated on the basis of statistical, information-theoretic and other measures. The k-Nearest Neighbor algorithm is used for this purpose. Performance information for the algorithms on the selected datasets is then processed to generate advice in the form of a ranking indicating which algorithms should be applied in which order. Here we propose a ranking method that is based on accuracy and time information, referred to as adjusted ratio of ratios. The generalization power of this ranking method is analyzed using an ...
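As we recall it, the adjusted ratio of ratios discounts the success-rate ratio of two algorithms by the log of their time ratio, weighted by how much accuracy the user will trade for a tenfold speedup; treat the exact form below as an assumption, and the numbers as illustrative.

```python
import math

def arr(succ_p, succ_q, time_p, time_q, acc_d=0.01):
    """Adjusted ratio of ratios of algorithm p over q on one dataset
    (form recalled from the adjusted-ratio-of-ratios proposal; treat as
    an assumption). acc_d is the accuracy the user is willing to trade
    for a 10x speedup."""
    return (succ_p / succ_q) / (1 + acc_d * math.log10(time_p / time_q))

# p is slightly less accurate but 100x faster than q (illustrative numbers):
# the time discount partly compensates for the lower accuracy.
print(round(arr(0.93, 0.95, 1.0, 100.0), 3))
```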