Results 1–10 of 77
Statistical Comparisons of Classifiers over Multiple Data Sets
2006
Cited by 718 (0 self)
Abstract:
While methods for comparing two learning algorithms on a single data set have been scrutinized for quite some time already, the issue of statistical tests for comparisons of more algorithms on multiple data sets, which is even more essential to typical machine learning studies, has been all but ignored. This article reviews the current practice and then theoretically and empirically examines several suitable tests. Based on that, we recommend a set of simple, yet safe and robust nonparametric tests for statistical comparisons of classifiers: the Wilcoxon signed-ranks test for comparison of two classifiers and the Friedman test with the corresponding post-hoc tests for comparison of more classifiers over multiple data sets. Results of the latter can also be neatly presented with the newly introduced CD (critical difference) diagrams.
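The recommended tests map directly onto standard library routines. A minimal sketch using `scipy.stats`, with made-up accuracy scores (all numbers and variable names below are illustrative, not from the paper):

```python
from scipy.stats import friedmanchisquare, wilcoxon

# Hypothetical accuracies of three classifiers, one entry per data set.
acc_a = [0.81, 0.74, 0.92, 0.68, 0.77, 0.85]
acc_b = [0.79, 0.71, 0.90, 0.66, 0.75, 0.80]
acc_c = [0.70, 0.69, 0.84, 0.60, 0.72, 0.78]

# Two classifiers: Wilcoxon signed-ranks test on paired per-data-set scores.
stat_w, p_w = wilcoxon(acc_a, acc_b)

# More than two classifiers: Friedman test on per-data-set ranks
# (followed, per the paper, by post-hoc tests such as Nemenyi).
stat_f, p_f = friedmanchisquare(acc_a, acc_b, acc_c)

print(f"Wilcoxon p = {p_w:.3f}, Friedman p = {p_f:.4f}")
```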
An extension on "statistical comparisons of classifiers over multiple data sets" for all pairwise comparisons
Journal of Machine Learning Research
Cited by 158 (37 self)
Abstract:
In a recently published paper in JMLR, Demšar (2006) recommends a set of nonparametric statistical tests and procedures which can be safely used for comparing the performance of classifiers over multiple data sets. After studying the paper, we realize that the paper correctly introduces the basic procedures and some of the most advanced ones when comparing a control method. However, it does not deal with some advanced topics in depth. Regarding these topics, we focus on more powerful proposals of statistical procedures for comparing n×n classifiers. Moreover, we illustrate an easy way of obtaining adjusted and comparable p-values in multiple comparison procedures.
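One common way to obtain such adjusted, directly comparable p-values is Holm's step-down procedure; a small illustration with made-up raw p-values for the pairwise comparisons of three classifiers (the paper discusses this and more powerful procedures for the n×n case):

```python
# Made-up raw p-values for the three pairwise comparisons of A, B and C.
raw = {"A-B": 0.04, "A-C": 0.001, "B-C": 0.20}

m = len(raw)
ordered = sorted(raw.items(), key=lambda kv: kv[1])   # smallest p first

adjusted, running_max = {}, 0.0
for i, (pair, p) in enumerate(ordered):
    # Holm: scale by the number of remaining hypotheses, then enforce
    # monotonicity so the adjusted p-values are directly comparable.
    running_max = max(running_max, (m - i) * p)
    adjusted[pair] = min(1.0, running_max)

print(adjusted)
```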
Studies of tropical tuna swimming performance in a large water tunnel
I. Energetics. J. exp. Biol., 1994
Cited by 65 (5 self)
Abstract:
The body temperatures (Tb) of nine yellowfin tuna (Thunnus albacares) were monitored while fish swam in a large water tunnel at controlled velocities (U) and ambient temperatures (Ta). Monitoring Tb during step changes in Ta at constant U permitted estimation of the thermal rate coefficient (k), an index of heat transfer. In the yellowfin, k is dependent on both Ta and the direction of the thermal gradient (i.e. whether Ta is greater or less than Tb). Modulation of k in response to Ta was further demonstrated during tests in which U was varied; the elevation of Tb in response to equal increases in U was 3–4 times less at 30 ˚C than at 25 and 20 ˚C. These experiments demonstrate that the yellowfin tuna can modulate heat transfer. This ability could prevent overheating during intense activity, retard heat loss during a descent into cool water and permit increased heat gain upon returning to warm surface waters (i.e. when Tb<Ta).
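As a toy illustration of how a thermal rate coefficient can be recovered from a step change in ambient temperature, assuming simple Newtonian heating Tb(t) = Ta + (Tb0 − Ta)·exp(−k·t) (all values below are made up; the paper's k additionally depends on Ta and on the direction of the thermal gradient):

```python
import math

# Made-up step experiment: fish equilibrated at 20 degC, ambient water
# stepped to 30 degC at t = 0, k_true chosen arbitrarily.
Ta, Tb0, k_true = 30.0, 20.0, 0.05
times = list(range(0, 61, 5))                        # minutes after the step
Tb = [Ta + (Tb0 - Ta) * math.exp(-k_true * t) for t in times]

# k is the slope of -ln|Ta - Tb| against time.
logs = [math.log(abs(Ta - x)) for x in Tb]
k_est = (logs[0] - logs[-1]) / (times[-1] - times[0])
print(round(k_est, 4))
```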
FURIA: An Algorithm For Unordered Fuzzy Rule Induction
Cited by 17 (0 self)
Abstract:
This paper introduces a novel fuzzy rule-based classification method called FURIA, which is short for Fuzzy Unordered Rule Induction Algorithm. FURIA extends the well-known RIPPER algorithm, a state-of-the-art rule learner, while preserving its advantages, such as simple and comprehensible rule sets. In addition, it includes a number of modifications and extensions. In particular, FURIA learns fuzzy rules instead of conventional rules and unordered rule sets instead of rule lists. Moreover, to deal with uncovered examples, it makes use of an efficient rule stretching method. Experimental results show that FURIA significantly outperforms the original RIPPER, as well as other classifiers such as C4.5, in terms of classification accuracy.
FR3: A fuzzy rule learner for inducing reliable classifiers
IEEE Transactions on Fuzzy Systems, 2009
Cited by 12 (1 self)
Abstract:
This paper introduces a fuzzy rule-based classification method called FR3, which is short for …
Fast and scalable local kernel machines
J. Mach. Learn. Res., 2009
Cited by 12 (1 self)
Abstract:
A computationally efficient approach to local learning with kernel methods is presented. The Fast Local Kernel Support Vector Machine (FaLK-SVM) trains a set of local SVMs on redundant neighbourhoods in the training set, and an appropriate model for each query point is selected at testing time according to a proximity strategy. Supported by a recent result by Zakai and Ritov (2009) relating consistency and localizability, our approach achieves high classification accuracies by dividing the separation function into local optimisation problems that can be handled very efficiently from the computational viewpoint. The introduction of a fast local model selection further speeds up the learning process. Learning and complexity bounds are derived for FaLK-SVM, and the empirical evaluation of the approach (with data sets up to 3 million points) showed that it is much faster, more accurate, and more scalable than state-of-the-art accurate and approximated SVM solvers, at least for non-high-dimensional data sets. More generally, we show that locality can be an important factor in sensibly speeding up learning approaches and kernel methods, in contrast to other recent techniques that tend to dismiss local information in order to improve scalability.
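The local-SVM idea can be caricatured in a few lines of scikit-learn. This is a toy sketch, not the actual FaLK-SVM implementation: the centres are a crude subsample rather than a redundant covering of the training set, and there is no fast local model selection.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 2))
y = (X[:, 0] * X[:, 1] > 0).astype(int)        # XOR-like: only locally separable

k = 80                                          # neighbourhood size (assumption)
centres = X[::20]                               # crude centre choice (assumption)
nn = NearestNeighbors(n_neighbors=k).fit(X)

local_models = []
for c in centres:
    idx = nn.kneighbors(c.reshape(1, -1), return_distance=False)[0]
    if len(np.unique(y[idx])) == 1:             # pure neighbourhood: constant answer
        local_models.append(int(y[idx][0]))
    else:
        local_models.append(SVC(kernel="rbf").fit(X[idx], y[idx]))

def predict(q):
    # Proximity strategy: answer with the model of the nearest centre.
    i = int(np.argmin(np.linalg.norm(centres - q, axis=1)))
    m = local_models[i]
    return m if isinstance(m, int) else int(m.predict(q.reshape(1, -1))[0])

train_acc = np.mean([predict(x) == t for x, t in zip(X, y)])
print(f"training accuracy of the local ensemble: {train_acc:.2f}")
```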
A fast clustering-based feature subset selection algorithm for high-dimensional data
IEEE Transactions on Knowledge and Data Engineering, 2011
Cited by 11 (0 self)
Abstract:
Feature selection involves identifying a subset of the most useful features that produces results compatible with those of the original entire set of features. A feature selection algorithm may be evaluated from both the efficiency and effectiveness points of view. While efficiency concerns the time required to find a subset of features, effectiveness relates to the quality of the subset of features. Based on these criteria, a fast clustering-based feature selection algorithm, FAST, is proposed and experimentally evaluated in this paper. The FAST algorithm works in two steps. In the first step, features are divided into clusters using graph-theoretic clustering methods. In the second step, the most representative feature that is strongly related to target classes is selected from each cluster to form a subset of features. Because features in different clusters are relatively independent, the clustering-based strategy of FAST has a high probability of producing a subset of useful and independent features. To ensure the efficiency of FAST, we adopt the efficient minimum-spanning-tree clustering method. The efficiency and effectiveness of the FAST algorithm are evaluated through an empirical study. Extensive experiments are carried out to compare FAST and several representative feature selection algorithms, namely FCBF, ReliefF, CFS, Consist, and FOCUS-SF, with respect to four types of well-known classifiers, namely the probability-based Naive Bayes, the tree-based C4.5, the instance-based IB1, and the rule-based RIPPER, before and after feature selection. The results, on 35 publicly available real-world high-dimensional image, microarray, and text data sets, demonstrate that FAST not only produces smaller subsets of features but also improves the performance of the four types of classifiers. Index Terms: feature subset selection, filter method, feature clustering, graph-based clustering
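The two-step structure can be sketched as follows, substituting absolute Pearson correlation for the paper's symmetric-uncertainty measure and using an arbitrary edge-cut threshold (both assumptions for brevity; the synthetic data has three latent factors, each duplicated as a redundant feature):

```python
import numpy as np
from scipy.sparse.csgraph import connected_components, minimum_spanning_tree

rng = np.random.default_rng(1)
n = 200
base = rng.normal(size=(n, 3))                       # three latent factors
X = np.column_stack([base, base + 0.05 * rng.normal(size=(n, 3))])
target = base[:, 0] + 0.1 * rng.normal(size=n)       # target driven by factor 0

# Step 1: MST over a feature-dissimilarity graph, then cut weak edges
# to form feature clusters.
corr = np.abs(np.corrcoef(X, rowvar=False))
dist = 1.0 - corr
np.fill_diagonal(dist, 0.0)
mst = minimum_spanning_tree(dist).toarray()
mst[mst > 0.5] = 0.0                                 # edge-cut threshold (assumption)
n_clusters, labels = connected_components(mst + mst.T, directed=False)

# Step 2: from each cluster, keep the feature most relevant to the target.
rel = np.array([abs(np.corrcoef(X[:, j], target)[0, 1]) for j in range(X.shape[1])])
selected = sorted(int(np.argmax(np.where(labels == c, rel, -1.0)))
                  for c in range(n_clusters))
print(n_clusters, selected)
```

The redundant copies end up in the same MST cluster, so only one feature per latent factor survives.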
Why Not Use an Oracle When You Got One?
2006
Cited by 5 (3 self)
Abstract:
The primary goal of predictive modeling is to achieve high accuracy when the model is applied to novel data. For certain problems this requires the use of complex techniques like neural networks or ensembles, resulting in opaque models that are hard or impossible to interpret. For some domains this is unacceptable, since models need to be comprehensible. To achieve comprehensibility, accuracy is often sacrificed by using simpler techniques; a trade-off termed the accuracy vs. comprehensibility trade-off. Another, frequently studied, alternative is rule extraction, i.e. the activity where another, transparent, model is generated from the opaque model. In this paper it is argued that existing rule extraction algorithms do not use all information available, and typically should benefit from also using oracle data, i.e. test set instances together with the corresponding predictions from the opaque model. The experiments, using fifteen publicly available data sets, clearly show that rules extracted using either just oracle data, or training data augmented with oracle data, explain the predictions significantly better than rules extracted in the standard way, i.e. using training data only. Keywords: rule extraction, neural networks, accuracy vs. comprehensibility trade-off
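The core idea, extracting transparent rules from test inputs labelled by the opaque model itself, can be sketched with scikit-learn (data, models, and the depth limit are placeholders, not the paper's setup):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(3)
X_train = rng.normal(size=(300, 3))
X_test = rng.normal(size=(100, 3))                  # novel data, true labels unknown
y_train = (X_train[:, 0] + X_train[:, 1] > 0).astype(int)

opaque = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)
oracle_labels = opaque.predict(X_test)              # "oracle data": the opaque model's own predictions

# Extract transparent rules that explain those predictions.
rules = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X_test, oracle_labels)
print(export_text(rules, feature_names=["x0", "x1", "x2"]))

# Fidelity: how faithfully the extracted rules mimic the opaque model.
fidelity = float((rules.predict(X_test) == oracle_labels).mean())
```

Fitting the tree on the oracle-labelled test instances is exactly what lets the rules explain the predictions rather than the training data.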
Is an ordinal class structure useful in classifier learning?
Int. J. of Data Mining, Modelling and Management, 2008
Cited by 4 (0 self)
Abstract:
In recent years, a number of machine learning algorithms have been developed for the problem of ordinal classification. These algorithms try to exploit, in one way or the other, the order information of the problem, essentially relying on the assumption that the ordinal structure of the set of class labels is also reflected in the topology of the instance space. The purpose of this paper is to investigate, on an experimental basis, the validity of this assumption. Moreover, we seek to answer the question to what extent existing techniques and learning algorithms for ordinal classification are able to exploit order information, and which properties of these techniques are important in this regard.
One Class Random Forests
2013
Cited by 4 (0 self)
Abstract:
One-class classification is a binary classification task for which only one class of samples is available for learning. In some preliminary works, we have proposed One Class Random Forests (OCRF), a method based on a random forest algorithm and an original outlier generation procedure that makes use of classifier-ensemble randomization principles. In this paper, we propose an extensive study of the behavior of OCRF, which includes experiments on various UCI public datasets and comparison to reference one-class algorithms, namely Gaussian density models, Parzen estimators, Gaussian mixture models, and One-Class SVMs, with statistical significance. Our aim is to show that the randomization principles embedded in a random forest algorithm make the outlier generation process more efficient and, in particular, allow it to break the curse of dimensionality. One Class Random Forests are shown to perform well in comparison to other methods, and in particular to maintain stable performance in higher dimension, while the other algorithms may fail. Keywords: one-class classification, supervised learning, decision trees, ensemble methods, random forests, outlier generation, outlier detection
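A bare-bones sketch of one-class learning via outlier generation: draw synthetic outliers uniformly from an enlarged bounding box of the single known class, then train an ordinary random forest to separate real from synthetic points. OCRF's actual generator is more refined (it exploits the forest's randomization principles), so the uniform generator here is an assumption.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(2)
target = rng.normal(size=(300, 4))                  # the single known class

# Synthetic outliers drawn uniformly over an enlarged bounding box.
lo = target.min(axis=0) - 1.0
hi = target.max(axis=0) + 1.0
outliers = rng.uniform(lo, hi, size=(300, 4))

X = np.vstack([target, outliers])
y = np.r_[np.ones(300, dtype=int), np.zeros(300, dtype=int)]   # 1 = target class
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

print(clf.predict([[0.0, 0.0, 0.0, 0.0]])[0])       # a point near the target mode
```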