Results 1–10 of 23
Solving multiclass learning problems via error-correcting output codes
Journal of Artificial Intelligence Research, 1995
"... Multiclass learning problems involve nding a de nition for an unknown function f(x) whose range is a discrete set containing k>2values (i.e., k \classes"). The de nition is acquired by studying collections of training examples of the form hx i;f(x i)i. Existing approaches to multiclass learning ..."
Abstract

Cited by 564 (9 self)
Multiclass learning problems involve finding a definition for an unknown function f(x) whose range is a discrete set containing k > 2 values (i.e., k "classes"). The definition is acquired by studying collections of training examples of the form ⟨x_i, f(x_i)⟩. Existing approaches to multiclass learning problems include direct application of multiclass algorithms such as the decision-tree algorithms C4.5 and CART, application of binary concept learning algorithms to learn individual binary functions for each of the k classes, and application of binary concept learning algorithms with distributed output representations. This paper compares these three approaches to a new technique in which error-correcting codes are employed as a distributed output representation. We show that these output representations improve the generalization performance of both C4.5 and backpropagation on a wide range of multiclass learning tasks. We also demonstrate that this approach is robust with respect to changes in the size of the training sample, the assignment of distributed representations to particular classes, and the application of overfitting avoidance techniques such as decision-tree pruning. Finally, we show that, like the other methods, the error-correcting code technique can provide reliable class probability estimates. Taken together, these results demonstrate that error-correcting output codes provide a general-purpose method for improving the performance of inductive learning programs on multiclass problems.
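The ECOC construction described above can be sketched compactly: assign each class a binary codeword, train one binary learner per codeword bit, and decode a test point to the class whose codeword is nearest in Hamming distance to the predicted bit vector. The snippet below is a minimal illustration using a hand-picked 4-class code (minimum row distance 4) and scikit-learn decision trees as the base learner; the code matrix, dataset, and base learner are illustrative choices, not the ones used in the paper.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Illustrative 4-class code matrix: one row per class, one column per binary learner.
# Rows are pairwise Hamming distance 4 apart, so a single wrong bit can be corrected.
CODE = np.array([
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 0],
    [1, 0, 1, 1, 0, 1],
    [1, 1, 0, 0, 1, 1],
])

def ecoc_fit(X, y, code=CODE):
    """Train one binary classifier per column of the code matrix."""
    learners = []
    for bit in range(code.shape[1]):
        clf = DecisionTreeClassifier(random_state=bit)
        clf.fit(X, code[y, bit])  # relabel each example with this bit of its class codeword
        learners.append(clf)
    return learners

def ecoc_predict(learners, X, code=CODE):
    """Predict each bit, then decode to the nearest codeword in Hamming distance."""
    bits = np.column_stack([clf.predict(X) for clf in learners])
    dists = np.abs(bits[:, None, :] - code[None, :, :]).sum(axis=2)
    return dists.argmin(axis=1)

if __name__ == "__main__":
    X, y = make_classification(n_samples=400, n_features=5, n_informative=4,
                               n_redundant=0, n_classes=4, random_state=0)
    model = ecoc_fit(X, y)
    print(ecoc_predict(model, X[:10]), y[:10])
```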
Approximate Statistical Tests for Comparing Supervised Classification Learning Algorithms
1998
"... This article reviews five approximate statistical tests for determining whether one learning algorithm outperforms another on a particular learning task. These tests are compared experimentally to determine their probability of incorrectly detecting a difference when no difference exists (type I err ..."
Abstract

Cited by 531 (8 self)
This article reviews five approximate statistical tests for determining whether one learning algorithm outperforms another on a particular learning task. These tests are compared experimentally to determine their probability of incorrectly detecting a difference when no difference exists (type I error). Two widely used statistical tests are shown to have high probability of type I error in certain situations and should never be used: a test for the difference of two proportions and a paired-differences t test based on taking several random train-test splits. A third test, a paired-differences t test based on 10-fold cross-validation, exhibits somewhat elevated probability of type I error. A fourth test, McNemar's test, is shown to have low type I error. The fifth test is a new test, 5 × 2 cv, based on five iterations of two-fold cross-validation. Experiments show that this test also has acceptable type I error. The article also measures the power (ability to detect algorithm differences when they do exist) of these tests. The cross-validated t test is the most powerful. The 5 × 2 cv test is shown to be slightly more powerful than McNemar's test. The choice of the best test is determined by the computational cost of running the learning algorithm. For algorithms that can be executed only once, McNemar's test is the only test with acceptable type I error. For algorithms that can be executed 10 times, the 5 × 2 cv test is recommended, because it is slightly more powerful and because it directly measures variation due to the choice of training set.
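As a rough illustration of the recommended test, the sketch below computes the 5 × 2 cv statistic as it is usually stated: five replications of two-fold cross-validation, with the numerator taken from the first fold of the first replication and the denominator from the mean of the per-replication variance estimates, referred to a t distribution with 5 degrees of freedom. The classifiers and dataset in the usage stub are placeholders, not the ones studied in the article.

```python
import numpy as np
from sklearn.base import clone
from sklearn.model_selection import StratifiedKFold

def five_by_two_cv_t(clf_a, clf_b, X, y, seed=0):
    """5x2cv paired t statistic for comparing two classifiers on one dataset."""
    diffs = np.zeros((5, 2))  # difference in error rates, per replication and fold
    for rep in range(5):
        folds = StratifiedKFold(n_splits=2, shuffle=True, random_state=seed + rep)
        for half, (train, test) in enumerate(folds.split(X, y)):
            err = []
            for clf in (clf_a, clf_b):
                model = clone(clf).fit(X[train], y[train])
                err.append(np.mean(model.predict(X[test]) != y[test]))
            diffs[rep, half] = err[0] - err[1]
    # Per-replication variance estimate: (p1 - pbar)^2 + (p2 - pbar)^2.
    variances = np.var(diffs, axis=1, ddof=0) * 2
    return diffs[0, 0] / np.sqrt(variances.mean())  # compare to t with 5 d.o.f.

if __name__ == "__main__":
    from sklearn.datasets import load_breast_cancer
    from sklearn.naive_bayes import GaussianNB
    from sklearn.tree import DecisionTreeClassifier
    X, y = load_breast_cancer(return_X_y=True)
    print(five_by_two_cv_t(DecisionTreeClassifier(random_state=0), GaussianNB(), X, y))
```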
An experimental comparison of three methods for constructing ensembles of decision trees
Bagging, boosting, and randomization. Machine Learning, 2000
"... Abstract. Bagging and boosting are methods that generate a diverse ensemble of classifiers by manipulating the training data given to a “base ” learning algorithm. Breiman has pointed out that they rely for their effectiveness on the instability of the base learning algorithm. An alternative approac ..."
Abstract

Cited by 428 (6 self)
Bagging and boosting are methods that generate a diverse ensemble of classifiers by manipulating the training data given to a “base” learning algorithm. Breiman has pointed out that they rely for their effectiveness on the instability of the base learning algorithm. An alternative approach to generating an ensemble is to randomize the internal decisions made by the base algorithm. This general approach has been studied previously by Ali and Pazzani and by Dietterich and Kong. This paper compares the effectiveness of randomization, bagging, and boosting for improving the performance of the decision-tree algorithm C4.5. The experiments show that in situations with little or no classification noise, randomization is competitive with (and perhaps slightly superior to) bagging but not as accurate as boosting. In situations with substantial classification noise, bagging is much better than boosting, and sometimes better than randomization.
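The contrast between bagging and randomization can be made concrete with a short sketch: bagging perturbs the training data by bootstrap resampling, while randomization keeps the data fixed and perturbs the tree-growing procedure itself. The version below approximates randomized split selection with scikit-learn's splitter="random" option, which is an analogue rather than the paper's exact top-20-splits rule for C4.5, and combines the trees by unweighted majority vote.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def fit_ensemble(X, y, n_trees=50, method="bagging", seed=0):
    """Build an ensemble either by bagging or by randomizing internal tree decisions."""
    rng = np.random.default_rng(seed)
    trees = []
    for i in range(n_trees):
        if method == "bagging":
            idx = rng.integers(0, len(X), size=len(X))   # bootstrap sample of the data
            Xi, yi = X[idx], y[idx]
            tree = DecisionTreeClassifier(random_state=i)
        else:  # "randomization": same data every time, random split choices instead
            Xi, yi = X, y
            tree = DecisionTreeClassifier(splitter="random", random_state=i)
        trees.append(tree.fit(Xi, yi))
    return trees

def predict_vote(trees, X):
    """Unweighted majority vote over the ensemble (assumes integer class labels)."""
    votes = np.stack([t.predict(X) for t in trees])
    return np.apply_along_axis(lambda col: np.bincount(col).argmax(), 0, votes)
```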
In defense of one-vs-all classification
Journal of Machine Learning Research, 2004
"... Editor: John ShaweTaylor We consider the problem of multiclass classification. Our main thesis is that a simple “onevsall ” scheme is as accurate as any other approach, assuming that the underlying binary classifiers are welltuned regularized classifiers such as support vector machines. This the ..."
Abstract

Cited by 202 (0 self)
Editor: John Shawe-Taylor. We consider the problem of multiclass classification. Our main thesis is that a simple “one-vs-all” scheme is as accurate as any other approach, assuming that the underlying binary classifiers are well-tuned regularized classifiers such as support vector machines. This thesis is interesting in that it disagrees with a large body of recent published work on multiclass classification. We support our position by means of a critical review of the existing literature, a substantial collection of carefully controlled experimental work, and theoretical arguments.
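A minimal sketch of the one-vs-all scheme argued for here: one regularized binary classifier per class, trained to separate that class from all others, with prediction by the largest decision value. LinearSVC and the fixed C below stand in for the "well-tuned" SVMs the thesis assumes; in practice C would be chosen by cross-validation.

```python
import numpy as np
from sklearn.svm import LinearSVC

def ova_fit(X, y, C=1.0):
    """Train one regularized binary classifier per class, class-vs-rest."""
    classes = np.unique(y)
    models = [LinearSVC(C=C).fit(X, (y == c).astype(int)) for c in classes]
    return classes, models

def ova_predict(classes, models, X):
    """Assign each point to the class whose binary classifier gives the largest margin."""
    scores = np.column_stack([m.decision_function(X) for m in models])
    return classes[scores.argmax(axis=1)]
```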
Error-Correcting Output Coding Corrects Bias and Variance
In Proceedings of the Twelfth International Conference on Machine Learning, 1995
"... Previous research has shown that a technique called errorcorrecting output coding (ECOC) can dramatically improve the classification accuracy of supervised learning algorithms that learn to classify data points into one of k AE 2 classes. This paper presents an investigation of why the ECOC techniq ..."
Abstract

Cited by 148 (5 self)
Previous research has shown that a technique called error-correcting output coding (ECOC) can dramatically improve the classification accuracy of supervised learning algorithms that learn to classify data points into one of k > 2 classes. This paper presents an investigation of why the ECOC technique works, particularly when employed with decision-tree learning algorithms. It shows that the ECOC method, like any form of voting or committee, can reduce the variance of the learning algorithm. Furthermore, unlike methods that simply combine multiple runs of the same learning algorithm, ECOC can correct for errors caused by the bias of the learning algorithm. Experiments show that this bias correction ability relies on the non-local behavior of C4.5. 1. Introduction. Error-correcting output coding (ECOC) is a method for applying binary (two-class) learning algorithms to solve k-class supervised learning problems. It works by converting the k-class supervised learning problem into a la...
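The voting half of the argument (variance reduction) can be illustrated with a small calculation: if the individual bit classifiers erred independently, the chance that a majority of them err, and hence that decoding fails, falls off as a binomial tail. The independence assumption is an idealization, and this says nothing about the bias-correction effect, which the paper attributes to the non-local behavior of C4.5.

```python
from scipy.stats import binom

# If each of n voters errs independently with probability p < 0.5,
# the majority vote errs only when more than half of them do.
n, p = 15, 0.3
majority_error = binom.sf(n // 2, n, p)   # P(more than 7 of 15 voters err)
print(f"single voter error: {p:.2f}, majority-vote error: {majority_error:.4f}")
```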
Machine-Learning Research: Four Current Directions
"... Machine Learning research has been making great progress in many directions. This article summarizes four of these directions and discusses some current open problems. The four directions are (a) improving classification accuracy by learning ensembles of classifiers, (b) methods for scaling up super ..."
Abstract

Cited by 114 (1 self)
Machine Learning research has been making great progress in many directions. This article summarizes four of these directions and discusses some current open problems. The four directions are (a) improving classification accuracy by learning ensembles of classifiers, (b) methods for scaling up supervised learning algorithms, (c) reinforcement learning, and (d) learning complex stochastic models.
An experimental comparison of the nearest-neighbor and nearest-hyperrectangle algorithms
Machine Learning, 1995
"... Abstract. Algorithms based on Nested Generalized Exemplar (NGE) theory (Salzberg, 1991) classify new data points by computing their distance to the nearest "generalized exemplar " (i.e., either a point or an axisparallel rectangle). They combine the distancebased character of nearest neighbor (NN) ..."
Abstract

Cited by 94 (5 self)
Algorithms based on Nested Generalized Exemplar (NGE) theory (Salzberg, 1991) classify new data points by computing their distance to the nearest "generalized exemplar" (i.e., either a point or an axis-parallel rectangle). They combine the distance-based character of nearest neighbor (NN) classifiers with the axis-parallel rectangle representation employed in many rule-learning systems. An implementation of NGE was compared to the k-nearest neighbor (kNN) algorithm in 11 domains and found to be significantly inferior to kNN in 9 of them. Several modifications of NGE were studied to understand the cause of its poor performance. These show that its performance can be substantially improved by preventing NGE from creating overlapping rectangles, while still allowing complete nesting of rectangles. Performance can be further improved by modifying the distance metric to allow weights on each of the features (Salzberg, 1991). Best results were obtained in this study when the weights were computed using mutual information between the features and the output class. The best version of NGE developed is a batch algorithm (BNGE FWMI) that has no user-tunable parameters. BNGE FWMI's performance is comparable to the first-nearest neighbor algorithm (also incorporating feature weights). However, the k-nearest neighbor algorithm is still significantly superior to BNGE FWMI in 7 of the 11 domains, and inferior to it in only 2. We conclude that, even with our improvements, the NGE approach is very sensitive to the shape of the decision boundaries in classification problems. In domains where the decision boundaries are axis-parallel, the NGE approach can produce excellent generalization with interpretable hypotheses. In all domains tested, NGE algorithms require much less memory to store generalized exemplars than is required by NN algorithms.
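The core geometric step in NGE-style classification is the distance from a query point to an axis-parallel hyperrectangle: zero along any dimension where the point falls inside the exemplar's interval, otherwise the gap to the nearest face, optionally weighted per feature. The sketch below is a simplified version of that metric (Salzberg's EACH system also normalizes features and weights exemplars); the mutual-information feature weights mentioned above would be passed in as weights.

```python
import numpy as np

def rect_distance(x, lower, upper, weights=None):
    """Distance from point x to the axis-parallel hyperrectangle [lower, upper]."""
    gap = np.maximum(lower - x, 0) + np.maximum(x - upper, 0)  # per-feature gap to nearest face
    if weights is not None:
        gap = gap * weights
    return np.sqrt(np.sum(gap ** 2))

def nge_classify(x, exemplars):
    """exemplars: list of (lower, upper, label); a point exemplar has lower == upper.
    Return the label of the nearest generalized exemplar."""
    dists = [rect_distance(x, lo, hi) for lo, hi, _ in exemplars]
    return exemplars[int(np.argmin(dists))][2]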
Everything Old Is New Again: A Fresh Look at Historical Approaches
in Machine Learning. PhD thesis, MIT, 2002
"... 2 Everything Old Is New Again: A Fresh Look at Historical ..."
Abstract

Cited by 88 (6 self)
Training Conditional Random Fields via Gradient Tree Boosting
In Proceedings of the 21st International Conference on Machine Learning (ICML), 2004
"... Conditional Random Fields (CRFs; Lafferty, McCallum, & Pereira, 2001) provide a flexible and powerful model for learning to assign labels to elements of sequences in such applications as partofspeech tagging, texttospeech mapping, protein and DNA sequence analysis, and information extraction fro ..."
Abstract

Cited by 55 (2 self)
Conditional Random Fields (CRFs; Lafferty, McCallum, & Pereira, 2001) provide a flexible and powerful model for learning to assign labels to elements of sequences in such applications as part-of-speech tagging, text-to-speech mapping, protein and DNA sequence analysis, and information extraction from web pages. However, existing learning algorithms are slow, particularly in problems with large numbers of potential input features. This paper describes a new method...
Using Error-Correcting Codes For Text Classification
In Proceedings of the Seventeenth International Conference on Machine Learning, 2000
"... This paper explores in detail the use of Error Correcting Output Coding (ECOC) for learning text classifiers. We show that the accuracy of a Naive Bayes Classifier over text classification tasks can be significantly improved by taking advantage of the errorcorrecting properties of the code. W ..."
Abstract

Cited by 34 (3 self)
This paper explores in detail the use of Error-Correcting Output Coding (ECOC) for learning text classifiers. We show that the accuracy of a Naive Bayes Classifier over text classification tasks can be significantly improved by taking advantage of the error-correcting properties of the code. We also explore the use of different kinds of codes, namely Error-Correcting Codes, Random Codes, and Domain- and Data-specific codes, and give experimental results for each of them. The ECOC method scales well to large data sets with a large number of classes. Experiments on a real-world data set show a reduction in classification error by up to 66% over the traditional Naive Bayes Classifier. We also compare our empirical results to semi-theoretical results and find that the two closely agree. 1. Introduction. Text Classification is the problem of grouping text documents into classes or categories. For the purpose of this paper, we define classification as categorizing documents in...
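One of the code families compared here, random codes, is easy to sketch: draw random binary codewords for the classes and keep a draw whose rows are well separated in Hamming distance, since a row separation of d lets decoding correct roughly floor((d - 1) / 2) wrong bits. The code length and distance threshold below are illustrative, not the values used in the paper.

```python
import numpy as np
from itertools import combinations

def random_code(n_classes, n_bits, min_row_dist=4, seed=0):
    """Draw a random binary code matrix (one codeword per class) whose rows are
    at least min_row_dist apart in Hamming distance and whose columns are not constant."""
    rng = np.random.default_rng(seed)
    while True:
        code = rng.integers(0, 2, size=(n_classes, n_bits))
        dists = [np.sum(a != b) for a, b in combinations(code, 2)]
        if min(dists) >= min_row_dist and not np.any(code.std(axis=0) == 0):
            return code

code = random_code(n_classes=10, n_bits=31)
print(code.shape)
```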