Results 11-20 of 61
Selecting a Classification Method by Cross-Validation
Machine Learning, 1993
Cited by 72 (0 self)
If we lack relevant problem-specific knowledge, cross-validation methods may be used to select a classification method empirically. We examine this idea here to show in what senses cross-validation does and does not solve the selection problem. As illustrated empirically, cross-validation may lead to higher average performance than application of any single classification strategy, and it also cuts the risk of poor performance. On the other hand, cross-validation is no more or less a form of bias than simpler strategies, and applying it appropriately ultimately depends in the same way on prior knowledge. In fact, cross-validation may be seen as a way of applying partial information about the applicability of alternative classification strategies. Keywords: Cross-validation, classification, decision trees, neural networks. 1 Introduction Machine learning researchers and statisticians have produced a host of approaches to the problem of classification including methods for inducing rul...
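As a rough illustration of the selection idea in this abstract (not the paper's experimental setup), the sketch below picks between two candidate classifiers by k-fold cross-validated error. The two learners, the toy data, and all function names are assumptions made for the example.

```python
import random
from collections import Counter

def majority_classifier(train):
    """Return a predictor that always outputs the most common training label."""
    label = Counter(y for _, y in train).most_common(1)[0][0]
    return lambda x: label

def nearest_neighbor_classifier(train):
    """Return a 1-nearest-neighbor predictor (squared Euclidean distance)."""
    def predict(x):
        nearest = min(train, key=lambda ex: sum((a - b) ** 2 for a, b in zip(ex[0], x)))
        return nearest[1]
    return predict

def cv_error(learner, data, k=5, seed=0):
    """Estimate the error rate of `learner` by k-fold cross-validation."""
    shuffled = data[:]
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    errors = 0
    for i, test in enumerate(folds):
        # Train on every fold except the held-out one.
        train = [ex for j, fold in enumerate(folds) if j != i for ex in fold]
        predict = learner(train)
        errors += sum(predict(x) != y for x, y in test)
    return errors / len(data)

def select_by_cv(learners, data, k=5):
    """Return the name of the learner with the lowest cross-validated error."""
    return min(learners, key=lambda name: cv_error(learners[name], data, k))

# Toy data (assumed): two classes separated along the second coordinate.
data = [((0.1 * i, 0.0), 0) for i in range(20)]
data += [((0.1 * i, 5.0), 1) for i in range(20)]

learners = {"majority": majority_classifier, "1-nn": nearest_neighbor_classifier}
best = select_by_cv(learners, data)
print(best)
```

On balanced, well-separated data like this, the nearest-neighbor learner is selected. The choice of candidate learners is itself prior knowledge, which is the sense in which the abstract calls cross-validation a form of bias.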
Predicting Nearly as Well as the Best Pruning of a Decision Tree
Machine Learning, 1995
Cited by 71 (5 self)
Many algorithms for inferring a decision tree from data involve a two-phase process: First, a very large decision tree is grown which typically ends up "overfitting" the data. To reduce overfitting, in the second phase, the tree is pruned using one of a number of available methods. The final tree is then output and used for classification on test data. In this paper, we suggest an alternative approach to the pruning phase. Using a given unpruned decision tree, we present a new method of making predictions on test data, and we prove that our algorithm's performance will not be "much worse" (in a precise technical sense) than the predictions made by the best reasonably small pruning of the given decision tree. Thus, our procedure is guaranteed to be competitive (in terms of the quality of its predictions) with any pruning algorithm. We prove that our procedure is very efficient and highly robust. Our method can be viewed as a synthesis of two previously studied techniques. First, we ...
Machine Learning Bias, Statistical Bias, and Statistical Variance of Decision Tree Algorithms
1995
Cited by 50 (3 self)
The term "bias" is widely used, and with different meanings, in the fields of machine learning and statistics. This paper clarifies the uses of this term and shows how to measure and visualize the statistical bias and variance of learning algorithms. Statistical bias and variance can be applied to diagnose problems with machine learning bias, and the paper shows four examples of this. Finally, the paper discusses methods of reducing bias and variance. Methods based on voting can reduce variance, and the paper compares Breiman's bagging method and our own tree randomization method for voting decision trees. Both methods uniformly improve performance on data sets from the Irvine repository. Tree randomization yields perfect performance on the Letter Recognition task. A weighted nearest neighbor algorithm based on the infinite bootstrap is also introduced. In general, decision tree algorithms have moderate-to-high variance, so an important implication of this work is that variancerat...
Bayesian model averaging
STAT.SCI, 1999
Cited by 49 (1 self)
Standard statistical practice ignores model uncertainty. Data analysts typically select a model from some class of models and then proceed as if the selected model had generated the data. This approach ignores the uncertainty in model selection, leading to overconfident inferences and decisions that are more risky than one thinks they are. Bayesian model averaging (BMA) provides a coherent mechanism for accounting for this model uncertainty. Several methods for implementing BMA have recently emerged. We discuss these methods and present a number of examples. In these examples, BMA provides improved out-of-sample predictive performance. We also provide a catalogue of
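The averaging that this abstract refers to is usually written as below. The symbols are notation assumed here, not taken from the listing: \Delta is the quantity of interest, D the data, and M_1, ..., M_K the candidate models.

```latex
\Pr(\Delta \mid D) = \sum_{k=1}^{K} \Pr(\Delta \mid M_k, D)\,\Pr(M_k \mid D),
\qquad
\Pr(M_k \mid D) = \frac{\Pr(D \mid M_k)\,\Pr(M_k)}{\sum_{l=1}^{K} \Pr(D \mid M_l)\,\Pr(M_l)}
```

Selecting a single model amounts to putting all posterior weight on one M_k, which is why it understates uncertainty whenever several models have non-negligible posterior probability.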
An Extensible Meta-Learning Approach for Scalable and Accurate Inductive Learning
1996
Cited by 48 (8 self)
Much of the research in inductive learning concentrates on problems with relatively small amounts of data. With the coming age of ubiquitous network computing, it is likely that orders of magnitude more data in databases will be available for various learning problems of real-world importance. Some learning algorithms assume that the entire data set fits into main memory, which is not feasible for massive amounts of data, especially for applications in data mining. One approach to handling a large data set is to partition the data set into subsets, run the learning algorithm on each of the subsets, and combine the results. Moreover, data can be inherently distributed across multiple sites on the network and merging all the data in one location can be expensive or prohibitive. In this thesis we propose, investigate, and evaluate a meta-learning approach to integrating the results of mul...
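The partition, learn, and combine scheme mentioned in this abstract can be sketched as follows. This is only a minimal illustration: the base learner is a hypothetical one-dimensional threshold rule, and the combiner is a plain majority vote swapped in for simplicity, not the learned meta-level combiner that the thesis proposes.

```python
from collections import Counter

def learn_threshold(subset):
    """Hypothetical base learner: threshold at the midpoint of the two class means."""
    zeros = [x for x, y in subset if y == 0]
    ones = [x for x, y in subset if y == 1]
    threshold = (sum(zeros) / len(zeros) + sum(ones) / len(ones)) / 2
    return lambda x: int(x > threshold)

def partition(data, n_parts):
    """Split the data into n_parts interleaved subsets."""
    return [data[i::n_parts] for i in range(n_parts)]

def combine_by_vote(classifiers):
    """Combine base classifiers by simple majority vote (assumed combiner)."""
    def predict(x):
        votes = Counter(clf(x) for clf in classifiers)
        return votes.most_common(1)[0][0]
    return predict

# Toy data (assumed): class 0 on [0, 29], class 1 on [40, 69].
data = [(float(i), 0) for i in range(30)] + [(float(i), 1) for i in range(40, 70)]

subsets = partition(data, 3)                           # each subset fits "in memory"
classifiers = [learn_threshold(s) for s in subsets]    # one model per subset
combined = combine_by_vote(classifiers)

print(combined(10.0), combined(60.0))
```

Each base classifier sees only a third of the data, so in a distributed setting only the trained classifiers, not the raw subsets, would need to be gathered in one place.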
Option Decision Trees with Majority Votes
ICML97, 1997
Cited by 42 (8 self)
We describe an experimental study of Option Decision Trees with majority votes. Option Decision Trees generalize regular decision trees by allowing option nodes in addition to decision nodes; such nodes allow for several possible tests to be conducted instead of the commonly used single test. Our goal was to explore when option nodes are most useful and to control the growth of the trees so that additional complexity of little utility is limited. Option Decision Trees can reduce the error of decision trees on real-world problems by combining multiple options, with the motivation similar to that of voting algorithms that learn multiple models and combine the predictions. However, unlike voting algorithms, an Option Decision Tree provides a single structured classifier (one decision tree), which can be interpreted more easily by humans. Our results show that for the tested problems, we can achieve significant reduction in error rates for trees restricted to two levels of o...
On Pruning and Averaging Decision Trees
In Proceedings of the Twelfth International Conference on Machine Learning, 1995
Cited by 39 (0 self)
Pruning a decision tree is considered by some researchers to be the most important part of tree building in noisy domains. While there are many approaches to pruning, an alternative approach of averaging over decision trees has not received as much attention. We perform an empirical comparison of pruning with the approach of averaging over decision trees. For this comparison we use a computationally efficient method of averaging, namely averaging over the extended fanned set of a tree. Since there is a wide range of approaches to pruning, we compare tree averaging with a traditional pruning approach, along with an optimal pruning approach.
Small Sample Statistics for Classification Error Rates I: Error Rate Measurements
Dept. of Inf. and Comp. Sci, 1996
Cited by 31 (1 self)
Several methods (independent subsamples, leave-one-out, cross-validation, and bootstrapping) have been proposed for estimating the error rates of classifiers. The rationale behind the various estimators and the causes of the sometimes conflicting claims regarding their bias and precision are explored in this paper. The biases and variances of each of the estimators are examined empirically. Cross-validation, 10-fold or greater, seems to be the best approach; the other methods are biased, have poorer precision, or are inconsistent. Though unbiased for linear discriminant classifiers, the 632b bootstrap estimator is biased for nearest neighbors classifiers, more so for single nearest neighbor than for three nearest neighbors. The 632b estimator is also biased for CART-style decision trees. Weiss' loo* estimator is unbiased and has better precision than cross-validation for discriminant and nearest neighbors classifiers, but its lack of bias and improved precision for those classifiers do...
Prototype Selection for Composite Nearest Neighbor Classifiers
1997
Cited by 29 (1 self)
Combining the predictions of a set of classifiers has been shown to be an effective way to create composite classifiers that are more accurate than any of the component classifiers. Increased accuracy has been shown in a variety of real-world applications, ranging from protein sequence identification to determining the fat content of ground meat. Despite such individual successes, the answers are not known to fundamental questions about classifier combination, such as "Can classifiers from any given model class be combined to create a composite classifier with higher accuracy?" or "Is it possible to increase the accuracy of a given classifier by combining its predictions with those of only a small number o...
Arbitrating Among Competing Classifiers Using Learned Referees
KNOWLEDGE AND INFORMATION SYSTEMS, 1998
Cited by 24 (0 self)
The situation in which the results of several different classifiers and learning algorithms are obtainable for a single classification problem is common. In this paper, we propose a method that takes a collection of existing classifiers and learning algorithms, together with a set of available data, and creates a combined classifier that takes advantage of all of these sources of knowledge. The basic idea is that each classifier has a particular subdomain for which it is most reliable. Therefore, we induce a referee for each classifier, which describes its area of expertise. Given such a description, we arbitrate between the component classifiers by using the most reliable classifier for the examples in each subdomain. In experiments in several domains, we found such arbitration to be significantly more effective than various voting techniques which do not seek out subdomains of expertise. Our results further suggest that the more fine-grained the analysis of the areas of expertise of the competing classifiers, the more effectively they can be combined. In particular, we find that classification accuracy increases greatly when using intermediate subconcepts from the classifiers themselves as features for the induction of referees.