Results 1–10 of 87
A Weighted Nearest Neighbor Algorithm for Learning with Symbolic Features
 Machine Learning
, 1993
Abstract

Cited by 266 (3 self)
In the past, nearest neighbor algorithms for learning from examples have worked best in domains in which all features had numeric values. In such domains, the examples can be treated as points and distance metrics can use standard definitions. In symbolic domains, a more sophisticated treatment of the feature space is required. We introduce a nearest neighbor algorithm for learning in domains with symbolic features. Our algorithm calculates distance tables that allow it to produce real-valued distances between instances, and attaches weights to the instances to further modify the structure of feature space. We show that this technique produces excellent classification accuracy on three problems that have been studied by machine learning researchers: predicting protein secondary structure, identifying DNA promoter sequences, and pronouncing English text. Direct experimental comparisons with the other learning algorithms show that our nearest neighbor algorithm is comparable or superior ...
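The distance-table idea in the abstract can be sketched as follows: estimate, from the training data, how differently two symbolic values behave with respect to the class, and use that as a real-valued per-feature distance. This is an illustrative value-difference-style metric with invented names; it is not the paper's exact algorithm, which additionally attaches weights to instances.

```python
from collections import defaultdict

def value_difference_tables(X, y):
    """Build per-feature distance tables between symbolic values.

    d(v1, v2) = sum over classes c of |P(c | f=v1) - P(c | f=v2)|,
    estimated from training data (one common form of a value
    difference metric; details vary across papers).
    """
    classes = sorted(set(y))
    tables = []
    for f in range(len(X[0])):
        # class-conditional counts for each symbolic value of feature f
        counts = defaultdict(lambda: defaultdict(int))
        for xi, yi in zip(X, y):
            counts[xi[f]][yi] += 1
        table = {}
        vals = list(counts)
        for v1 in vals:
            n1 = sum(counts[v1].values())
            for v2 in vals:
                n2 = sum(counts[v2].values())
                table[(v1, v2)] = sum(
                    abs(counts[v1][c] / n1 - counts[v2][c] / n2)
                    for c in classes)
        tables.append(table)
    return tables

def vdm_distance(a, b, tables):
    """Real-valued distance between two symbolic instances."""
    return sum(t.get((av, bv), 1.0)  # unseen value pair -> default distance
               for av, bv, t in zip(a, b, tables))
```

Identical instances get distance 0; values that co-occur with different classes are pushed apart, which is what makes plain nearest neighbor workable on symbolic features.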
On Comparing Classifiers: Pitfalls to Avoid and a Recommended Approach
 Data Mining and Knowledge Discovery
, 1997
Abstract

Cited by 157 (0 self)
Abstract. An important component of many data mining projects is finding a good classification algorithm, a process that requires very careful thought about experimental design. If not done very carefully, comparative studies of classification and other types of algorithms can easily result in statistically invalid conclusions. This is especially true when one is using data mining techniques to analyze very large databases, which inevitably contain some statistically unlikely data. This paper describes several phenomena that can, if ignored, invalidate an experimental comparison. These phenomena and the conclusions that follow apply not only to classification, but to computational experiments in almost any aspect of data mining. The paper also discusses why comparative analysis is more important in evaluating some types of algorithms than others, and provides some suggestions about how to avoid the pitfalls suffered by many experimental studies.
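One concrete safeguard against statistically invalid comparisons of two classifiers evaluated on the same test set is McNemar's test on their discordant errors. A minimal sketch using the normal approximation (illustrative; not necessarily the exact procedure this paper recommends, and an exact binomial test is preferable when discordant counts are small):

```python
from math import sqrt
from statistics import NormalDist

def mcnemar_test(errors_a, errors_b):
    """Two-sided p-value for H0: the classifiers have equal error rates.

    errors_a, errors_b: boolean lists, True where each classifier errs
    on the corresponding test example (same test set for both).
    """
    # discordant pairs: exactly one of the two classifiers is wrong
    n01 = sum(a and not b for a, b in zip(errors_a, errors_b))
    n10 = sum(b and not a for a, b in zip(errors_a, errors_b))
    n = n01 + n10
    if n == 0:
        return 1.0
    # continuity-corrected z statistic on the discordant counts
    z = (abs(n01 - n10) - 1) / sqrt(n)
    return min(1.0, 2 * (1 - NormalDist().cdf(z)))
```

The point matches the abstract: the decision rests only on examples where the classifiers disagree, not on raw accuracy differences, which can look large on big test sets while being statistically meaningless.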
Comparative Experiments on Disambiguating Word Senses: An Illustration of the Role of Bias in Machine Learning
, 1996
Abstract

Cited by 107 (1 self)
This paper describes an experimental comparison of seven different learning algorithms on the problem of learning to disambiguate the meaning of a word from context. The algorithms tested include statistical, neural-network, decision-tree, rule-based, and case-based classification techniques. The specific problem tested involves disambiguating six senses of the word "line" using the words in the current and preceding sentence as context. The statistical and neural-network methods perform the best on this particular problem, and we discuss a potential reason for this observed difference. We also discuss the role of bias in machine learning and its importance in explaining performance differences observed on specific problems.
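A statistical approach of the kind tested here can be sketched as naive Bayes over a bag of context words. This is a generic add-one-smoothed model in the spirit of the abstract, not necessarily the paper's exact estimator, and the example senses below are invented:

```python
from collections import Counter, defaultdict
from math import log

def train_wsd(tagged_examples):
    """tagged_examples: list of (context_words, sense) pairs."""
    sense_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for words, sense in tagged_examples:
        sense_counts[sense] += 1
        word_counts[sense].update(words)
        vocab.update(words)
    return sense_counts, word_counts, vocab

def disambiguate(words, model):
    """Pick argmax_s P(s) * prod_w P(w | s) with add-one smoothing."""
    sense_counts, word_counts, vocab = model
    total = sum(sense_counts.values())
    best, best_lp = None, float("-inf")
    for sense, n in sense_counts.items():
        denom = sum(word_counts[sense].values()) + len(vocab)
        lp = log(n / total) + sum(
            log((word_counts[sense][w] + 1) / denom)
            for w in words if w in vocab)
        if lp > best_lp:
            best, best_lp = sense, lp
    return best
```

The model's bias, in the sense discussed by the paper, is the conditional-independence assumption over context words.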
Error-Correcting Output Codes: A General Method for Improving Multiclass Inductive Learning Programs
 In Proceedings of AAAI-91
, 1991
Abstract

Cited by 88 (7 self)
Multiclass learning problems involve finding a definition for an unknown function f(x) whose range is a discrete set containing k > 2 values (i.e., k "classes"). The definition is acquired by studying large collections of training examples of the form ⟨x_i, f(x_i)⟩. Existing approaches to this problem include (a) direct application of multiclass algorithms such as the decision-tree algorithms ID3 and CART, (b) application of binary concept learning algorithms to learn individual binary functions for each of the k classes, and (c) application of binary concept learning algorithms with distributed output codes such as those employed by Sejnowski and Rosenberg in the NETtalk system. This paper compares these three approaches to a new technique in which BCH error-correcting codes are employed as a distributed output representation. We show that these output representations improve the performance of ID3 on the NETtalk task and of backpropagation on an isolated-letter speech-recognition t...
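The output-coding scheme can be sketched generically: assign each class a binary codeword, train one binary learner per bit, and decode a prediction to the nearest codeword in Hamming distance. The sketch below assumes a hypothetical learner interface with fit/predict methods and takes the code matrix as given (the paper specifically uses BCH codes to construct it):

```python
import numpy as np

def ecoc_fit(X, y, code_matrix, make_learner):
    """Train one binary learner per column of a (k classes x L bits) code."""
    learners = []
    for bit in range(code_matrix.shape[1]):
        yb = code_matrix[y, bit]          # relabel each example by its class's bit
        clf = make_learner()
        clf.fit(X, yb)
        learners.append(clf)
    return learners

def ecoc_predict(X, learners, code_matrix):
    """Decode predicted bit strings to the nearest codeword (Hamming)."""
    bits = np.column_stack([clf.predict(X) for clf in learners])
    dists = np.abs(bits[:, None, :] - code_matrix[None, :, :]).sum(-1)
    return dists.argmin(axis=1)
```

With codewords spaced at Hamming distance d, up to ⌊(d−1)/2⌋ individual bit classifiers can err and decoding still recovers the right class, which is the source of the reported accuracy gains.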
Induction of First-Order Decision Lists: Results on Learning the Past Tense of English Verbs
 Journal of Artificial Intelligence Research
, 1995
Abstract

Cited by 69 (15 self)
This paper presents a method for inducing logic programs from examples that learns a new class of concepts called first-order decision lists, defined as ordered lists of clauses each ending in a cut. The method, called Foidl, is based on Foil (Quinlan, 1990) but employs intensional background knowledge and avoids the need for explicit negative examples. It is particularly useful for problems that involve rules with specific exceptions, such as learning the past tense of English verbs, a task widely studied in the context of the symbolic/connectionist debate. Foidl is able to learn concise, accurate programs for this problem from significantly fewer examples than previous methods (both connectionist and symbolic).
1. Introduction
Inductive logic programming (ILP) is a growing subtopic of machine learning that studies the induction of Prolog programs from examples in the presence of background knowledge (Muggleton, 1992; Lavrac & Dzeroski, 1994). Due to the expressiveness of first-order...
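The "ordered lists of clauses each ending in a cut" semantics amounts to: try rules in order, and the first one whose condition matches fires, committing to its answer. A toy propositional analogue (illustrative only; Foidl induces first-order Prolog clauses, and these example rules are invented, not learned):

```python
def predict_past_tense(rules, stem):
    """Apply an ordered rule list; the first matching rule fires,
    like a Prolog clause ending in a cut."""
    for cond, action in rules:
        if cond(stem):
            return action(stem)
    raise ValueError("no rule matched")

# Specific exceptions first, the general default last -- this ordering
# is how decision lists encode "rules with specific exceptions".
rules = [
    (lambda s: s == "go",       lambda s: "went"),     # irregular exception
    (lambda s: s.endswith("e"), lambda s: s + "d"),    # orthographic rule
    (lambda s: True,            lambda s: s + "ed"),   # catch-all default
]
```

Because exceptions shadow the default, the representation stays concise even for tasks dominated by a regular rule plus a handful of irregulars, matching the paper's motivation.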
Selecting a Classification Method by Cross-Validation
 Machine Learning
, 1993
Abstract

Cited by 64 (0 self)
If we lack relevant problem-specific knowledge, cross-validation methods may be used to select a classification method empirically. We examine this idea here to show in what senses cross-validation does and does not solve the selection problem. As illustrated empirically, cross-validation may lead to higher average performance than application of any single classification strategy, and it also cuts the risk of poor performance. On the other hand, cross-validation is no more or less a form of bias than simpler strategies, and applying it appropriately ultimately depends in the same way on prior knowledge. In fact, cross-validation may be seen as a way of applying partial information about the applicability of alternative classification strategies.
Keywords: Cross-validation, classification, decision trees, neural networks.
1 Introduction
Machine learning researchers and statisticians have produced a host of approaches to the problem of classification including methods for inducing rul...
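The selection scheme the paper examines can be sketched as: estimate each candidate method's accuracy by k-fold cross-validation, pick the winner, and refit it on all the data. A minimal sketch assuming candidates are factories with a hypothetical fit/predict interface:

```python
import random

def cross_val_accuracy(make_model, X, y, k=5, seed=0):
    """k-fold cross-validated accuracy for one learner factory."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    correct = 0
    for f in range(k):
        test = folds[f]
        train = [i for g in range(k) if g != f for i in folds[g]]
        m = make_model()
        m.fit([X[i] for i in train], [y[i] for i in train])
        correct += sum(m.predict([X[i]])[0] == y[i] for i in test)
    return correct / len(X)

def select_by_cv(candidates, X, y):
    """Pick the factory with the best CV estimate, refit on all data."""
    best = max(candidates, key=lambda mk: cross_val_accuracy(mk, X, y))
    model = best()
    model.fit(X, y)
    return model
```

Note, per the abstract, that this is itself a bias: the procedure assumes the CV estimate transfers to new data, and its value depends on choosing a sensible candidate pool in the first place.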
Tree induction vs. logistic regression: A learning-curve analysis
 CEDER WORKING PAPER #IS0102, STERN SCHOOL OF BUSINESS
, 2001
Abstract

Cited by 64 (16 self)
Tree induction and logistic regression are two standard, off-the-shelf methods for building models for classification. We present a large-scale experimental comparison of logistic regression and tree induction, assessing classification accuracy and the quality of rankings based on class-membership probabilities. We use a learning-curve analysis to examine the relationship of these measures to the size of the training set. The results of the study show several remarkable things. (1) Contrary to prior observations, logistic regression does not generally outperform tree induction. (2) More specifically, and not surprisingly, logistic regression is better for smaller training sets and tree induction for larger data sets. Importantly, this often holds for training sets drawn from the same domain (i.e., the learning curves cross), so conclusions about induction-algorithm superiority on a given domain must be based on an analysis of the learning curves. (3) Contrary to conventional wisdom, tree induction is effective at producing probability-based rankings, although apparently comparatively less so for a given training-set size than at making classifications. Finally, (4) the domains on which tree induction and logistic regression are ultimately preferable can be characterized surprisingly well by a simple measure of signal-to-noise ratio.
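A learning curve of the kind analyzed here is just held-out accuracy measured at increasing training-set sizes; comparing two such curves, rather than two scores at one fixed size, is what reveals the crossings the abstract describes. A minimal sketch assuming a hypothetical fit/predict learner interface:

```python
def learning_curve(make_model, X_train, y_train, X_test, y_test, sizes):
    """Held-out accuracy as a function of training-set size."""
    scores = []
    for n in sizes:
        m = make_model()
        m.fit(X_train[:n], y_train[:n])       # train on a prefix of size n
        preds = m.predict(X_test)
        scores.append(sum(p == t for p, t in zip(preds, y_test))
                      / len(y_test))
    return scores
```

Running this for two learners and plotting both score lists against `sizes` makes a crossing visible directly; a single-size comparison would report whichever learner happens to lead at that size.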
Addressing the Selective Superiority Problem: Automatic Algorithm/Model Class Selection
, 1993
Abstract

Cited by 63 (2 self)
The results of empirical comparisons of existing learning algorithms illustrate that each algorithm has a selective superiority; it is best for some but not all tasks. Given a data set, it is often not clear beforehand which algorithm will yield the best performance. In such cases one must search the space of available algorithms to find the one that produces the best classifier. In this paper we present an approach that applies knowledge about the representational biases of a set of learning algorithms to conduct this search automatically. In addition, the approach permits the available algorithms' model classes to be mixed in a recursive tree-structured hybrid. We describe an implementation of the approach, MCS, that performs a heuristic best-first search for the best hybrid classifier for a set of data. An empirical comparison of MCS to each of its primitive learning algorithms, and to the computationally intensive method of cross-validation, illustrates that automatic selection of l...
StatLog: Comparison of Classification Algorithms on Large Real-World Problems
, 1995
Abstract

Cited by 50 (0 self)
This paper describes work in the StatLog project comparing classification algorithms on large real-world problems. The algorithms compared were from: symbolic learning (CART, C4.5, NewID, AC2, ITrule, Cal5, CN2), statistics (Naive Bayes, k-nearest neighbor, kernel density, linear discriminant, quadratic discriminant, logistic regression, projection pursuit, Bayesian networks), and neural networks (backpropagation, radial basis functions). Twelve datasets were used: five from image analysis, three from medicine, and two each from engineering and finance. We found that which algorithm performed best depended critically on the dataset investigated. We therefore developed a set of dataset descriptors to help decide which algorithms are suited to particular datasets. For example, datasets with extreme distributions (skew > 1 and kurtosis > 7) and with many binary/categorical attributes (> 38%) tend to favor symbolic learning algorithms. We suggest how classification algorith...
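The dataset descriptors quoted in the abstract are ordinary moment statistics. The sketch below computes population skewness and kurtosis and applies the abstract's quoted rule of thumb; the function name and the reduction to two descriptors are my own simplification (the real StatLog descriptor set is much richer):

```python
from statistics import mean

def skewness(xs):
    """Population skewness: third central moment over sigma^3."""
    m, n = mean(xs), len(xs)
    s2 = sum((x - m) ** 2 for x in xs) / n
    return sum((x - m) ** 3 for x in xs) / (n * s2 ** 1.5)

def kurtosis(xs):
    """Population kurtosis (non-excess): fourth central moment over sigma^4."""
    m, n = mean(xs), len(xs)
    s2 = sum((x - m) ** 2 for x in xs) / n
    return sum((x - m) ** 4 for x in xs) / (n * s2 ** 2)

def favors_symbolic(columns, frac_categorical):
    """Toy StatLog-style heuristic with thresholds from the abstract:
    extreme distributions (skew > 1, kurtosis > 7) plus many
    binary/categorical attributes (> 38%) suggest symbolic learners."""
    extreme = any(skewness(c) > 1 and kurtosis(c) > 7 for c in columns)
    return extreme and frac_categorical > 0.38
```

Descriptors like these are cheap to compute before any training, which is what makes them usable as a pre-screening guide to algorithm choice.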
Comparing Connectionist and Symbolic Learning Methods
 Computational Learning Theory and Natural Learning Systems: Constraints and Prospects
, 1994
Abstract

Cited by 48 (0 self)
Experimental comparisons of backpropagation and decision tree methods have provided many data points but less understanding of why one method works better for some tasks than for others. This paper observes that, just as there are sequential and parallel classification methods, there are certain classification tasks that lend themselves to methods of one or the other type.
Introduction
Numerous papers that have appeared over the last few years compare the performance of a variety of learning algorithms on real and constructed datasets. Such comparisons, uncovering the strengths and weaknesses of algorithms on different tasks, provide valuable data points that help to map and understand the inherent capabilities of the methods. One emerging theme is that these capabilities appear to be task-dependent: few researchers would claim that one method is uniformly superior to another. This paper focuses on two kinds of learning algorithms: symbolic methods, that represent what is le...