Improved Heterogeneous Distance Functions
- Journal of Artificial Intelligence Research
, 1997
Cited by 173 (9 self)
Instance-based learning techniques typically handle continuous and linear input values well, but often do not handle nominal input attributes appropriately. The Value Difference Metric (VDM) was designed to find reasonable distance values between nominal attribute values, but it largely ignores continuous attributes, requiring discretization to map continuous values into nominal values. This paper proposes three new heterogeneous distance functions, called the Heterogeneous Value Difference Metric (HVDM), the Interpolated Value Difference Metric (IVDM), and the Windowed Value Difference Metric (WVDM). These new distance functions are designed to handle applications with nominal attributes, continuous attributes, or both. In experiments on 48 applications the new distance metrics achieve higher classification accuracy on average than three previous distance functions on those datasets that have both nominal and continuous attributes.
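As a rough illustration of the idea behind VDM mentioned in this abstract, the sketch below computes a distance between two values of a single nominal attribute from their class-conditional frequencies, using the commonly cited unweighted form: the sum over classes of |P(c|x) - P(c|y)|^q. The function names `fit_vdm` and `vdm`, the toy data, and the choice q=2 are assumptions made for this example; it does not implement HVDM, IVDM, or WVDM, whose details are not given in the abstract.

```python
from collections import Counter, defaultdict


def fit_vdm(values, labels):
    """Estimate P(class | value) for one nominal attribute (hypothetical helper)."""
    counts = defaultdict(Counter)
    for v, c in zip(values, labels):
        counts[v][c] += 1
    classes = sorted(set(labels))
    # Counter returns 0 for classes never seen with a given value.
    probs = {
        v: {c: cnt[c] / sum(cnt.values()) for c in classes}
        for v, cnt in counts.items()
    }
    return probs, classes


def vdm(probs, classes, x, y, q=2):
    """Unweighted VDM between values x and y of the same nominal attribute."""
    return sum(abs(probs[x][c] - probs[y][c]) ** q for c in classes)


# Toy data (invented for illustration): one nominal attribute and a class label.
colors = ["red", "red", "blue", "blue", "green", "green"]
labels = ["A", "A", "B", "B", "A", "B"]

probs, classes = fit_vdm(colors, labels)
d_red_blue = vdm(probs, classes, "red", "blue")
d_red_green = vdm(probs, classes, "red", "green")
```

In this toy data "red" always co-occurs with class A and "blue" with class B, so they end up farther apart than "red" and "green", whose class distribution is mixed. Values are considered similar when they predict similar class distributions, not when their symbols match.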
Reduction Techniques for Instance-Based Learning Algorithms
- Machine Learning
, 2000
Cited by 93 (2 self)
Instance-based learning algorithms are often faced with the problem of deciding which instances to store for use during generalization. Storing too many instances can result in large memory requirements and slow execution speed, and can cause an oversensitivity to noise. This paper has two main purposes. First, it provides a survey of existing algorithms used to reduce storage requirements in instance-based learning algorithms and other exemplar-based algorithms. Second, it proposes six additional reduction algorithms called DROP1--DROP5 and DEL (three of which were first described in Wilson & Martinez, 1997c, as RT1--RT3) that can be used to remove instances from the concept description. These algorithms and 10 algorithms from the survey are compared on 31 classification tasks. Of those algorithms that provide substantial storage reduction, the DROP algorithms have the highest average generalization accuracy in these experiments, especially in the presence of uniform class noise. ...
Instance pruning techniques
- Machine Learning: Proceedings of the Fourteenth International Conference (ICML’97)
, 1997
Cited by 55 (7 self)
The nearest neighbor algorithm and its derivatives are often quite successful at learning a concept from a training set and providing good generalization on subsequent input vectors. However, these techniques often retain the entire training set in memory, resulting in large memory requirements and slow execution speed, as well as a sensitivity to noise. This paper provides a discussion of issues related to reducing the number of instances retained in memory while maintaining (and sometimes improving) generalization accuracy, and mentions algorithms other researchers have used to address this problem. It presents three intuitive noise-tolerant algorithms that can be used to prune instances from the training set. In experiments on 29 applications, the algorithm that achieves the highest reduction in storage also results in the highest generalization accuracy of the three methods.
Identifying Mislabeled Training Data
- Journal of Artificial Intelligence Research
, 1999
Cited by 55 (1 self)
This paper presents a new approach to identifying and eliminating mislabeled training instances for supervised learning. The goal of this approach is to improve classification accuracies produced by learning algorithms by improving the quality of the training data. Our approach
Advances In Instance Selection for Instance-Based Learning Algorithms
- Data Mining and Knowledge Discovery
, 2002
Cited by 42 (0 self)
The basic nearest neighbour classifier suffers from the indiscriminate storage of all presented training instances. With a large database of instances, classification response time can be slow. When noisy instances are present, classification accuracy can suffer. Drawing on the large body of relevant work carried out in the past 30 years, we review the principal approaches to solving these problems. By deleting instances, both problems can be alleviated, but the criterion used is typically assumed to be all encompassing and effective over many domains. We argue against this position and introduce an algorithm that rivals the most successful existing algorithm. When evaluated on 30 different problems, neither algorithm consistently outperforms the other: consistency is very hard. To achieve the best results, we need to develop mechanisms that provide insights into the structure of class definitions. We discuss the possibility of these mechanisms and propose some initial measures that could be useful for the data miner.
Prototype Selection for Composite Nearest Neighbor Classifiers
, 1997
Cited by 22 (1 self)
Combining the predictions of a set of classifiers has been shown to be an effective way to create composite classifiers that are more accurate than any of the component classifiers. Increased accuracy has been shown in a variety of real-world applications, ranging from protein sequence identification to determining the fat content of ground meat. Despite such individual successes, the answers are not known to fundamental questions about classifier combination, such as "Can classifiers from any given model class be combined to create a composite classifier with higher accuracy?" or "Is it possible to increase the accuracy of a given classifier by combining its predictions with those of only a small number o...
Reduction Techniques for Exemplar-Based Learning Algorithms
- Machine Learning
, 2000
Cited by 19 (2 self)
Exemplar-based learning algorithms are often faced with the problem of deciding which instances or other exemplars to store for use during generalization. Storing too many exemplars can result in large memory requirements and slow execution speed, and can cause an oversensitivity to noise. This paper has two main purposes. First, it provides a survey of existing algorithms used to reduce the number of exemplars retained in exemplar-based learning models. Second, it proposes six new reduction algorithms called DROP1-5 and DEL that can be used to prune instances from the concept description. These algorithms and 10 algorithms from the survey are compared on 31 datasets. Of those algorithms that provide substantial storage reduction, the DROP algorithms have the highest generalization accuracy in these experiments, especially in the presence of noise.
An Integrated Instance-Based Learning Algorithm
- Computational Intelligence
, 2000
Cited by 19 (1 self)
The basic nearest-neighbor rule generalizes well in many domains but has several shortcomings, including inappropriate distance functions, large storage requirements, slow execution time, sensitivity to noise, and an inability to adjust its decision boundaries after storing the training data. This paper proposes methods for overcoming each of these weaknesses and combines these methods into a comprehensive learning system called the Integrated Decremental Instance-Based Learning Algorithm (IDIBL) that seeks to reduce storage, improve execution speed, and increase generalization accuracy, when compared to the basic nearest neighbor algorithm and other learning models. IDIBL tunes its own parameters using a new measure of fitness that combines confidence and cross-validation (CVC) accuracy in order to avoid discretization problems with more traditional leave-one-out cross-validation (LCV). In our experiments IDIBL achieves higher generalization accuracy than other less comprehensive instance-based learning algorithms, while requiring less than one-fourth the storage of the nearest neighbor algorithm and improving execution speed by a corresponding factor. In experiments on 21 datasets, IDIBL also achieves higher generalization accuracy than those reported for 16 major machine learning and neural network models.
A Teaching Strategy for Memory-Based Control
, 1997
Cited by 13 (0 self)
Combining different machine learning algorithms in the same system can produce benefits above and beyond what either method could achieve alone. This paper demonstrates that genetic algorithms can be used in conjunction with lazy learning to solve examples of a difficult class of delayed reinforcement learning problems better than either method alone. This class, the class of differential games, includes numerous important control problems that arise in robotics, planning, game playing, and other areas, and solutions for differential games suggest solution strategies for the general class of planning and control problems. We conducted a series of experiments applying three learning approaches---lazy Q-learning, k-nearest neighbor (k-NN), and a genetic algorithm---to a particular differential game called a pursuit game. Our experiments demonstrate that k-NN had great difficulty solving the problem, while a lazy version of Q-learning performed moderately well and the genetic algorithm pe...
Careful Abstraction from Instance Families in Memory-Based Language Learning
- Journal of Experimental and Theoretical Artificial Intelligence
, 1999
Cited by 12 (6 self)
Antal van den Bosch (ILK Research Group, Computational Linguistics, Tilburg University, The Netherlands). Empirical studies in inductive language learning point at pure memory-based learning as a successful approach to many language learning tasks, often performing better than learning methods that abstract from the learning material. The possibility is left open, however, that limited, careful abstraction in memory-based learning may be harmless to generalisation, as long as the disjunctivity of language data is preserved. We compare three types of careful abstraction: editing, oblivious (partial) decision-tree abstra...

