## Reduction Techniques for Instance-Based Learning Algorithms (2000)

Venue: Machine Learning

Citations: 132 (2 self)

### BibTeX

@ARTICLE{Wilson00reductiontechniques,
  author  = {D. Randall Wilson and Tony R. Martinez},
  title   = {Reduction Techniques for Instance-Based Learning Algorithms},
  journal = {Machine Learning},
  year    = {2000},
  pages   = {257--286}
}

### Abstract

Instance-based learning algorithms are often faced with the problem of deciding which instances to store for use during generalization. Storing too many instances can result in large memory requirements and slow execution speed, and can cause an oversensitivity to noise. This paper has two main purposes. First, it provides a survey of existing algorithms used to reduce storage requirements in instance-based learning algorithms and other exemplar-based algorithms. Second, it proposes six additional reduction algorithms called DROP1--DROP5 and DEL (three of which were first described in Wilson & Martinez, 1997c, as RT1--RT3) that can be used to remove instances from the concept description. These algorithms and 10 algorithms from the survey are compared on 31 classification tasks. Of those algorithms that provide substantial storage reduction, the DROP algorithms have the highest average generalization accuracy in these experiments, especially in the presence of uniform class noise. ...
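The DROP criterion summarized in the abstract can be sketched as follows. This is a simplified, single-pass illustration of the DROP1-style rule (remove an instance if at least as many of the remaining points are classified correctly without it as with it), not the paper's exact algorithm; the helper names and the brute-force neighbor search are ours.

```python
def dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def knn_label(x, data, labels, candidates, k):
    """Majority label of x's k nearest neighbors among `candidates` (brute force)."""
    nearest = sorted(candidates, key=lambda i: dist(x, data[i]))[:k]
    votes = [labels[i] for i in nearest]
    return max(set(votes), key=votes.count)

def correct_count(queries, candidates, data, labels, k):
    """How many query points their k nearest neighbors (drawn from
    `candidates`, excluding the query itself) classify correctly."""
    return sum(
        knn_label(data[q], data, labels, [i for i in candidates if i != q], k)
        == labels[q]
        for q in queries)

def drop_pass(data, labels, k=1):
    """One DROP-style decremental pass (simplified): drop an instance when
    at least as many remaining points are classified correctly without it
    as with it."""
    S = list(range(len(data)))
    for p in list(S):
        rest = [i for i in S if i != p]
        if len(rest) <= k:
            break
        if correct_count(rest, rest, data, labels, k) >= \
           correct_count(rest, S, data, labels, k):
            S = rest
    return S
```

On two well-separated clusters this pass discards interior points and keeps only the instances other points rely on, which is the storage-reduction behavior the abstract describes.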

### Citations

2880 |
UCI Repository of machine learning databases
- Blake, Merz
- 1998
Citation Context: ...surveyed in Section 3 and all of the techniques proposed in Section 4 were implemented and tested on 31 datasets from the Machine Learning Database Repository at the University of California, Irvine (Merz & Murphy, 1996). Those included in these experiments are CNN, SNN, ENN, RENN, All k-NN, IB2, IB3, ELGrow, Explore, DEL, and DROP1-DROP5. These experiments were limited to those algorithms that choose a subset S fro...

1280 |
Combinatorial Optimization; Algorithms and Complexity
- Papadimitriou, Steiglitz
- 1998
Citation Context: ...cision surfaces, and requires modifications to handle disjoint geometric regions that belong to the same class. 3.2.7. Random Mutation Hill Climbing. Skalak (1994) used random mutation hill climbing (Papadimitriou & Steiglitz, 1982) to select instances to use in S. The method begins with m randomly-selected instances in S (where m is a parameter that is supplied by the user). Then for each iteration (called a mutation), one ran...
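The hill-climbing procedure quoted above can be sketched as follows. This is an illustrative reconstruction, not Skalak's exact setup: we hold |S| fixed at m, mutate by swapping one selected instance for an unselected one, and score candidates by 1-NN training accuracy.

```python
import random

def dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def train_acc(S, data, labels):
    """Fraction of training points whose nearest instance in S shares their label."""
    hits = sum(labels[min(S, key=lambda i: dist(data[q], data[i]))] == labels[q]
               for q in range(len(data)))
    return hits / len(data)

def rmhc_select(data, labels, m=2, iters=200, seed=0):
    """Random mutation hill climbing (sketch): each mutation swaps one
    selected instance for an unselected one; the change is kept only if
    training accuracy does not drop."""
    rng = random.Random(seed)
    S = set(rng.sample(range(len(data)), m))
    best = train_acc(S, data, labels)
    for _ in range(iters):
        out_i = rng.choice(sorted(S))
        in_i = rng.choice([i for i in range(len(data)) if i not in S])
        trial = (S - {out_i}) | {in_i}
        acc = train_acc(trial, data, labels)
        if acc >= best:           # hill climbing: never accept a worse subset
            S, best = trial, acc
    return sorted(S), best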

1058 | Instance-based learning algorithms
- Aha, Kibler, et al.
- 1991
Citation Context: ...would be affected by the instance's removal. Like RNN and many of the other algorithms, it retains border points, but unlike RNN, this algorithm is sensitive to noise. 3.2.3. IB3. The IB3 algorithm (Aha et al., 1991; Aha, 1992) is another incremental algorithm that addresses IB2's problem of keeping noisy instances by retaining only acceptable misclassified instances. The algorithm proceeds as shown below. 1. For...
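IB2's rule, which the snippet says IB3 refines, is simple to sketch: an instance is stored only if the instances retained so far misclassify it. IB3's statistical "acceptability" test for dropping noisy instances is omitted here for brevity, so this is a sketch of the predecessor rule, not of IB3 itself.

```python
def dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def ib2(data, labels):
    """IB2-style incremental filtering (sketch): an instance joins S only
    if the current contents of S misclassify it (1-NN)."""
    S = [0]                                  # seed S with the first instance
    for i in range(1, len(data)):
        nn = min(S, key=lambda j: dist(data[i], data[j]))
        if labels[nn] != labels[i]:          # misclassified, so store it
            S.append(i)
    return S
```

On clean, well-separated data this keeps roughly one instance per region, which is why IB2 reduces storage sharply but, as the text notes, retains noisy instances (every noisy point is misclassified and therefore stored).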

955 |
Features of similarity
- Tversky
- 1977
Citation Context: ...993), Camberra, Chebychev, Quadratic, Correlation, and Chi-square distance metrics (Michalski, Stepp, & Diday, 1981; Diday, 1974); the Context-Similarity measure (Biberman, 1994); the Contrast Model (Tversky, 1977); hyperrectangle distance functions (Salzberg, 1991; Domingos, 1995) and others. Several of these functions are defined in figure 1 (Wilson & Martinez, 1997a)...
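Two of the simpler distance functions named in this passage can be sketched directly; the range normalization mirrors the per-attribute scaling the surrounding text describes. Function names are ours.

```python
def minkowski(a, b, p=2):
    """Minkowsky (Minkowski) distance: p=1 gives Manhattan, p=2 Euclidean."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)

def range_normalized_euclidean(a, b, ranges):
    """Euclidean distance with each attribute difference divided by that
    attribute's observed range, so no single attribute dominates the sum."""
    return sum(((x - y) / r) ** 2 for x, y, r in zip(a, b, ranges)) ** 0.5
```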

901 |
Nearest neighbor pattern classification
- Cover, Hart
- 1967
Citation Context: ...xemplar-based learning algorithms that use original instances from the training set as exemplars. One of the most straightforward instance-based learning algorithms is the nearest neighbor algorithm (Cover & Hart, 1967; Hart, 1968; Dasarathy, 1991). During generalization, instance-based learning algorithms use a distance function to determine how close a new input vector y is to each stored instance, and use the n...
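The generalization step described here (find the nearest stored instances of a new input vector y, then vote) can be sketched in a few lines; this is a minimal brute-force illustration, not the paper's implementation.

```python
def dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def knn_classify(y, data, labels, k=3):
    """Classify the new input vector y by majority vote among its k nearest
    stored instances (brute-force search over the whole training set)."""
    nearest = sorted(range(len(data)), key=lambda i: dist(y, data[i]))[:k]
    votes = [labels[i] for i in nearest]
    return max(set(votes), key=votes.count)
```

Storage reduction techniques shrink `data` to a subset S while trying to keep this classifier's decisions unchanged.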

500 |
A massively parallel architecture for a self-organizing neural pattern recognition machine
- Carpenter, Grossberg
- 1987
Citation Context: ...& Martinez, 1996, 1997b) and other radial basis function networks (Broomhead & Lowe, 1988; Renals & Rohwer, 1989; Wasserman, 1993), as well as counterpropagation networks (Hecht-Nielsen, 1987), ART (Carpenter & Grossberg, 1987), and competitive learning (Rumelhart & McClelland, 1986). Exemplar-based learning algorithms must often decide what exemplars to store for use during generalization in order to avoid excessive stora...

475 |
Towards Memory-Based Reasoning
- Stanfill, Waltz
- 1986
Citation Context: ...ch stored instance, and use the nearest instance or instances to predict the output class of y (i.e., to classify y). Other exemplar-based machine learning paradigms include memory-based reasoning (Stanfill & Waltz, 1986), exemplar-based generalization (Salzberg, 1991; Wettschereck & Dietterich, 1995), and case-based reasoning (CBR) (Watson & Marir, 1994). Such algorithms have had much success on a wide variety of do...

437 |
Multivariable Functional Interpolation and Adaptive
- Broomhead, Lowe
- 1988
Citation Context: ...also several exemplar-based neural network learning algorithms, including probabilistic neural networks (PNN) (Specht, 1992; Wilson & Martinez, 1996, 1997b) and other radial basis function networks (Broomhead & Lowe, 1988; Renals & Rohwer, 1989; Wasserman, 1993), as well as counterpropagation networks (Hecht-Nielsen, 1987), ART (Carpenter & Grossberg, 1987), and competitive learning (Rumelhart & McClelland, 1986). Exe...

423 |
Practical Nonparametric Statistics
- Conover
- 1980
Citation Context: ...a further effort to verify whether differences in accuracy and storage requirements on this entire set of classification tasks were statistically significant, a one-tailed Wilcoxon Signed Ranks test (Conover, 1971; DeGroot, 1986) was used to compare DROP3 with each of the other reduction techniques. The confidence level of a significant difference is shown in the "Wilcoxon" row of Table 1. Positive values indi...

269 |
Nearest Neighbor (NN) Norms: NN Patterns Classification Techniques
- Dasarathy
- 1991
Citation Context: ...s that use original instances from the training set as exemplars. One of the most straightforward instance-based learning algorithms is the nearest neighbor algorithm (Cover & Hart, 1967; Hart, 1968; Dasarathy, 1991). During generalization, instance-based learning algorithms use a distance function to determine how close a new input vector y is to each stored instance, and use the nearest instance or instances ...

255 |
Probabilistic neural networks
- Specht
- 1990
Citation Context: ...ir, 1994). Such algorithms have had much success on a wide variety of domains. There are also several exemplar-based neural network learning algorithms, including probabilistic neural networks (PNN) (Specht, 1992; Wilson & Martinez, 1996, 1997b) and other radial basis function networks (Broomhead & Lowe, 1988; Renals & Rohwer, 1989; Wasserman, 1993), as well as counterpropagation networks (Hecht-Nielsen, 1987...

249 |
The Condensed Nearest Neighbor Rule
- Hart
- 1968
Citation Context: ...ng algorithms that use original instances from the training set as exemplars. One of the most straightforward instance-based learning algorithms is the nearest neighbor algorithm (Cover & Hart, 1967; Hart, 1968; Dasarathy, 1991). During generalization, instance-based learning algorithms use a distance function to determine how close a new input vector y is to each stored instance, and use the nearest insta...

204 | Improved heterogeneous distance functions
- Wilson, Martinez
- 1997
Citation Context: ...in instance-based learning algorithms and other exemplar-based algorithms. Second, it proposes six additional reduction algorithms called DROP1--DROP5 and DEL (three of which were first described in Wilson & Martinez, 1997c, as RT1--RT3) that can be used to remove instances from the concept description. These algorithms and 10 algorithms from the survey are compared on 31 classification tasks. Of those algorithms that ...

187 | The need for biases in learning generalizations
- Mitchell
- 1980
Citation Context: ...and an output value. After learning from the training set, the learning algorithm is presented with additional input vectors, and the algorithm must generalize, i.e., it must use some inductive bias (Mitchell, 1980; Schaffer, 1994; Dietterich, 1989; Wolpert, 1993) to decide what the output value should be even if the new input vector was not in the training set. A large number of machine learning algorithms com...

187 | Asymptotic Properties of Nearest Neighbor Rules Using Edited Data - Wilson - 1972 |

170 | A Nearest Hyperrectangle Learning Method
- Salzberg
- 1991
Citation Context: ...ances to predict the output class of y (i.e., to classify y). Other exemplar-based machine learning paradigms include memory-based reasoning (Stanfill & Waltz, 1986), exemplar-based generalization (Salzberg, 1991; Wettschereck & Dietterich, 1995), and case-based reasoning (CBR) (Watson & Marir, 1994). Such algorithms have had much success on a wide variety of domains. There are also several exemplar-based neu...

155 | Addressing the Curse of Imbalanced Training Sets: OneSided Selection - Kubat, Matwin - 1997 |

151 |
Tolerating noisy, irrelevant and novel attributes in instance-based learning algorithms
- Aha
- 1992
Citation Context: ...are often faced with the problem of deciding how many exemplars to store, and what portion of the input space they should cover. Instance-based learning (IBL) algorithms (Aha, Kibler & Albert, 1991; Aha, 1992) are a subset of exemplar-based learning algorithms that use original instances from the training set as exemplars. One of the most straightforward instance-based learning algorithms is the nearest n...

148 |
A conservation law for generalization performance
- Schaffer
- 1994
Citation Context: ...lue. After learning from the training set, the learning algorithm is presented with additional input vectors, and the algorithm must generalize, i.e., it must use some inductive bias (Mitchell, 1980; Schaffer, 1994; Dietterich, 1989; Wolpert, 1993) to decide what the output value should be even if the new input vector was not in the training set. A large number of machine learning algorithms compute a distance ...

144 | Prototype and Feature Selection by Sampling and Random Mutation Hill Climbing Algorithms - Skalak - 1994 |

122 |
The Reduced Nearest Neighbor Rule
- Gates
- 1972
Citation Context: ...for examination at any time, so a search can be made to determine which instance would be best to remove during each step of the algorithm. Decremental algorithms discussed in Section 3 include RNN (Gates, 1972), SNN (Ritter, Woodruff, Lowry & Isenhour, 1975), ENN (Wilson, 1972), VSM (Lowe, 1995), and the Shrink (Subtractive) Algorithm (Kibler & Aha, 1987). RISE (Domingos, 1995) can also be viewed as a decr...
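Of the decremental algorithms listed here, ENN (Wilson, 1972) has the shortest statement: discard every instance that its k nearest neighbors misclassify. The sketch below is a simplified illustration with neighbors computed in the original set.

```python
def dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def enn(data, labels, k=3):
    """Edited Nearest Neighbor (sketch): keep an instance only if the
    majority label of its k nearest neighbors agrees with its own label.
    Noisy points surrounded by the opposite class are the ones removed."""
    keep = []
    for q in range(len(data)):
        neigh = sorted((i for i in range(len(data)) if i != q),
                       key=lambda i: dist(data[q], data[i]))[:k]
        votes = [labels[i] for i in neigh]
        if max(set(votes), key=votes.count) == labels[q]:
            keep.append(q)
    return keep
```

Note the contrast with RNN/CNN-style methods: ENN removes noisy and boundary-crossing points while retaining interior points, so it smooths decision boundaries rather than shrinking storage much.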

122 |
Applications of Counterpropagation Networks
- Hecht-Nielsen
- 1988
Citation Context: ...(PNN) (Specht, 1992; Wilson & Martinez, 1996, 1997b) and other radial basis function networks (Broomhead & Lowe, 1988; Renals & Rohwer, 1989; Wasserman, 1993), as well as counterpropagation networks (Hecht-Nielsen, 1987), ART (Carpenter & Grossberg, 1987), and competitive learning (Rumelhart & McClelland, 1986). Exemplar-based learning algorithms must often decide what exemplars to store for use during generalizatio...

113 | Similarity metric learning for a variable-kernel classifier
- Lowe
- 1995
Citation Context: ...be best to remove during each step of the algorithm. Decremental algorithms discussed in Section 3 include RNN (Gates, 1972), SNN (Ritter, Woodruff, Lowry & Isenhour, 1975), ENN (Wilson, 1972), VSM (Lowe, 1995), and the Shrink (Subtractive) Algorithm (Kibler & Aha, 1987). RISE (Domingos, 1995) can also be viewed as a decremental algorithm, except that instead of simply removing instances from S, instances ...

103 |
Refinements to Nearest-Neighbor Searching in k-Dimensional Trees
- Sproull
- 1987
Citation Context: ...i.e., those with errors in the input vector or output class, or those not representative of typical cases) are stored as well, which can degrade generalization accuracy. Techniques such as k-d trees (Sproull, 1991; Wess, Althoff & Richter, 1993) and projection (Papadimitriou & Bentley, 1980) can reduce the time required to find the nearest neighbor(s) of an input vector, but they do not reduce storage requirem...

95 |
The distance-weighted k-nearest neighbor rule
- Dudani
- 1976
Citation Context: ...ave to be resolved arbitrarily or through some more complicated scheme. There are some algorithms which give closer neighbors more influence than further ones, such as the Distance-Weighted kNN Rule (Dudani, 1976). Such modifications reduce the sensitivity of the algorithm to the selection of k. Radial Basis Function networks (Wasserman, 1993) and Probabilistic Neural Networks (Specht, 1992) use a Gaussian we...
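The distance-weighted voting idea can be sketched as follows. Inverse-distance weights are used here as a simple stand-in; Dudani's actual rule weights each neighbor linearly between the nearest and k-th nearest distances.

```python
def dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def weighted_knn(y, data, labels, k=3):
    """Distance-weighted k-NN vote (sketch): each of the k nearest
    neighbors votes with weight 1/d, so closer neighbors dominate."""
    nearest = sorted(range(len(data)), key=lambda i: dist(y, data[i]))[:k]
    weights = {}
    for i in nearest:
        d = dist(y, data[i])
        if d == 0.0:
            return labels[i]          # an exact match decides outright
        weights[labels[i]] = weights.get(labels[i], 0.0) + 1.0 / d
    return max(weights, key=weights.get)
```

Weighting can flip the outcome relative to an unweighted majority: below, two farther label-1 neighbors are outvoted by one much closer label-0 neighbor, illustrating the reduced sensitivity to the choice of k that the text mentions.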

94 | An experimental comparison of the nearest neighbor and nearest hyperrectangle algorithms
- Wettschereck, Dietterich
- 1995
Citation Context: ...the output class of y (i.e., to classify y). Other exemplar-based machine learning paradigms include memory-based reasoning (Stanfill & Waltz, 1986), exemplar-based generalization (Salzberg, 1991; Wettschereck & Dietterich, 1995), and case-based reasoning (CBR) (Watson & Marir, 1994). Such algorithms have had much success on a wide variety of domains. There are also several exemplar-based neural network learning algorithms, ...

87 | Unifying instance-based and rule-based induction - Domingos - 1996 |

74 |
Finding prototypes for nearest neighbor classifiers
- Chang
- 1974
Citation Context: ...erich, 1995) use hyperrectangles to represent collections of instances; instances can be generalized into rules (Domingos, 1995, 1996); and prototypes can be used to represent a cluster of instances (Chang, 1974), even if no original instance occurred at the point where the prototype is located. On the other hand, many algorithms (i.e., instance-based algorithms) seek to retain a subset of the original insta...

69 |
An experiment with the edited nearest neighbor rule
- Tomek
- 1976
Citation Context: ...tch mode. This involves deciding if each instance meets the removal criteria before removing any of them. Then all those that do meet the criteria are removed at once. For example, the All k-NN rule (Tomek, 1976) operates this way. This can relieve the algorithm from having to constantly update lists of nearest neighbors and other information when instances are individually removed. However, there are also d...
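The batch mode described here (flag first, remove all at once) can be sketched with the All k-NN rule itself; this is a simplified illustration of Tomek's rule, with the voting details ours.

```python
def dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def all_knn_edit(data, labels, k=3):
    """All k-NN editing (sketch of batch mode): first flag every instance
    that is misclassified by its j nearest neighbors for any j = 1..k,
    then remove all flagged instances at once, so removals never
    invalidate each other's neighbor lists."""
    flagged = set()
    for q in range(len(data)):
        neigh = sorted((i for i in range(len(data)) if i != q),
                       key=lambda i: dist(data[q], data[i]))
        for j in range(1, k + 1):
            votes = [labels[i] for i in neigh[:j]]
            if max(set(votes), key=votes.count) != labels[q]:
                flagged.add(q)
                break
    return [q for q in range(len(data)) if q not in flagged]
```

Because every flag is computed against the original set, the result is independent of instance order, which is exactly the bookkeeping advantage the passage attributes to batch mode.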

69 | Instance pruning techniques
- Jankowski, Wilson, et al.
- 1997
Citation Context: ...in instance-based learning algorithms and other exemplar-based algorithms. Second, it proposes six additional reduction algorithms called DROP1--DROP5 and DEL (three of which were first described in Wilson & Martinez, 1997c, as RT1--RT3) that can be used to remove instances from the concept description. These algorithms and 10 algorithms from the survey are compared on 31 classification tasks. Of those algorithms that ...

68 |
Nearest Neighbor (NN) Norms
- Dasarathy
- 1991
Citation Context: ...s that use original instances from the training set as exemplars. One of the most straightforward instance-based learning algorithms is the nearest neighbor algorithm (Cover & Hart, 1967; Hart, 1968; Dasarathy, 1991). During generalization, instance-based learning algorithms use a distance function to determine how close a new input vector y is to each stored instance, and use the nearest instance or instances ...

65 |
Learning representative exemplars of concepts: An initial case study
- Kibler, Aha
- 1987
Citation Context: ...Decremental algorithms discussed in Section 3 include RNN (Gates, 1972), SNN (Ritter, Woodruff, Lowry & Isenhour, 1975), ENN (Wilson, 1972), VSM (Lowe, 1995), and the Shrink (Subtractive) Algorithm (Kibler & Aha, 1987). RISE (Domingos, 1995) can also be viewed as a decremental algorithm, except that instead of simply removing instances from S, instances are generalized into rules. Similarly, Chang's prototype rule...

64 | Addressing the selective superiority problem: Automatic algorithm/model class selection - Brodley - 1993 |

60 | Rule Induction and Instance-Based Learning: A Unified Approach
- Domingos
- 1995
Citation Context: ...new representation. For example, some algorithms (Salzberg, 1991; Wettschereck & Dietterich, 1995) use hyperrectangles to represent collections of instances; instances can be generalized into rules (Domingos, 1995, 1996); and prototypes can be used to represent a cluster of instances (Chang, 1974), even if no original instance occurred at the point where the prototype is located. On the other hand, many algori...

53 | Selecting typical instances in instance-based learning - Zhang - 1992 |

50 |
Case-Based Reasoning: A
- Watson, Marir
- 1994
Citation Context: ...ed machine learning paradigms include memory-based reasoning (Stanfill & Waltz, 1986), exemplar-based generalization (Salzberg, 1991; Wettschereck & Dietterich, 1995), and case-based reasoning (CBR) (Watson & Marir, 1994). Such algorithms have had much success on a wide variety of domains. There are also several exemplar-based neural network learning algorithms, including probabilistic neural networks (PNN) (Specht, ...

49 | Using kd-Trees to Improve the Retrieval Step in Case-Based Reasoning
- Wess, Althoff, et al.
Citation Context: ...h errors in the input vector or output class, or those not representative of typical cases) are stored as well, which can degrade generalization accuracy. Techniques such as k-d trees (Sproull, 1991; Wess, Althoff, & Richter, 1993) and projection (Papadimitriou & Bentley, 1980) can reduce the time required to find the nearest neighbor(s) of an input vector, but they do not reduce storage requirements, nor do they address the p...

47 |
An Algorithm for a Selective Nearest Neighbor Decision Rule
- Ritter, Woodruff, et al.
- 1975
Citation Context: ...any time, so a search can be made to determine which instance would be best to remove during each step of the algorithm. Decremental algorithms discussed in Section 3 include RNN (Gates, 1972), SNN (Ritter et al., 1975), ENN (Wilson, 1972), VSM (Lowe, 1995), and the SHRINK (SUBTRACTIVE) Algorithm (Kibler & Aha, 1987). RISE (Domingos, 1995) can also be viewed as a decremental algorithm, except that instead of simply...

43 | A recent advance in data analysis: Clustering objects into classes characterized by conjunctive concepts - MICHALSKI, STEPP, et al. - 1981 |

43 |
Pattern recognition engineering
- Nadler, Smith
- 1993
Citation Context: ...the range or standard deviation of the attribute. A variety of other distance functions are also available for continuously-valued attributes, including the Minkowsky (Batchelor, 1978), Mahalanobis (Nadler & Smith, 1993), Camberra, Chebychev, Quadratic, Correlation, and Chi-square distance metrics (Michalski, Stepp, & Diday, 1981; Diday, 1974); the Context-Similarity measure (Biberman, 1994); the Contrast Model (Tve...

37 | A Review and Comparative Evaluation of Feature Weighting Methods for Lazy Learning Algorithms - Wettschereck, Aha, et al. - 1995 |

36 | A hybrid Nearest-Neighbor and Nearest-Hyperrectangle Algorithm - Wettschereck - 1994 |

33 | On overfitting avoidance as bias
- Wolpert
Citation Context: ...ing set, the learning algorithm is presented with additional input vectors, and the algorithm must generalize, i.e., it must use some inductive bias (Mitchell, 1980; Schaffer, 1994; Dietterich, 1989; Wolpert, 1993) to decide what the output value should be even if the new input vector was not in the training set. A large number of machine learning algorithms compute a distance between the input vector and stor...

32 |
Advanced methods in neural computing
- Wasserman
- 1993
Citation Context: ...rning algorithms, including probabilistic neural networks (PNN) (Specht, 1992; Wilson & Martinez, 1996, 1997b) and other radial basis function networks (Broomhead & Lowe, 1988; Renals & Rohwer, 1989; Wasserman, 1993), as well as counterpropagation networks (Hecht-Nielsen, 1987), ART (Carpenter & Grossberg, 1987), and competitive learning (Rumelhart & McClelland, 1986). Exemplar-based learning algorithms must oft...

29 |
Limitations on inductive learning
- Dietterich
- 1989
Citation Context: ...ing from the training set, the learning algorithm is presented with additional input vectors, and the algorithm must generalize, i.e., it must use some inductive bias (Mitchell, 1980; Schaffer, 1994; Dietterich, 1989; Wolpert, 1993) to decide what the output value should be even if the new input vector was not in the training set. A large number of machine learning algorithms compute a distance between the input ...

27 |
Instance Selection by Encoding Length Heuristic with Random Mutation Hill Climbing
- Cameron-Jones
- 1995
Citation Context: ...hod the "Pre/All" method, since it is not truly incremental, but to better distinguish it from other techniques in this paper, we call it the Encoding Length Grow (ELGrow) method. The Explore method (Cameron-Jones, 1995) begins by growing and reducing S using the ELGrow method, and then performs 1000 mutations to try to improve the classifier. Each mutation tries adding an instance to S, removing one from S, or swap...
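The mutation loop described here (add, remove, or swap one instance per mutation, keeping only improvements) can be sketched as follows. Training accuracy, with a preference for smaller subsets on ties, stands in for Cameron-Jones's encoding-length heuristic, so this illustrates the search procedure rather than the actual Explore objective.

```python
import random

def dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def subset_acc(S, data, labels):
    """Training accuracy of 1-NN classification using only the subset S."""
    hits = sum(labels[min(S, key=lambda i: dist(data[q], data[i]))] == labels[q]
               for q in range(len(data)))
    return hits / len(data)

def explore(data, labels, mutations=1000, seed=0):
    """Explore-style refinement (sketch): each mutation tries adding,
    removing, or swapping one instance; it is kept only if the objective
    improves (or ties with a smaller subset)."""
    rng = random.Random(seed)
    S = {rng.randrange(len(data))}
    best = subset_acc(S, data, labels)
    for _ in range(mutations):
        trial = set(S)
        outside = [i for i in range(len(data)) if i not in trial]
        op = rng.choice(("add", "remove", "swap"))
        if op in ("remove", "swap") and len(trial) > 1:
            trial.remove(rng.choice(sorted(trial)))
        if op in ("add", "swap") and outside:
            trial.add(rng.choice(outside))
        acc = subset_acc(trial, data, labels)
        if acc > best or (acc == best and len(trial) < len(S)):
            S, best = trial, acc
    return sorted(S), best
```

The tie-breaking toward smaller subsets is what pushes the search to tiny S, mirroring the extreme storage reduction reported for ELGrow/Explore.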

27 |
Phoneme classification experiments using radial basis functions
- Renals, Rohwer
- 1989
Citation Context: ...ased neural network learning algorithms, including probabilistic neural networks (PNN) (Specht, 1992; Wilson & Martinez, 1996, 1997b) and other radial basis function networks (Broomhead & Lowe, 1988; Renals & Rohwer, 1989; Wasserman, 1993), as well as counterpropagation networks (Hecht-Nielsen, 1987), ART (Carpenter & Grossberg, 1987), and competitive learning (Rumelhart & McClelland, 1986). Exemplar-based learning al...

22 |
A context similarity measure
- BIBERMAN
- 1994
Citation Context: ...1978), Mahalanobis (Nadler & Smith, 1993), Camberra, Chebychev, Quadratic, Correlation, and Chi-square distance metrics (Michalski, Stepp & Diday, 1981; Diday, 1974); the Context-Similarity measure (Biberman, 1994); the Contrast Model (Tversky, 1977); hyperrectangle distance functions (Salzberg, 1991; Domingos, 1995) and others. Several of these functions are defined in Figure 1 (Wilson & Martinez, 1997a). Whe...

19 |
Case-based reasoning: A review. The Knowledge Engineering Review 9(4):355–381. http://www.ai-cbr.org/classroom/cbr-review.html
- Watson, Marir
- 1994
Citation Context: ...ed machine learning paradigms include memory-based reasoning (Stanfill & Waltz, 1986), exemplar-based generalization (Salzberg, 1991; Wettschereck & Dietterich, 1995), and case-based reasoning (CBR) (Watson & Marir, 1994). Such algorithms have had much success on a wide variety of domains. There are also several exemplar-based neural network learning algorithms, including probabilistic neural networks (PNN) (Specht, 1...

18 |
Pattern Recognition: Ideas in Practice
- Batchelor
- 1978
Citation Context: ...dividual attribute distances by the range or standard deviation of the attribute. A variety of other distance functions are also available for continuously-valued attributes, including the Minkowsky (Batchelor, 1978), Mahalanobis (Nadler & Smith, 1993), Camberra, Chebychev, Quadratic, Correlation, and Chi-square distance metrics (Michalski, Stepp, & Diday, 1981; Diday, 1974); the Context-Similarity measure (Bibe...