
## Discriminant Adaptive Nearest Neighbor Classification (1994)


### Download Links

- [cs.uvm.edu]
- [www-stat.stanford.edu]
- [utstat.toronto.edu]
- DBLP

### Other Repositories/Bibliography

Citations: 318 (1 self)

### Citations

5843 | Classification and Regression Trees - Breiman, Friedman, et al. |

4788 | Pattern classification and scene analysis - Duda, Hart - 1973 |

1355 |
Nearest neighbor pattern classification.
- Cover, Hart
- 1967
Citation Context: ...membership of an observation with predictor vector x0. Nearest neighbor classification is a simple and appealing approach to this problem. We find the set of K nearest neighbors in the training set to x0 and then classify x0 as the most frequent class among the K neighbors. Nearest neighbors is an extremely flexible classification scheme, and does not involve any pre-processing (fitting) of the training data. This can offer both space and speed advantages in very large problems: see Cover (1968), Duda & Hart (1973), and McLachlan (1992) for background material on nearest neighbor classification. Cover & Hart (1967) show that the one-nearest-neighbor rule has asymptotic error rate at most twice the Bayes rate. However, in finite samples the curse of dimensionality can severely hurt the nearest neighbor rule. The relative radius of the nearest-neighbor sphere grows like r^{1/p}, where p is the dimension and r the radius for p = 1, resulting in severe bias at the target point x. Figure 1 illustrates the situation for a simple example. Figure 1: The vertical strip denotes the NN region using only the X coordinate to find the nearest neighbor for the target point (solid dot). The sphere shows the NN region using... |
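The excerpt above describes the plain K-nearest-neighbor rule: find the K training points closest to x0 and take a majority vote. A minimal sketch of that generic rule (not the paper's adaptive DANN metric; the toy data and names are illustrative):

```python
import numpy as np
from collections import Counter

def knn_classify(X_train, y_train, x0, k=3):
    """Classify x0 by majority vote among its k nearest training points."""
    # Euclidean distance from x0 to every training point
    dists = np.linalg.norm(X_train - x0, axis=1)
    nearest = np.argsort(dists)[:k]          # indices of the k closest points
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]        # most frequent class label

# Toy data: two well-separated clusters
X = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
              [2.0, 2.0], [2.1, 1.9], [1.9, 2.1]])
y = np.array([0, 0, 0, 1, 1, 1])
print(knn_classify(X, y, np.array([0.15, 0.1]), k=3))  # → 0
```

With k = 1 this is the one-nearest-neighbor rule whose asymptotic error Cover & Hart bound at twice the Bayes rate.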

591 |
Discriminant Analysis and Statistical Pattern Recognition,
- McLachlan
- 1992
Citation Context: ...p predictors and the known class memberships. Our goal is to predict the class membership of an observation with predictor vector x0. Nearest neighbor classification is a simple and appealing approach to this problem. We find the set of K nearest neighbors in the training set to x0 and then classify x0 as the most frequent class among the K neighbors. Nearest neighbors is an extremely flexible classification scheme, and does not involve any pre-processing (fitting) of the training data. This can offer both space and speed advantages in very large problems: see Cover (1968), Duda & Hart (1973), and McLachlan (1992) for background material on nearest neighbor classification. Cover & Hart (1967) show that the one-nearest-neighbor rule has asymptotic error rate at most twice the Bayes rate. However, in finite samples the curse of dimensionality can severely hurt the nearest neighbor rule. The relative radius of the nearest-neighbor sphere grows like r^{1/p}, where p is the dimension and r the radius for p = 1, resulting in severe bias at the target point x. Figure 1 illustrates the situation for a simple example. Figure 1: The vertical strip denotes the NN region using only the X coordinate to find the nea... |

271 | Efficient pattern recognition using a new transformation distance - Simard, LeCun, et al. - 1993 |

181 | Neural Networks and Related Methods for Classification - Ripley - 1994 |

84 |
The optimal distance measure for nearest neighbor classification
- Short, Fukunaga
- 1981
Citation Context: ...[Figure 10: Misclassification results as a function of subspace size, for the satellite image data.] ...determined locally in a neighborhood of size KM. In effect this extends the neighborhood infinitely in the null space of the local between-class directions, but they restrict this neighborhood to the original KM observations. This amounts to projecting the local data onto the line joining the two local centroids. In our experiments this approach tended to perform on average 10% worse than our metric, and we did not pursue it further. Short & Fukunaga (1981) extended this to J > 2 classes, but here their approach differs even more from ours. They computed a weighted average of the J local centroids from the overall average, and project the data onto it, a one-dimensional projection. Even with ε = 0 we project the data onto the subspace containing the local centroids, and deform the metric appropriately in that subspace. Myles & Hand (1990) recognized a shortfall of the Short and Fukunaga approach, since the averaging can cause cancellation, and proposed other metrics to avoid this. Although their metrics differ from ours, the Chi-squared motivation ... |

68 | Robust locally-weighted regression and smoothing scatterplots - Cleveland - 1979 |

65 | Slicing regression: a link-free regression method - Duan, Li - 1991 |

38 |
Rates of convergence for nearest neighbor procedures,
- Cover
- 1968
Citation Context: ...measurements x = (x1, x2, ..., xp) on p predictors and the known class memberships. Our goal is to predict the class membership of an observation with predictor vector x0. Nearest neighbor classification is a simple and appealing approach to this problem. We find the set of K nearest neighbors in the training set to x0 and then classify x0 as the most frequent class among the K neighbors. Nearest neighbors is an extremely flexible classification scheme, and does not involve any pre-processing (fitting) of the training data. This can offer both space and speed advantages in very large problems: see Cover (1968), Duda & Hart (1973), and McLachlan (1992) for background material on nearest neighbor classification. Cover & Hart (1967) show that the one-nearest-neighbor rule has asymptotic error rate at most twice the Bayes rate. However, in finite samples the curse of dimensionality can severely hurt the nearest neighbor rule. The relative radius of the nearest-neighbor sphere grows like r^{1/p}, where p is the dimension and r the radius for p = 1, resulting in severe bias at the target point x. Figure 1 illustrates the situation for a simple example. Figure 1: The vertical strip denotes the NN region using... |
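The r^{1/p} growth quoted in the excerpt is easy to demonstrate numerically: for points uniform in the unit cube, a neighborhood capturing a fixed fraction f of the data needs relative edge proportional to f^{1/p}, so it stops being "local" as the dimension p grows. A small illustration (the fraction and dimensions chosen here are arbitrary):

```python
# Side length of a sub-cube capturing a fraction f of uniformly
# distributed data in [0, 1]^p is f**(1/p): the neighborhood's
# relative radius grows like r^{1/p} in the excerpt's notation.
def relative_edge(f, p):
    return f ** (1.0 / p)

for p in (1, 2, 5, 10, 20):
    print(p, round(relative_edge(0.01, p), 3))
# With f = 1% of the data, the required edge grows from 0.01 at p = 1
# to about 0.63 at p = 10: the "local" neighborhood spans most of each axis.
```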

32 | Learning prototype models for tangent distance - Hastie, Simard, et al. - 1993 |

27 |
The multi-class metric problem in nearest neighbour discrimination rules
- Myles, Hand
- 1990
Citation Context: ...This amounts to projecting the local data onto the line joining the two local centroids. In our experiments this approach tended to perform on average 10% worse than our metric, and we did not pursue it further. Short & Fukunaga (1981) extended this to J > 2 classes, but here their approach differs even more from ours. They computed a weighted average of the J local centroids from the overall average, and project the data onto it, a one-dimensional projection. Even with ε = 0 we project the data onto the subspace containing the local centroids, and deform the metric appropriately in that subspace. Myles & Hand (1990) recognized a shortfall of the Short and Fukunaga approach, since the averaging can cause cancellation, and proposed other metrics to avoid this. Although their metrics differ from ours, the Chi-squared motivation for our metric (3) was inspired by the metrics developed in their paper. We have not tested out their proposals, but they report results of experiments with far more modest improvements over standard nearest neighbors than we achieved. Friedman (1994) proposes a number of techniques for flexible metric nearest neighbor classification. These techniques use a recursive partitioning sty... |
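The two-class construction discussed above, projecting the local data onto the line joining the two local class centroids before measuring distance, can be sketched as follows. This is only a reading of the excerpt, not the cited authors' code; the function and variable names are invented:

```python
import numpy as np

def centroid_line_distance(X_local, y_local, x0, x):
    """Distance between x0 and x measured only along the line
    joining the two local class centroids (two-class case)."""
    classes = np.unique(y_local)
    assert len(classes) == 2, "two-class construction only"
    m0 = X_local[y_local == classes[0]].mean(axis=0)
    m1 = X_local[y_local == classes[1]].mean(axis=0)
    d = m1 - m0
    d = d / np.linalg.norm(d)      # unit vector along the centroid line
    # One-dimensional projection: displacement orthogonal to the
    # centroid line contributes nothing to the distance.
    return abs((x - x0) @ d)
```

Because displacement orthogonal to the centroid line is ignored, the neighborhood is effectively infinite in those directions, which is what "extends the neighborhood infinitely in the null space of the local between-class directions" describes.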

16 | A new nearest neighbor distance measure - Short, Fukunaga |

12 |
Flexible metric nearest neighbor classification
- Friedman
- 1994
Citation Context: ...ε = 0 we project the data onto the subspace containing the local centroids, and deform the metric appropriately in that subspace. Myles & Hand (1990) recognized a shortfall of the Short and Fukunaga approach, since the averaging can cause cancellation, and proposed other metrics to avoid this. Although their metrics differ from ours, the Chi-squared motivation for our metric (3) was inspired by the metrics developed in their paper. We have not tested out their proposals, but they report results of experiments with far more modest improvements over standard nearest neighbors than we achieved. Friedman (1994) proposes a number of techniques for flexible metric nearest neighbor classification. These techniques use a recursive-partitioning-style strategy to adaptively shrink and shape rectangular neighborhoods around the test point. Friedman also uses derived variables in the process, including discriminant variates. With the latter variables, his procedures have some similarity to the discriminant adaptive nearest neighbor approach. Other recent work that is somewhat related to this is that of Lowe (1993). He estimates the covariance matrix in a variable kernel classifier using a neural network app... |

12 |
Similarity metric learning for variable kernel classifier
- Lowe
- 1993
Citation Context: ...iments with far more modest improvements over standard nearest neighbors than we achieved. Friedman (1994) proposes a number of techniques for flexible metric nearest neighbor classification. These techniques use a recursive-partitioning-style strategy to adaptively shrink and shape rectangular neighborhoods around the test point. Friedman also uses derived variables in the process, including discriminant variates. With the latter variables, his procedures have some similarity to the discriminant adaptive nearest neighbor approach. Other recent work that is somewhat related to this is that of Lowe (1993). He estimates the covariance matrix in a variable kernel classifier using a neural network approach. There are a number of ways in which this work might be generalized. In some discrimination problems, it is natural to use specialized distance measures that capture invariances in the feature space. For example, Simard, LeCun & Denker (1993) and Hastie, Simard & Sackinger (1993) use a transformation-invariant metric to measure distance between digitized images of handwritten numerals in a nearest neighbor rule. The invariances include local transformations of images such as rotation, shear and s... |

10 | Nearest neighbor pattern classification, Proc - Cover - 1967 |

1 | Penalized discriminant analysis, To appear, Annals of Statistics - Hastie, Buja - 1994 |

1 | Nearest neighbor pattern classification, Proc - Cover, Hart - 1967 |

1 | Penalized discriminant analysis, To appear, Annals of Statistics - Hastie, Buja, et al. - 1994 |

1 |
A new nearest neighbor distance measure
- Fukunaga
- 1980
Citation Context: ...uded in the figure is the result for DANN, which has outperformed 5-NN. We also ran the subspace version of DANN, and Figure 10 shows the sequence of test-error results as a function of subspace size. Again, a low-dimensional subspace actually improves the misclassification error. Discussion: We have developed an adaptive form of nearest neighbor classification that can offer substantial improvements over the standard nearest neighbors method in some problems. We have also proposed a method that uses local discrimination information to estimate a subspace for global dimension reduction. Short & Fukunaga (1980) proposed a technique close to ours for the two-class problem. In our terminology they used our metric with W = I and ε = 0, with B [p. 148, KDD-95; footnote: the authors thank C. Taylor and D. Spiegelhalter for making these images and data available; Figure 10: misclassification results as a function of subspace size, for the satellite image data] determined locally in a neighborhood of size KM. In effect this extends the neighborhood infinitely in the null space of the local between-class directions, but they restrict this neighborhood to... |
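For reference, the DANN metric that this discussion specializes is Σ = W^{-1/2}[W^{-1/2} B W^{-1/2} + εI]W^{-1/2}, where W and B are the local within- and between-class covariance matrices; the special case mentioned above (W = I, ε = 0) reduces it to B. A sketch of the metric computation, assuming W and B have already been estimated in the neighborhood (the helper names are ours, not the paper's):

```python
import numpy as np

def dann_metric(W, B, eps=1.0):
    """Sigma = W^{-1/2} [W^{-1/2} B W^{-1/2} + eps*I] W^{-1/2}."""
    # Symmetric inverse square root of W via its eigendecomposition
    vals, vecs = np.linalg.eigh(W)
    W_inv_sqrt = vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T
    B_star = W_inv_sqrt @ B @ W_inv_sqrt      # between-class matrix, sphered
    return W_inv_sqrt @ (B_star + eps * np.eye(W.shape[0])) @ W_inv_sqrt

def dann_distance(x, x0, Sigma):
    """Squared distance (x - x0)^T Sigma (x - x0) under the DANN metric."""
    d = x - x0
    return float(d @ Sigma @ d)
```

The ε term keeps the metric from collapsing neighborhoods to zero width in directions with no between-class variation, which is the deformation the excerpts contrast with the hard one-dimensional projection of Short and Fukunaga.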