## Improved Heterogeneous Distance Functions (1997)

### Download Links

- [arxiv.org]
- [www.cs.washington.edu]
- [axon.cs.byu.edu]
- [www.cs.cmu.edu]
- [www.jair.org]
- DBLP

### Other Repositories/Bibliography

Venue: Journal of Artificial Intelligence Research

Citations: 205 (10 self-citations)

### BibTeX

    @ARTICLE{Wilson97improvedheterogeneous,
      author = {D. Randall Wilson and Tony R. Martinez},
      title = {Improved Heterogeneous Distance Functions},
      journal = {Journal of Artificial Intelligence Research},
      year = {1997},
      volume = {6},
      pages = {1--34}
    }

### Abstract

Instance-based learning techniques typically handle continuous and linear input values well, but often do not handle nominal input attributes appropriately. The Value Difference Metric (VDM) was designed to find reasonable distance values between nominal attribute values, but it largely ignores continuous attributes, requiring discretization to map continuous values into nominal values. This paper proposes three new heterogeneous distance functions, called the Heterogeneous Value Difference Metric (HVDM), the Interpolated Value Difference Metric (IVDM), and the Windowed Value Difference Metric (WVDM). These new distance functions are designed to handle applications with nominal attributes, continuous attributes, or both. In experiments on 48 applications, the new distance metrics achieve higher classification accuracy on average than three previous distance functions on those datasets that have both nominal and continuous attributes.
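The abstract describes HVDM only at a high level. Below is a minimal Python sketch of one plausible reading, not the authors' code. It assumes that continuous differences are scaled by four standard deviations (population standard deviation here), that nominal values are compared through the Euclidean distance between their class-conditional probability vectors (a VDM-style measure), and that an unknown value on either side contributes a distance of 1. The helper names `fit_hvdm`, `nominal_distance`, and `hvdm` are hypothetical.

```python
import math
from collections import Counter, defaultdict
from statistics import pstdev

def fit_hvdm(X, y, nominal):
    """Gather the per-attribute statistics used by hvdm().

    X: training instances (lists of attribute values; None marks an unknown value)
    y: class label of each instance
    nominal: set of attribute indices treated as nominal; all others are continuous
    """
    classes = sorted(set(y))
    sigma, counts = {}, {}
    for a in range(len(X[0])):
        known = [(row[a], label) for row, label in zip(X, y) if row[a] is not None]
        if a in nominal:
            # counts[a][value][class] = number of training instances with that value and class
            table = defaultdict(Counter)
            for value, label in known:
                table[value][label] += 1
            counts[a] = table
        else:
            # Assumption: population standard deviation of the known values
            values = [v for v, _ in known]
            sigma[a] = pstdev(values) if len(values) > 1 else 0.0
    return {"classes": classes, "nominal": set(nominal), "sigma": sigma, "counts": counts}

def nominal_distance(model, a, u, v):
    """VDM-style distance: Euclidean distance between P(class | a=u) and P(class | a=v)."""
    cu = model["counts"][a].get(u, Counter())
    cv = model["counts"][a].get(v, Counter())
    nu, nv = sum(cu.values()), sum(cv.values())
    total = 0.0
    for c in model["classes"]:
        pu = cu[c] / nu if nu else 0.0
        pv = cv[c] / nv if nv else 0.0
        total += (pu - pv) ** 2
    return math.sqrt(total)

def hvdm(model, x1, x2):
    """Combine per-attribute distances as the square root of their sum of squares."""
    total = 0.0
    for a, (u, v) in enumerate(zip(x1, x2)):
        if u is None or v is None:
            d = 1.0  # unknown value on either side: fixed distance of 1
        elif a in model["nominal"]:
            d = nominal_distance(model, a, u, v)
        else:
            s = model["sigma"][a]
            # Assumption: differences scaled by four standard deviations
            d = abs(u - v) / (4 * s) if s > 0 else 0.0
        total += d * d
    return math.sqrt(total)
```

Computing the class counts once in `fit_hvdm` keeps each distance evaluation proportional to the number of attributes times the number of classes.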

### Citations

3277 citations: The Self-Organizing Map
- Kohonen
- 1990

Citation context: ...al basis function networks (Broomhead & Lowe, 1988; Renals & Rohwer, 1989; Wasserman, 1993), counterpropagation networks (Hecht-Nielsen, 1987), ART (Carpenter & Grossberg, 1987), self-organizing maps (**Kohonen, 1990**) and competitive learning (Rumelhart & McClelland, 1986). Distance functions are also used in many fields besides machine learning and neural networks, including statistics (Atkeson, Moore & Schaal, ...

2881 citations: UCI repository of machine learning databases. [www.ics.uci.edu/~mlearn/MLRepository.html]
- Blake, Merz
- 1998

Citation context: ...cy (Ventura & Martinez, 1995). Many real-world applications have both nominal and linear attributes, including, for example, over half of the datasets in the UCI Machine Learning Database Repository (**Merz & Murphy, 1996**). This paper introduces three new distance functions that are more appropriate than previous functions for applications with both nominal and continuous attributes. These new distance functions can b...

1058 citations: Instance-based learning algorithms
- Aha, Kibler, et al.
- 1991

956 citations: Features of similarity
- Tversky
- 1977

Citation context: ...fields besides machine learning and neural networks, including statistics (Atkeson, Moore & Schaal, 1996), pattern recognition (Diday, 1974; Michalski, Stepp & Diday, 1981), and cognitive psychology (**Tversky, 1977**; Nosofsky, 1986). There are many distance functions that have been proposed to decide which instance is closest to a given input vector (Michalski, Stepp & Diday, 1981; Diday, 197...

901 citations: Nearest neighbor pattern classification
- Cover, Hart
- 1967

Citation context: ...red instance, and use the nearest instance or instances to predict the output class of y (i.e., to classify y). Some instance-based learning algorithms are referred to as nearest neighbor techniques (**Cover & Hart, 1967**; Hart, 1968; Dasarathy, 1991), and memory-based reasoning methods (Stanfill & Waltz, 1986; Cost & Salzberg, 1993; Rachlin et al., 1994) overlap significantly with the instance-based paradigm as well. ...
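The citation context above describes classifying an input by the nearest stored instance or instances. As a minimal sketch (not code from the paper), assuming numeric instances and a simple majority vote among the k nearest, the hypothetical function `knn_classify` takes the distance function as a parameter, which is where a heterogeneous metric could be plugged in.

```python
from collections import Counter

def knn_classify(train_X, train_y, query, distance, k=1):
    """Classify `query` by majority vote among its k nearest stored instances.

    `distance` is any function of two instances, so a heterogeneous metric
    can be swapped in for the Euclidean one used in the example.
    """
    ranked = sorted(range(len(train_X)), key=lambda i: distance(train_X[i], query))
    votes = Counter(train_y[i] for i in ranked[:k])
    # Counter keeps first-seen order, so a tied vote goes to the class of the nearest neighbour
    return votes.most_common(1)[0][0]

def euclidean(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b)) ** 0.5
```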

744 citations: On estimation of probability density function and mode
- Parzen
- 1962

500 citations: A massively parallel architecture for a self-organizing neural pattern recognition machine
- Carpenter, Grossberg
- 1987

Citation context: ...also make use of distance functions, including radial basis function networks (Broomhead & Lowe, 1988; Renals & Rohwer, 1989; Wasserman, 1993), counterpropagation networks (Hecht-Nielsen, 1987), ART (**Carpenter & Grossberg, 1987**), self-organizing maps (Kohonen, 1990) and competitive learning (Rumelhart & McClelland, 1986). Distance functions are also used in many fields besides machine learning and neural networks, including ...

475 citations: Towards Memory-Based Reasoning
- Stanfill, Waltz
- 1986

Citation context: ...f y (i.e., to classify y). Some instance-based learning algorithms are referred to as nearest neighbor techniques (Cover & Hart, 1967; Hart, 1968; Dasarathy, 1991), and memory-based reasoning methods (**Stanfill & Waltz, 1986**; Cost & Salzberg, 1993; Rachlin et al., 1994) overlap significantly with the instance-based paradigm as well. Such algorithms have had much success on a wide variety of applications (real-world class...

459 citations: Locally weighted learning
- Atkeson, Moore, et al.
- 1997

437 citations: Multivariable Functional Interpolation and Adaptive
- Broomhead, Lowe
- 1988

Citation context: ...ms have had much success on a wide variety of applications (real-world classification tasks). Many neural network models also make use of distance functions, including radial basis function networks (**Broomhead & Lowe, 1988**; Renals & Rohwer, 1989; Wasserman, 1993), counterpropagation networks (Hecht-Nielsen, 1987), ART (Carpenter & Grossberg, 1987), self-organizing maps (Kohonen, 1990) and competitive learning (Rumelhart...

433 citations: Attention, similarity, and the identification-categorization relationship
- Nosofsky
- 1986

Citation context: ...machine learning and neural networks, including statistics (Atkeson, Moore & Schaal, 1996), pattern recognition (Diday, 1974; Michalski, Stepp & Diday, 1981), and cognitive psychology (Tversky, 1977; **Nosofsky, 1986**). There are many distance functions that have been proposed to decide which instance is closest to a given input vector (Michalski, Stepp & Diday, 1981; Diday, 1974). Many of thes...

290 citations: Remarks on Some Nonparametric Estimates of a Density Function
- Rosenblatt
- 1956

Citation context: ...ling points, a window of instances, centered on each training instance, is used for determining the probability at a given point. This technique is similar in concept to shifted histogram estimators (**Rosenblatt, 1956**) and to Parzen window techniques (Parzen, 1962). For each attribute the values are sorted (using an O(n log n) sorting algorithm) so as to allow a sliding window to be used and thus collect the needed ...
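The windowing idea in this context can be illustrated with a simplified sketch: sort one attribute's values once, then slide a window of fixed width over them, adding and removing class counts incrementally, to estimate P(class | value) at each training value. The function name `windowed_class_probs` and the fixed `width` parameter are assumptions for illustration; this is not the paper's WVDM, which goes on to use such estimates inside a VDM-style distance.

```python
from collections import Counter

def windowed_class_probs(values, labels, width):
    """For each training value (in sorted order), estimate P(class | value) from the
    instances whose value lies within +/- width/2 of it, using a two-pointer sliding window."""
    pairs = sorted(zip(values, labels), key=lambda p: p[0])  # the O(n log n) sort
    counts = Counter()
    lo = hi = 0
    result = []
    for center, _ in pairs:
        # Extend the right edge to include every value <= center + width/2
        while hi < len(pairs) and pairs[hi][0] <= center + width / 2:
            counts[pairs[hi][1]] += 1
            hi += 1
        # Drop instances that have fallen off the left edge
        while pairs[lo][0] < center - width / 2:
            counts[pairs[lo][1]] -= 1
            lo += 1
        n = hi - lo  # always >= 1, since the center itself is inside the window
        result.append((center, {c: k / n for c, k in counts.items() if k > 0}))
    return result
```

Because the centers arrive in sorted order, both window edges only move forward, so after sorting the scan is linear in the number of instances (times the number of classes for building each estimate).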

266 citations: A Weighted Nearest Neighbor Algorithm for Learning with Symbolic Features
- Cost, Salzberg
- 1993

Citation context: ...). Some instance-based learning algorithms are referred to as nearest neighbor techniques (Cover & Hart, 1967; Hart, 1968; Dasarathy, 1991), and memory-based reasoning methods (Stanfill & Waltz, 1986; **Cost & Salzberg, 1993**; Rachlin et al., 1994) overlap significantly with the instance-based paradigm as well. Such algorithms have had much success on a wide variety of applications (real-world classification tasks). Many ...

249 citations: The Condensed Nearest Neighbor Rule
- Hart
- 1968

Citation context: ...e the nearest instance or instances to predict the output class of y (i.e., to classify y). Some instance-based learning algorithms are referred to as nearest neighbor techniques (Cover & Hart, 1967; **Hart, 1968**; Dasarathy, 1991), and memory-based reasoning methods (Stanfill & Waltz, 1986; Cost & Salzberg, 1993; Rachlin et al., 1994) overlap significantly with the instance-based paradigm as well. Such algorit...

187 citations: The need for biases in learning generalizations
- Mitchell
- 1980

Citation context: ...each system provides. The choice of distance function influences the bias of a learning algorithm. A bias is "a rule or method that causes an algorithm to choose one generalized output over another" (**Mitchell, 1980**). A learning algorithm must have a bias in order to generalize, and it has been shown that no learning algorithm can generalize more accurately than any other when summed over all possible problems (...

187 citations: Asymptotic Properties of Nearest Neighbor Rules Using Edited Data
- Wilson
- 1972

Citation context: ...alzberg, 1991; Wettschereck & Dietterich, 1995), rule-based techniques (Domingos, 1995), random mutation hill climbing (Skalak, 1994; Cameron-Jones, 1995) and others (Kibler & Aha, 1987; Tomek, 1976; **Wilson, 1972**). 8. Conclusions & Future Research Areas There are many learning systems that depend on a reliable distance function to achieve accurate generalization. The Euclidean distance function and many other...

170 citations: A Nearest Hyperrectangle Learning Method
- Salzberg
- 1991

Citation context: ... and Chi-square distance metrics (Michalski, Stepp & Diday, 1981; Diday, 1974); the Context-Similarity measure (Biberman, 1994); the Contrast Model (Tversky, 1977); hyperrectangle distance functions (**Salzberg, 1991**; Domingos, 1995) and others. Several of these functions are defined in Figure 1. Although there have been many distance functions proposed, by far the most commonly used is the Euclidean Distance fun...
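Of the functions named in this context, the Minkowski family and the Chebyshev distance have short standard definitions. The sketch below assumes purely numeric attribute vectors of equal length; `minkowski` and `chebyshev` are illustrative helper names. Setting r = 2 gives the Euclidean distance the context calls the most commonly used, and r = 1 gives the Manhattan distance.

```python
def minkowski(x, y, r):
    """Minkowski distance: (sum_i |x_i - y_i|^r)^(1/r); r=1 is Manhattan, r=2 is Euclidean."""
    return sum(abs(a - b) ** r for a, b in zip(x, y)) ** (1.0 / r)

def chebyshev(x, y):
    """Chebyshev distance: the largest single-attribute difference (the r -> infinity limit)."""
    return max(abs(a - b) for a, b in zip(x, y))
```

None of these handle nominal attributes directly, which is the gap the heterogeneous metrics in this paper are meant to fill.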

152 citations: Tolerating noisy, irrelevant, and novel attributes in instance-based learning algorithms
- Aha
- 1992

Citation context: ...curacy on average than three previous distance functions on those datasets that have both nominal and continuous attributes. 1. Introduction Instance-Based Learning (IBL) (Aha, Kibler & Albert, 1991; **Aha, 1992**; Wilson & Martinez, 1993; Wettschereck, Aha & Mohri, 1995; Domingos, 1995) is a paradigm of learning in which algorithms typically store some or all of the n available training examples (instances) f...

148 citations: A conservation law for generalization performance
- Schaffer
- 1994

Citation context: ...). A learning algorithm must have a bias in order to generalize, and it has been shown that no learning algorithm can generalize more accurately than any other when summed over all possible problems (**Schaffer, 1994**) (unless information about the problem other than the training data is available). It follows then that no distance function can be strictly better than any other in terms of generalization ability, ...

144 citations: Prototype and Feature Selection by Sampling and Random Mutation Hill Climbing Algorithms
- Skalak
- 1994

Citation context: ...thm (Zhang, 1992), prototype methods (Chang, 1974), hyperrectangle techniques (Salzberg, 1991; Wettschereck & Dietterich, 1995), rule-based techniques (Domingos, 1995), random mutation hill climbing (**Skalak, 1994**; Cameron-Jones, 1995) and others (Kibler & Aha, 1987; Tomek, 1976; Wilson, 1972). 8. Conclusions & Future Research Areas There are many learning systems that depend on a reliable distance function to...

123 citations: Counterpropagation networks
- Hecht-Nielsen
- 1987

Citation context: ...Many neural network models also make use of distance functions, including radial basis function networks (Broomhead & Lowe, 1988; Renals & Rohwer, 1989; Wasserman, 1993), counterpropagation networks (**Hecht-Nielsen, 1987**), ART (Carpenter & Grossberg, 1987), self-organizing maps (Kohonen, 1990) and competitive learning (Rumelhart & McClelland, 1986). Distance functions are also used in many fields besides machine learn...

122 citations: The Reduced Nearest Neighbor Rule
- Gates
- 1972

105 citations: Unknown attribute values in induction
- Quinlan
- 1989

Citation context: ...ance-weighted k-nearest neighbor, Dudani, 1976) that require the square root to be evaluated. Many applications contain unknown input values which must be handled appropriately in a practical system (**Quinlan, 1989**). The function d_a(x,y) therefore returns a distance of 1 if either x or y is unknown, as is done by Aha, Kibler & Albert (1991) and Giraud-Carrier & Martinez (1995). Other more complicated methods h...

103 citations: Refinements to Nearest-Neighbor Searching in k-Dimensional Trees
- Sproull
- 1987

Citation context: ..., the generalization process can require a significant amount of time and/or computational resources as n grows large. Techniques such as k-d trees (Deng & Moore, 1995; Wess, Althoff & Derwand, 1993; **Sproull, 1991**) and projection (Papadimitriou & Bentley, 1980) can reduce the time required to locate nearest neighbors from the training set, though such algorithms may require modification to handle both continuo...

95 citations: The distance-weighted k-nearest neighbor rule
- Dudani
- 1976

94 citations: An experimental comparison of the nearest neighbor and nearest hyperrectangle algorithms
- Wettschereck, Dietterich
- 1995

Citation context: ...1972), the selective nearest neighbor rule (Rittler et al., 1975), typical instance based learning algorithm (Zhang, 1992), prototype methods (Chang, 1974), hyperrectangle techniques (Salzberg, 1991; **Wettschereck & Dietterich, 1995**), rule-based techniques (Domingos, 1995), random mutation hill climbing (Skalak, 1994; Cameron-Jones, 1995) and others (Kibler & Aha, 1987; Tomek, 1976; Wilson, 1972). 8. Conclusions & Future Researc...

74 citations: Finding prototypes for nearest neighbor classifiers
- Chang
- 1974

Citation context: ... (Hart, 1968), the reduced nearest neighbor rule (Gates, 1972), the selective nearest neighbor rule (Rittler et al., 1975), typical instance based learning algorithm (Zhang, 1992), prototype methods (**Chang, 1974**), hyperrectangle techniques (Salzberg, 1991; Wettschereck & Dietterich, 1995), rule-based techniques (Domingos, 1995), random mutation hill climbing (Skalak, 1994; Cameron-Jones, 1995) and others (Ki...

69 citations: An experiment with the edited nearest neighbor rule
- Tomek
- 1976

Citation context: ...techniques (Salzberg, 1991; Wettschereck & Dietterich, 1995), rule-based techniques (Domingos, 1995), random mutation hill climbing (Skalak, 1994; Cameron-Jones, 1995) and others (Kibler & Aha, 1987; **Tomek, 1976**; Wilson, 1972). 8. Conclusions & Future Research Areas There are many learning systems that depend on a reliable distance function to achieve accurate generalization. The Euclidean distance function ...

68 citations: Nearest Neighbor (NN) Norms
- Dasarathy
- 1991

Citation context: ...t instance or instances to predict the output class of y (i.e., to classify y). Some instance-based learning algorithms are referred to as nearest neighbor techniques (Cover & Hart, 1967; Hart, 1968; **Dasarathy, 1991**), and memory-based reasoning methods (Stanfill & Waltz, 1986; Cost & Salzberg, 1993; Rachlin et al., 1994) overlap significantly with the instance-based paradigm as well. Such algorithms have had much...

65 citations: Learning representative exemplars of concepts: An initial case study
- Kibler, Aha
- 1987

Citation context: ...74), hyperrectangle techniques (Salzberg, 1991; Wettschereck & Dietterich, 1995), rule-based techniques (Domingos, 1995), random mutation hill climbing (Skalak, 1994; Cameron-Jones, 1995) and others (**Kibler & Aha, 1987**; Tomek, 1976; Wilson, 1972). 8. Conclusions & Future Research Areas There are many learning systems that depend on a reliable distance function to achieve accurate generalization. The Euclidean dista...

65 citations: Selecting a classification method by cross-validation
- Schaffer
- 1993

Citation context: ...HVDM as the distance metric. The system was tested on the heterogeneous datasets appearing in Table 1 using the three different normalization schemes discussed above, using ten-fold cross-validation (**Schaffer, 1993**), and the results are summarized in Table 2. All the normalization schemes used the same training sets and test sets for each trial. Bold entries indicate which scheme had the highest accuracy. An as...
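This context describes comparing normalization schemes with ten-fold cross-validation on identical training and test sets. A minimal sketch of how such shared splits might be generated, assuming a seeded random shuffle and round-robin fold assignment with no stratification; `ten_fold_splits` is a hypothetical helper, not taken from the paper.

```python
import random

def ten_fold_splits(n, folds=10, seed=0):
    """Yield (train_indices, test_indices) for each fold over n instances.

    A fixed seed reproduces the same partition, so several distance functions or
    normalization schemes can be evaluated on identical training and test sets.
    """
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    for f in range(folds):
        test = idx[f::folds]  # every folds-th shuffled index goes to this fold's test set
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        yield train, test
```

Each instance lands in exactly one test fold, and calling the generator again with the same arguments reproduces the same ten splits for every scheme being compared.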

61 citations: Multiresolution instance-based learning
- Deng, Moore
- 1995

Citation context: ...nC) time. Though m and C are typically fairly small, the generalization process can require a significant amount of time and/or computational resources as n grows large. Techniques such as k-d trees (**Deng & Moore, 1995**; Wess, Althoff & Derwand, 1993; Sproull, 1991) and projection (Papadimitriou & Bentley, 1980) can reduce the time required to locate nearest neighbors from the training set, though such algorithms ma...

60 citations: Rule Induction and Instance-Based Learning: A Unified Approach
- Domingos
- 1995

Citation context: ...asets that have both nominal and continuous attributes. 1. Introduction Instance-Based Learning (IBL) (Aha, Kibler & Albert, 1991; Aha, 1992; Wilson & Martinez, 1993; Wettschereck, Aha & Mohri, 1995; **Domingos, 1995**) is a paradigm of learning in which algorithms typically store some or all of the n available training examples (instances) from a training set, T, during learning. Each instance has an input vector ...

56 citations: Nonparametric Probability Density Estimation
- Tapia, Thompson
- 1978

52 citations: Using Local Models to Control Movement
- Atkeson
- 1989

52 citations: Computational Methods for Local Regression
- Cleveland, Grosse
- 1991

Citation context: ...ication, in which an input vector is mapped into a discrete output class. These distance functions could also be used in systems that perform regression (Atkeson, Moore & Schaal, 1996; Atkeson, 1989; **Cleveland & Loader, 1994**), in which the output is a real value, often interpolated from nearby points, as in kernel regression (Deng & Moore, 1995). As mentioned in Section 6.2 and elsewhere, pruning techniques can be used t...

49 citations: Using kd-Trees to Improve the Retrieval Step in Case-Based Reasoning
- Wess, Althoff, et al.

47 citations: An Algorithm for a Selective Nearest Neighbor Decision Rule
- Ritter, Woodruff, et al.
- 1975

Citation context: ...roduced, including IB3 (Aha, Kibler & Albert, 1991; Aha, 1992), the condensed nearest neighbor rule (Hart, 1968), the reduced nearest neighbor rule (Gates, 1972), the selective nearest neighbor rule (**Rittler et al., 1975**), typical instance based learning algorithm (Zhang, 1992), prototype methods (Chang, 1974), hyperrectangle techniques (Salzberg, 1991; Wettschereck & Dietterich, 1995), rule-based techniques (Domingo...

43 citations: A recent advance in data analysis: Clustering objects into classes characterized by conjunctive concepts
- Michalski, Stepp, et al.
- 1981

43 citations: Pattern recognition engineering
- Nadler, Smith
- 1993

Citation context: ... many learning systems that depend upon a good distance function to be successful. A variety of distance functions are available for such uses, including the Minkowsky (Batchelor, 1978), Mahalanobis (**Nadler & Smith, 1993**), Camberra, Chebychev, Quadratic, Correlation, and Chi-square distance metrics (Michalski, Stepp & Diday, 1981; Diday, 1974); the Context-Similarity measure (Biberman, 1994); the Contrast Model (Tver...

37 citations: A Review and Comparative Evaluation of Feature Weighting Methods for Lazy Learning Algorithms
- Wettschereck, Aha, et al.
- 1995

34 citations: Towards a Better Understanding for Memory-Based Reasoning
- Rachlin, Kasif, et al.
- 1994

Citation context: ...learning algorithms are referred to as nearest neighbor techniques (Cover & Hart, 1967; Hart, 1968; Dasarathy, 1991), and memory-based reasoning methods (Stanfill & Waltz, 1986; Cost & Salzberg, 1993; **Rachlin et al., 1994**) overlap significantly with the instance-based paradigm as well. Such algorithms have had much success on a wide variety of applications (real-world classification tasks). Many neural network models ...

33 citations: On overfitting avoidance as bias
- Wolpert

Citation context: ...ible problems with equal probability. However, when there is a higher probability of one class of problems occurring than another, some learning algorithms can generalize more accurately than others (**Wolpert, 1993**). This is not because they are better when summed over all problems, but because the problems on which they perform well are more likely to occur. In this sense, one algorithm or distance function ca...

27 citations: Instance Selection by Encoding Length Heuristic with Random Mutation Hill Climbing
- Cameron-Jones
- 1995

Citation context: ...92), prototype methods (Chang, 1974), hyperrectangle techniques (Salzberg, 1991; Wettschereck & Dietterich, 1995), rule-based techniques (Domingos, 1995), random mutation hill climbing (Skalak, 1994; **Cameron-Jones, 1995**) and others (Kibler & Aha, 1987; Tomek, 1976; Wilson, 1972). 8. Conclusions & Future Research Areas There are many learning systems that depend on a reliable distance function to achieve accurate gen...

27 citations: Phoneme classification experiments using radial basis functions
- Renals, Rohwer
- 1989

Citation context: ... on a wide variety of applications (real-world classification tasks). Many neural network models also make use of distance functions, including radial basis function networks (Broomhead & Lowe, 1988; **Renals & Rohwer, 1989**; Wasserman, 1993), counterpropagation networks (Hecht-Nielsen, 1987), ART (Carpenter & Grossberg, 1987), self-organizing maps (Kohonen, 1990) and competitive learning (Rumelhart & McClelland, 1986). D...

26 citations: An Optimal Weighting Criterion of Case Indexing for Both Numeric and Symbolic Attributes. Workshop
- Mohri, Tanaka
- 1994

Citation context: ...ing VDM on continuous attributes is discretization (Lebowitz, 1985; Schlimmer, 1987; Ventura, 1995). Some models that have used the VDM or variants of it (Cost & Salzberg, 1993; Rachlin et al., 1994; **Mohri & Tanaka, 1994**) have discretized continuous attributes into a somewhat arbitrary number of discrete ranges, and then treated these values as nominal (discrete unordered) values. This method has the advantage of gen...
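The discretization described in this context can be illustrated with the simplest scheme, equal-width ranges. The sketch below assumes a user-chosen number of ranges `s` (the context notes that such choices are somewhat arbitrary); `equal_width_bins` is a hypothetical helper, and the integer range labels it returns would then be treated as nominal values by VDM.

```python
def equal_width_bins(values, s):
    """Map continuous values to integer range labels 0..s-1 using s equal-width ranges."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / s
    if width == 0:
        return [0] * len(values)  # all values identical: a single range
    # min(..., s - 1) keeps the maximum value in the last range rather than opening a new one
    return [min(int((v - lo) / width), s - 1) for v in values]
```

Once binned this way, values such as 2.4 and 2.6 can fall into different ranges and be treated as unrelated nominal symbols, which illustrates the information loss that motivates handling continuous attributes directly.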

22 citations: A context similarity measure
- Biberman
- 1994

Citation context: ... 1978), Mahalanobis (Nadler & Smith, 1993), Camberra, Chebychev, Quadratic, Correlation, and Chi-square distance metrics (Michalski, Stepp & Diday, 1981; Diday, 1974); the Context-Similarity measure (**Biberman, 1994**); the Contrast Model (Tversky, 1977); hyperrectangle distance functions (Salzberg, 1991; Domingos, 1995) and others. Several of these functions are defined in Figure 1. Although there have been many ...

22 citations: Exploiting context when learning to classify
- Turney
- 2002

18 citations: Pattern Recognition: Ideas in Practice
- Batchelor
- 1978

Citation context: ... in the introduction, there are many learning systems that depend upon a good distance function to be successful. A variety of distance functions are available for such uses, including the Minkowsky (**Batchelor, 1978**), Mahalanobis (Nadler & Smith, 1993), Camberra, Chebychev, Quadratic, Correlation, and Chi-square distance metrics (Michalski, Stepp & Diday, 1981; Diday, 1974); the Context-Similarity measure (Biber...

17 citations: An efficient metric for heterogeneous inductive learning applications in the attribute-value language
- Giraud-Carrier, Martinez
- 1994