Results 1 - 10
of
61
Improved Heterogeneous Distance Functions
- Journal of Artificial Intelligence Research
, 1997
"... Instance-based learning techniques typically handle continuous and linear input values well, but often do not handle nominal input attributes appropriately. The Value Difference Metric (VDM) was designed to find reasonable distance values between nominal attribute values, but it largely ignores cont ..."
Abstract
-
Cited by 173 (9 self)
- Add to MetaCart
Instance-based learning techniques typically handle continuous and linear input values well, but often do not handle nominal input attributes appropriately. The Value Difference Metric (VDM) was designed to find reasonable distance values between nominal attribute values, but it largely ignores continuous attributes, requiring discretization to map continuous values into nominal values. This paper proposes three new heterogeneous distance functions, called the Heterogeneous Value Difference Metric (HVDM), the Interpolated Value Difference Metric (IVDM), and the Windowed Value Difference Metric (WVDM). These new distance functions are designed to handle applications with nominal attributes, continuous attributes, or both. In experiments on 48 applications the new distance metrics achieve higher classification accuracy on average than three previous distance functions on those datasets that have both nominal and continuous attributes. 1. Introduction Instance-Based Learning (IBL) (Aha, ...
A Review and Empirical Evaluation of Feature Weighting Methods for a Class of Lazy Learning Algorithms
- ARTIFICIAL INTELLIGENCE REVIEW
, 1997
"... Many lazy learning algorithms are derivatives of the k-nearest neighbor (k-NN) classifier, which uses a distance function to generate predictions from stored instances. Several studies have shown that k-NN's performance is highly sensitive to the definition of its distance function. Many k-NN v ..."
Abstract
-
Cited by 94 (0 self)
- Add to MetaCart
Many lazy learning algorithms are derivatives of the k-nearest neighbor (k-NN) classifier, which uses a distance function to generate predictions from stored instances. Several studies have shown that k-NN's performance is highly sensitive to the definition of its distance function. Many k-NN variants have been proposed to reduce this sensitivity by parameterizing the distance function with feature weights. However, these variants have not been categorized nor empirically compared. This paper reviews a class of weight-setting methods for lazy learning algorithms. We introduce a framework for distinguishing these methods and empirically compare them. We observed four trends from our experiments and conducted further studies to highlight them. Our results suggest that methods which use performance feedback to assign weight settings demonstrated three advantages over other methods: they require less pre-processing, perform better in the presence of interacting features, and generally require less training data to learn good settings. We also found that continuous weighting methods tend to outperform feature selection algorithms for tasks where some features are useful but less important than others.
Reduction Techniques for Instance-Based Learning Algorithms
- Machine Learning
, 2000
"... . Instance-based learning algorithms are often faced with the problem of deciding which instances to store for use during generalization. Storing too many instances can result in large memory requirements and slow execution speed, and can cause an oversensitivity to noise. This paper has two main p ..."
Abstract
-
Cited by 93 (2 self)
- Add to MetaCart
. Instance-based learning algorithms are often faced with the problem of deciding which instances to store for use during generalization. Storing too many instances can result in large memory requirements and slow execution speed, and can cause an oversensitivity to noise. This paper has two main purposes. First, it provides a survey of existing algorithms used to reduce storage requirements in instance-based learning algorithms and other exemplar-based algorithms. Second, it proposes six additional reduction algorithms called DROP1--DROP5 and DEL (three of which were first described in Wilson & Martinez, 1997c, as RT1--RT3) that can be used to remove instances from the concept description. These algorithms and 10 algorithms from the survey are compared on 31 classification tasks. Of those algorithms that provide substantial storage reduction, the DROP algorithms have the highest average generalization accuracy in these experiments, especially in the presence of uniform class noise. ...
Stacking classifiers for anti-spam filtering of e-mail
- Carnegie Mellon University
, 2001
"... We evaluate empirically a scheme for combining classifiers, known as stacked generalization, in the context of anti-spam filtering, a novel cost-sensitive application of text categorization. Unsolicited commercial email, or “spam”, floods mailboxes, causing frustration, wasting bandwidth, and exposi ..."
Abstract
-
Cited by 33 (0 self)
- Add to MetaCart
We evaluate empirically a scheme for combining classifiers, known as stacked generalization, in the context of anti-spam filtering, a novel cost-sensitive application of text categorization. Unsolicited commercial email, or “spam”, floods mailboxes, causing frustration, wasting bandwidth, and exposing minors to unsuitable content. Using a public corpus, we show that stacking can improve the efficiency of automatically induced anti-spam filters, and that such filters can be used in reallife applications.
Feature Weighting for Lazy Learning Algorithms
, 1998
"... : Learning algorithms differ in the degree to which they process their inputs prior to their use in performance tasks. Many algorithms eagerly compile input samples and use only the compilations to make decisions. Others are lazy: they perform less precompilation and use the input samples to guide ..."
Abstract
-
Cited by 32 (1 self)
- Add to MetaCart
: Learning algorithms differ in the degree to which they process their inputs prior to their use in performance tasks. Many algorithms eagerly compile input samples and use only the compilations to make decisions. Others are lazy: they perform less precompilation and use the input samples to guide decision making. The performance of many lazy learners significantly degrades when samples are defined by features containing little or misleading information. Distinguishing feature relevance is a critical issue for these algorithms, and many solutions have been developed that assign weights to features. This chapter introduces a categorization framework for feature weighting approaches used in lazy similarity learners and briefly surveys some examples in each category. 1.1 INTRODUCTION Lazy learning algorithms are machine learning algorithms (Mitchell, 1997) that are welcome members of procrastinators anonymous. Purely lazy learners typically display the following characteristics (Aha, 19...
K-Nearest Neighbor Classification on Spatial Data Streams Using P-Trees
, 2002
"... Classification of spatial data has become important due to the fact that there are huge volumes of spatial data now available holding a wealth of valuable information. In this paper we consider the classification of spatial data streams, where the training dataset changes often. New training data ar ..."
Abstract
-
Cited by 28 (11 self)
- Add to MetaCart
Classification of spatial data has become important due to the fact that there are huge volumes of spatial data now available holding a wealth of valuable information. In this paper we consider the classification of spatial data streams, where the training dataset changes often. New training data arrive continuously and are added to the training set. For these types of data streams, building a new classifier each time can be very costly with most techniques. In this situation, k-nearest neighbor (KNN) classification is a very good choice, since no residual classifier needs to be built ahead of time. For that reason KNN is called a lazy classifier. KNN is extremely simple to implement and lends itself to a wide variety of variations. The traditional k-nearest neighbor classifier finds the k nearest neighbors based on some distance metric by finding the distance of the target data point from the training dataset, then finding the class from those nearest neighbors by some voting mechanism. There is a problem associated with KNN classifiers. They increase the classification time significantly relative to other non-lazy methods. To overcome this problem, in this paper we propose a new method of KNN classification for spatial data streams using a new, rich, data-mining-ready structure, the Peano-count-tree or Ptree. In our method, we merely perform some logical AND/OR operations on P-trees to find the nearest neighbor set of a new sample and assign the class label. We have fast and efficient algorithms for AND/OR operations on P-trees, which reduce the classification time significantly, compared with traditional KNN classifiers. Instead of taking exactly the k nearest neighbors we form a closed-KNN set. Our experimental results show closed-KNN yields higher classification ac...
Memory-Based Learning: Using Similarity for Smoothing
, 1997
"... This paper analyses the relation between the use of similarity in Memory-Based Learning and the notion of backed-off smoothing in statistical language modeling. We show that the two approaches are closely related, and we argue that feature weighting methods in the Memory-Based paradigm can offer the ..."
Abstract
-
Cited by 23 (7 self)
- Add to MetaCart
This paper analyses the relation between the use of similarity in Memory-Based Learning and the notion of backed-off smoothing in statistical language modeling. We show that the two approaches are closely related, and we argue that feature weighting methods in the Memory-Based paradigm can offer the advantage of automatically specifying a suitable domain-specific hierarchy between most specific and most general conditioning information without the need for a large number of parameters. We report two applications of this approach: PP-attachment and POS-tagging. Our method achieves state-of-the-art performance in both domains, and allows the easy integration of diverse information sources, such as rich lexical representations.
An evidence-theoretic k-NN rule with parameter optimization
, 1998
"... This paper presents a learning procedure for optimizing the parameters in the evidencetheoretic k-nearest neighbor rule, a pattern classification method based on the Dempster-Shafer theory of belief functions. In this approach, each neighbor of a pattern to be classified is considered as an item of ..."
Abstract
-
Cited by 22 (6 self)
- Add to MetaCart
This paper presents a learning procedure for optimizing the parameters in the evidencetheoretic k-nearest neighbor rule, a pattern classification method based on the Dempster-Shafer theory of belief functions. In this approach, each neighbor of a pattern to be classified is considered as an item of evidence supporting certain hypotheses concerning the class membership of that pattern. Based on this evidence, basic belief masses are assigned to each subset of the set of classes. Such masses are obtained for each of the k nearest neighbors of the pattern under consideration and aggregated using the Dempster's rule of combination. In many situations, this method was found experimentally to yield lower error rates than other methods using the same information. However, the problem of tuning the parameters of the classification rule was so far unresolved. In this paper, we propose to determine optimal or near-optimal parameter values from the data by minimizing an error function. This ref...
Reduction Techniques for Exemplar-Based Learning Algorithms
- MACHINE LEARNING
, 2000
"... Exemplar-based learning algorithms are often faced with the problem of deciding which instances or other exemplars to store for use during generalization. Storing too many exemplars can result in large memory requirements and slow execution speed, and can cause an oversensitivity to noise. This pap ..."
Abstract
-
Cited by 19 (2 self)
- Add to MetaCart
Exemplar-based learning algorithms are often faced with the problem of deciding which instances or other exemplars to store for use during generalization. Storing too many exemplars can result in large memory requirements and slow execution speed, and can cause an oversensitivity to noise. This paper has two main purposes. First, it provides a survey of existing algorithms used to reduce the number of exemplars retained in exemplar-based learning models. Second, it proposes six new reduction algorithms called DROP1-5 and DEL that can be used to prune instances from the concept description. These algorithms and 10 algorithms from the survey are compared on 31 datasets. Of those algorithms that provide substantial storage reduction, the DROP algorithms have the highest generalization accuracy in these experiments, especially in the presence of noise.

