Hit Miss Networks with Applications to Instance Selection
Cited by 1 (1 self)
In supervised learning, a learning algorithm uses a training set of labeled instances to generate a model (classifier), which is then used to decide the class label of new instances (generalization). Characteristics of the training set, such as its size and the presence of noisy instances, influence the learning algorithm and affect generalization performance. This paper introduces a new network-based representation of a training set, called the hit miss network (HMN), which provides a compact description of the nearest neighbor relation between each pair of classes. We show that structural properties of HMNs correspond to properties of training points related to the one nearest neighbor (1-NN) decision rule, such as being a border or central point. This motivates us to use HMNs for improving the performance of a 1-NN classifier by removing instances from the training set (instance selection). We introduce three new HMN-based algorithms for instance selection: HMN-C, which removes instances without affecting the accuracy of 1-NN on the training set; HMN-E, which removes more instances than HMN-C; and HMN-EI, which applies HMN-E iteratively. Their performance is assessed on 22 artificial and real-life datasets with different characteristics, such as input dimension, cardinality, class balance, number of classes, noise content, and presence of redundant variables. Results of experiments on these datasets show that the accuracy of the 1-NN classifier increases significantly when HMN-EI is applied. Comparison with state-of-the-art editing algorithms for instance selection on these datasets indicates that HMN-EI achieves the best generalization performance, with no significant difference in storage requirements. In general, these results suggest that HMNs provide a powerful graph-based representation of a training set, which can be successfully applied to noise and redundancy reduction in instance-based learning.
Keywords: Graph-based training set representation, nearest neighbor, instance selection for instance-based learning.
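The abstract does not spell out how the network is built, so the following is only a minimal sketch under a stated assumption: each point gets one directed edge to its nearest neighbor in every class, and the edge is called a "hit" when the two points share a label and a "miss" otherwise. The function names `build_hit_miss_network` and `in_degrees` are hypothetical, and the removal rules of HMN-C, HMN-E, and HMN-EI are deliberately not reproduced here.

```python
from math import dist

def build_hit_miss_network(points, labels):
    """Return directed edges (i, j, kind), where j is the nearest
    neighbor of i within one class (assumed construction)."""
    edges = []
    classes = sorted(set(labels))
    for i, (p, y) in enumerate(zip(points, labels)):
        for c in classes:
            # Candidate targets: all other points carrying label c.
            candidates = [j for j, lab in enumerate(labels) if lab == c and j != i]
            if not candidates:
                continue
            j = min(candidates, key=lambda k: dist(p, points[k]))
            edges.append((i, j, "hit" if c == y else "miss"))
    return edges

def in_degrees(edges, n):
    """Count incoming hit and miss edges for each of the n nodes."""
    hit, miss = [0] * n, [0] * n
    for _, j, kind in edges:
        if kind == "hit":
            hit[j] += 1
        else:
            miss[j] += 1
    return hit, miss

# Toy data: the last point carries label "b" but sits inside the "a" cluster.
points = [(0.0, 0.0), (0.0, 1.1), (1.0, 0.0), (5.0, 5.0), (5.0, 6.0), (1.2, 0.2)]
labels = ["a", "a", "a", "b", "b", "b"]

edges = build_hit_miss_network(points, labels)
hit, miss = in_degrees(edges, len(points))
print("hit in-degree: ", hit)
print("miss in-degree:", miss)
```

In this toy set the mislabeled-looking point (index 5) receives a miss edge from every "a" point and no hit edges, which illustrates how in-degree structure could flag noisy or border instances under the assumed construction.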
Pruning Classification Rules with Reference Vector Selection Methods
Attempts to extract logical rules from data often lead to large sets of classification rules that need to be pruned. Training two classifiers, the C4.5 decision tree and the Non-Nested Generalized Exemplars (NNGE) covering algorithm, on datasets that have first been reduced with the EkP instance compressor leads to a statistically significantly lower number of derived rules, with no significant degradation of results. Similar results have been observed with other popular instance filters used for data pruning. The numerical experiments presented here illustrate that it is possible to extract simpler and more interesting sets of rules from filtered datasets. This enables a better understanding of knowledge structures when data is explored using algorithms that tend to induce a large number of classification rules.
Improving accuracy of LVQ algorithm by instance weighting.
Similarity-based methods are among the most accurate data mining approaches. A large group of such methods is based on instance selection and optimization, with the Learning Vector Quantization (LVQ) algorithm being a prominent example. The accuracy of LVQ depends strongly on proper initialization of the prototypes and on the optimization mechanism. This paper introduces prototype initialization based on context-dependent clustering, together with a modification of the LVQ cost function that utilizes additional information about the class-dependent distribution of training vectors. The new method is illustrated on 6 benchmark datasets, finding simple and accurate models of data in the form of prototype-based rules.
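For orientation, here is a minimal sketch of the standard LVQ1 update, not the modified cost function of this paper, whose details the abstract does not give. The optional `weights` argument is a hypothetical stand-in for per-instance weighting: it simply scales the learning rate for each training vector, which is an assumption made for illustration only.

```python
from math import dist

def lvq1_epoch(prototypes, proto_labels, X, y, lr=0.1, weights=None):
    """Run one pass of standard LVQ1 over the data, updating prototypes in place.

    weights is a hypothetical per-instance factor on the learning rate;
    with weights=None this is plain LVQ1.
    """
    for idx, (x, label) in enumerate(zip(X, y)):
        w = 1.0 if weights is None else weights[idx]
        # Find the prototype closest to the current training vector.
        k = min(range(len(prototypes)), key=lambda j: dist(x, prototypes[j]))
        # Attract the winner if its label matches, repel it otherwise.
        sign = 1.0 if proto_labels[k] == label else -1.0
        prototypes[k] = [p + sign * lr * w * (xi - p)
                         for p, xi in zip(prototypes[k], x)]
    return prototypes

# One prototype per class, then a single training vector of class "a".
protos = [[0.0, 0.0], [4.0, 4.0]]
proto_labels = ["a", "b"]
lvq1_epoch(protos, proto_labels, X=[(1.0, 1.0)], y=["a"], lr=0.5)
print(protos)
```

Because the winning prototype is chosen by distance alone, the starting positions decide which prototypes ever get updated, which is one reason the abstract stresses initialization.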

