Results 1–10 of 11
Hit Miss Networks with Applications to Instance Selection
Abstract

Cited by 10 (2 self)
In supervised learning, a training set consisting of labeled instances is used by a learning algorithm to generate a model (classifier) that is subsequently employed to decide the class label of new instances (generalization). Characteristics of the training set, such as its size and the presence of noisy instances, influence the learning algorithm and affect generalization performance. This paper introduces a new network-based representation of a training set, called a hit miss network (HMN), which provides a compact description of the nearest neighbor relation between each pair of classes. We show that structural properties of HMNs correspond to properties of training points related to the one nearest neighbor (1-NN) decision rule, such as being a border or central point. This motivates us to use HMNs to improve the performance of a 1-NN classifier by removing instances from the training set (instance selection). We introduce three new HMN-based algorithms for instance selection: HMN-C, which removes instances without affecting the accuracy of 1-NN on the training set; HMN-E, which removes more instances than HMN-C; and HMN-EI, which applies HMN-E iteratively. Their performance is assessed on 22 artificial and real-life datasets with different characteristics, such as input dimension, cardinality, class balance, number of classes, noise content, and presence of redundant variables. Results of experiments on these datasets show that the accuracy of the 1-NN classifier increases significantly when HMN-EI is applied. Comparison with state-of-the-art editing algorithms for instance selection on these datasets indicates the best generalization performance for HMN-EI and no significant difference in storage requirements. In general, these results suggest that HMNs provide a powerful graph-based representation of a training set, which can be successfully applied for noise and redundancy reduction in instance-based learning.
Keywords: Graph-based training set representation, nearest neighbor, instance selection for instance-based learning.
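As a reading aid, the HMN construction described in this abstract can be sketched as follows: each training point gets one directed edge to its nearest neighbor in every class, and the edge counts as a "hit" when that neighbor shares the point's class and as a "miss" otherwise. This is an illustrative sketch based on the abstract alone; the function and variable names are assumptions, not the paper's code.

```python
import math

def hit_miss_network(points, labels):
    """Illustrative HMN: one edge per (point, class) pair to the nearest
    neighbor of that class; 'hit' if the labels match, 'miss' otherwise."""
    hits, misses = [], []
    for i, p in enumerate(points):
        for c in set(labels):
            # nearest neighbor of p among points of class c, excluding p itself
            candidates = [j for j, l in enumerate(labels) if l == c and j != i]
            if not candidates:
                continue
            j = min(candidates, key=lambda m: math.dist(p, points[m]))
            (hits if labels[j] == labels[i] else misses).append((i, j))
    return hits, misses

# Two well-separated classes: every within-class edge is a hit,
# every cross-class edge is a miss.
points = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0), (1.1, 1.0)]
labels = ["a", "a", "b", "b"]
hits, misses = hit_miss_network(points, labels)
```

Degree counts over these hit and miss edges are the structural properties the abstract relates to border and central points.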
Adaptive local linear regression with application to printer color management
 IEEE Trans. on Image Processing
Abstract

Cited by 8 (4 self)
Abstract—Local learning methods, such as local linear regression and nearest neighbor classifiers, base estimates on nearby training samples (neighbors). Usually, the number of neighbors used in estimation is fixed to a global “optimal” value chosen by cross-validation. This paper proposes adapting the number of neighbors used for estimation to the local geometry of the data, without the need for cross-validation. The term enclosing neighborhood is introduced to describe a set of neighbors whose convex hull contains the test point when possible. It is proven that enclosing neighborhoods yield bounded estimation variance under some assumptions. Three such enclosing neighborhood definitions are presented: natural neighbors, natural neighbors inclusive, and enclosing k-NN. The effectiveness of these neighborhood definitions with local linear regression is tested for estimating lookup tables for color management. Significant improvements in ...
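For orientation, the fixed-k baseline that the adaptive neighborhoods in this abstract replace can be sketched as plain local linear regression: fit a least-squares line to the k nearest training samples of the query and evaluate it at the query point. A minimal 1-D sketch (names are illustrative; the paper's enclosing neighborhoods choose the neighbor set differently):

```python
import math

def local_linear_estimate(xs, ys, query, k=3):
    """1-D local linear regression using the k nearest neighbors of `query`."""
    neighbors = sorted(range(len(xs)), key=lambda i: abs(xs[i] - query))[:k]
    nx = [xs[i] for i in neighbors]
    ny = [ys[i] for i in neighbors]
    mx, my = sum(nx) / k, sum(ny) / k
    denom = sum((x - mx) ** 2 for x in nx)
    # least-squares slope over the neighborhood (0 if all x's coincide)
    slope = sum((x - mx) * (y - my) for x, y in zip(nx, ny)) / denom if denom else 0.0
    return my + slope * (query - mx)
```

On data that is exactly linear, e.g. y = 2x, the local fit reproduces the line for any k, which is a quick sanity check of the estimator.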
Geometry-based ensembles: Towards a structural characterization of the classification boundary
 IEEE Transactions on Pattern Analysis and Machine Intelligence
Abstract

Cited by 3 (0 self)
Abstract—This paper introduces a novel binary discriminative learning technique based on the approximation of the nonlinear decision boundary by a piecewise linear smooth additive model. The decision border is geometrically defined by means of the characterizing boundary points—points that belong to the optimal boundary under a certain notion of robustness. Based on these points, a set of locally robust linear classifiers is defined and assembled by means of a Tikhonov-regularized optimization procedure into an additive model to create a final smooth decision rule. As a result, a very simple and robust classifier with a strong geometrical meaning and nonlinear behavior is obtained. The simplicity of the method allows its extension to cope with some of today’s machine learning challenges, such as online learning, large-scale learning, and parallelization, with linear computational complexity. We validate our approach on the UCI database, comparing with several state-of-the-art classification techniques. Finally, we apply our technique in online and large-scale scenarios and in six real-life computer vision and pattern recognition problems: gender recognition based on face images, intravascular ultrasound tissue classification, speed traffic sign detection, Chagas disease myocardial damage severity detection, old musical scores clef classification, and action recognition using 3D accelerometer data from a wearable device. The results are promising and this paper opens a line of research that deserves further attention. Index Terms—Classification, ensemble of classifiers, Gabriel neighboring rule, visual object recognition.
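The Gabriel neighboring rule named in the index terms gives one concrete way to locate points near the class boundary: two points are Gabriel neighbors when no third point falls inside the ball whose diameter joins them, and opposite-class Gabriel pairs straddle the boundary. The sketch below illustrates only this geometric ingredient, not the paper's actual boundary-point definition; all names are assumptions.

```python
import math

def gabriel_neighbors(points, i, j):
    """True if points[i] and points[j] are Gabriel neighbors: no third point
    lies inside the closed ball with diameter segment (i, j)."""
    d2 = math.dist(points[i], points[j]) ** 2
    return all(
        math.dist(points[i], points[k]) ** 2 + math.dist(points[j], points[k]) ** 2 > d2
        for k in range(len(points)) if k not in (i, j)
    )

def boundary_points(points, labels):
    """Indices of points with at least one opposite-class Gabriel neighbor."""
    border = set()
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if labels[i] != labels[j] and gabriel_neighbors(points, i, j):
                border.update((i, j))
    return sorted(border)
```

On four collinear points labeled a, a, b, b, only the middle pair straddling the class change is flagged, which matches the intuition of a boundary.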
Pruning Classification Rules with Reference Vector Selection Methods
Abstract
Abstract. Attempts to extract logical rules from data often lead to large sets of classification rules that need to be pruned. Training two classifiers, the C4.5 decision tree and the Non-Nested Generalized Exemplars (NNGE) covering algorithm, on datasets that have first been reduced with the EkP instance compressor leads to a statistically significant reduction in the number of derived rules, with no significant degradation of results. Similar results have been observed with other popular instance filters used for data pruning. The numerical experiments presented here illustrate that it is possible to extract more interesting and simpler sets of rules from filtered datasets. This enables a better understanding of knowledge structures when data is explored using algorithms that tend to induce a large number of classification rules.
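The pre-pruning step this abstract describes, filtering the training set before inducing rules, can be sketched with a classic Wilson-style edited nearest neighbor (ENN) filter. This is a stand-in for "popular instance filters", not the EkP compressor named in the abstract; the names are illustrative.

```python
import math

def enn_filter(points, labels, k=3):
    """Wilson-style ENN: keep a point only if a strict majority of its
    k nearest neighbors carry the same label."""
    kept = []
    for i, p in enumerate(points):
        neighbors = sorted((j for j in range(len(points)) if j != i),
                           key=lambda m: math.dist(p, points[m]))[:k]
        agree = sum(labels[j] == labels[i] for j in neighbors)
        if 2 * agree > k:
            kept.append(i)
    return kept

# A mislabeled "b" point sitting inside the "a" cluster is filtered out,
# so a rule learner trained on the kept points sees cleaner clusters.
points = [(0.0, 0.0), (0.1, 0.0), (0.2, 0.0),
          (1.0, 1.0), (1.1, 1.0), (1.2, 1.0), (0.1, 0.1)]
labels = ["a", "a", "a", "b", "b", "b", "b"]
```

A rule inducer such as C4.5 or NNGE would then be trained on `points[i]` for the kept indices only.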
Bayesian instance selection for the nearest neighbor rule
 Machine Learning, DOI 10.1007/s10994-010-5170-2
Abstract
Abstract The nearest neighbor rules are commonly used in pattern recognition and statistics. The performance of these methods relies on three crucial choices: a distance metric, a set of prototypes, and a classification scheme. In this paper, we focus on the second, challenging issue: instance selection. We apply a maximum a posteriori criterion to the evaluation of sets of instances and propose a new optimization algorithm. This gives birth to Eva, a new instance selection method. We benchmark this method on real datasets and perform a multi-criteria analysis: we evaluate the compression rate, the predictive accuracy, the reliability, and the computational time. We also carry out experiments on synthetic datasets in order to discriminate the respective contributions of the criterion and the algorithm, and to illustrate the advantages of Eva over state-of-the-art algorithms. The study shows that Eva outputs smaller and more reliable sets of instances, in competitive time, while preserving the predictive accuracy of the related classifier.
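The maximum a posteriori idea in this abstract can be illustrated with a toy scoring function: a candidate prototype subset is scored by how well the 1-NN rule over that subset explains the training labels (a likelihood-like term) minus a penalty on subset size (a prior favoring compression). This is an illustration of the criterion's shape, not the paper's Eva algorithm; the surrogate terms and names are assumptions.

```python
import math

def map_score(points, labels, subset, alpha=1.0):
    """Log-posterior-style score for a candidate prototype subset:
    a 1-NN data-fit term minus a size penalty acting as the prior."""
    fit = 0
    for i, p in enumerate(points):
        nearest = min(subset, key=lambda s: math.dist(p, points[s]))
        fit += labels[nearest] == labels[i]  # one unit per correctly explained label
    return fit - alpha * len(subset)  # the prior favors smaller subsets
```

An optimizer in this spirit would search over subsets for the highest score; on two clean clusters, one prototype per class already explains every label and so beats keeping everything.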
Proximity-Graph Instance-Based Learning, Support Vector Machines, and High Dimensionality: An Empirical Comparison
Abstract
Abstract. Previous experiments with low-dimensional data sets have shown that Gabriel graph methods for instance-based learning are among the best machine learning algorithms for pattern classification applications. However, as the dimensionality of the data grows large, all data points in the training set tend to become Gabriel neighbors of each other, bringing the efficacy of this method into question. Indeed, it has been conjectured that for high-dimensional data, proximity graph methods that use sparser graphs, such as relative neighborhood graphs (RNG) and minimum spanning trees (MST), would have to be employed in order to maintain their privileged status. Here the performance of proximity graph methods for instance-based learning that employ Gabriel graphs, relative neighborhood graphs, and minimum spanning trees is compared experimentally on high-dimensional data sets. These methods are also compared empirically against the traditional k-NN rule and support vector machines (SVMs), the leading competitors of proximity graph methods.
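The densification effect this abstract describes is easy to reproduce numerically: as the dimension grows, an ever larger fraction of random point pairs become Gabriel neighbors, so the Gabriel graph stops being selective. A small hedged experiment (sample sizes, seed, and names are illustrative choices, not from the paper):

```python
import math
import random

def gabriel_edge_fraction(d, n=30, seed=0):
    """Fraction of point pairs that are Gabriel neighbors among n uniform
    random points in [0, 1]^d."""
    rng = random.Random(seed)
    pts = [[rng.random() for _ in range(d)] for _ in range(n)]
    edges = 0
    for i in range(n):
        for j in range(i + 1, n):
            d2 = math.dist(pts[i], pts[j]) ** 2
            # Gabriel condition: no third point inside the ball with diameter (i, j)
            if all(math.dist(pts[i], pts[k]) ** 2 + math.dist(pts[j], pts[k]) ** 2 > d2
                   for k in range(n) if k != i and k != j):
                edges += 1
    return edges / (n * (n - 1) // 2)
```

In the plane the Gabriel graph is planar and therefore sparse, while at d = 50 nearly every pair qualifies, which is exactly the concern that motivates the sparser RNG and MST graphs.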
Hit Miss Networks
Abstract
In supervised learning, a training set consisting of labeled instances is used by a learning algorithm to generate a model (classifier) that is subsequently employed to decide the class label of new instances (generalization). Characteristics of the training set, such as its size and the presence of noisy instances, influence the learning algorithm and affect generalization performance. This paper introduces a new network-based representation of a training set, called a hit miss network (HMN), which provides a compact description of the nearest neighbor relation between each pair of classes. We show that structural properties of HMNs correspond to properties of training points related to the nearest neighbor (1-NN) decision rule, such as being a border or central point. We use HMNs to develop two new algorithms for improving the performance of a 1-NN classifier by removing instances from the training set (instance selection). The two proposed algorithms are compared with one popular noise reduction algorithm and one state-of-the-art instance selection algorithm. Their performance is assessed on 22 artificial and real-life datasets with different characteristics, such as input dimension, cardinality, class balance, number of classes, noise content, and presence of redundant variables. Results of experiments on these datasets indicate the best performance for HMN-E, with significant improvement in both average accuracy and storage requirement over the other algorithms. In general, these results indicate that HMNs provide a powerful graph-based representation of a training set, which can be successfully applied for noise and redundancy reduction in instance-based learning. Keywords: Graph-based training set representation, nearest neighbor, instance selection for instance-based learning.
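One way to turn hit and miss edges into an editing rule is a degree test: build the HMN (each point sends one edge to its nearest neighbor in every class) and drop points that receive more miss edges than hit edges, i.e. points that other classes "aim at" more than their own class does. This is a deliberately simplified heuristic in the spirit of the abstract's instance-selection idea, not the paper's actual HMN-E rule; all names are assumptions.

```python
import math

def hmn_edit(points, labels):
    """Drop points that receive more 'miss' than 'hit' edges in the HMN
    (a simplified stand-in for the paper's editing rules)."""
    n = len(points)
    hit_in, miss_in = [0] * n, [0] * n
    for i in range(n):
        for c in set(labels):
            cand = [j for j in range(n) if labels[j] == c and j != i]
            if not cand:
                continue
            j = min(cand, key=lambda m: math.dist(points[i], points[m]))
            if labels[i] == labels[j]:
                hit_in[j] += 1   # targeted by its own class
            else:
                miss_in[j] += 1  # targeted by another class
    return [i for i in range(n) if miss_in[i] <= hit_in[i]]
```

On a small two-cluster set with one mislabeled point, the mislabeled point collects only miss edges and is removed, while interior cluster points survive.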