Results 1 - 8 of 8
Multicategory Classification by Support Vector Machines
Computational Optimization and Applications, 1999
Cited by 39 (0 self)
We examine the problem of how to discriminate between objects of three or more classes. Specifically, we investigate how two-class discrimination methods can be extended to the multiclass case. We show how the linear programming (LP) approaches based on the work of Mangasarian and quadratic programming (QP) approaches based on Vapnik's Support Vector Machines (SVM) can be combined to yield two new approaches to the multiclass problem. In LP multiclass discrimination, a single linear program is used to construct a piecewise linear classification function. In our proposed multiclass SVM method, a single quadratic program is used to construct a piecewise nonlinear classification function. Each piece of this function can take the form of a polynomial, radial basis function, or even a neural network. For problems with k > 2 classes, the SVM method as originally proposed required the construction of a two-class SVM to separate each class from the remaining classes. Similarly, k two-class linear programs can be used for the multiclass problem. We performed an empirical study of the original LP method, the proposed k LP method, the proposed single QP method and the original k QP methods. We discuss the advantages and disadvantages of each approach.
Multicategory Discrimination via Linear Programming
Optimization Methods and Software, 1992
Cited by 22 (2 self)
A single linear program is proposed for discriminating between the elements of $k$ disjoint point sets in the $n$-dimensional real space $R^n$. When the conical hulls of the $k$ sets are $(k-1)$-point disjoint in $R^{n+1}$, a $k$-piece piecewise-linear surface generated by the linear program completely separates the $k$ sets. This improves on a previous linear programming approach which required that each set be linearly separable from the remaining $k-1$ sets. When the conical hulls of the $k$ sets are not $(k-1)$-point disjoint, the proposed linear program generates an error-minimizing piecewise-linear separator for the $k$ sets. For this case it is shown that the null solution is never a unique solver of the linear program and occurs only under the rather rare condition when the mean of each point set equals the mean of the means of the other $k-1$ sets. This makes the proposed linear programming formulation useful for approximately discriminating between $k$ sets...
Serial and Parallel Multicategory Discrimination
SIAM Journal on Optimization, 1994
Cited by 8 (4 self)
A parallel algorithm is proposed for a fundamental problem of machine learning, that of multicategory discrimination. The algorithm is based on minimizing an error function associated with a set of highly structured linear inequalities. These inequalities characterize piecewise-linear separation of $k$ sets by the maximum of $k$ affine functions. The error function has a Lipschitz continuous gradient that allows the use of fast serial and parallel unconstrained minimization algorithms. A serial quasi-Newton algorithm is considerably faster than previous linear programming formulations. A parallel gradient distribution algorithm is used to parallelize the error-minimization problem. Preliminary computational results are given for both a DECstation 5000/125 and a Thinking Machines Corporation CM-5 multiprocessor.
1 Introduction
We consider a fundamental problem of machine learning and pattern recognition, that of discriminating between $k$ sets. Given $k$ disjoint sets, $A^i$, $i = 1, \ldots, k$, ...
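The separation "by the maximum of k affine functions" described in this abstract can be illustrated with a minimal sketch. It assumes each class i has an affine function w_i . x - gamma_i and that a point is assigned to the class whose function is largest. The names `W`, `gamma`, and both function names are hypothetical, and the squared-violation error with a unit margin is an illustrative assumption, not necessarily the error function used in the paper.

```python
import numpy as np

def classify_piecewise_linear(X, W, gamma):
    """Assign each row of X to the class whose affine function is largest.

    X: (m, n) points, W: (k, n) weight vectors, gamma: (k,) offsets.
    """
    scores = X @ W.T - gamma          # scores[p, i] = w_i . x_p - gamma_i
    return np.argmax(scores, axis=1)

def squared_violation_error(X, y, W, gamma):
    """Illustrative smooth error (an assumption of this sketch): the sum of
    squared violations of f_own(x) - f_other(x) >= 1 over all points."""
    scores = X @ W.T - gamma
    rows = np.arange(len(y))
    own = scores[rows, y][:, None]    # score of each point's true class
    viol = np.maximum(0.0, scores - own + 1.0)
    viol[rows, y] = 0.0               # a class does not compete with itself
    return float(np.sum(viol ** 2))

# Toy data: three classes in the plane, one affine function per class.
W = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]])
gamma = np.zeros(3)
X = np.array([[2.0, 0.0], [0.0, 3.0], [-1.0, -1.0]])
y = np.array([0, 1, 2])

predictions = classify_piecewise_linear(X, W, gamma)
error = squared_violation_error(X, y, W, gamma)
```

Squaring the violations gives an error with a continuous gradient, which is the kind of property that makes unconstrained methods such as quasi-Newton applicable; the exact form used by the authors may differ.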
The Performance Of Statistical Pattern Recognition Methods In High Dimensional Settings
IEEE Signal Processing Workshop on Higher Order Statistics, Caesarea, 1994
Cited by 2 (0 self)
We report on an extensive simulation study comparing eight statistical classification methods, focusing on problems where the number of observations is less than the number of variables. Using a wide range of artificial and real data, two types of classifiers were contrasted: methods that classify using all variables, and methods that first reduce the number of dimensions to two or three. The full feature space methods include linear, quadratic and regularized discriminant analysis, and the nearest neighbour method. The four dimensionality reducing classifiers are characterized by the transform they implement. The four transforms compared are the Fisher discriminant plane, the Fisher-Fukunaga-Koontz, the Fisher-radius, and the Fisher-variance transforms. The Fisher-Fukunaga and the Fisher-radius transform based classifiers have recently been proposed for two class classification problems. We also present an extension to these transforms such that they can be applied to classification pro...
Exploratory observation machine (XOM) with Kullback-Leibler divergence for dimensionality reduction and visualization
European Symposium on Artificial Neural Networks (ESANN 2010), 2010
Cited by 1 (1 self)
We present an extension of the Exploratory Observation Machine (XOM) for structure-preserving dimensionality reduction. Based on minimizing the Kullback-Leibler divergence of neighborhood functions in data and image spaces, this Neighbor Embedding XOM (NE-XOM) creates a link between fast sequential online learning known from topology-preserving mappings and principled direct divergence optimization approaches. We quantitatively evaluate our method on real world data using multiple embedding quality measures. In this comparison, NE-XOM performs as a competitive trade-off between high embedding quality and low computational expense, which motivates its further use in real-world settings throughout science and engineering.
From Approximative to Descriptive Models: A Realistic Case Study
2000
This paper presents the results of an application study on an effective and efficient technique which translates rules that use approximative sets to rules that use descriptive sets and linguistic hedges of predefined meaning. The translated descriptive rules are established to be functionally equivalent to the original approximative ones, or the closest equivalence possible, while reflecting their underlying semantics. It is shown that descriptive models can be obtained by taking advantage of existing approaches to approximative modelling that are efficient and accurate.
MKNN: Modified K-Nearest Neighbor
In this paper, a new classification method for enhancing the performance of K-Nearest Neighbor is proposed which uses robust neighbors in training data. This new classification method is called Modified K-Nearest Neighbor, MKNN. Inspired by the traditional KNN algorithm, the main idea is to classify the test samples according to their neighbors' tags. This method is a kind of weighted KNN in which the weights are determined using a different procedure. The procedure computes the fraction of the same labeled neighbors to the total number of neighbors. The proposed method is evaluated on five different data sets. Experiments show an excellent improvement in accuracy in comparison with the KNN method.
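The weighting procedure described in this abstract can be sketched roughly as follows. The sketch assumes that each training point gets a validity score equal to the fraction of its h nearest training neighbors sharing its label, and that this score scales the point's vote for a test sample. The function names, the parameters `k`, `h`, and `alpha`, and the distance factor 1 / (d + alpha) are assumptions of this sketch, since the abstract does not state the exact weight formula.

```python
import numpy as np

def validity(X_train, y_train, h):
    """Fraction of each training point's h nearest training neighbors
    (excluding itself) that carry the same label."""
    d = np.linalg.norm(X_train[:, None, :] - X_train[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)                 # a point is not its own neighbor
    nn = np.argsort(d, axis=1)[:, :h]           # indices of the h nearest neighbors
    return (y_train[nn] == y_train[:, None]).mean(axis=1)

def mknn_predict(X_train, y_train, X_test, k=3, h=3, alpha=0.5):
    """Weighted KNN vote; each neighbor's weight is its validity divided by
    (distance + alpha). The distance factor is an assumption of this sketch."""
    val = validity(X_train, y_train, h)
    classes = np.unique(y_train)
    preds = []
    for x in X_test:
        d = np.linalg.norm(X_train - x, axis=1)
        nn = np.argsort(d)[:k]                  # k nearest training points
        w = val[nn] / (d[nn] + alpha)
        votes = [w[y_train[nn] == c].sum() for c in classes]
        preds.append(classes[int(np.argmax(votes))])
    return np.array(preds)

# Toy data: two well-separated clusters.
X_train = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]], dtype=float)
y_train = np.array([0, 0, 0, 1, 1, 1])
X_test = np.array([[0.2, 0.2], [5.5, 5.5]])

val = validity(X_train, y_train, h=2)
pred = mknn_predict(X_train, y_train, X_test, k=3, h=2)
```

A training point surrounded by differently labeled neighbors receives a low validity and therefore contributes little to any vote, which is one plausible reading of "robust neighbors" in the abstract.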
Adaptive Local Dissimilarity Measures for Discriminative Dimension Reduction of Labeled Data
Due to the tremendous increase of electronic information with respect to the size of data sets as well as their dimension, dimension reduction and visualization of high-dimensional data have become one of the key problems of data mining. Since embedding in lower dimensions necessarily includes a loss of information, methods to explicitly control the information kept by a specific dimension reduction technique are highly desirable. The incorporation of supervised class information constitutes an important specific case. The aim is to preserve and potentially enhance the discrimination of classes in lower dimensions. In this contribution we use an extension of prototype-based local distance learning, which results in a nonlinear discriminative dissimilarity measure for a given labeled data manifold. The learned local distance measure can be used as a basis for other unsupervised dimension reduction techniques, which take into account neighborhood information. We show the combination of different dimension reduction techniques with a discriminative similarity measure learned by an extension of Learning Vector Quantization (LVQ) and their behavior with different parameter settings. The methods are introduced and discussed in terms of artificial and real-world data sets.

