Results 1 - 10
of
70
Automatic Construction of Decision Trees from Data: A Multi-Disciplinary Survey
- Data Mining and Knowledge Discovery
, 1997
"... Decision trees have proved to be valuable tools for the description, classification and generalization of data. Work on constructing decision trees from data exists in multiple disciplines such as statistics, pattern recognition, decision theory, signal processing, machine learning and artificial ne ..."
Abstract
-
Cited by 122 (1 self)
- Add to MetaCart
Decision trees have proved to be valuable tools for the description, classification and generalization of data. Work on constructing decision trees from data exists in multiple disciplines such as statistics, pattern recognition, decision theory, signal processing, machine learning and artificial neural networks. Researchers in these disciplines, sometimes working on quite different problems, identified similar issues and heuristics for decision tree construction. This paper surveys existing work on decision tree construction, attempting to identify the important issues involved, directions the work has taken and the current state of the art. Keywords: classification, tree-structured classifiers, data compaction 1. Introduction Advances in data collection methods, storage and processing technology are providing a unique challenge and opportunity for automated data exploration techniques. Enormous amounts of data are being collected daily from major scientific projects e.g., Human Genome...
The Lack of A Priori Distinctions Between Learning Algorithms
, 1996
"... This is the first of two papers that use off-training set (OTS) error to investigate the assumption -free relationship between learning algorithms. This first paper discusses the senses in which there are no a priori distinctions between learning algorithms. (The second paper discusses the senses in ..."
Abstract
-
Cited by 103 (5 self)
- Add to MetaCart
This is the first of two papers that use off-training set (OTS) error to investigate the assumption -free relationship between learning algorithms. This first paper discusses the senses in which there are no a priori distinctions between learning algorithms. (The second paper discusses the senses in which there are such distinctions.) In this first paper it is shown, loosely speaking, that for any two algorithms A and B, there are "as many" targets (or priors over targets) for which A has lower expected OTS error than B as vice-versa, for loss functions like zero-one loss. In particular, this is true if A is cross-validation and B is "anti-cross-validation" (choose the learning algorithm with largest cross-validation error). This paper ends with a discussion of the implications of these results for computational learning theory. It is shown that one can not say: if empirical misclassification rate is low; the Vapnik-Chervonenkis dimension of your generalizer is small; and the trainin...
Supervised Classification in High Dimensional Space: Geometrical, Statistical, and Asymptotical Properties of Multivariate Data
- IEEE Transactions on System, Man, and Cybernetics, Volume 28, Part C
, 1998
"... © 1997 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other w ..."
Abstract
-
Cited by 35 (4 self)
- Add to MetaCart
© 1997 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE. This paper appeared in the IEEE Transactions on Systems, Man, and
Towards a Better Understanding of Memory-Based Reasoning Systems
- In Proceedings of the Eleventh International Machine Learning Conference
, 1994
"... We quantify both experimentally and analytically the performance of memorybased reasoning (MBR) algorithms. To start gaining insight into the capabilities of MBR algorithms, we compare an MBR algorithm using a value difference metric to a popular Bayesian classifier. These two approaches are similar ..."
Abstract
-
Cited by 32 (4 self)
- Add to MetaCart
We quantify both experimentally and analytically the performance of memorybased reasoning (MBR) algorithms. To start gaining insight into the capabilities of MBR algorithms, we compare an MBR algorithm using a value difference metric to a popular Bayesian classifier. These two approaches are similar in that they both make certain independence assumptions about the data. However, whereas MBR uses specific cases to perform classification, Bayesian methods summarize the data probabilistically. We demonstrate that a particular MBR system called Pebls works comparatively well on a wide range of domains using both real and artificial data. With respect to the artificial data, we consider distributions where the concept classes are separated by functional discriminants, as well as time-series data generated by Markov models of varying complexity. Finally, we show formally that Pebls can learn (in the limit) natural concept classes that the Bayesian classifier cannot learn, and that it will at...
A surface-based approach for classification of 3d neuroanatomical structures
- INTELLIGENT DATA ANALYSIS
, 2004
"... We present a new framework for 3D surface object classification that combines a powerful shape description method with suitable pattern classification techniques. Spherical harmonic parameterization and normalization techniques are used to describe a surface shape and derive a dual high dimensional ..."
Abstract
-
Cited by 29 (7 self)
- Add to MetaCart
We present a new framework for 3D surface object classification that combines a powerful shape description method with suitable pattern classification techniques. Spherical harmonic parameterization and normalization techniques are used to describe a surface shape and derive a dual high dimensional landmark representation. A point distribution model is applied to reduce the dimensionality. Fisher’s linear discriminants and support vector machines are used for classification. Several feature selection schemes are proposed for learning better classifiers. After showing the effectiveness of this framework using simulated shape data, we apply it to real hippocampal data in schizophrenia and perform extensive experimental studies by examining different combinations of techniques. We achieve best leave-one-out cross-validation accuracies of 93 % (whole set, N = 56) and 90 % (right-handed males, N = 39), respectively, which are competitive with the best results in previous studies using different techniques on similar types of data. Furthermore, to help medical diagnosis in practice, we employ a threshold-free receiver operating characteristic (ROC) approach as an alternative evaluation of classification results as well as propose a new method for visualizing discriminative patterns.
Kernel-based methods for hyperspectral image classification
- IEEE Transactions on Geoscience and Remote Sensing
, 2005
"... Abstract—This paper presents the framework of kernel-based methods in the context of hyperspectral image classification, illustrating from a general viewpoint the main characteristics of different kernel-based approaches and analyzing their properties in the hyperspectral domain. In particular, we a ..."
Abstract
-
Cited by 21 (5 self)
- Add to MetaCart
Abstract—This paper presents the framework of kernel-based methods in the context of hyperspectral image classification, illustrating from a general viewpoint the main characteristics of different kernel-based approaches and analyzing their properties in the hyperspectral domain. In particular, we assess performance of regularized radial basis function neural networks (Reg-RBFNN), standard support vector machines (SVMs), kernel Fisher discriminant (KFD) analysis, and regularized AdaBoost (Reg-AB). The novelty of this work consists in: 1) introducing Reg-RBFNN and Reg-AB for hyperspectral image classification; 2) comparing kernel-based methods by taking into account the peculiarities of hyperspectral images; and 3) clarifying their theoretical relationships. To these purposes, we focus on the accuracy of methods when working in noisy environments, high input dimension, and limited training sets. In addition, some other important issues are discussed, such as the sparsity of the solutions, the computational burden, and the capability of the methods to provide outputs that can be directly interpreted as probabilities. Index Terms—AdaBoost, feature space, hyperspectral classification, kernel-based methods, kernel Fisher discriminant analysis, radial basis function neural networks, regularization, support vector machines. I.
Support Vector Machine Classifiers as Applied to AVIRIS Data
, 1999
"... INTRODUCTION The Support Vector Machine #SVM# is a relatively recent approachintroduced by Boser, Guyon, and Vapnik #Boser et al., 1992#, #Vapnik, 1995# for solving supervised classi#cation and regression problems, or more colloquially learning from examples. In the following we will discuss only c ..."
Abstract
-
Cited by 16 (0 self)
- Add to MetaCart
INTRODUCTION The Support Vector Machine #SVM# is a relatively recent approachintroduced by Boser, Guyon, and Vapnik #Boser et al., 1992#, #Vapnik, 1995# for solving supervised classi#cation and regression problems, or more colloquially learning from examples. In the following we will discuss only classi#cation and its application to hyperspectral data from AVIRIS. Traditionally, classi#ers model the underlying densityofthevarious classes and then #nd a separating surface. However density estimation in high-dimensional spaces su#ers from the Hughes e#ect #Hughes, 1968#, #Landgrebe, 1999#: For a #xed amount of training data the classi#cation accuracy as a function of number of bands reaches a maximum and then declines, because there is limited amount of training data to estimate the large number of parameters needed. Thus usually, a feature selection step is #rst performed on the high-dimensional data to reduce its dimensionality. As we will demon
Off-Training Set Error And a Priori Distinctions Between . . .
, 1995
"... This paper uses off-training set (OTS) error to investigate the assumption-free relationship between learning algorithms. It is shown, loosely speaking, that for any two algorithms A and B, there are as many targets (or priors over targets) for which A has lower expected OTS error than B as vice-ver ..."
Abstract
-
Cited by 16 (3 self)
- Add to MetaCart
This paper uses off-training set (OTS) error to investigate the assumption-free relationship between learning algorithms. It is shown, loosely speaking, that for any two algorithms A and B, there are as many targets (or priors over targets) for which A has lower expected OTS error than B as vice-versa, for loss functions like zero-one loss. In particular, this is true if A is cross-validation and B is "anti-cross-validation" (choose the generalizer with largest cross-validation error). On the other hand, for loss functions other than zero-one (e.g., quadratic loss), there are a priori distinctions between algorithms. However even for such loss functions, any algorithm is equivalent on average to its "randomized" version, and in this still has no first principles justification in terms of average error. Nonetheless, it may be that (for example) cross-validation has better minimax properties than anti-cross-validation, even for zero-one loss. This paper also analyzes averages over hypotheses rather than targets. Such analyses hold for all possible priors. Accordingly they prove, as a particular example, that cross-validation can not be justified as a Bayesian procedure. In fact, for a very natural restriction of the class of learning algorithms, one should use anti-crossvalidation rather than cross-validation (!). This paper ends with a discussion of the implications of these results for computational learning theory. It is shown that one can not say: if empirical misclassification rate is low; the VC dimension of your generalizer is small; and the training set is large, then with high probability your OTS error is small. Other implications for "membership queries " algorithms and "punting" algorithms are also discussed.
Multimodal oriented discriminant analysis
, 2005
"... Linear discriminant analysis (LDA) has been an active topic of research during the last century. However, the existing algorithms have several limitations when applied to visual data. LDA is only optimal for Gaussian distributed classes with equal covariance matrices, and only classes-1 features can ..."
Abstract
-
Cited by 15 (7 self)
- Add to MetaCart
Linear discriminant analysis (LDA) has been an active topic of research during the last century. However, the existing algorithms have several limitations when applied to visual data. LDA is only optimal for Gaussian distributed classes with equal covariance matrices, and only classes-1 features can be extracted. On the other hand, LDA does not scale well to high dimensional data (overfitting), and it cannot handle optimally multimodal distributions. In this paper, we introduce Multimodal Oriented Discriminant Analysis (MODA), a LDA extension which can overcome these drawbacks. A new formulation

