Results 1 to 10 of 42
Bagging Predictors
 Machine Learning
, 1996
"... Bagging predictors is a method for generating multiple versions of a predictor and using these to get an aggregated predictor. The aggregation averages over the versions when predicting a numerical outcome and does a plurality vote when predicting a class. The multiple versions are formed by making ..."
Abstract

Cited by 2479 (1 self)
Bagging predictors is a method for generating multiple versions of a predictor and using these to get an aggregated predictor. The aggregation averages over the versions when predicting a numerical outcome and does a plurality vote when predicting a class. The multiple versions are formed by making bootstrap replicates of the learning set and using these as new learning sets. Tests on real and simulated data sets using classification and regression trees and subset selection in linear regression show that bagging can give substantial gains in accuracy. The vital element is the instability of the prediction method. If perturbing the learning set can cause significant changes in the predictor constructed, then bagging can improve accuracy.
1. Introduction
A learning set L consists of data {(y_n, x_n), n = 1, ..., N} where the y's are either class labels or a numerical response. We have a procedure for using this learning set to form a predictor φ(x, L): if the input is x we ...
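The bagging procedure the abstract describes can be sketched in a few lines (a toy illustration, not the paper's code; the function names and the 1-nearest-neighbour base learner are assumptions for the example):

```python
import random
from collections import Counter

def bag_predict(learning_set, train, x, n_replicates=25, seed=0):
    """Bagging for classification: train a predictor on each bootstrap
    replicate of the learning set, then take a plurality vote."""
    rng = random.Random(seed)
    n = len(learning_set)
    votes = []
    for _ in range(n_replicates):
        # bootstrap replicate: sample n points with replacement
        replicate = [rng.choice(learning_set) for _ in range(n)]
        predictor = train(replicate)
        votes.append(predictor(x))
    return Counter(votes).most_common(1)[0][0]  # plurality vote

def train_1nn(replicate):
    """Toy unstable base learner: 1-nearest neighbour over one replicate."""
    def predictor(x):
        return min(replicate,
                   key=lambda p: sum((a - b) ** 2 for a, b in zip(p[0], x)))[1]
    return predictor
```

For a numerical outcome the vote would be replaced by an average of the predictions, as the abstract states.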
Feature Selection via Mathematical Programming
, 1997
"... The problem of discriminating between two finite point sets in ndimensional feature space by a separating plane that utilizes as few of the features as possible, is formulated as a mathematical program with a parametric objective function and linear constraints. The step function that appears in th ..."
Abstract

Cited by 59 (22 self)
The problem of discriminating between two finite point sets in n-dimensional feature space by a separating plane that utilizes as few of the features as possible is formulated as a mathematical program with a parametric objective function and linear constraints. The step function that appears in the objective function can be approximated by a sigmoid or by a concave exponential on the nonnegative real line, or it can be treated exactly by considering the equivalent linear program with equilibrium constraints (LPEC). Computational tests of these three approaches on publicly available real-world databases have been carried out and compared with an adaptation of the optimal brain damage (OBD) method for reducing neural network complexity. One feature selection algorithm via concave minimization (FSV) reduced cross-validation error on a cancer prognosis database by 35.4% while reducing problem features from 32 to 4. Feature selection is an important problem in machine learning [18, 15, 1...
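The concave-exponential idea mentioned in the abstract can be illustrated concretely (the function name and the smoothing parameter alpha are assumptions for this sketch, not the paper's notation): the number of features used, a sum of step functions of the weights, is replaced by a smooth concave surrogate.

```python
import math

def feature_count_surrogate(w, alpha=5.0):
    """Smooth surrogate for the number of nonzero weights:
    sum_j step(|w_j|) is replaced by sum_j (1 - exp(-alpha * |w_j|)),
    which equals the step function at 0 and approaches it as alpha grows."""
    return sum(1.0 - math.exp(-alpha * abs(wj)) for wj in w)
```

Minimizing such a surrogate alongside the separation error penalizes every feature with a nonzero weight, which is what drives the feature count down in approaches of this kind.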
SLAVE: A genetic learning system based on an iterative approach
 IEEE TRANSACTIONS ON FUZZY SYSTEMS
, 1999
"... SLAVE is an inductive learning algorithm that uses concepts based on fuzzy logic theory. This theory has been shown to be a useful representational tool for improving the understanding of the knowledge obtained from a human point of view. Furthermore, SLAVE uses an iterative approach for learning ba ..."
Abstract

Cited by 48 (2 self)
SLAVE is an inductive learning algorithm that uses concepts based on fuzzy logic theory. This theory has been shown to be a useful representational tool for improving the understanding of the knowledge obtained from a human point of view. Furthermore, SLAVE uses an iterative approach to learning based on the use of a genetic algorithm (GA) as a search algorithm. In this paper, we propose a modification of the initial iterative approach used in SLAVE. The main idea is to include more information in the process of learning an individual rule. This information is included in the iterative approach through a different way of computing the positive and negative examples of a rule. Furthermore, we propose a new fitness function and additional genetic operators that reduce the time needed for learning and improve the understanding of the rules obtained.
A survey of kernel and spectral methods for clustering
, 2008
"... Clustering algorithms are a useful tool to explore data structures and have been employed in many disciplines. The focus of this paper is the partitioning clustering problem with a special interest in two recent approaches: kernel and spectral methods. The aim of this paper is to present a survey of ..."
Abstract

Cited by 45 (3 self)
Clustering algorithms are a useful tool to explore data structures and have been employed in many disciplines. The focus of this paper is the partitioning clustering problem, with a special interest in two recent approaches: kernel and spectral methods. The aim of this paper is to present a survey of kernel and spectral clustering methods, two approaches able to produce nonlinear separating hypersurfaces between clusters. The presented kernel clustering methods are the kernel versions of many classical clustering algorithms, e.g., K-means, SOM and neural gas. Spectral clustering arises from concepts in spectral graph theory, and the clustering problem is configured as a graph cut problem where an appropriate objective function has to be optimized. An explicit proof that these two seemingly different paradigms share the same mathematical foundation and optimize the same objective is reported. In addition, fuzzy kernel clustering methods are presented as extensions of the kernel K-means clustering algorithm.
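A minimal sketch of the spectral clustering pipeline the abstract surveys (the Gaussian affinity, the bandwidth sigma and the seed are illustrative choices, not the paper's specific algorithm): build an affinity graph, embed the points with the bottom eigenvectors of the normalized Laplacian, and run K-means in the embedded space.

```python
import numpy as np

def spectral_clustering(X, k, sigma=1.0, seed=0):
    """Spectral clustering via the symmetric normalized Laplacian."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)  # pairwise squared distances
    W = np.exp(-d2 / (2.0 * sigma ** 2))                 # Gaussian affinities
    np.fill_diagonal(W, 0.0)
    d_inv = 1.0 / np.sqrt(W.sum(1))
    # L = I - D^{-1/2} W D^{-1/2}
    L = np.eye(len(X)) - d_inv[:, None] * W * d_inv[None, :]
    _, vecs = np.linalg.eigh(L)       # eigenvalues come back in ascending order
    U = vecs[:, :k]                   # eigenvectors of the k smallest eigenvalues
    U /= np.linalg.norm(U, axis=1, keepdims=True)        # row-normalize
    # plain K-means on the rows of the spectral embedding
    rng = np.random.default_rng(seed)
    centers = U[rng.choice(len(U), k, replace=False)]
    for _ in range(50):
        labels = ((U[:, None] - centers[None]) ** 2).sum(-1).argmin(1)
        centers = np.stack([U[labels == j].mean(0) if (labels == j).any()
                            else centers[j] for j in range(k)])
    return labels
```

The nonlinearity comes from the embedding step: clusters that are not linearly separable in the input space become nearly orthogonal directions in the eigenvector space.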
A visualization technique for Self-Organizing Maps with vector fields to obtain the cluster structure at desired levels of detail
"... SelfOrganizing Maps (SOMs) are a prominent tool for exploratory data analysis. One core task within the utilization of SOMs is the identification of the cluster structure on the map for which several visualization methods have been proposed, yet different application domains may require additional ..."
Abstract

Cited by 25 (11 self)
Self-Organizing Maps (SOMs) are a prominent tool for exploratory data analysis. One core task in the utilization of SOMs is the identification of the cluster structure on the map, for which several visualization methods have been proposed; yet different application domains may require additional representations of the cluster structure. In this paper, we propose such a method based on pairwise distance calculation. Its result can be plotted on top of the map lattice as arrows that point to the closest cluster center. A parameter is provided that determines the granularity of the clustering. We provide experimental results and discuss the general applicability of our method, along with a comparison to related techniques.
The Separability of Split Value Criterion
 In Proceedings of the 5th Conference on Neural Networks and Their Applications
, 2000
"... The Separability of Split Value (SSV) criterion is a simple and efficient tool for building classification trees and extraction of logical rules. It deals with both continuous and discrete features describing data vectors and requires no user interaction in the learning process. Extensions of method ..."
Abstract

Cited by 24 (10 self)
The Separability of Split Value (SSV) criterion is a simple and efficient tool for building classification trees and extracting logical rules. It deals with both continuous and discrete features describing data vectors and requires no user interaction in the learning process. Extensions of methods based on this criterion are presented. They aim to improve the reliability and efficiency of the methods and to extend their area of application. Good results were obtained for several benchmark datasets.
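The general idea of a separability-based split criterion can be illustrated with a simple stand-in (this is a plausible pair-counting variant for illustration, not necessarily the exact SSV formula from the paper): score each candidate threshold by how many differently-labeled pairs of vectors it places on opposite sides.

```python
def split_separability(values, labels, threshold):
    """Count pairs of vectors with different class labels that the
    threshold separates (illustrative criterion, not the paper's exact SSV)."""
    left = [y for v, y in zip(values, labels) if v <= threshold]
    right = [y for v, y in zip(values, labels) if v > threshold]
    return sum(1 for a in left for b in right if a != b)

def best_split(values, labels):
    """Pick the candidate threshold (midpoints between consecutive sorted
    values) that maximizes the separability count."""
    vs = sorted(set(values))
    candidates = [(a + b) / 2 for a, b in zip(vs, vs[1:])]
    return max(candidates, key=lambda t: split_separability(values, labels, t))
```

Because the criterion is a simple count over candidate thresholds, it handles continuous features without any user-set parameters, which matches the "no user interaction" property the abstract highlights.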
A k-nearest neighbor classification rule based on Dempster-Shafer theory
 IEEE TRANS. ON SYSTEMS, MAN AND CYBERNETICS
, 1995
"... In this paper, the problem of classifying an unseen pattern on the basis of its nearest neighbors in a recorded data set is addressed from the point of view of DempsterShafer theory. Each neighbor of a sample to be classified is considered as an item of evidence that supports certain hypotheses re ..."
Abstract

Cited by 20 (9 self)
In this paper, the problem of classifying an unseen pattern on the basis of its nearest neighbors in a recorded data set is addressed from the point of view of Dempster-Shafer theory. Each neighbor of a sample to be classified is considered as an item of evidence that supports certain hypotheses regarding the class membership of that pattern. The degree of support is defined as a function of the distance between the two vectors. The evidence of the k nearest neighbors is then pooled by means of Dempster's rule of combination. This approach provides a global treatment of such issues as ambiguity and distance rejection, and imperfect knowledge regarding the class membership of training patterns. The effectiveness of this classification scheme as compared to the voting and distance-weighted k-NN procedures is demonstrated using several sets of simulated and real-world data.
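A hedged sketch of an evidence-theoretic k-NN in the spirit of this abstract (the constants alpha and gamma and the squared-distance kernel are illustrative assumptions, not necessarily the paper's choices): each neighbour assigns a distance-dependent mass to its own class and the remainder to the whole frame of discernment ("don't know"), and the masses are pooled with Dempster's rule.

```python
import math

def evidential_knn(train, x, k=3, alpha=0.95, gamma=1.0):
    """Classify x by pooling the evidence of its k nearest neighbours
    with Dempster's rule of combination."""
    neighbours = sorted(train, key=lambda p: math.dist(p[0], x))[:k]
    classes = {y for _, y in train}
    m = {c: 0.0 for c in classes}   # mass on each singleton class
    m_theta = 1.0                   # mass on the whole frame Theta
    for xi, yi in neighbours:
        # support for class yi decays with distance
        s = alpha * math.exp(-gamma * math.dist(xi, x) ** 2)
        # Dempster's rule against a mass function with s on {yi}, 1-s on Theta
        new, conflict = {}, 0.0
        for c in classes:
            if c == yi:
                new[c] = m[c] + m_theta * s     # {yi} meets {yi} and Theta
            else:
                new[c] = m[c] * (1.0 - s)       # {c} survives only against Theta
                conflict += m[c] * s            # {c} meets {yi}: empty intersection
        z = 1.0 - conflict                      # renormalize the conflict away
        m = {c: v / z for c, v in new.items()}
        m_theta = m_theta * (1.0 - s) / z
    return max(m, key=m.get)  # class with the largest pooled mass
```

The residual mass m_theta is what allows distance rejection: if all neighbours are far away, most of the mass stays on the frame rather than on any class.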
Bayesian Neural Networks for Classification: How Useful is the Evidence Framework?
, 1998
"... This paper presents an empirical assessment of the Bayesian evidence framework for neural networks using four synthetic and four realworld classification problems. We focus on three issues; model selection, automatic relevance determination (ARD) and the use of committees. Model selection using the ..."
Abstract

Cited by 19 (2 self)
This paper presents an empirical assessment of the Bayesian evidence framework for neural networks using four synthetic and four real-world classification problems. We focus on three issues: model selection, automatic relevance determination (ARD) and the use of committees. Model selection using the evidence criterion is only tenable if the number of training examples exceeds the number of network weights by a factor of five or ten. With this number of available examples, however, cross-validation is a viable alternative. The ARD feature selection scheme is only useful in networks with many hidden units and for data sets containing many irrelevant variables. ARD is also useful as a hard feature selection method. Results on applying the evidence framework to the real-world data sets showed that committees of Bayesian networks achieved classification accuracies similar to the best alternative methods. Importantly, this was achievable with a minimum of human intervention.
1 Introduction ...
Slice sampling covariance hyperparameters of latent Gaussian models
 IN ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 23
, 2010
"... The Gaussian process (GP) is a popular way to specify dependencies between random variables in a probabilistic model. In the Bayesian framework the covariance structure can be specified using unknown hyperparameters. Integrating over these hyperparameters considers different possible explanations fo ..."
Abstract

Cited by 13 (2 self)
The Gaussian process (GP) is a popular way to specify dependencies between random variables in a probabilistic model. In the Bayesian framework the covariance structure can be specified using unknown hyperparameters. Integrating over these hyperparameters considers different possible explanations for the data when making predictions. This integration is often performed using Markov chain Monte Carlo (MCMC) sampling. However, with non-Gaussian observations standard hyperparameter sampling approaches require careful tuning and may converge slowly. In this paper we present a slice sampling approach that requires little tuning while mixing well in both strong- and weak-data regimes.
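For readers unfamiliar with slice sampling itself, here is a minimal univariate slice sampler (a generic textbook-style sketch; the paper's contribution is a specific reparameterization for covariance hyperparameters of latent Gaussian models, which this does not reproduce). Each iteration draws an auxiliary height under the density, then samples uniformly from the horizontal slice at that height by stepping out and shrinking.

```python
import math
import random

def slice_sample(logp, x0, n=2000, w=1.0, seed=0):
    """Univariate slice sampling with the stepping-out procedure."""
    rng = random.Random(seed)
    x, samples = x0, []
    for _ in range(n):
        log_y = logp(x) + math.log(rng.random())  # height defining the slice
        lo = x - w * rng.random()                 # randomly placed initial
        hi = lo + w                               # interval of width w
        while logp(lo) > log_y:                   # step out to the left
            lo -= w
        while logp(hi) > log_y:                   # step out to the right
            hi += w
        while True:                               # shrink until a point is accepted
            x1 = rng.uniform(lo, hi)
            if logp(x1) > log_y:
                x = x1
                break
            if x1 < x:
                lo = x1
            else:
                hi = x1
        samples.append(x)
    return samples
```

The "little tuning" claim in the abstract refers to this property: the only free parameter is the initial interval width w, and the stepping-out/shrinking loops adapt the slice to the local scale of the density automatically.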
An Empirical Evaluation of Bayesian Sampling with Hybrid Monte Carlo for Training Neural Network Classifiers
 Neural Networks
, 1998
"... This article gives a concise overview of Bayesian sampling for neural networks, and then presents an extensive evaluation on a set of various benchmark classification problems. The main objective is to study the sensitivity of this scheme to changes in the prior distribution of the parameters and hy ..."
Abstract

Cited by 12 (4 self)
This article gives a concise overview of Bayesian sampling for neural networks, and then presents an extensive evaluation on a set of various benchmark classification problems. The main objective is to study the sensitivity of this scheme to changes in the prior distribution of the parameters and hyperparameters, and to evaluate the efficiency of the so-called automatic relevance determination (ARD) method. The paper concludes with a comparison of the achieved classification results with those obtained with (i) the evidence scheme and (ii) non-Bayesian methods.
Keywords: Bayesian statistics, prior and posterior distribution, parameters and hyperparameters, Gibbs sampling, hybrid Monte Carlo, automatic relevance determination (ARD), evidence approximation, classification problems, benchmarking.
1 Theory: Sampling of network weights and hyperparameters from the posterior distribution
The objective of this section is to give a concise yet self-contained overview of the Bayesian app...
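The hybrid (Hamiltonian) Monte Carlo scheme this paper evaluates can be sketched for a one-dimensional target (the step size, trajectory length and target below are illustrative assumptions, not the paper's neural-network setup): simulate Hamiltonian dynamics with the leapfrog integrator, then accept or reject with a Metropolis step on the total energy.

```python
import math
import random

def hmc(grad_neg_logp, neg_logp, x0, n=2000, eps=0.1, steps=20, seed=0):
    """Hybrid Monte Carlo with leapfrog integration for a 1-D target."""
    rng = random.Random(seed)
    x, samples = x0, []
    for _ in range(n):
        p = rng.gauss(0.0, 1.0)                      # resample the momentum
        x_new, p_new = x, p
        p_new -= 0.5 * eps * grad_neg_logp(x_new)    # half step for momentum
        for i in range(steps):
            x_new += eps * p_new                     # full step for position
            if i < steps - 1:
                p_new -= eps * grad_neg_logp(x_new)  # full step for momentum
        p_new -= 0.5 * eps * grad_neg_logp(x_new)    # final half step
        # Metropolis accept/reject on the change in total energy
        h_old = neg_logp(x) + 0.5 * p * p
        h_new = neg_logp(x_new) + 0.5 * p_new * p_new
        if math.log(rng.random()) < h_old - h_new:
            x = x_new
        samples.append(x)
    return samples
```

The gradient information is what lets HMC make long, rarely-rejected moves through weight space; in the Bayesian neural network setting the same loop would run over the full weight vector, with Gibbs updates for the hyperparameters.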