Results 1 - 10 of 31
Bagging Predictors - Machine Learning, 1996
Cited by 1998 (1 self)
Bagging predictors is a method for generating multiple versions of a predictor and using these to get an aggregated predictor. The aggregation averages over the versions when predicting a numerical outcome and does a plurality vote when predicting a class. The multiple versions are formed by making bootstrap replicates of the learning set and using these as new learning sets. Tests on real and simulated data sets using classification and regression trees and subset selection in linear regression show that bagging can give substantial gains in accuracy. The vital element is the instability of the prediction method. If perturbing the learning set can cause significant changes in the predictor constructed, then bagging can improve accuracy.
1. Introduction
A learning set $\mathcal{L}$ consists of data $\{(y_n, x_n),\ n = 1, \ldots, N\}$, where the $y$'s are either class labels or a numerical response. We have a procedure for using this learning set to form a predictor $\varphi(x, \mathcal{L})$: if the input is $x$ we ...
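The procedure described in this abstract (bootstrap replicates of the learning set, then averaging or plurality voting) can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the base learner `fit_stump` is a hypothetical one-threshold rule on a 1-D input, standing in for the classification and regression trees used in the paper, and the replicate count of 25 and the `(x, y)` tuple layout are assumptions made for the example.

```python
import random
from collections import Counter


def bootstrap_replicate(learning_set, rng):
    """Draw N cases with replacement from a learning set of size N."""
    n = len(learning_set)
    return [learning_set[rng.randrange(n)] for _ in range(n)]


def fit_stump(learning_set):
    """Hypothetical base learner: the single-threshold rule on a 1-D input
    that makes the fewest errors on the given (x, y) pairs."""
    xs = sorted({x for x, _ in learning_set})
    labels = sorted({y for _, y in learning_set})
    best = None
    for t in xs:
        for left in labels:
            for right in labels:
                errors = sum((left if x <= t else right) != y
                             for x, y in learning_set)
                if best is None or errors < best[0]:
                    best = (errors, t, left, right)
    _, t, left, right = best
    return lambda x: left if x <= t else right


def bag(learning_set, fit, n_replicates=25, seed=0):
    """Fit one predictor per bootstrap replicate of the learning set."""
    rng = random.Random(seed)
    return [fit(bootstrap_replicate(learning_set, rng))
            for _ in range(n_replicates)]


def vote(predictors, x):
    """Aggregate class predictions by plurality vote."""
    return Counter(p(x) for p in predictors).most_common(1)[0][0]


def average(predictors, x):
    """Aggregate numerical predictions by averaging."""
    return sum(p(x) for p in predictors) / len(predictors)
```

Calling `bag(learning_set, fit_stump)` returns a list of fitted predictors; `vote` is the aggregation for class labels and `average` the one for numerical responses.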
Feature Selection via Mathematical Programming, 1997
Cited by 51 (22 self)
The problem of discriminating between two finite point sets in n-dimensional feature space by a separating plane that utilizes as few of the features as possible is formulated as a mathematical program with a parametric objective function and linear constraints. The step function that appears in the objective function can be approximated by a sigmoid or by a concave exponential on the nonnegative real line, or it can be treated exactly by considering the equivalent linear program with equilibrium constraints (LPEC). Computational tests of these three approaches on publicly available real-world databases have been carried out and compared with an adaptation of the optimal brain damage (OBD) method for reducing neural network complexity. One feature selection algorithm via concave minimization (FSV) reduced cross-validation error on a cancer prognosis database by 35.4% while reducing problem features from 32 to 4. Feature selection is an important problem in machine learning [18, 15, 1...
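To make the step-function approximation concrete, the sketch below compares an exact count of nonzero weights with a smooth concave surrogate. The form `1 - exp(-alpha * v)` and the value `alpha = 5.0` are assumptions chosen for illustration; the abstract says only that a concave exponential on the nonnegative real line is used, so treat the function names and parameters here as hypothetical.

```python
import math


def step(v):
    """Step function on the nonnegative reals: 1 if v > 0, else 0."""
    return 1.0 if v > 0 else 0.0


def concave_exponential(v, alpha=5.0):
    """Assumed smooth surrogate for the step function, valid for v >= 0.
    It is concave, equals 0 at v = 0, and approaches 1 as alpha * v grows."""
    if v < 0:
        raise ValueError("defined on the nonnegative real line only")
    return 1.0 - math.exp(-alpha * v)


def feature_count(weights):
    """Exact number of features used by a separating plane with these weights."""
    return sum(step(abs(w)) for w in weights)


def smooth_feature_count(weights, alpha=5.0):
    """Differentiable approximation of feature_count."""
    return sum(concave_exponential(abs(w), alpha) for w in weights)
```

Because the surrogate never exceeds the step function, `smooth_feature_count` is a lower bound on `feature_count`, and the gap shrinks as `alpha` increases.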
SLAVE: A genetic learning system based on an iterative approach - IEEE Transactions on Fuzzy Systems, 1999
Cited by 29 (0 self)
SLAVE (Structural Learning Algorithm in Vague Environment) is an inductive learning algorithm that uses concepts based on fuzzy logic theory. This theory has been shown to be a useful representation tool for improving the understanding, from a human point of view, of the knowledge obtained. Furthermore, SLAVE uses an iterative approach for learning with genetic algorithms. This method is an alternative to the classical Pittsburgh and Michigan approaches. In this work, we propose some modifications of the original SLAVE learning algorithm, including new genetic operators for reducing the time needed for learning and improving the understanding of the rules obtained. Furthermore, we propose a new way of penalizing the rules in the iterative approach that makes it possible to improve the behaviour of the system. Keywords: machine learning, fuzzy logic, genetic algorithms. 1 Introduction Inductive learning tries to extract a knowledge base that makes it possible to describe the behaviour of a system fro...
The Separability Of Split Value Criterion - In Proceedings of the 5th Conference on Neural Networks and Their Applications, 2000
Cited by 24 (12 self)
The Separability of Split Value (SSV) criterion is a simple and efficient tool for building classification trees and extracting logical rules. It deals with both continuous and discrete features describing data vectors and requires no user interaction in the learning process. Extensions of methods based on this criterion are presented. They aim to improve the reliability and efficiency of the methods and to extend their area of application. Good results for several benchmark datasets were obtained.
A visualization technique for Self-Organizing Maps with vector fields to obtain the cluster structure at desired levels of detail
Cited by 19 (11 self)
Self-Organizing Maps (SOMs) are a prominent tool for exploratory data analysis. One core task within the utilization of SOMs is the identification of the cluster structure on the map for which several visualization methods have been proposed, yet different application domains may require additional representation of the cluster structure. In this paper, we propose such a method based on pairwise distance calculation. It can be plotted on top of the map lattice with arrows that point to the closest cluster center. A parameter is provided that determines the granularity of the clustering. We provide experimental results and discuss the general applicability of our method, along with a comparison to related techniques.
Bayesian Neural Networks for Classification: How Useful is the Evidence Framework?, 1998
Cited by 14 (2 self)
This paper presents an empirical assessment of the Bayesian evidence framework for neural networks using four synthetic and four real-world classification problems. We focus on three issues: model selection, automatic relevance determination (ARD) and the use of committees. Model selection using the evidence criterion is only tenable if the number of training examples exceeds the number of network weights by a factor of five or ten. With this number of available examples, however, cross-validation is a viable alternative. The ARD feature selection scheme is only useful in networks with many hidden units and for data sets containing many irrelevant variables. ARD is also useful as a hard feature selection method. Results on applying the evidence framework to the real-world data sets showed that committees of Bayesian networks achieved classification accuracies similar to the best alternative methods. Importantly, this was achievable with a minimum of human intervention. 1 Introduction ...
A k-nearest neighbor classification rule based on Dempster-Shafer theory - IEEE Trans. on Systems, Man and Cybernetics, 1995
Cited by 11 (5 self)
In this paper, the problem of classifying an unseen pattern on the basis of its nearest neighbors in a recorded data set is addressed from the point of view of Dempster-Shafer theory. Each neighbor of a sample to be classified is considered as an item of evidence that supports certain hypotheses regarding the class membership of that pattern. The degree of support is defined as a function of the distance between the two vectors. The evidence of the k nearest neighbors is then pooled by means of Dempster’s rule of combination. This approach provides a global treatment of such issues as ambiguity and distance rejection, and imperfect knowledge regarding the class membership of training patterns. The effectiveness of this classification scheme as compared to the voting and distance-weighted k-NN procedures is demonstrated using several sets of simulated and real-world data.
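The pooling step described in this abstract can be illustrated with a small sketch of Dempster's rule of combination applied to k nearest neighbors. The support function used here, `alpha0 * exp(-gamma * d**2)`, and the values `alpha0 = 0.95` and `gamma = 1.0` are assumptions for the example; the abstract states only that support is a function of the distance between the two vectors. All function names are hypothetical.

```python
import math


def dempster_combine(m1, m2):
    """Dempster's rule: multiply masses of intersecting focal sets,
    discard the conflicting mass, and renormalize."""
    combined, conflict = {}, 0.0
    for a, wa in m1.items():
        for b, wb in m2.items():
            inter = a & b
            if inter:
                combined[inter] = combined.get(inter, 0.0) + wa * wb
            else:
                conflict += wa * wb
    if conflict >= 1.0:
        raise ValueError("total conflict: the sources cannot be combined")
    return {s: w / (1.0 - conflict) for s, w in combined.items()}


def neighbor_mass(label, distance, frame, alpha0=0.95, gamma=1.0):
    """Evidence from one neighbor: some mass on its class, the rest on the
    whole frame (ignorance). The decay with distance is an assumed form."""
    support = alpha0 * math.exp(-gamma * distance ** 2)
    mass = {frozenset([label]): support}
    mass[frame] = mass.get(frame, 0.0) + (1.0 - support)
    return mass


def classify(x, training, k=3):
    """Pool the evidence of the k nearest neighbors and return the class
    with the largest singleton mass, together with the pooled masses."""
    frame = frozenset(label for _, label in training)
    nearest = sorted(training, key=lambda item: math.dist(x, item[0]))[:k]
    pooled = {frame: 1.0}  # start from total ignorance
    for point, label in nearest:
        evidence = neighbor_mass(label, math.dist(x, point), frame)
        pooled = dempster_combine(pooled, evidence)
    singletons = {next(iter(s)): w for s, w in pooled.items() if len(s) == 1}
    return max(singletons, key=singletons.get), pooled
```

The mass left on the whole frame after pooling is a measure of remaining ignorance, which is what makes distance rejection possible: a pattern far from every training point keeps most of its mass on the frame.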
An Empirical Evaluation of Bayesian Sampling with Hybrid Monte Carlo for Training Neural Network Classifiers - Neural Networks, 1998
Cited by 10 (4 self)
This article gives a concise overview of Bayesian sampling for neural networks, and then presents an extensive evaluation on a set of various benchmark classification problems. The main objective is to study the sensitivity of this scheme to changes in the prior distribution of the parameters and hyperparameters, and to evaluate the efficiency of the so-called automatic relevance determination (ARD) method. The paper concludes with a comparison of the achieved classification results with those obtained with (i) the evidence scheme and (ii) non-Bayesian methods. Keywords: Bayesian statistics, prior and posterior distribution, parameters and hyperparameters, Gibbs sampling, hybrid Monte Carlo, automatic relevance determination (ARD), evidence approximation, classification problems, benchmarking. 1 Theory: Sampling of network weights and hyperparameters from the posterior distribution The objective of this section is to give a concise yet self-contained overview of the Bayesian app...
Evolving Heterogeneous Neural Agents by Local Selection, 2000
Cited by 10 (5 self)
Evolutionary algorithms have been applied to the synthesis of neural architectures...
Boosting Interval Based Literals, 2001
Cited by 5 (4 self)
A supervised classification method for time series, even multivariate, is presented. It is based on boosting very simple classifiers: clauses with one literal in the body. The background predicates are based on temporal intervals. Two types of predicates are used: i) relative predicates, such as "increases" and "stays", and ii) region predicates, such as "always" and "sometime", which operate over regions in the domain of the variable. Experiments on different data sets, several of them obtained from the UCI ML and KDD repositories, show that the proposed method is highly competitive with previous approaches.

