Results 1–10 of 19
Arcing Classifiers
, 1998
Abstract

Cited by 337 (6 self)
Recent work has shown that combining multiple versions of unstable classifiers such as trees or neural nets results in reduced test set error. One of the more effective methods is bagging (Breiman [1996a]). Here, modified training sets are formed by resampling from the original training set; classifiers are constructed using these training sets and then combined by voting. Freund and Schapire [1995, 1996] propose an algorithm the basis of which is to adaptively resample and combine (hence the acronym arcing), so that the weights in the resampling are increased for those cases most often misclassified, and the combining is done by weighted voting. Arcing is more successful than bagging in test set error reduction. We explore two arcing algorithms, compare them to each other and to bagging, and try to understand how arcing works. We introduce the definitions of bias and variance for a classifier as components of the test set error. Unstable classifiers can have low bias on a large range of data sets ...
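The bagging procedure this abstract describes (resample the training set with replacement, build one classifier per resample, combine by voting) can be sketched in a few lines. This is a minimal illustration with an arbitrary base learner; the function names are ours, not Breiman's:

```python
import random
from collections import Counter

def bagging_predict(train, build_classifier, x, n_models=25, seed=0):
    """Bagging: form bootstrap resamples of the training set, build one
    classifier per resample, and combine their predictions by majority vote."""
    rng = random.Random(seed)
    votes = []
    for _ in range(n_models):
        # bootstrap resample: draw len(train) cases with replacement
        resample = [train[rng.randrange(len(train))] for _ in train]
        clf = build_classifier(resample)  # any function: sample -> predictor
        votes.append(clf(x))
    # majority vote over the ensemble
    return Counter(votes).most_common(1)[0][0]
```

`build_classifier` stands in for the unstable base learner (a tree, a neural net); bagging only assumes it maps a training sample to a predictor.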
Logistic Model Trees
, 2006
Abstract

Cited by 154 (2 self)
Tree induction methods and linear models are popular techniques for supervised learning tasks, both for the prediction of nominal classes and numeric values. For predicting numeric quantities, there has been work on combining these two schemes into ‘model trees’, i.e. trees that contain linear regression functions at the leaves. In this paper, we present an algorithm that adapts this idea for classification problems, using logistic regression instead of linear regression. We use a stagewise fitting process to construct the logistic regression models that can select relevant attributes in the data in a natural way, and show how this approach can be used to build the logistic regression models at the leaves by incrementally refining those constructed at higher levels in the tree. We compare the performance of our algorithm to several other state-of-the-art learning schemes on 36 benchmark UCI datasets, and show that it produces accurate and compact classifiers.
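The stagewise fitting idea, in which each stage improves only the single most useful coefficient of a logistic model so that relevant attributes are selected naturally, can be illustrated with a toy coordinate-wise gradient scheme. This is a hedged sketch of the general principle, not the paper's LMT algorithm; the names, step size, and update rule are our assumptions:

```python
import math

def stagewise_logistic(X, y, n_stages=50, lr=0.5):
    """Toy stagewise logistic fit: at each stage, update only the coefficient
    with the largest log-loss gradient magnitude (plus the intercept).
    Attributes that never help are simply never selected."""
    d, n = len(X[0]), len(X)
    w, b = [0.0] * d, 0.0
    for _ in range(n_stages):
        # current predicted probabilities under the logistic model
        p = [1.0 / (1.0 + math.exp(-(sum(w[j] * x[j] for j in range(d)) + b)))
             for x in X]
        # gradient of the log-loss with respect to each coefficient
        grads = [sum((p[i] - y[i]) * X[i][j] for i in range(n)) for j in range(d)]
        j_best = max(range(d), key=lambda j: abs(grads[j]))
        w[j_best] -= lr * grads[j_best] / n
        b -= lr * sum(p[i] - y[i] for i in range(n)) / n
    return w, b
```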
Kernel matching pursuit
 Machine Learning
, 2002
Abstract

Cited by 86 (0 self)
Matching Pursuit algorithms learn a function that is a weighted sum of basis functions, by sequentially appending functions to an initially empty basis, to approximate a target function in the least-squares sense. We show how matching pursuit can be extended to use non-squared error loss functions, and how it can be used to build kernel-based solutions to machine-learning problems, while keeping control of the sparsity of the solution. We also derive MDL-motivated generalization bounds for this type of algorithm, and compare them to related SVM (Support Vector Machine) bounds. Finally, links to boosting algorithms and RBF training procedures, as well as an extensive experimental comparison with SVMs for classification, are given, showing comparable results with typically sparser models.
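Plain squared-error matching pursuit, the starting point the abstract generalizes from, can be sketched as a greedy loop over a finite dictionary: at each step, append the basis function giving the largest reduction in squared error on the current residual. The names are illustrative; the paper's kernel and non-squared-loss extensions would change the selection criterion:

```python
def matching_pursuit(target, dictionary, n_terms=3):
    """Greedy least-squares matching pursuit over a finite dictionary of
    vectors. Returns the selected (index, coefficient) pairs and the
    final residual."""
    residual = list(target)
    basis = []
    for _ in range(n_terms):
        best = None
        for k, g in enumerate(dictionary):
            gg = sum(v * v for v in g)
            if gg == 0.0:
                continue
            coef = sum(r * v for r, v in zip(residual, g)) / gg
            gain = coef * coef * gg  # reduction in squared error
            if best is None or gain > best[0]:
                best = (gain, k, coef)
        _, k, coef = best
        basis.append((k, coef))
        residual = [r - coef * v for r, v in zip(residual, dictionary[k])]
    return basis, residual
```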
Arcing the edge
, 1997
Abstract

Cited by 64 (0 self)
Recent work has shown that adaptively reweighting the training set, growing a classifier using the new weights, and combining the classifiers constructed to date can significantly decrease generalization error. Procedures of this type were called arcing by Breiman [1996]. The first successful arcing procedure was introduced by Freund and Schapire [1995, 1996] and called AdaBoost. In an effort to explain why AdaBoost works, Schapire et al. [1997] derived a bound on the generalization error of a convex combination of classifiers in terms of the margin. We introduce a function called the edge, which differs from the margin only if there are more than two classes. A framework for understanding arcing algorithms is defined. In this framework, we see that the arcing algorithms currently in the literature are optimization algorithms which minimize some function of the edge. A relation is derived between the optimal reduction in the maximum value of the edge and the PAC concept of a weak learner. Two algorithms are described which achieve the optimal reduction. Tests on both synthetic and real data cast doubt on the Schapire et al. explanation.
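The adaptive reweighting at the core of AdaBoost-style arcing can be illustrated by a single reweighting step: correctly classified cases are multiplied by a factor below one, so misclassified cases gain relative weight after renormalization. This is a textbook sketch of the update, not code from the paper:

```python
def adaboost_reweight(weights, misclassified, error):
    """One AdaBoost-style reweighting step.
    weights       -- current case weights (sum to 1)
    misclassified -- per-case booleans from the current classifier
    error         -- the classifier's weighted training error (0 < error < 0.5)
    """
    beta = error / (1.0 - error)  # < 1 when the learner beats chance
    # down-weight correct cases; leave misclassified cases as-is
    new = [w if miss else w * beta for w, miss in zip(weights, misclassified)]
    total = sum(new)
    return [w / total for w in new]  # renormalize to sum to 1
```

After the step, the weighted error of the just-built classifier on the new weights is exactly 1/2, which is what forces the next classifier to focus on different cases.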
Randomizing Outputs To Increase Prediction Accuracy
, 2000
Abstract

Cited by 46 (0 self)
Introduction. In recent research on combining predictors, it has been recognized that the key to success in combining low-bias predictors such as trees and neural nets lies in methods that reduce the variability in the predictor due to training set variability. Assume that the training set consists of N independent draws from the same underlying distribution. Conceptually, training sets of size N can be drawn repeatedly and the same algorithm used to construct a predictor on each training set. These predictors will vary, and the extent of the variability is a dominant factor in the generalization prediction error. Given a training set {(y_n, x_n), n = 1, ..., N}, where the y's are either class labels or numerical values, the most common way of reducing variability is by perturbing the training set to produce alternative training sets, growing a predictor on ...
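The variance-reduction recipe sketched in this introduction (perturb the training set, refit, combine) looks roughly like the following for regression, here with the outputs perturbed by Gaussian noise as one illustrative choice; the names and the noise model are our assumptions, not the paper's exact procedure:

```python
import random

def randomized_output_ensemble(train, fit, x, n_models=20, sigma=1.0, seed=0):
    """Fit one predictor per noise-perturbed copy of the training outputs,
    then average the predictions. Averaging over independently perturbed
    fits reduces the variance component of the prediction error."""
    rng = random.Random(seed)
    preds = []
    for _ in range(n_models):
        # perturb each target y_n with independent Gaussian noise
        perturbed = [(xi, yi + rng.gauss(0.0, sigma)) for xi, yi in train]
        preds.append(fit(perturbed)(x))  # fit: sample -> predictor
    return sum(preds) / n_models
```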
Artificial neural network ensembles and their application in pooled flood frequency analysis
 Water Resources Research
Abstract

Cited by 15 (0 self)
[1] Recent theoretical and empirical studies show that the generalization ability of artificial neural networks can be improved by combining several artificial neural networks in redundant ensembles. In this paper, a review is given of popular ensemble methods. Six approaches for creating artificial neural network ensembles are applied in pooled flood frequency analysis for estimating the index flood and the 10-year flood quantile. The results show that artificial neural network ensembles generate improved flood estimates and are less sensitive to the choice of initial parameters when compared with a single artificial neural network. Factors that may affect the generalization of an artificial neural network ensemble are analyzed. In terms of the methods for creating ensemble members, the model diversity introduced by varying the initial conditions of the base artificial neural networks to reduce the prediction error is comparable with more sophisticated methods, such as bagging and boosting. When the same method for creating ensemble members is used, combining member networks using stacking is generally better than using simple averaging. An ensemble size of at least 10 artificial neural networks is suggested to achieve sufficient generalization ability. In comparison with parametric regression methods, properly designed artificial neural network ensembles can significantly reduce ...
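The two combination rules compared in this abstract, simple averaging and a weighted (stacked) combination of member outputs, differ only in where the weights come from. A minimal sketch of the combination step; in real stacking the weights would be learned on held-out data, whereas here they are supplied directly for illustration:

```python
def combine_members(member_preds, weights=None):
    """Combine ensemble member outputs: equal-weight averaging when no
    weights are given, or a weighted (stacking-style) combination when a
    learned weight vector is supplied."""
    n = len(member_preds)
    if weights is None:
        weights = [1.0 / n] * n  # simple averaging
    return sum(w * p for w, p in zip(weights, member_preds))
```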
New Ensemble Machine Learning Method for Classification and Prediction on Gene Expression Data
 Proceedings of the international conference of the IEEE Engineering in Medicine and Biology Society 2 (2006) 3478–3481
Abstract

Cited by 5 (2 self)
A reliable and precise classification of tumours is essential for successful treatment of cancer. Recent research has confirmed the utility of ensemble machine learning algorithms for gene expression data analysis. In this paper, a new ensemble machine learning algorithm is proposed for classification and prediction on gene expression data. The algorithm is tested and compared with three popularly adopted ensembles, i.e. bagging, boosting and arcing. The results show that the proposed algorithm greatly outperforms existing methods, achieving high accuracy over 12 gene expression datasets. Index Terms – ensemble machine learning, pattern recognition, microarray
Learning collaboration strategies for committees of learning agents
 Journal of Autonomous Agents and Multi-Agent Systems
Abstract

Cited by 4 (2 self)
A main issue in cooperation in multiagent systems is how an agent decides in which situations it is better to cooperate with other agents, and with which agents to cooperate. Specifically, in this paper we focus on the following problem: given a multiagent system composed of learning agents, where one of the agents has the goal of predicting the correct solution to a given problem, the agent has to decide whether to solve the problem individually or to ask other agents for collaboration. We will see that learning agents can collaborate by forming committees in order to improve performance. Moreover, in this paper we present a proactive learning approach that allows the agents to learn when to convene a committee and which agents to invite to join it. Our experiments show that learning results in smaller committees while maintaining (and sometimes improving) the problem-solving accuracy of committees composed of all agents.
THE JACKKNIFE IN CLASSIFICATION
Abstract
Breiman [1996], in an important contribution to the field of classification, introduced the notion of resampling for improving classification rules. Freund and Schapire [1996] developed another algorithm that exploits the resampling. This paper presents a Jackknife-type approach combined with the Freund and Schapire (FS) ...
unknown title
Abstract
A multigene approach to differentiate papillary thyroid carcinoma from benign lesions: gene selection using support vector machines with bootstrapping