Results 1  10
of
26
Soft Margins for AdaBoost
, 1998
"... Recently ensemble methods like AdaBoost were successfully applied to character recognition tasks, seemingly defying the problems of overfitting. This paper shows that although AdaBoost rarely overfits in the low noise regime it clearly does so for higher noise levels. Central for understanding this ..."
Abstract

Cited by 328 (22 self)
 Add to MetaCart
Recently ensemble methods like AdaBoost were successfully applied to character recognition tasks, seemingly defying the problems of overfitting. This paper shows that although AdaBoost rarely overfits in the low noise regime it clearly does so for higher noise levels. Central for understanding this fact is the margin distribution and we find that AdaBoost achieves  doing gradient descent in an error function with respect to the margin  asymptotically a hard margin distribution, i.e. the algorithm concentrates its resources on a few hardtolearn patterns (here an interesting overlap emerge to Support Vectors). This is clearly a suboptimal strategy in the noisy case, and regularization, i.e. a mistrust in the data, must be introduced in the algorithm to alleviate the distortions that a difficult pattern (e.g. outliers) can cause to the margin distribution. We propose several regularization methods and generalizations of the original AdaBoost algorithm to achieve a soft margin  a ...
An introduction to boosting and leveraging
 Advanced Lectures on Machine Learning, LNCS
, 2003
"... ..."
(Show Context)
Efficient Margin Maximizing with Boosting
, 2002
"... AdaBoost produces a linear combination of base hypotheses and predicts with the sign of this linear combination. It has been observed that the generalization error of the algorithm continues to improve even after all examples are classified correctly by the current signed linear combination, whic ..."
Abstract

Cited by 51 (7 self)
 Add to MetaCart
AdaBoost produces a linear combination of base hypotheses and predicts with the sign of this linear combination. It has been observed that the generalization error of the algorithm continues to improve even after all examples are classified correctly by the current signed linear combination, which can be viewed as hyperplane in feature space where the base hypotheses form the features.
Maximizing the Margin with Boosting
, 2002
"... AdaBoost produces a linear combination of weak hypotheses. It has been observed that the generalization error of the algorithm continues to improve even after all examples are classified correctly by the current linear combination, i.e. by a hyperplane in feature space spanned by the weak hypotheses ..."
Abstract

Cited by 24 (6 self)
 Add to MetaCart
(Show Context)
AdaBoost produces a linear combination of weak hypotheses. It has been observed that the generalization error of the algorithm continues to improve even after all examples are classified correctly by the current linear combination, i.e. by a hyperplane in feature space spanned by the weak hypotheses. The improvement is attributed to the experimental observation that the distances (margins) of the examples to the separating hyperplane are increasing even when the training error is already zero, that is all examples are on the correct side of the hyperplane. We give an iterative version of AdaBoost that explicitly maximizes the minimum margin of the examples. We bound the number of iterations and the number of hypotheses used in the final linear combination which approximates the maximum margin hyperplane with a certain precision. Our modified algorithm essentially retains the exponential convergence properties of AdaBoost and our result does not depend on the size of the hypothesis class. 1
Barrier Boosting
"... Boosting algorithms like AdaBoost and ArcGV are iterative strategies to minimize a constrained objective function, equivalent to Barrier algorithms. ..."
Abstract

Cited by 22 (7 self)
 Add to MetaCart
Boosting algorithms like AdaBoost and ArcGV are iterative strategies to minimize a constrained objective function, equivalent to Barrier algorithms.
Learning first order logic time series classifiers
 PROCEEDINGS OF THE 4TH EUROPEAN CONFERENCE ON PRINCIPLES OF DATA MINING AND KNOWLEDGE DISCOVERY (PKDD’00
, 2000
"... ..."
An improvement of AdaBoost to avoid overfitting
 Proc. of the Int. Conf. on Neural Information Processing
, 1998
"... Recent work has shown that combining multiple versions of weak classifiers such as decision trees or neural networks results in reduced test set error. To study this in greater detail, we analyze the asymptotic behavior of AdaBoost. The theoretical analysis establishes the relation between the distr ..."
Abstract

Cited by 16 (0 self)
 Add to MetaCart
Recent work has shown that combining multiple versions of weak classifiers such as decision trees or neural networks results in reduced test set error. To study this in greater detail, we analyze the asymptotic behavior of AdaBoost. The theoretical analysis establishes the relation between the distribution of margins of the training examples and the generated voting classification rule. The paper shows asymptotic experimental results with RBF networks for the binary classification case underlining the theoretical findings. Our experiments show that AdaBoost does overfit, indeed. In order to avoid this and to get better generalization performance, we propose a regularized improved version of AdaBoost, which is called AdaBoostreg . We show the usefulness of this improvement in numerical simulations. KEYWORDS: ensemble learning, AdaBoost, margin distribution, generalization, support vectors, RBF networks 1. Introduction An ensemble is a collection of neural networks or other types of c...
Regularizing AdaBoost
, 1999
"... Boosting methods maximize a hard classification margin and are known as powerful techniques that do not exhibit overfitting for low noise cases. Also for noisy data boosting will try to enforce a hard margin and thereby give too much weight to outliers, which then leads to the dilemma of nonsmooth ..."
Abstract

Cited by 15 (2 self)
 Add to MetaCart
Boosting methods maximize a hard classification margin and are known as powerful techniques that do not exhibit overfitting for low noise cases. Also for noisy data boosting will try to enforce a hard margin and thereby give too much weight to outliers, which then leads to the dilemma of nonsmooth fits and overfitting. Therefore we propose three algorithms to allow for soft margin classification by introducing regularization with slack variables into the boosting concept: (1) AdaBoost reg and regularized versions of (2) linear and (3) quadratic programming AdaBoost. Experiments show the usefulness of the proposed algorithms in comparison to another soft margin classifier: the support vector machine.
Exploiting the Essential Assumptions of Analogybased Effort Estimation
"... Background: There are too many design options for software effort estimators. How can we best explore them all? Aim: We seek aspects on general principles of effort estimation that can guide the design of effort estimators. Method: We identified the essential assumption of analogybased effort estim ..."
Abstract

Cited by 12 (5 self)
 Add to MetaCart
Background: There are too many design options for software effort estimators. How can we best explore them all? Aim: We seek aspects on general principles of effort estimation that can guide the design of effort estimators. Method: We identified the essential assumption of analogybased effort estimation: i.e. the immediate neighbors of a project offer stable conclusions about that project. We test that assumption by generating a binary tree of clusters of effort data and comparing the variance of supertrees vs smaller subtrees. Results: For ten data sets (from Coc81, Nasa93, Desharnais, Albrecht, ISBSG, and data from Turkish companies), we found: (a) the estimation variance of cluster subtrees is usually larger than that of cluster supertrees; (b) if analogy is restricted to the cluster trees with lower variance then effort estimates have a significantly lower error (measured using MRE and a Wilcoxon test, 95 % confidence, compared to nearestneighbor methods that use neighborhoods of a fixed size). Conclusion: Estimation by analogy can be significantly improved by a dynamic selection of nearest neighbors, using only the project data from regions with small variance.
The graph neural network model
 IEEE Transactions on Neural Networks
, 2009
"... The graph neural network model Many underlying relationships among data in several areas of science and engineering, e.g., computer vision, molecular chemistry, molecular biology, pattern recognition, and data mining, can be represented in terms of graphs. In this paper, we propose a new neural netw ..."
Abstract

Cited by 11 (5 self)
 Add to MetaCart
The graph neural network model Many underlying relationships among data in several areas of science and engineering, e.g., computer vision, molecular chemistry, molecular biology, pattern recognition, and data mining, can be represented in terms of graphs. In this paper, we propose a new neural network model, called graph neural network (GNN) model, that extends existing neural network methods for processing the data represented in graph domains. This GNN model, which can directly process most of the practically useful types of graphs, e.g., acyclic, cyclic, directed, and undirected, implements a function tau(G,n) isin IRm that maps a graph G and one of its nodes n into an mdimensional Euclidean space. A supervised learning algorithm is derived to estimate the parameters of the proposed GNN model. The computational cost of the proposed algorithm is also considered. Some experimental results are shown to validate the proposed learning algorithm, and to demonstrate its generalization capabilities.