Results 1 - 10 of 18
An introduction to kernel-based learning algorithms
IEEE Transactions on Neural Networks, 2001
"... This paper provides an introduction to support vector machines (SVMs), kernel Fisher discriminant analysis, and ..."
Abstract

Cited by 373 (48 self)
 Add to MetaCart
Abstract: This paper provides an introduction to support vector machines (SVMs), kernel Fisher discriminant analysis, and ...
Large scale multiple kernel learning
Journal of Machine Learning Research, 2006
"... While classical kernelbased learning algorithms are based on a single kernel, in practice it is often desirable to use multiple kernels. Lanckriet et al. (2004) considered conic combinations of kernel matrices for classification, leading to a convex quadratically constrained quadratic program. We s ..."
Abstract

Cited by 222 (18 self)
 Add to MetaCart
Abstract: While classical kernel-based learning algorithms are based on a single kernel, in practice it is often desirable to use multiple kernels. Lanckriet et al. (2004) considered conic combinations of kernel matrices for classification, leading to a convex quadratically constrained quadratic program. We show that it can be rewritten as a semi-infinite linear program that can be efficiently solved by recycling standard SVM implementations. Moreover, we generalize the formulation and our method to a larger class of problems, including regression and one-class classification. Experimental results show that the proposed algorithm works for hundreds of thousands of examples or hundreds of kernels to be combined, and helps with automatic model selection, improving the interpretability of the learning result. In a second part we discuss general speed-up mechanisms for SVMs, especially when used with sparse feature maps as they appear for string kernels, allowing us to train a string-kernel SVM on a 10 million real-world splice data set from computational biology. We integrated multiple kernel learning into our machine learning toolbox SHOGUN, for which the source code is publicly available at ...
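The wrapper idea in this abstract — reuse a standard SVM solver inside a loop that re-weights the kernels — can be sketched as follows. This is only an illustrative toy in Python with scikit-learn (assumed, not part of the paper), and the kernel-weight update is a simplified heuristic rather than the authors' semi-infinite linear program:

```python
import numpy as np
from sklearn.svm import SVC

def mkl_wrapper(kernels, y, C=1.0, n_iter=20):
    """Toy wrapper for multiple kernel learning: alternate between training a
    standard SVM on the weighted kernel sum and re-weighting the kernels.
    'kernels' is a list of precomputed (n x n) Gram matrices."""
    m = len(kernels)
    beta = np.full(m, 1.0 / m)                       # convex combination weights
    for _ in range(n_iter):
        K = sum(b * Km for b, Km in zip(beta, kernels))
        svm = SVC(C=C, kernel="precomputed").fit(K, y)
        sv = svm.support_                            # indices of support vectors
        coef = svm.dual_coef_.ravel()                # y_i * alpha_i on the SVs
        # heuristic importance of each kernel on the current solution
        score = np.array([coef @ Km[np.ix_(sv, sv)] @ coef for Km in kernels])
        beta = np.maximum(score, 1e-12)
        beta /= beta.sum()                           # project back onto the simplex
    K = sum(b * Km for b, Km in zip(beta, kernels))
    return beta, SVC(C=C, kernel="precomputed").fit(K, y)
```

A real implementation would instead accumulate constraints and solve the restricted linear program of the semi-infinite formulation; the ad-hoc renormalization above only conveys the "recycle the SVM solver" structure.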
AdaBoosting neural networks
Neural Computation, 1997
"... Convexity has recently received a lot of attention in the machine learning community, and the lack of convexity has been seen as a major disadvantage of many learning algorithms, such as multilayer artificial neural networks. We show that training multilayer neural networks in which the number of ..."
Abstract

Cited by 44 (5 self)
 Add to MetaCart
Abstract: Convexity has recently received a lot of attention in the machine learning community, and the lack of convexity has been seen as a major disadvantage of many learning algorithms, such as multi-layer artificial neural networks. We show that training multi-layer neural networks in which the number of hidden units is learned can be viewed as a convex optimization problem. This problem involves an infinite number of variables, but can be solved by incrementally inserting one hidden unit at a time, each time finding a linear classifier that minimizes a weighted sum of errors.
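A rough sketch of the incremental insertion scheme described above, assuming a tanh activation, a logistic-regression fit for each new unit, and ridge refitting of the output weights (all of these are illustrative choices, not the paper's exact procedure):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression, Ridge

def grow_hidden_units(X, y, n_units=10):
    """Sketch: grow a one-hidden-layer network one unit at a time (y in {-1,+1}).
    Each new unit is a linear classifier fit with example weights derived from
    the current residual; the output layer is then refit by ridge regression."""
    units, H = [], []
    f = np.zeros(len(y))                             # current network output
    out = None
    for _ in range(n_units):
        w = np.abs(y - f) + 1e-12                    # emphasize poorly fit examples
        unit = LogisticRegression().fit(X, (y > 0).astype(int), sample_weight=w)
        h = np.tanh(X @ unit.coef_.ravel() + unit.intercept_[0])
        units.append(unit)
        H.append(h)
        out = Ridge(alpha=1e-3).fit(np.column_stack(H), y)   # refit output weights
        f = out.predict(np.column_stack(H))
    return units, out
```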
Efficient Margin Maximizing with Boosting
2002
"... AdaBoost produces a linear combination of base hypotheses and predicts with the sign of this linear combination. It has been observed that the generalization error of the algorithm continues to improve even after all examples are classified correctly by the current signed linear combination, whic ..."
Abstract

Cited by 35 (7 self)
 Add to MetaCart
Abstract: AdaBoost produces a linear combination of base hypotheses and predicts with the sign of this linear combination. It has been observed that the generalization error of the algorithm continues to improve even after all examples are classified correctly by the current signed linear combination, which can be viewed as a hyperplane in feature space where the base hypotheses form the features.
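For reference, the sign-of-a-linear-combination predictor described here is the standard AdaBoost construction; a minimal sketch with decision stumps (scikit-learn assumed as the base learner, labels in {-1, +1}) might look like the following — it is the textbook algorithm, not the paper's margin-maximizing variant:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost(X, y, n_rounds=50):
    """Standard AdaBoost with decision stumps; y must be in {-1, +1}.
    Returns the base hypotheses h_t and their coefficients alpha_t."""
    n = len(y)
    D = np.full(n, 1.0 / n)                          # distribution over examples
    hyps, alphas = [], []
    for _ in range(n_rounds):
        stump = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=D)
        pred = stump.predict(X)
        eps = max(D[pred != y].sum(), 1e-12)         # weighted training error
        if eps >= 0.5:
            break
        alpha = 0.5 * np.log((1 - eps) / eps)
        hyps.append(stump)
        alphas.append(alpha)
        D *= np.exp(-alpha * y * pred)               # re-weight the examples
        D /= D.sum()
    return hyps, np.array(alphas)

def predict(hyps, alphas, X):
    """Predict with the sign of the linear combination of base hypotheses."""
    F = sum(a * h.predict(X) for a, h in zip(alphas, hyps))
    return np.sign(F)
```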
Partial Least Squares Regression for Graph Mining
"... Attributed graphs are increasingly more common in many application domains such as chemistry, biology and text processing. A central issue in graph mining is how to collect informative subgraph patterns for a given learning task. We propose an iterative mining method based on partial least squares r ..."
Abstract

Cited by 26 (6 self)
 Add to MetaCart
Abstract: Attributed graphs are increasingly common in many application domains such as chemistry, biology and text processing. A central issue in graph mining is how to collect informative subgraph patterns for a given learning task. We propose an iterative mining method based on partial least squares regression (PLS). To apply PLS to graph data, a sparse version of PLS is developed first and then combined with a weighted pattern-mining algorithm. The mining algorithm is called iteratively with different weight vectors, creating one latent component per mining call. Our method, graph PLS, is efficient and easy to implement, because the weight vector is updated with elementary matrix calculations. In experiments, our graph PLS algorithm showed competitive prediction accuracies on many chemical datasets, and its efficiency was significantly superior to graph boosting (gBoost) and a naive method based on frequent graph mining.
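The core numerical step — one latent component per iteration from a weight vector obtained by elementary matrix calculations — can be illustrated on a precomputed pattern-occurrence matrix. The sketch below substitutes simple hard thresholding for the weighted subgraph-mining call of the actual method, so it only conveys the PLS side of the algorithm:

```python
import numpy as np

def sparse_pls_components(X, y, n_components=5, n_active=20):
    """NIPALS-style PLS with hard-thresholded (sparse) weight vectors, applied
    to a precomputed pattern-occurrence matrix X (n_examples x n_patterns).
    Graph PLS instead finds the highest-weight patterns by calling a weighted
    subgraph-mining routine in each iteration."""
    X = X - X.mean(axis=0)
    r = y - y.mean()                                 # residual of the response
    components = []
    for _ in range(n_components):
        w = X.T @ r                                  # weight vector: one matrix product
        w[np.argsort(np.abs(w))[:-n_active]] = 0.0   # keep only the largest weights
        w /= np.linalg.norm(w) + 1e-12
        t = X @ w                                    # new latent component
        t /= np.linalg.norm(t) + 1e-12
        X = X - np.outer(t, t @ X)                   # deflate the pattern matrix
        r = r - t * (t @ r)                          # deflate the residual
        components.append(t)
    return np.column_stack(components)
```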
Barrier Boosting
"... Boosting algorithms like AdaBoost and ArcGV are iterative strategies to minimize a constrained objective function, equivalent to Barrier algorithms. ..."
Abstract

Cited by 19 (7 self)
 Add to MetaCart
Abstract: Boosting algorithms like AdaBoost and Arc-GV are iterative strategies to minimize a constrained objective function, equivalent to barrier algorithms.
Adapting Codes and Embeddings for Polychotomies
2003
"... In this paper we consider formulations of multiclass problems based on a generalized notion of a margin and using output coding. This includes, but is not restricted to, standard multiclass SVM formulations. Differently from many previous approaches we learn the code as well as the embedding f ..."
Abstract

Cited by 17 (4 self)
 Add to MetaCart
Abstract: In this paper we consider formulations of multiclass problems based on a generalized notion of a margin and using output coding. This includes, but is not restricted to, standard multiclass SVM formulations. Unlike many previous approaches, we learn the code as well as the embedding function. We illustrate how this can lead to a formulation that allows for solving a wider range of problems with, e.g., many classes or even "missing classes". To keep our optimization problems tractable, we propose an algorithm capable of solving them using two-class classifiers, similar in spirit to boosting.
Maximizing the Margin with Boosting
2002
"... AdaBoost produces a linear combination of weak hypotheses. It has been observed that the generalization error of the algorithm continues to improve even after all examples are classified correctly by the current linear combination, i.e. by a hyperplane in feature space spanned by the weak hypotheses ..."
Abstract

Cited by 16 (4 self)
 Add to MetaCart
Abstract: AdaBoost produces a linear combination of weak hypotheses. It has been observed that the generalization error of the algorithm continues to improve even after all examples are classified correctly by the current linear combination, i.e. by a hyperplane in feature space spanned by the weak hypotheses. The improvement is attributed to the experimental observation that the distances (margins) of the examples to the separating hyperplane keep increasing even when the training error is already zero, that is, all examples are on the correct side of the hyperplane. We give an iterative version of AdaBoost that explicitly maximizes the minimum margin of the examples. We bound the number of iterations and the number of hypotheses used in the final linear combination, which approximates the maximum margin hyperplane with a certain precision. Our modified algorithm essentially retains the exponential convergence properties of AdaBoost, and our result does not depend on the size of the hypothesis class.
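The margin in question is the normalized quantity y_i f(x_i) / ||alpha||_1, where f is the weighted vote of the weak hypotheses; its minimum over the training examples is what keeps growing after the training error reaches zero. A small self-contained helper (the naming is mine, and the matrix of weak-hypothesis outputs is assumed precomputed):

```python
import numpy as np

def normalized_margins(H, alphas, y):
    """H: (n_examples x n_hypotheses) matrix of weak-hypothesis outputs in {-1,+1},
    alphas: their coefficients, y: labels in {-1,+1}.
    Returns y_i * f(x_i) / ||alpha||_1; the minimum entry is the margin that the
    modified AdaBoost described above explicitly maximizes."""
    F = H @ alphas
    return y * F / (np.abs(alphas).sum() + 1e-12)
```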
Neural network ensembles: Evaluation of aggregation algorithms
Artificial Intelligence, 2005
"... Ensembles of artificial neural networks show improved generalization capabilities that outperform those of single networks. However, for aggregation to be effective, the individual networks must be as accurate and diverse as possible. An important problem is, then, how to tune the aggregate members ..."
Abstract

Cited by 13 (2 self)
 Add to MetaCart
Abstract: Ensembles of artificial neural networks show improved generalization capabilities that outperform those of single networks. However, for aggregation to be effective, the individual networks must be as accurate and diverse as possible. An important problem is, then, how to tune the aggregate members in order to have an optimal compromise between these two conflicting conditions. We present here an extensive evaluation of several algorithms for ensemble construction, including new proposals, and compare them with standard methods in the literature. We also discuss a potential problem with sequential aggregation algorithms: the infrequent but damaging selection, through their heuristics, of particularly bad ensemble members. We introduce modified algorithms that cope with this problem by allowing individual weighting of aggregate members. Our algorithms and their weighted modifications are favorably tested against other methods in the literature, producing a sensible improvement in performance on most of the standard statistical databases used as benchmarks.
On the convergence of leveraging
In Advances in Neural Information Processing Systems (NIPS), 2002
"... We give an unified convergence analysis of ensemble learning methods including e.g. AdaBoost, Logistic Regression and the LeastSquareBoost algorithm for regression. These methods have in common that they iteratively call a base learning algorithm which returns hypotheses that are then linearly com ..."
Abstract

Cited by 10 (2 self)
 Add to MetaCart
Abstract: We give a unified convergence analysis of ensemble learning methods, including, e.g., AdaBoost, Logistic Regression and the Least-Square-Boost algorithm for regression. These methods have in common that they iteratively call a base learning algorithm which returns hypotheses that are then linearly combined. We show that these methods are related to the Gauss-Southwell method known from numerical optimization and state non-asymptotic convergence results for all of them. Our analysis includes ℓ1-norm regularized cost functions, leading to a clean and general way to regularize ensemble learning.
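The Gauss-Southwell connection can be made concrete with a toy coordinate-descent loop over a fixed, finite pool of base hypotheses under the exponential loss; both of those restrictions are illustrative choices here, not assumptions of the paper's analysis:

```python
import numpy as np

def leveraging_gauss_southwell(H, y, n_rounds=100):
    """Toy leveraging loop as Gauss-Southwell coordinate descent on the
    exponential loss. H is an (n_examples x n_hypotheses) matrix holding the
    outputs of a fixed pool of base hypotheses (values in {-1,+1})."""
    alphas = np.zeros(H.shape[1])
    steps = np.linspace(0.0, 2.0, 41)                # crude line-search grid
    for _ in range(n_rounds):
        F = H @ alphas
        grad = H.T @ (-y * np.exp(-y * F))           # gradient w.r.t. each coefficient
        j = int(np.argmax(np.abs(grad)))             # Gauss-Southwell coordinate choice
        direction = -np.sign(grad[j])                # move against the gradient
        losses = [np.exp(-y * (F + s * direction * H[:, j])).sum() for s in steps]
        alphas[j] += steps[int(np.argmin(losses))] * direction
    return alphas
```

In a true leveraging method the new hypothesis is returned by a base learner in each round rather than selected from a fixed pool; the grid-search step stands in for the exact or approximate line search the analysis considers.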