Large scale multiple kernel learning
 JOURNAL OF MACHINE LEARNING RESEARCH
, 2006
Cited by 222 (18 self)
While classical kernel-based learning algorithms are based on a single kernel, in practice it is often desirable to use multiple kernels. Lanckriet et al. (2004) considered conic combinations of kernel matrices for classification, leading to a convex quadratically constrained quadratic program. We show that it can be rewritten as a semi-infinite linear program that can be efficiently solved by recycling the standard SVM implementations. Moreover, we generalize the formulation and our method to a larger class of problems, including regression and one-class classification. Experimental results show that the proposed algorithm works for hundreds of thousands of examples or hundreds of kernels to be combined, and helps with automatic model selection, improving the interpretability of the learning result. In a second part we discuss general speed-up mechanisms for SVMs, especially when used with sparse feature maps as appear for string kernels, allowing us to train a string kernel SVM on a 10 million real-world splice data set from computational biology. We integrated multiple kernel learning in our machine learning toolbox SHOGUN for which the source code is publicly available at
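To make the phrase "conic combinations of kernel matrices" concrete, here is a minimal sketch of forming a combined kernel from fixed non-negative weights. It is only an illustration: the function name `combine_kernels` and the toy matrices are assumptions, the weights are given rather than learned, and this is not the SHOGUN interface or the paper's optimization procedure.

```python
def combine_kernels(kernels, weights):
    """Return sum_k weights[k] * kernels[k] for square matrices given as nested lists.

    A conic combination requires every weight to be non-negative.
    """
    if len(kernels) != len(weights):
        raise ValueError("need exactly one weight per kernel matrix")
    if any(w < 0 for w in weights):
        raise ValueError("conic combination requires non-negative weights")
    n = len(kernels[0])
    combined = [[0.0] * n for _ in range(n)]
    for kernel, weight in zip(kernels, weights):
        for i in range(n):
            for j in range(n):
                combined[i][j] += weight * kernel[i][j]
    return combined


# Two hypothetical 2x2 kernel matrices on the same pair of examples.
linear = [[1.0, 0.5], [0.5, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
mixed = combine_kernels([linear, identity], [0.5, 0.5])
```

In multiple kernel learning the weights would be optimized jointly with the classifier; the sketch only shows the object being optimized over.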
Activity recognition from accelerometer data
 In Proceedings of the Seventeenth Conference on Innovative Applications of Artificial Intelligence (IAAI)
, 2005
Cited by 88 (2 self)
Activity recognition fits within the bigger framework of context awareness. In this paper, we report on our efforts to recognize user activity from accelerometer data. Activity recognition is formulated as a classification problem. Performance of base-level classifiers and meta-level classifiers is compared. Plurality Voting is found to perform consistently well across different settings.
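As a rough illustration of plurality voting over base-level classifiers, the sketch below returns the label predicted most often. The function name `plurality_vote`, the tie-breaking rule (first label seen wins), and the activity labels are assumptions made for this example, not details taken from the paper.

```python
from collections import Counter


def plurality_vote(predictions):
    """Return the most frequent label among base-classifier predictions.

    Ties are broken in favor of the label that appears first in the list
    (an assumption of this sketch).
    """
    if not predictions:
        raise ValueError("need at least one prediction")
    counts = Counter(predictions)
    best = max(counts.values())
    for label in predictions:
        if counts[label] == best:
            return label


# Hypothetical outputs of four base-level classifiers for one data window.
votes = ["walking", "running", "walking", "standing"]
winner = plurality_vote(votes)
```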
On the Rate of Convergence of Regularized Boosting Classifiers
 JOURNAL OF MACHINE LEARNING RESEARCH
, 2003
Cited by 46 (10 self)
A regularized boosting method is introduced, for which regularization is obtained through a penalization function. It is shown through oracle inequalities that this method is model adaptive. The rate of convergence of the probability of misclassification is investigated. It is shown that for quite a large class of distributions, the probability of error converges to the Bayes risk at a rate faster than n^{-(V+2)/(4(V+1))}, where V is the VC dimension of the "base" class whose elements are combined by boosting methods to obtain an aggregated classifier. The dimension-independent nature of the rates may partially explain the good behavior of these methods in practical problems. Under Tsybakov's noise condition the rate of convergence is even faster. We investigate the conditions necessary to obtain such rates for different base classes. The special case of boosting using decision stumps is studied in detail. We characterize the class of classifiers realizable by aggregating decision stumps.
Boosting for Text Classification with Semantic Features
 IN PROCEEDINGS OF THE MSW 2004 WORKSHOP AT THE 10TH ACM SIGKDD CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING
, 2004
Cited by 39 (2 self)
Current text classification systems typically use term stems for representing document content. Semantic Web technologies allow the usage of features on a higher semantic level than single words for text classification purposes. In this paper we propose such an enhancement of the classical document representation through concepts extracted from background knowledge. Boosting, a successful machine learning technique, is used for classification. Comparative experimental evaluations in three different settings support our approach through consistent improvement of the results. An analysis of the results shows that this improvement is due to two separate effects.
Boosting algorithms: Regularization, prediction and model fitting
 Statistical Science
, 2007
Cited by 38 (5 self)
We present a statistical perspective on boosting. Special emphasis is given to estimating potentially complex parametric or nonparametric models, including generalized linear and additive models as well as regression models for survival analysis. Concepts of degrees of freedom and corresponding Akaike or Bayesian information criteria, particularly useful for regularization and variable selection in high-dimensional covariate spaces, are discussed as well. The practical aspects of boosting procedures for fitting statistical models are illustrated by means of the dedicated open-source software package mboost. This package implements functions which can be used for model fitting, prediction and variable selection. It is flexible, allowing for the implementation of new boosting algorithms optimizing user-specified loss functions. Key words and phrases: Generalized linear models, generalized additive models, gradient boosting, survival analysis, variable selection, software.
How boosting the margin can also boost classifier complexity
 In Proceedings of the 23rd International Conference on Machine Learning
, 2006
Cited by 32 (4 self)
Boosting methods are known not to usually overfit training data even as the size of the generated classifiers becomes large. Schapire et al. attempted to explain this phenomenon in terms of the margins the classifier achieves on training examples. Later, however, Breiman cast serious doubt on this explanation by introducing a boosting algorithm, arc-gv, that can generate a higher margins distribution than AdaBoost and yet performs worse. In this paper, we take a close look at Breiman’s compelling but puzzling results. Although we can reproduce his main finding, we find that the poorer performance of arc-gv can be explained by the increased complexity of the base classifiers it uses, an explanation supported by our experiments and entirely consistent with the margins theory. Thus, we find maximizing the margins is desirable, but not necessarily at the expense of other factors, especially base-classifier complexity.
The dynamics of AdaBoost: Cyclic behavior and convergence of margins
 Journal of Machine Learning Research
, 2004
Cited by 30 (7 self)
In order to study the convergence properties of the AdaBoost algorithm, we reduce AdaBoost to a nonlinear iterated map and study the evolution of its weight vectors. This dynamical systems approach allows us to understand AdaBoost’s convergence properties completely in certain cases; for these cases we find stable cycles, allowing us to explicitly solve for AdaBoost’s output. Using this unusual technique, we are able to show that AdaBoost does not always converge to a maximum margin combined classifier, answering an open question. In addition, we show that “non-optimal” AdaBoost (where the weak learning algorithm does not necessarily choose the best weak classifier at each iteration) may fail to converge to a maximum margin classifier, even if “optimal” AdaBoost produces a maximum margin. Also, we show that if AdaBoost cycles, it cycles among “support vectors”, i.e., examples that achieve the same smallest margin.
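The notion of margin used in this and the neighboring entries can be sketched as follows: for a weighted vote of weak classifiers, the normalized margin of an example is its label times the weighted vote, divided by the total weight. The function name `normalized_margins` and the toy numbers are assumptions for illustration; the sketch computes margins of a fixed combined classifier and does not run AdaBoost itself.

```python
def normalized_margins(weak_outputs, alphas, labels):
    """Compute y_i * sum_t alpha_t * h_t(x_i) / sum_t alpha_t for each example i.

    weak_outputs[t][i] is the {-1, +1} prediction of weak classifier t on
    example i; alphas are assumed non-negative with a positive sum.
    """
    total = sum(alphas)
    if total <= 0:
        raise ValueError("alphas must have a positive sum")
    margins = []
    for i, label in enumerate(labels):
        score = sum(a * h[i] for a, h in zip(alphas, weak_outputs))
        margins.append(label * score / total)
    return margins


# Three hypothetical weak classifiers evaluated on three training examples.
weak_outputs = [[1, 1, -1], [1, -1, -1], [-1, 1, -1]]
alphas = [2.0, 1.0, 1.0]
labels = [1, 1, -1]

margins = normalized_margins(weak_outputs, alphas, labels)
smallest = min(margins)
# Examples attaining the smallest margin, the "support vectors" of the abstract.
support = [i for i, m in enumerate(margins) if m == smallest]
```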
Generic face alignment using boosted appearance model
 in Proc. IEEE Computer Vision and Pattern Recognition
, 2007
Cited by 27 (3 self)
This paper proposes a discriminative framework for efficiently aligning images. Although conventional Active Appearance Models (AAM)-based approaches have achieved some success, they suffer from the generalization problem, i.e., how to align any image with a generic model. We treat the iterative image alignment problem as a process of maximizing the score of a trained two-class classifier that is able to distinguish correct alignment (positive class) from incorrect alignment (negative class). During the modeling stage, given a set of images with ground truth landmarks, we train a conventional Point Distribution Model (PDM) and a boosting-based classifier, which we call Boosted Appearance Model (BAM). When tested on an image with the initial landmark locations, the proposed algorithm iteratively updates the shape parameters of the PDM via the gradient ascent method such that the classification score of the warped image is maximized. The proposed framework is applied to the face alignment problem. Using extensive experimentation, we show that, compared to the AAM-based approach, this framework improves the robustness, accuracy and efficiency of face alignment by a large margin, especially for unseen data.
Learning the unified kernel machines for classification
 In Proc. KDD
, 2006
Cited by 27 (14 self)
Kernel machines have been shown to be state-of-the-art learning techniques for classification. In this paper, we propose a novel general framework of learning the Unified Kernel Machines (UKM) from both labeled and unlabeled data. Our proposed framework integrates supervised learning, semi-supervised kernel learning, and active learning in a unified solution. In the suggested framework, we particularly focus our attention on designing a new semi-supervised kernel learning method, i.e., Spectral Kernel Learning (SKL), which is built on the principles of kernel target alignment and unsupervised kernel design. Our algorithm is related to an equivalent quadratic programming problem that can be efficiently solved. Empirical results have shown that our method is more effective and robust at learning semi-supervised kernels than traditional approaches. Based on the framework, we present a specific paradigm of unified kernel machines with respect to Kernel Logistic Regression (KLR), i.e., Unified Kernel Logistic Regression (UKLR). We evaluate our proposed UKLR classification scheme in comparison with traditional solutions. The promising results show that our proposed UKLR paradigm is more effective than the traditional classification approaches.
On the equivalence of weak learnability and linear separability: New relaxations and efficient boosting algorithms
 IN: PROCEEDINGS OF THE 21ST ANNUAL CONFERENCE ON COMPUTATIONAL LEARNING THEORY
Cited by 20 (6 self)
Boosting algorithms build highly accurate prediction mechanisms from a collection of low-accuracy predictors. To do so, they employ the notion of weak learnability. The starting point of this paper is a proof which shows that weak learnability is equivalent to linear separability with ℓ1 margin. While this equivalence is a direct consequence of von Neumann’s minimax theorem, we derive the equivalence directly using Fenchel duality. We then use our derivation to describe a family of relaxations to the weak-learnability assumption that readily translates to a family of relaxations of linear separability with margin. This alternative perspective sheds new light on known soft-margin boosting algorithms and also enables us to derive several new relaxations of the notion of linear separability. Last, we describe and analyze an efficient boosting framework that can be used for minimizing the loss functions derived from our family of relaxations. In particular, we obtain efficient boosting algorithms for maximizing hard and soft versions of the ℓ1 margin.