Sharing Visual Features for Multiclass And Multiview Object Detection
2004

Cited by 177 (4 self)
We consider the problem of detecting a large number of different classes of objects in cluttered scenes. Traditional approaches require applying a battery of different classifiers to the image, at multiple locations and scales. This can be slow and can require a lot of training data, since each classifier requires the computation of many different image features. In particular, for independently trained detectors, the (run-time) computational complexity and the (training-time) sample complexity scale linearly with the number of classes to be detected. It seems unlikely that such an approach will scale up to allow recognition of hundreds or thousands of objects.
Stochastic Gradient Boosting
Computational Statistics and Data Analysis, 1999

Cited by 145 (1 self)
Gradient boosting constructs additive regression models by sequentially fitting a simple parameterized function (base learner) to the current "pseudo"-residuals by least squares at each iteration. The pseudo-residuals are the gradient of the loss functional being minimized, with respect to the model values at each training data point, evaluated at the current step. It is shown that both the approximation accuracy and the execution speed of gradient boosting can be substantially improved by incorporating randomization into the procedure. Specifically, at each iteration a subsample of the training data is drawn at random (without replacement) from the full training data set. This randomly selected subsample is then used in place of the full sample to fit the base learner and compute the model update for the current iteration. This randomized approach also increases robustness against overcapacity of the base learner.
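The randomized procedure described above can be sketched in a few lines. This is a minimal illustration, not Friedman's reference implementation: it assumes squared-error loss (so the pseudo-residuals reduce to ordinary residuals y - F), one-dimensional inputs, regression stumps as the base learner, and a shrinkage factor. The function names and the parameters `subsample`, `shrinkage`, and `n_iter` are hypothetical choices made for this sketch.

```python
import random
from typing import List, Tuple

# A regression stump: (threshold, value if x <= threshold, value otherwise).
Stump = Tuple[float, float, float]


def fit_stump(xs: List[float], rs: List[float]) -> Stump:
    """Least-squares stump on 1-D inputs: the mean residual on each side of the best split."""
    pairs = sorted(zip(xs, rs))
    best_sse, best = float("inf"), None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no split between identical x values
        thr = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [r for _, r in pairs[:i]]
        right = [r for _, r in pairs[i:]]
        lv, rv = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
        if sse < best_sse:
            best_sse, best = sse, (thr, lv, rv)
    if best is None:  # fewer than two distinct x values: fit a constant
        mean = sum(rs) / len(rs)
        return (float("inf"), mean, mean)
    return best


def predict_stump(stump: Stump, x: float) -> float:
    thr, lv, rv = stump
    return lv if x <= thr else rv


def stochastic_gradient_boost(xs, ys, n_iter=100, subsample=0.5, shrinkage=0.1, seed=0):
    """Hypothetical sketch: squared loss, stump base learner, random subsample per iteration."""
    rng = random.Random(seed)
    f0 = sum(ys) / len(ys)  # the constant that minimizes squared loss
    preds = [f0] * len(xs)
    stumps: List[Stump] = []
    n_sub = max(1, min(len(xs), int(subsample * len(xs))))
    for _ in range(n_iter):
        # Draw a random subsample WITHOUT replacement.
        idx = rng.sample(range(len(xs)), n_sub)
        # For L(y, F) = (y - F)^2 / 2 the pseudo-residual (negative gradient) is y - F.
        stump = fit_stump([xs[i] for i in idx], [ys[i] - preds[i] for i in idx])
        stumps.append(stump)
        # The update is fitted on the subsample but applied to every training point.
        preds = [p + shrinkage * predict_stump(stump, x) for p, x in zip(preds, xs)]
    return f0, shrinkage, stumps


def predict(model, x: float) -> float:
    f0, shrinkage, stumps = model
    return f0 + shrinkage * sum(predict_stump(s, x) for s in stumps)


# Toy usage: learn a step function on [0, 2).
xs = [i / 20 for i in range(40)]
ys = [1.0 if x > 1.0 else 0.0 for x in xs]
model = stochastic_gradient_boost(xs, ys)
```

Fitting each stump on only half the points makes every iteration cheaper, and the shrinkage keeps any single noisy stump from dominating the final sum.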
Using the Forest to See the Trees: A Graphical Model Relating Features, Objects, and Scenes
2003

Cited by 144 (12 self)
Standard approaches to object detection focus on local patches of the image, and try to classify them as background or not. We propose to use the scene context (image as a whole) as an extra source of (global) information, to help resolve local ambiguities. We present a conditional random field for jointly solving the tasks of object detection and scene classification.
Visual Tracking with Online Multiple Instance Learning
2009

Cited by 129 (12 self)
In this paper, we address the problem of learning an adaptive appearance model for object tracking. In particular, a class of tracking techniques called “tracking by detection” has been shown to give promising results at real-time speeds. These methods train a discriminative classifier in an online manner to separate the object from the background. This classifier bootstraps itself by using the current tracker state to extract positive and negative examples from the current frame. Slight inaccuracies in the tracker can therefore lead to incorrectly labeled training examples, which degrade the classifier and can cause further drift. In this paper we show that using Multiple Instance Learning (MIL) instead of traditional supervised learning avoids these problems and can therefore lead to a more robust tracker with fewer parameter tweaks. We present a novel online MIL algorithm for object tracking that achieves superior results with real-time performance.
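Why MIL tolerates slightly misaligned positive examples can be seen from how a bag of image patches is scored. The sketch below assumes the noisy-OR bag model, which we understand MILBoost-style trackers to use: a positive bag needs only one patch with a high instance probability, so a few badly cropped patches in the bag do little harm. The instance probabilities are taken as given inputs; `noisy_or` and `bag_log_likelihood` are hypothetical names, not the paper's code, and the paper's online boosting procedure is not reproduced.

```python
import math
from typing import List, Sequence


def noisy_or(instance_probs: Sequence[float]) -> float:
    """p(bag positive) = 1 - prod_j (1 - p(instance j positive))."""
    prod_negative = 1.0
    for p in instance_probs:
        prod_negative *= 1.0 - p
    return 1.0 - prod_negative


def bag_log_likelihood(bags: List[Sequence[float]], labels: List[bool], eps: float = 1e-12) -> float:
    """Log-likelihood of bag labels under the noisy-OR model (clipped to avoid log(0))."""
    total = 0.0
    for probs, positive in zip(bags, labels):
        p = min(max(noisy_or(probs), eps), 1.0 - eps)
        total += math.log(p) if positive else math.log(1.0 - p)
    return total


# A positive bag of crops around the tracker location: only one crop is well aligned.
positive_bag = [0.10, 0.95, 0.05]
# A negative bag of crops far from the object.
negative_bag = [0.02, 0.01]
```

The positive bag still scores above 0.95 even though two of its three crops look like background, which is the robustness to labeling noise the abstract describes.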
FloatBoost Learning and Statistical Face Detection
IEEE Transactions on Pattern Analysis and Machine Intelligence, 2004

Cited by 125 (4 self)
A novel learning procedure, called FloatBoost, is proposed for learning a boosted classifier that achieves the minimum error rate. FloatBoost learning uses a backtrack mechanism after each iteration of AdaBoost learning to minimize the error rate directly, rather than minimizing an exponential function of the margin as in the traditional AdaBoost algorithms. A second contribution of the paper is a novel statistical model for learning the best weak classifiers using a stagewise approximation of the posterior probability. These novel techniques lead to a classifier that requires fewer weak classifiers than AdaBoost yet achieves lower error rates in both training and testing, as demonstrated by extensive experiments. Applied to face detection, the FloatBoost learning method, together with a proposed detector pyramid architecture, leads to the first real-time multiview face detection system reported.
Boosting with the L2-Loss: Regression and Classification
2001

Cited by 121 (16 self)
This paper investigates a variant of boosting, L2Boost, which is constructed from a functional gradient descent algorithm with the L2 loss function. Based on an explicit stagewise refitting expression of L2Boost, the case of (symmetric) linear weak learners is studied in detail in both regression and two-class classification. In particular, with the boosting iteration m working as the smoothing or regularization parameter, a new exponential bias-variance trade-off is found, with the variance (complexity) term bounded as m tends to infinity. When the weak learner is a smoothing spline, an optimal rate of convergence result holds for both regression and two-class classification, and this boosted smoothing spline adapts to higher-order, unknown smoothness. Moreover, a simple expansion of the 0-1 loss function is derived to reveal the importance of the decision boundary, bias reduction, and the impossibility of an additive bias-variance decomposition in classification. Finally, simulation and real-data results demonstrate the attractiveness of L2Boost, particularly with a novel componentwise cubic smoothing spline as an effective and practical weak learner.
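For a linear weak learner given by a smoother matrix S, L2Boost fits S to the data once and then repeatedly refits the current residuals. Since Y - F_k = (I - S)(Y - F_(k-1)) and F_0 = S Y, after m refitting steps the fit is F_m = (I - (I - S)^(m+1)) Y, which we take to be the kind of stagewise refitting expression the abstract refers to. The sketch below checks this identity numerically. The three-point smoother `tridiagonal_smoother` and its weights are hypothetical choices (symmetric, with eigenvalues strictly between 0 and 1), not the smoothing splines studied in the paper.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(a: Matrix, v: Vector) -> Vector:
    return [sum(aij * vj for aij, vj in zip(row, v)) for row in a]


def tridiagonal_smoother(n: int, centre: float = 0.5, side: float = 0.25) -> Matrix:
    """Symmetric local-averaging smoother; with these weights its eigenvalues lie in (0, 1)."""
    return [[centre if i == j else side if abs(i - j) == 1 else 0.0 for j in range(n)]
            for i in range(n)]


def l2boost_linear(s: Matrix, y: Vector, m: int) -> Vector:
    """L2Boost with a linear learner: F_0 = S Y, then F_k = F_(k-1) + S (Y - F_(k-1))."""
    f = matvec(s, y)
    for _ in range(m):
        residual = [yi - fi for yi, fi in zip(y, f)]
        f = [fi + ui for fi, ui in zip(f, matvec(s, residual))]
    return f


def stagewise_closed_form(s: Matrix, y: Vector, m: int) -> Vector:
    """Evaluate (I - (I - S)^(m+1)) Y by applying (I - S) to Y m+1 times."""
    r = list(y)
    for _ in range(m + 1):
        r = [ri - sri for ri, sri in zip(r, matvec(s, r))]
    return [yi - ri for yi, ri in zip(y, r)]


y = [0.3, 1.2, 0.8, 2.0, 1.5, 0.1]
S = tridiagonal_smoother(len(y))
```

Because (I - S)^(m+1) shrinks toward zero as m grows, the training fit approaches Y, so the iteration count m behaves like a smoothing parameter, which is the role the abstract assigns to it.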
Boosting Algorithms as Gradient Descent
2000

Cited by 114 (2 self)
Much recent attention, both experimental and theoretical, has been focused on classification algorithms that produce voted combinations of classifiers. Recent theoretical work has shown that the impressive generalization performance of algorithms like AdaBoost can be attributed to the classifier having large margins on the training data. We present an abstract algorithm for finding linear combinations of functions that minimize arbitrary cost functionals (i.e., functionals that do not necessarily depend on the margin). Many existing voting methods can be shown to be special cases of this abstract algorithm. Then, following previous theoretical results bounding the generalization performance of convex combinations of classifiers in terms of general cost functions of the margin, we present a new algorithm (DOOM II) for performing a gradient descent optimization of such cost functions. Experiments on ...
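The gradient-descent view of voting methods can be sketched over a finite pool of weak classifiers: at each step, choose the weak classifier whose predictions best align with the negative functional gradient of the margin cost, then take a small step in that direction. This is a minimal illustration under stated assumptions, not DOOM II itself: the pool, the fixed step size (instead of a line search), and the exponential margin cost exp(-z) (which yields AdaBoost-like example weights) are choices made for this sketch, and `gradient_boost_pool` is a hypothetical name.

```python
import math
from typing import Callable, List, Sequence, Tuple


def gradient_boost_pool(
    pool: List[Sequence[int]],
    y: Sequence[int],
    n_iter: int = 20,
    step: float = 0.1,
    cost_derivative: Callable[[float], float] = lambda z: -math.exp(-z),
) -> Tuple[List[float], List[int]]:
    """Functional gradient descent on C(F) = (1/n) * sum_i c(y_i F(x_i)).

    pool[k][i] is weak classifier k's {-1, +1} prediction on example i;
    cost_derivative is c'(z) for a decreasing margin cost c (here c(z) = exp(-z)).
    """
    f = [0.0] * len(y)
    chosen: List[int] = []
    for _ in range(n_iter):
        # Example weights are -c'(margin): poorly classified points get larger weights.
        w = [-cost_derivative(yi * fi) for yi, fi in zip(y, f)]
        # Inner product of each weak classifier with the negative gradient.
        scores = [sum(wi * yi * hi for wi, yi, hi in zip(w, y, h)) for h in pool]
        best = max(range(len(pool)), key=lambda k: scores[k])
        if scores[best] <= 0:
            break  # no weak classifier decreases the cost to first order
        f = [fi + step * hi for fi, hi in zip(f, pool[best])]
        chosen.append(best)
    return f, chosen


# Three weak classifiers, each wrong on a different example; their vote is always right.
y = [1, -1, 1]
pool = [[-1, -1, 1], [1, 1, 1], [1, -1, -1]]
f, chosen = gradient_boost_pool(pool, y)
```

Swapping in a different `cost_derivative` changes only the example weights, which is the sense in which many voting methods become special cases of one abstract procedure.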
Empirical margin distributions and bounding the generalization error of combined classifiers
Ann. Statist., 2002

Cited by 112 (8 self)
Dedicated to A.V. Skorohod on his seventieth birthday.
We prove new probabilistic upper bounds on the generalization error of complex classifiers that are combinations of simple classifiers. Such combinations could be implemented by neural networks or by voting methods of combining the classifiers, such as boosting and bagging. The bounds are in terms of the empirical distribution of the margin of the combined classifier. They are based on the methods of the theory of Gaussian and empirical processes (comparison inequalities, the symmetrization method, concentration inequalities), and they improve previous results of Bartlett (1998) on bounding the generalization error of neural networks in terms of ℓ1-norms of the weights of neurons and of Schapire, Freund, Bartlett and Lee (1998) on bounding the generalization error of boosting. We also obtain rates of convergence in Lévy distance of the empirical margin distribution to the true margin distribution uniformly over the classes of classifiers and prove the optimality of these rates.
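Bounds of this kind are stated in terms of the empirical margin distribution of a voted classifier. As a minimal sketch (the function names and toy votes are hypothetical), the normalized margin of example i is y_i * sum_t alpha_t h_t(x_i) / sum_t |alpha_t|, and the empirical distribution P_n{y f(x) <= delta} is the fraction of training margins at or below delta. The bounds themselves, and their constants, are not reproduced here.

```python
from typing import List, Sequence


def normalized_margins(alphas: Sequence[float], preds: List[Sequence[int]], y: Sequence[int]) -> List[float]:
    """Margins y_i * f(x_i) of the convex combination f = sum_t alpha_t h_t / sum_t |alpha_t|."""
    total = sum(abs(a) for a in alphas)
    return [
        yi * sum(a * h[i] for a, h in zip(alphas, preds)) / total
        for i, yi in enumerate(y)
    ]


def empirical_margin_cdf(margins: Sequence[float], delta: float) -> float:
    """P_n{ y f(x) <= delta }: the fraction of training margins at or below delta."""
    return sum(1 for m in margins if m <= delta) / len(margins)


# Three equally weighted weak classifiers, each wrong on a different example.
y = [1, -1, 1]
preds = [[-1, -1, 1], [1, 1, 1], [1, -1, -1]]
margins = normalized_margins([1.0, 1.0, 1.0], preds, y)
```

At delta = 0 the quantity is the training error (counting ties as errors); for delta > 0 it also counts correctly classified points with small margins, which is what margin-based bounds penalize.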
Predicting clicks: Estimating the click-through rate for new ads
In Proceedings of the 16th International World Wide Web Conference (WWW '07), 2007

Cited by 100 (1 self)
Search engine advertising has become a significant element of the Web browsing experience. Choosing the right ads for the query and the order in which they are displayed greatly affects the probability that a user will see and click on each ad. This ranking has a strong impact on the revenue the search engine receives from the ads. Further, showing the user an ad that they prefer to click on improves user satisfaction. For these reasons, it is important to be able to accurately estimate the click-through rate of ads in the system. For ads that have been displayed repeatedly, this is empirically measurable, but for new ads, other means must be used. We show that we can use features of ads, terms, and advertisers to learn a model that accurately predicts the click-through rate for new ads. We also show that using our model improves the convergence and performance of an advertising system. As a result, our model increases both revenue and user satisfaction.
Boosting as a Regularized Path to a Maximum Margin Classifier
Journal of Machine Learning Research, 2004

Cited by 68 (18 self)
In this paper we study boosting methods from a new perspective. We build on recent work by Efron et al. to show that boosting approximately (and in some cases exactly) minimizes its loss criterion with an ℓ1 constraint on the coefficient vector. This helps explain the success of boosting with early stopping as regularized fitting of the loss criterion. For the two most commonly used criteria (exponential and binomial log-likelihood), we further show that as the constraint is relaxed (or, equivalently, as the boosting iterations proceed), the solution converges (in the separable case) to an "ℓ1-optimal" separating hyperplane. We prove that this ℓ1-optimal separating hyperplane has the property of maximizing the minimal ℓ1 margin of the training data, as defined in the boosting literature.
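The link between boosting and ℓ1-constrained fitting is easiest to see in incremental forward stagewise regression, the ε-boosting analogue from the Efron et al. line of work that the abstract builds on. The sketch below is an illustration under swapped-in assumptions: it uses squared loss and a linear model on given features rather than the exponential or binomial log-likelihood losses analyzed in the paper, and `forward_stagewise` is a hypothetical name. Each step changes one coefficient by ±eps, so after k steps the ℓ1 norm of the coefficient vector is at most k·eps; stopping early therefore acts like an ℓ1 budget.

```python
from typing import List, Sequence, Tuple


def forward_stagewise(
    X: List[Sequence[float]], y: Sequence[float], eps: float = 0.01, n_steps: int = 1000
) -> Tuple[List[float], List[float]]:
    """Incremental forward stagewise fitting of y ~ X beta under squared loss.

    Returns the final coefficients and the l1 norm of beta after each step.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)
    l1_path: List[float] = []
    for _ in range(n_steps):
        # Correlation of each feature with the residual (negative gradient of squared loss / 2).
        corr = [sum(X[i][j] * residual[i] for i in range(n)) for j in range(p)]
        j = max(range(p), key=lambda k: abs(corr[k]))
        if corr[j] == 0:
            break  # already at a stationary point
        delta = eps if corr[j] > 0 else -eps
        beta[j] += delta  # a tiny step on the single most useful coordinate
        residual = [residual[i] - delta * X[i][j] for i in range(n)]
        l1_path.append(sum(abs(b) for b in beta))
    return beta, l1_path


X = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
y = [2.0, -1.0, 0.5]
beta, l1_path = forward_stagewise(X, y)
```

With enough steps the coefficients settle near the unconstrained least-squares fit (here about [2, -1]); stopping earlier yields coefficient vectors with a smaller ℓ1 norm, which is the regularization effect of early stopping that the abstract describes.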