Results 1–10 of 12
A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting
, 1997
Abstract

Cited by 2307 (59 self)
In the first part of the paper we consider the problem of dynamically apportioning resources among a set of options in a worst-case online framework. The model we study can be interpreted as a broad, abstract extension of the well-studied online prediction model to a general decision-theoretic setting. We show that the multiplicative weight-update rule of Littlestone and Warmuth [20] can be adapted to this model, yielding bounds that are slightly weaker in some cases, but applicable to a considerably more general class of learning problems. We show how the resulting learning algorithm can be applied to a variety of problems, including gambling, multiple-outcome prediction, repeated games and prediction of points in R^n. In the second part of the paper we apply the multiplicative weight-update technique to derive a new boosting algorithm. This boosting algorithm does not require any prior knowledge about the performance of the weak learning algorithm. We also study generalizations of...
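The multiplicative weight-update rule the abstract refers to can be sketched as a short Hedge-style loop. This is a minimal illustration, not the paper's full algorithm: it uses the factor e^(-η·loss) in place of the paper's β^loss (equivalent with β = e^(-η)), and the uniform initialization and the name `hedge` are choices made here for the sketch.

```python
import numpy as np

def hedge(loss_rows, eta=0.5):
    """Multiplicative weight-update (Hedge-style) over N options.

    loss_rows: iterable of length-N loss vectors in [0, 1], one per round.
    Returns the final normalized weights and the learner's total expected loss.
    """
    loss_rows = np.asarray(loss_rows, dtype=float)
    w = np.ones(loss_rows.shape[1])     # uniform initial weights
    total_loss = 0.0
    for losses in loss_rows:
        p = w / w.sum()                 # distribution over the options
        total_loss += p @ losses        # expected loss suffered this round
        w *= np.exp(-eta * losses)      # multiplicatively penalize lossy options
    return w / w.sum(), total_loss
```

With two options where the first always incurs loss 1 and the second loss 0, the weight mass concentrates on the second option and the cumulative expected loss stays bounded, which is the qualitative content of the worst-case bounds the abstract mentions.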
A Geometric Approach to Leveraging Weak Learners
 Computational Learning Theory: 4th European Conference (EuroCOLT '99)
, 1998
Abstract

Cited by 23 (5 self)
AdaBoost is a popular and effective leveraging procedure for improving the hypotheses generated by weak learning algorithms. AdaBoost and many other leveraging algorithms can be viewed as performing a constrained gradient descent over a potential function. At each iteration the distribution over the sample given to the weak learner is the direction of steepest descent. We introduce a new leveraging algorithm based on a natural potential function. For this potential function, the direction of steepest descent can have negative components. Therefore we provide two transformations for obtaining suitable distributions from these directions of steepest descent. The resulting algorithms have bounds that are incomparable to AdaBoost's, and their empirical performance is similar to AdaBoost's.

1 Introduction

Algorithms like AdaBoost [7] that are able to improve the hypotheses generated by weak learning methods have great potential and practical benefits. We call any such algorithm a leverag...
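The gradient-descent view the abstract describes can be made concrete for the standard case: under the exponential potential Σ exp(-margin_i), the direction of steepest descent with respect to the margins has components exp(-margin_i), and normalizing them recovers AdaBoost's example distribution. The sketch below shows only this classical case; the paper's own potential function is different (its steepest-descent direction can have negative components, which is exactly what this one cannot).

```python
import numpy as np

def reweight_exponential(margins):
    """Distribution handed to the weak learner under the exponential potential.

    margins: array of y_i * F(x_i), the current signed margins.
    The negative gradient of sum(exp(-m_i)) w.r.t. each margin m_i is
    exp(-m_i); normalizing those terms gives AdaBoost's distribution.
    """
    g = np.exp(-np.asarray(margins, dtype=float))
    return g / g.sum()
```

Badly classified examples (negative margins) receive exponentially more weight than well-classified ones, so the weak learner is steered toward the current mistakes.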
A Study in Machine Learning from Imbalanced Data for Sentence Boundary Detection in Speech
 Computer Speech and Language
, 2006
Abstract

Cited by 10 (5 self)
Enriching speech recognition output with sentence boundaries improves its human readability and enables further processing by downstream language processing modules. We have constructed a hidden Markov model (HMM) system to detect sentence boundaries that uses both prosodic and textual information. Since there are more non-sentence boundaries than sentence boundaries in the data, the prosody model, which is implemented as a decision tree classifier, must be constructed to effectively learn from the imbalanced data distribution. To address this problem, we investigate a variety of sampling approaches and a bagging scheme. A pilot study was carried out to select methods to apply to the full NIST sentence boundary evaluation task across two corpora (conversational telephone speech and broadcast news speech), using both human transcriptions and recognition output. In the pilot study, when classification error rate is the performance measure, using the original training set achieves the best performance among the sampling methods, and an ensemble of multiple classifiers from different downsampled training sets achieves...
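The "ensemble of multiple classifiers from different downsampled training sets" can be sketched generically: each ensemble member keeps every minority example and a fresh random downsample of the majority class, and predictions are combined by majority vote. This is a minimal, hypothetical skeleton (the function name and the generic `fit` callable are inventions for the sketch; the paper's system uses decision trees on prosodic features), not the paper's implementation.

```python
import numpy as np

def balanced_bagging(X, y, fit, n_models=5, rng=None):
    """Majority-vote ensemble over balanced downsampled training sets.

    y is 0/1 with class 1 (e.g. sentence boundaries) the minority class.
    `fit(X, y)` must return a callable model: model(X) -> 0/1 predictions.
    """
    rng = np.random.default_rng(rng)
    pos = np.flatnonzero(y == 1)
    neg = np.flatnonzero(y == 0)
    models = []
    for _ in range(n_models):
        # keep every minority example; downsample the majority to match
        keep = np.concatenate([pos, rng.choice(neg, size=len(pos), replace=False)])
        models.append(fit(X[keep], y[keep]))
    def predict(Xnew):
        votes = np.mean([m(Xnew) for m in models], axis=0)
        return (votes >= 0.5).astype(int)
    return predict
```

Because each member sees a balanced set, the ensemble avoids the bias toward the majority class that a single classifier trained on the raw imbalanced data would show, while the vote averages out the variance introduced by discarding majority examples.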
On boosting kernel regression
, 2008
Abstract

Cited by 3 (0 self)
In this paper we propose a simple multistep regression smoother which is constructed in an iterative manner, by learning the Nadaraya-Watson estimator with L2-boosting. We find, in both theoretical analysis and simulation experiments, that the bias converges exponentially fast, and the variance diverges exponentially slowly. The first boosting step is analyzed in more detail, giving asymptotic expressions as functions of the smoothing parameter, and relationships with previous work are explored. Practical performance is illustrated by both simulated and real data.
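The iterative construction the abstract describes is residual fitting: at each step the Nadaraya-Watson smoother is applied to the current residuals and the result is added to the accumulated fit. A minimal sketch, assuming a Gaussian kernel and evaluation at the design points only (the paper's analysis is more general):

```python
import numpy as np

def nw_smooth(x, y, h):
    """Nadaraya-Watson estimate of y at the design points x (Gaussian kernel)."""
    K = np.exp(-0.5 * ((x[:, None] - x[None, :]) / h) ** 2)
    return (K @ y) / K.sum(axis=1)

def l2_boost_nw(x, y, h, n_iter=10):
    """L2-boosting: repeatedly smooth the residuals, accumulate the fits."""
    fit = np.zeros_like(y, dtype=float)
    for _ in range(n_iter):
        fit += nw_smooth(x, y - fit, h)   # fit what the current model misses
    return fit
```

Each iteration removes another layer of bias left by the previous smooth, which is the mechanism behind the exponentially fast bias decay the abstract reports; the price is the slowly growing variance.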
Multistep kernel regression smoothing by boosting
, 2005
Abstract

Cited by 2 (0 self)
In this paper we propose a simple multistep regression smoother which is constructed in a boosting fashion, by learning the Nadaraya–Watson estimator with L2Boosting. Unlike the usual approach, we do not run L2Boosting indefinitely: given a kernel smoother as a learner, we explore the boosting capability to build estimators using a finite number of boosting iterations. This approach appears fruitful, since it simplifies the interpretation and application of boosting. We find, in both theoretical analysis and simulation experiments, that higher-order bias properties emerge. Relationships between our smoother and previous work are explored. Moreover, we suggest a way to successfully employ our method for estimating probability density functions (pdf) and cumulative distribution functions (cdf), via binning procedures and the smoothing of the empirical cumulative distribution function, respectively. The practical performance of the method is illustrated by a large simulation study which shows encouraging finite-sample behaviour, particularly in comparison with other methods.
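The cdf application rests on smoothing the empirical cdf. As a point of reference, the classical (non-boosted) smoothed-ecdf estimator replaces each jump of the ecdf with a Gaussian cdf ramp of bandwidth h; the sketch below shows only that baseline, not the paper's boosted variant, and the function name is an invention for the sketch.

```python
import math
import numpy as np

def smoothed_cdf(sample, grid, h):
    """Kernel-smoothed empirical cdf evaluated at the points in `grid`.

    Each of the n ecdf jumps 1/n at a sample point s is replaced by the
    Gaussian cdf ramp Phi((t - s)/h), and the ramps are averaged.
    """
    sample = np.asarray(sample, dtype=float)
    grid = np.asarray(grid, dtype=float)
    z = (grid[:, None] - sample[None, :]) / h
    Phi = 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2.0)))
    return Phi.mean(axis=1)
```

The result is a smooth, monotone estimate that agrees with the ecdf away from the data and interpolates through its jumps; the paper's contribution is to drive such a smoother with a finite number of boosting iterations instead of using it once.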
unknown title
, 2000
Abstract
We propose a new boosting algorithm. This boosting algorithm is an adaptive version of the boost-by-majority algorithm and combines the bounded goals of the boost-by-majority algorithm with the adaptivity of AdaBoost. The method used for making boost-by-majority adaptive is to consider the limit in which each of the boosting iterations makes an infinitesimally small contribution to the process as a whole. This limit can be modeled using the differential equations that govern Brownian motion. The new boosting algorithm, named BrownBoost, is based on finding solutions to these differential equations. The paper describes two methods for finding approximate solutions to the differential equations. The first is a method that results in a provably polynomial time algorithm. The second method, based on the Newton-Raphson minimization procedure, is much more efficient in practice but is not known to be polynomial.
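The Newton-Raphson procedure the abstract names is the standard iteration x ← x − f(x)/f′(x). The sketch below shows the generic scalar solver on an illustrative equation, not BrownBoost's actual system of equations:

```python
def newton_raphson(f, df, x0, tol=1e-10, max_iter=50):
    """Scalar Newton-Raphson root finding: x <- x - f(x)/f'(x).

    f, df: the function and its derivative; x0: starting point.
    Stops when the step size drops below tol.
    """
    x = x0
    for _ in range(max_iter):
        step = f(x) / df(x)
        x -= step
        if abs(step) < tol:
            break
    return x
```

Near a simple root the iteration converges quadratically, which is why it is fast in practice even when, as the abstract notes for BrownBoost, no polynomial worst-case bound is known.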
Random Forests (RF), and Multivariate Adaptive
Abstract
The task of modeling the distribution of a large number of tree species under future climate scenarios presents unique challenges. First, the model must be robust enough to handle climate data outside the current range without producing unacceptable instability in the output. In addition, the technique should have automatic search mechanisms built in to select the most appropriate values for input model parameters for each species, so that minimal effort is required when these parameters are fine-tuned for individual tree species. We evaluated four statistical models—Regression...
To Center or Not to Center: That Is Not the Question—An Ancillarity–Sufficiency Interweaving Strategy (ASIS) for Boosting MCMC Efficiency
Abstract
For a broad class of multilevel models, there exist two well-known competing parameterizations, the centered parameterization (CP) and the non-centered parameterization (NCP), for effective MCMC implementation. Much literature has been devoted to the questions of when to use which and how to compromise between them via partial CP/NCP. This article introduces an alternative strategy for boosting MCMC efficiency via simply interweaving—but not alternating—the two parameterizations. This strategy has the surprising property that failure of both the CP and NCP chains to converge geometrically does not prevent the interweaving algorithm from doing so. It achieves this seemingly magical property by taking advantage of the discordance of the two parameterizations, namely, the sufficiency of CP and the ancillarity of NCP, to substantially reduce the Markovian dependence, especially when the original CP and NCP form a “beauty and beast” pair (i.e., when one chain mixes far more rapidly than the other). The ancillarity–sufficiency reformulation of the CP–NCP dichotomy allows us to borrow insight from the well-known Basu’s theorem on the independence of (complete) sufficient and ancillary statistics, albeit a Bayesian version of Basu’s theorem is currently...
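The interweaving idea can be illustrated on a deliberately simple toy model chosen here for the sketch (not one from the paper): y_i ~ N(θ_i, 1), θ_i ~ N(μ, τ²) with τ² known and a flat prior on μ. One ASIS sweep draws θ in the CP, updates μ given θ (sufficiency), re-expresses θ through the ancillary residuals η = (θ − μ)/τ, updates μ again given η (ancillarity), and maps back — interweaving the two parameterizations within a single sweep rather than alternating between them across sweeps.

```python
import numpy as np

def asis_chain(y, tau2=1.0, n_iter=5000, seed=0):
    """ASIS sketch for y_i ~ N(theta_i, 1), theta_i ~ N(mu, tau2),
    flat prior on mu, tau2 known. Returns the sampled mu values."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    n = len(y)
    mu = 0.0
    mus = np.empty(n_iter)
    for t in range(n_iter):
        # CP: theta_i | y_i, mu  (precision 1 + 1/tau2)
        v = 1.0 / (1.0 + 1.0 / tau2)
        theta = rng.normal(v * (y + mu / tau2), np.sqrt(v))
        # CP update of mu | theta: theta is "sufficient" for mu
        mu = rng.normal(theta.mean(), np.sqrt(tau2 / n))
        # switch to the ancillary NCP residuals
        eta = (theta - mu) / np.sqrt(tau2)
        # NCP update of mu | eta, y: eta is ancillary for mu
        mu = rng.normal((y - np.sqrt(tau2) * eta).mean(), np.sqrt(1.0 / n))
        mus[t] = mu
    return mus
```

In this model the marginal posterior of μ is N(ȳ, (1 + τ²)/n), and the interweaved chain recovers it while mixing rapidly even in regimes where either the pure CP or pure NCP chain alone would crawl.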