Results 1-10 of 95
Improved Boosting Algorithms Using Confidence-rated Predictions
Machine Learning, 1999
Cited by 705 (26 self)
We describe several improvements to Freund and Schapire’s AdaBoost boosting algorithm, particularly in a setting in which hypotheses may assign confidences to each of their predictions. We give a simplified analysis of AdaBoost in this setting, and we show how this analysis can be used to find improved parameter settings as well as a refined criterion for training weak hypotheses. We give a specific method for assigning confidences to the predictions of decision trees, a method closely related to one used by Quinlan. This method also suggests a technique for growing decision trees which turns out to be identical to one proposed by Kearns and Mansour. We focus next on how to apply the new boosting algorithms to multiclass classification problems, particularly to the multi-label case in which each example may belong to more than one class. We give two boosting methods for this problem, plus a third method based on output coding. One of these leads to a new method for handling the single-label case which is simpler but as effective as techniques suggested by Freund and Schapire. Finally, we give some experimental results comparing a few of the algorithms discussed in this paper.
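The abstract above does not spell out the update rule, so the following is only a minimal sketch of one boosting round with confidence-rated predictions, under stated assumptions: each weak prediction lies in [-1, +1] (sign is the predicted label, magnitude is the confidence), and the step size `alpha` is set from the weighted correlation `r`, a commonly used choice for predictions in that range. The function name `confidence_rated_round` is hypothetical and is not taken from the paper.

```python
import math


def confidence_rated_round(weights, labels, predictions):
    """One boosting round with real-valued predictions (illustrative sketch).

    weights:     current distribution over training examples (sums to 1)
    labels:      true labels, each +1 or -1
    predictions: weak-hypothesis outputs, assumed to lie in [-1, +1]
    Returns (alpha, new_weights).
    """
    # Weighted correlation between predictions and labels.
    r = sum(w * y * h for w, y, h in zip(weights, labels, predictions))
    if abs(r) >= 1.0:
        raise ValueError("perfect (anti-)correlation: alpha would be infinite")
    # Assumed step-size rule for predictions in [-1, +1].
    alpha = 0.5 * math.log((1.0 + r) / (1.0 - r))
    # Examples predicted wrongly (y * h < 0) gain weight; confident correct
    # predictions lose the most weight.
    unnormalized = [
        w * math.exp(-alpha * y * h)
        for w, y, h in zip(weights, labels, predictions)
    ]
    z = sum(unnormalized)  # normalization factor
    return alpha, [u / z for u in unnormalized]
```

For example, calling it with four equally weighted examples, labels `[1, 1, -1, -1]` and predictions `[0.9, 0.5, -0.8, 0.3]` returns a positive `alpha` and shifts the most weight onto the last example, the only one whose prediction has the wrong sign.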
BoosTexter: A Boosting-based System for Text Categorization
Machine Learning, 2000
Cited by 496 (21 self)
This work focuses on algorithms which learn from examples to perform multiclass text and speech categorization tasks. Our approach is based on a new and improved family of boosting algorithms. We describe in detail an implementation, called BoosTexter, of the new boosting algorithms for text categorization tasks. We present results comparing the performance of BoosTexter and a number of other text-categorization algorithms on a variety of tasks. We conclude by describing the application of our system to automatic call-type identification from unconstrained spoken customer responses.
Context-Sensitive Learning Methods for Text Categorization
ACM Transactions on Information Systems, 1996
Cited by 252 (13 self)
In this article, we will investigate the performance of two recently implemented machine-learning algorithms on a number of large text categorization problems. The two algorithms considered are set-valued RIPPER, a recent rule-learning algorithm [Cohen An earlier version of this article appeared in Proceedings of the 19th Annual International ACM Conference on Research and Development in Information Retrieval (SIGIR), pp. 307-315.
Tracking the best expert
In Proceedings of the 12th International Conference on Machine Learning, 1995
Cited by 197 (18 self)
We generalize the recent relative loss bounds for online algorithms where the additional loss of the algorithm on the whole sequence of examples over the loss of the best expert is bounded. The generalization allows the sequence to be partitioned into segments, and the goal is to bound the additional loss of the algorithm over the sum of the losses of the best experts for each segment. This is to model situations in which the examples change and different experts are best for certain segments of the sequence of examples. In the single segment case, the additional loss is proportional to log n, where n is the number of experts and the constant of proportionality depends on the loss function. Our algorithms do not produce the best partition; however the loss bound shows that our predictions are close to those of the best partition. When the number of segments is k + 1 and the sequence is of length ℓ, we can bound the additional loss of our algorithm over the best partition by O(k log n + k log(ℓ/k)). For the case when the loss per trial is bounded by one, we obtain an algorithm whose additional loss over the loss of the best partition is independent of the length of the sequence. The additional loss becomes O(k log n + k log(L/k)), where L is the loss of the best partition with k + 1 segments. Our algorithms for tracking the predictions of the best expert are simple adaptations of Vovk’s original algorithm for the single best expert case. As in the original algorithms, we keep one weight per expert, and spend O(1) time per weight in each trial.
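The "one weight per expert, O(1) time per weight" scheme can be illustrated with a share-style update. The sketch below is an assumption-laden illustration, not the paper's exact algorithm: the names `fixed_share_update` and `predict`, the learning rate `eta`, and the share rate `alpha` are all hypothetical. Each trial first shrinks every weight exponentially in that expert's loss, then has every expert hand a fraction `alpha` of its weight to the others, which lets a previously poor expert recover quickly when the best expert changes.

```python
import math


def fixed_share_update(weights, losses, eta=1.0, alpha=0.1):
    """One trial of a share-style weight update (illustrative sketch).

    weights: current (unnormalized) weight per expert
    losses:  loss incurred by each expert on this trial
    eta:     assumed learning rate
    alpha:   assumed fraction of weight each expert shares with the others
    """
    n = len(weights)
    # Loss update: down-weight each expert exponentially in its loss.
    updated = [w * math.exp(-eta * l) for w, l in zip(weights, losses)]
    if n == 1:
        return updated
    # The pool is computed once, so the share step costs O(1) per weight.
    pool = alpha * sum(updated)
    # Share update: keep (1 - alpha) of your own weight and receive an equal
    # part of what the other n - 1 experts gave up.
    return [(1 - alpha) * u + (pool - alpha * u) / (n - 1) for u in updated]


def predict(weights, expert_predictions):
    """Weighted average of the experts' predictions."""
    total = sum(weights)
    return sum(w * p for w, p in zip(weights, expert_predictions)) / total
```

The share step only redistributes weight, so the total is unchanged by it. If expert 0 is best for a run of trials and expert 1 is best afterwards, the largest weight moves from expert 0 to expert 1 within a few trials of the switch.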
Efficient Algorithms for Online Decision Problems
J. Comput. Syst. Sci., 2003
Cited by 135 (3 self)
In an online decision problem, one makes a sequence of decisions without knowledge of the future. Tools from learning such as Weighted Majority and its many variants [13, 18, 4] demonstrate that online algorithms can perform nearly as well as the best single decision chosen in hindsight, even when there are exponentially many possible decisions. However, the naive application of these algorithms is inefficient for such large problems. For some problems with nice structure, specialized efficient solutions have been developed [10, 16, 17, 6, 3].
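As a point of reference for the Weighted Majority family mentioned above, here is a minimal sketch of the deterministic variant on binary predictions. The function name `weighted_majority` and the penalty factor `beta` are assumptions made for illustration. The sketch stores and updates one weight per expert on every trial, so its cost grows linearly with the number of experts, which is why a direct application becomes impractical when there are exponentially many possible decisions.

```python
def weighted_majority(expert_predictions, outcomes, beta=0.5):
    """Deterministic Weighted Majority on 0/1 predictions (illustrative sketch).

    expert_predictions: one list per trial, holding each expert's 0/1 prediction
    outcomes:           the true 0/1 outcome of each trial
    beta:               assumed factor applied to the weight of a wrong expert
    Returns (number of mistakes made by the algorithm, final weights).
    """
    n = len(expert_predictions[0])
    weights = [1.0] * n
    mistakes = 0
    for preds, outcome in zip(expert_predictions, outcomes):
        # Predict with the side that carries more total weight.
        vote_one = sum(w for w, p in zip(weights, preds) if p == 1)
        vote_zero = sum(w for w, p in zip(weights, preds) if p == 0)
        guess = 1 if vote_one >= vote_zero else 0
        if guess != outcome:
            mistakes += 1
        # Penalize every expert that was wrong on this trial.
        weights = [
            w * beta if p != outcome else w
            for w, p in zip(weights, preds)
        ]
    return mistakes, weights
```

With one expert that is always right and three that are always wrong, the algorithm errs only until the wrong experts' combined weight falls below that of the correct one.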
Boosting and Rocchio Applied to Text Filtering
In Proceedings of ACM SIGIR, 1998
Cited by 103 (2 self)
We discuss two learning algorithms for text filtering: modified Rocchio and a boosting algorithm called AdaBoost. We show how both algorithms can be adapted to maximize any general utility matrix that associates cost (or gain) for each pair of machine prediction and correct label. We first show that AdaBoost significantly outperforms another highly effective text filtering algorithm. We then compare AdaBoost and Rocchio over three large text filtering tasks. Overall both algorithms are comparable and are quite effective. AdaBoost produces better classifiers than Rocchio when the training collection contains a very large number of relevant documents. However, on these tasks, Rocchio runs much faster than AdaBoost.
PAC-Bayesian Model Averaging
In Proceedings of the Twelfth Annual Conference on Computational Learning Theory, 1999
Cited by 80 (2 self)
PAC-Bayesian learning methods combine the informative priors of Bayesian methods with distribution-free PAC guarantees. Building on earlier methods for PAC-Bayesian model selection, this paper presents a method for PAC-Bayesian model averaging. The main result is a bound on generalization error of an arbitrary weighted mixture of concepts that depends on the empirical error of that mixture and the KL-divergence of the mixture from the prior. A simple characterization is also given for the error bound achieved by the optimal weighting.
Competitive online statistics
International Statistical Review, 1999
Cited by 65 (10 self)
A radically new approach to statistical modelling, which combines mathematical techniques of Bayesian statistics with the philosophy of the theory of competitive online algorithms, has arisen over the last decade in computer science (to a large degree, under the influence of Dawid’s prequential statistics). In this approach, which we call “competitive online statistics”, it is not assumed that data are generated by some stochastic mechanism; the bounds derived for the performance of competitive online statistical procedures are guaranteed to hold (and not just hold with high probability or on the average). This paper reviews some results in this area; the new material in it includes the proofs for the performance of the Aggregating Algorithm in the problem of linear regression with square loss. Keywords: Bayes’s rule, competitive online algorithms, linear regression, prequential statistics, worst-case analysis.
PAC-Bayesian stochastic model selection
Machine Learning, 2003
Cited by 61 (2 self)
PAC-Bayesian learning methods combine the informative priors of Bayesian methods with distribution-free PAC guarantees. Stochastic model selection predicts a class label by stochastically sampling a classifier according to a "posterior distribution" on classifiers. This paper gives a PAC-Bayesian performance guarantee for stochastic model selection that is superior to analogous guarantees for deterministic model selection. The guarantee is stated in terms of the training error of the stochastic classifier and the KL-divergence of the posterior from the prior. It is shown that the posterior optimizing the performance guarantee is a Gibbs distribution. Simpler posterior distributions are also derived that have nearly optimal performance guarantees.
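The two quantities the guarantee is stated in, and the Gibbs form of the posterior, can be made concrete for a finite set of classifiers. The sketch below is illustrative only and does not reproduce the paper's bound: the function names and the inverse-temperature parameter `beta` are assumptions. It shows a Gibbs posterior that reweights the prior exponentially in training error, the expected training error of a classifier sampled from that posterior, and the KL-divergence of the posterior from the prior.

```python
import math


def gibbs_posterior(prior, train_errors, beta):
    """Gibbs distribution: prior reweighted by exp(-beta * training error).

    beta is an assumed inverse-temperature parameter; beta = 0 returns the prior.
    """
    unnormalized = [p * math.exp(-beta * e) for p, e in zip(prior, train_errors)]
    z = sum(unnormalized)
    return [u / z for u in unnormalized]


def kl_divergence(q, p):
    """KL(q || p) for discrete distributions; terms with q_i = 0 contribute 0."""
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0)


def stochastic_training_error(posterior, train_errors):
    """Expected training error of a classifier sampled from the posterior."""
    return sum(q * e for q, e in zip(posterior, train_errors))
```

Raising `beta` lowers the training error of the stochastic classifier but moves the posterior further from the prior, which increases the KL term; a guarantee of the kind described above trades these two quantities off against each other.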