Results 1  10
of
89
The Weighted Majority Algorithm
, 1994
"... We study the construction of prediction algorithms in a situation in which a learner faces a sequence of trials, with a prediction to be made in each, and the goal of the learner is to make few mistakes. We are interested in the case that the learner has reason to believe that one of some pool of kn ..."
Abstract

Cited by 678 (39 self)
 Add to MetaCart
We study the construction of prediction algorithms in a situation in which a learner faces a sequence of trials, with a prediction to be made in each, and the goal of the learner is to make few mistakes. We are interested in the case that the learner has reason to believe that one of some pool of known algorithms will perform well, but the learner does not know which one. A simple and effective method, based on weighted voting, is introduced for constructing a compound algorithm in such a circumstance. We call this method the Weighted Majority Algorithm. We show that this algorithm is robust in the presence of errors in the data. We discuss various versions of the Weighted Majority Algorithm and prove mistake bounds for them that are closely related to the mistake bounds of the best algorithms of the pool. For example, given a sequence of trials, if there is an algorithm in the pool A that makes at most m mistakes then the Weighted Majority Algorithm will make at most c(log jAj + m) mi...
Online passiveaggressive algorithms
 JMLR
, 2006
"... We present a unified view for online classification, regression, and uniclass problems. This view leads to a single algorithmic framework for the three problems. We prove worst case loss bounds for various algorithms for both the realizable case and the nonrealizable case. The end result is new alg ..."
Abstract

Cited by 293 (22 self)
 Add to MetaCart
We present a unified view for online classification, regression, and uniclass problems. This view leads to a single algorithmic framework for the three problems. We prove worst case loss bounds for various algorithms for both the realizable case and the nonrealizable case. The end result is new algorithms and accompanying loss bounds for hingeloss regression and uniclass. We also get refined loss bounds for previously studied classification algorithms. 1
Exponentiated Gradient Versus Gradient Descent for Linear Predictors
 Information and Computation
, 1995
"... this paper, we concentrate on linear predictors . To any vector u 2 R ..."
Abstract

Cited by 247 (12 self)
 Add to MetaCart
this paper, we concentrate on linear predictors . To any vector u 2 R
Tracking the best expert
 In Proceedings of the 12th International Conference on Machine Learning
, 1995
"... Abstract. We generalize the recent relative loss bounds for online algorithms where the additional loss of the algorithm on the whole sequence of examples over the loss of the best expert is bounded. The generalization allows the sequence to be partitioned into segments, and the goal is to bound th ..."
Abstract

Cited by 198 (18 self)
 Add to MetaCart
Abstract. We generalize the recent relative loss bounds for online algorithms where the additional loss of the algorithm on the whole sequence of examples over the loss of the best expert is bounded. The generalization allows the sequence to be partitioned into segments, and the goal is to bound the additional loss of the algorithm over the sum of the losses of the best experts for each segment. This is to model situations in which the examples change and different experts are best for certain segments of the sequence of examples. In the single segment case, the additional loss is proportional to log n, where n is the number of experts and the constant of proportionality depends on the loss function. Our algorithms do not produce the best partition; however the loss bound shows that our predictions are close to those of the best partition. When the number of segments is k +1and the sequence is of length ℓ, we can bound the additional loss of our algorithm over the best partition by O(k log n + k log(ℓ/k)). For the case when the loss per trial is bounded by one, we obtain an algorithm whose additional loss over the loss of the best partition is independent of the length of the sequence. The additional loss becomes O(k log n + k log(L/k)), where L is the loss of the best partition with k +1segments. Our algorithms for tracking the predictions of the best expert are simple adaptations of Vovk’s original algorithm for the single best expert case. As in the original algorithms, we keep one weight per expert, and spend O(1) time per weight in each trial.
A Dynamic Disk SpinDown Technique for Mobile Computing
, 1996
"... We address the problem of deciding when to spin down the disk of a mobile computer in order to extend battery life. Since one of the most critical resources in mobile computing environments is battery life, good energy conservation methods can dramatically increase the utility of mobile systems. We ..."
Abstract

Cited by 154 (7 self)
 Add to MetaCart
We address the problem of deciding when to spin down the disk of a mobile computer in order to extend battery life. Since one of the most critical resources in mobile computing environments is battery life, good energy conservation methods can dramatically increase the utility of mobile systems. We use a simple and efficient algorithm based on machine learning techniques that has excellent performance in practice. Our experimental results are based on traces collected from HP C2474s disks. Using this data, the algorithm outperforms several algorithms that are theoretically optimal in under various worstcase assumptions, as well as the best fixed timeout strategy. In particular, the algorithm reduces the power consumption of the disk to about half (depending on the disk's properties) of the energy consumed by a one minute fixed timeout. Since the algorithm adapts to usage patterns, it uses as little as 88% of the energy consumed by the best fixed timeout computed in retrospect. 1 In...
Empirical Support for Winnow and WeightedMajority Algorithms: Results on a Calendar Scheduling Domain
 Machine Learning
, 1995
"... This paper describes experimental results on using Winnow and WeightedMajority based algorithms on a realworld calendar scheduling domain. These two algorithms have been highly studied in the theoretical machine learning literature. We show here that these algorithms can be quite competitive pract ..."
Abstract

Cited by 127 (4 self)
 Add to MetaCart
This paper describes experimental results on using Winnow and WeightedMajority based algorithms on a realworld calendar scheduling domain. These two algorithms have been highly studied in the theoretical machine learning literature. We show here that these algorithms can be quite competitive practically, outperforming the decisiontree approach currently in use in the Calendar Apprentice system in terms of both accuracy and speed. One of the contributions of this paper is a new variant on the Winnow algorithm (used in the experiments) that is especially suited to conditions with stringvalued classifications, and we give a theoretical analysis of its performance. In addition we show how Winnow can be applied to achieve a good accuracy/coverage tradeoff and explore issues that arise such as concept drift. We also provide an analysis of a policy for discarding predictors in WeightedMajority that allows it to speed up as it learns. Keywords: Winnow, WeightedMajority, Multiplicative alg...
Bounds on the Sample Complexity of Bayesian Learning Using Information Theory and the VC Dimension
 Machine Learning
, 1994
"... In this paper we study a Bayesian or averagecase model of concept learning with a twofold goal: to provide more precise characterizations of learning curve (sample complexity) behavior that depend on properties of both the prior distribution over concepts and the sequence of instances seen by the l ..."
Abstract

Cited by 108 (12 self)
 Add to MetaCart
In this paper we study a Bayesian or averagecase model of concept learning with a twofold goal: to provide more precise characterizations of learning curve (sample complexity) behavior that depend on properties of both the prior distribution over concepts and the sequence of instances seen by the learner, and to smoothly unite in a common framework the popular statistical physics and VC dimension theories of learning curves. To achieve this, we undertake a systematic investigation and comparison of two fundamental quantities in learning and information theory: the probability of an incorrect prediction for an optimal learning algorithm, and the Shannon information gain. This study leads to a new understanding of the sample complexity of learning in several existing models. 1 Introduction Consider a simple concept learning model in which the learner attempts to infer an unknown target concept f , chosen from a known concept class F of f0; 1gvalued functions over an instance space X....
Active learning with committees for text categorization
 In proceedings of the Fourteenth National Conference on Artificial Intelligence
, 1997
"... In many realworld domains, supervised learning requires a large number of training examples. In this paper, we describe an active learning method that uses a committee of learners to reduce the number of training examples required for learning. Our approach is similar to the Query by Committee fram ..."
Abstract

Cited by 81 (0 self)
 Add to MetaCart
In many realworld domains, supervised learning requires a large number of training examples. In this paper, we describe an active learning method that uses a committee of learners to reduce the number of training examples required for learning. Our approach is similar to the Query by Committee framework, where disagreement among the committee members on the predicted label for the input part of the example is used to signal the need for knowing the actual value of the label. Our experiments are conducted in the text categorization domain, which is characterized by a large number of features, many ofwhich are irrelevant. We report here on experiments using a committee of Winnowbased learners and demonstrate that this approach can reduce the number of labeled training examples required over that used by a single Winnow learner by 12 orders of magnitude. 1.
Tracking the Best Disjunction
 Machine Learning
, 1995
"... . Littlestone developed a simple deterministic online learning algorithm for learning kliteral disjunctions. This algorithm (called Winnow) keeps one weight for each of the n variables and does multiplicative updates to its weights. We develop a randomized version of Winnow and prove bounds for a ..."
Abstract

Cited by 74 (11 self)
 Add to MetaCart
. Littlestone developed a simple deterministic online learning algorithm for learning kliteral disjunctions. This algorithm (called Winnow) keeps one weight for each of the n variables and does multiplicative updates to its weights. We develop a randomized version of Winnow and prove bounds for an adaptation of the algorithm for the case when the disjunction may change over time. In this case a possible target disjunction schedule T is a sequence of disjunctions (one per trial) and the shift size is the total number of literals that are added/removed from the disjunctions as one progresses through the sequence. We develop an algorithm that predicts nearly as well as the best disjunction schedule for an arbitrary sequence of examples. This algorithm that allows us to track the predictions of the best disjunction is hardly more complex than the original version. However the amortized analysis needed for obtaining worstcase mistake bounds requires new techniques. In some cases our low...
The Relaxed Online Maximum Margin Algorithm
 Machine Learning
, 2000
"... We describe a new incremental algorithm for training linear threshold functions: the Relaxed Online Maximum Margin Algorithm, or ROMMA. ROMMA can be viewed as an approximation to the algorithm that repeatedly chooses the hyperplane that classifies previously seen examples correctly with the maximum ..."
Abstract

Cited by 73 (1 self)
 Add to MetaCart
We describe a new incremental algorithm for training linear threshold functions: the Relaxed Online Maximum Margin Algorithm, or ROMMA. ROMMA can be viewed as an approximation to the algorithm that repeatedly chooses the hyperplane that classifies previously seen examples correctly with the maximum margin. It is known that such a maximummargin hypothesis can be computed by minimizing the length of the weight vector subject to a number of linear constraints. ROMMA works by maintaining a relatively simple relaxation of these constraints that can be eciently updated. We prove a mistake bound for ROMMA that is the same as that proved for the perceptron algorithm. Our analysis implies that the more computationally intensive maximummargin algorithm also satis es this mistake bound; this is the rst worstcase performance guarantee for this algorithm. We describe some experiments using ROMMA and a variant that updates its hypothesis more aggressively as batch algorithms to recognize handwr...