Results 1  10
of
87
Online Learning with Kernels
, 2003
"... Kernel based algorithms such as support vector machines have achieved considerable success in various problems in the batch setting where all of the training data is available in advance. Support vector machines combine the socalled kernel trick with the large margin idea. There has been little u ..."
Abstract

Cited by 2831 (123 self)
 Add to MetaCart
(Show Context)
Kernel based algorithms such as support vector machines have achieved considerable success in various problems in the batch setting where all of the training data is available in advance. Support vector machines combine the socalled kernel trick with the large margin idea. There has been little use of these methods in an online setting suitable for realtime applications. In this paper we consider online learning in a Reproducing Kernel Hilbert Space. By considering classical stochastic gradient descent within a feature space, and the use of some straightforward tricks, we develop simple and computationally efficient algorithms for a wide range of problems such as classification, regression, and novelty detection. In addition to allowing the exploitation of the kernel trick in an online setting, we examine the value of large margins for classification in the online setting with a drifting target. We derive worst case loss bounds and moreover we show the convergence of the hypothesis to the minimiser of the regularised risk functional. We present some experimental results that support the theory as well as illustrating the power of the new algorithms for online novelty detection. In addition
Tracking the best expert
, 1998
"... We generalize the recent relative loss bounds for online algorithms where the additional loss of the algorithm on the whole sequence of examples over the loss of the best expert is bounded. The generalization allows the sequence to be partitioned into segments, and the goal is to bound the additi ..."
Abstract

Cited by 248 (22 self)
 Add to MetaCart
We generalize the recent relative loss bounds for online algorithms where the additional loss of the algorithm on the whole sequence of examples over the loss of the best expert is bounded. The generalization allows the sequence to be partitioned into segments, and the goal is to bound the additional loss of the algorithm over the sum of the losses of the best experts for each segment. This is to model situations in which the examples change and different experts are best for certain segments of the sequence of examples. In the single segment case, the additional loss is proportional to log n, where n is the number of experts and the constant of proportionality depends on the loss function. Our algorithms do not produce the best partition; however the loss bound shows that our predictions are close to those of the best partition. When the number of segments is k +1and the sequence is of length ℓ, we can bound the additional loss of our algorithm over the best partition by O(k log n + k log(ℓ/k)). For the case when the loss per trial is bounded by one, we obtain an algorithm whose additional loss over the loss of the best partition is independent of the length of the sequence. The additional loss becomes O(k log n + k log(L/k)), where L is the loss of the best partition with k +1segments. Our algorithms for tracking the predictions of the best expert are simple adaptations of Vovk’s original algorithm for the single best expert case. As in the original algorithms, we keep one weight per expert, and spend O(1) time per weight in each trial.
Supervised Machine Learning: A Review of Classification Techniques. Informatica 31:249–268
, 2007
"... Supervised machine learning is the search for algorithms that reason from externally supplied instances to produce general hypotheses, which then make predictions about future instances. In other words, the goal of supervised learning is to build a concise model of the distribution of class labels i ..."
Abstract

Cited by 188 (0 self)
 Add to MetaCart
Supervised machine learning is the search for algorithms that reason from externally supplied instances to produce general hypotheses, which then make predictions about future instances. In other words, the goal of supervised learning is to build a concise model of the distribution of class labels in terms of predictor features. The resulting classifier is then used to assign class labels to the testing instances where the values of the predictor features are known, but the value of the class label is unknown. This paper describes various supervised machine learning classification techniques. Of course, a single article cannot be a complete review of all supervised machine learning classification algorithms (also known induction classification algorithms), yet we hope that the references cited will cover the major theoretical issues, guiding the researcher in interesting research directions and suggesting possible bias combinations that have yet to be explored. Povzetek: Podan je pregled metod strojnega učenja. 1
Using Confidence Bounds for ExploitationExploration Tradeoffs
 JOURNAL OF MACHINE LEARNING RESEARCH
, 2002
"... We show how a standard tool from statistics  namely confidence bounds  can be used to elegantly deal with situations which exhibit an exploitationexploration tradeoff. Our technique for designing and analyzing algorithms for such situations is general and can be applied when an algorithm ..."
Abstract

Cited by 182 (4 self)
 Add to MetaCart
We show how a standard tool from statistics  namely confidence bounds  can be used to elegantly deal with situations which exhibit an exploitationexploration tradeoff. Our technique for designing and analyzing algorithms for such situations is general and can be applied when an algorithm has to make exploitationversusexploration decisions based on uncertain information provided by a random process. We apply our
On the Generalization Ability of Online Learning Algorithms
 IEEE Transactions on Information Theory
, 2001
"... In this paper we show that online algorithms for classification and regression can be naturally used to obtain hypotheses with good datadependent tail bounds on their risk. Our results are proven without requiring complicated concentrationofmeasure arguments and they hold for arbitrary onlin ..."
Abstract

Cited by 176 (7 self)
 Add to MetaCart
(Show Context)
In this paper we show that online algorithms for classification and regression can be naturally used to obtain hypotheses with good datadependent tail bounds on their risk. Our results are proven without requiring complicated concentrationofmeasure arguments and they hold for arbitrary online learning algorithms. Furthermore, when applied to concrete online algorithms, our results yield tail bounds that in many cases are comparable or better than the best known bounds.
Adaptive and SelfConfident OnLine Learning Algorithms
, 2000
"... We study online learning in the linear regression framework. Most of the performance bounds for online algorithms in this framework assume a constant learning rate. To achieve these bounds the learning rate must be optimized based on a posteriori information. This information depends on the wh ..."
Abstract

Cited by 97 (8 self)
 Add to MetaCart
We study online learning in the linear regression framework. Most of the performance bounds for online algorithms in this framework assume a constant learning rate. To achieve these bounds the learning rate must be optimized based on a posteriori information. This information depends on the whole sequence of examples and thus it is not available to any strictly online algorithm. We introduce new techniques for adaptively tuning the learning rate as the data sequence is progressively revealed. Our techniques allow us to prove essentially the same bounds as if we knew the optimal learning rate in advance. Moreover, such techniques apply to a wide class of online algorithms, including pnorm algorithms for generalized linear regression and Weighted Majority for linear regression with absolute loss. Our adaptive tunings are radically dierent from previous techniques, such as the socalled doubling trick. Whereas the doubling trick restarts the online algorithm several ti...
Competitive online statistics
 International Statistical Review
, 1999
"... A radically new approach to statistical modelling, which combines mathematical techniques of Bayesian statistics with the philosophy of the theory of competitive online algorithms, has arisen over the last decade in computer science (to a large degree, under the influence of Dawid’s prequential sta ..."
Abstract

Cited by 97 (15 self)
 Add to MetaCart
(Show Context)
A radically new approach to statistical modelling, which combines mathematical techniques of Bayesian statistics with the philosophy of the theory of competitive online algorithms, has arisen over the last decade in computer science (to a large degree, under the influence of Dawid’s prequential statistics). In this approach, which we call “competitive online statistics”, it is not assumed that data are generated by some stochastic mechanism; the bounds derived for the performance of competitive online statistical procedures are guaranteed to hold (and not just hold with high probability or on the average). This paper reviews some results in this area; the new material in it includes the proofs for the performance of the Aggregating Algorithm in the problem of linear regression with square loss. Keywords: Bayes’s rule, competitive online algorithms, linear regression, prequential statistics, worstcase analysis.
Online portfolio selection using multiplicative updates
 Mathematical Finance
, 1998
"... We present an online investment algorithm which achieves almost the same wealth as the best constantrebalanced portfolio determined in hindsight from the actual market outcomes. The algorithm employs a multiplicative update rule derived using a framework introduced by Kivinen and Warmuth. Our algo ..."
Abstract

Cited by 91 (9 self)
 Add to MetaCart
(Show Context)
We present an online investment algorithm which achieves almost the same wealth as the best constantrebalanced portfolio determined in hindsight from the actual market outcomes. The algorithm employs a multiplicative update rule derived using a framework introduced by Kivinen and Warmuth. Our algorithm is very simple to implement and requires only constant storage and computing time per stock ineach trading period. We tested the performance of our algorithm on real stock data from the New York Stock Exchange accumulated during a 22year period. On this data, our algorithm clearly outperforms the best single stock aswell as Cover's universal portfolio selection algorithm. We also present results for the situation in which the We present an online investment algorithm which achieves almost the same wealth as the best constantrebalanced portfolio investment strategy. The algorithm employsamultiplicative update rule derived using a framework introduced by Kivinen and Warmuth [20]. Our algorithm is very simple to implement and its time and storage requirements grow linearly in the number of stocks.
A secondorder perceptron algorithm
, 2005
"... Kernelbased linearthreshold algorithms, such as support vector machines and Perceptronlike algorithms, are among the best available techniques for solving pattern classification problems. In this paper, we describe an extension of the classical Perceptron algorithm, called secondorder Perceptr ..."
Abstract

Cited by 83 (23 self)
 Add to MetaCart
(Show Context)
Kernelbased linearthreshold algorithms, such as support vector machines and Perceptronlike algorithms, are among the best available techniques for solving pattern classification problems. In this paper, we describe an extension of the classical Perceptron algorithm, called secondorder Perceptron, and analyze its performance within the mistake bound model of online learning. The bound achieved by our algorithm depends on the sensitivity to secondorder data information and is the best known mistake bound for (efficient) kernelbased linearthreshold classifiers to date. This mistake bound, which strictly generalizes the wellknown Perceptron bound, is expressed in terms of the eigenvalues of the empirical data correlation matrix and depends on a parameter controlling the sensitivity of the algorithm to the distribution of these eigenvalues. Since the optimal setting of this parameter is not known a priori, we also analyze two variants of the secondorder Perceptron algorithm: one that adaptively sets the value of the parameter in terms of the number of mistakes made so far, and one that is parameterless, based on pseudoinverses.
Tracking the Best Linear Predictor
 Journal of Machine Learning Research
, 2001
"... In most online learning research the total online loss of the algorithm is compared to the total loss of the best offline predictor u from a comparison class of predictors. We call such bounds static bounds. The interesting feature of these bounds is that they hold for an arbitrary sequence of ex ..."
Abstract

Cited by 79 (16 self)
 Add to MetaCart
(Show Context)
In most online learning research the total online loss of the algorithm is compared to the total loss of the best offline predictor u from a comparison class of predictors. We call such bounds static bounds. The interesting feature of these bounds is that they hold for an arbitrary sequence of examples. Recently some work has been done where the predictor u t at each trial t is allowed to change with time, and the total online loss of the algorithm is compared to the sum of the losses of u t at each trial plus the total "cost" for shifting to successive predictors. This is to model situations in which the examples change over time, and different predictors from the comparison class are best for different segments of the sequence of examples. We call such bounds shifting bounds. They hold for arbitrary sequences of examples and arbitrary sequences of predictors. Naturally shifting bounds are much harder to prove. The only known bounds are for the case when the comparison class consists of a sequences of experts or boolean disjunctions. In this paper we develop the methodology for lifting known static bounds to the shifting case. In particular we obtain bounds when the comparison class consists of linear neurons (linear combinations of experts). Our essential technique is to project the hypothesis of the static algorithm at the end of each trial into a suitably chosen convex region. This keeps the hypothesis of the algorithm wellbehaved and the static bounds can be converted to shifting bounds.