Results 1  10
of
56
Online passiveaggressive algorithms
 JMLR
, 2006
"... We present a unified view for online classification, regression, and uniclass problems. This view leads to a single algorithmic framework for the three problems. We prove worst case loss bounds for various algorithms for both the realizable case and the nonrealizable case. The end result is new alg ..."
Abstract

Cited by 293 (22 self)
 Add to MetaCart
We present a unified view for online classification, regression, and uniclass problems. This view leads to a single algorithmic framework for the three problems. We prove worst case loss bounds for various algorithms for both the realizable case and the nonrealizable case. The end result is new algorithms and accompanying loss bounds for hingeloss regression and uniclass. We also get refined loss bounds for previously studied classification algorithms. 1
On the Generalization Ability of Online Learning Algorithms
 IEEE Transactions on Information Theory
, 2001
"... In this paper we show that online algorithms for classification and regression can be naturally used to obtain hypotheses with good datadependent tail bounds on their risk. Our results are proven without requiring complicated concentrationofmeasure arguments and they hold for arbitrary onlin ..."
Abstract

Cited by 133 (8 self)
 Add to MetaCart
In this paper we show that online algorithms for classification and regression can be naturally used to obtain hypotheses with good datadependent tail bounds on their risk. Our results are proven without requiring complicated concentrationofmeasure arguments and they hold for arbitrary online learning algorithms. Furthermore, when applied to concrete online algorithms, our results yield tail bounds that in many cases are comparable or better than the best known bounds.
A New Approximate Maximal Margin Classification Algorithm
 JOURNAL OF MACHINE LEARNING RESEARCH
, 2001
"... A new incremental learning algorithm is described which approximates the maximal margin hyperplane w.r.t. norm p 2 for a set of linearly separable data. Our algorithm, called alma p (Approximate Large Margin algorithm w.r.t. norm p), takes O (p 1) 2 2 corrections to separate the data wi ..."
Abstract

Cited by 87 (6 self)
 Add to MetaCart
A new incremental learning algorithm is described which approximates the maximal margin hyperplane w.r.t. norm p 2 for a set of linearly separable data. Our algorithm, called alma p (Approximate Large Margin algorithm w.r.t. norm p), takes O (p 1) 2 2 corrections to separate the data with pnorm margin larger than (1 ) , where is the (normalized) pnorm margin of the data. alma p avoids quadratic (or higherorder) programming methods. It is very easy to implement and is as fast as online algorithms, such as Rosenblatt's Perceptron algorithm. We performed extensive experiments on both realworld and artificial datasets. We compared alma 2 (i.e., alma p with p = 2) to standard Support vector Machines (SVM) and to two incremental algorithms: the Perceptron algorithm and Li and Long's ROMMA. The accuracy levels achieved by alma 2 are superior to those achieved by the Perceptron algorithm and ROMMA, but slightly inferior to SVM's. On the other hand, alma 2 is quite faster and easier to implement than standard SVM training algorithms. When learning sparse target vectors, alma p with p > 2 largely outperforms Perceptronlike algorithms, such as alma 2 .
General convergence results for linear discriminant updates
 Machine Learning
, 1997
"... Abstract. The problem of learning lineardiscriminant concepts can be solved by various mistakedriven update procedures, including the Winnow family of algorithms and the wellknown Perceptron algorithm. In this paper we define the general class of “quasiadditive ” algorithms, which includes Perce ..."
Abstract

Cited by 84 (0 self)
 Add to MetaCart
Abstract. The problem of learning lineardiscriminant concepts can be solved by various mistakedriven update procedures, including the Winnow family of algorithms and the wellknown Perceptron algorithm. In this paper we define the general class of “quasiadditive ” algorithms, which includes Perceptron and Winnow as special cases. We give a single proof of convergence that covers a broad subset of algorithms in this class, including both Perceptron and Winnow, but also many new algorithms. Our proof hinges on analyzing a generic measure of progress construction that gives insight as to when and how such algorithms converge. Our measure of progress construction also permits us to obtain good mistake bounds for individual algorithms. We apply our unified analysis to new algorithms as well as existing algorithms. When applied to known algorithms, our method “automatically ” produces close variants of existing proofs (recovering similar bounds)—thus showing that, in a certain sense, these seemingly diverse results are fundamentally isomorphic. However, we also demonstrate that the unifying principles are more broadly applicable, and analyze a new class of algorithms that smoothly interpolate between the additiveupdate behavior of Perceptron and the multiplicativeupdate behavior of Winnow.
Relative Loss Bounds for Multidimensional Regression Problems
 MACHINE LEARNING
, 2001
"... We study online generalized linear regression with multidimensional outputs, i.e., neural networks with multiple output nodes but no hidden nodes. We allow at the final layer transfer functions such as the softmax function that need to consider the linear activations to all the output neurons. The ..."
Abstract

Cited by 72 (12 self)
 Add to MetaCart
We study online generalized linear regression with multidimensional outputs, i.e., neural networks with multiple output nodes but no hidden nodes. We allow at the final layer transfer functions such as the softmax function that need to consider the linear activations to all the output neurons. The weight vectors used to produce the linear activations are represented indirectly by maintaining separate parameter vectors. We get the weight vector by applying a particular parameterization function to the parameter vector. Updating the parameter vectors upon seeing new examples is done additively, as in the usual gradient descent update. However, by using a nonlinear parameterization function between the parameter vectors and the weight vectors, we can make the resulting update of the weight vector quite different from a true gradient descent update. To analyse such updates, we define a notion of a matching loss function and apply it both to the transfer function and to the parameterization function. The loss function that matches the transfer function is used to measure the goodness of the predictions of the algorithm. The loss function that matches the parameterization function can be used both as a measure of divergence between models in motivating the update rule of the algorithm and as a measure of progress in analyzing its relative performance compared to an arbitrary fixed model. As a result, we have a unified treatment that generalizes earlier results for the gradient descent and exponentiated gradient algorithms to multidimensional outputs, including multiclass logistic regression.
Adaptive and SelfConfident OnLine Learning Algorithms
, 2000
"... We study online learning in the linear regression framework. Most of the performance bounds for online algorithms in this framework assume a constant learning rate. To achieve these bounds the learning rate must be optimized based on a posteriori information. This information depends on the wh ..."
Abstract

Cited by 62 (7 self)
 Add to MetaCart
We study online learning in the linear regression framework. Most of the performance bounds for online algorithms in this framework assume a constant learning rate. To achieve these bounds the learning rate must be optimized based on a posteriori information. This information depends on the whole sequence of examples and thus it is not available to any strictly online algorithm. We introduce new techniques for adaptively tuning the learning rate as the data sequence is progressively revealed. Our techniques allow us to prove essentially the same bounds as if we knew the optimal learning rate in advance. Moreover, such techniques apply to a wide class of online algorithms, including pnorm algorithms for generalized linear regression and Weighted Majority for linear regression with absolute loss. Our adaptive tunings are radically dierent from previous techniques, such as the socalled doubling trick. Whereas the doubling trick restarts the online algorithm several ti...
Path Kernels and Multiplicative Updates
 Journal of Machine Learning Research
, 2003
"... Kernels are typically applied to linear algorithms whose weight vector is a linear combination of the feature vectors of the examples. Online versions of these algorithms are sometimes called "additive updates" because they add a multiple of the last feature vector to the current weight vector. ..."
Abstract

Cited by 61 (7 self)
 Add to MetaCart
Kernels are typically applied to linear algorithms whose weight vector is a linear combination of the feature vectors of the examples. Online versions of these algorithms are sometimes called "additive updates" because they add a multiple of the last feature vector to the current weight vector.
Dual averaging methods for regularized stochastic learning and online optimization
 In Advances in Neural Information Processing Systems 23
, 2009
"... We consider regularized stochastic learning and online optimization problems, where the objective function is the sum of two convex terms: one is the loss function of the learning task, and the other is a simple regularization term such as ℓ1norm for promoting sparsity. We develop extensions of Nes ..."
Abstract

Cited by 60 (3 self)
 Add to MetaCart
We consider regularized stochastic learning and online optimization problems, where the objective function is the sum of two convex terms: one is the loss function of the learning task, and the other is a simple regularization term such as ℓ1norm for promoting sparsity. We develop extensions of Nesterov’s dual averaging method, that can exploit the regularization structure in an online setting. At each iteration of these methods, the learning variables are adjusted by solving a simple minimization problem that involves the running average of all past subgradients of the loss function and the whole regularization term, not just its subgradient. In the case of ℓ1regularization, our method is particularly effective in obtaining sparse solutions. We show that these methods achieve the optimal convergence rates or regret bounds that are standard in the literature on stochastic and online convex optimization. For stochastic learning problems in which the loss functions have Lipschitz continuous gradients, we also present an accelerated version of the dual averaging method.
A secondorder perceptron algorithm
, 2005
"... Kernelbased linearthreshold algorithms, such as support vector machines and Perceptronlike algorithms, are among the best available techniques for solving pattern classification problems. In this paper, we describe an extension of the classical Perceptron algorithm, called secondorder Perceptr ..."
Abstract

Cited by 59 (21 self)
 Add to MetaCart
Kernelbased linearthreshold algorithms, such as support vector machines and Perceptronlike algorithms, are among the best available techniques for solving pattern classification problems. In this paper, we describe an extension of the classical Perceptron algorithm, called secondorder Perceptron, and analyze its performance within the mistake bound model of online learning. The bound achieved by our algorithm depends on the sensitivity to secondorder data information and is the best known mistake bound for (efficient) kernelbased linearthreshold classifiers to date. This mistake bound, which strictly generalizes the wellknown Perceptron bound, is expressed in terms of the eigenvalues of the empirical data correlation matrix and depends on a parameter controlling the sensitivity of the algorithm to the distribution of these eigenvalues. Since the optimal setting of this parameter is not known a priori, we also analyze two variants of the secondorder Perceptron algorithm: one that adaptively sets the value of the parameter in terms of the number of mistakes made so far, and one that is parameterless, based on pseudoinverses.
WorstCase Analysis of Selective Sampling for Linear Classification
 JOURNAL OF MACHINE LEARNING RESEARCH
, 2006
"... A selective sampling algorithm is a learning algorithm for classification that, based on the past observed data, decides whether to ask the label of each new instance to be classified. In this paper, we introduce a general technique for turning linearthreshold classification algorithms from the ..."
Abstract

Cited by 41 (5 self)
 Add to MetaCart
A selective sampling algorithm is a learning algorithm for classification that, based on the past observed data, decides whether to ask the label of each new instance to be classified. In this paper, we introduce a general technique for turning linearthreshold classification algorithms from the general additive family into randomized selective sampling algorithms. For the most popular algorithms in this family we derive mistake bounds that hold for individual sequences of examples. These bounds