Results 1  10
of
102
Large Margin Classification Using the Perceptron Algorithm
 Machine Learning
, 1998
"... We introduce and analyze a new algorithm for linear classification which combines Rosenblatt 's perceptron algorithm with Helmbold and Warmuth's leaveoneout method. Like Vapnik 's maximalmargin classifier, our algorithm takes advantage of data that are linearly separable with large margins. Compa ..."
Abstract

Cited by 415 (1 self)
 Add to MetaCart
We introduce and analyze a new algorithm for linear classification which combines Rosenblatt 's perceptron algorithm with Helmbold and Warmuth's leaveoneout method. Like Vapnik 's maximalmargin classifier, our algorithm takes advantage of data that are linearly separable with large margins. Compared to Vapnik's algorithm, however, ours is much simpler to implement, and much more efficient in terms of computation time. We also show that our algorithm can be efficiently used in very high dimensional spaces using kernel functions. We performed some experiments using our algorithm, and some variants of it, for classifying images of handwritten digits. The performance of our algorithm is close to, but not as good as, the performance of maximalmargin classifiers on the same problem, while saving significantly on computation time and programming effort. 1 Introduction One of the most influential developments in the theory of machine learning in the last few years is Vapnik's work on supp...
Online passiveaggressive algorithms
 JMLR
, 2006
"... We present a unified view for online classification, regression, and uniclass problems. This view leads to a single algorithmic framework for the three problems. We prove worst case loss bounds for various algorithms for both the realizable case and the nonrealizable case. The end result is new alg ..."
Abstract

Cited by 293 (22 self)
 Add to MetaCart
We present a unified view for online classification, regression, and uniclass problems. This view leads to a single algorithmic framework for the three problems. We prove worst case loss bounds for various algorithms for both the realizable case and the nonrealizable case. The end result is new algorithms and accompanying loss bounds for hingeloss regression and uniclass. We also get refined loss bounds for previously studied classification algorithms. 1
On the Generalization Ability of Online Learning Algorithms
 IEEE Transactions on Information Theory
, 2001
"... In this paper we show that online algorithms for classification and regression can be naturally used to obtain hypotheses with good datadependent tail bounds on their risk. Our results are proven without requiring complicated concentrationofmeasure arguments and they hold for arbitrary onlin ..."
Abstract

Cited by 133 (8 self)
 Add to MetaCart
In this paper we show that online algorithms for classification and regression can be naturally used to obtain hypotheses with good datadependent tail bounds on their risk. Our results are proven without requiring complicated concentrationofmeasure arguments and they hold for arbitrary online learning algorithms. Furthermore, when applied to concrete online algorithms, our results yield tail bounds that in many cases are comparable or better than the best known bounds.
Fast Kernel Classifiers With Online And Active Learning
 JOURNAL OF MACHINE LEARNING RESEARCH
, 2005
"... Very high dimensional learning systems become theoretically possible when training examples are abundant. The computing cost then becomes the limiting factor. Any efficient learning algorithm should at least take a brief look at each example. But should all examples be given equal attention? This ..."
Abstract

Cited by 102 (17 self)
 Add to MetaCart
Very high dimensional learning systems become theoretically possible when training examples are abundant. The computing cost then becomes the limiting factor. Any efficient learning algorithm should at least take a brief look at each example. But should all examples be given equal attention? This contribution proposes an empirical answer. We first present an online SVM algorithm based on this premise. LASVM yields competitive misclassification rates after a single pass over the training examples, outspeeding stateoftheart SVM solvers. Then we show how active example selection can yield faster training, higher accuracies, and simpler models, using only a fraction of the training example labels.
Fast Binary Feature Selection with Conditional Mutual Information
 Journal of Machine Learning Research
, 2004
"... We propose in this paper a very fast feature selection technique based on conditional mutual information. ..."
Abstract

Cited by 101 (1 self)
 Add to MetaCart
We propose in this paper a very fast feature selection technique based on conditional mutual information.
Online Bayes Point Machines
"... We present a new and simple algorithm for learning large margin classi ers that works in a truly online manner. The algorithm generates a linear classi er by averaging the weights associated with several perceptronlike algorithms run in parallel in order to approximate the Bayes point. A rand ..."
Abstract

Cited by 69 (3 self)
 Add to MetaCart
We present a new and simple algorithm for learning large margin classi ers that works in a truly online manner. The algorithm generates a linear classi er by averaging the weights associated with several perceptronlike algorithms run in parallel in order to approximate the Bayes point. A random subsample of the incoming data stream is used to ensure diversity in the perceptron solutions. We experimentally study the algorithm's performance on online and batch learning settings.
On kernel methods for relational learning
 In Proc. of the International Conference on Machine Learning
, 2003
"... Kernel methods have gained a great deal of popularity in the machine learning community as a method to learn indirectly in highdimensional feature spaces. Those interested in relational learning have recently begun to cast learning from structured and relational data in terms of kernel operations. W ..."
Abstract

Cited by 65 (5 self)
 Add to MetaCart
Kernel methods have gained a great deal of popularity in the machine learning community as a method to learn indirectly in highdimensional feature spaces. Those interested in relational learning have recently begun to cast learning from structured and relational data in terms of kernel operations. We describe a general family of kernel functions built up from a description language of limited expressivity and use it to study the benefits and drawbacks of kernel learning in relational domains. Learning with kernels in this family directly models learning over an expanded feature space constructed using the same description language. This allows us to examine issues of time complexity in terms of learning with these and other relational kernels, and how these relate to generalization ability. The tradeoffs between using kernels in a very high dimensional implicit space versus a restricted feature space, is highlighted through two experiments, in bioinformatics and in natural language processing. 1.
A secondorder perceptron algorithm
, 2005
"... Kernelbased linearthreshold algorithms, such as support vector machines and Perceptronlike algorithms, are among the best available techniques for solving pattern classification problems. In this paper, we describe an extension of the classical Perceptron algorithm, called secondorder Perceptr ..."
Abstract

Cited by 59 (21 self)
 Add to MetaCart
Kernelbased linearthreshold algorithms, such as support vector machines and Perceptronlike algorithms, are among the best available techniques for solving pattern classification problems. In this paper, we describe an extension of the classical Perceptron algorithm, called secondorder Perceptron, and analyze its performance within the mistake bound model of online learning. The bound achieved by our algorithm depends on the sensitivity to secondorder data information and is the best known mistake bound for (efficient) kernelbased linearthreshold classifiers to date. This mistake bound, which strictly generalizes the wellknown Perceptron bound, is expressed in terms of the eigenvalues of the empirical data correlation matrix and depends on a parameter controlling the sensitivity of the algorithm to the distribution of these eigenvalues. Since the optimal setting of this parameter is not known a priori, we also analyze two variants of the secondorder Perceptron algorithm: one that adaptively sets the value of the parameter in terms of the number of mistakes made so far, and one that is parameterless, based on pseudoinverses.
Parameter Estimation for Statistical Parsing Models: Theory and Practice of DistributionFree Methods
, 2001
"... A fundamental problem in statistical parsing is the choice of criteria and algorithms used to estimate the parameters in a model. The predominant approach in computational linguistics has been to use a parametric model with some variant of maximumlikelihood estimation. The assumptions under which m ..."
Abstract

Cited by 53 (10 self)
 Add to MetaCart
A fundamental problem in statistical parsing is the choice of criteria and algorithms used to estimate the parameters in a model. The predominant approach in computational linguistics has been to use a parametric model with some variant of maximumlikelihood estimation. The assumptions under which maximumlikelihood estimation is justified are arguably quite strong. This paper discusses the statistical theory underlying various parameterestimation methods, and gives algorithms which depend on alternatives to (smoothed) maximumlikelihood estimation. We first give an overview of results from statistical learning theory. We then show how important concepts from the classification literature  specifically, generalization results based on margins on training data  can be derived for parsing models. Finally, we describe parameter estimation algorithms which are motivated by these generalization bounds.