Results 1  10
of
136
In defense of onevsall classification
 Journal of Machine Learning Research
, 2004
"... Editor: John ShaweTaylor We consider the problem of multiclass classification. Our main thesis is that a simple “onevsall ” scheme is as accurate as any other approach, assuming that the underlying binary classifiers are welltuned regularized classifiers such as support vector machines. This the ..."
Abstract

Cited by 202 (0 self)
 Add to MetaCart
Editor: John ShaweTaylor We consider the problem of multiclass classification. Our main thesis is that a simple “onevsall ” scheme is as accurate as any other approach, assuming that the underlying binary classifiers are welltuned regularized classifiers such as support vector machines. This thesis is interesting in that it disagrees with a large body of recent published work on multiclass classification. We support our position by means of a critical review of the existing literature, a substantial collection of carefully controlled experimental work, and theoretical arguments.
Semisupervised learning on Riemannian manifolds
 Machine Learning
, 2004
"... We consider the general problem of utilizing both labeled and unlabeled data to improve classification accuracy. Under the assumption that the data lie on a submanifold in a high dimensional space, we develop an algorithmic framework to classify a partially labeled data set in a principled manner. T ..."
Abstract

Cited by 157 (8 self)
 Add to MetaCart
We consider the general problem of utilizing both labeled and unlabeled data to improve classification accuracy. Under the assumption that the data lie on a submanifold in a high dimensional space, we develop an algorithmic framework to classify a partially labeled data set in a principled manner. The central idea of our approach is that classification functions are naturally defined only on the submanifold in question rather than the total ambient space. Using the LaplaceBeltrami operator one produces a basis (the Laplacian Eigenmaps) for a Hilbert space of square integrable functions on the submanifold. To recover such a basis, only unlabeled examples are required. Once such a basis is obtained, training can be performed using the labeled data set. Our algorithm models the manifold using the adjacency graph for the data and approximates the LaplaceBeltrami operator by the graph Laplacian. We provide details of the algorithm, its theoretical justification, and several practical applications for image, speech, and text classification. 1.
On the Generalization Ability of Online Learning Algorithms
 IEEE Transactions on Information Theory
, 2001
"... In this paper we show that online algorithms for classification and regression can be naturally used to obtain hypotheses with good datadependent tail bounds on their risk. Our results are proven without requiring complicated concentrationofmeasure arguments and they hold for arbitrary onlin ..."
Abstract

Cited by 133 (8 self)
 Add to MetaCart
In this paper we show that online algorithms for classification and regression can be naturally used to obtain hypotheses with good datadependent tail bounds on their risk. Our results are proven without requiring complicated concentrationofmeasure arguments and they hold for arbitrary online learning algorithms. Furthermore, when applied to concrete online algorithms, our results yield tail bounds that in many cases are comparable or better than the best known bounds.
The mathematics of learning: Dealing with data
 Notices of the American Mathematical Society
, 2003
"... Draft for the Notices of the AMS Learning is key to developing systems tailored to a broad range of data analysis and information extraction tasks. We outline the mathematical foundations of learning theory and describe a key algorithm of it. 1 ..."
Abstract

Cited by 103 (15 self)
 Add to MetaCart
Draft for the Notices of the AMS Learning is key to developing systems tailored to a broad range of data analysis and information extraction tasks. We outline the mathematical foundations of learning theory and describe a key algorithm of it. 1
Learning the kernel function via regularization
 Journal of Machine Learning Research
, 2005
"... We study the problem of finding an optimal kernel from a prescribed convex set of kernels K for learning a realvalued function by regularization. We establish for a wide variety of regularization functionals that this leads to a convex optimization problem and, for square loss regularization, we ch ..."
Abstract

Cited by 96 (7 self)
 Add to MetaCart
We study the problem of finding an optimal kernel from a prescribed convex set of kernels K for learning a realvalued function by regularization. We establish for a wide variety of regularization functionals that this leads to a convex optimization problem and, for square loss regularization, we characterize the solution of this problem. We show that, although K may be an uncountable set, the optimal kernel is always obtained as a convex combination of at most m+2 basic kernels, where m is the number of data examples. In particular, our results apply to learning the optimal radial kernel or the optimal dot product kernel. 1.
Online Choice of Active Learning Algorithms
 JOURNAL OF MACHINE LEARNING RESEARCH
, 2004
"... This work is concerned with the question of how to combine online an ensemble of active learners so as to expedite the learning progress in poolbased active learning. We develop an activelearning master algorithm, based on a known competitive algorithm for the multiarmed bandit problem. A major ..."
Abstract

Cited by 88 (2 self)
 Add to MetaCart
This work is concerned with the question of how to combine online an ensemble of active learners so as to expedite the learning progress in poolbased active learning. We develop an activelearning master algorithm, based on a known competitive algorithm for the multiarmed bandit problem. A major challenge in successfully choosing top performing active learners online is to reliably estimate their progress during the learning session. To this end we propose a simple maximum entropy criterion that provides effective estimates in realistic settings. We study the performance of the proposed master algorithm using an ensemble containing two of the best known activelearning algorithms as well as a new algorithm. The resulting
What Can We Learn Privately?
 49TH ANNUAL IEEE SYMPOSIUM ON FOUNDATIONS OF COMPUTER SCIENCE
, 2008
"... Learning problems form an important category of computational tasks that generalizes many of the computations researchers apply to large reallife data sets. We ask: what concept classes can be learned privately, namely, by an algorithm whose output does not depend too heavily on any one input or sp ..."
Abstract

Cited by 57 (9 self)
 Add to MetaCart
Learning problems form an important category of computational tasks that generalizes many of the computations researchers apply to large reallife data sets. We ask: what concept classes can be learned privately, namely, by an algorithm whose output does not depend too heavily on any one input or specific training example? More precisely, we investigate learning algorithms that satisfy differential privacy, a notion that provides strong confidentiality guarantees in the contexts where aggregate information is released about a database containing sensitive information about individuals. We present several basic results that demonstrate general feasibility of private learning and relate several models previously studied separately in the contexts of privacy and standard learning.
AlmostEverywhere Algorithmic Stability and Generalization Error
 In UAI2002: Uncertainty in Artificial Intelligence
, 2002
"... We introduce a new notion of algorithmic stability, which we call training stability. ..."
Abstract

Cited by 44 (8 self)
 Add to MetaCart
We introduce a new notion of algorithmic stability, which we call training stability.
Statistical performance of support vector machines
 ANN. STATIST
, 2008
"... The support vector machine (SVM) algorithm is well known to the computer learning community for its very good practical results. The goal of the present paper is to study this algorithm from a statistical perspective, using tools of concentration theory and empirical processes. Our main result build ..."
Abstract

Cited by 42 (8 self)
 Add to MetaCart
The support vector machine (SVM) algorithm is well known to the computer learning community for its very good practical results. The goal of the present paper is to study this algorithm from a statistical perspective, using tools of concentration theory and empirical processes. Our main result builds on the observation made by other authors that the SVM can be viewed as a statistical regularization procedure. From this point of view, it can also be interpreted as a model selection principle using a penalized criterion. It is then possible to adapt general methods related to model selection in this framework to study two important points: (1) what is the minimum penalty and how does it compare to the penalty actually used in the SVM algorithm; (2) is it possible to obtain “oracle inequalities ” in that setting, for the specific loss function used in the SVM algorithm? We show that the answer to the latter question is positive and provides relevant insight to the former. Our result shows that it is possible to obtain fast rates of convergence for SVMs.