Results 1–10 of 18
A tutorial on support vector machines for pattern recognition
Data Mining and Knowledge Discovery, 1998
Abstract

Cited by 2272 (11 self)
The tutorial starts with an overview of the concepts of VC dimension and structural risk minimization. We then describe linear Support Vector Machines (SVMs) for separable and nonseparable data, working through a nontrivial example in detail. We describe a mechanical analogy, and discuss when SVM solutions are unique and when they are global. We describe how support vector training can be practically implemented, and discuss in detail the kernel mapping technique which is used to construct SVM solutions which are nonlinear in the data. We show how Support Vector machines can have very large (even infinite) VC dimension by computing the VC dimension for homogeneous polynomial and Gaussian radial basis function kernels. While very high VC dimension would normally bode ill for generalization performance, and while at present there exists no theory which shows that good generalization performance is guaranteed for SVMs, there are several arguments which support the observed high accuracy of SVMs, which we review. Results of some experiments which were inspired by these arguments are also presented. We give numerous examples and proofs of most of the key theorems. There is new material, and I hope that the reader will find that even old material is cast in a fresh light.
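The kernel mapping technique the abstract refers to can be made concrete in a few lines. The following sketch (our own toy example, not taken from the tutorial) verifies numerically that the homogeneous polynomial kernel of degree 2 equals an ordinary inner product in an explicit higher-dimensional feature space, which is what lets SVM solutions be nonlinear in the data without ever forming that space.

```python
import numpy as np

def poly2_feature_map(x):
    """Explicit feature map for the homogeneous polynomial kernel of
    degree 2: phi(x) lists all pairwise products x_i * x_j."""
    return np.outer(x, x).ravel()

def poly2_kernel(x, y):
    """Kernel evaluation (x . y)^2 -- never forms phi explicitly."""
    return np.dot(x, y) ** 2

rng = np.random.default_rng(0)
x, y = rng.normal(size=3), rng.normal(size=3)

# The kernel value equals the inner product of the explicit features:
# (x . y)^2 = sum_{i,j} (x_i x_j)(y_i y_j) = phi(x) . phi(y).
lhs = poly2_kernel(x, y)
rhs = np.dot(poly2_feature_map(x), poly2_feature_map(y))
print(np.isclose(lhs, rhs))  # True
```

For a 3-dimensional input the feature space already has 9 dimensions; for the Gaussian RBF kernel the corresponding feature space is infinite-dimensional, which is how the VC-dimension computations in the tutorial arise.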
An Efficient Boosting Algorithm for Combining Preferences
1999
Abstract

Cited by 515 (18 self)
The problem of combining preferences arises in several applications, such as combining the results of different search engines. This work describes an efficient algorithm for combining multiple preferences. We first give a formal framework for the problem. We then describe and analyze a new boosting algorithm for combining preferences called RankBoost. We also describe an efficient implementation of the algorithm for certain natural cases. We discuss two experiments we carried out to assess the performance of RankBoost. In the first experiment, we used the algorithm to combine different WWW search strategies, each of which is a query expansion for a given domain. For this task, we compare the performance of RankBoost to the individual search strategies. The second experiment is a collaborative-filtering task for making movie recommendations. Here, we present results comparing RankBoost to nearest-neighbor and regression algorithms.
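The core of the boosting scheme can be sketched in a toy setting. The example below (our own data and deliberately simple threshold weak rankers, a simplified sketch rather than the paper's algorithm or its efficient special cases) runs a RankBoost-style loop: a distribution over "b should rank above a" pairs, greedy selection of the weak ranker with the largest weighted agreement, and multiplicative reweighting of misordered pairs.

```python
import math

# Toy setup: items are feature vectors; a pair (a, b) means "b should
# rank above a". Weak rankers are simple feature thresholds. This is an
# illustrative sketch of the RankBoost update, not the full algorithm.
items = {"w": [0.1, 0.9], "x": [0.4, 0.3], "y": [0.7, 0.6], "z": [0.9, 0.2]}
pairs = [("w", "x"), ("x", "y"), ("y", "z")]  # desired order: z > y > x > w

def weak_rankers():
    for f in range(2):
        for theta in (0.2, 0.5, 0.8):
            yield lambda it, f=f, t=theta: 1.0 if items[it][f] > t else 0.0

D = {p: 1.0 / len(pairs) for p in pairs}  # distribution over crucial pairs
ensemble = []                             # list of (alpha, weak ranker)
for _ in range(5):
    # Pick the weak ranker with the largest weighted agreement r.
    best_h, best_r = None, 0.0
    for h in weak_rankers():
        r = sum(D[(a, b)] * (h(b) - h(a)) for a, b in pairs)
        if abs(r) > abs(best_r):
            best_h, best_r = h, r
    if best_h is None or abs(best_r) >= 1.0:
        break
    alpha = 0.5 * math.log((1 + best_r) / (1 - best_r))
    ensemble.append((alpha, best_h))
    # Reweight: pairs the chosen ranker misorders gain weight.
    for a, b in pairs:
        D[(a, b)] *= math.exp(alpha * (best_h(a) - best_h(b)))
    Z = sum(D.values())
    for p in D:
        D[p] /= Z

def score(it):
    return sum(alpha * h(it) for alpha, h in ensemble)

ranking = sorted(items, key=score, reverse=True)
print(ranking)  # ['z', 'y', 'x', 'w']
```

The final ranking is induced by a weighted vote of the selected weak rankers, which is the sense in which the combined preference can outperform any individual input ordering.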
Generalization bounds for the area under the ROC curve
 Journal of Machine Learning Research
Abstract

Cited by 48 (6 self)
We study generalization properties of the area under an ROC curve (AUC), a quantity that has been advocated as an evaluation criterion for bipartite ranking problems. The AUC is a different and more complex term than the error rate used for evaluation in classification problems; consequently, existing generalization bounds for the classification error rate cannot be used to draw conclusions about the AUC. In this paper, we define a precise notion of the expected accuracy of a ranking function (analogous to the expected error rate of a classification function), and derive distribution-free probabilistic bounds on the deviation of the empirical AUC of a ranking function (observed on a finite data sequence) from its expected accuracy. We derive both a large deviation bound, which serves to bound the expected accuracy of a ranking function in terms of its empirical AUC on a test sequence, and a uniform convergence bound, which serves to bound the expected accuracy of a learned ranking function in terms of its empirical AUC on a training sequence. Our uniform convergence bound is expressed in terms of a new set of combinatorial parameters that we term the bipartite rank-shatter coefficients; these play the same role in our result as do the standard shatter coefficients (also known variously as the counting numbers or the growth function) in uniform convergence results for the classification error rate. We also compare our result with a recent uniform convergence result derived by Freund et al. (2003) for a quantity closely related to the AUC; as we show, the bound provided by our result is considerably tighter.
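The empirical AUC that these bounds control has a simple closed form: it is the Wilcoxon–Mann–Whitney statistic, the fraction of positive–negative pairs that the ranking function orders correctly, with ties counted as one half. A minimal sketch (our own function and toy scores, not from the paper):

```python
def empirical_auc(pos_scores, neg_scores):
    """Empirical AUC of a scoring function: the fraction of
    (positive, negative) pairs ordered correctly; ties count 1/2."""
    total = len(pos_scores) * len(neg_scores)
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos_scores
        for n in neg_scores
    )
    return wins / total

# A ranker that places positives mostly, but not always, above negatives:
print(empirical_auc([0.9, 0.8, 0.4], [0.7, 0.3, 0.2]))  # 8/9
```

Because this statistic averages over pairs rather than individual examples, the classical single-example concentration arguments do not apply directly, which is what motivates the bipartite rank-shatter machinery in the paper.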
Using sample size to limit exposure to data mining
 Journal of Computer Security
"... Data mining introduces new problems in database security. The basic problem of using nonsensitive data to infer sensitive data is made more difficult by the “probabilistic” inferences possible with data mining. This paper shows how lower bounds from pattern recognition theory can be used to determi ..."
Abstract

Cited by 38 (8 self)
Data mining introduces new problems in database security. The basic problem of using nonsensitive data to infer sensitive data is made more difficult by the “probabilistic” inferences possible with data mining. This paper shows how lower bounds from pattern recognition theory can be used to determine sample sizes where data mining tools cannot obtain reliable results.
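As a hedged illustration of the general idea of sample-size limits (using the standard two-sided Hoeffding bound, not the specific lower bounds from the paper), one can compute the smallest sample below which no empirical estimate of a bounded quantity is guaranteed to be reliable at a given accuracy and confidence:

```python
import math

def hoeffding_sample_size(epsilon, delta):
    """Smallest n such that the empirical mean of n i.i.d. [0,1]-valued
    samples is within epsilon of the true mean with probability at
    least 1 - delta, by the two-sided Hoeffding bound:
        n >= ln(2/delta) / (2 * epsilon^2).
    Illustrative only; the paper uses pattern-recognition lower bounds."""
    return math.ceil(math.log(2.0 / delta) / (2.0 * epsilon ** 2))

# Accuracy 0.05 at 95% confidence requires at least this many samples:
print(hoeffding_sample_size(0.05, 0.05))  # 738
```

Restricting query results to fewer samples than such a threshold is the mechanism the abstract alludes to: below it, a data mining tool cannot be confident in what it infers.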
Concentration inequalities for the missing mass and for histogram rule error
Journal of Machine Learning Research, 2003
Abstract

Cited by 16 (1 self)
This paper gives distribution-free concentration inequalities for the missing mass and the error rate of histogram rules. Negative association methods can be used to reduce these concentration problems to concentration questions about independent sums. Although the sums are independent, they are highly heterogeneous. Such highly heterogeneous independent sums cannot be analyzed using standard concentration inequalities such as Hoeffding’s inequality, the Angluin-Valiant bound, Bernstein’s inequality, Bennett’s inequality, or McDiarmid’s theorem.
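The missing mass itself, the total probability of outcomes never seen in the sample, has a classical point estimate: the Good–Turing estimator, the fraction of sample points seen exactly once. A short sketch of the quantity whose concentration the paper studies (our own example; the paper's contribution is the concentration inequalities, not this estimator):

```python
from collections import Counter

def good_turing_missing_mass(sample):
    """Good-Turing estimate of the missing mass: the total probability
    of unseen outcomes is estimated by the fraction of sample points
    that occur exactly once."""
    counts = Counter(sample)
    singletons = sum(1 for c in counts.values() if c == 1)
    return singletons / len(sample)

# 'c' and 'd' each appear once in the 11 characters, so the estimate
# of the probability mass on unseen letters is 2/11.
print(good_turing_missing_mass(list("abracadabra")))
```

The missing mass is a sum of heterogeneous indicator-weighted terms (one per unseen outcome, each weighted by its unknown probability), which is exactly the kind of highly heterogeneous sum the abstract says standard inequalities cannot handle.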
On-line Confidence Machines are well-calibrated
In Proceedings of the Forty-Third Annual Symposium on Foundations of Computer Science, 2002
Abstract

Cited by 11 (5 self)
[Translated from Russian:] The practical conclusions of probability theory can be substantiated as implications of hypotheses about the limiting, under the given constraints, complexity of the phenomena under study.
Geometric Decision Rules for Instance-based Learning Problems
Abstract

Cited by 8 (0 self)
In the typical nonparametric approach to classification in instance-based learning and data mining, random data (the training set of patterns) are collected and used to design a decision rule (classifier).
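The canonical instance-based decision rule of this kind is the nearest-neighbor rule: classify a query by the label of its closest training pattern. A minimal sketch with made-up data (our own illustration of the setting the abstract describes, not the paper's geometric rules):

```python
# 1-NN decision rule: the training set of patterns IS the classifier.
def nearest_neighbor(train, query):
    """train: list of (feature_vector, label) pairs.
    Returns the label of the training pattern closest to the query."""
    def dist2(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return min(train, key=lambda xy: dist2(xy[0], query))[1]

train = [((0.0, 0.0), "A"), ((0.1, 0.2), "A"), ((1.0, 1.0), "B")]
print(nearest_neighbor(train, (0.9, 0.8)))  # B
```

Since every training pattern must be stored and searched, much of the work in this area (including geometric approaches) concerns reducing the stored set without changing the induced decision boundary.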
Online regression competitive with reproducing kernel Hilbert spaces
2005
Abstract

Cited by 6 (3 self)
We consider the problem of online prediction of real-valued labels of new objects. The prediction algorithm’s performance is measured by the squared deviation of the predictions from the actual labels. No probabilistic assumptions are made about the way the labels and objects are generated. Instead, we are given a benchmark class of prediction rules some of which are hoped to produce good predictions. We show that for a wide range of infinite-dimensional benchmark classes one can construct a prediction algorithm whose cumulative loss over the first N examples does not exceed the cumulative loss of any prediction rule in the class plus O(√N). Our proof technique is based on the recently developed method of defensive forecasting.
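A concrete, hedged illustration of this "competitive online regression" setting is online ridge regression: it is a standard related method for a linear benchmark class, not the paper's defensive-forecasting construction, and the data below are synthetic. The algorithm predicts with the current regularized least-squares fit before seeing each label, and its cumulative square loss stays close to that of the best fixed ridge predictor chosen in hindsight.

```python
import numpy as np

# Online ridge regression vs. the best fixed ridge predictor in
# hindsight. Illustrative of online regression competitive with a
# benchmark class; NOT the paper's defensive-forecasting method.
rng = np.random.default_rng(1)
d, N, lam = 3, 200, 1.0
w_true = np.array([0.5, -1.0, 2.0])  # synthetic data-generating weights

A = lam * np.eye(d)   # running X^T X + lam I
b = np.zeros(d)       # running X^T y
cum_loss_online = 0.0
X, Y = [], []
for _ in range(N):
    x = rng.normal(size=d)
    y = w_true @ x + 0.1 * rng.normal()
    w = np.linalg.solve(A, b)            # predict before seeing y
    cum_loss_online += (y - w @ x) ** 2
    A += np.outer(x, x)                  # then observe y and update
    b += y * x
    X.append(x)
    Y.append(y)

# Cumulative loss of the best fixed ridge predictor in hindsight:
X, Y = np.array(X), np.array(Y)
w_best = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ Y)
cum_loss_best = float(np.sum((Y - X @ w_best) ** 2))
print(cum_loss_online > cum_loss_best)  # True: the online learner pays
print(cum_loss_online - cum_loss_best)  # a regret that grows sublinearly
```

The regret (the printed difference) stays small relative to N, mirroring the O(√N) excess-loss guarantee the abstract states for much richer, infinite-dimensional benchmark classes.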
PAC-Bayesian Generalization Bound for Density Estimation with Application to Co-clustering
Abstract

Cited by 5 (5 self)
We derive a PAC-Bayesian generalization bound for density estimation. Similar to the PAC-Bayesian generalization bound for classification, the result has the appealingly simple form of a trade-off between empirical performance and the KL-divergence of the posterior from the prior. Moreover, the PAC-Bayesian generalization bound for classification can be derived as a special case of the bound for density estimation. To illustrate a possible application of our bound we derive a generalization bound for co-clustering. The bound provides a criterion to evaluate the ability of co-clustering to predict new co-occurrences, thus introducing the notion of generalization to this traditionally unsupervised task.
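The "empirical performance plus KL-divergence" trade-off the abstract refers to has a standard shape in the classification case. One common statement is the PAC-Bayes-kl bound (constants vary between versions; this is a generic form for reference, not the paper's density-estimation bound):

```latex
% Standard PAC-Bayes-kl bound for classification. P is the prior and Q
% any posterior over classifiers, \hat{L}(Q) the empirical loss, L(Q)
% the expected loss, and kl the binary KL divergence.
\[
  \Pr_{S \sim \mathcal{D}^n}\!\left[
    \forall Q:\;
    \mathrm{kl}\!\left(\hat{L}(Q) \,\middle\|\, L(Q)\right)
    \le \frac{\mathrm{KL}(Q \,\|\, P) + \ln \frac{2\sqrt{n}}{\delta}}{n}
  \right] \ge 1 - \delta
\]
```

The paper's contribution is a bound of this same shape for density estimation, from which the classification form above follows as a special case.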