Results 1–10 of 20
Stability and Generalization
, 2001
Abstract

Cited by 167 (6 self)
We define notions of stability for learning algorithms and show how to use these notions to derive generalization error bounds based on the empirical error and the leave-one-out error. The methods we use can be applied in the regression framework as well as in the classification one when the classifier is obtained by thresholding a real-valued function. We study the stability properties of large classes of learning algorithms such as regularization-based algorithms. In particular we focus on Hilbert space regularization and Kullback-Leibler regularization. We demonstrate how to apply the results to SVM for regression and classification.
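The flavor of the resulting bounds can be illustrated by the uniform-stability bound from this line of work (stated here from memory, so treat the constants as indicative): if the algorithm has uniform stability β with respect to a loss bounded by M, then with probability at least 1 − δ over samples of size m,

```latex
R(A_S) \;\le\; R_{\mathrm{emp}}(A_S) \;+\; 2\beta \;+\; \bigl(4m\beta + M\bigr)\sqrt{\frac{\ln(1/\delta)}{2m}}
```

For regularization algorithms, β is typically of order 1/m, which makes the right-hand side converge at the usual O(1/√m) rate.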
Algorithmic Stability and Sanity-Check Bounds for Leave-One-Out Cross-Validation
 Neural Computation
, 1997
Abstract

Cited by 100 (0 self)
In this paper we prove sanity-check bounds for the error of the leave-one-out cross-validation estimate of the generalization error: that is, bounds showing that the worst-case error of this estimate is not much worse than that of the training error estimate. The name sanity-check refers to the fact that although we often expect the leave-one-out estimate to perform considerably better than the training error estimate, we are here only seeking assurance that its performance will not be considerably worse. Perhaps surprisingly, such assurance has been given only for limited cases in the prior literature on cross-validation. Any nontrivial bound on the error of leave-one-out must rely on some notion of algorithmic stability. Previous bounds relied on the rather strong notion of hypothesis stability, whose application was primarily limited to nearest-neighbor and other local algorithms. Here we introduce the new and weaker notion of error stability, and apply it to obtain sanity-check bounds ...
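The estimate being bounded is easy to state concretely. Below is a minimal sketch (not from the paper) of the leave-one-out error of a 1-nearest-neighbor classifier, the kind of "local" algorithm to which the earlier hypothesis-stability bounds applied; the data points are invented.

```python
# Leave-one-out error of 1-nearest-neighbor on toy 1-D data.

def nn_predict(train, x):
    """Label of the training point closest to x (1-NN)."""
    return min(train, key=lambda p: abs(p[0] - x))[1]

def loo_error(data):
    """Fraction of points misclassified when each is held out in turn."""
    errs = 0
    for i in range(len(data)):
        held_x, held_y = data[i]
        rest = data[:i] + data[i + 1:]
        if nn_predict(rest, held_x) != held_y:
            errs += 1
    return errs / len(data)

# Hypothetical labeled points; (0.45, 1) sits closer to the 0-labeled cluster,
# so it is the one point the leave-one-out procedure gets wrong.
data = [(0.1, 0), (0.2, 0), (0.3, 0), (0.8, 1), (0.9, 1), (0.45, 1)]
print(loo_error(data))  # one error out of six held-out points
```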
Estimating the Generalization Performance of an SVM Efficiently
, 2000
Abstract

Cited by 95 (1 self)
This paper proposes and analyzes an approach to estimating the generalization performance of a support vector machine (SVM) for text classification. Without any computation-intensive resampling, the new estimators are computationally much more efficient than cross-validation or bootstrap, since they can be computed immediately from the form of the hypothesis returned by the SVM. Moreover, the estimators developed here address the special performance measures needed for text classification. While they can be used to estimate error rate, one can also estimate the recall, the precision, and the F1. A theoretical analysis and experiments on three text classification collections show that the new method can effectively estimate the performance of SVM text classifiers in a very efficient way.
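A hedged sketch of what "computed immediately from the form of the hypothesis" can look like: estimators in this spirit upper-bound the leave-one-out error by counting training points whose dual variable and slack satisfy a simple inequality. The rule used below (flag point i when 2·α_i·R² + ξ_i ≥ 1) and all the numbers are illustrative assumptions, not the paper's exact estimator.

```python
# Sketch of a leave-one-out upper bound read off a trained SVM's solution.

def xi_alpha_loo_bound(alphas, xis, r2):
    """Fraction of training points with 2*alpha_i*R^2 + xi_i >= 1."""
    flagged = sum(1 for a, xi in zip(alphas, xis) if 2 * a * r2 + xi >= 1)
    return flagged / len(alphas)

alphas = [0.0, 0.9, 0.0, 0.4, 1.0]   # hypothetical dual variables
xis    = [0.0, 0.3, 0.0, 0.0, 1.2]   # hypothetical slack values
print(xi_alpha_loo_bound(alphas, xis, r2=1.0))  # 2 of 5 points flagged
```

No resampling or retraining happens here, which is the efficiency argument the abstract makes: the bound is one pass over quantities the SVM solver already produced.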
Generalization Bounds for Ranking Algorithms via Algorithmic Stability
 J. of Machine Learning Research
Abstract

Cited by 10 (1 self)
The problem of ranking, in which the goal is to learn a real-valued ranking function that induces a ranking or ordering over an instance space, has recently gained much attention in machine learning. We study generalization properties of ranking algorithms using the notion of algorithmic stability; in particular, we derive generalization bounds for ranking algorithms that have good stability properties. We show that kernel-based ranking algorithms that perform regularization in a reproducing kernel Hilbert space have such stability properties, and therefore our bounds can be applied to these algorithms; this is in contrast with generalization bounds based on uniform convergence, which in many cases cannot be applied to these algorithms. Our results generalize earlier results that were derived in the special setting of bipartite ranking (Agarwal and Niyogi, 2005) to a more general setting of the ranking problem that arises frequently in applications.
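The quantity such bounds control is the pairwise misranking loss. As a minimal sketch (not from the paper, with invented scores and relevance labels): for every pair where one instance is more relevant than the other, check whether the learned real-valued score orders them correctly.

```python
# Empirical pairwise misranking loss of a real-valued scoring function.

def misranking_loss(scores, labels):
    """Fraction of more-relevant/less-relevant pairs ordered wrongly (ties count half)."""
    pairs = wrong = 0
    n = len(scores)
    for i in range(n):
        for j in range(n):
            if labels[i] > labels[j]:
                pairs += 1
                if scores[i] < scores[j]:
                    wrong += 1
                elif scores[i] == scores[j]:
                    wrong += 0.5
    return wrong / pairs

labels = [2, 1, 0, 1]          # hypothetical relevance judgments
scores = [0.9, 0.2, 0.4, 0.7]  # hypothetical learned scores
print(misranking_loss(scores, labels))  # one of five ordered pairs is inverted
```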
Relation between PermutationTest P Values and Classifier Error Estimates
 Machine Learning, Special Issue on Machine Learning in the Genomics Era, 2003;52:11–30
, 2003
Abstract

Cited by 9 (1 self)
Gene-expression-based classifiers suffer from the small number of microarrays usually available for classifier design. Hence, one is confronted with the dual problem of designing a classifier and estimating its error with only a small sample. Permutation testing has been recommended to assess the dependency of a designed classifier on the specific data set. This involves randomly permuting the labels of the data points, estimating the error of the designed classifiers for each permutation, and then finding the p value of the error for the actual labeling relative to the population of errors for the random labelings. This paper addresses the issue of whether or not this p value is informative. It provides both analytic and simulation results to show that the permutation p value is, up to very small deviation, a function of the error estimate. Moreover, even though the p value is a monotonically increasing function of the error estimate, in the range of the error where the majority of the p values lie, the function is very slowly increasing, so that inversion is problematic. Hence, the conclusion is that the p value is less informative than the error estimate. This result demonstrates that random labeling does not provide any further insight into the accuracy of the classifier or the precision of the error estimate. We have no knowledge beyond the error estimate itself and the various distribution-free, classifier-specific bounds developed for this estimate.
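The permutation procedure the paper analyzes can be sketched at toy scale: shuffle the labels many times, re-estimate the classifier's error for each shuffled labeling, and report where the actual error falls in that population. The single-threshold classifier and the data below are illustrative stand-ins, not the paper's setup.

```python
# Permutation-test p value for a toy threshold classifier.
import random

def threshold_error(xs, ys):
    """Resubstitution error of the best single-threshold rule on (xs, ys)."""
    best = 1.0
    for t in xs:
        for sign in (1, -1):
            preds = [1 if sign * (x - t) >= 0 else 0 for x in xs]
            err = sum(p != y for p, y in zip(preds, ys)) / len(ys)
            best = min(best, err)
    return best

def permutation_p_value(xs, ys, n_perm=200, seed=0):
    """Fraction of random labelings whose error is at most the actual error."""
    rng = random.Random(seed)
    actual = threshold_error(xs, ys)
    at_most = 0
    for _ in range(n_perm):
        perm = ys[:]
        rng.shuffle(perm)
        if threshold_error(xs, perm) <= actual:
            at_most += 1
    return (at_most + 1) / (n_perm + 1)

xs = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
ys = [0, 0, 0, 0, 1, 1, 1, 1]   # perfectly threshold-separable labeling
print(permutation_p_value(xs, ys))
```

The paper's point is visible even here: the p value is driven almost entirely by the actual error estimate, since that estimate is the only thing compared against the permutation population.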
Pattern Recognition for Conditionally Independent Data
 JOURNAL OF MACHINE LEARNING RESEARCH
, 2006
Abstract

Cited by 6 (2 self)
In this work we consider the task of relaxing the i.i.d. assumption in pattern recognition (or classification), aiming to make existing learning algorithms applicable to a wider range of tasks. Pattern recognition is guessing a discrete label of some object based on a set of given examples (pairs of objects and labels). We consider the case of deterministically defined labels. Traditionally, this task is studied under the assumption that examples are independent and identically distributed. However, it turns out that many results of pattern recognition theory carry over to a weaker assumption. Namely, under ...
Theoretical and Practical Model Selection Methods for Support Vector Classifiers
 in L. Wang (Ed.), Support Vector Machines: Theory and Applications
, 2005
Abstract

Cited by 4 (3 self)
Abstract. In this chapter, we review several methods for SVM model selection, deriving from different approaches: some of them build on practical lines of reasoning but are not fully justified from a theoretical point of view; on the other hand, some methods rely on rigorous theoretical work but are of little help when applied to real-world problems, because the underlying hypotheses cannot be verified or the result of their application is uninformative. Our objective is to shed some light on these issues by carefully analyzing the most well-known methods and testing some of them on standard benchmarks to evaluate their effectiveness.
A Learning Theory Framework for Association Rules and Sequential Events
Abstract

Cited by 2 (2 self)
We present a framework and generalization analysis for the use of association rules in the setting of supervised learning. We are specifically interested in a sequential event prediction problem where data are revealed one by one, and the goal is to determine what will next be revealed. In the context of this problem, algorithms based on association rules have a distinct advantage over classical statistical and machine learning methods; however, to our knowledge there has not previously been a theoretical foundation established for using association rules in supervised learning. We present two simple algorithms that incorporate association rules. These algorithms can be used both for sequential event prediction and for supervised classification. We provide generalization guarantees on these algorithms based on algorithmic stability analysis from statistical learning theory. We include a discussion of the strict minimum support threshold often used in association rule mining, and introduce an “adjusted confidence” measure that provides a weaker minimum support condition that has advantages over the strict minimum support. The paper brings together ideas from statistical learning theory, association rule mining and Bayesian analysis.
Cross-Validation and Mean-Square Stability
Abstract

Cited by 2 (1 self)
Abstract: k-fold cross-validation is a popular practical method to get a good estimate of the error rate of a learning algorithm. Here, the set of examples is first partitioned into k equal-sized folds. Each fold acts as a test set for evaluating the hypothesis learned on the other k − 1 folds. The average error across the k hypotheses is used as an estimate of the error rate. Although widely used, especially with small values of k (such as 10), the technique has heretofore resisted theoretical analysis. With only sanity-check bounds known, there is not a compelling reason to use the k-fold cross-validation estimate over a simpler holdout estimate. The complications stem from the fact that the k distinct estimates have intricate correlations between them. Conventional wisdom is that the averaging in cross-validation leads to a tighter concentration of the estimate of the error around its mean. In this paper, we show that the conventional wisdom is essentially correct. We analyze the reduction in variance of the gap between the cross-validation estimate and the true error rate, and show that for a large family of stable algorithms, cross-validation achieves a near-optimal variance reduction factor of (1 + o(1))/k. In these cases the k different estimates are essentially acting independently of each other. To proceed with the analysis, we define a new measure of algorithm stability, called mean-square stability. Mean-square stability is weaker than most stability notions described in the literature, and encompasses a large class of algorithms, namely bounded SVM regression and regularized least-squares regression, among others. For slightly less stable algorithms, such as t-nearest-neighbor, we show that cross-validation leads to an O(1/√k) reduction in the variance of the generalization error.
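The k-fold procedure the abstract describes can be sketched directly, with a deliberately simple "learning algorithm" (predict the majority label of the training folds) so the example stays self-contained; the data are invented.

```python
# k-fold cross-validation estimate with a majority-label learner.

def k_fold_estimate(data, k):
    """Average test error over k folds, each held out once."""
    n = len(data)
    fold_size = n // k
    errors = []
    for f in range(k):
        test = data[f * fold_size:(f + 1) * fold_size]
        train = data[:f * fold_size] + data[(f + 1) * fold_size:]
        # "Learn" on the other k - 1 folds: predict the majority label.
        ones = sum(y for _, y in train)
        hyp = 1 if ones * 2 >= len(train) else 0
        err = sum(hyp != y for _, y in test) / len(test)
        errors.append(err)
    return sum(errors) / k

data = [(i, 1 if i % 3 == 0 else 0) for i in range(12)]  # one third positives
print(k_fold_estimate(data, k=4))  # majority learner always predicts 0
```

The correlations the paper analyzes are visible in the structure: each of the k per-fold errors is computed from hypotheses trained on heavily overlapping subsets of the same data.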
Sequential Event Prediction with Association Rules
Abstract

Cited by 1 (1 self)
We consider a supervised learning problem in which data are revealed sequentially and the goal is to determine what will next be revealed. In the context of this problem, algorithms based on association rules have a distinct advantage over classical statistical and machine learning methods; however, there has not previously been a theoretical foundation established for using association rules in supervised learning. We present two simple algorithms that incorporate association rules, and provide generalization guarantees on these algorithms based on algorithmic stability analysis from statistical learning theory. We include a discussion of the strict minimum support threshold often used in association rule mining, and introduce an “adjusted confidence” measure that provides a weaker minimum support condition that has advantages over the strict minimum support. The paper brings together ideas from statistical learning theory, association rule mining and Bayesian analysis.
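A hedged sketch of an "adjusted confidence" score in the spirit described above: the usual confidence count(a ∧ b) / count(a), with a constant K added to the denominator so that rules with tiny support are shrunk toward zero instead of being cut off by a hard minimum-support threshold. The exact definition and the choice of K here are illustrative assumptions, not the paper's.

```python
# Plain vs. adjusted confidence for an association rule a -> b.

def confidence(n_ab, n_a):
    """Plain rule confidence: count(a and b) / count(a)."""
    return n_ab / n_a

def adjusted_confidence(n_ab, n_a, k):
    """Shrunk confidence: a constant k in the denominator penalizes low support."""
    return n_ab / (n_a + k)

# A rule seen 2-for-2 looks perfect under plain confidence, but a
# well-supported 90-of-100 rule wins once the adjustment is applied.
print(confidence(2, 2), adjusted_confidence(2, 2, k=10))
print(confidence(90, 100), adjusted_confidence(90, 100, k=10))
```

This illustrates the "weaker minimum support condition": no rule is excluded outright, but low-support rules need proportionally stronger evidence to rank highly.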