Results 1–10 of 29
Stability and Generalization
, 2001
"... We define notions of stability for learning algorithms and show how to use these notions to derive generalization error bounds based on the empirical error and the leaveoneout error. The methods we use can be applied in the regression framework as well as in the classification one when the classif ..."
Abstract

Cited by 167 (6 self)
We define notions of stability for learning algorithms and show how to use these notions to derive generalization error bounds based on the empirical error and the leave-one-out error. The methods we use can be applied in the regression framework as well as in the classification one when the classifier is obtained by thresholding a real-valued function. We study the stability properties of large classes of learning algorithms such as regularization-based algorithms. In particular we focus on Hilbert space regularization and Kullback-Leibler regularization. We demonstrate how to apply the results to SVM for regression and classification.
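As a hypothetical illustration (not taken from the paper) of the kind of stability these bounds rest on, the sketch below fits ridge regression with and without each training point and records the largest resulting change in predictions over a probe set; the data, regularization strength, and probe set are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=50)
lam = 1.0  # hypothetical regularization strength; uniform-stability bounds scale like O(1/(lam * n))

def ridge_fit(X, y, lam):
    # Closed-form solution of min_w ||Xw - y||^2 + lam * ||w||^2.
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

w_full = ridge_fit(X, y, lam)

# Leave-one-out perturbation: refit with each training point removed and
# record the largest change in prediction over a fixed probe set.
probe = rng.normal(size=(200, 3))
beta = 0.0
for i in range(len(X)):
    w_i = ridge_fit(np.delete(X, i, axis=0), np.delete(y, i), lam)
    beta = max(beta, float(np.max(np.abs(probe @ (w_full - w_i)))))

print(f"empirical leave-one-out stability proxy: {beta:.4f}")
```

A small value of this proxy is what the paper's bounds turn into a guarantee: the flatter the algorithm's response to deleting one example, the tighter the gap between empirical and true error.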
A Generalized Representer Theorem
 In Proceedings of the Annual Conference on Computational Learning Theory
, 2001
"... Wahba's classical representer theorem states that the solutions of certain risk minimization problems involving an empirical risk term and a quadratic regularizer can be written as expansions in terms of the training examples. We generalize the theorem to a larger class of regularizers and empir ..."
Abstract

Cited by 136 (17 self)
Wahba's classical representer theorem states that the solutions of certain risk minimization problems involving an empirical risk term and a quadratic regularizer can be written as expansions in terms of the training examples. We generalize the theorem to a larger class of regularizers and empirical risk terms, and give a self-contained proof utilizing the feature space associated with a kernel. The result shows that a wide range of problems have optimal solutions that live in the finite-dimensional span of the training examples mapped into feature space, thus enabling us to carry out kernel algorithms independent of the (potentially infinite) dimensionality of the feature space.
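A minimal sketch of the theorem's consequence, using kernel ridge regression with a Gaussian kernel (data and parameters are invented for illustration): although the RKHS is infinite-dimensional, the optimal f is a finite expansion over the training examples, found by solving an n-by-n linear system.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(40, 1))
y = np.sin(3 * X[:, 0]) + 0.05 * rng.normal(size=40)

def rbf_kernel(A, B, gamma=5.0):
    # Gaussian RBF kernel; its RKHS is infinite-dimensional.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

lam = 1e-3
n = len(X)
K = rbf_kernel(X, X)
# By the representer theorem, the minimizer of
#   (1/n) sum_i (y_i - f(x_i))^2 + lam * ||f||_K^2
# has the form f(x) = sum_i alpha_i k(x_i, x),
# with alpha the solution of a finite linear system.
alpha = np.linalg.solve(K + lam * n * np.eye(n), y)

x_test = np.array([[0.3]])
f_test = rbf_kernel(x_test, X) @ alpha
print(f"f(0.3) ≈ {f_test[0]:.3f}, sin(0.9) = {np.sin(0.9):.3f}")
```

Everything after the kernel evaluation is ordinary linear algebra in n dimensions, which is exactly what makes kernel algorithms tractable.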
Regularization and semi-supervised learning on large graphs
 In COLT
, 2004
"... Abstract. We consider the problem of labeling a partially labeled graph. This setting may arise in a number of situations from survey sampling to information retrieval to pattern recognition in manifold settings. It is also of potential practical importance, when the data is abundant, but labeling i ..."
Abstract

Cited by 114 (1 self)
We consider the problem of labeling a partially labeled graph. This setting may arise in a number of situations, from survey sampling to information retrieval to pattern recognition in manifold settings. It is also of potential practical importance when data is abundant but labeling is expensive or requires human assistance. Our approach develops a framework for regularization on such graphs. The algorithms are very simple and involve solving a single, usually sparse, system of linear equations. Using the notion of algorithmic stability, we derive bounds on the generalization error and relate it to structural invariants of the graph. Some experimental results testing the performance of the regularization algorithm and the usefulness of the generalization bound are presented.
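The "single, usually sparse, system of linear equations" can be sketched as follows on a toy path graph; the Laplacian-based objective and the weight mu are a standard formulation assumed here for illustration, not taken verbatim from the paper.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import spsolve

# Path graph on 10 nodes; endpoints labeled -1 and +1, the rest unlabeled.
n = 10
rows = np.arange(n - 1)
W = sp.coo_matrix((np.ones(n - 1), (rows, rows + 1)), shape=(n, n))
W = W + W.T                                        # symmetric adjacency
L = sp.diags(np.asarray(W.sum(1)).ravel()) - W     # graph Laplacian

labeled = np.array([0, n - 1])
y = np.array([-1.0, 1.0])
mu = 10.0  # weight on fitting the labeled nodes (hypothetical)

# Minimize f^T L f + mu * sum_{i labeled} (f_i - y_i)^2, i.e. solve the
# single sparse linear system (L + mu * S) f = mu * S y.
S = sp.diags([mu if i in set(labeled.tolist()) else 0.0 for i in range(n)])
b = np.zeros(n)
b[labeled] = mu * y
f = spsolve((L + S).tocsc(), b)
print(np.round(f, 2))
```

On the path graph the solution ramps smoothly from the negative label to the positive one, which is the smoothness the Laplacian penalty enforces.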
Almost-Everywhere Algorithmic Stability and Generalization Error
 In UAI 2002: Uncertainty in Artificial Intelligence
, 2002
"... We introduce a new notion of algorithmic stability, which we call training stability. ..."
Abstract

Cited by 44 (8 self)
We introduce a new notion of algorithmic stability, which we call training stability.
Magnitude-preserving ranking algorithms
, 2007
"... This paper studies the learning problem of ranking when one wishes not just to accurately predict pairwise ordering but also preserve the magnitude of the preferences or the difference between ratings, a problem motivated by its key importance in the design of search engines, movie recommendation, a ..."
Abstract

Cited by 15 (3 self)
This paper studies the learning problem of ranking when one wishes not just to accurately predict pairwise ordering but also to preserve the magnitude of the preferences or the difference between ratings, a problem motivated by its key importance in the design of search engines, movie recommendation, and other similar ranking systems. We describe and analyze several algorithms for this problem and give stability bounds for their generalization error, extending previously known stability results to non-bipartite ranking and magnitude-of-preference-preserving algorithms. We also report the results of experiments comparing these algorithms on several datasets and compare these results with those obtained using an algorithm minimizing the pairwise misranking error and with standard regression.
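A minimal sketch, under assumptions not taken from the paper, of a magnitude-preserving objective: a linear scorer is fit so that score differences match rating differences on all training pairs. The data, regularizer, and step size are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 4))
w_true = np.array([2.0, -1.0, 0.5, 0.0])
y = X @ w_true + 0.1 * rng.normal(size=100)

# For every ordered pair (i, j), penalize ((f(x_i) - f(x_j)) - (y_i - y_j))^2:
# the scorer must preserve the *magnitude* of rating differences,
# not merely the pairwise ordering.
i_idx, j_idx = np.triu_indices(len(X), k=1)
dX = X[i_idx] - X[j_idx]
dy = y[i_idx] - y[j_idx]

lam = 1e-2   # ridge regularizer (hypothetical)
lr = 0.1     # gradient step size
w = np.zeros(4)
for _ in range(300):
    # Gradient of mean squared pairwise-difference loss + L2 penalty.
    grad = 2 * dX.T @ (dX @ w - dy) / len(dy) + 2 * lam * w
    w -= lr * grad
print(np.round(w, 2))
```

Because differences of linear scores determine the weight vector, the recovered w tracks the true ratings model up to the small ridge shrinkage.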
Extensions to McDiarmid’s inequality when differences are bounded with high probability
"... The method of independent bounded differences (McDiarmid, 1989) gives largedeviation concentration bounds for multivariate functions in terms of the maximum effect that changing one coordinate of the input can have on the output. This method has been widely used in combinatorial applications, and in ..."
Abstract

Cited by 14 (2 self)
The method of independent bounded differences (McDiarmid, 1989) gives large-deviation concentration bounds for multivariate functions in terms of the maximum effect that changing one coordinate of the input can have on the output. This method has been widely used in combinatorial applications and in learning theory. In some recent applications to the theory of algorithmic stability (Kutin and Niyogi, 2002), we need to consider the case where changing one coordinate of the input usually leads to a small change in the output, but not always. We prove two extensions to McDiarmid's inequality. The first applies when, for most inputs, any small change leads to a small change in the output. The second applies when, for a randomly selected input and a random one-coordinate change, the change in the output is usually small.
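For reference, the classical bounded-differences inequality that these results extend can be stated as:

```latex
% Classical bounded-differences inequality (McDiarmid, 1989):
% if, for each i, changing only the i-th coordinate changes f by at most c_i,
% then for independent X_1, ..., X_n and any t > 0,
\Pr\left[ f(X_1,\dots,X_n) - \mathbb{E}\, f(X_1,\dots,X_n) \ge t \right]
  \;\le\; \exp\!\left( \frac{-2t^2}{\sum_{i=1}^{n} c_i^2} \right)
```

The extensions described in the abstract relax the worst-case constants c_i to conditions that hold only for most inputs or on average over a random coordinate change.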
Regression and Classification with Regularization
, 2002
"... The purpose of this chapter is to present a theoretical framework for the problem of learning from examples. Learning from examples can be regarded [13] as the problem of approximating a multivariate function from sparse data. The function can be real valued as in regression or binary valued as in c ..."
Abstract

Cited by 12 (6 self)
The purpose of this chapter is to present a theoretical framework for the problem of learning from examples. Learning from examples can be regarded [13] as the problem of approximating a multivariate function from sparse data. The function can be real-valued as in regression or binary-valued as in classification. The problem of approximating a function from sparse data is ill-posed, and a classical solution is regularization theory [19]. Regularization theory, as we will consider here, formulates the regression problem as a variational problem of finding the function f that minimizes the functional

\min_{f \in \mathcal{H}} \; \frac{1}{\ell} \sum_{i=1}^{\ell} V(y_i, f(x_i)) + \lambda \|f\|_K^2 \quad (6.1)

where V(\cdot,\cdot) is a loss function (in the classical formulation the square loss was used), \|f\|_K is the norm in a Reproducing Kernel Hilbert Space (RKHS) \mathcal{H} defined by the positive definite function K, \ell is the number of data points or examples (the \ell training pairs (x_i, y_i)), and \lambda is the regularization parameter. Under rather general conditions [14, 22, ...
The interaction of stability and weakness in AdaBoost
, 2001
"... We provide an analysis of AdaBoost within the framework of algorithmic stability. In particular, we show that AdaBoost is a stabilitypreserving operation: if the \input" (the weak learner) to AdaBoost is stable, then the \output" (the strong learner) is almosteverywhere stable. Because classier com ..."
Abstract

Cited by 11 (4 self)
We provide an analysis of AdaBoost within the framework of algorithmic stability. In particular, we show that AdaBoost is a stability-preserving operation: if the "input" (the weak learner) to AdaBoost is stable, then the "output" (the strong learner) is almost-everywhere stable. Because classifier combination schemes such as AdaBoost have greatest effect when the weak learner is weak, we discuss weakness and its implications. We also show that the notion of almost-everywhere stability is sufficient for good bounds on generalization error. These bounds hold even when the weak learner has infinite VC dimension.
Statistical learning: Stability is sufficient for generalization and necessary and sufficient for consistency of empirical risk minimization
 Advances in Computational Mathematics
, 2002
"... version corrects some of the remaining typos and imprecisions. ..."
Abstract

Cited by 10 (0 self)
This version corrects some of the remaining typos and imprecisions.