Results 1 - 10 of 43
Consistency of spectral clustering
2004
Cited by 286 (15 self)
Consistency is a key property of statistical algorithms when the data is drawn from some underlying probability distribution. Surprisingly, despite decades of work, little is known about consistency of most clustering algorithms. In this paper we investigate consistency of a popular family of spectral clustering algorithms, which cluster the data with the help of eigenvectors of graph Laplacian matrices. We show that one of the two major classes of spectral clustering (normalized clustering) converges under some very general conditions, while the other (unnormalized) is only consistent under strong additional assumptions, which, as we demonstrate, are not always satisfied in real data. We conclude that our analysis provides strong evidence for the superiority of normalized spectral clustering in practical applications. We believe that methods used in our analysis will provide a basis for future exploration of Laplacian-based methods in a statistical setting.
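As an illustration of the two Laplacian variants this abstract contrasts, the following is a minimal sketch, not the paper's own construction. It assumes the symmetric normalization I - D^{-1/2} W D^{-1/2} as the normalized Laplacian (one common choice), and the similarity matrix `W` and the function name `graph_laplacians` are hypothetical.

```python
import numpy as np

def graph_laplacians(W):
    """Return (unnormalized, normalized) Laplacians of a symmetric similarity matrix W."""
    W = np.asarray(W, dtype=float)
    d = W.sum(axis=1)                      # vertex degrees
    L_unnorm = np.diag(d) - W              # L = D - W
    d_inv_sqrt = 1.0 / np.sqrt(d)          # assumes every degree is positive
    # Assumed normalization: I - D^{-1/2} W D^{-1/2}
    L_norm = np.eye(len(d)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    return L_unnorm, L_norm

# Hypothetical toy data: two tight pairs of points joined by weak similarities.
W = np.array([[0.0, 1.0, 0.1, 0.0],
              [1.0, 0.0, 0.0, 0.1],
              [0.1, 0.0, 0.0, 1.0],
              [0.0, 0.1, 1.0, 0.0]])
L_unnorm, L_norm = graph_laplacians(W)

# Cluster by the sign of the eigenvector of the second-smallest eigenvalue.
eigvals, eigvecs = np.linalg.eigh(L_norm)  # eigenvalues in ascending order
labels = (eigvecs[:, 1] > 0).astype(int)
print(labels)
```

On this toy graph the two tightly connected pairs end up in different clusters; which pair receives label 0 depends on the arbitrary sign of the eigenvector.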
The tradeoffs of large scale learning
In: Advances in Neural Information Processing Systems 20, 2008
Cited by 138 (4 self)
This contribution develops a theoretical framework that takes into account the effect of approximate optimization on learning algorithms. The analysis shows distinct tradeoffs for the case of small-scale and large-scale learning problems. Small-scale learning problems are subject to the usual approximation–estimation tradeoff. Large-scale learning problems are subject to a qualitatively different tradeoff involving the computational complexity of the underlying optimization algorithms in nontrivial ways.
Correcting sample selection bias by unlabeled data
Cited by 130 (9 self)
We consider the scenario where training and test data are drawn from different distributions, commonly referred to as sample selection bias. Most algorithms for this setting try to first recover sampling distributions and then make appropriate corrections based on the distribution estimate. We present a nonparametric method which directly produces resampling weights without distribution estimation. Our method works by matching distributions between training and testing sets in feature space. Experimental results demonstrate that our method works well in practice.
Local Rademacher complexities
Annals of Statistics, 2002
Cited by 106 (18 self)
We propose new bounds on the error of learning algorithms in terms of a data-dependent notion of complexity. The estimates we establish give optimal rates and are based on a local and empirical version of Rademacher averages, in the sense that the Rademacher averages are computed from the data, on a subset of functions with small empirical error. We present some applications to classification and prediction with convex function classes, and with kernel classes in particular.
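To make "Rademacher averages computed from the data, on a subset of functions with small empirical error" concrete, here is a rough Monte Carlo sketch for a finite function class. It is an illustration under stated assumptions, not the paper's estimator: the function names, the random toy class `F`, and the choice of the median error as the localization radius are all hypothetical.

```python
import numpy as np

def empirical_rademacher(F, n_draws=2000, seed=0):
    """Monte Carlo estimate of E_sigma[ max_f (1/n) sum_i sigma_i f(x_i) ].

    F has shape (num_functions, n): row j holds f_j evaluated on the sample.
    """
    F = np.asarray(F, dtype=float)
    n = F.shape[1]
    rng = np.random.default_rng(seed)
    sigma = rng.choice([-1.0, 1.0], size=(n_draws, n))  # random signs
    correlations = sigma @ F.T / n                      # (n_draws, num_functions)
    return correlations.max(axis=1).mean()

def local_rademacher(F, emp_errors, radius, **kwargs):
    """Same estimate, restricted to functions with empirical error <= radius.

    Assumes at least one function satisfies the restriction.
    """
    keep = np.asarray(emp_errors) <= radius
    return empirical_rademacher(np.asarray(F)[keep], **kwargs)

# Hypothetical toy class: 50 random sign-valued functions on 30 sample points.
rng = np.random.default_rng(1)
y = rng.choice([-1.0, 1.0], size=30)
F = rng.choice([-1.0, 1.0], size=(50, 30))
emp_errors = (F != y).mean(axis=1)                      # 0-1 error of each function

full = empirical_rademacher(F)
local = local_rademacher(F, emp_errors, radius=np.median(emp_errors))
print(full, local)
```

Because both calls use the same seed and therefore the same sign draws, the localized value can never exceed the value for the whole class.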
Introduction to Statistical Learning Theory
 In , O. Bousquet, U. v. Luxburg, and G. Rätsch (Editors), 2004
Model Selection for Regularized Least-Squares Algorithm in Learning Theory
Foundations of Computational Mathematics, 2005
Cited by 37 (12 self)
We investigate the problem of model selection for learning algorithms depending on a continuous parameter. We propose a model selection procedure based on a worst-case analysis and a data-independent choice of the parameter. For the regularized least-squares algorithm we bound the generalization error of the solution by a quantity depending on a few known constants, and we show that the corresponding model selection procedure reduces to solving a bias-variance problem. Under a suitable smoothness condition on the regression function, we estimate the optimal parameter as a function of the number of data and we prove that this choice ensures consistency of the algorithm.
A Review of Kernel Methods in Machine Learning
2006
Cited by 35 (3 self)
We review recent methods for learning with positive definite kernels. All these methods formulate learning and estimation problems as linear tasks in a reproducing kernel Hilbert space (RKHS) associated with a kernel. We cover a wide range of methods, ranging from simple classifiers to sophisticated methods for estimation with structured data.
A Stochastic View of Optimal Regret through Minimax Duality
Cited by 26 (10 self)
We study the regret of optimal strategies for online convex optimization games. Using von Neumann’s minimax theorem, we show that the optimal regret in this adversarial setting is closely related to the behavior of the empirical minimization algorithm in a stochastic process setting: it is equal to the maximum, over joint distributions of the adversary’s action sequence, of the difference between a sum of minimal expected losses and the minimal empirical loss. We show that the optimal regret has a natural geometric interpretation, since it can be viewed as the gap in Jensen’s inequality for a concave functional (the minimizer over the player’s actions of expected loss) defined on a set of probability distributions. We use this expression to obtain upper and lower bounds on the regret of an optimal strategy for a variety of online learning problems. Our method provides upper bounds without the need to construct a learning algorithm; the lower bounds provide explicit optimal strategies for the adversary.
Complexity regularization via localized random penalties
2004
Cited by 24 (3 self)
In this article, model selection via penalized empirical loss minimization in nonparametric classification problems is studied. Data-dependent penalties are constructed, which are based on estimates of the complexity of a small subclass of each model class, containing only those functions with small empirical loss. The penalties are novel since those considered in the literature are typically based on the entire model class. Oracle inequalities using these penalties are established, and the advantage of the new penalties over those based on the complexity of the whole model class is demonstrated.
Nonparametric quantile estimation
2006
Cited by 22 (4 self)
In regression, the desired estimate of y|x is not always given by a conditional mean, although this is most common. Sometimes one wants to obtain a good estimate that satisfies the property that a proportion, τ, of y|x will be below the estimate. For τ = 0.5 this is an estimate of the median. What might be called median regression is subsumed under the term quantile regression. We present a nonparametric version of a quantile estimator, which can be obtained by solving a simple quadratic programming problem, and provide uniform convergence statements and bounds on the quantile property of our estimator. Experimental results show the feasibility of the approach and competitiveness of our method with existing ones. We discuss several types of extensions including an approach to solve the quantile crossing problems, as well as a method to incorporate prior qualitative knowledge such as monotonicity constraints.
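The quantile property described in this abstract can be illustrated with the pinball loss, the standard loss function for quantile regression. The sketch below only fits a constant to unconditional data, so it is a simplified illustration and not the paper's nonparametric estimator; the sample `y` and the function name `pinball_loss` are hypothetical.

```python
import numpy as np

def pinball_loss(y, q, tau):
    """Average pinball (quantile) loss of the constant estimate q at level tau."""
    r = np.asarray(y, dtype=float) - q
    # Residuals above the estimate cost tau * r, those below cost (1 - tau) * |r|.
    return np.mean(np.where(r >= 0, tau * r, (tau - 1) * r))

# Hypothetical sample with one large outlier.
y = np.array([1.0, 2.0, 3.0, 4.0, 100.0])

# Search over the sample points themselves for the loss-minimizing constant.
best = {tau: min(y, key=lambda q: pinball_loss(y, q, tau)) for tau in (0.5, 0.9)}
print(best)
```

For τ = 0.5 the minimizer is the sample median (3.0), which the outlier does not move; for τ = 0.9 the minimizer is the largest observation (100.0), since only then do at least 90% of the points lie at or below the estimate.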