Results 1 - 10 of 170
Convexity, Classification, and Risk Bounds
- Journal of the American Statistical Association, 2003
"... Many of the classification algorithms developed in the machine learning literature, including the support vector machine and boosting, can be viewed as minimum contrast methods that minimize a convex surrogate of the 0-1 loss function. The convexity makes these algorithms computationally efficien ..."
Abstract
-
Cited by 167 (15 self)
- Add to MetaCart
Many of the classification algorithms developed in the machine learning literature, including the support vector machine and boosting, can be viewed as minimum contrast methods that minimize a convex surrogate of the 0-1 loss function. The convexity makes these algorithms computationally efficient. The use of a surrogate, however, has statistical consequences that must be balanced against the computational virtues of convexity. To study these issues, we provide a general quantitative relationship between the risk as assessed using the 0-1 loss and the risk as assessed using any nonnegative surrogate loss function. We show that this relationship gives nontrivial upper bounds on excess risk under the weakest possible condition on the loss function: that it satisfy a pointwise form of Fisher consistency for classification. The relationship is based on a simple variational transformation of the loss function that is easy to compute in many applications. We also present a refined version of this result in the case of low noise. Finally, we
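To make the surrogate-risk relationship concrete, here is a hedged sketch of the ψ-transform bound as it is usually stated (my paraphrase of the standard result, not a quotation from the paper); φ is the margin-based surrogate, R and R_φ are the 0-1 and surrogate risks, and η is the conditional probability of the positive label:

```latex
H(\eta)   = \inf_{\alpha \in \mathbb{R}} \bigl( \eta\,\varphi(\alpha) + (1-\eta)\,\varphi(-\alpha) \bigr), \qquad
H^{-}(\eta) = \inf_{\alpha:\ \alpha(2\eta-1)\le 0} \bigl( \eta\,\varphi(\alpha) + (1-\eta)\,\varphi(-\alpha) \bigr),
\]
\[
\psi(\theta) = H^{-}\!\Bigl(\tfrac{1+\theta}{2}\Bigr) - H\!\Bigl(\tfrac{1+\theta}{2}\Bigr)
\quad\Longrightarrow\quad
\psi\bigl( R(f) - R^{*} \bigr) \;\le\; R_{\varphi}(f) - R_{\varphi}^{*}.
```

(For convex φ this difference is itself the ψ-transform; in general ψ is its convex closure.) For the hinge loss this transform works out to ψ(θ) = |θ| and for the exponential loss to ψ(θ) = 1 − √(1 − θ²), so convergence of the surrogate excess risk forces convergence of the 0-1 excess risk whenever the loss is classification-calibrated.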
Perspectives on system identification
- Plenary talk, Proceedings of the 17th IFAC World Congress, Seoul, South Korea, 2008
"... System identification is the art and science of building mathematical models of dynamic systems from observed input-output data. It can be seen as the interface between the real world of applications and the mathematical world of control theory and model abstractions. As such, it is an ubiquitous ne ..."
Abstract
-
Cited by 160 (3 self)
- Add to MetaCart
System identification is the art and science of building mathematical models of dynamic systems from observed input-output data. It can be seen as the interface between the real world of applications and the mathematical world of control theory and model abstractions. As such, it is a ubiquitous necessity for successful applications. System identification is a very large topic, with different techniques that depend on the character of the models to be estimated: linear, nonlinear, hybrid, nonparametric, etc. At the same time, the area can be characterized by a small number of leading principles, e.g., to look for sustainable descriptions by proper decisions in the triangle of model complexity, information content in the data, and effective validation. The area has many facets and there are many approaches and methods. A tutorial or a survey in a few pages is not quite possible. Instead, this presentation aims at giving an overview of the “science” side, i.e., basic principles and results, and at pointing to open problem areas on the practical, “art” side of how to approach and solve a real problem.
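Not from the talk itself, but a minimal sketch of what "building a model from input-output data" means in the simplest linear case: fitting a first-order ARX model by least squares. The model structure, signal lengths, and noise level below are illustrative assumptions.

```python
# Least-squares identification of an ARX(1,1) model
#   y[t] = a*y[t-1] + b*u[t-1] + e[t]
# from simulated input-output data (all values are placeholders).
import numpy as np

rng = np.random.default_rng(0)
n = 500
u = rng.standard_normal(n)                      # excitation (input) signal
y = np.zeros(n)
a_true, b_true = 0.8, 0.5
for t in range(1, n):
    y[t] = a_true * y[t - 1] + b_true * u[t - 1] + 0.1 * rng.standard_normal()

# Regressor matrix built from past outputs and inputs.
Phi = np.column_stack([y[:-1], u[:-1]])
theta, *_ = np.linalg.lstsq(Phi, y[1:], rcond=None)
print("estimated (a, b):", theta)               # should be close to (0.8, 0.5)
```

The same least-squares template extends to higher-order and multi-input models by adding more lagged columns to the regressor matrix; the harder questions the talk surveys (model structure choice, informativeness of the data, validation) sit around this core step.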
Lectures on the central limit theorem for empirical processes
- Probability and Banach Spaces, 1986
"... Abstract. Concentration inequalities are used to derive some new inequalities for ratio-type suprema of empirical processes. These general inequalities are used to prove several new limit theorems for ratio-type suprema and to recover anumber of the results from [1] and [2]. As a statistical applica ..."
Abstract
-
Cited by 135 (9 self)
- Add to MetaCart
(Show Context)
Concentration inequalities are used to derive some new inequalities for ratio-type suprema of empirical processes. These general inequalities are used to prove several new limit theorems for ratio-type suprema and to recover a number of the results from [1] and [2]. As a statistical application, an oracle inequality for nonparametric regression is obtained via ratio bounds.
Diffusion Kernels on Statistical Manifolds
2004
"... A family of kernels for statistical learning is introduced that exploits the geometric structure of statistical models. The kernels are based on the heat equation on the Riemannian manifold defined by the Fisher information metric associated with a statistical family, and generalize the Gaussian ker ..."
Abstract
-
Cited by 116 (8 self)
- Add to MetaCart
(Show Context)
A family of kernels for statistical learning is introduced that exploits the geometric structure of statistical models. The kernels are based on the heat equation on the Riemannian manifold defined by the Fisher information metric associated with a statistical family, and generalize the Gaussian kernel of Euclidean space. As an important special case, kernels based on the geometry of multinomial families are derived, leading to kernel-based learning algorithms that apply naturally to discrete data. Bounds on covering numbers and Rademacher averages for the kernels are proved using bounds on the eigenvalues of the Laplacian on Riemannian manifolds. Experimental results are presented for document classification, for which the use of multinomial geometry is natural and well motivated; improvements are obtained over Gaussian and linear kernels, which have been the standard for text classification.
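As a rough illustration (my own sketch, not the paper's code, and with normalization constants dropped): the heat-kernel approximation on the multinomial simplex replaces the Euclidean distance inside a Gaussian kernel with the Fisher geodesic distance 2·arccos(Σ_i √(p_i q_i)). The bandwidth t below is a placeholder.

```python
# Illustrative approximation of a diffusion kernel on the multinomial simplex.
# Assumptions: documents are represented as normalized term frequencies, and
# normalization constants of the heat kernel are omitted.
import numpy as np

def multinomial_diffusion_kernel(p, q, t=0.1):
    """Heat-kernel-style similarity between two points p, q on the simplex."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    bc = np.clip(np.sum(np.sqrt(p * q)), 0.0, 1.0)   # Bhattacharyya coefficient
    d = 2.0 * np.arccos(bc)                          # Fisher geodesic distance
    return np.exp(-d**2 / (4.0 * t))

# Example: two term-frequency vectors over a 4-word vocabulary.
p = np.array([0.5, 0.3, 0.1, 0.1])
q = np.array([0.4, 0.4, 0.1, 0.1])
print(multinomial_diffusion_kernel(p, q))
```

Plugging this kernel into any standard kernel method (e.g., an SVM) gives the kind of learning algorithm for discrete data the abstract refers to.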
Tutorial on Practical Prediction Theory for Classification
2005
"... We discuss basic prediction theory and it's impact on classification success evaluation, implications for learning algorithm design, and uses in learning algorithm execution. This tutorial is meant to be a comprehensive compilation of results which are both theoretically rigorous and practicall ..."
Abstract
-
Cited by 109 (3 self)
- Add to MetaCart
We discuss basic prediction theory and its impact on classification success evaluation, implications for learning algorithm design, and uses in learning algorithm execution. This tutorial is meant to be a comprehensive compilation of results which are both theoretically rigorous and practically useful. There are two important implications...
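A sketch in the spirit of the holdout (test set) bound discussed in tutorials of this kind (my own illustration, not the paper's notation): with k errors observed on m independent test examples, inverting the binomial tail gives an upper confidence bound on the true error rate.

```python
# Binomial tail inversion (Clopper-Pearson form) for a holdout error bound.
# Function name and parameter choices are illustrative, not from the paper.
from scipy.stats import beta

def test_set_error_bound(k, m, delta=0.05):
    """Upper bound on the true error rate holding with probability >= 1 - delta,
    given k mistakes on m independent test examples."""
    if k >= m:
        return 1.0
    return beta.ppf(1.0 - delta, k + 1, m - k)

# Example: 12 mistakes on 1000 held-out examples.
print(test_set_error_bound(12, 1000))   # roughly 0.018, i.e. under 2% error
```

Bounds of this type are distribution-free and depend only on the holdout size, which is what makes them "practically useful" in the sense of the tutorial.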
Theory of classification: A survey of some recent advances
2005
"... The last few years have witnessed important new developments in the theory and practice of pattern classification. We intend to survey some of the main new ideas that have led to these recent results. ..."
Abstract
-
Cited by 93 (3 self)
- Add to MetaCart
The last few years have witnessed important new developments in the theory and practice of pattern classification. We intend to survey some of the main new ideas that have led to these recent results.
Concentration inequalities
- Advanced Lectures in Machine Learning, 2004
"... Concentration inequalities deal with deviations of functions of independent random variables from their expectation. In the last decade new tools have been introduced making it possible to establish simple and powerful inequalities. These inequalities are at the heart of the mathematical analysis o ..."
Abstract
-
Cited by 89 (1 self)
- Add to MetaCart
(Show Context)
Concentration inequalities deal with deviations of functions of independent random variables from their expectation. In the last decade, new tools have been introduced that make it possible to establish simple and powerful inequalities. These inequalities are at the heart of the mathematical analysis of various problems in machine learning and have made it possible to derive new, efficient algorithms. This text attempts to summarize some of the basic tools.
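As one concrete example of the kind of inequality covered by such notes (my illustration, not quoted from the text): McDiarmid's bounded-differences inequality. If X_1, ..., X_n are independent and f changes by at most c_i when only its i-th argument changes, then for every t > 0,

```latex
\Pr\bigl( f(X_1,\dots,X_n) - \mathbb{E}\, f(X_1,\dots,X_n) \ge t \bigr)
\;\le\; \exp\!\Bigl( -\frac{2 t^{2}}{\sum_{i=1}^{n} c_i^{2}} \Bigr).
```

Applied to an empirical risk with a loss bounded in [0, 1], each c_i is 1/n, so the deviation probability decays like exp(−2nt²); bounds of this shape underlie many of the generalization guarantees mentioned above.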
Fast rates for support vector machines using Gaussian kernels
- Ann. Statist., 2004
"... We establish learning rates up to the order of n −1 for support vector machines with hinge loss (L1-SVMs) and nontrivial distributions. For the stochastic analysis of these algorithms we use recently developed concepts such as Tsybakov’s noise assumption and local Rademacher averages. Furthermore we ..."
Abstract
-
Cited by 69 (9 self)
- Add to MetaCart
(Show Context)
We establish learning rates up to the order of n^{-1} for support vector machines with hinge loss (L1-SVMs) and nontrivial distributions. For the stochastic analysis of these algorithms we use recently developed concepts such as Tsybakov’s noise assumption and local Rademacher averages. Furthermore, we introduce a new geometric noise condition for distributions that is used to bound the approximation error of Gaussian kernels in terms of their widths.
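A minimal sketch (not from the paper) of the estimator whose rates are analyzed here: an SVM with hinge loss and a Gaussian (RBF) kernel, written with scikit-learn. The dataset, the width sigma, and the regularization constant C are placeholder choices; under one common convention the kernel width enters scikit-learn through gamma = 1/(2*sigma**2).

```python
# Hinge-loss SVM with a Gaussian (RBF) kernel; all data and parameters are
# placeholders chosen only to make the example self-contained and runnable.
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_moons(n_samples=500, noise=0.2, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

sigma = 0.5
clf = SVC(kernel="rbf", gamma=1.0 / (2 * sigma**2), C=1.0)  # L1-SVM (hinge loss)
clf.fit(X_tr, y_tr)
print("test accuracy:", clf.score(X_te, y_te))
```

The kernel width and the regularization constant are exactly the quantities whose joint choice governs the rates studied in the paper, which is why the geometric noise condition is phrased in terms of the kernel width.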