Introduction to Statistical Learning Theory
 In O. Bousquet, U. v. Luxburg, and G. Rätsch (Editors)
, 2004
Risk bounds for Statistical Learning
Cited by 40 (1 self)
We propose a general theorem providing upper bounds for the risk of an empirical risk minimizer (ERM). We essentially focus on the binary classification framework. We extend Tsybakov’s analysis of the risk of an ERM under margin type conditions by using concentration inequalities for conveniently weighted empirical processes. This allows us to deal with other ways of measuring the "size" of a class of classifiers than entropy with bracketing as in Tsybakov’s work. In particular we derive new risk bounds for the ERM when the classification rules belong to some VC-class under margin conditions and discuss the optimality of those bounds in a minimax sense.
Concentration inequalities
 Advanced Lectures in Machine Learning
, 2004
Cited by 32 (1 self)
Concentration inequalities deal with deviations of functions of independent random variables from their expectation. In the last decade new tools have been introduced making it possible to establish simple and powerful inequalities. These inequalities are at the heart of the mathematical analysis of various problems in machine learning and made it possible to derive new efficient algorithms. This text attempts to summarize some of the basic tools.
Complexity regularization via localized random penalties
, 2004
Cited by 24 (3 self)
In this article, model selection via penalized empirical loss minimization in nonparametric classification problems is studied. Data-dependent penalties are constructed, which are based on estimates of the complexity of a small subclass of each model class, containing only those functions with small empirical loss. The penalties are novel since those considered in the literature are typically based on the entire model class. Oracle inequalities using these penalties are established, and the advantage of the new penalties over those based on the complexity of the whole model class is demonstrated.
Ranking and scoring using empirical risk minimization
 Proceedings of the Eighteenth Annual Conference on Computational Learning Theory (COLT)
, 2005
Cited by 24 (7 self)
A general model is proposed for studying ranking problems. We investigate learning methods based on empirical minimization of the natural estimates of the ranking risk. The empirical estimates are of the form of a U-statistic. Inequalities from the theory of U-statistics and U-processes are used to obtain performance bounds for the empirical risk minimizers. Convex risk minimization methods are also studied to give a theoretical framework for ranking algorithms based on boosting and support vector machines. Just like in binary classification, fast rates of convergence are achieved under certain noise assumptions. General sufficient conditions are proposed in several special cases that guarantee fast rates of convergence.
Minimum Probability of Error Image Retrieval
 IEEE Trans. Signal Processing
Cited by 21 (14 self)
We address the design of optimal architectures for image retrieval from large databases. Minimum probability of error (MPE) is adopted as the optimality criterion and retrieval formulated as a problem of statistical classification. The probability of retrieval error is lower and upper bounded by functions of the Bayes and density estimation errors, and the impact of the components of the retrieval architecture (namely, the feature transformation and density estimation) on these bounds is characterized. This characterization suggests interpreting the search for the MPE feature set as the search for the minimum of the convex hull of a collection of curves of probability of error versus feature space dimension. A new algorithm for MPE feature design, based on a dictionary of empirical feature sets and the wrapper model for feature selection, is proposed. It is shown that, unlike traditional feature selection techniques, this algorithm scales to problems containing large numbers of classes. Experimental evaluation reveals that the MPE architecture is at least as good as popular empirical solutions on the narrow domains where these perform best but significantly outperforms them outside these domains. Index Terms—Bayesian methods, color and texture, expectation–maximization, feature selection, image retrieval, image similarity, minimum probability of error, mixture models, multiresolution, optimal retrieval systems, wrapper methods.
RANKING AND EMPIRICAL MINIMIZATION OF U-STATISTICS
Cited by 17 (2 self)
The problem of ranking/ordering instances, instead of simply classifying them, has recently gained much attention in machine learning. In this paper we formulate the ranking problem in a rigorous statistical framework. The goal is to learn a ranking rule for deciding, among two instances, which one is “better,” with minimum ranking risk. Since the natural estimates of the risk are of the form of a U-statistic, results of the theory of U-processes are required for investigating the consistency of empirical risk minimizers. We establish, in particular, a tail inequality for degenerate U-processes, and apply it for showing that fast rates of convergence may be achieved under specific noise assumptions, just like in classification. Convex risk minimization methods are also studied.
Learning classes of probabilistic automata
 In COLT 2004, number 3120 in LNAI
, 2004
Cited by 15 (5 self)
Probabilistic finite automata (PFA) model stochastic languages, i.e. probability distributions over strings. Inferring PFA from stochastic data is an open field of research. We show that PFA are identifiable in the limit with probability one. Multiplicity automata (MA) are another device to represent stochastic languages. We show that an MA may generate a stochastic language that cannot be generated by a PFA, but we also show that it is undecidable whether an MA generates a stochastic language. Finally, we propose a learning algorithm for a subclass of PFA, called PRFA.
Geometric Parameters in Learning Theory
 Manuscript, Research School of Information Sciences and Engineering, Australian National University, submitted
, 2003
Cited by 7 (0 self)
this article would encourage mathematicians to investigate these seemingly "applied" problems, which are, in fact, linked to interesting theoretical questions