Results 1 - 10
of
12
Introduction to Statistical Learning Theory
- In , O. Bousquet, U.v. Luxburg, and G. Rsch (Editors
, 2004
"... ..."
Risk bounds for Statistical Learning
"... We propose a general theorem providing upper bounds for the risk of an empirical risk minimizer (ERM).We essentially focus on the binary classi…cation framework. We extend Tsybakov’s analysis of the risk of an ERM under margin type conditions by using concentration inequalities for conveniently weig ..."
Abstract
-
Cited by 25 (1 self)
- Add to MetaCart
We propose a general theorem providing upper bounds for the risk of an empirical risk minimizer (ERM).We essentially focus on the binary classi…cation framework. We extend Tsybakov’s analysis of the risk of an ERM under margin type conditions by using concentration inequalities for conveniently weighted empirical processes. This allows us to deal with other ways of measuring the ”size”of a class of classi…ers than entropy with bracketing as in Tsybakov’s work. In particular we derive new risk bounds for the ERM when the classi…cation rules belong to some VC-class under margin conditions and discuss the optimality of those bounds in a minimax sense.
Concentration inequalities
- Advanced Lectures in Machine Learning
, 2004
"... Abstract. Concentration inequalities deal with deviations of functions of independent random variables from their expectation. In the last decade new tools have been introduced making it possible to establish simple and powerful inequalities. These inequalities are at the heart of the mathematical a ..."
Abstract
-
Cited by 20 (1 self)
- Add to MetaCart
Abstract. Concentration inequalities deal with deviations of functions of independent random variables from their expectation. In the last decade new tools have been introduced making it possible to establish simple and powerful inequalities. These inequalities are at the heart of the mathematical analysis of various problems in machine learning and made it possible to derive new efficient algorithms. This text attempts to summarize some of the basic tools. 1
Minimum Probability of Error Image Retrieval
- IEEE Trans. Signal Processing
"... Abstract—We address the design of optimal architectures for image retrieval from large databases. Minimum probability of error (MPE) is adopted as the optimality criterion and retrieval formulated as a problem of statistical classification. The probability of retrieval error is lower- and upper-boun ..."
Abstract
-
Cited by 19 (13 self)
- Add to MetaCart
Abstract—We address the design of optimal architectures for image retrieval from large databases. Minimum probability of error (MPE) is adopted as the optimality criterion and retrieval formulated as a problem of statistical classification. The probability of retrieval error is lower- and upper-bounded by functions of the Bayes and density estimation errors, and the impact of the components of the retrieval architecture (namely, the feature transformation and density estimation) on these bounds is characterized. This characterization suggests interpreting the search for the MPE feature set as the search for the minimum of the convex hull of a collection of curves of probability of error versus feature space dimension. A new algorithm for MPE feature design, based on a dictionary of empirical feature sets and the wrapper model for feature selection, is proposed. It is shown that, unlike traditional feature selection techniques, this algorithm scales to problems containing large numbers of classes. Experimental evaluation reveals that the MPE architecture is at least as good as popular empirical solutions on the narrow domains where these perform best but significantly outperforms them outside these domains. Index Terms—Bayesian methods, color and texture, expectation–maximization, feature selection, image retrieval, image similarity, minimum probability of error, mixture models, multiresolution, optimal retrieval systems, wrapper methods. I.
Complexity regularization via localized random penalties
, 2004
"... In this article, model selection via penalized empirical loss minimization in nonparametric classification problems is studied. Datadependent penalties are constructed, which are based on estimates of the complexity of a small subclass of each model class, containing only those functions with small ..."
Abstract
-
Cited by 18 (3 self)
- Add to MetaCart
In this article, model selection via penalized empirical loss minimization in nonparametric classification problems is studied. Datadependent penalties are constructed, which are based on estimates of the complexity of a small subclass of each model class, containing only those functions with small empirical loss. The penalties are novel since those considered in the literature are typically based on the entire model class. Oracle inequalities using these penalties are established, and the advantage of the new penalties over those based on the complexity of the whole model class is demonstrated.
Ranking and scoring using empirical risk minimization
- Proceedings of the Eighteenth Annual Conference on Computational Learning Theory (COLT
, 2005
"... Abstract. A general model is proposed for studying ranking problems. We investigate learning methods based on empirical minimization of the natural estimates of the ranking risk. The empirical estimates are of the form of a U-statistic. Inequalities from the theory of U-statistics and Uprocesses are ..."
Abstract
-
Cited by 15 (2 self)
- Add to MetaCart
Abstract. A general model is proposed for studying ranking problems. We investigate learning methods based on empirical minimization of the natural estimates of the ranking risk. The empirical estimates are of the form of a U-statistic. Inequalities from the theory of U-statistics and Uprocesses are used to obtain performance bounds for the empirical risk minimizers. Convex risk minimization methods are also studied to give a theoretical framework for ranking algorithms based on boosting and support vector machines. Just like in binary classification, fast rates of convergence are achieved under certain noise assumption. General sufficient conditions are proposed in several special cases that guarantee fast rates of convergence. 1
Learning classes of probabilistic automata
- In COLT 2004, number 3120 in LNAI
, 2004
"... Abstract. Probabilistic finite automata (PFA) model stochastic languages, i.e. probability distributions over strings. Inferring PFA from stochastic data is an open field of research. We show that PFA are identifiable in the limit with probability one. Multiplicity automata (MA) is another device to ..."
Abstract
-
Cited by 10 (4 self)
- Add to MetaCart
Abstract. Probabilistic finite automata (PFA) model stochastic languages, i.e. probability distributions over strings. Inferring PFA from stochastic data is an open field of research. We show that PFA are identifiable in the limit with probability one. Multiplicity automata (MA) is another device to represent stochastic languages. We show that a MA may generate a stochastic language that cannot be generated by a PFA, but we show also that it is undecidable whether a MA generates a stochastic language. Finally, we propose a learning algorithm for a subclass of PFA, called PRFA. 1
Geometric Parameters in Learning Theory
- Manuscript, Research School of Information Sciences and Engineering, Australian National University, submitted
, 2003
"... this article would encourage mathematicians to investigate these seemingly "applied" problems, which are, in fact, linked to interesting theoretical questions ..."
Abstract
-
Cited by 5 (0 self)
- Add to MetaCart
this article would encourage mathematicians to investigate these seemingly "applied" problems, which are, in fact, linked to interesting theoretical questions
RANKING AND EMPIRICAL MINIMIZATION OF U-STATISTICS
"... The problem of ranking/ordering instances, instead of simply classifying them, has recently gained much attention in machine learning. In this paper we formulate the ranking problem in a rigorous statistical framework. The goal is to learn a ranking rule for deciding, among two instances, which one ..."
Abstract
-
Cited by 5 (0 self)
- Add to MetaCart
The problem of ranking/ordering instances, instead of simply classifying them, has recently gained much attention in machine learning. In this paper we formulate the ranking problem in a rigorous statistical framework. The goal is to learn a ranking rule for deciding, among two instances, which one is “better, ” with minimum ranking risk. Since the natural estimates of the risk are of the form of a U-statistic, results of the theory of U-processes are required for investigating the consistency of empirical risk minimizers. We establish, in particular, a tail inequality for degenerate U-processes, and apply it for showing that fast rates of convergence may be achieved under specific noise assumptions, just like in classification. Convex risk minimization methods are also studied. 1. Introduction. Motivated

