Results 11 - 20
of
92
A few notes on Statistical Learning Theory
, 2003
"... this article is on the theoretical side and not on the applicative one; hence, we shall not present examples which may be interesting from the practical point of view but have little theoretical significance. This survey is far from being complete and it focuses on problems the author finds interest ..."
Abstract
-
Cited by 41 (10 self)
- Add to MetaCart
this article is on the theoretical side and not on the applicative one; hence, we shall not present examples which may be interesting from the practical point of view but have little theoretical significance. This survey is far from being complete and it focuses on problems the author finds interesting (an opinion which is not necessarily shared by the majority of the learning community). Relevant books which present a more evenly balanced approach are, for example [1, 4, 35, 36] The starting point of our discussion is the formulation of the learning problem. Consider a class G, consisting of real valued functions defined on a space #, and assume that each g G maps # into [0, 1]. Let T be an unknown function, T : # [0, 1] and set to be an unknown probability measure on #
On the Rate of Convergence of Regularized Boosting Classifiers
- JOURNAL OF MACHINE LEARNING RESEARCH
, 2003
"... A regularized boosting method is introduced, for which regularization is obtained through a penalization function. It is shown through oracle inequalities that this method is model adaptive. The rate of convergence of the probability of misclassification is investigated. It is shown that for quite ..."
Abstract
-
Cited by 36 (8 self)
- Add to MetaCart
A regularized boosting method is introduced, for which regularization is obtained through a penalization function. It is shown through oracle inequalities that this method is model adaptive. The rate of convergence of the probability of misclassification is investigated. It is shown that for quite a large class of distributions, the probability of error converges to the Bayes risk at a rate faster than n -(V+2)/(4(V+1)) where V is the VC dimension of the "base" class whose elements are combined by boosting methods to obtain an aggregated classifier. The dimension-independent nature of the rates may partially explain the good behavior of these methods in practical problems. Under Tsybakov's noise condition the rate of convergence is even faster. We investigate the conditions necessary to obtain such rates for different base classes. The special case of boosting using decision stumps is studied in detail. We characterize the class of classifiers realizable by aggregating decision stumps.
Learning and Value Function Approximation in Complex Decision Processes
, 1998
"... In principle, a wide variety of sequential decision problems -- ranging from dynamic resource allocation in telecommunication networks to financial risk management -- can be formulated in terms of stochastic control and solved by the algorithms of dynamic programming. Such algorithms compute and sto ..."
Abstract
-
Cited by 34 (4 self)
- Add to MetaCart
In principle, a wide variety of sequential decision problems -- ranging from dynamic resource allocation in telecommunication networks to financial risk management -- can be formulated in terms of stochastic control and solved by the algorithms of dynamic programming. Such algorithms compute and store a value function, which evaluates expected future reward as a function of current state. Unfortunately, exact computation of the value function typically requires time and storage that grow proportionately with the number of states, and consequently, the enormous state spaces that arise in practical applications render the algorithms intractable. In this thesis, we study tractable methods that approximate the value function. Our work builds on research in an area of artificial intelligence known as reinforcement learning. A point of focus of this thesis is temporal-difference learning -- a stochastic algorithm inspired to some extent by phenomena observed in animal behavior. Given a selection of...
Adaptive model selection using empirical complexities
- Annals of Statistics
, 1999
"... Key words and phrases. Complexity regularization, classi cation, pattern recognition, regression estimation, curve tting, minimum description length. 1 Given n independent replicates of a jointly distributed pair (X; Y) 2R d R, we wish to select from a xed sequence of model classes F1; F2;:::a deter ..."
Abstract
-
Cited by 34 (8 self)
- Add to MetaCart
Key words and phrases. Complexity regularization, classi cation, pattern recognition, regression estimation, curve tting, minimum description length. 1 Given n independent replicates of a jointly distributed pair (X; Y) 2R d R, we wish to select from a xed sequence of model classes F1; F2;:::a determin-istic prediction rule f: R d! R whose risk is small. We investigate the possibility of empirically assessing the complexity of each model class, that is, the actual di culty of the estimation problem within each class. The estimated complexities are in turn used to de ne an adaptive model selection procedure, which is based on complexity penalized empirical risk. The available data are divided into two parts. The rst is used to form an empirical cover of each model class, and the second is used to select a candidate rule from each cover based on empirical risk. The covering radii are determined empirically to optimize a tight upper bound on the estimation error.
Clustering for Edge-Cost Minimization
"... Leonard J. Schulman College of Computing Georgia Institute of Technology Atlanta GA 30332-0280 ABSTRACT We address the problem of partitioning a set of n points into clusters, so as to minimize the sum, over all intracluster pairs of points, of the cost associated with each pair. We obtain a ra ..."
Abstract
-
Cited by 32 (4 self)
- Add to MetaCart
Leonard J. Schulman College of Computing Georgia Institute of Technology Atlanta GA 30332-0280 ABSTRACT We address the problem of partitioning a set of n points into clusters, so as to minimize the sum, over all intracluster pairs of points, of the cost associated with each pair. We obtain a randomized approximation algorithm for this problem, for the cost functions ` 2 2 ; `1 and `2 , as well as any cost function isometrically embeddable in ` 2 2 .
Introduction to Statistical Learning Theory
- In , O. Bousquet, U.v. Luxburg, and G. Rsch (Editors
, 2004
"... ..."
Bounds for the uniform deviation of empirical measures
- Journal of Multivariate Analysis
, 1982
"... If x,)...) X, are independent identically distributed Rd-valued random vectors with probability measure p and empirical probability measure p,, and if QZ is a subset of the Bore1 sets on Rd, then we show that P{sup,, ~ ..."
Abstract
-
Cited by 23 (4 self)
- Add to MetaCart
If x,)...) X, are independent identically distributed Rd-valued random vectors with probability measure p and empirical probability measure p,, and if QZ is a subset of the Bore1 sets on Rd, then we show that P{sup,, ~
Piecewise-polynomial regression trees
- Statistica Sinica
, 1994
"... A nonparametric function 1 estimation method called SUPPORT (“Smoothed and Unsmoothed Piecewise-Polynomial Regression Trees”) is described. The estimate is typically made up of several pieces, each piece being obtained by fitting a polynomial regression to the observations in a subregion of the data ..."
Abstract
-
Cited by 23 (4 self)
- Add to MetaCart
A nonparametric function 1 estimation method called SUPPORT (“Smoothed and Unsmoothed Piecewise-Polynomial Regression Trees”) is described. The estimate is typically made up of several pieces, each piece being obtained by fitting a polynomial regression to the observations in a subregion of the data space. Partitioning is car-ried out recursively as in a tree-structured method. If the estimate is required to be smooth, the polynomial pieces may be glued together by means of weighted averaging. The smoothed estimate is thus obtained in three steps. In the first step, the regressor space is recursively partitioned until the data in each piece are adequately fitted by a polynomial of a fixed order. Partitioning is guided by analysis of the distributions of residuals and cross-validation estimates of prediction mean square error. In the sec-ond step, the data within a neighborhood of each partition are fitted by a polynomial. The final estimate of the regression function is obtained by averaging the polynomial pieces, using smooth weight functions each of which diminishes rapidly to zero outside its associated partition. Estimates of derivatives of the regression function may be
Vapnik-Chervonenkis classes of definable sets
- J. London Math. Soc
, 1992
"... We show that a class of subsets of a structure uniformly definable by a first-order formula is a Vapnik-Chervonenkis class if and only if the formula does not have the independence property. Via this connection we obtain several new examples of Vapnik-Chervonenkis classes, including sets of positivi ..."
Abstract
-
Cited by 23 (0 self)
- Add to MetaCart
We show that a class of subsets of a structure uniformly definable by a first-order formula is a Vapnik-Chervonenkis class if and only if the formula does not have the independence property. Via this connection we obtain several new examples of Vapnik-Chervonenkis classes, including sets of positivity of finitely subanalytic functions.

