Results 1 - 6 of 6
Efficient Estimators for Generalized Additive Models, 2005
Cited by 2 (0 self)
Abstract
Generalized additive models are a powerful generalization of linear and logistic regression models. In this paper we show that a natural regression graph learning algorithm efficiently learns generalized additive models. Efficiency is proven in two senses: the estimator’s future prediction accuracy approaches optimality at a rate inverse polynomial in the size of the training data, and its runtime is polynomial in the size of the training data. Furthermore, the guarantees are nearly linear in the dimensionality (number of regressors) of the problem, and hence the algorithm does not suffer from the “curse of dimensionality.” The algorithm is a simple generalization of Mansour and McAllester’s classification algorithm that generates decision graphs, i.e., decision trees with merges. Our analysis can also be viewed as defining a natural extension of the original classification boosting theorems (Schapire, 1990) to the regression setting. Loosely speaking, we define a weak correlator to be a real-valued predictor whose correlation coefficient with the target function is bounded away from zero. We show how to efficiently boost weak correlators to get predictions with correlation arbitrarily close to 1 (error arbitrarily close to 0). Our boosting analysis is a natural extension of the classification boosting analyses of Kearns and Mansour (1999) and Mansour and McAllester (2002).
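To make the weak-correlator idea concrete, here is a minimal sketch of residual-style boosting with threshold stumps standing in as the weak correlators. This is not the paper's decision-graph algorithm; every name (`fit_stump`, `boost_regression`, the learning rate) is invented for illustration.

```python
import numpy as np

def fit_stump(x, r):
    """Pick the threshold stump (threshold, left mean, right mean) that best
    fits the residual r in squared error -- our stand-in 'weak correlator'."""
    best = None
    for t in np.unique(x)[:-1]:
        mask = x <= t
        left, right = r[mask].mean(), r[~mask].mean()
        sse = np.sum((r - np.where(mask, left, right)) ** 2)
        if best is None or sse < best[0]:
            best = (sse, t, left, right)
    return best[1], best[2], best[3]

def boost_regression(x, y, rounds=50, lr=0.5):
    """Repeatedly fit a stump to the current residual and add a damped copy."""
    pred = np.zeros_like(y, dtype=float)
    for _ in range(rounds):
        t, left, right = fit_stump(x, y - pred)
        pred += lr * np.where(x <= t, left, right)
    return pred
```

Each round adds the stump that best explains what is left over, so the combined prediction's correlation with the target climbs toward 1, mirroring the boost-to-correlation-1 statement in the abstract.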
Risk bounds for random regression graphs, Foundations of Computational Mathematics
Cited by 2 (0 self)
Abstract
We consider the regression problem and describe an algorithm that approximates the regression function by estimators which are piecewise constant on the elements of an adaptive partition. The partitions are iteratively constructed by suitable random merges and splits, using cuts of arbitrary geometry. We give a risk bound under the assumption that a “weak learning hypothesis” holds, and characterize this hypothesis in terms of a suitable RKHS.
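A toy version of a piecewise-constant estimator on an adaptively constructed partition, using only greedy dyadic splits on the line (the paper's algorithm also uses random merges and cuts of arbitrary geometry; all names here are invented):

```python
import numpy as np

def partition_regress(x, y, max_cells=8):
    """Piecewise-constant fit on an adaptively split partition of the line."""
    cells = [(float(x.min()), float(x.max()) + 1e-9)]  # half-open cells [a, b)

    def sse(a, b):
        m = (x >= a) & (x < b)
        return 0.0 if not m.any() else float(((y[m] - y[m].mean()) ** 2).sum())

    while len(cells) < max_cells:
        best = None
        for i, (a, b) in enumerate(cells):
            mid = (a + b) / 2                 # dyadic cut for simplicity
            gain = sse(a, b) - sse(a, mid) - sse(mid, b)
            if best is None or gain > best[0]:
                best = (gain, i, mid)
        _, i, mid = best
        a, b = cells.pop(i)
        cells += [(a, mid), (mid, b)]

    def predict(q):
        for a, b in cells:
            m = (x >= a) & (x < b)
            if a <= q < b and m.any():
                return float(y[m].mean())     # cell mean = piecewise-constant value
        return float(y.mean())                # fallback outside the data range

    return cells, predict
```

The estimator predicts the mean of the responses in whichever cell a query falls into, which is exactly the "piecewise constant on the elements of an adaptive partition" structure the abstract describes.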
Learning Nested Halfspaces and Uphill Decision Trees
Cited by 1 (0 self)
Abstract
Predicting class probabilities and other real-valued quantities is often more useful than binary classification, but comparatively little work in PAC-style learning addresses this issue. We show that two rich classes of real-valued functions are learnable in the probabilistic-concept framework of Kearns and Schapire. Let X be a subset of Euclidean space and f be a real-valued function on X. We say f is a nested halfspace function if, for each real threshold t, the set {x ∈ X : f(x) ≤ t} is a halfspace. This broad class of functions includes binary halfspaces with a margin (e.g., SVMs) as a special case. We give an efficient algorithm that provably learns (Lipschitz-continuous) nested halfspace functions on the unit ball. The sample complexity is independent of the number of dimensions. We also introduce the class of uphill decision trees, which are real-valued decision trees (sometimes called regression trees) in which the sequence of leaf values is non-decreasing. We give an efficient algorithm for provably learning uphill decision trees whose sample complexity is polynomial in the number of dimensions but independent of the size of the tree (which may be exponential). Both of our algorithms employ a real-valued extension of Mansour and McAllester’s boosting algorithm.
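The nested-halfspace definition can be illustrated numerically. In the sketch below, the link `g`, the weights `w`, and the sample points are invented for the example: for f(x) = g(w·x) with g strictly increasing, each sublevel set {x : f(x) ≤ t} coincides with the halfspace {x : w·x ≤ g⁻¹(t)}.

```python
import numpy as np

# Invented example: f(x) = g(w . x) with a logistic (strictly increasing) link.
w = np.array([1.0, 2.0])
g = lambda s: 1.0 / (1.0 + np.exp(-s))

def sublevel_is_halfspace(t, points):
    """For strictly increasing g, {x : f(x) <= t} = {x : w . x <= g^-1(t)},
    i.e. every sublevel set of f is a halfspace."""
    s = points @ w
    thresh = np.log(t / (1.0 - t))     # inverse of the logistic link
    return np.array_equal(g(s) <= t, s <= thresh)
```

This is why the class strictly generalizes margin halfspaces: a hard threshold on w·x is just the special case where g is a step-like monotone function.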
The Isotron Algorithm: High-Dimensional Isotonic Regression
Abstract
The Perceptron algorithm elegantly solves binary classification problems that have a margin between positive and negative examples. Isotonic regression (fitting an arbitrary increasing function in one dimension) is also a natural problem with a simple solution. By combining the two, we get a new but very simple algorithm with strong guarantees. Our ISOTRON algorithm provably learns Single Index Models (SIMs), a generalization of linear and logistic regression, generalized linear models, and binary classification by linear threshold functions. In particular, it provably learns SIMs with unknown mean functions that are non-decreasing and Lipschitz-continuous, thereby generalizing linear and logistic regression and linear-threshold functions (with a margin). Like the Perceptron, it is straightforward to implement and kernelize. Hence, the ISOTRON provides a very simple yet flexible and principled approach to regression.
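A minimal sketch of the alternation the abstract describes: fit the unknown non-decreasing mean function by Pool Adjacent Violators (PAV) along the current direction, then apply a Perceptron-like update to the weights. The exact update rule and iteration count below are assumptions for illustration, not the paper's precise specification.

```python
import numpy as np

def pav(y):
    """Pool Adjacent Violators: L2 projection onto non-decreasing sequences."""
    blocks = []                         # each block: [mean, count]
    for v in y:
        blocks.append([float(v), 1])
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, c2 = blocks.pop()
            m1, c1 = blocks.pop()
            blocks.append([(m1 * c1 + m2 * c2) / (c1 + c2), c1 + c2])
    return np.concatenate([[m] * c for m, c in blocks])

def isotron(X, y, iters=100):
    """Alternate an isotonic (PAV) fit of the mean function along the current
    direction with a Perceptron-like correction of the weight vector."""
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        order = np.argsort(X @ w)
        u = np.empty(len(y))
        u[order] = pav(y[order])        # current estimate of u(w . x_i)
        w += (y - u) @ X / len(y)       # averaged Perceptron-style update
    return w
```

Like the Perceptron, the whole algorithm is a few lines; the one-dimensional isotonic fit absorbs the unknown link, so no parametric form for it is ever assumed.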
Non-Parametric Learnability of Income-Lipschitz Demand Functions
Abstract
A sequence of prices and demands is rationalizable if there exists a concave, continuous and monotone utility function such that the demands are the maximizers of the utility function over the budget sets corresponding to the prices. Afriat [1] presented necessary and sufficient conditions for a finite sequence to be rationalizable. Varian [30] and later Blundell et al. [5, 6] continued this line of work, studying nonparametric methods to forecast demand. Their methods do not posit any probabilistic model and therefore fall short of giving a general degree of confidence in the forecast. The present paper complements this line of research by introducing a statistical model and a measure of complexity through which we are able to study the learnability of classes of demand functions and derive a degree of confidence in the forecasts. We develop a framework to study the learnability of real vector-valued demand functions through observations of prices and demands. Our results give lower and upper bounds on the sample complexity of PAC learnability and show that the sample complexity of learning a class of vector-valued functions with finite fat-shattering dimension increases by a factor linear in the dimension. We show that classes of income-Lipschitz demand functions with global bounds on the Lipschitz constant have finite fat-shattering dimension.
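Afriat's condition can be checked mechanically: by Afriat's theorem, a finite (price, demand) sequence is rationalizable by a concave, continuous, monotone utility iff it satisfies GARP, i.e. no revealed-preference cycle contains a strict preference. A small sketch (the function name and tolerance are invented):

```python
import numpy as np

def satisfies_garp(prices, demands):
    """GARP check: rationalizable (per Afriat's theorem) iff no
    revealed-preference cycle contains a strict preference."""
    P = np.asarray(prices, dtype=float)
    X = np.asarray(demands, dtype=float)
    cost = P @ X.T                          # cost[t, s] = p_t . x_s
    budget = np.diag(cost)                  # p_t . x_t, actual expenditure
    R = cost <= budget[:, None] + 1e-12     # x_t directly (weakly) revealed preferred to x_s
    strict = cost < budget[:, None] - 1e-12
    for k in range(len(P)):                 # Warshall transitive closure of R
        R = R | (R[:, [k]] & R[[k], :])
    return not bool(np.any(R & strict.T))   # violation: x_t R x_s, yet x_s strictly preferred to x_t
```

For example, two observations that each strictly reveal the other bundle as cheaper (a classic WARP violation) fail the check, while demands consistent with a single utility pass it.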
Boosting in the presence of noise, 2004
Abstract
Boosting algorithms are procedures that “boost” low-accuracy weak learning algorithms to achieve arbitrarily high accuracy. Over the past decade boosting has been widely used in practice and has become a major research topic in computational learning theory. In this paper we study boosting in the presence of random classification noise, giving both positive and negative results. We show that a modified version of a boosting algorithm due to Mansour and McAllester (J. Comput. System Sci. 64(1) (2002) 103) can achieve error arbitrarily close to the noise rate. We also give a matching lower bound by showing that no efficient black-box boosting algorithm can achieve error below the noise rate (assuming that one-way functions exist). Finally, we consider a variant of the standard scenario for boosting in which the “weak learner” satisfies a slightly stronger condition than the usual weak learning guarantee. We give an efficient algorithm in this framework which can boost to arbitrarily high accuracy in the presence of classification noise. © 2004 Elsevier Inc. All rights reserved.
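For readers unfamiliar with boosting itself, here is standard AdaBoost with threshold stumps, shown purely to illustrate what "boosting a weak learner" means; it is not the modified Mansour–McAllester branching-program booster the paper analyzes, and all identifiers are invented. The reweighting step is also the mechanism the paper's noise results turn on: under classification noise, weight tends to concentrate on noisy examples.

```python
import numpy as np

def best_stump(x, y, w):
    """Weighted-error-minimizing threshold stump on 1-D labeled data."""
    best = None
    for t in np.concatenate(([x.min() - 1.0], np.unique(x))):
        for s in (1.0, -1.0):
            pred = np.where(x <= t, -s, s)
            err = float(np.sum(w[pred != y]))
            if best is None or err < best[0]:
                best = (err, t, s)
    return best

def adaboost(x, y, rounds=50):
    """Standard AdaBoost: reweight examples, combine stumps by weighted vote."""
    n = len(x)
    w = np.full(n, 1.0 / n)
    model = []
    for _ in range(rounds):
        err, t, s = best_stump(x, y, w)
        if err >= 0.5:
            break                            # no weak edge left
        err = max(err, 1e-10)
        alpha = 0.5 * np.log((1.0 - err) / err)
        pred = np.where(x <= t, -s, s)
        w = w * np.exp(-alpha * y * pred)    # upweight misclassified examples
        w /= w.sum()
        model.append((alpha, t, s))
    F = sum(a * np.where(x <= t, -s, s) for a, t, s in model)
    return np.sign(F)
```

On noiseless data each round has a weak edge and the weighted vote drives training error down; the paper's lower bound says no black-box booster of this kind can push error below the noise rate once labels are randomly flipped.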