Results 1–6 of 6
Distribution-free consistency results in nonparametric discrimination and regression function estimation
 Ann. Statist
, 1980
Abstract

Cited by 41 (9 self)
Let (X, Y) be an R^d × R-valued random vector and (X_1, Y_1), ..., (X_n, Y_n) be a random sample drawn from its distribution. We study the consistency properties of the kernel estimate m_n(x) of the regression function m(x) = E{Y | X = x} that is defined by m_n(x) = Σ_{i=1}^n Y_i K((X_i − x)/h_n) / Σ_{i=1}^n K((X_i − x)/h_n), where K is a bounded nonnegative function on R^d with compact support and (h_n) is a sequence of positive numbers satisfying h_n → 0, nh_n^d → ∞. It is shown that E{∫ |m_n(x) − m(x)|^p µ(dx)} → 0 whenever E{|Y|^p} < ∞ (p ≥ 1). No other restrictions are placed on the distribution of (X, Y). The result is applied to verify the Bayes risk consistency of the corresponding discrimination rules. 1. Introduction and summary. In this paper we present consistency results for the nonparametric regression function estimation problem. Assume that (X, Y), (X_1, Y_1), ..., (X_n, Y_n) are independent identically distributed R^d × R-valued random vectors with E{|Y|} < ∞. The purpose is to estimate the regression function m(x) = E{Y | X = x}
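The kernel estimate m_n(x) in this abstract is a ratio of kernel-weighted sums; a minimal one-dimensional Python sketch (the function name and the choice of the box kernel are illustrative, not from the paper):

```python
def kernel_regression(x, data, h):
    """Kernel regression estimate m_n(x) with bandwidth h.

    data: list of (X_i, Y_i) pairs (scalar X_i for simplicity; the
    paper works in R^d).  K is the box kernel, a bounded nonnegative
    function with compact support as the paper requires.
    """
    K = lambda u: 1.0 if abs(u) <= 1.0 else 0.0
    num = sum(y * K((xi - x) / h) for xi, y in data)   # sum of Y_i K((X_i - x)/h)
    den = sum(K((xi - x) / h) for xi, _ in data)       # sum of K((X_i - x)/h)
    return num / den if den > 0 else 0.0
```

On data sampled from m(x) = x the estimate reproduces the regression function at interior points, since the weighted average of nearby Y_i values is close to E{Y | X = x}.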
Bounds for the uniform deviation of empirical measures
 Journal of Multivariate Analysis
, 1982
Abstract

Cited by 25 (4 self)
If X_1, ..., X_n are independent identically distributed R^d-valued random vectors with probability measure µ and empirical probability measure µ_n, and if 𝒜 is a subset of the Borel sets on R^d, then we show that P{sup_{A ∈ 𝒜} ...
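The quantity bounded in this paper, the uniform deviation of the empirical measure over a class of sets, can be computed exactly in a toy one-dimensional case. A sketch assuming the class of half-lines and µ = Uniform(0, 1) (both choices are mine, for illustration; the paper treats general classes of Borel sets):

```python
def uniform_deviation_halflines(sample):
    """sup over half-lines (-inf, t] of |mu_n((-inf, t]) - mu((-inf, t])|
    when mu is the Uniform(0, 1) law, so mu((-inf, t]) = t on [0, 1].

    Over this class the supremum is attained at sample points, where
    the empirical measure jumps from i/n to (i+1)/n.
    """
    xs = sorted(sample)
    n = len(xs)
    dev = 0.0
    for i, x in enumerate(xs):
        # check the deviation just before and just after the jump at x
        dev = max(dev, abs((i + 1) / n - x), abs(i / n - x))
    return dev
```

For half-lines this is the Kolmogorov-Smirnov statistic; the point of the paper is to bound such deviations uniformly for richer classes of sets.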
Optimal rates of convergence to Bayes risk in nonparametric discrimination
 Ph.D. Dissertation, UCLA
, 1982
Abstract

Cited by 18 (0 self)
Consider the multiclassification (discrimination) problem with known prior probabilities and a multidimensional vector of observations. Assume the underlying densities corresponding to the various classes are unknown but a training sample of size N is available from each class. Rates of convergence to Bayes risk are investigated under smoothness conditions on the underlying densities of the type often seen in nonparametric density estimation. These rates can be drastically affected by a small change in the prior probabilities, so the error criterion used here is Bayes risk averaged (uniformly) over all prior probabilities. Then it is shown that a certain rate, N^{−r}, is optimal in the sense that no rule can do better (uniformly over the class of smooth densities), and a rule is exhibited which does that well. The optimal value of r depends on the smoothness and the dimensionality of the observations in the same way as for nonparametric density estimation with integrated square error loss. 1. INTRODUCTION. The classification or discrimination problem arises whenever one wants to assign an object to one of a finite number of classes based on a vector of d measurements. More precisely, let f
The strong uniform consistency of nearest neighbor density estimates
 Annals of Statistics
, 1977
Abstract

Cited by 14 (3 self)
Let X_1, ..., X_n be independent, identically distributed random vectors with values in R^d and with a common probability density f. If V_k(x) is the volume of the smallest sphere centered at x and containing at least k of the X_1, ..., X_n, then f_n(x) = k/(nV_k(x)) is a nearest neighbor density estimate of f. We show that if k = k(n) satisfies k(n)/n → 0 and k(n)/log n → ∞, then sup_x |f_n(x) − f(x)| → 0 w.p. 1 when f is uniformly continuous on R^d. Introduction. Suppose that X_1, ..., X_n are independent, identically distributed random vectors with values in R^d and with a common probability density f. If V_k(x) is the volume of the smallest sphere centered at x and containing at least k of the random vectors X_1, ..., X_n, then Loftsgaarden and Quesenberry (1965), to estimate f(x) from X_1, ..., X_n, let
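The estimate f_n(x) = k/(nV_k(x)) is easy to write down in one dimension, where the smallest sphere centered at x containing at least k of the points is an interval of length twice the k-th nearest neighbor distance. A minimal sketch (the function name is mine):

```python
def knn_density(x, sample, k):
    """Nearest neighbor density estimate f_n(x) = k / (n * V_k(x)) for d = 1.

    V_k(x) is the length of the smallest interval centered at x that
    contains at least k sample points: twice the k-th smallest distance.
    """
    n = len(sample)
    r = sorted(abs(xi - x) for xi in sample)[k - 1]   # k-th nearest neighbor distance
    return k / (n * 2 * r)
```

On an evenly spaced sample from Uniform(0, 1) the estimate is close to the true density 1 at interior points.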
A NOTE ON THE L1 CONSISTENCY OF VARIABLE KERNEL ESTIMATES
, 1985
Abstract

Cited by 8 (7 self)
A sample X_1, ..., X_n of i.i.d. R^d-valued random vectors with common density f is used to construct the density estimate f_n(x) = (1/n) Σ_{i=1}^n H_{ni}^{−d} K((x − X_i)/H_{ni}), where K is a given density on R^d, and the H_{ni}'s are positive functions of n, i and X_1, ..., X_n (but not of x). The H_{ni}'s can be thought of as locally adapted smoothing parameters. We give sufficient conditions for the weak convergence to 0 of ∫ |f_n − f| for all f. This is illustrated for the estimate of Breiman, Meisel and Purcell (1977).
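A one-dimensional sketch of the variable kernel estimate, with H_{ni} chosen, in the spirit of Breiman, Meisel and Purcell (1977), as the distance from X_i to its k-th nearest neighbor (this particular bandwidth rule and the box kernel are my simplifications, not the paper's conditions):

```python
def variable_kernel_density(x, sample, k):
    """Variable kernel estimate f_n(x) = (1/n) sum_i H_i^{-1} K((x - X_i)/H_i), d = 1.

    K is the box density (1/2 on [-1, 1]); H_i is the distance from X_i
    to its k-th nearest neighbor, so the bandwidth adapts locally to the
    data around each X_i.
    """
    n = len(sample)
    total = 0.0
    for xi in sample:
        # index 0 of the sorted distances is xi itself, so index k is the k-th neighbor
        h = sorted(abs(xj - xi) for xj in sample)[k]
        u = (x - xi) / h
        total += (0.5 if abs(u) <= 1.0 else 0.0) / h
    return total / n
```

Each data point contributes a kernel bump whose width shrinks where the sample is dense, which is the "locally adapted smoothing" the abstract describes.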
Automatic Pattern Recognition: A Study of the Probability of Error
Abstract
Abstract: A test sequence is used to select the best rule from a rich class of discrimination rules defined in terms of the training sequence. The Vapnik-Chervonenkis and related inequalities are used to obtain distribution-free bounds on the difference between the probability of error of the selected rule and the probability of error of the best rule in the given class. The bounds are used to prove the consistency and asymptotic optimality for several popular classes, including linear discriminators, nearest neighbor rules, kernel-based rules, histogram rules, binary tree classifiers, and Fourier series classifiers. In particular, the method can be used to choose the smoothing parameter in kernel-based rules, to choose k in the k-nearest neighbor rule, and to choose between parametric and nonparametric rules. Index Terms: Automatic parameter selection, empirical risk, error estimation, nonparametric rule, probability of error, statistical pattern recognition, Vapnik-Chervonenkis inequality.
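The selection scheme in this abstract, picking the rule in the class that minimizes the empirical error on the test sequence, can be sketched for the choice of k in the k-nearest neighbor rule (one-dimensional data and all function names are mine, for illustration):

```python
def knn_classify(x, train, k):
    """Majority vote among the k nearest training points (1-D, labels 0/1)."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if 2 * votes > k else 0

def select_k(train, test, candidates):
    """Pick the k whose k-NN rule has the smallest error on the test sequence."""
    def test_error(k):
        return sum(knn_classify(x, train, k) != y for x, y in test) / len(test)
    return min(candidates, key=test_error)
```

The VC-type bounds in the paper control how far the test-sequence error of the selected k can be from the error of the best rule in the candidate class, uniformly over all distributions.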