## Learning by Canonical Smooth Estimation, Part II: Learning and Choice of Model Complexity

Venue: IEEE Transactions on Automatic Control

Citations: 13 (2 self)

### BibTeX

@ARTICLE{Buescher_learningby,
  author  = {Kevin L. Buescher and P. R. Kumar},
  title   = {Learning by Canonical Smooth Estimation, Part II: Learning and Choice of Model Complexity},
  journal = {IEEE Transactions on Automatic Control},
  year    = {},
  volume  = {41},
  pages   = {557--569}
}

### Abstract

In this paper, we analyze the properties of a procedure for learning from examples. This "canonical learner" is based on a canonical error estimator developed in a companion paper. In learning problems, we observe data consisting of labeled sample points, and the goal is to find a model, or "hypothesis," from a set of candidates that will accurately predict the labels of new sample points. The expected mismatch between a hypothesis' prediction and the actual label of a new sample point is called the hypothesis' "generalization error." We compare the canonical learner with the traditional technique of finding hypotheses that minimize the relative frequency-based empirical error estimate. We show that, for a broad class of learning problems, the set of cases for which such empirical error minimization works is a proper subset of the cases for which the canonical learner works. We derive bounds to show that the number of samples required by these two methods is comparable. We also add...

### Citations

2320 citations
Estimating the dimension of a model
- Schwarz
- 1978
Citation Context: ...k to select the final hypothesis. The underlying intuition is that we should trade some accuracy on the data in exchange for a "simpler" hypothesis. For examples of this method, see [32], [33], [34], [35], [4], [13], [36], [37], [38], and [39]. Again, this penalty is determined from some measure of complexity that is derived from the structure of H and must be carefully selected to ensure that learnin...

1855 citations
A new look at the statistical model identification
- Akaike
- 1974
Citation Context: ...is minimized over k to select the final hypothesis. The underlying intuition is that we should trade some accuracy on the data in exchange for a "simpler" hypothesis. For examples of this method, see [32], [33], [34], [35], [4], [13], [36], [37], [38], and [39]. Again, this penalty is determined from some measure of complexity that is derived from the structure of H and must be carefully selected to e...

947 citations
On the uniform convergence of relative frequencies of events to their probabilities
- Vapnik, Chervonenkis
- 1971
Citation Context: ...n. 3.3 The 0/1-valued, Distribution-Free Case. We now specialize to the case where the hypotheses and labels are 0/1-valued and P = P, the set of all probability distributions on X × {0, 1}. In [2], Vapnik and Chervonenkis introduced a property of a set of 0/1-valued functions H that determines when (P, H) is simultaneously estimable by f_emp. This property has come to be known as the "Vapnik-C...

804 citations
Estimation of Dependences Based on Empirical Data
- Vapnik
- 1982
Citation Context: ...ply choosing the hypothesis with the least estimated error, since this hypothesis will also have nearly the least true error. Inspired by the pioneering work of Vapnik and Chervonenkis ([2], [3], and [4]), much research has been done to determine when the empirical error estimate based on the labeled sample ~s(m) = (~x(m), ~y(m)), f_emp[~s(m), h] = (1/m) Σ_{i=1}^{m} L(h(x_i), y_i), succeeds in simultan...
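The relative-frequency estimate quoted in this snippet, f_emp[~s(m), h] = (1/m) Σ L(h(x_i), y_i), can be sketched in a few lines. This is an illustrative reading only, not code from the paper; the 0/1 loss and the toy threshold hypothesis are assumptions.

```python
# Hedged sketch of the empirical error estimate f_emp described above:
# the average loss of hypothesis h over the labeled sample (x_i, y_i).
# The 0/1 loss and the threshold hypothesis are illustrative choices.

def empirical_error(h, xs, ys, loss=lambda p, y: int(p != y)):
    """f_emp[s(m), h] = (1/m) * sum_i L(h(x_i), y_i)."""
    return sum(loss(h(x), y) for x, y in zip(xs, ys)) / len(xs)

h = lambda x: int(x > 0.5)          # toy 0/1-valued hypothesis
xs = [0.1, 0.4, 0.6, 0.9]
ys = [0, 1, 1, 1]
print(empirical_error(h, xs, ys))   # one mismatch in four samples: 0.25
```

Empirical error minimization, as discussed throughout these snippets, would then pick the hypothesis minimizing this quantity over the class H.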

723 citations
Cross-validatory choice and assessment of statistical predictions
- Stone
- 1974
Citation Context: ...erences between g_cl and some other procedures for learning in the literature. Also, we discuss how g_cl avoids overfitting the data. Superficially, g_cl resembles the method of cross-validation (see [6]). However, the hypotheses selected by cross-validation are usually chosen by minimizing the empirical error on a subsample. The use of an empirical cover is clearly a different approach, since the la...
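For contrast, the holdout-style selection that this snippet attributes to cross-validation can be sketched as follows; the candidate hypotheses, data, and function names are illustrative assumptions, not from the paper.

```python
# Hedged sketch of selection by empirical error on a held-out subsample,
# the cross-validation-style method the snippet contrasts with g_cl.
# Candidates and data below are toy assumptions.

def select_by_holdout(candidates, holdout):
    """Return the candidate with least empirical error on `holdout`."""
    def err(h):
        return sum(h(x) != y for x, y in holdout) / len(holdout)
    return min(candidates, key=err)

h_low = lambda x: int(x > 0.3)
h_high = lambda x: int(x > 0.7)
holdout = [(0.2, 0), (0.5, 1), (0.8, 1)]
best = select_by_holdout([h_low, h_high], holdout)   # h_low fits all three
```

The paper's g_cl differs precisely in that it builds an empirical cover from the data rather than scoring fixed candidates this way.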

625 citations
Learnability and the Vapnik-Chervonenkis dimension
- Blumer, Ehrenfeucht, et al.
- 1989
Citation Context: ...sting classes H, learning (P, H) will frequently be impossible because of the requirement that the number of samples used be uniform over P ∈ P. Indeed, VCdim(H) < ∞ is necessary for learning (P, H) ([15]). Accordingly, we concentrate on nonuniform learnability in this section. Definition 4.1. (P, H) is nonuniformly learnable if there is a mapping g : ~s ↦ H such that, for each P ∈ P, |err(P, g[~s(m)...

573 citations
Convergence of Stochastic Processes
- Pollard
- 1984
Citation Context: ...n bound m_femp as follows. Lemma 3.1. With q = VCdim(H), 1 ≤ q < ∞, m_femp(ε, δ, P, H) < (q/ε²)[20 ln(8/ε) + 8 ln(6/δ)]. (5) Proof: See the Appendix. In [14], Haussler uses results from [23] to give a bound on m_femp that is similar to (5) in that it is of order q ln(1/ε)/ε² in ε and q. The proof of the following result appears in the Appendix. Theorem 3.1. With q = VCdim(H), 1 ≤ q...
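As a rough numerical illustration of the Lemma 3.1 bound quoted in this snippet, m_femp(ε, δ, P, H) < (q/ε²)[20 ln(8/ε) + 8 ln(6/δ)] can be evaluated directly; the example values of q, ε, and δ are arbitrary assumptions.

```python
import math

# Numerical sketch of the sample-size bound in Lemma 3.1 above:
#   m_femp(eps, delta, P, H) < (q / eps**2) * (20*ln(8/eps) + 8*ln(6/delta))
# where q = VCdim(H). The chosen q, eps, delta are illustrative only.

def femp_sample_bound(q, eps, delta):
    return (q / eps**2) * (20 * math.log(8 / eps) + 8 * math.log(6 / delta))

# The bound grows linearly in q and like (1/eps**2) * ln(1/eps) in eps,
# matching the order q ln(1/eps)/eps**2 mentioned in the snippet.
print(round(femp_sample_bound(q=10, eps=0.1, delta=0.05)))
```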

400 citations
A universal prior for integers and estimation by minimum description length, The Annals of Statistics, 11(2)
- Rissanen
- 1983
Citation Context: ...imum. That one should resist the temptation to overfit the data is a well-known maxim. This principle is the basis of many estimation methods, such as Rissanen's Minimum Description Length Principle ([13]) and Vapnik's Principle of Structural Risk Minimization ([4]). These methods penalize a hypothesis' empirical error on the basis of the "complexity" of the class of hypotheses from which it is drawn...

373 citations
Decision theoretic generalizations of the PAC model for neural net and other learning applications
- Haussler
- 1992
Citation Context: ...e same order as m_femp. Remark 3.1. We can capture the manner in which K(α) scales with α by the quantity lim_{α→0} log K(α) / log(1/α). This is the metric dimension of the set Z (see [5] and [14]). It is a way of defining the dimension of a metric space (even when it is not a vector space) by appealing to the notion of volume. For example, if Z is IR^n and d is the Euclidean distance, K(α) i...

365 citations
Computer Systems that Learn
- Weiss, Kulikowski
- 1995
Citation Context: ...the sample size is a critical issue. There are a number of techniques by which one can attempt to estimate the error of the candidate hypotheses h_k ∈ H_k and thereby choose the best value of k (see [40] for an overview of these methods). Most of these schemes involve withholding part of the samples and/or resampling the data in some fashion, as in cross-validation ([6] and [36]) and bootstrapping ([...

315 citations
What size net gives valid generalization
- Baum, Haussler
- 1989
Citation Context: ...e largest n for which there is some ~v(n) ∈ V^n shattered by B. (If, for arbitrarily large n, there are ~v(n) that are shattered by B, we say that VCdim(B) = ∞.) See [2], [16], [17], [4], [18], [15], [19], [20], [21], and [22] for examples of classes of finite VC-dimension. Using results from [4] and [15], we can bound m_femp as follows. Lemma 3.1. With q = VCdim(H), 1 ≤ q < ∞, m_femp(ε, δ, P, H)...

223 citations
Quantifying inductive bias: AI learning algorithms and Valiant's learning framework
- Haussler
- 1988
Citation Context: ...rge M allows the elements of the cover to be less simple. In some cases, finding the simplest hypothesis consistent with a labeling is much harder than finding one that is only reasonably simple (see [42]). This would dictate using M > 1. A key observation is that we can construct finite M-simple empirical coverings. Lemma 4.1. Under Assumption 2.2, for any M ≥ 1 we can construct an M-simple empirical f...

213 citations
Estimating the error rate of a prediction rule: improvements on cross-validation
- Efron
- 1983
Citation Context: ...] for an overview of these methods). Most of these schemes involve withholding part of the samples and/or resampling the data in some fashion, as in cross-validation ([6] and [36]) and bootstrapping ([41]). A point that we should note here is that f_emp is still used in many of these methods to select the initial candidate hypotheses h_k. The learning procedure we present next, or at least the ideas i...

205 citations
Minimum Complexity Density Estimation
- Barron, Cover
- 1991
Citation Context: ...nderlying intuition is that we should trade some accuracy on the data in exchange for a "simpler" hypothesis. For examples of this method, see [32], [33], [34], [35], [4], [13], [36], [37], [38], and [39]. Again, this penalty is determined from some measure of complexity that is derived from the structure of H and must be carefully selected to ensure that learning occurs. The two preceding methods hav...

153 citations
Abstract Inference
- Grenander
- 1981
Citation Context: ...A variety of methods use this same principle, and all of them rely on some connection between the convergence properties of f_emp and some prior measure of the complexity of hypotheses from H_k (see [26], [27], [28], [29], [30], and [31]). For instance, consider the case in which each H_k has finite VC-dimension. Inspection of Lemma 3.1 shows that, if we let k(n) increase slowly enough, then f_emp sim...

137 citations
Central limit theorems for empirical measures
- Dudley
- 1978
Citation Context: ...mension of B, VCdim(B), is the largest n for which there is some ~v(n) ∈ V^n shattered by B. (If, for arbitrarily large n, there are ~v(n) that are shattered by B, we say that VCdim(B) = ∞.) See [2], [16], [17], [4], [18], [15], [19], [20], [21], and [22] for examples of classes of finite VC-dimension. Using results from [4] and [15], we can bound m_femp as follows. Lemma 3.1. With q = VCdim(H), 1 ≤ q <...

131 citations
Connectionist Nonparametric Regression: Multilayer Feedforward Networks Can Learn Arbitrary Mappings
- White
- 1991
Citation Context: ...se this same principle, and all of them rely on some connection between the convergence properties of f_emp and some prior measure of the complexity of hypotheses from H_k (see [26], [27], [28], [29], [30], and [31]). For instance, consider the case in which each H_k has finite VC-dimension. Inspection of Lemma 3.1 shows that, if we let k(n) increase slowly enough, then f_emp simultaneously estimates (P...

117 citations
Empirical Processes: Theory and Applications
- Pollard
- 1990
Citation Context: ...mine the case where the hypotheses and labels are real-valued and P = P, the set of all probability distributions on X × Y. Building on the work of Dudley ([16] and [18]) and Pollard ([23] and [24]), Haussler has made much progress in finding conditions that are sufficient for f_emp to be a simultaneous estimator (see [25] and [14]). One of these conditions is that a certain "pseudodimension" be...

114 citations
Approximation and estimation bounds for artificial neural networks
- Barron
- 1994
Citation Context: ...sis. The underlying intuition is that we should trade some accuracy on the data in exchange for a "simpler" hypothesis. For examples of this method, see [32], [33], [34], [35], [4], [13], [36], [37], [38], and [39]. Again, this penalty is determined from some measure of complexity that is derived from the structure of H and must be carefully selected to ensure that learning occurs. The two preceding m...

91 citations
Bounding the Vapnik-Chervonenkis dimension of concept classes parametrized by real numbers
- Goldberg, Jerrum
- 1993
Citation Context: ...for which there is some ~v(n) ∈ V^n shattered by B. (If, for arbitrarily large n, there are ~v(n) that are shattered by B, we say that VCdim(B) = ∞.) See [2], [16], [17], [4], [18], [15], [19], [20], [21], and [22] for examples of classes of finite VC-dimension. Using results from [4] and [15], we can bound m_femp as follows. Lemma 3.1. With q = VCdim(H), 1 ≤ q < ∞, m_femp(ε, δ, P, H) < (q/ε²)[2...

82 citations
A course on empirical processes
- Dudley
- 1984
Citation Context: ...im(B), is the largest n for which there is some ~v(n) ∈ V^n shattered by B. (If, for arbitrarily large n, there are ~v(n) that are shattered by B, we say that VCdim(B) = ∞.) See [2], [16], [17], [4], [18], [15], [19], [20], [21], and [22] for examples of classes of finite VC-dimension. Using results from [4] and [15], we can bound m_femp as follows. Lemma 3.1. With q = VCdim(H), 1 ≤ q < ∞, m_femp(ε,...

81 citations
Complexity regularization with application to artificial neural networks, in Nonparametric Functional Estimation and Related Topics
- Barron
- 1991
Citation Context: ...ypothesis. The underlying intuition is that we should trade some accuracy on the data in exchange for a "simpler" hypothesis. For examples of this method, see [32], [33], [34], [35], [4], [13], [36], [37], [38], and [39]. Again, this penalty is determined from some measure of complexity that is derived from the structure of H and must be carefully selected to ensure that learning occurs. The two prece...

66 citations
The necessary and sufficient conditions for the uniform convergence of averages to their expected values, Teoriya Veroyatnostei i Ee Primeneniya, 26(3)
- Vapnik, Chervonenkis
- 1981
Citation Context: ...rn by simply choosing the hypothesis with the least estimated error, since this hypothesis will also have nearly the least true error. Inspired by the pioneering work of Vapnik and Chervonenkis ([2], [3], and [4]), much research has been done to determine when the empirical error estimate based on the labeled sample ~s(m) = (~x(m), ~y(m)), f_emp[~s(m), h] = (1/m) Σ_{i=1}^{m} L(h(x_i), y_i), succeeds in...

60 citations
Bounds for the computational power and learning complexity of analog neural nets
- Maass
- 1992
Citation Context: ...there is some ~v(n) ∈ V^n shattered by B. (If, for arbitrarily large n, there are ~v(n) that are shattered by B, we say that VCdim(B) = ∞.) See [2], [16], [17], [4], [18], [15], [19], [20], [21], and [22] for examples of classes of finite VC-dimension. Using results from [4] and [15], we can bound m_femp as follows. Lemma 3.1. With q = VCdim(H), 1 ≤ q < ∞, m_femp(ε, δ, P, H) < (q/ε²)[20 ln(8/ε...

44 citations
Finiteness results for sigmoidal 'neural' networks
- Macintyre, Sontag
- 1993
Citation Context: ...est n for which there is some ~v(n) ∈ V^n shattered by B. (If, for arbitrarily large n, there are ~v(n) that are shattered by B, we say that VCdim(B) = ∞.) See [2], [16], [17], [4], [18], [15], [19], [20], [21], and [22] for examples of classes of finite VC-dimension. Using results from [4] and [15], we can bound m_femp as follows. Lemma 3.1. With q = VCdim(H), 1 ≤ q < ∞, m_femp(ε, δ, P, H) < (q/ε...

40 citations
Automatic Pattern Recognition: A Study of the Probability of Error
- Devroye
- 1988
Citation Context: ...mportant difference: g_cl selects an empirical cover based on the data, whereas the covers in these other methods are fixed in advance. Devroye examines a general structure for learning procedures in [11]. There, a class of candidate hypotheses is selected based on a "training set" (~s' in our notation), and the hypothesis with the least empirical error on an independent "testing set" (~s'') is sel...

39 citations
Learnability by fixed distributions
- Benedek, Itai
- 1988
Citation Context: ...bsample. The use of an empirical cover is clearly a different approach, since the labels of the first n points are not even used. The learning procedure g_cl resembles the cover-based methods in [4], [7], [8], [9], and [10]. In these methods, knowledge of P or the structure of P is used to select a finite cover for H and empirical estimates are used to select the best element of the cover. Thus, ther...

33 citations
Computational Learning Theory, Cambridge Tracts
- Anthony, Biggs
- 1992
Citation Context: ...e Equation (24) as sup_{h ∈ H_Δ : h(~x(n)) = ~0} E_P h > ε. (25) Let q_Δ = VCdim(H_Δ). By Lemma A.1, q_Δ is finite if q is. Also, it is true that q_Δ ≥ 1 if q ≥ 1. We have from [47] that (25) holds with probability less than δ for any P ∈ P when n is at least (3/ε)[q_Δ ln(12/ε) + ln(2/δ)]. (26) Thus, (26) serves as a bound on N_ec(1, ε, δ). Replacing ε and...
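The sample-size bound (26) quoted in this snippet, n at least (3/ε)[q_Δ ln(12/ε) + ln(2/δ)], can likewise be evaluated numerically; the chosen q_Δ, ε, and δ are illustrative assumptions.

```python
import math

# Numerical sketch of bound (26) above: (25) fails with probability
# less than delta for any P once n is at least
#   (3/eps) * (q_delta * ln(12/eps) + ln(2/delta)).
# The chosen q_delta, eps, delta are illustrative only.

def ec_sample_bound(q_delta, eps, delta):
    return (3 / eps) * (q_delta * math.log(12 / eps) + math.log(2 / delta))

print(round(ec_sample_bound(q_delta=5, eps=0.1, delta=0.05)))
```

Note the 1/ε scaling here, in contrast to the 1/ε² scaling of the Lemma 3.1 bound for m_femp.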

29 citations
A Markovian extension of Valiant's learning model
- Aldous, Vazirani
- 1990
Citation Context: ...articular interest is to relax the requirement that the labeled samples are drawn independently and are identically distributed. A few papers in the literature do address more general situations; see [44], [30], [45], and [46]. Also, the learning framework could be extended to a nonparametric setting by allowing the hypothesis class to vary with the observed data (as in nearest neighbor classification...

28 citations
Inductive principles of the search for empirical dependences
- Vapnik
- 1989
Citation Context: ...hods use this same principle, and all of them rely on some connection between the convergence properties of f_emp and some prior measure of the complexity of hypotheses from H_k (see [26], [27], [28], [29], [30], and [31]). For instance, consider the case in which each H_k has finite VC-dimension. Inspection of Lemma 3.1 shows that, if we let k(n) increase slowly enough, then f_emp simultaneously estima...

25 citations
Some special Vapnik-Chervonenkis classes
- Wenocur, Dudley
- 1981
Citation Context: ...n of B, VCdim(B), is the largest n for which there is some ~v(n) ∈ V^n shattered by B. (If, for arbitrarily large n, there are ~v(n) that are shattered by B, we say that VCdim(B) = ∞.) See [2], [16], [17], [4], [18], [15], [19], [20], [21], and [22] for examples of classes of finite VC-dimension. Using results from [4] and [15], we can bound m_femp as follows. Lemma 3.1. With q = VCdim(H), 1 ≤ q < ∞, m...

23 citations
Generalizing the PAC model: Sample size bounds from metric dimension-based uniform convergence results
- Haussler
- 1989
Citation Context: ...Y. Building on the work of Dudley ([16] and [18]) and Pollard ([23] and [24]), Haussler has made much progress in finding conditions that are sufficient for f_emp to be a simultaneous estimator (see [25] and [14]). One of these conditions is that a certain "pseudodimension" be finite. This pseudodimension generalizes the VC-dimension to classes of real-valued functions. (Vapnik generalizes the VC-dim...

12 citations
Learning by canonical smooth estimation, Part I: Simultaneous estimation
- Buescher, Kumar
- 1996
Citation Context: ...iuc.edu 1 Introduction. This paper develops and investigates the properties of a new class of learning procedures. These procedures are based on the canonical error estimation procedure developed in [1]. We also provide bounds on the number of samples required by a procedure. Additionally, we propose and analyze a method of selecting a hypothesis, or "model," with an appropriate degree of complexity...

12 citations
A parametrization scheme for classifying models of learnability, in Proc. 2nd Annu. Workshop on Comput. Learning Theory
- Ben-David, Benedek, et al.
- 1989
Citation Context: ...f the relative sizes of ~s' and ~s'' as well as a characterization of a condition that is sufficient for this procedure to work. In the 0/1-valued, noise-free case, a scheme akin to g_cl is used in [12] to transform a learning procedure for one triple (P, C_1, C_1) into a learning procedure for another triple (P, C_2, C_2). If g : ~s ↦ H picks a hypothesis that agrees with the data and yet g...

10 citations
Problems of computational and information complexity in machine vision and learning
- Kulkarni
- 1991
Citation Context: ...he use of an empirical cover is clearly a different approach, since the labels of the first n points are not even used. The learning procedure g_cl resembles the cover-based methods in [4], [7], [8], [9], and [10]. In these methods, knowledge of P or the structure of P is used to select a finite cover for H and empirical estimates are used to select the best element of the cover. Thus, there is an im...

10 citations
Ordered risk minimization
- Vapnik, Chervonenkis
- 1974
Citation Context: ...imized over k to select the final hypothesis. The underlying intuition is that we should trade some accuracy on the data in exchange for a "simpler" hypothesis. For examples of this method, see [32], [33], [34], [35], [4], [13], [36], [37], [38], and [39]. Again, this penalty is determined from some measure of complexity that is derived from the structure of H and must be carefully selected to ensure...

9 citations
On metric entropy, Vapnik-Chervonenkis dimension, and learnability for a class of distributions
- Kulkarni
- 1989
Citation Context: ...le. The use of an empirical cover is clearly a different approach, since the labels of the first n points are not even used. The learning procedure g_cl resembles the cover-based methods in [4], [7], [8], [9], and [10]. In these methods, knowledge of P or the structure of P is used to select a finite cover for H and empirical estimates are used to select the best element of the cover. Thus, there is...

9 citations
Asymptotic optimality for C_P, C_L, cross-validation and generalized cross-validation: Discrete index set
- Li
- 1987
Citation Context: ...inal hypothesis. The underlying intuition is that we should trade some accuracy on the data in exchange for a "simpler" hypothesis. For examples of this method, see [32], [33], [34], [35], [4], [13], [36], [37], [38], and [39]. Again, this penalty is determined from some measure of complexity that is derived from the structure of H and must be carefully selected to ensure that learning occurs. The two...

4 citations
ε-entropy and ε-capacity of sets in function spaces
- Kolmogorov, Tikhomirov
- 1961
Citation Context: ...enote the smallest ε-cover of H(~x) by N(ε, H(~x), ρ_1). When M(ε, H(~x), ρ_1) is finite for every ε > 0, as will always be the case when Z = [0, B], the following inequalities hold (see [5]). Lemma A.2. M(2ε, H(~x), ρ_1) ≤ N(ε, H(~x), ρ_1) ≤ M(ε, H(~x), ρ_1). Finally, we have this result (Theorem 6 of [14]). Lemma A.3. If H(x) ⊆ [0, B] and psdim(H) = q for some 1 ≤ q < ∞, then fo...

4 citations
Learning decision rules for pattern classification under a family of probability measures
- Kulkarni, Vidyasagar
- 1997
Citation Context: ...an empirical cover is clearly a different approach, since the labels of the first n points are not even used. The learning procedure g_cl resembles the cover-based methods in [4], [7], [8], [9], and [10]. In these methods, knowledge of P or the structure of P is used to select a finite cover for H and empirical estimates are used to select the best element of the cover. Thus, there is an important di...

4 citations
Approximation of least squares regression on nested subspaces
- Cox
- 1988
Citation Context: ...riety of methods use this same principle, and all of them rely on some connection between the convergence properties of f_emp and some prior measure of the complexity of hypotheses from H_k (see [26], [27], [28], [29], [30], and [31]). For instance, consider the case in which each H_k has finite VC-dimension. Inspection of Lemma 3.1 shows that, if we let k(n) increase slowly enough, then f_emp simultane...

4 citations
On uniform laws of averages
- Nobel
- 1992
Citation Context: ...to relax the requirement that the labeled samples are drawn independently and are identically distributed. A few papers in the literature do address more general situations; see [44], [30], [45], and [46]. Also, the learning framework could be extended to a nonparametric setting by allowing the hypothesis class to vary with the observed data (as in nearest neighbor classification). It might prove usef...

3 citations
Ordered risk minimization II
- Vapnik, Chervonenkis
- 1974
Citation Context: ...over k to select the final hypothesis. The underlying intuition is that we should trade some accuracy on the data in exchange for a "simpler" hypothesis. For examples of this method, see [32], [33], [34], [35], [4], [13], [36], [37], [38], and [39]. Again, this penalty is determined from some measure of complexity that is derived from the structure of H and must be carefully selected to ensure that l...

2 citations
Learning and smooth simultaneous estimation of errors based on empirical data
- Buescher
- 1992
Citation Context: ...For instance, consider the case where the hypotheses and labels are real-valued and the distance measure d(z_1, z_2) = |z_1 − z_2|. Using results from [4], it is straightforward to show (see [43]) that E_~x converges simultaneously over (P, d(H_i, H_i)) whenever it does so over (P, H_i). Thus, when VCdim(H_i) or psdim(H_i) is finite for each i (as is the case for many parametric model...

2 citations
Evaluating the performance of a simple inductive procedure in the presence of overfitting error
- Nobel
- 1991
Citation Context: ...terest is to relax the requirement that the labeled samples are drawn independently and are identically distributed. A few papers in the literature do address more general situations; see [44], [30], [45], and [46]. Also, the learning framework could be extended to a nonparametric setting by allowing the hypothesis class to vary with the observed data (as in nearest neighbor classification). It might...

1 citation
Minimization of expected risk based on empirical data
- Vapnik, Chervonenkis
- 1987
Citation Context: ...of methods use this same principle, and all of them rely on some connection between the convergence properties of f_emp and some prior measure of the complexity of hypotheses from H_k (see [26], [27], [28], [29], [30], and [31]). For instance, consider the case in which each H_k has finite VC-dimension. Inspection of Lemma 3.1 shows that, if we let k(n) increase slowly enough, then f_emp simultaneously...

1 citation
Nonparametric estimation via empirical risk minimization
- Lugosi, Zeger
- 1993
Citation Context: ...me principle, and all of them rely on some connection between the convergence properties of f_emp and some prior measure of the complexity of hypotheses from H_k (see [26], [27], [28], [29], [30], and [31]). For instance, consider the case in which each H_k has finite VC-dimension. Inspection of Lemma 3.1 shows that, if we let k(n) increase slowly enough, then f_emp simultaneously estimates (P, H_k(n)...