## On Efficient Agnostic Learning of Linear Combinations of Basis Functions (1995)

### Download Links

- [wwwsyseng.anu.edu.au]
- [axiom.anu.edu.au]
- [users.cecs.anu.edu.au]
- DBLP

### Other Repositories/Bibliography

Venue: Proceedings of the Eighth Annual Conference on Computational Learning Theory

Citations: 15 (3 self)

### BibTeX

```bibtex
@INPROCEEDINGS{Lee95onefficient,
  author    = {Wee Sun Lee and Peter L. Bartlett and Robert C. Williamson},
  title     = {On Efficient Agnostic Learning of Linear Combinations of Basis Functions},
  booktitle = {Proceedings of the Eighth Annual Conference on Computational Learning Theory},
  year      = {1995},
  pages     = {369--376},
  publisher = {ACM Press}
}
```

### Abstract

We consider efficient agnostic learning of linear combinations of basis functions when the sum of the absolute values of the weights of the linear combination is bounded. With the quadratic loss function, we show that the class of linear combinations of a set of basis functions is efficiently agnostically learnable if and only if the class of basis functions is efficiently agnostically learnable. We also show that the sample complexity for learning the linear combinations grows polynomially if and only if a combinatorial property of the class of basis functions, called the fat-shattering function, grows at most polynomially. We also relate the problem to agnostic learning of {0, 1}-valued function classes by showing that if a class of {0, 1}-valued functions is efficiently agnostically learnable (using the same function class) with the discrete loss function, then the class of linear combinations of functions from the class is efficiently agnostically learnable with the quadratic loss function...
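
The hypothesis class in the abstract, linear combinations of basis functions whose weights have bounded ℓ1 norm, fit under quadratic loss, can be illustrated with a small sketch. Everything below (the threshold basis class, the budget `B`, the synthetic data, and the Frank-Wolfe-style greedy fit) is invented for illustration and is not the paper's algorithm:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical basis class G: threshold functions g_t(x) = 1[x >= t] on [0, 1].
# The hypothesis class is {sum_j w_j g_j : sum_j |w_j| <= B}.
thresholds = np.linspace(0.0, 1.0, 21)

def basis_matrix(x):
    # Column j is g_{t_j} evaluated on the sample x.
    return (x[:, None] >= thresholds[None, :]).astype(float)

# Noisy observations of an arbitrary target; the agnostic setting makes
# no assumption that the target lies in the hypothesis class.
x = rng.uniform(0, 1, 200)
y = np.sin(3 * x) + 0.1 * rng.normal(size=x.shape)

B = 4.0                      # bound on the sum of absolute weights
G = basis_matrix(x)
w = np.zeros(len(thresholds))

# Frank-Wolfe-style greedy fitting under quadratic loss: each step moves
# toward the signed basis function most correlated with the residual,
# which keeps ||w||_1 <= B throughout.
for k in range(1, 201):
    residual = y - G @ w
    corr = G.T @ residual
    j = int(np.argmax(np.abs(corr)))
    vertex = np.zeros_like(w)
    vertex[j] = B * np.sign(corr[j])
    step = 2.0 / (k + 2)
    w = (1 - step) * w + step * vertex

mse = float(np.mean((y - G @ w) ** 2))
```

The ℓ1 budget plays the role of the bound on the sum of absolute weights in the abstract: the fitted `w` is always a convex combination of points with ℓ1 norm at most `B`.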

### Citations

2573 | The design and analysis of computer algorithms - Aho, Hopcroft, et al. - 1974

Citation Context: ...ons drawn from A_{D,f} have the form (x, f(x)), where x is drawn randomly according to D. In learning probabilistic concepts [14], we have Y = {0, 1}. Again, there is a class F of functions mapping X to [0, 1], and A_{D,f} ∈ A is formed from a distribution D over X and a function f ∈ F. Observations drawn from A_{D,f} are of the form (x, b), where x is drawn randomly according to D and b = 1 with probability f(x)...

1761 | A theory of the learnable - Valiant - 1984

Citation Context: ...s efficiently learnable in the PAC learning model. Whether polynomial-size DNF can be learned efficiently has been an open problem in computational learning theory since it was first posed by Valiant [20] in 1984 (the majority view is that polynomial-size DNF is not likely to be efficiently learnable [12]). Using techniques similar to those in [15], it is possible to show that if a class of {0, 1}-val...

1584 | Probability inequalities for sums of bounded random variables - Hoeffding - 1963

710 | The strength of weak learnability - Schapire - 1990

Citation Context: ...-term DNF. Proof Sketch. We will show that there exists a weak learning algorithm (which produces randomized hypotheses) for p(n)-term DNF. The result then follows from Schapire's boosting technique [19] for converting a weak learning algorithm into a strong one. For any target p(n)-term DNF, there exists a monomial that never makes an error on a negative example and gets at least 1/p(n) of the posit...
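
Schapire's original construction is a recursive majority-of-majorities; a later and simpler boosting variant, AdaBoost, conveys the same idea of converting a weak learner into a strong one by reweighting examples toward past mistakes. The stump weak learner and toy interval data below are invented for illustration:

```python
import math
import numpy as np

rng = np.random.default_rng(2)

# Toy data: label is +1 when x falls in [0.3, 0.7]. No single threshold
# stump classifies this well, but a boosted weighted vote of stumps can.
x = rng.uniform(0, 1, 300)
y = np.where((x >= 0.3) & (x <= 0.7), 1, -1)

def best_stump(x, y, w):
    # Weak learner: the stump sign(s * (x - t)) minimizing weighted error.
    best = (0.0, 1, 1.0)
    for t in np.linspace(0, 1, 41):
        for s in (1, -1):
            pred = np.where(s * (x - t) >= 0, 1, -1)
            err = w[pred != y].sum()
            if err < best[2]:
                best = (t, s, err)
    return best

# AdaBoost: upweight the examples the previous weak hypotheses got wrong.
w = np.full(len(x), 1 / len(x))
ensemble = []
for _ in range(20):
    t, s, err = best_stump(x, y, w)
    err = max(err, 1e-10)
    alpha = 0.5 * math.log((1 - err) / err)
    pred = np.where(s * (x - t) >= 0, 1, -1)
    w = w * np.exp(-alpha * y * pred)
    w = w / w.sum()
    ensemble.append((alpha, t, s))

votes = sum(a * np.where(s * (x - t) >= 0, 1, -1) for a, t, s in ensemble)
train_error = float(np.mean(np.sign(votes) != y))
```

Each weak hypothesis only needs a small edge over random guessing; the weighted vote drives the training error of the combined hypothesis down rapidly.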

647 | Learnability and the Vapnik-Chervonenkis dimension - Blumer, Ehrenfeucht, et al. - 1989

Citation Context: ...equired in Lemma 10. If G is agnostically PAC learnable, then the fat-shattering function (which is the same as the VC dimension for {0, 1}-valued functions) is polynomial in the complexity parameters [7]. Theorem 13 shows that approximating to accuracy ε_i at each iteration with respect to the empirical distribution will provide a hypothesis with the desired error. For each iteration i, 1 ≤ i ≤ k, we f...
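
The remark that the fat-shattering function coincides with the VC dimension for {0, 1}-valued classes can be checked by brute force on a toy class. The finite threshold class and point grid below are hypothetical:

```python
from itertools import combinations

# Hypothetical finite {0, 1}-valued class: 1-D thresholds
# g_t(x) = 1 if x >= t else 0, for t on a grid.
thresholds = [i / 10 for i in range(11)]
points = [i / 7 for i in range(8)]

def labelings(subset):
    # All distinct labelings the class induces on `subset`.
    return {tuple(int(x >= t) for x in subset) for t in thresholds}

def vc_dimension(points):
    # Largest d such that some d-point subset is shattered,
    # i.e. all 2^d labelings are realized.
    d = 0
    for size in range(1, len(points) + 1):
        if any(len(labelings(s)) == 2 ** size for s in combinations(points, size)):
            d = size
    return d

# 1-D thresholds shatter any single point but no pair: the labeling
# "left point 1, right point 0" is never realized, so the VC dimension is 1.
```
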

411 | Universal approximation bounds for superpositions of a sigmoidal function - Barron - 1993

Citation Context: ...nothing assumed about the hypothesis class except the bound on the range of its output. We will need an approximation result from [17]. This result is an extension of results by Jones [13] and Barron [5]. A related result is also presented by Koiran [16]. Theorem 4. Let H be a Hilbert space with norm ‖·‖. Let G be a subset of H with ‖g‖ ≤ b for each g ∈ G. Let co(G) be the convex hull of G. For...

387 | Decision theoretic generalizations of the PAC model for neural net and learning applications - Haussler - 1992

Citation Context: ...s functions in a robust extension of the popular Probably Approximately Correct (PAC) learning model in computational learning theory [4]. The learning model we use, commonly called agnostic learning [9, 15, 18], is robust with respect to noise and mismatches between the model and the phenomenon being modelled. The only assumption made about the phenomenon is that it can be represented by a joint probability ...

268 | Fast probabilistic algorithms for Hamiltonian circuits and matchings - Angluin, Valiant - 1977

Citation Context: ...sification and 1/4 when it gives the wrong classification. The algorithm for producing the randomized hypothesis goes as follows. Get a sufficiently large sample (use, for example, the Chernoff bounds [3] to get a sufficient sample size). If significantly more than half of the examples are labelled as positive, use the all-ones monomial for classification. If significantly more than half the examples a...
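
The "sufficiently large sample" obtained from Chernoff-type bounds can be made concrete. A minimal helper, assuming the standard two-sided Hoeffding bound for [0, 1]-valued variables (rather than whichever exact bound the paper invokes):

```python
import math

def hoeffding_sample_size(epsilon, delta):
    # Smallest m with 2 * exp(-2 * m * epsilon^2) <= delta: by the two-sided
    # Hoeffding bound, m draws of a [0, 1]-valued variable give an empirical
    # mean within epsilon of the expectation with probability >= 1 - delta.
    return math.ceil(math.log(2 / delta) / (2 * epsilon ** 2))

m = hoeffding_sample_size(0.1, 0.05)
```

For example, estimating the fraction of positive labels to within 0.1 with 95% confidence needs 185 examples; the bound grows only logarithmically in 1/δ but quadratically in 1/ε.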

212 | Scale-sensitive dimensions, uniform convergence, and learnability - Alon, Ben-David, et al. - 1993

Citation Context: ...ss result for the class of linear combinations of functions from the basis function class. In Section 4, we use a combinatorial property of the basis function class, called the fat-shattering function [14, 2, 6], to bound the covering number of the linear combinations, and provide sample complexity bounds (which are better than those obtained using the results in Section 3). Gurvits and Koiran [8] have also p...

201 | Towards efficient agnostic learning - Kearns, Schapire, et al. - 1992

Citation Context: ...ip with Agnostic PAC learning. Let G be a class of {0, 1}-valued functions. Let the target function be chosen from the class of all {0, 1}-valued functions on X. Following Kearns, Schapire and Sellie [15], we call proper efficient agnostic learning with discrete loss under these assumptions agnostic PAC learning. In this section, we show that if G is agnostically PAC learnable, then N G K is properly ...

199 | Efficient distribution-free learning of probabilistic concepts - Kearns, Schapire - 1994

Citation Context: ... functions, learning the conditional expectation when the observations are noisy, learning the best approximation when the target function is not in the class, and also learning probabilistic concepts [14]. The function classes we study include some which are widely used in practice. Linear combinations of basis functions can be considered as a generalization of two-layer neural networks. Instead of th...

144 | A simple lemma on greedy approximation in Hilbert space and convergence rates for projection pursuit regression and neural network training - Jones - 1992

Citation Context: ... functions with nothing assumed about the hypothesis class except the bound on the range of its output. We will need an approximation result from [17]. This result is an extension of results by Jones [13] and Barron [5]. A related result is also presented by Koiran [16]. Theorem 4. Let H be a Hilbert space with norm ‖·‖. Let G be a subset of H with ‖g‖ ≤ b for each g ∈ G. Let co(G) be the convex...
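
The Jones/Barron-style result quoted as Theorem 4 can be sketched numerically: for a target in co(G), a greedily built convex combination with a line search drives the squared error down at roughly a b²/k rate after k steps. The finite G, the dimension, and the target below are all invented:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical finite G in the Hilbert space R^50, each element with norm <= b.
d, b = 50, 1.0
G = rng.normal(size=(40, d))
G = b * G / np.linalg.norm(G, axis=1, keepdims=True)

# Target f in the convex hull co(G): a random convex combination.
coeffs = rng.dirichlet(np.ones(len(G)))
f = coeffs @ G

# Greedy approximation in the spirit of Jones [13]: f_k is a convex
# combination of k elements of G, built one element at a time by choosing
# the g and step a in [0, 1] that most reduce ||f - ((1-a) f_k + a g)||.
fk = np.zeros(d)
errors = []
for k in range(1, 21):
    best = None
    for g in G:
        diff = g - fk
        denom = diff @ diff
        a = float(np.clip((f - fk) @ diff / denom, 0.0, 1.0)) if denom > 0 else 0.0
        cand = fk + a * diff
        err = float(np.sum((f - cand) ** 2))
        if best is None or err < best[0]:
            best = (err, cand)
    fk = best[1]
    errors.append(best[0])
```

The point of the theorem is that the number of basis elements needed for a given accuracy depends on b but not on the size of G or the dimension of H.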

86 | Robust trainability of single neurons - Höffgen, Simon - 1992

Citation Context: ...mbinations of {0, 1}-valued basis functions to the work on agnostic learning of {0, 1}-valued functions. Unfortunately, most of the results on agnostic learning of {0, 1}-valued functions are negative [15, 11]. In Section 7, we discuss the hardness results as well as some open problems. 2 Definitions and Learning Model. 2.1 Agnostic Learning Model. Our agnostic learning model is based on the agnostic learning m...

70 | Efficient agnostic learning of neural networks with bounded fan-in - Lee, Bartlett, et al. - 1996

Citation Context: ...d the number of basis functions in a linear combination but instead insist that the sum of absolute values of the weights of the linear combination be bounded. This work is an extension of results in [17], where it was shown that the class of linear combinations of linear threshold functions with bounded fan-in is efficiently agnostically learnable. Related works include that of Koiran [16], which consid...

62 | Fat-shattering and the learnability of real-valued functions - Bartlett, Long, et al. - 1996

Citation Context: ...ss result for the class of linear combinations of functions from the basis function class. In Section 4, we use a combinatorial property of the basis function class, called the fat-shattering function [14, 2, 6], to bound the covering number of the linear combinations, and provide sample complexity bounds (which are better than those obtained using the results in Section 3). Gurvits and Koiran [8] have also p...

35 | Computational Learning Theory, Cambridge Tracts in Theoretical Computer Science - Anthony, Biggs - 1992

Citation Context: ...efficient (polynomial-time) learning of linear combinations of basis functions in a robust extension of the popular Probably Approximately Correct (PAC) learning model in computational learning theory [4]. The learning model we use, commonly called agnostic learning [9, 15, 18], is robust with respect to noise and mismatches between the model and the phenomenon being modelled. The only assumption made ...

21 | Agnostic PAC-learning of functions on analog neural networks - Maass - 1993

Citation Context: ...s functions in a robust extension of the popular Probably Approximately Correct (PAC) learning model in computational learning theory [4]. The learning model we use, commonly called agnostic learning [9, 15, 18], is robust with respect to noise and mismatches between the model and the phenomenon being modelled. The only assumption made about the phenomenon is that it can be represented by a joint probability ...

13 | Efficient learning of continuous neural networks - Koiran - 1994

Citation Context: ...f results in [17], where it was shown that the class of linear combinations of linear threshold functions with bounded fan-in is efficiently agnostically learnable. Related works include that of Koiran [16], which considered learning two-layer neural networks with piecewise linear activation functions (but not in the agnostic model), and that of Maass [18] on agnostic learning of fixed-sized multilayer ne...

6 | Simple translation-invariant concepts are hard to learn - Jerrum - 1994

Citation Context: ...ly has been an open problem in computational learning theory since it was first posed by Valiant [20] in 1984 (the majority view is that polynomial-size DNF is not likely to be efficiently learnable [12]). Using techniques similar to those in [15], it is possible to show that if a class of {0, 1}-valued basis functions includes monomials, then an efficient agnostic learning algorithm for the class usin...

2 | Approximation and learning of real-valued functions - Gurvits, Koiran - 1995

Citation Context: ...tion [14, 2, 6] to bound the covering number of the linear combinations, and provide sample complexity bounds (which are better than those obtained using the results in Section 3). Gurvits and Koiran [8] have also provided sample complexity bounds for the case when the basis functions are {0, 1}-valued functions by bounding the fat-shattering function of the convex closure of the basis function class...