## Statistical Queries and Faulty PAC Oracles (1993)

Venue: Proceedings of the Sixth Annual ACM Workshop on Computational Learning Theory

Citations: 40 (6 self)

### BibTeX

```bibtex
@INPROCEEDINGS{Decatur93statisticalqueries,
  author    = {Scott Evan Decatur},
  title     = {Statistical Queries and Faulty PAC Oracles},
  booktitle = {Proceedings of the Sixth Annual ACM Workshop on Computational Learning Theory},
  year      = {1993},
  pages     = {262--268},
  publisher = {ACM Press}
}
```

### Abstract

In this paper we study learning in the PAC model of Valiant [18] in which the example oracle used for learning may be faulty in one of two ways: either by misclassifying the example or by distorting the distribution of examples. We first consider models in which examples are misclassified. Kearns [12] recently showed that efficient learning in a new model using statistical queries is a sufficient condition for PAC learning with classification noise. We show that efficient learning with statistical queries is sufficient for learning in the PAC model with malicious error rate proportional to the required statistical query accuracy. One application of this result is a new lower bound for tolerable malicious error in learning monomials of k literals. This is the first such bound which is independent of the number of irrelevant attributes n. We also use the statistical query model to give sufficient conditions for using distribution specific algorithms on distributions outside their prescr...
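The abstract's reduction rests on the fact that a statistical query can be answered by sampling from the example oracle. As a rough illustration only (not the paper's construction; the function names and the toy target below are hypothetical), a {0,1}-valued query χ can be estimated within tolerance τ by averaging over O(log(1/δ)/τ²) labeled examples:

```python
import math
import random

def simulate_sq(example_oracle, chi, tolerance, delta=0.1):
    """Estimate E[chi(x, label)] to within +/- tolerance, w.p. >= 1 - delta.
    By Hoeffding's bound, m >= ln(2/delta) / (2 * tolerance^2) samples
    suffice for a {0,1}-valued query chi."""
    m = math.ceil(math.log(2 / delta) / (2 * tolerance ** 2))
    return sum(chi(*example_oracle()) for _ in range(m)) / m

# Toy target: the monomial x0 AND x1 over {0,1}^3 under the uniform
# distribution, so the true value of Pr[label = 1] is 1/4.
def oracle():
    x = tuple(random.randint(0, 1) for _ in range(3))
    return x, int(x[0] == 1 and x[1] == 1)

random.seed(0)
estimate = simulate_sq(oracle, lambda x, y: y, tolerance=0.05)
```

A faulty oracle (misclassified labels or a distorted distribution) perturbs exactly these estimated expectations, which is why tolerance-bounded SQ algorithms degrade gracefully.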

### Citations

1697 | A theory of the learnable
- Valiant
- 1984
Citation Context: ...and Faulty PAC Oracles Scott Evan Decatur Aiken Computation Laboratory Harvard University Cambridge, MA 02138 sed@das.harvard.edu Abstract In this paper we study learning in the PAC model of Valiant [18] in which the example oracle used for learning may be faulty in one of two ways: either by misclassifying the example or by distorting the distribution of examples. We first consider models in which e...

666 | The strength of weak learnability
- Schapire
- 1990
Citation Context: ...s of faulty oracles. Finally, we examine hypothesis boosting algorithms in the context of learning with distribution noise, and show that Schapire's result regarding the strength of weak learnability [17] is in some sense tight in requiring the weak learner to be nearly distribution free. Research supported by an NDSEG Fellowship and by NSF grant CCR-89-02500. Appeared in the Proceedings of the Sixth ...

288 | Efficient noise-tolerant learning from statistical queries
- Kearns
- 1998
Citation Context: ...for learning may be faulty in one of two ways: either by misclassifying the example or by distorting the distribution of examples. We first consider models in which examples are misclassified. Kearns [12] recently showed that efficient learning in a new model using statistical queries is a sufficient condition for PAC learning with classification noise. We show that efficient learning with statistical...

281 | Constant depth circuits, Fourier transform, and learnability
- Linial, Mansour, et al.
- 1989
Citation Context: ...restricted algorithms. Due to the difficulty of finding distribution free algorithms, there has been considerable work in finding learning algorithms that work on restricted classes of distributions [9, 10, 13, 16]. Most of these algorithms can also be specified in the statistical query model [12]. We show how to take such statistical query algorithms and prove their correctness on larger classes of distributio...

223 | Quantifying inductive bias: AI learning algorithms and Valiant's learning framework
- Haussler
- 1988
Citation Context: ...(M_k) is PAC learnable in polynomial time with malicious error rate E_MAL^poly = Ω(ε / (k log(1/ε))). Proof (Sketch): We use the set cover approach to learning monomials of k literals [11]. The SQ version of this algorithm is a straightforward conversion from covering actual examples to covering probabilities of examples. Such a conversion results in a SQ algorithm of tolerance Θ(...
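The set-cover approach cited here [11] can be sketched concretely (a hedged illustration of the classical greedy method over explicit examples, not Decatur's SQ version; the function and variable names are my own): candidate literals are those consistent with every positive example, and each literal is viewed as "covering" the negative examples it rules out.

```python
def learn_monomial(positives, negatives, n):
    """Greedy set cover for monomials over {0,1}^n.
    A literal (i, b) asserts x[i] == b."""
    # Candidate literals: those satisfied by every positive example.
    candidates = [(i, b) for i in range(n) for b in (0, 1)
                  if all(x[i] == b for x in positives)]
    uncovered = set(range(len(negatives)))  # negatives not yet ruled out
    hypothesis = []
    while uncovered:
        # "Coverage" of a literal: the remaining negatives it falsifies.
        def covered(lit):
            return {j for j in uncovered if negatives[j][lit[0]] != lit[1]}
        best = max(candidates, key=lambda lit: len(covered(lit)), default=None)
        if best is None or not covered(best):
            break  # no consistent monomial covers the remaining negatives
        uncovered -= covered(best)
        hypothesis.append(best)
    return hypothesis

# Target: x0 AND NOT x2 over {0,1}^3.
pos = [(1, 0, 0), (1, 1, 0)]
neg = [(0, 0, 0), (1, 1, 1), (0, 1, 1)]
h = learn_monomial(pos, neg, 3)
```

The SQ conversion described in the context replaces "cover these negative examples" with "cover this much probability mass of negatives", which is what ties the tolerance to ε/k.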

221 | Learning from noisy examples
- Angluin, Laird
- 1988
Citation Context: ...[18] in which the example oracle used for learning is faulty. One way the oracle may be faulty is by returning examples with incorrect classification labels. Classification noise of Angluin and Laird [1] and malicious error of Valiant [19] (studied further by Kearns and Li [15]) model two such learning environments. Alternatively, the oracle may be considered faulty if the examples are not chosen acc...

167 | On the learnability of Boolean formulae - Kearns, Li, et al. - 1987

167 | Learning in the presence of malicious errors
- Kearns, Li
- 1993
Citation Context: ...oracle may be faulty is by returning examples with incorrect classification labels. Classification noise of Angluin and Laird [1] and malicious error of Valiant [19] (studied further by Kearns and Li [15]) model two such learning environments. Alternatively, the oracle may be considered faulty if the examples are not chosen according to the target distribution, even if they are labeled correctly with ...

117 | Learning disjunction of conjunctions
- Valiant
- 1985
Citation Context: ...ed for learning is faulty. One way the oracle may be faulty is by returning examples with incorrect classification labels. Classification noise of Angluin and Laird [1] and malicious error of Valiant [19] (studied further by Kearns and Li [15]) model two such learning environments. Alternatively, the oracle may be considered faulty if the examples are not chosen according to the target distribution, e...

60 | The Computational Complexity of Machine Learning
- Kearns
- 1990
Citation Context: ...an formulae are weakly learnable on the uniform distribution, but to either allow non-monotonicity, allow arbitrary distributions, or strengthen from weak to strong learning results in intractability [14]. We can show that monotone Boolean formulae are weakly learnable on a class of distributions strictly containing the uniform distribution. Theorem 16 If C_n is weakly learnable by statistical queries ...

50 | An improved boosting algorithm and its implications on learning complexity
- Freund
- 1992
Citation Context: ...e [17] shows that for any class of target functions C, if there exists a distribution free weak learning algorithm for C, then there exists a distribution free strong learning algorithm for C. Freund [8] asks how much we can relax the requirement that the weak learning algorithm work for all distributions. In what cases can a D distribution restricted weak learning algorithm for concept class C be bo...

42 | On the necessity of Occam Algorithms
- Board, Pitt
- 1990
Citation Context: ...rows above to be unidirectional (i.e. we have strict inclusion), with the exception of the possibility that DE → DDE. This question hinges on the question of whether PAC algorithms imply Occam algorithms [7]. It would also be of interest to characterize the relationship between the CN model and the DS and DDE models. Which is a more powerful adversary: one that misclassifies with fixed probability or one...

39 | Learnability by fixed distributions
- Benedek, Itai
- 1988
Citation Context: ...show that the above strategy will not work in general. In order to show such a statement, we must first rule out those distributions on which we can strongly learn by trivial means. Benedek and Itai [5] show that any concept class is strongly learnable (not necessarily in polynomial time) under a discrete distribution. We define a class of distributions on which any concept class can be strongly lea...

23 | Learning 2DNF formulas and k decision trees
- Hancock
- 1991
Citation Context: ...restricted algorithms. Due to the difficulty of finding distribution free algorithms, there has been considerable work in finding learning algorithms that work on restricted classes of distributions [9, 10, 13, 16]. Most of these algorithms can also be specified in the statistical query model [12]. We show how to take such statistical query algorithms and prove their correctness on larger classes of distributio...

17 | Improved learning of AC^0 functions
- Furst, Jackson, et al.
- 1991
Citation Context: ...restricted algorithms. Due to the difficulty of finding distribution free algorithms, there has been considerable work in finding learning algorithms that work on restricted classes of distributions [9, 10, 13, 16]. Most of these algorithms can also be specified in the statistical query model [12]. We show how to take such statistical query algorithms and prove their correctness on larger classes of distributio...

14 | Learning with a slowly changing distribution
- Bartlett
- 1992
Citation Context: ...tribution D ∈ D, based on examples it receives from a distribution D′ chosen by an adversary with the restriction that d(D, D′) ≤ β, where d(·, ·) is defined as in Definition 10. Bartlett [2] studies learning when the sequence of examples comes from a sequence of distributions such that the shift between consecutive distributions in the sequence is bounded. Distribution shift differs from ...

13 | Learning by distances - Ben-David, Itai, et al. - 1990

12 | Investigating the distribution assumptions in the PAC learning model
- Bartlett, Williamson
- 1991
Citation Context: ...he oracle may be considered faulty if the examples are not chosen according to the target distribution, even if they are labeled correctly with respect to the target function. Bartlett and Williamson [3] study such a model of distribution noise. Kearns [12] recently introduced a new model of learning by statistical queries in which efficient learning is a sufficient condition for PAC learning with cl...

6 | Dominating distributions and learnability
- Benedek, Itai
- 1992
Citation Context: ...n D of distance γ to some distribution in D, we cannot hope in general to get better than error γ. We can avoid this situation if we consider dominating distributions as defined by Benedek and Itai [6]. They use dominating distributions to specify when learning with one distribution implies learning with another. They say D_1 dominates D_2 if for every x ∈ X, D_1(x) = 0 implies D_2(x) = 0. In add...
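The domination relation quoted here is easy to state concretely. A minimal sketch over finite supports (the dict encoding and function name are my own, not from [6]): D_1 dominates D_2 exactly when the support of D_2 is contained in the support of D_1.

```python
def dominates(d1, d2):
    """D1 dominates D2 iff D1(x) = 0 implies D2(x) = 0 for every x,
    i.e. supp(D2) is a subset of supp(D1).
    Distributions are dicts mapping points to probabilities."""
    return all(d1.get(x, 0) > 0 for x, p in d2.items() if p > 0)

uniform = {0: 0.5, 1: 0.5}
point = {0: 1.0}
# The uniform distribution dominates the point mass, but not conversely:
# point(1) = 0 while uniform(1) > 0.
```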