## Learning in the Presence of Malicious Errors (1993)

Venue: SIAM Journal on Computing

Citations: 167 (12 self-citations)

### BibTeX

```bibtex
@ARTICLE{Kearns93learningin,
  author = {Michael Kearns and Ming Li},
  title = {Learning in the Presence of Malicious Errors},
  journal = {SIAM Journal on Computing},
  year = {1993},
  volume = {22},
  pages = {807--837}
}
```

### Abstract

In this paper we study an extension of the distribution-free model of learning introduced by Valiant [23] (also known as the probably approximately correct or PAC model) that allows the presence of malicious errors in the examples given to a learning algorithm. Such errors are generated by an adversary with unbounded computational power and access to the entire history of the learning algorithm's computation. Thus, we study a worst-case model of errors. Our results include general methods for bounding the rate of error tolerable by any learning algorithm, efficient algorithms tolerating nontrivial rates of malicious errors, and equivalences between problems of learning with errors and standard combinatorial optimization problems.

1 Introduction. In this paper, we study a practical extension to Valiant's distribution-free model of learning: the presence of errors (possibly maliciously generated by an adversary) in the sample data. The distribution-free model typically makes the idealize...
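The malicious error model described in the abstract can be pictured as a modified example oracle. The following is a minimal Python sketch under stated assumptions, not the paper's formal definition: the target is a Boolean function on bit tuples, `draw` samples a point from the true distribution, and `adversary` is an arbitrary callable that sees every example returned so far, a crude stand-in for the adversary's access to the learner's full history. The names `malicious_oracle`, `draw`, `adversary`, and the error rate `beta` are hypothetical.

```python
import random
from typing import Callable, List, Sequence, Tuple

Point = Tuple[int, ...]
Example = Tuple[Point, int]


def malicious_oracle(draw: Callable[[random.Random], Point],
                     target: Callable[[Point], int],
                     beta: float,
                     adversary: Callable[[Sequence[Example]], Example],
                     history: List[Example],
                     rng: random.Random) -> Example:
    """Return one example from a hypothetical malicious-error oracle.

    With probability 1 - beta the example is a correctly labeled draw from the
    true distribution; with probability beta the adversary picks any
    (point, label) pair after inspecting every example produced so far.
    """
    if rng.random() < beta:
        example = adversary(history)  # worst case: nothing constrains this choice
    else:
        x = draw(rng)
        example = (x, target(x))
    history.append(example)
    return example


# Usage: the target is x[0]; this (assumed) adversary always injects a mislabeled point.
rng = random.Random(0)
draw = lambda r: tuple(r.randint(0, 1) for _ in range(3))
target = lambda x: x[0]
adversary = lambda hist: ((1, 1, 1), 0)
history: List[Example] = []
sample = [malicious_oracle(draw, target, 0.1, adversary, history, rng) for _ in range(1000)]
```

A real adversary in this model may base its choice on the whole history and on the learner's computation; the fixed adversary above is only the simplest placeholder.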

### Citations

10959 | Computers and Intractability: A Guide to the Theory of NP-Completeness
- Garey, Johnson
- 1979
Citation Context: ...x) = 1 (c(x) = 0, respectively) are used interchangeably. We assume that domain points $x \in X$ and representations $c \in C$ are efficiently encoded using any of the standard schemes (see Garey and Johnson [9]), and denote by $|x|$ and $|c|$ the length of these encodings measured in bits. Parameterized representation classes. In this paper we will study parameterized classes of representations. Here we have a ...

1697 | A theory of the learnable
- Valiant
- 1984
Citation Context: ...Malicious Errors. Michael Kearns (AT&T Bell Laboratories), Ming Li (University of Waterloo). Abstract: In this paper we study an extension of the distribution-free model of learning introduced by Valiant [23] (also known as the probably approximately correct or PAC model) that allows the presence of malicious errors in the examples given to a learning algorithm. Such errors are generated by an adversary w...

948 | On the uniform convergence of relative frequencies of events to their probabilities
- Vapnik, Chervonenkis
- 1971
Citation Context: ...we define $\mathrm{vcd}(C) = \max\{|Y| : Y \text{ is shattered by } C\}$. If this maximum does not exist, then $\mathrm{vcd}(C)$ is infinite. The Vapnik-Chervonenkis dimension was originally introduced in the paper of Vapnik and Chervonenkis [25] and was first studied in the context of the distribution-free model by Blumer et al. [5]. Notational conventions. Let $E(x)$ be an event and $\psi(x)$ a random variable that depend on a parameter $x$ that tak...
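To make the definition of $\mathrm{vcd}(C)$ in this context concrete, here is a brute-force Python sketch for a finite class over a small finite domain, with each concept represented as the set of points it labels positive. It is exponential-time and only illustrates the definition; the function names are hypothetical, and the class is assumed to be nonempty.

```python
from itertools import combinations
from typing import Hashable, Iterable, Sequence, Set


def is_shattered(Y: Iterable[Hashable], concepts: Sequence[frozenset]) -> bool:
    """Y is shattered if every subset of Y equals c ∩ Y for some concept c."""
    Y = frozenset(Y)
    return len({c & Y for c in concepts}) == 2 ** len(Y)


def vc_dimension(domain: Sequence[Hashable], concepts: Iterable[Set]) -> int:
    """Brute-force vcd(C) = max{|Y| : Y is shattered by C} over a finite domain."""
    concepts = [frozenset(c) for c in concepts]
    best = 0
    for size in range(1, len(domain) + 1):
        if any(is_shattered(Y, concepts) for Y in combinations(domain, size)):
            best = size
        else:
            break  # subsets of shattered sets are shattered, so no larger Y can work
    return best


# Usage: thresholds {x : x >= t} and intervals [i, j] on the domain {1, 2, 3, 4}.
D = [1, 2, 3, 4]
thresholds = [{x for x in D if x >= t} for t in range(1, 6)]
intervals = [set(range(i, j + 1)) for i in range(1, 5) for j in range(i - 1, 5)]
print(vc_dimension(D, thresholds), vc_dimension(D, intervals))  # 1 2
```

Thresholds cannot label the smaller of two points positive and the larger negative, and intervals cannot include two points while excluding one between them, which is why the search stops at sizes 1 and 2 respectively.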

717 | A measure of asymptotic efficiency for tests of a hypothesis based on the sum of observations
- Chernoff
- 1952
Citation Context: ...$\mathrm{LE}(p, m, (1-\alpha)mp) \le e^{-\alpha^2 mp/2}$ and Fact CB2. $\mathrm{GE}(p, m, (1+\alpha)mp) \le e^{-\alpha^2 mp/3}$. These bounds in the form they are stated are from the paper of Angluin and Valiant [2]; see also Chernoff [6]. Although we will make frequent use of Fact CB1 and Fact CB2, we will do so in varying levels of detail, depending on the complexity of the calculation involved. However, we are primarily interested ...
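Facts CB1 and CB2 can be checked numerically against exact binomial tails. This Python sketch assumes the usual reading of the notation, namely that $\mathrm{LE}(p, m, r)$ is the probability of at most $r$ successes and $\mathrm{GE}(p, m, r)$ the probability of at least $r$ successes in $m$ independent trials with success probability $p$; the context does not spell this out, and the helper names are hypothetical. The bounds are normally applied with $0 \le \alpha \le 1$.

```python
import math


def binom_pmf(p: float, m: int, k: int) -> float:
    return math.comb(m, k) * p**k * (1 - p) ** (m - k)


def LE(p: float, m: int, r: float) -> float:
    """Probability of at most r successes in m Bernoulli(p) trials (assumed meaning)."""
    return sum(binom_pmf(p, m, k) for k in range(0, min(m, math.floor(r)) + 1))


def GE(p: float, m: int, r: float) -> float:
    """Probability of at least r successes in m Bernoulli(p) trials (assumed meaning)."""
    return sum(binom_pmf(p, m, k) for k in range(max(0, math.ceil(r)), m + 1))


def chernoff_check(p: float, m: int, alpha: float):
    """Return (exact tail, bound) pairs for Fact CB1 and Fact CB2."""
    mp = m * p
    lower = LE(p, m, (1 - alpha) * mp), math.exp(-alpha**2 * mp / 2)  # Fact CB1
    upper = GE(p, m, (1 + alpha) * mp), math.exp(-alpha**2 * mp / 3)  # Fact CB2
    return lower, upper


(le_exact, cb1), (ge_exact, cb2) = chernoff_check(p=0.25, m=200, alpha=0.5)
print(f"LE = {le_exact:.2e} <= {cb1:.2e};  GE = {ge_exact:.2e} <= {cb2:.2e}")
```

The exact tails are typically far below the bounds; the value of CB1 and CB2 is that they decay exponentially in $mp$ and are easy to invert when choosing a sample size.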

680 | Approximation Algorithms for Combinatorial Problems
- Johnson
- 1974
Citation Context: ...arning monomials with errors to a generalization of the weighted set cover problem, and give an approximation algorithm for this problem (generalizing the greedy algorithm analyzed by several authors [7, 11, 18]) that is of independent interest. This approximation algorithm is used as a subroutine in a learning algorithm that tolerates an improved error rate for monomials. In the other direction, we prove th...
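The greedy algorithm referred to here repeatedly picks the set with the smallest cost per newly covered element. The Python sketch below shows only that classical full-cover heuristic, not the paper's generalization to partial covers; the function name and the dictionary-based input format are assumptions.

```python
from typing import Dict, Hashable, Iterable, List, Set


def greedy_weighted_set_cover(universe: Iterable[Hashable],
                              sets: Dict[str, Set[Hashable]],
                              costs: Dict[str, float]) -> List[str]:
    """Greedy weighted set cover: pick the set minimizing cost / (newly covered elements)."""
    uncovered = set(universe)
    chosen: List[str] = []
    while uncovered:
        best, best_ratio = None, float("inf")
        for name, s in sets.items():
            gain = len(s & uncovered)  # elements this set would newly cover
            if gain > 0 and costs[name] / gain < best_ratio:
                best, best_ratio = name, costs[name] / gain
        if best is None:
            raise ValueError("the given sets do not cover the universe")
        chosen.append(best)
        uncovered -= sets[best]
    return chosen


sets = {"A": {1, 2, 3}, "B": {4, 5}, "C": {1, 2, 3, 4, 5}, "D": {3, 4}}
costs = {"A": 3.0, "B": 1.0, "C": 10.0, "D": 1.0}
print(greedy_weighted_set_cover(range(1, 6), sets, costs))  # ['B', 'A']
```

Chvátal showed that the cost of the greedy cover is at most $H(d) = 1 + 1/2 + \cdots + 1/d$ times the optimum, where $d$ is the size of the largest set. Ties in the ratio are broken here by dictionary order, which does not affect that guarantee.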

674 | Learning quickly when irrelevant attributes abound: a new linear-threshold algorithm
- Littlestone
- 1988
Citation Context: ...s considerably smaller than the total number of variables $n$. Other improvements in the performance of learning algorithms in the presence of many irrelevant attributes are investigated by Littlestone [16] and Blum [3]. We note that by applying Theorem 2 we can show that even for $M^1_n$, the class of monomials of length 1, the positive-only and negative-only malicious error rates are bounded by $\epsilon/(n-1)$ ...

666 | The strength of weak learnability
- Schapire
- 1990
Citation Context: ...ions, that is, representations that can be written down in polynomial time. All representation classes considered here are polynomially evaluatable. It is worth mentioning at this point that Schapire [20] has shown that if a representation class is not polynomially evaluatable, then it is not efficiently learnable in our model. Thus, perhaps not surprisingly, we see that classes that are not polynomial...

625 | Learnability and the Vapnik-Chervonenkis Dimension
- Blumer, Ehrenfeucht, et al.
- 1989
Citation Context: ...either malicious errors or classification noise errors) are present. Several existing learning algorithms use only positive examples or only negative examples (see e.g. Valiant [23] and Blumer et al. [5]). We demonstrate strong upper bounds on the tolerable error rate when only one type is used, and show that this rate can be provably increased when both types are used. In addition to proving this fo...

375 | Learning decision lists
- Rivest
- 1987
Citation Context: ... + 1, where the $i$th entry indicates whether the function is 0 or 1 on all inputs with exactly $i$ bits set to 1. We denote by $SF_n$ the class of all such representations. Decision Lists: A decision list [19] is a list $L = \langle (T_1, b_1), \ldots, (T_l, b_l) \rangle$, where each $T_i$ is a monomial over the Boolean variables $x_1, \ldots, x_n$ and each $b_i \in \{0, 1\}$. For $\vec{v} \in \{0, 1\}^n$, we define $L(\vec{v})$ as follows: ...
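The two representation classes defined in this context can be evaluated directly. In this Python sketch a monomial is a list of `(i, value)` literals with 0-based variable indices, and a decision list returns $b_j$ for the first term $T_j$ satisfied by $\vec{v}$. The truncated context does not show what happens when no term fires, so returning a default of 0 is an assumption here (Rivest's formulation sidesteps the question by ending the list with the constant-true term). All names are hypothetical.

```python
from typing import List, Sequence, Tuple

Literal = Tuple[int, int]                  # (variable index, required bit)
Monomial = List[Literal]                   # [] is the always-true term
DecisionList = List[Tuple[Monomial, int]]  # <(T_1, b_1), ..., (T_l, b_l)>


def satisfies(term: Monomial, v: Sequence[int]) -> bool:
    return all(v[i] == bit for i, bit in term)


def eval_decision_list(L: DecisionList, v: Sequence[int], default: int = 0) -> int:
    """Output b_j for the first term T_j satisfied by v (default if none is)."""
    for term, b in L:
        if satisfies(term, v):
            return b
    return default


def eval_symmetric(table: Sequence[int], v: Sequence[int]) -> int:
    """Symmetric function in SF_n: table[i] is the output on inputs with exactly i ones."""
    return table[sum(v)]


# x1 AND NOT x2 -> 1; else x3 -> 0; else 1  (0-based indices 0, 1, 2)
L = [([(0, 1), (1, 0)], 1), ([(2, 1)], 0), ([], 1)]
print([eval_decision_list(L, v) for v in ([1, 0, 1], [0, 0, 1], [0, 1, 0])])  # [1, 0, 1]
parity3 = [0, 1, 0, 1]
print(eval_symmetric(parity3, [1, 1, 0]), eval_symmetric(parity3, [1, 1, 1]))  # 0 1
```

A symmetric function on $n$ variables needs only its $n + 1$-entry table, since its output depends solely on how many input bits are set.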

263 | On the ratio of optimal integral and fractional covers
- Lovász
- 1975
Citation Context: ...be considerably smaller than the optimal information-theoretic rate. The best approximation known for the set cover problem remains the greedy algorithm analyzed by Chvátal [7], Johnson [11], Lovász [17], and Nigmatullin [18]. Finally, we give a canonical reduction that allows many learning with errors problems to be studied as equivalent optimization problems, thus allowing one to sidestep some of t...

256 | Fast probabilistic algorithms for Hamiltonian circuits and matchings
- Angluin, Valiant
- 1979

221 | Learning from noisy examples
- Angluin, Laird
- 1988
Citation Context: ...ial optimization problems such as set cover and natural problems of learning with errors. Several of our results also apply to a more benign model of classification noise defined by Angluin and Laird [1], in which the underlying target distributions are unaltered, but there is some probability that a positive example is incorrectly classified as being negative, and vice versa. Several themes are brou...
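The classification noise model described in this context can be sketched as an oracle that always draws the point from the true distribution and flips only its label, independently, with some fixed probability. The Python sketch below is illustrative; the names `classification_noise_oracle`, `draw`, and the noise rate `eta` are assumptions, not notation taken from the paper.

```python
import random
from typing import Callable, Tuple

Point = Tuple[int, ...]


def classification_noise_oracle(draw: Callable[[random.Random], Point],
                                target: Callable[[Point], int],
                                eta: float,
                                rng: random.Random) -> Tuple[Point, int]:
    """The point comes from the unaltered distribution; the label is flipped w.p. eta."""
    x = draw(rng)
    label = target(x)
    if rng.random() < eta:
        label = 1 - label  # a positive example is reported as negative, and vice versa
    return x, label


rng = random.Random(0)
draw = lambda r: tuple(r.randint(0, 1) for _ in range(3))
target = lambda x: x[0]
sample = [classification_noise_oracle(draw, target, 0.2, rng) for _ in range(10_000)]
flipped = sum(y != target(x) for x, y in sample) / len(sample)
print(f"observed flip rate: {flipped:.3f}")  # close to 0.2
```

Unlike malicious errors, nothing here depends on the learner's history or lets the points themselves be chosen adversarially, which is why this model is the more benign of the two.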

167 | On the learnability of Boolean formulae
- Kearns, Li, et al.
- 1987
Citation Context: ...sses studied here. Even in cases where the representation class is known to be not learnable from only positive or only negative examples in polynomial time (for example, it is shown in Kearns et al. [13] that monomials are not polynomially learnable from negative examples), the bounds on $E_{\mathrm{MAL},+}$ and $E_{\mathrm{MAL},-}$ are relevant since they also hold for algorithms that do not run in polynomial time. Coroll...

150 | A greedy heuristic for the set covering problem
- Chvátal
- 1979
Citation Context: ...arning monomials with errors to a generalization of the weighted set cover problem, and give an approximation algorithm for this problem (generalizing the greedy algorithm analyzed by several authors [7, 11, 18]) that is of independent interest. This approximation algorithm is used as a subroutine in a learning algorithm that tolerates an improved error rate for monomials. In the other direction, we prove th...

117 | Learning disjunctions of conjunctions
- Valiant
- 1985
Citation Context: ...al. [13] it can be shown that the algorithm of Theorem 18 (as well as that obtained from Theorem 12) can be used to obtain an improvement in the error rate over the negative-only algorithm of Valiant [24] for the class $k\mathrm{DNF}_{n,s}$ of $k$DNF formulae with at most $s$ terms. Briefly, the appropriate transformation regards a $k$DNF formula as a 1DNF formula in a space of $\Theta(n^k)$ variables, one variable for...
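The transformation mentioned here can be sketched by introducing one new Boolean variable per conjunction of at most $k$ literals; the context is truncated, so reading "one variable for..." as "one variable per possible term" is an assumption. For fixed $k$ there are $\sum_{s=1}^{k} \binom{n}{s} 2^s = \Theta(n^k)$ such terms, and any $k$DNF over the original variables becomes a plain disjunction of single variables (a 1DNF) over the new ones. The Python names below are hypothetical.

```python
from itertools import combinations, product
from typing import List, Sequence, Tuple

Term = Tuple[Tuple[int, int], ...]  # conjunction of (variable index, required bit) literals


def all_terms(n: int, k: int) -> List[Term]:
    """Every conjunction of 1..k literals over x_0..x_{n-1}, each variable used at most once."""
    terms: List[Term] = []
    for size in range(1, k + 1):
        for idxs in combinations(range(n), size):
            for bits in product((0, 1), repeat=size):
                terms.append(tuple(zip(idxs, bits)))
    return terms


def expand(v: Sequence[int], terms: List[Term]) -> List[int]:
    """Map v in {0,1}^n to one new variable per term: 1 iff v satisfies that term."""
    return [int(all(v[i] == bit for i, bit in t)) for t in terms]


# A 2DNF over 4 variables: (x0 AND NOT x1) OR (x2 AND x3).
terms = all_terms(4, 2)
index = {t: j for j, t in enumerate(terms)}
dnf = [((0, 1), (1, 0)), ((2, 1), (3, 1))]
# In the expanded space the same function is a disjunction of new variables.
as_1dnf = [index[t] for t in dnf]
print(len(terms), as_1dnf)  # 32 [10, 31]
```

For $n = 4$ and $k = 2$ this gives $4 \cdot 2 + 6 \cdot 4 = 32$ new variables, and evaluating the 1DNF on `expand(v, terms)` agrees with the original 2DNF on every input.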

90 | Equivalence of models for polynomial learnability
- Haussler, Kearns, et al.
- 1988
Citation Context: ...ivalent optimization problems, thus allowing one to sidestep some of the difficulties of analysis in the distribution-free model. Similar results are given for the error-free model by Haussler et al. [10]. We now give a brief survey of other studies of error in the distribution-free model. Valiant [24] modified his initial definitions of learnability to include the presence of errors in the examples. ...

51 | Learning from Good and Bad Data
- Laird
- 1988
Citation Context: ...by polynomial-time algorithms for nontrivial representation classes. Shackelford and Volper [21] investigate a model of random noise in the instances rather than the labels, and Sloan [22] and Laird [15] discuss a number of variants of both the malicious error and classification noise models. 2 Definitions for Distribution-free Learning. In this section we give definitions and motivation for the model...

51 | Learning DNF under the uniform distribution in quasipolynomial time
- Verbeurgt
- 1990
Citation Context: ...of the set cover problem that we call the partial cover problem, which is defined below. This approximation algorithm is of independent interest and has found application in other learning algorithms [14, 26]. Our analysis and notation rely heavily on the work of Chvátal [7]; the reader may find it helpful to read his paper first. The Partial Cover Problem: Input: Finite sets $S_1, \ldots, S_n$ with positi...

47 | Types of noise in data for concept learning
- Sloan
- 1988
Citation Context: ...an be tolerated by polynomial-time algorithms for nontrivial representation classes. Shackelford and Volper [21] investigate a model of random noise in the instances rather than the labels, and Sloan [22] and Laird [15] discuss a number of variants of both the malicious error and classification noise models. 2 Definitions for Distribution-free Learning. In this section we give definitions and motivatio...

33 | Learning k-DNF with noise in the attributes
- Shackelford, Volper
- 1988
Citation Context: ...onstrate that under stronger assumptions on the nature of the errors, large rates of error can be tolerated by polynomial-time algorithms for nontrivial representation classes. Shackelford and Volper [21] investigate a model of random noise in the instances rather than the labels, and Sloan [22] and Laird [15] discuss a number of variants of both the malicious error and classification noise models. 2 ...

21 | A polynomial-time algorithm for learning k-variable pattern languages from examples
- Kearns, Pitt
- 1989
Citation Context: ...of the set cover problem that we call the partial cover problem, which is defined below. This approximation algorithm is of independent interest and has found application in other learning algorithms [14, 26]. Our analysis and notation rely heavily on the work of Chvátal [7]; the reader may find it helpful to read his paper first. The Partial Cover Problem: Input: Finite sets $S_1, \ldots, S_n$ with positi...

4 | A general lower bound on the number of examples needed for learning
- Valiant
- 1988

3 | The fastest descent method for covering problems
- Nigmatullin
- 1969
Citation Context: ...arning monomials with errors to a generalization of the weighted set cover problem, and give an approximation algorithm for this problem (generalizing the greedy algorithm analyzed by several authors [7, 11, 18]) that is of independent interest. This approximation algorithm is used as a subroutine in a learning algorithm that tolerates an improved error rate for monomials. In the other direction, we prove th...

1 | Learning in an infinite attribute space
- Blum
- 1990
Citation Context: ...smaller than the total number of variables $n$. Other improvements in the performance of learning algorithms in the presence of many irrelevant attributes are investigated by Littlestone [16] and Blum [3]. We note that by applying Theorem 2 we can show that even for $M^1_n$, the class of monomials of length 1, the positive-only and negative-only malicious error rates are bounded by $\epsilon/(n-1)$. Th...