Results 1-10 of 63
Efficient noise-tolerant learning from statistical queries
Journal of the ACM, 1998
Cited by 357 (5 self)
In this paper, we study the problem of learning in the presence of classification noise in the probabilistic learning model of Valiant and its variants. In order to identify the class of “robust” learning algorithms in the most general way, we formalize a new but related model of learning from statistical queries. Intuitively, in this model, a learning algorithm is forbidden to examine individual examples of the unknown target function, but is given access to an oracle providing estimates of probabilities over the sample space of random examples. One of our main results shows that any class of functions learnable from statistical queries is in fact learnable with classification noise in Valiant’s model, with a noise rate approaching the information-theoretic barrier of 1/2. We then demonstrate the generality of the statistical query model, showing that practically every class learnable in Valiant’s model and its variants can also be learned in the new model (and thus can be learned in the presence of noise). A notable exception to this statement is the class of parity functions, which we prove is not learnable from statistical queries, and for which no noise-tolerant algorithm is known.
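As an illustration of the oracle described in this abstract, the following is a minimal sketch, not the paper's construction. It simulates a statistical query oracle by sampling (the uniform distribution, the Hoeffding-based sample size, and the names `make_sq_oracle` and `learn_monotone_conjunction` are all assumptions made for the example) and uses it to learn a monotone conjunction without ever inspecting individual labeled examples.

```python
import math
import random

def make_sq_oracle(target, n, rng, delta=0.01):
    """Build a simulated oracle STAT(chi, tau).

    Assumption: examples are uniform over {0,1}^n. The oracle returns an
    empirical estimate of Pr[chi(x, target(x)) is true]; the sample size
    comes from a Hoeffding bound so the estimate is within tau of the
    true probability with probability at least 1 - delta.
    """
    def stat(chi, tau):
        m = math.ceil(math.log(2 / delta) / (2 * tau ** 2))
        hits = 0
        for _ in range(m):
            x = tuple(rng.randint(0, 1) for _ in range(n))
            if chi(x, target(x)):
                hits += 1
        return hits / m
    return stat

def learn_monotone_conjunction(stat, n, tau=0.05):
    """Keep variable i when 'x_i = 0 and label = 1' is (almost) never seen.

    If x_i belongs to the target conjunction, that event has probability 0,
    so the learner only needs probability estimates, never raw examples.
    """
    kept = []
    for i in range(n):
        p = stat(lambda x, y, i=i: x[i] == 0 and y == 1, tau)
        if p <= tau:
            kept.append(i)
    return kept

if __name__ == "__main__":
    rng = random.Random(0)
    target = lambda x: int(x[0] == 1 and x[2] == 1)  # hypothetical target
    print(learn_monotone_conjunction(make_sq_oracle(target, 6, rng), 6))
```

The learner interacts with the data only through `stat`, which is the restriction the abstract describes. A variable whose violating event has true probability below the tolerance may also be kept, and that is acceptable for an approximately correct hypothesis.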
Noise-tolerant learning, the parity problem, and the statistical query model
J. ACM
Cited by 164 (2 self)
We describe a slightly subexponential time algorithm for learning parity functions in the presence of random classification noise. This results in a polynomial-time algorithm for the case of parity functions that depend on only the first O(log n log log n) bits of input. This is the first known instance of an efficient noise-tolerant algorithm for a concept class that is provably not learnable in the Statistical Query model of Kearns [7]. Thus, we demonstrate that the set of problems learnable in the statistical query model is a strict subset of those problems learnable in the presence of noise in the PAC model. In coding-theory terms, what we give is a poly(n)-time algorithm for decoding linear k × n codes in the presence of random noise for the case of k = c log n log log n for some c > 0. (The case of k = O(log n) is trivial since one can just individually check each of the 2^k possible messages and choose the one that yields the closest codeword.) A natural extension of the statistical query model is to allow queries about statistical properties that involve t-tuples of examples (as opposed to single examples). The second result of this paper is to show that any class of functions learnable (strongly or weakly) with t-wise queries for t = O(log n) is also weakly learnable with standard unary queries. Hence this natural extension to the statistical query model does not increase the set of weakly learnable functions.
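The trivial k = O(log n) case mentioned in the parenthetical can be sketched as follows. This is only the brute-force baseline, not the subexponential algorithm of the paper. The function names, the uniform example distribution, and the noise rate used in the demo are assumptions for illustration.

```python
import itertools
import random

def parity(mask, x):
    """Parity of the bits of x selected by mask (mask covers the first k bits)."""
    return sum(m & b for m, b in zip(mask, x)) % 2

def noisy_examples(secret, n, m, eta, rng):
    """Draw m uniform examples labeled by the secret parity, flipping each label with probability eta."""
    out = []
    for _ in range(m):
        x = tuple(rng.randint(0, 1) for _ in range(n))
        y = parity(secret, x)
        if rng.random() < eta:
            y ^= 1
        out.append((x, y))
    return out

def brute_force_parity(examples, k):
    """Try all 2^k candidate parities on the first k bits; return the one agreeing with the most labels."""
    best, best_agree = None, -1
    for mask in itertools.product((0, 1), repeat=k):
        agree = sum(parity(mask, x) == y for x, y in examples)
        if agree > best_agree:
            best, best_agree = mask, agree
    return best

if __name__ == "__main__":
    rng = random.Random(1)
    secret = (1, 0, 1, 1)  # hypothetical parity on the first k = 4 of n = 10 bits
    data = noisy_examples(secret, n=10, m=400, eta=0.2, rng=rng)
    print(brute_force_parity(data, k=4))
```

The loop runs 2^k times, which is polynomial in n only when k = O(log n). In coding terms, each candidate mask is a message, and the agreement count measures how close its codeword is to the received noisy word.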
On the learnability of discrete distributions
In The 25th Annual ACM Symposium on Theory of Computing, 1994
Cited by 114 (11 self)
We introduce and investigate a new model of learning probability distributions from independent draws. Our model is inspired by the popular Probably Approximately Correct (PAC) model for learning boolean functions from labeled
What Can We Learn Privately?
49th Annual IEEE Symposium on Foundations of Computer Science, 2008
Cited by 99 (10 self)
Learning problems form an important category of computational tasks that generalizes many of the computations researchers apply to large real-life data sets. We ask: what concept classes can be learned privately, namely, by an algorithm whose output does not depend too heavily on any one input or specific training example? More precisely, we investigate learning algorithms that satisfy differential privacy, a notion that provides strong confidentiality guarantees in contexts where aggregate information is released about a database containing sensitive information about individuals. We present several basic results that demonstrate general feasibility of private learning and relate several models previously studied separately in the contexts of privacy and standard learning.
The minimum consistent DFA problem cannot be approximated within any polynomial
Journal of the Association for Computing Machinery, 1993
Cited by 99 (4 self)
The minimum consistent DFA problem is that of finding a DFA with as few states as possible that is consistent with a given sample (a finite collection of words, each labeled as to whether the DFA found should accept or reject). Assuming that P ≠ NP, it is shown that for any constant k, no polynomial-time algorithm can be guaranteed to find a consistent DFA with fewer than opt^k states, where opt is the number of states in the minimum state DFA consistent with the sample. This result holds even if the alphabet is of constant size two, and if the algorithm is allowed to produce an NFA, a regular expression, or a regular grammar that is consistent with the sample. A similar nonapproximability result is presented for the problem of finding small consistent linear grammars. For the case of finding minimum consistent DFAs when the alphabet is not of constant size but instead is allowed to vary with the problem specification, the slightly
Sample compression, learnability, and the Vapnik-Chervonenkis dimension
Machine Learning, 1995
Cited by 83 (5 self)
Within the framework of PAC-learning, we explore the learnability of concepts from samples using the paradigm of sample compression schemes. A sample compression scheme of size k for a concept class C ⊆ 2^X consists of a compression function and a reconstruction function. The compression function receives a finite sample set consistent with some concept in C and chooses a subset of k examples as the compression set. The reconstruction function forms a hypothesis on X from a compression set of k examples. For any sample set of a concept in C the compression set produced by the compression function must lead to a hypothesis consistent with the whole original sample set when it is fed to the reconstruction function. We demonstrate that the existence of a sample compression scheme of fixed size for a class C is sufficient to ensure that the class C is PAC-learnable. Previous work has shown that a class is PAC-learnable if and only if the Vapnik-Chervonenkis (VC) dimension of the class i...
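To make the compression and reconstruction functions concrete, here is a minimal sketch for one simple class chosen for illustration, not taken from the paper: threshold concepts on the real line, where a point x is positive exactly when x ≥ θ. The sketch assumes the compression set may hold at most one example (it is empty when the sample has no positives). The names `compress` and `reconstruct` are hypothetical.

```python
def compress(sample):
    """Keep only the smallest positive example (a compression set of size at most 1)."""
    positives = [x for x, y in sample if y == 1]
    return [(min(positives), 1)] if positives else []

def reconstruct(compression_set):
    """Rebuild a threshold hypothesis from the compression set alone."""
    if not compression_set:
        return lambda x: 0  # no positive example kept: predict negative everywhere
    threshold = compression_set[0][0]
    return lambda x: 1 if x >= threshold else 0

if __name__ == "__main__":
    # Sample consistent with some threshold concept (here any theta in (0.5, 0.9]).
    sample = [(0.3, 0), (1.2, 1), (0.9, 1), (0.5, 0), (2.0, 1)]
    h = reconstruct(compress(sample))
    print(all(h(x) == y for x, y in sample))
```

The consistency requirement in the abstract holds here because every negative point of a threshold-consistent sample lies below the smallest positive point, so the hypothesis rebuilt from that single example labels the whole original sample correctly.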
PAC Learning from Positive Statistical Queries
Proc. 9th International Conference on Algorithmic Learning Theory - ALT ’98, 1998
Cited by 52 (3 self)
Learning from positive examples occurs very frequently in natural learning. The PAC learning model of Valiant takes many features of natural learning into account, but in most cases it fails to describe this kind of learning. We show that in order to make learning from positive data possible, extra information about the underlying distribution must be provided to the learner. We define a PAC learning model from positive and unlabeled examples. We also define a PAC learning model from positive and unlabeled statistical queries. Relations with the PAC model ([Val84]), the statistical query model ([Kea93]) and the constant-partition classification noise model ([Dec97]) are studied. We show that k-DNF and k-decision lists are learnable in both models, i.e. with far less information than is assumed in previously used algorithms. 1 Introduction The PAC learning model of Valiant ([Val84]) has become the reference model in computational learning theory. However, in spite of the importance of lea...
Learning with Restricted Focus of Attention
1997
Cited by 45 (2 self)
We consider learning tasks in which the learner faces restrictions on the amount of information he can extract from each example he encounters. We introduce a formal framework for the analysis of such scenarios. We call it RFA (Restricted Focus of Attention) learning. While being a natural refinement of the PAC learning model, some of the fundamental PAC-learning results and techniques fail in the RFA paradigm; learnability in the RFA model is no longer characterized by the VC dimension, and many PAC learning algorithms are not applicable in the RFA setting. Hence, the RFA formulation reflects the need for new techniques and tools to cope with some fundamental constraints of realistic learning problems. In this work we also present some paradigms and algorithms that may serve as a first step towards answering this need. Two main types of restrictions are considered here: in the stronger one, called k-RFA, only k of the n attributes of each example are revealed to the learner, while in the weaker one, called k-wRFA, the restriction is made on the size of each observation (k bits), and no restriction is made on how the observations are extracted from the examples. For the stronger k-RFA restriction we develop a general technique for composing efficient k-RFA algorithms, and apply it to deduce, for instance, the efficient k-RFA learnability of k-DNF formulas, and the efficient 1-RFA learnability of axis-aligned rectangles in the Euclidean space R^n. We also prove the k-RFA learnability of richer classes of Boolean functions (such as k-decision lists) with respect to a given distribution, and the efficient (n − 1)-RFA learnability (for fixed n), under product distributions, of classes of subsets of R^n which are defined by mild surfaces. ...
Learning Nonsingular Phylogenies and Hidden Markov Models
Proceedings of the thirty-seventh annual ACM Symposium on Theory of Computing, Baltimore (STOC05), 2005
Cited by 42 (7 self)
In this paper, we study the problem of learning phylogenies and hidden Markov models. We call the Markov model nonsingular if all transition matrices have determinants bounded away from 0 (and 1). We highlight the role of the nonsingularity condition for the learning problem. Learning hidden Markov models without the nonsingularity condition is at least as hard as learning parity with noise. On the other hand, we give a polynomial-time algorithm for learning nonsingular phylogenies and hidden Markov models.
Learning juntas
In Proc. 35th Ann. ACM Symp. on the Theory of Computing, 2003
Cited by 36 (2 self)
We consider a fundamental problem in computational learning theory: learning an arbitrary Boolean function which depends on an unknown set of k out of n Boolean variables. We give an algorithm for learning such functions from uniform random examples which runs in time roughly (n^k)^(ω/(ω+1)), where ω < 2.376 is the matrix multiplication exponent. We thus obtain the first polynomial factor improvement on the naive n^k time bound which can be achieved via exhaustive search. Our algorithm and analysis exploit new structural properties of Boolean functions.
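The naive exhaustive search that the abstract uses as its baseline can be sketched as below. This is only the roughly n^k baseline, not the improved algorithm of the paper. The function name `find_junta` and the demo target are assumptions for illustration.

```python
from itertools import combinations
import random

def find_junta(examples, n, k):
    """Try every size-k subset of the n variables (about n^k candidates).

    A subset is accepted when the labels are a well-defined function of the
    example restricted to that subset, i.e. no two examples agree on the
    subset but disagree on the label. Returns (subset, truth_table) or None.
    """
    for subset in combinations(range(n), k):
        table = {}
        consistent = True
        for x, y in examples:
            key = tuple(x[i] for i in subset)
            if table.setdefault(key, y) != y:
                consistent = False
                break
        if consistent:
            return subset, table
    return None

if __name__ == "__main__":
    rng = random.Random(2)
    f = lambda x: (x[1] & x[4]) ^ x[6]  # hypothetical 3-junta on n = 8 variables
    xs = [tuple(rng.randint(0, 1) for _ in range(8)) for _ in range(200)]
    print(find_junta([(x, f(x)) for x in xs], n=8, k=3))
```

With enough uniform random examples, a subset that misses a relevant variable is very likely to be contradicted by some pair of examples, so the first consistent subset is, with high probability, the true set of relevant variables.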