Efficient noise-tolerant learning from statistical queries
Journal of the ACM, 1998
Cited by 288 (6 self)
In this paper, we study the problem of learning in the presence of classification noise in the probabilistic learning model of Valiant and its variants. In order to identify the class of “robust” learning algorithms in the most general way, we formalize a new but related model of learning from statistical queries. Intuitively, in this model, a learning algorithm is forbidden to examine individual examples of the unknown target function, but is given access to an oracle providing estimates of probabilities over the sample space of random examples. One of our main results shows that any class of functions learnable from statistical queries is in fact learnable with classification noise in Valiant’s model, with a noise rate approaching the information-theoretic barrier of 1/2. We then demonstrate the generality of the statistical query model, showing that practically every class learnable in Valiant’s model and its variants can also be learned in the new model (and thus can be learned in the presence of noise). A notable exception to this statement is the class of parity functions, which we prove is not learnable from statistical queries, and for which no noise-tolerant algorithm is known.
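The statistical-query interface described above can be illustrated in a few lines. The sketch below is hypothetical (it is not the paper's construction): it simulates one query, an estimate of E[χ(x, f(x))] to within a tolerance τ, using random examples whose labels are flipped with some noise rate.

```python
import random

def sq_oracle(chi, target, draw, tau, noise_rate=0.0):
    """Simulate a statistical query: estimate E[chi(x, f(x))] to within
    tolerance tau from random examples whose labels are flipped with
    probability noise_rate.  A hypothetical sketch of the interface."""
    n = max(1, int(4 / tau ** 2))         # crude Chernoff-style sample size
    total = 0.0
    for _ in range(n):
        x = draw()
        y = target(x)
        if random.random() < noise_rate:  # classification noise
            y = 1 - y
        total += chi(x, y)
    return total / n

# Example query: estimate P[f(x) = 1] for the majority of three fair bits
# (the true value is 1/2).
maj3 = lambda x: int(sum(x) >= 2)
draw = lambda: [random.randint(0, 1) for _ in range(3)]
est = sq_oracle(lambda x, y: y, maj3, draw, tau=0.05)
```

Note that with χ(x, y) = y the noisy estimate stays unbiased in expectation here only because the target probability is 1/2; in general the paper's point is that noise shifts such estimates in a correctable way.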
Active learning literature survey, 2010
Cited by 132 (1 self)
The key idea behind active learning is that a machine learning algorithm can achieve greater accuracy with fewer labeled training instances if it is allowed to choose the data from which it learns. An active learner may ask queries in the form of unlabeled instances to be labeled by an oracle (e.g., a human annotator). Active learning is well-motivated in many modern machine learning problems, where unlabeled data may be abundant but labels are difficult, time-consuming, or expensive to obtain. This report provides a general introduction to active learning and a survey of the literature. This includes a discussion of the scenarios in which queries can be formulated, and an overview of the query strategy frameworks proposed in the literature to date. An analysis of the empirical and theoretical evidence for active learning, a summary of several problem-setting variants, and a discussion
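One of the simplest query strategies the survey covers, pool-based uncertainty sampling, can be sketched directly; the toy logistic model and its threshold below are illustrative assumptions, not anything from the survey itself.

```python
import math

def uncertainty_sample(pool, predict_proba):
    """Pool-based uncertainty sampling: query the unlabeled instance
    whose predicted positive-class probability is closest to 0.5."""
    return min(pool, key=lambda x: abs(predict_proba(x) - 0.5))

# Toy 1-D example with a hypothetical logistic model centered at 3.0.
proba = lambda x: 1.0 / (1.0 + math.exp(-(x - 3.0)))
pool = [0.0, 1.0, 2.9, 5.0, 8.0]
query = uncertainty_sample(pool, proba)   # 2.9 lies nearest the boundary
```

The point far from the decision boundary contributes little new information; the instance nearest probability 0.5 is where a label most reduces the model's uncertainty.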
Analysis of perceptron-based active learning
In COLT, 2005
Cited by 69 (12 self)
We start by showing that in an active learning setting, the Perceptron algorithm needs Ω(1/ε²) labels to learn linear separators within generalization error ε. We then present a simple selective sampling algorithm for this problem, which combines a modification of the perceptron update with an adaptive filtering rule for deciding which points to query. For data distributed uniformly over the unit sphere, we show that our algorithm reaches generalization error ε after asking for just Õ(d log(1/ε)) labels. This exponential improvement over the usual sample complexity of supervised learning has previously been demonstrated only for the computationally more complex query-by-committee algorithm.

1 Introduction
In many machine learning applications, unlabeled data is abundant but labeling is expensive. This distinction is not captured in the standard PAC or online models of supervised learning, and has motivated the field of active learning, in which the labels of data points are initially hidden, and the learner must pay for each label it wishes revealed. If query points are chosen randomly, the number of labels needed to reach a target generalization error ε, at a target confidence level 1 − δ, is similar to the sample complexity of supervised learning. The hope is that there are alternative querying strategies which require significantly fewer
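The selective-sampling idea can be sketched schematically. The fixed query threshold and the starting vector below are simplifications (the paper adapts the filtering threshold over time); what the sketch does preserve is the modified update w ← w − 2(w·x)x, which keeps w unit-length and strictly increases its inner product with the target direction on every mistake.

```python
import math
import random

def selective_perceptron(stream, label, threshold):
    """Schematic selective-sampling perceptron.  Points are unit vectors;
    a label is queried only when the point lies within `threshold` of the
    current hyperplane, and a mistake triggers the modified update
    w <- w - 2(w.x)x, a reflection that keeps w unit-length.  The fixed
    threshold is a simplification of the paper's adaptive rule."""
    d = len(stream[0])
    w = [0.0] * (d - 1) + [1.0]            # arbitrary unit start vector
    queries = 0
    for x in stream:
        margin = sum(wi * xi for wi, xi in zip(w, x))
        if abs(margin) > threshold:
            continue                        # confident on this point: skip
        queries += 1
        y = label(x)                        # pay for this label
        if y * margin <= 0:                 # mistake: reflect w across x
            w = [wi - 2.0 * margin * xi for wi, xi in zip(w, x)]
    return w, queries

# Points on the unit circle, labeled by the sign of the first coordinate.
random.seed(0)
points = []
for _ in range(500):
    a = random.uniform(0.0, 2.0 * math.pi)
    points.append((math.cos(a), math.sin(a)))
truth = lambda x: 1 if x[0] >= 0 else -1
w, queries = selective_perceptron(points, truth, threshold=0.5)
```

Because only points near the current hyperplane are queried, the label count stays well below the number of points seen, which is the source of the label savings the abstract describes.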
A PAC-style Model for Learning from Labeled and Unlabeled Data
In Proceedings of the 18th Annual Conference on Computational Learning Theory (COLT), 2005
Cited by 52 (9 self)
There has been growing interest in practice in using unlabeled data together with labeled data in machine learning, and a number of different approaches have been developed. However, the assumptions these methods are based on are often quite distinct and not captured by standard theoretical models. In this paper we describe a PAC-style framework that can be used to model many of these assumptions, and analyze sample-complexity issues in this setting: that is, how much of each type of data one should expect to need in order to learn well, and what are the basic quantities that these numbers depend on. Our model can be viewed as an extension of the standard PAC model, where in addition to a concept class C, one also proposes a type of compatibility that one believes the target concept should have with the underlying distribution.
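One concrete instance of a compatibility notion, estimated from unlabeled data alone, might be margin-based: a hypothesis is compatible with the distribution to the extent that it is confident on typical points. The score function and example below are hypothetical illustrations; the framework admits many other notions.

```python
def empirical_compatibility(h, unlabeled, margin):
    """Fraction of unlabeled points on which hypothesis h clears a
    confidence margin.  A hypothetical margin-style compatibility
    notion, computable without any labels."""
    return sum(1 for x in unlabeled if abs(h(x)) >= margin) / len(unlabeled)

# A 1-D linear score h(x) = x - 1; unlabeled data clustered away from
# the threshold at 1.0 makes h look highly compatible.
h = lambda x: x - 1.0
score = empirical_compatibility(h, [0.0, 0.5, 2.5, 3.0], margin=1.0)
```

In the framework, unlabeled data is useful precisely because such a score can rule out hypotheses with low compatibility before any labels are spent.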
New results for learning noisy parities and halfspaces
In Proceedings of the 47th Annual Symposium on Foundations of Computer Science (FOCS), 2006
Cited by 47 (11 self)
We address well-studied problems concerning the learnability of parities and halfspaces in the presence of classification noise. Learning parities under the uniform distribution with random classification noise, also called the noisy parity problem, is a famous open problem in computational learning. We reduce a number of basic problems regarding learning under the uniform distribution to learning of noisy parities. We show that under the uniform distribution, learning parities with adversarial classification noise reduces to learning parities with random classification noise. Together with the parity learning algorithm of Blum et al. [5], this gives the first nontrivial algorithm for learning parities with adversarial noise. We show that learning of DNF expressions reduces to learning noisy parities on just a logarithmic number of variables, and that learning of k-juntas reduces to learning noisy parities on k variables. These reductions work even in the presence of random classification noise in the original DNF or junta. We then consider the problem of learning halfspaces over Q^n with adversarial noise, or finding a halfspace that maximizes the agreement rate with a given set of examples. We prove an essentially optimal hardness factor of 2 − ε, improving the factor of 85/84 − ε due to Bshouty and Burroughs [8]. Finally, we show that majorities of halfspaces are hard to PAC-learn using any representation, based on the cryptographic assumption underlying the Ajtai-Dwork cryptosystem.
Learning with Positive and Unlabeled Examples Using Weighted Logistic Regression
In Proceedings of the Twentieth International Conference on Machine Learning (ICML), 2003
Cited by 38 (7 self)
The problem of learning with positive and unlabeled examples arises frequently in retrieval applications.
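A simple way to apply weighted logistic regression in this setting is to treat unlabeled examples as negatives but with a reduced weight, reflecting that some of them are actually positive. The sketch below uses this idea with a hand-picked weight and toy 1-D data; it is an illustration of weighted logistic regression in general, not the paper's exact weighting scheme.

```python
import math

def fit_weighted_logistic(data, epochs=200, lr=0.5):
    """Minimal weighted logistic regression on 1-D inputs by gradient
    descent on the weighted log-loss.  `data` holds (x, y, weight)
    triples."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        gw = gb = 0.0
        for x, y, wt in data:
            p = 1.0 / (1.0 + math.exp(-(w * x + b)))
            gw += wt * (p - y) * x        # weighted log-loss gradients
            gb += wt * (p - y)
        w -= lr * gw / len(data)
        b -= lr * gb / len(data)
    return w, b

# Positives get weight 1; unlabeled examples are treated as negatives
# but down-weighted to 0.3 to account for hidden positives among them.
positives = [(2.0, 1, 1.0), (3.0, 1, 1.0), (2.5, 1, 1.0)]
unlabeled = [(-1.0, 0, 0.3), (0.0, 0, 0.3), (2.2, 0, 0.3), (-2.0, 0, 0.3)]
w, b = fit_weighted_logistic(positives + unlabeled)
p_hi = 1.0 / (1.0 + math.exp(-(w * 3.0 + b)))  # score for a far-right x
```

The down-weighting keeps the unlabeled negative at x = 2.2 (plausibly a hidden positive) from dragging the decision boundary rightward as hard as a fully trusted negative would.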
Hardness of learning halfspaces with noise
In Proceedings of the 47th Annual IEEE Symposium on Foundations of Computer Science, 2006
Cited by 33 (3 self)
Learning an unknown halfspace (also called a perceptron) from labeled examples is one of the classic problems in machine learning. In the noise-free case, when a halfspace consistent with all the training examples exists, the problem can be solved in polynomial time using linear programming. However, under the promise that a halfspace consistent with a fraction (1 − ε) of the examples exists (for some small constant ε > 0), it was not known how to efficiently find a halfspace that is correct on even 51% of the examples. Nor was a hardness result known that ruled out getting agreement on more than 99.9% of the examples. In this work, we close this gap in our understanding, and prove that even a tiny amount of worst-case noise makes the problem of learning halfspaces intractable in a strong sense. Specifically, for arbitrary ε, δ > 0, we prove that given a set of example-label pairs from the hypercube, a fraction (1 − ε) of which can be explained by a halfspace, it is NP-hard to find a halfspace that correctly labels a fraction (1/2 + δ) of the examples. The hardness result is tight, since it is trivial to get agreement on half the examples. In learning theory parlance, we prove that weak proper agnostic learning of halfspaces is hard. This settles a question that was raised by Blum et al. in their work on learning halfspaces in the presence of random classification noise [10], and in some more recent works as well. Along the way, we also obtain a strong hardness result for another basic computational problem: solving a linear system over the rationals.
A Simple Polynomial-time Rescaling Algorithm for Solving Linear Programs
Proceedings of STOC '04, 2004
Cited by 31 (5 self)
The perceptron algorithm, developed mainly in the machine learning literature, is a simple greedy method for finding a feasible solution to a linear program (alternatively, for learning a threshold function). In spite of its exponential worst-case complexity, it is often quite useful, in part due to its noise-tolerance and in part due to its overall simplicity. In this paper, we show that a randomized version of the perceptron algorithm with periodic rescaling runs in polynomial time. The resulting algorithm for linear programming has an elementary description and analysis.
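The basic perceptron phase for the homogeneous feasibility problem (find x with a·x > 0 for every constraint row a) is a few lines. The sketch below shows only that phase; the paper's contribution, the periodic rescaling that yields the polynomial-time guarantee, is omitted here, which is why the iteration cap is a plain bailout.

```python
import math

def perceptron_feasibility(rows, max_iters=10000):
    """Perceptron phase for `find x with a.x > 0 for every row a`:
    while some (unit-normalized) constraint is violated, add it to x.
    Iterations scale with 1/margin^2; the paper's periodic rescaling,
    omitted here, is what makes the full algorithm polynomial."""
    d = len(rows[0])
    units = []
    for a in rows:
        norm = math.sqrt(sum(ai * ai for ai in a))
        units.append([ai / norm for ai in a])
    x = [0.0] * d
    for _ in range(max_iters):
        bad = next((a for a in units
                    if sum(ai * xi for ai, xi in zip(a, x)) <= 0), None)
        if bad is None:
            return x                      # strictly feasible point found
        x = [xi + ai for xi, ai in zip(x, bad)]
    return None                           # margin too small: gave up

# Three half-space constraints in the plane with a common feasible cone.
x = perceptron_feasibility([[2.0, 0.0], [0.0, 3.0], [1.0, 1.0]])
```

When the feasible cone is thin, this phase alone can take exponentially many corrections; rescaling widens the cone between phases, which is the paper's key idea.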