Results 1  10
of
44
Noisetolerant learning, the parity problem, and the statistical query model
 J. ACM
"... We describe a slightly subexponential time algorithm for learning parity functions in the presence of random classification noise. This results in a polynomialtime algorithm for the case of parity functions that depend on only the first O(log n log log n) bits of input. This is the first known ins ..."
Abstract

Cited by 165 (2 self)
 Add to MetaCart
We describe a slightly subexponential time algorithm for learning parity functions in the presence of random classification noise. This results in a polynomialtime algorithm for the case of parity functions that depend on only the first O(log n log log n) bits of input. This is the first known instance of an efficient noisetolerant algorithm for a concept class that is provably not learnable in the Statistical Query model of Kearns [7]. Thus, we demonstrate that the set of problems learnable in the statistical query model is a strict subset of those problems learnable in the presence of noise in the PAC model. In codingtheory terms, what we give is a poly(n)time algorithm for decoding linear k × n codes in the presence of random noise for the case of k = clog n log log n for some c> 0. (The case of k O(log n) is trivial since one can just individually check each of the 2 k possible messages and choose the one that yields the closest codeword.) A natural extension of the statistical query model is to allow queries about statistical properties that involve ttuples of examples (as opposed to single examples). The second result of this paper is to show that any class of functions learnable (strongly or weakly) with twise queries for t = O(log n) is also weakly learnable with standard unary queries. Hence this natural extension to the statistical query model does not increase the set of weakly learnable functions. 1.
Agnostically learning halfspaces
, 2005
"... We give the first algorithm that (under distributional assumptions) efficiently learns halfspaces in the notoriously difficult agnostic framework of Kearns, Schapire, & Sellie, where a learner is given access to labeled examples drawn from a distribution, without restriction on the labels (e.g ..."
Abstract

Cited by 97 (28 self)
 Add to MetaCart
(Show Context)
We give the first algorithm that (under distributional assumptions) efficiently learns halfspaces in the notoriously difficult agnostic framework of Kearns, Schapire, & Sellie, where a learner is given access to labeled examples drawn from a distribution, without restriction on the labels (e.g. adversarial noise). The algorithm constructs a hypothesis whose error rate on future examples is within an additive ǫ of the optimal halfspace, in time poly(n) for any constant ǫ> 0, under the uniform distribution over {−1, 1}n or the unit sphere in R n, as well as under any logconcave distribution over R n. It also agnostically learns Boolean disjunctions in time 2Õ( n) with respect to any distribution. The new algorithm, essentially L1 polynomial regression, is a noisetolerant arbitrarydistribution generalization of the “lowdegree ” Fourier algorithm of Linial, Mansour, & Nisan. We also give a new algorithm for PAC learning halfspaces under the uniform distribution on the unit sphere with the current best bounds on tolerable rate of “malicious noise.” 1.
Theory and Applications of Agnostic PACLearning with Small Decision Trees
, 1995
"... We exhibit a theoretically founded algorithm T2 for agnostic PAClearning of decision trees of at most 2 levels, whose computation time is almost linear in the size of the training set. We evaluate the performance of this learning algorithm T2 on 15 common "realworld" datasets, and show t ..."
Abstract

Cited by 84 (3 self)
 Add to MetaCart
We exhibit a theoretically founded algorithm T2 for agnostic PAClearning of decision trees of at most 2 levels, whose computation time is almost linear in the size of the training set. We evaluate the performance of this learning algorithm T2 on 15 common "realworld" datasets, and show that for most of these datasets T2 provides simple decision trees with little or no loss in predictive power (compared with C4.5). In fact, for datasets with continuous attributes its error rate tends to be lower than that of C4.5. To the best of our knowledge this is the first time that a PAClearning algorithm is shown to be applicable to "realworld" classification problems. Since one can prove that T2 is an agnostic PAClearning algorithm, T2 is guaranteed to produce close to optimal 2level decision trees from sufficiently large training sets for any (!) distribution of data. In this regard T2 differs strongly from all other learning algorithms that are considered in applied machine learning, for w...
Smooth Boosting and Learning with Malicious Noise
 Journal of Machine Learning Research
, 2003
"... We describe a new boosting algorithm which generates only smooth distributions which do not assign too much weight to any single example. We show that this new boosting algorithm can be used to construct efficient PAC learning algorithms which tolerate relatively high rates of malicious noise. In pa ..."
Abstract

Cited by 55 (5 self)
 Add to MetaCart
We describe a new boosting algorithm which generates only smooth distributions which do not assign too much weight to any single example. We show that this new boosting algorithm can be used to construct efficient PAC learning algorithms which tolerate relatively high rates of malicious noise. In particular, we use the new smooth boosting algorithm to construct malicious noise tolerant versions of the PACmodel pnorm linear threshold learning algorithms described in [23]. The bounds on sample complexity and malicious noise tolerance of these new PAC algorithms closely correspond to known bounds for the online p...
Learning from Ambiguity
, 1998
"... There are many learning problems for which the examples given by the teacher are ambiguously labeled. In this thesis, we will examine one framework of learning from ambiguous examples known as MultipleInstance learning. Each example is a bag, consisting of any number of instances. A bag is labeled ..."
Abstract

Cited by 53 (0 self)
 Add to MetaCart
There are many learning problems for which the examples given by the teacher are ambiguously labeled. In this thesis, we will examine one framework of learning from ambiguous examples known as MultipleInstance learning. Each example is a bag, consisting of any number of instances. A bag is labeled negative if all instances in it are negative. A bag is labeled positive if at least one instance in it is positive. Because the instances themselves are not labeled, each positive bag is an ambiguous example. We would like to learn a concept which will correctly classify unseen bags. We have developed a measure called Diverse Density and algorithms for learning from multipleinstance examples. We have applied these techniques to problems in drug design, stock prediction, and image database retrieval. These serve as examples of how to translate the ambiguity in the application domain into bags, as well as successful...
Learning in natural language
 Proceedings of the 16th International Joint Conference on Artificial Intelligence (IJCAI ’99); 31 July–6
, 1999
"... Statisticsbased classifiers in natural language are developed typically by assuming a generative model for the data, estimating its parameters from training data and then using Bayes rule to obtain a classifier. For many problems the assumptions made by the generative models are evidently wrong, le ..."
Abstract

Cited by 49 (23 self)
 Add to MetaCart
Statisticsbased classifiers in natural language are developed typically by assuming a generative model for the data, estimating its parameters from training data and then using Bayes rule to obtain a classifier. For many problems the assumptions made by the generative models are evidently wrong, leaving open the question of why these approaches work. This paper presents a learning theory account of the major statistical approaches to learning in natural language. A class of Linear Statistical Queries (LSQ) hypotheses is defined and learning with it is shown to exhibit some robustness properties. Many statistical learners used in natural language, including naive Bayes, Markov Models and Maximum Entropy models are shown to be LSQ hypotheses, explaining the robustness of these predictors even when the underlying probabilistic assumptions do not hold. This coherent view of when and why learning approaches work in this context may help to develop better learning methods and an understanding of the role of learning in natural language inferences. 1
General Bounds on Statistical Query Learning and PAC Learning with Noise via Hypothesis Boosting
 in Proceedings of the 34th Annual Symposium on Foundations of Computer Science
, 1993
"... We derive general bounds on the complexity of learning in the Statistical Query model and in the PAC model with classification noise. We do so by considering the problem of boosting the accuracy of weak learning algorithms which fall within the Statistical Query model. This new model was introduced ..."
Abstract

Cited by 48 (5 self)
 Add to MetaCart
(Show Context)
We derive general bounds on the complexity of learning in the Statistical Query model and in the PAC model with classification noise. We do so by considering the problem of boosting the accuracy of weak learning algorithms which fall within the Statistical Query model. This new model was introduced by Kearns [12] to provide a general framework for efficient PAC learning in the presence of classification noise. We first show a general scheme for boosting the accuracy of weak SQ learning algorithms, proving that weak SQ learning is equivalent to strong SQ learning. The boosting is efficient and is used to show our main result of the first general upper bounds on the complexity of strong SQ learning. Specifically, we derive simultaneous upper bounds with respect to 6 on the number of queries, O(log2:), the VapnikChervonenkis dimension of the query space, O(1og log log +), and the inverse of the minimum tolerance, O(+ log 3). In addition, we show that these general upper bounds are nearly optimal by describing a class of learning problems for which we simultaneously lower bound the number of queries by R(1og f) and the inverse of the minimum tolerance by a(:). We further apply our boosting results in the SQ model to learning in the PAC model with classification noise. Since nearly all PAC learning algorithms can be cast in the SQ model, we can apply our boosting techniques to convert these PAC algorithms into highly efficient SQ algorithms. By simulating these efficient SQ algorithms in the PAC model with classification noise, we show that nearly all PAC algorithms can be converted into highly efficient PAC algorithms which *Author was supported by DARPA Contract N0001487K825 and by NSF Grant CCR8914428. Author’s net address: jaaQtheory.lca.rit.edu +.Author was supported by an NDSEG Fellowship and
Efficient agnostic paclearning with simple hypotheses
 Proc. of the 7th Annual ACM Conference on Computational Learning Theory
, 1994
"... We exhibit efficient algorithms for agnostic PAClearning with rectangles, unions of two rectangles, and unions of k intervals as hypotheses. These hypothesis classes are of some interest from the point of view of applied machine learning, because empirical studies show that hypotheses of this simp ..."
Abstract

Cited by 39 (3 self)
 Add to MetaCart
(Show Context)
We exhibit efficient algorithms for agnostic PAClearning with rectangles, unions of two rectangles, and unions of k intervals as hypotheses. These hypothesis classes are of some interest from the point of view of applied machine learning, because empirical studies show that hypotheses of this simple type (in just one or two of the attributes) provide good prediction rules for various realworld classification problems. In addition, optimal hypotheses of this type may provide valuable heuristic insight into the structure of a realworld classification problem, The algorithms that are introduced in this paper make it feasible to compute optimal hypotheses of this type for a training set of several hundred examples. We also exhibit an approximation algorithm that can compute nearly optimal hypotheses for much larger datasets.
Specification and Simulation of Statistical Query Algorithms for Efficiency and Noise Tolerance
 Journal of Computer and System Sciences
, 1995
"... A recent innovation in computational learning theory is the statistical query (SQ) model. The advantage of specifying learning algorithms in this model is that SQ algorithms can be simulated in the PAC model, both in the absence and in the presence of noise. However, simulations of SQ algorithms in ..."
Abstract

Cited by 35 (2 self)
 Add to MetaCart
(Show Context)
A recent innovation in computational learning theory is the statistical query (SQ) model. The advantage of specifying learning algorithms in this model is that SQ algorithms can be simulated in the PAC model, both in the absence and in the presence of noise. However, simulations of SQ algorithms in the PAC model have nonoptimal time and sample complexities. In this paper, we introduce a new method for specifying statistical query algorithms based on a type of relative error and provide simulations in the noisefree and noisetolerant PAC models which yield more efficient algorithms. Requests for estimates of statistics in this new model take the form: "Return an estimate of the statistic within a 1 \Sigma factor, or return `?', promising that the statistic is less than `." In addition to showing that this is a very natural language for specifying learning algorithms, we also show that this new specification is polynomially equivalent to standard SQ, and thus, known learnability and hardness results for statistical query learning are preserved. We then give highly efficient PAC simulations of relative error SQ algorithms. We show that the learning algorithms obtained by simulating efficient relative error SQ algorithms in both the absence of noise and in the presence of malicious noise have roughly optimal sample complexity. We also show that the simulation of efficient relative error SQ algorithms in the presence of classification noise yield learning algorithms at least as efficient as those obtained through standard methods, and in some cases improved, roughly optimal results are achieved. The sample complexities for all of these simulations are based on the d metric which is a type of relative error metric useful for quantities which are small or even zero. We sho...
MadaBoost: A Modification of AdaBoost
, 2000
"... . In the last decade, one of the research topics that has received a great deal of attention from the machine learning and computational learning communities has been the so called boosting techniques. In this paper, we further explore this topic by proposing a new boosting algorithm that mends some ..."
Abstract

Cited by 33 (3 self)
 Add to MetaCart
. In the last decade, one of the research topics that has received a great deal of attention from the machine learning and computational learning communities has been the so called boosting techniques. In this paper, we further explore this topic by proposing a new boosting algorithm that mends some of the problems that have been detected in the, so far most successful boosting algorithm, AdaBoost due to Freund and Schapire [FS97]. These problems are: (1) AdaBoost cannot be used in the boosting by filtering framework, and (2) AdaBoost does not seem to be noise resistant. In order to solve them, we propose a new boosting algorithm MadaBoost by modifying the weighting system of AdaBoost. We first prove that one version of MadaBoost is in fact a boosting algorithm. Second, we show how our algorithm can be used and analyzed its performance in detail. Finally, we show that our new boosting algorithm can be casted in the statistical query learning model [Kea93] and thus, it is robust to ra...