Results 1-10 of 34
Noise-tolerant learning, the parity problem, and the statistical query model
 J. ACM
Cited by 116 (2 self)
We describe a slightly subexponential time algorithm for learning parity functions in the presence of random classification noise. This results in a polynomial-time algorithm for the case of parity functions that depend on only the first O(log n log log n) bits of input. This is the first known instance of an efficient noise-tolerant algorithm for a concept class that is provably not learnable in the Statistical Query model of Kearns [7]. Thus, we demonstrate that the set of problems learnable in the statistical query model is a strict subset of those problems learnable in the presence of noise in the PAC model. In coding-theory terms, what we give is a poly(n)-time algorithm for decoding linear k × n codes in the presence of random noise for the case of k = c log n log log n for some c > 0. (The case of k = O(log n) is trivial since one can just individually check each of the 2^k possible messages and choose the one that yields the closest codeword.) A natural extension of the statistical query model is to allow queries about statistical properties that involve t-tuples of examples (as opposed to single examples). The second result of this paper is to show that any class of functions learnable (strongly or weakly) with t-wise queries for t = O(log n) is also weakly learnable with standard unary queries. Hence this natural extension to the statistical query model does not increase the set of weakly learnable functions.
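The parenthetical remark about the trivial case can be illustrated with a small sketch. It is not the paper's subexponential algorithm; it only shows the brute-force baseline for small k, where every candidate parity over the first k bits is tried and the one that disagrees with the fewest noisy labels is kept. The function names, the uniform input distribution, and the `noise_rate` parameter are assumptions made for illustration.

```python
import itertools
import random


def parity(secret, x):
    """XOR of the bits of x selected by secret (only the first len(secret) bits are read)."""
    return sum(s & b for s, b in zip(secret, x)) % 2


def noisy_examples(secret, n, m, noise_rate, rng):
    """Draw m uniform n-bit inputs; each label is flipped independently with
    probability noise_rate (random classification noise)."""
    examples = []
    for _ in range(m):
        x = tuple(rng.randint(0, 1) for _ in range(n))
        y = parity(secret, x)
        if rng.random() < noise_rate:
            y ^= 1
        examples.append((x, y))
    return examples


def brute_force_parity(examples, k):
    """Try all 2^k parity vectors over the first k bits and return the one
    with the fewest disagreements (the 'closest codeword')."""
    best, best_errors = None, None
    for candidate in itertools.product((0, 1), repeat=k):
        errors = sum(parity(candidate, x) != y for x, y in examples)
        if best_errors is None or errors < best_errors:
            best, best_errors = candidate, errors
    return best
```

The loop runs 2^k times, so it is polynomial in n only while k = O(log n); this is the regime the abstract calls trivial.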
Theory and Applications of Agnostic PAC-Learning with Small Decision Trees
, 1995
Cited by 75 (2 self)
We exhibit a theoretically founded algorithm T2 for agnostic PAC-learning of decision trees of at most 2 levels, whose computation time is almost linear in the size of the training set. We evaluate the performance of this learning algorithm T2 on 15 common "real-world" datasets, and show that for most of these datasets T2 provides simple decision trees with little or no loss in predictive power (compared with C4.5). In fact, for datasets with continuous attributes its error rate tends to be lower than that of C4.5. To the best of our knowledge this is the first time that a PAC-learning algorithm is shown to be applicable to "real-world" classification problems. Since one can prove that T2 is an agnostic PAC-learning algorithm, T2 is guaranteed to produce close to optimal 2-level decision trees from sufficiently large training sets for any (!) distribution of data. In this regard T2 differs strongly from all other learning algorithms that are considered in applied machine learning, for w...
Learning from Ambiguity
, 1998
Cited by 46 (0 self)
There are many learning problems for which the examples given by the teacher are ambiguously labeled. In this thesis, we will examine one framework of learning from ambiguous examples known as Multiple-Instance learning. Each example is a bag, consisting of any number of instances. A bag is labeled negative if all instances in it are negative. A bag is labeled positive if at least one instance in it is positive. Because the instances themselves are not labeled, each positive bag is an ambiguous example. We would like to learn a concept which will correctly classify unseen bags. We have developed a measure called Diverse Density and algorithms for learning from multiple-instance examples. We have applied these techniques to problems in drug design, stock prediction, and image database retrieval. These serve as examples of how to translate the ambiguity in the application domain into bags, as well as successful...
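The labeling rule described above can be written down directly. The sketch below only illustrates how bag labels arise from hidden instance labels; it does not implement Diverse Density, and the names `bag_label`, `make_bags`, and the toy threshold concept `is_positive` are hypothetical.

```python
def bag_label(instance_labels):
    """A bag is positive iff at least one of its instances is positive."""
    return any(instance_labels)


def make_bags(bags, concept):
    """Pair each bag with its label. The per-instance labels are discarded,
    which is what makes every positive bag ambiguous to the learner."""
    return [(bag, bag_label(concept(x) for x in bag)) for bag in bags]


# Hypothetical instance-level concept, used only for this illustration.
def is_positive(x):
    return x > 0.8


toy_bags = [[0.1, 0.3], [0.2, 0.9, 0.4], [0.5]]
labeled = make_bags(toy_bags, is_positive)
```

In the second toy bag the learner is told only that some instance is positive, not which one.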
General Bounds on Statistical Query Learning and PAC Learning with Noise via Hypothesis Boosting
 in Proceedings of the 34th Annual Symposium on Foundations of Computer Science
, 1993
Cited by 45 (5 self)
We derive general bounds on the complexity of learning in the Statistical Query model and in the PAC model with classification noise. We do so by considering the problem of boosting the accuracy of weak learning algorithms which fall within the Statistical Query model. This new model was introduced by Kearns [12] to provide a general framework for efficient PAC learning in the presence of classification noise. We first show a general scheme for boosting the accuracy of weak SQ learning algorithms, proving that weak SQ learning is equivalent to strong SQ learning. The boosting is efficient and is used to show our main result of the first general upper bounds on the complexity of strong SQ learning. Specifically, we derive simultaneous upper bounds with respect to ε on the number of queries, O(log^2(1/ε)), the Vapnik-Chervonenkis dimension of the query space, O(log(1/ε) log log(1/ε)), and the inverse of the minimum tolerance, O((1/ε) log(1/ε)). In addition, we show that these general upper bounds are nearly optimal by describing a class of learning problems for which we simultaneously lower bound the number of queries by Ω(log(1/ε)) and the inverse of the minimum tolerance by Ω(1/ε). We further apply our boosting results in the SQ model to learning in the PAC model with classification noise. Since nearly all PAC learning algorithms can be cast in the SQ model, we can apply our boosting techniques to convert these PAC algorithms into highly efficient SQ algorithms. By simulating these efficient SQ algorithms in the PAC model with classification noise, we show that nearly all PAC algorithms can be converted into highly efficient PAC algorithms which... * Author was supported by DARPA Contract N0001487K825 and by NSF Grant CCR8914428. Author's net address: jaa@theory.lcs.mit.edu. † Author was supported by an NDSEG Fellowship and...
Learning in natural language
 Proceedings of the 16th International Joint Conference on Artificial Intelligence (IJCAI ’99); 31 July–6
, 1999
Cited by 42 (22 self)
Statistics-based classifiers in natural language are developed typically by assuming a generative model for the data, estimating its parameters from training data and then using Bayes rule to obtain a classifier. For many problems the assumptions made by the generative models are evidently wrong, leaving open the question of why these approaches work. This paper presents a learning theory account of the major statistical approaches to learning in natural language. A class of Linear Statistical Queries (LSQ) hypotheses is defined and learning with it is shown to exhibit some robustness properties. Many statistical learners used in natural language, including naive Bayes, Markov Models and Maximum Entropy models are shown to be LSQ hypotheses, explaining the robustness of these predictors even when the underlying probabilistic assumptions do not hold. This coherent view of when and why learning approaches work in this context may help to develop better learning methods and an understanding of the role of learning in natural language inferences.
Smooth Boosting and Learning with Malicious Noise
 Journal of Machine Learning Research
, 2003
Cited by 40 (6 self)
We describe a new boosting algorithm which generates only smooth distributions which do not assign too much weight to any single example. We show that this new boosting algorithm can be used to construct efficient PAC learning algorithms which tolerate relatively high rates of malicious noise. In particular, we use the new smooth boosting algorithm to construct malicious noise tolerant versions of the PAC-model p-norm linear threshold learning algorithms described in [23]. The bounds on sample complexity and malicious noise tolerance of these new PAC algorithms closely correspond to known bounds for the online p...
Efficient agnostic PAC-learning with simple hypotheses
 Proc. of the 7th Annual ACM Conference on Computational Learning Theory
, 1994
Cited by 36 (3 self)
We exhibit efficient algorithms for agnostic PAC-learning with rectangles, unions of two rectangles, and unions of k intervals as hypotheses. These hypothesis classes are of some interest from the point of view of applied machine learning, because empirical studies show that hypotheses of this simple type (in just one or two of the attributes) provide good prediction rules for various real-world classification problems. In addition, optimal hypotheses of this type may provide valuable heuristic insight into the structure of a real-world classification problem. The algorithms that are introduced in this paper make it feasible to compute optimal hypotheses of this type for a training set of several hundred examples. We also exhibit an approximation algorithm that can compute nearly optimal hypotheses for much larger datasets.
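To make "computing an optimal hypothesis" concrete, here is a minimal sketch for the simplest case: a single closed interval on one real-valued attribute. It is a plain quadratic-time search that minimizes training error, not the algorithm from the paper. The function name `best_interval` and the (value, label) input format are assumptions made for illustration.

```python
def best_interval(points):
    """points: list of (x, label) pairs with label in {0, 1}.

    Returns (errors, interval), where interval is (lo, hi) for the
    hypothesis "predict 1 iff lo <= x <= hi", or None for the empty
    interval (predict 0 everywhere).
    """
    pts = sorted(points)
    n = len(pts)
    total_pos = sum(label for _, label in pts)
    # Empty interval: every positive example is misclassified.
    best = (total_pos, None)
    for i in range(n):
        if i > 0 and pts[i - 1][0] == pts[i][0]:
            continue  # an interval must start at the first copy of a tied value
        inside_pos = inside_neg = 0
        for j in range(i, n):
            if pts[j][1] == 1:
                inside_pos += 1
            else:
                inside_neg += 1
            if j + 1 < n and pts[j + 1][0] == pts[j][0]:
                continue  # an interval must end at the last copy of a tied value
            # Errors: negatives inside the interval plus positives outside it.
            errors = inside_neg + (total_pos - inside_pos)
            if errors < best[0]:
                best = (errors, (pts[i][0], pts[j][0]))
    return best
```

Because the search minimizes empirical error without assuming that any interval fits the labels perfectly, it matches the agnostic setting: the returned hypothesis may still misclassify some training points.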
Specification and Simulation of Statistical Query Algorithms for Efficiency and Noise Tolerance
 Journal of Computer and System Sciences
, 1995
Cited by 33 (2 self)
A recent innovation in computational learning theory is the statistical query (SQ) model. The advantage of specifying learning algorithms in this model is that SQ algorithms can be simulated in the PAC model, both in the absence and in the presence of noise. However, simulations of SQ algorithms in the PAC model have nonoptimal time and sample complexities. In this paper, we introduce a new method for specifying statistical query algorithms based on a type of relative error and provide simulations in the noise-free and noise-tolerant PAC models which yield more efficient algorithms. Requests for estimates of statistics in this new model take the form: "Return an estimate of the statistic within a 1 ± μ factor, or return '?', promising that the statistic is less than θ." In addition to showing that this is a very natural language for specifying learning algorithms, we also show that this new specification is polynomially equivalent to standard SQ, and thus, known learnability and hardness results for statistical query learning are preserved. We then give highly efficient PAC simulations of relative error SQ algorithms. We show that the learning algorithms obtained by simulating efficient relative error SQ algorithms in both the absence of noise and in the presence of malicious noise have roughly optimal sample complexity. We also show that the simulation of efficient relative error SQ algorithms in the presence of classification noise yields learning algorithms at least as efficient as those obtained through standard methods, and in some cases improved, roughly optimal results are achieved. The sample complexities for all of these simulations are based on the d metric which is a type of relative error metric useful for quantities which are small or even zero. We sho...
CLASSIC Learning
 In Proceedings of the Seventh Annual ACM Conference on Computational Learning Theory
, 1991
Cited by 31 (1 self)
Description logics, also called terminological logics, are commonly used in knowledge-based systems to describe objects and their relationships. We investigate the learnability of a typical description logic, Classic, and show that Classic sentences are learnable in polynomial time in the exact learning model using equivalence queries and membership queries (which are, in essence, "subsumption queries"; we show a prediction hardness result for the more traditional membership queries that convey information about specific individuals). We show that membership queries alone are insufficient for polynomial time learning of Classic sentences. Combined with earlier negative results (Cohen & Hirsh, 1994a) showing that, given standard complexity theoretic assumptions, equivalence queries alone are insufficient (or random examples alone in the PAC setting are insufficient), this shows that both sources of information are necessary for efficient learning in that neither type alone is sufficie...
On Learning Visual Concepts and DNF Formulae
, 1993
Cited by 23 (5 self)
We consider the problem of learning DNF formulae in the mistake-bound and the PAC models. We develop a new approach, which is called polynomial explainability, that is shown to be useful for learning some new subclasses of DNF (and CNF) formulae that were not known to be learnable before. Unlike previous learnability results for DNF (and CNF) formulae, these subclasses are not limited in the number of terms or in the number of variables per term; yet, they contain the subclasses of k-DNF and k-term-DNF (and the corresponding classes of CNF) as special cases. We apply our DNF results to the problem of learning visual concepts and obtain learning algorithms for several natural subclasses of visual concepts that appear to have no natural boolean counterpart. On the other hand, we show that learning some other natural subclasses of visual concepts is as hard as learning the class of all DNF formulae. We also consider the robustness of these results under various types of noise.