Results 1-10 of 24
Rigorous learning curve bounds from statistical mechanics
Machine Learning, 1994
Cited by 52 (9 self)
In this paper we introduce and investigate a mathematically rigorous theory of learning curves that is based on ideas from statistical mechanics. The advantage of our theory over the well-established Vapnik-Chervonenkis theory is that our bounds can be considerably tighter in many cases, and are also more reflective of the true behavior (functional form) of learning curves. This behavior can often exhibit dramatic properties such as phase transitions, as well as power law asymptotics not explained by the VC theory. The disadvantages of our theory are that its application requires knowledge of the input distribution, and it is limited so far to finite cardinality function classes. We illustrate our results with many concrete examples of learning curve bounds derived from our theory.
1 Introduction
According to the Vapnik-Chervonenkis (VC) theory of learning curves [27, 26], minimizing empirical error within a function class F on a random sample of m examples leads to generalization error bounded by $\tilde{O}(d/m)$ (in the case that the target function is contained in F) or $\tilde{O}(\sqrt{d/m})$ plus the optimal generalization error achievable within F (in the general case). These bounds are universal: they hold for any class of hypothesis functions F, for any input distribution, and for any target function. The only problem-specific quantity remaining in these bounds is the VC dimension d, a measure of the complexity of the function class F. It has been shown that these bounds are essentially the best distribution-independent bounds possible, in the sense that for any function class, there exists an input distribution for which matching lower bounds on the generalization error can be given [5, 7, 22].
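As a rough illustration of the two VC rates quoted in this abstract, the following minimal sketch evaluates d/m (target in F) and sqrt(d/m) (general case) for a few sample sizes. Constants and the logarithmic factors hidden by the soft-O notation are dropped, and the function names are hypothetical, so the numbers show only how the rates scale; they are not actual bounds.

```python
import math


def vc_rate_realizable(d: int, m: int) -> float:
    """Scaling d/m for the case where the target lies in F (constants and logs dropped)."""
    return d / m


def vc_rate_general(d: int, m: int) -> float:
    """Scaling sqrt(d/m) for the general case (constants and logs dropped)."""
    return math.sqrt(d / m)


if __name__ == "__main__":
    d = 10  # assumed VC dimension, chosen only for illustration
    for m in (100, 1_000, 10_000):
        print(m, vc_rate_realizable(d, m), vc_rate_general(d, m))
```

For a fixed d, the first rate shrinks linearly in 1/m while the second shrinks only as its square root, which is the gap between the two cases described above.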
Yago: A Large Ontology from Wikipedia and WordNet
2007
Cited by 43 (11 self)
This article presents YAGO, a large ontology with high coverage and precision. YAGO has been automatically derived from Wikipedia and WordNet. It comprises entities and relations, and currently contains more than 1.7 million entities and 15 million facts. These include the taxonomic Is-A hierarchy as well as semantic relations between entities. The facts for YAGO have been extracted from the category system and the infoboxes of Wikipedia and have been combined with taxonomic relations from WordNet. Type checking techniques help us keep YAGO’s precision at 95% – as proven by an extensive evaluation study. YAGO is based on a clean logical model with a decidable consistency. Furthermore, it allows representing n-ary relations in a natural way while maintaining compatibility with RDFS. A powerful query model facilitates access to YAGO’s data.
General Bounds on Statistical Query Learning and PAC Learning with Noise via Hypothesis Boosting
In Proceedings of the 34th Annual Symposium on Foundations of Computer Science, 1993
Cited by 41 (5 self)
We derive general bounds on the complexity of learning in the Statistical Query model and in the PAC model with classification noise. We do so by considering the problem of boosting the accuracy of weak learning algorithms which fall within the Statistical Query model. This new model was introduced by Kearns [12] to provide a general framework for efficient PAC learning in the presence of classification noise. We first show a general scheme for boosting the accuracy of weak SQ learning algorithms, proving that weak SQ learning is equivalent to strong SQ learning. The boosting is efficient and is used to show our main result of the first general upper bounds on the complexity of strong SQ learning. Specifically, we derive simultaneous upper bounds with respect to $\epsilon$ on the number of queries, $O(\log^2 \frac{1}{\epsilon})$, the Vapnik-Chervonenkis dimension of the query space, $O(\log \frac{1}{\epsilon} \log\log \frac{1}{\epsilon})$, and the inverse of the minimum tolerance, $O(\frac{1}{\epsilon} \log \frac{1}{\epsilon})$. In addition, we show that these general upper bounds are nearly optimal by describing a class of learning problems for which we simultaneously lower bound the number of queries by $\Omega(\log \frac{1}{\epsilon})$ and the inverse of the minimum tolerance by $\Omega(\frac{1}{\epsilon})$. We further apply our boosting results in the SQ model to learning in the PAC model with classification noise. Since nearly all PAC learning algorithms can be cast in the SQ model, we can apply our boosting techniques to convert these PAC algorithms into highly efficient SQ algorithms. By simulating these efficient SQ algorithms in the PAC model with classification noise, we show that nearly all PAC algorithms can be converted into highly efficient PAC algorithms which ...
* Author was supported by DARPA Contract N00014-87-K-825 and by NSF Grant CCR-89-14428.
† Author was supported by an NDSEG Fellowship and ...
Learning Distributions by their Density Levels - A Paradigm for Learning Without a Teacher
Journal of Computer and System Sciences, 1997
Cited by 22 (2 self)
We propose a mathematical model for learning the high-density areas of an unknown distribution from (unlabeled) random points drawn according to this distribution. While this type of learning task has not been previously addressed in the Computational Learnability literature, we believe that this is a rather basic problem that appears in many practical learning scenarios. From a statistical theory standpoint, our model may be viewed as a restricted instance of the fundamental issue of inferring information about a probability distribution from the random samples it generates. From a computational learning angle, what we propose is a new framework of un-supervised concept learning. The examples provided to the learner in our model are not labeled (and are not necessarily all positive or all negative). The only information about their membership is indirectly disclosed to the student through the sampling distribution. We investigate the basic features of the proposed model and provide lower and upper bounds on the sample complexity of such learning tasks. Our main result is that the learnability of a class of distributions in this setting is equivalent to the finiteness of the VC-dimension of the class of the high-density areas of these distributions. One direction of the proof involves a reduction of the density-level-learnability to p-concepts learnability, while the sufficiency condition is proved through the introduction of a generic learning algorithm.
Keywords: Learning Theory, PAC, Vapnik-Chervonenkis dimension, $\epsilon$-approximation, un-supervised learning.
This research was supported by the David and Ruth Moskowitz Academic Lectureship.
Probabilistic Analysis of Learning in Artificial Neural Networks: The PAC Model and its Variants
1994
Cited by 16 (4 self)
There are a number of mathematical approaches to the study of learning and generalization in artificial neural networks. Here we survey the 'probably approximately correct' (PAC) model of learning and some of its variants. These models, much-studied since the introduction of the basic PAC model by Valiant in 1984, provide a probabilistic framework for the discussion of generalization and learning.
Contents
1 Introduction
2 The Basic PAC Model of Learning
3 VC-Dimension and Growth Function
4 VC-Dimension and Linear Dimension
5 A Useful Probability Theorem
6 PAC Learning and the VC-Dimension
7 VC-Dimension of Binary-Output Networks
  7.1 Introduction
  7.2 Linearly weighted neural networks
  7.3 Linear threshold networks
  7.4 Other activation functions
  7.5 The effect of weight restrictions
8 Computational Complexity of Learning
9 Stochastic Concepts
10 Distribution-Specific Learning
11 Graph Dimension and Multiple-Output Nets
  11.1 T...
Sample-efficient Strategies for Learning in the Presence of Noise
1999
Cited by 10 (2 self)
In this paper we prove various results about PAC learning in the presence of malicious noise. Our main interest is the sample size behaviour of learning algorithms. We prove the first nontrivial sample complexity lower bound in this model by showing that order of $\epsilon/\Delta^2 + d/\Delta$ (up to logarithmic factors) examples are necessary for PAC learning any target class of $\{0,1\}$-valued functions of VC dimension $d$, where $\epsilon$ is the desired accuracy and $\eta = \epsilon/(1+\epsilon) - \Delta$ the malicious noise rate (it is well known that any nontrivial target class cannot be PAC learned with accuracy $\epsilon$ and malicious noise rate $\eta \ge \epsilon/(1+\epsilon)$, irrespective of sample complexity). We also show that this result cannot be significantly improved in general by presenting efficient learning algorithms for the class of all subsets of $d$ elements and the class of unions of at most $d$ intervals on the real line. This is especially interesting as we can also show that the popular minimum disagreement strategy needs samples of size $d\epsilon/\Delta^2$, hence is not optimal with respect to sample size. We then discuss the use of randomized hypotheses. For these the bound $\epsilon/(1+\epsilon)$ on the noise rate is no longer true and is replaced by $2\epsilon/(1+2\epsilon)$. In fact, we present a generic algorithm using randomized hypotheses which can tolerate noise rates slightly larger than $\epsilon/(1+\epsilon)$ while using samples of size $d/\epsilon$ as in the noise-free case. Again one observes a quadratic power law (in this case $d\epsilon/\Delta^2$, $\Delta = 2\epsilon/(1+2\epsilon) - \eta$) as $\Delta$ goes to zero. We show upper and lower bounds of this order.
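To make the two noise-rate thresholds in this abstract concrete, here is a minimal sketch that computes ε/(1+ε) (deterministic hypotheses), 2ε/(1+2ε) (randomized hypotheses), and the gap Δ between a threshold and a given malicious noise rate η. The function names and the example values are assumptions made for illustration and do not come from the paper.

```python
def deterministic_noise_limit(eps: float) -> float:
    """Threshold eps / (1 + eps) for deterministic hypotheses."""
    return eps / (1 + eps)


def randomized_noise_limit(eps: float) -> float:
    """Threshold 2*eps / (1 + 2*eps) for randomized hypotheses."""
    return 2 * eps / (1 + 2 * eps)


def noise_margin(eps: float, eta: float, randomized: bool = False) -> float:
    """Gap Delta between the applicable threshold and the noise rate eta."""
    limit = randomized_noise_limit(eps) if randomized else deterministic_noise_limit(eps)
    return limit - eta


if __name__ == "__main__":
    eps, eta = 0.1, 0.05  # illustrative accuracy and noise rate
    print(deterministic_noise_limit(eps), randomized_noise_limit(eps))
    print(noise_margin(eps, eta), noise_margin(eps, eta, randomized=True))
```

For any ε > 0 the randomized threshold is the larger of the two, so the same η leaves a larger Δ in the randomized setting.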
On the sample complexity of noise-tolerant learning
Information Processing Letters, 1996
Cited by 10 (2 self)
In this paper, we further characterize the complexity of noise-tolerant learning in the PAC model. Specifically, we show a general lower bound of $\Omega\left(\frac{\log(1/\delta)}{\epsilon(1-2\eta)^2}\right)$ on the number of examples required for PAC learning in the presence of classification noise. Combined with a result of Simon, we effectively show that the sample complexity of PAC learning in the presence of classification noise is $\Omega\left(\frac{VC(F)}{\epsilon(1-2\eta)^2} + \frac{\log(1/\delta)}{\epsilon(1-2\eta)^2}\right)$. Furthermore, we demonstrate the optimality of the general lower bound by providing a noise-tolerant learning algorithm for the class of symmetric Boolean functions which uses a sample size within a constant factor of this bound. Finally, we ...
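The combined expression VC(F)/(ε(1-2η)²) + log(1/δ)/(ε(1-2η)²) can be evaluated numerically to see how it grows as the classification noise rate η approaches 1/2. The sketch below is only an illustration: it omits the unspecified constant inside the Ω, and the function name and parameter values are hypothetical.

```python
import math


def noisy_pac_lower_bound(vc_dim: int, eps: float, delta: float, eta: float) -> float:
    """Evaluate VC/(eps*(1-2*eta)^2) + log(1/delta)/(eps*(1-2*eta)^2), constants omitted."""
    if not 0 <= eta < 0.5:
        raise ValueError("classification noise rate eta must lie in [0, 1/2)")
    denom = eps * (1 - 2 * eta) ** 2
    return vc_dim / denom + math.log(1 / delta) / denom


if __name__ == "__main__":
    # Illustrative values only: VC dimension 10, accuracy 0.1, confidence 0.05.
    for eta in (0.0, 0.25, 0.4):
        print(eta, noisy_pac_lower_bound(10, 0.1, 0.05, eta))
```

Because both terms share the factor 1/(1-2η)², moving from η = 0 to η = 0.25 multiplies the whole expression by 4.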
Bounds On The Number Of Examples Needed For Learning Functions
In Computational Learning Theory: EUROCOLT'93, 1997
Cited by 9 (0 self)
We prove general lower bounds on the number of examples needed for learning function classes within different natural learning models which are related to pac-learning (and coincide with the pac-learning model of Valiant in the case of {0, 1}-valued functions). The lower bounds are obtained by showing that all nontrivial function classes contain a "hard binary-valued subproblem." Although (at first glance) it seems to be likely that real-valued function classes are much harder to learn than their hardest binary-valued subproblem, we show that these general lower bounds cannot be improved by more than a logarithmic factor. This is done by discussing some natural function classes like nondecreasing functions or piecewise-smooth functions (the function classes that were discussed in [M. J. Kearns and R. E. Schapire, Proc. 31st Annual Symposium on the Foundations of Computer Science, IEEE Computer Society Press, Los Alamitos, CA, 1990, pp. 382--392, full version, J. Comput. System Sci.,...
Improved Lower Bounds for Learning from Noisy Examples: an Information-Theoretic Approach
Proceedings of the 11th Annual Conference on Computational Learning Theory, 1998
Cited by 9 (0 self)
This paper presents a general information-theoretic approach for obtaining lower bounds on the number of examples needed to PAC learn in the presence of noise. This approach deals directly with the fundamental information quantities, avoiding a Bayesian analysis. The technique is applied to several different models, illustrating its generality and power. The resulting bounds add logarithmic factors to (or improve the constants in) previously known lower bounds.
Improved Noise-Tolerant Learning and Generalized Statistical Queries
1994
Cited by 8 (4 self)
The statistical query learning model can be viewed as a tool for creating (or demonstrating the existence of) noise-tolerant learning algorithms in the PAC model. The complexity of a statistical query algorithm, in conjunction with the complexity of simulating SQ algorithms in the PAC model with noise, determines the complexity of the noise-tolerant PAC algorithms produced. Although roughly optimal upper bounds have been shown for the complexity of statistical query learning, the corresponding noise-tolerant PAC algorithms are not optimal due to inefficient simulations. In this paper we provide both improved simulations and a new variant of the statistical query model in order to overcome these inefficiencies. We improve the time complexity of the classification noise simulation of statistical query algorithms. Our new simulation has a roughly optimal dependence on the noise rate. We also derive a simpler proof that statistical queries can be simulated in the presence of classification n...

