Yago: A Large Ontology from Wikipedia and WordNet
, 2007
"... This article presents YAGO, a large ontology with high coverage and precision. YAGO has been automatically derived from Wikipedia and WordNet. It comprises entities and relations, and currently contains more than 1.7 million entities and 15 million facts. These include the taxonomic IsA hierarchy a ..."
Abstract

Cited by 72 (11 self)
This article presents YAGO, a large ontology with high coverage and precision. YAGO has been automatically derived from Wikipedia and WordNet. It comprises entities and relations, and currently contains more than 1.7 million entities and 15 million facts. These include the taxonomic IsA hierarchy as well as semantic relations between entities. The facts for YAGO have been extracted from the category system and the infoboxes of Wikipedia and have been combined with taxonomic relations from WordNet. Type checking techniques help us keep YAGO's precision at 95% – as proven by an extensive evaluation study. YAGO is based on a clean logical model with a decidable consistency. Furthermore, it allows representing n-ary relations in a natural way while maintaining compatibility with RDFS. A powerful query model facilitates access to YAGO's data.
Rigorous learning curve bounds from statistical mechanics
 Machine Learning
, 1994
"... Abstract In this paper we introduce and investigate a mathematically rigorous theory of learning curves that is based on ideas from statistical mechanics. The advantage of our theory over the wellestablished VapnikChervonenkis theory is that our bounds can be considerably tighter in many cases, an ..."
Abstract

Cited by 53 (9 self)
Abstract In this paper we introduce and investigate a mathematically rigorous theory of learning curves that is based on ideas from statistical mechanics. The advantage of our theory over the well-established Vapnik-Chervonenkis theory is that our bounds can be considerably tighter in many cases, and are also more reflective of the true behavior (functional form) of learning curves. This behavior can often exhibit dramatic properties such as phase transitions, as well as power law asymptotics not explained by the VC theory. The disadvantages of our theory are that its application requires knowledge of the input distribution, and it is limited so far to finite cardinality function classes. We illustrate our results with many concrete examples of learning curve bounds derived from our theory.

1 Introduction
According to the Vapnik-Chervonenkis (VC) theory of learning curves [27, 26], minimizing empirical error within a function class F on a random sample of m examples leads to generalization error bounded by Õ(d/m) (in the case that the target function is contained in F) or Õ(√(d/m)) plus the optimal generalization error achievable within F (in the general case). These bounds are universal: they hold for any class of hypothesis functions F, for any input distribution, and for any target function. The only problem-specific quantity remaining in these bounds is the VC dimension d, a measure of the complexity of the function class F. It has been shown that these bounds are essentially the best distribution-independent bounds possible, in the sense that for any function class, there exists an input distribution for which matching lower bounds on the generalization error can be given [5, 7, 22].
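The VC bounds referenced in this abstract, restated in display form (my notation, not the paper's: Õ hides logarithmic factors, d is the VC dimension of F, m the sample size, ε(m) the generalization error after m examples):

```latex
\epsilon(m) = \tilde{O}\!\left(\frac{d}{m}\right) \quad \text{(realizable case: target } f \in F\text{)},
\qquad
\epsilon(m) \le \epsilon_{\mathrm{opt}}(F) + \tilde{O}\!\left(\sqrt{\frac{d}{m}}\right) \quad \text{(general case)}.
```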
General Bounds on Statistical Query Learning and PAC Learning with Noise via Hypothesis Boosting
 in Proceedings of the 34th Annual Symposium on Foundations of Computer Science
, 1993
"... We derive general bounds on the complexity of learning in the Statistical Query model and in the PAC model with classification noise. We do so by considering the problem of boosting the accuracy of weak learning algorithms which fall within the Statistical Query model. This new model was introduced ..."
Abstract

Cited by 45 (5 self)
We derive general bounds on the complexity of learning in the Statistical Query model and in the PAC model with classification noise. We do so by considering the problem of boosting the accuracy of weak learning algorithms which fall within the Statistical Query model. This new model was introduced by Kearns [12] to provide a general framework for efficient PAC learning in the presence of classification noise. We first show a general scheme for boosting the accuracy of weak SQ learning algorithms, proving that weak SQ learning is equivalent to strong SQ learning. The boosting is efficient and is used to show our main result of the first general upper bounds on the complexity of strong SQ learning. Specifically, we derive simultaneous upper bounds with respect to ε on the number of queries, O(log²(1/ε)), the Vapnik-Chervonenkis dimension of the query space, O(log(1/ε) log log(1/ε)), and the inverse of the minimum tolerance, O((1/ε) log(1/ε)). In addition, we show that these general upper bounds are nearly optimal by describing a class of learning problems for which we simultaneously lower bound the number of queries by Ω(log(1/ε)) and the inverse of the minimum tolerance by Ω(1/ε). We further apply our boosting results in the SQ model to learning in the PAC model with classification noise. Since nearly all PAC learning algorithms can be cast in the SQ model, we can apply our boosting techniques to convert these PAC algorithms into highly efficient SQ algorithms. By simulating these efficient SQ algorithms in the PAC model with classification noise, we show that nearly all PAC algorithms can be converted into highly efficient PAC algorithms which ...
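The simultaneous bounds this abstract names, collected in display form (my rendering; ε is the accuracy parameter):

```latex
\text{queries: } O\!\left(\log^2 \tfrac{1}{\epsilon}\right), \qquad
\text{VC dim. of query space: } O\!\left(\log \tfrac{1}{\epsilon} \cdot \log\log \tfrac{1}{\epsilon}\right), \qquad
\text{inverse tolerance: } O\!\left(\tfrac{1}{\epsilon} \log \tfrac{1}{\epsilon}\right),
```

with matching lower bounds Ω(log(1/ε)) on the number of queries and Ω(1/ε) on the inverse of the minimum tolerance.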
Learning distributions by their density levels: A paradigm for learning without a teacher
 Journal of Computer and System Sciences
, 1997
"... We propose a mathematical model for learning the highdensity areas of an unknown distribution from (unlabeled) random points drawn according to this distribution. While this type of a learning task has not been previously addressed in the Computational Learnability literature, we believethat this i ..."
Abstract

Cited by 29 (3 self)
We propose a mathematical model for learning the high-density areas of an unknown distribution from (unlabeled) random points drawn according to this distribution. While this type of a learning task has not been previously addressed in the Computational Learnability literature, we believe that this is a rather basic problem that appears in many practical learning scenarios. From a statistical theory standpoint, our model may be viewed as a restricted instance of the fundamental issue of inferring information about a probability distribution from the random samples it generates. From a computational learning angle, what we propose is a new framework of unsupervised concept learning. The examples provided to the learner in our model are not labeled (and are not necessarily all positive or all negative). The only information about their membership is indirectly disclosed to the student through the sampling distribution. We investigate the basic features of the proposed model and provide lower and upper bounds on the sample complexity of such learning tasks. Our main result is that the learnability of a class of distributions in this setting is equivalent to the finiteness of the VC-dimension of the class of the high-density areas of these distributions. One direction of the proof involves a reduction of density-level learnability to p-concepts learnability, while the sufficiency condition is proved through the introduction of a generic learning algorithm.
Probabilistic Analysis of Learning in Artificial Neural Networks: The PAC Model and its Variants
, 1997
"... There are a number of mathematical approaches to the study of learning and generalization in artificial neural networks. Here we survey the `probably approximately correct' (PAC) model of learning and some of its variants. These models provide a probabilistic framework for the discussion of generali ..."
Abstract

Cited by 18 (4 self)
There are a number of mathematical approaches to the study of learning and generalization in artificial neural networks. Here we survey the 'probably approximately correct' (PAC) model of learning and some of its variants. These models provide a probabilistic framework for the discussion of generalization and learning. This survey concentrates on the sample complexity questions in these models; that is, the emphasis is on how many examples should be used for training. Computational complexity considerations are briefly discussed for the basic PAC model. Throughout, the importance of the Vapnik-Chervonenkis dimension is highlighted. Particular attention is devoted to describing how the probabilistic models apply in the context of neural network learning, both for networks with binary-valued output and for networks with real-valued output.
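As a concrete illustration of the sample complexity questions this survey concentrates on: for a finite hypothesis class H in the realizable case, the classic PAC bound says that m ≥ (1/ε)(ln|H| + ln(1/δ)) examples suffice for any consistent learner. A minimal sketch (the function name is mine, not from the survey):

```python
import math

def pac_sample_bound(hypothesis_count: int, epsilon: float, delta: float) -> int:
    """Smallest integer m with m >= (1/epsilon) * (ln|H| + ln(1/delta)).

    Classic realizable-case PAC bound for a finite hypothesis class H:
    a learner that outputs any hypothesis consistent with m such examples
    has error at most epsilon with probability at least 1 - delta.
    """
    return math.ceil((math.log(hypothesis_count) + math.log(1.0 / delta)) / epsilon)

# Example: |H| = 1000 hypotheses, accuracy 0.1, confidence 95%
print(pac_sample_bound(1000, 0.1, 0.05))  # → 100
```

Note the logarithmic dependence on |H| and 1/δ versus the linear dependence on 1/ε; for infinite classes the role of ln|H| is played by the VC dimension, as the survey discusses.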
On the sample complexity of noise-tolerant learning
 Information Processing Letters
, 1996
"... Abstract In this paper, we further characterize the complexity of noisetolerant learning in the PAC model. Specifically, we show a general lower bound of \Omega \Gamma log(1/ffi)"(12j)2 \Delta on the number of examples required for PAC learning in the presence of classification noise. Combine ..."
Abstract

Cited by 12 (2 self)
Abstract In this paper, we further characterize the complexity of noise-tolerant learning in the PAC model. Specifically, we show a general lower bound of Ω(log(1/δ) / (ε(1 − 2η)²)) on the number of examples required for PAC learning in the presence of classification noise. Combined with a result of Simon, we effectively show that the sample complexity of PAC learning in the presence of classification noise is Ω(VC(F) / (ε(1 − 2η)²) + log(1/δ) / (ε(1 − 2η)²)). Furthermore, we demonstrate the optimality of the general lower bound by providing a noise-tolerant learning algorithm for the class of symmetric Boolean functions which uses a sample size within a constant factor of this bound. Finally, we ...
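To see what the (1 − 2η)² factor costs in practice, here is a back-of-envelope calculator for the order of growth of a bound of this shape (constants and log factors omitted; the function name and the exact combination are mine, for illustration only):

```python
import math

def noisy_sample_order(vc_dim: int, epsilon: float, delta: float, eta: float) -> int:
    """Order of growth (VC(F) + log(1/delta)) / (epsilon * (1 - 2*eta)^2),
    ignoring constant factors.

    eta is the classification noise rate; learning is only possible for
    eta < 1/2, and the required sample size blows up as eta approaches 1/2.
    """
    if not 0.0 <= eta < 0.5:
        raise ValueError("classification noise rate must lie in [0, 1/2)")
    gap = 1.0 - 2.0 * eta
    return math.ceil((vc_dim + math.log(1.0 / delta)) / (epsilon * gap * gap))

# Noise-free vs. eta = 0.25: (1 - 2*0.25)^2 = 1/4, so roughly 4x the examples.
print(noisy_sample_order(10, 0.1, 0.1, 0.0))   # → 124
print(noisy_sample_order(10, 0.1, 0.1, 0.25))  # → 493
```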
Bounds On The Number Of Examples Needed For Learning Functions
 In Computational Learning Theory: EUROCOLT'93
, 1997
"... . We prove general lower bounds on the number of examples needed for learning function classes within di#erent natural learning models which are related to paclearning (and coincide with the paclearning model of Valiant in the case of {0, 1}valued functions). The lower bounds are obtained by sh ..."
Abstract

Cited by 11 (0 self)
We prove general lower bounds on the number of examples needed for learning function classes within different natural learning models which are related to PAC-learning (and coincide with the PAC-learning model of Valiant in the case of {0,1}-valued functions). The lower bounds are obtained by showing that all nontrivial function classes contain a "hard binary-valued subproblem." Although (at first glance) it seems to be likely that real-valued function classes are much harder to learn than their hardest binary-valued subproblem, we show that these general lower bounds cannot be improved by more than a logarithmic factor. This is done by discussing some natural function classes like nondecreasing functions or piecewise-smooth functions (the function classes that were discussed in [M. J. Kearns and R. E. Schapire, Proc. 31st Annual Symposium on the Foundations of Computer Science, IEEE Computer Society Press, Los Alamitos, CA, 1990, pp. 382–392; full version, J. Comput. System Sci., ...
Improved Lower Bounds for Learning from Noisy Examples: an Information-Theoretic Approach
 Proceedings of the 11th Annual Conference on Computational Learning Theory
, 1998
"... This paper presents a general informationtheoretic approach for obtaining lower bounds on the number of examples needed to PAC learn in the presence of noise. This approach deals directly with the fundamental information quantities, avoiding a Bayesian analysis. The technique is applied to severa ..."
Abstract

Cited by 11 (0 self)
This paper presents a general information-theoretic approach for obtaining lower bounds on the number of examples needed to PAC learn in the presence of noise. This approach deals directly with the fundamental information quantities, avoiding a Bayesian analysis. The technique is applied to several different models, illustrating its generality and power. The resulting bounds add logarithmic factors to (or improve the constants in) previously known lower bounds.
Sample-Efficient Strategies for Learning in the Presence of Noise
, 1999
"... In this paper we prove various results about PAC learning in the presence of malicious noise. Our main interest is the sample size behaviour of learning algorithms. We prove the first nontrivial sample complexity lower bound in this model by showing that order of "=\Delta 2 + d=\Delta (up to l ..."
Abstract

Cited by 10 (2 self)
In this paper we prove various results about PAC learning in the presence of malicious noise. Our main interest is the sample size behaviour of learning algorithms. We prove the first nontrivial sample complexity lower bound in this model by showing that on the order of ε/Δ² + d/Δ (up to logarithmic factors) examples are necessary for PAC learning any target class of {0,1}-valued functions of VC dimension d, where ε is the desired accuracy, η the malicious noise rate, and Δ = ε/(1+ε) − η (it is well known that any nontrivial target class cannot be PAC learned with accuracy ε and malicious noise rate η ≥ ε/(1+ε), irrespective of sample complexity). We also show that this result cannot be significantly improved in general by presenting efficient learning algorithms for the class of all subsets of d elements and the class of unions of at most d intervals on the real line. This is especially interesting as we can also show that the popular minimum disagreement strategy needs samples of size dε/Δ², hence is not optimal with respect to sample size. We then discuss the use of randomized hypotheses. For these the bound ε/(1+ε) on the noise rate is no longer true and is replaced by 2ε/(1+2ε). In fact, we present a generic algorithm using randomized hypotheses which can tolerate noise rates slightly larger than ε/(1+ε) while using samples of size d/ε as in the noise-free case. Again one observes a quadratic power-law (in this case dε/Δ², with Δ = 2ε/(1+2ε) − η) as Δ goes to zero. We show upper and lower bounds of this order.
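The key quantities of this abstract, written out in display form (my rendering): with desired accuracy ε, malicious noise rate η, and gap Δ, the sample-size lower bound reads

```latex
\Omega\!\left(\frac{\epsilon}{\Delta^2} + \frac{d}{\Delta}\right),
\qquad \Delta = \frac{\epsilon}{1+\epsilon} - \eta ,
```

and with randomized hypotheses the tolerable-noise threshold ε/(1+ε) relaxes to 2ε/(1+2ε), so the gap becomes Δ = 2ε/(1+2ε) − η.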
Improved Noise-Tolerant Learning and Generalized Statistical Queries
, 1994
"... The statistical query learning model can be viewed as a tool for creating (or demonstrating the existence of) noisetolerant learning algorithms in the PAC model. The complexity of a statistical query algorithm, in conjunction with the complexity of simulating SQ algorithms in the PAC model with noi ..."
Abstract

Cited by 8 (4 self)
The statistical query learning model can be viewed as a tool for creating (or demonstrating the existence of) noise-tolerant learning algorithms in the PAC model. The complexity of a statistical query algorithm, in conjunction with the complexity of simulating SQ algorithms in the PAC model with noise, determines the complexity of the noise-tolerant PAC algorithms produced. Although roughly optimal upper bounds have been shown for the complexity of statistical query learning, the corresponding noise-tolerant PAC algorithms are not optimal due to inefficient simulations. In this paper we provide both improved simulations and a new variant of the statistical query model in order to overcome these inefficiencies. We improve the time complexity of the classification noise simulation of statistical query algorithms. Our new simulation has a roughly optimal dependence on the noise rate. We also derive a simpler proof that statistical queries can be simulated in the presence of classification noise ...