Results 1-10 of 19
Efficient Distribution-free Learning of Probabilistic Concepts
Journal of Computer and System Sciences, 1993
Abstract

Cited by 197 (8 self)
In this paper we investigate a new formal model of machine learning in which the concept (Boolean function) to be learned may exhibit uncertain or probabilistic behavior; thus, the same input may sometimes be classified as a positive example and sometimes as a negative example. Such probabilistic concepts (or p-concepts) may arise in situations such as weather prediction, where the measured variables and their accuracy are insufficient to determine the outcome with certainty. We adopt from the Valiant model of learning [27] the demands that learning algorithms be efficient and general in the sense that they perform well for a wide class of p-concepts and for any distribution over the domain. In addition to giving many efficient algorithms for learning natural classes of p-concepts, we study and develop in detail an underlying theory of learning p-concepts.
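The p-concept model described in this abstract can be made concrete with a small sketch (a hypothetical illustration, not code from the paper): a p-concept maps each input to the probability of a positive label, so repeated draws of the same input may carry different labels.

```python
import random
from collections import defaultdict

def sample_examples(p_concept, inputs, n, seed=0):
    """Draw n labeled examples from a p-concept: each input x is labeled
    positive with probability p_concept(x), negative otherwise."""
    rng = random.Random(seed)
    xs = [rng.choice(inputs) for _ in range(n)]
    return [(x, 1 if rng.random() < p_concept(x) else 0) for x in xs]

# Hypothetical p-concept: chance of rain grows with measured humidity.
rain = lambda humidity: min(1.0, max(0.0, (humidity - 0.3) / 0.6))

examples = sample_examples(rain, [i / 10 for i in range(11)], 1000)

# Unlike a deterministic concept, the same input can occur with both labels.
labels_seen = defaultdict(set)
for x, y in examples:
    labels_seen[x].add(y)
```

Here the learner's task is to approximate the label probabilities (or a good decision rule based on them), not a single 0/1 labelling of the domain.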
Covering Number Bounds of Certain Regularized Linear Function Classes
Journal of Machine Learning Research, 2002
Abstract

Cited by 42 (3 self)
Recently, sample complexity bounds have been derived for problems involving linear functions such as neural networks and support vector machines. In many of these theoretical studies, the concept of covering numbers played an important role. It is thus useful to study covering numbers for linear function classes. In this paper, we investigate two closely related methods to derive upper bounds on these covering numbers. The first method, already employed in some earlier studies, relies on the so-called Maurey's lemma; the second method uses techniques from the mistake bound framework in online learning. We compare results from these two methods, as well as their consequences in some learning formulations.
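As an informal illustration of what an empirical covering number measures (a sketch under assumed definitions, not the paper's construction), one can greedily build an ε-net, in sup norm over a finite sample, for a discretized class of bounded-norm linear functions:

```python
import itertools

def empirical_cover_size(funcs, points, eps):
    """Greedy ε-net (sup norm over the sample) for a finite function set;
    its size approximates the empirical covering number at scale eps."""
    rows = [tuple(f(x) for x in points) for f in funcs]
    centers = []
    for r in rows:
        covered = any(
            max(abs(a - b) for a, b in zip(r, c)) <= eps for c in centers
        )
        if not covered:
            centers.append(r)
    return len(centers)

# Hypothetical class: w·x with ||w||_1 <= 1 on [-1,1]^2, weights on a grid.
grid = [i / 4 for i in range(-4, 5)]
funcs = [
    (lambda w1, w2: (lambda x: w1 * x[0] + w2 * x[1]))(a, b)
    for a, b in itertools.product(grid, grid)
    if abs(a) + abs(b) <= 1.0
]
points = [(-1.0, 1.0), (0.5, 0.5), (1.0, -0.25)]
n_cover = empirical_cover_size(funcs, points, eps=0.25)
```

The bounds discussed in the abstract control how fast such cover sizes grow with the sample size and the norm constraint; the greedy net above is only a crude empirical proxy.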
Learning From a Consistently Ignorant Teacher
1994
Abstract

Cited by 22 (8 self)
One view of computational learning theory is that of a learner acquiring the knowledge of a teacher. We introduce a formal model of learning capturing the idea that teachers may have gaps in their knowledge. In particular, we consider learning from a teacher who labels examples "+" (a positive instance of the concept being learned), "−" (a negative instance of the concept being learned), and "?" (an instance with unknown classification), in such a way that knowledge of the concept class and all the positive and negative examples is not sufficient to determine the labelling of any of the examples labelled with "?". The goal of the learner is not to compensate for the ignorance of the teacher by attempting to infer "+" or "−" labels for the examples labelled with "?", but is rather to learn (an approximation to) the ternary labelling presented by the teacher. Thus, the goal of the learner is still to acquire the knowledge of the teacher, but now the learner must also ...
Noise-Tolerant Distribution-Free Learning of General Geometric Concepts
1996
Abstract

Cited by 16 (3 self)
this paper. First, we give an algorithm to learn C
Learning by Canonical Smooth Estimation, Part II: Learning and Choice of Model Complexity
 IEEE Transactions on Automatic Control
Abstract

Cited by 13 (2 self)
In this paper, we analyze the properties of a procedure for learning from examples. This "canonical learner" is based on a canonical error estimator developed in a companion paper. In learning problems, we observe data that consists of labeled sample points, and the goal is to find a model, or "hypothesis," from a set of candidates that will accurately predict the labels of new sample points. The expected mismatch between a hypothesis' prediction and the actual label of a new sample point is called the hypothesis' "generalization error." We compare the canonical learner with the traditional technique of finding hypotheses that minimize the relative frequency-based empirical error estimate. We show that, for a broad class of learning problems, the set of cases for which such empirical error minimization works is a proper subset of the cases for which the canonical learner works. We derive bounds to show that the number of samples required by these two methods is comparable. We also add...
Learning by Canonical Smooth Estimation, Part I: Simultaneous Estimation
IEEE Transactions on Automatic Control, 1996
Abstract

Cited by 12 (2 self)
This paper examines the problem of learning from examples in a framework that is based on, but more general than, Valiant's Probably Approximately Correct (PAC) model for learning. In our framework, the learner observes examples that consist of sample points drawn and labeled according to a fixed, unknown probability distribution. Based on this empirical data, the learner must select, from a set of candidate functions, a particular function, or "hypothesis," that will accurately predict the labels of future sample points. The expected mismatch between a hypothesis' prediction and the label of a new sample point is called the hypothesis' "generalization error." Following the pioneering work of Vapnik and Chervonenkis, others have attacked this sort of learning problem by finding hypotheses that minimize the relative frequency-based empirical error estimate. We generalize this approach by examining the "simultaneous estimation" problem: When does some procedure exist for estimating the g...
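The relative-frequency (empirical error minimization) approach that this abstract generalizes can be sketched in a few lines (a hypothetical toy example, with thresholds on the line standing in for a generic hypothesis class):

```python
def empirical_error(h, sample):
    """Relative frequency of misclassified sample points."""
    return sum(h(x) != y for x, y in sample) / len(sample)

def erm(hypotheses, sample):
    """Pick a hypothesis minimizing the empirical error estimate."""
    return min(hypotheses, key=lambda h: empirical_error(h, sample))

# Hypothetical threshold class on the line: h_t(x) = 1 iff x >= t.
thresholds = [i / 10 for i in range(11)]
hypotheses = [(lambda t: (lambda x: int(x >= t)))(t) for t in thresholds]
sample = [(0.1, 0), (0.2, 0), (0.4, 0), (0.6, 1), (0.8, 1), (0.9, 1)]
best = erm(hypotheses, sample)
```

The generalization error of `best` on fresh points is what the simultaneous-estimation question in the abstract is about: when can it be estimated uniformly well from the empirical error alone?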
Bounds On The Number Of Examples Needed For Learning Functions
In Computational Learning Theory: EuroCOLT '93, 1997
Abstract

Cited by 11 (0 self)
We prove general lower bounds on the number of examples needed for learning function classes within different natural learning models which are related to PAC-learning (and coincide with the PAC-learning model of Valiant in the case of {0,1}-valued functions). The lower bounds are obtained by showing that all nontrivial function classes contain a "hard binary-valued subproblem." Although (at first glance) it seems to be likely that real-valued function classes are much harder to learn than their hardest binary-valued subproblem, we show that these general lower bounds cannot be improved by more than a logarithmic factor. This is done by discussing some natural function classes like nondecreasing functions or piecewise-smooth functions (the function classes that were discussed in [M. J. Kearns and R. E. Schapire, Proc. 31st Annual Symposium on the Foundations of Computer Science, IEEE Computer Society Press, Los Alamitos, CA, 1990, pp. 382-392, full version, J. Comput. System Sci.,...
Noise-Tolerant Parallel Learning of Geometric Concepts
In Proc. 8th Annu. Conf. on Comput. Learning Theory, 1995
Abstract

Cited by 9 (2 self)
We present several efficient parallel algorithms for PAC-learning geometric concepts in a constant-dimensional space that are robust even against malicious misclassification noise of any rate less than 1/2. In particular we consider the class of geometric concepts defined by a polynomial number of (d − 1)-dimensional hyperplanes against an arbitrary distribution where each hyperplane has a slope from a set of known slopes, and the class of geometric concepts defined by a polynomial number of (d − 1)-dimensional hyperplanes (of unrestricted slopes) against a product distribution. Next we define a complexity measure of any set S of (d − 1)-dimensional surfaces that we call the variant of S and prove that the class of geometric concepts defined by surfaces of polynomial variant can be efficiently learned in parallel under a product distribution (even under malicious misclassification noise). Finally, we describe how boosting techniques can be used so that ...
Probably Almost Bayes Decisions
Information and Computation, 1991
Abstract

Cited by 8 (1 self)
We put Bayes decision theory into the framework of PAC-learning as introduced by Valiant [Val84]. Unlike classical Boolean concept learning where functions f : {0,1}^n → {0,1} are approximated, we assume here that f(x̄) is 0 (or 1) with a certain probability. We develop a theoretical framework for estimating functions and reduce the classification problem to the problem of estimating parameters. Within this framework it is shown that classifications based on n conditionally independent Boolean features can efficiently be learned by examples. Our learning algorithm achieves with probability 1 − δ an error which comes arbitrarily close (up to an additive ε) to the optimal one of a perfect Bayes decision. It requires O((n^3/ε^4) ln(n/δ)) examples. In the particular case of two-state classification, learning can be performed on a single neuron. Moreover we relax the restriction of conditional independence to dependencies of bounded order k and show that in...
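A classifier over n conditionally independent Boolean features of the kind discussed in this abstract amounts to naive Bayes; a minimal sketch (hypothetical data, with Laplace smoothing added for stability, not the paper's algorithm) is:

```python
import math

def fit_naive_bayes(data, n_features, alpha=1.0):
    """Estimate class priors and per-feature conditionals from labeled
    Boolean vectors, assuming conditional independence of features."""
    counts = {0: 0, 1: 0}
    feat = {0: [0] * n_features, 1: [0] * n_features}
    for x, y in data:
        counts[y] += 1
        for i, xi in enumerate(x):
            feat[y][i] += xi

    def predict(x):
        total = counts[0] + counts[1]
        scores = {}
        for y in (0, 1):
            # Log prior plus sum of log conditionals (Laplace-smoothed).
            s = math.log((counts[y] + alpha) / (total + 2 * alpha))
            for i, xi in enumerate(x):
                p = (feat[y][i] + alpha) / (counts[y] + 2 * alpha)
                s += math.log(p if xi else 1 - p)
            scores[y] = s
        return max(scores, key=scores.get)

    return predict

# Hypothetical data: feature 0 strongly indicates class 1.
data = [((1, 0), 1), ((1, 1), 1), ((1, 0), 1), ((0, 0), 0), ((0, 1), 0)]
predict = fit_naive_bayes(data, n_features=2)
```

With enough samples the estimated parameters converge, which is why the decision rule can approach the Bayes-optimal error at the rate the abstract states.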
A Theory for MemoryBased Learning
Machine Learning, 1994
Abstract

Cited by 7 (1 self)
A memory-based learning system is an extended memory management system that decomposes the input space either statically or dynamically into subregions for the purpose of storing and retrieving functional information. The main generalization techniques employed by memory-based learning systems are the nearest-neighbor search, space decomposition techniques, and clustering. Research on memory-based learning is still in its early stage. In particular, there are very few rigorous theoretical results regarding memory requirement, sample size, expected performance, and computational complexity. In this paper, we propose a model for memory-based learning and use it to analyze several methods (ε-covering, hashing, clustering, tree-structured clustering, and receptive fields) for learning smooth functions. The sample size and system complexity are derived for each method. Our model is built upon the generalized PAC learning model of Haussler (Haussler, 1989) and is closely related to the method of vector quantization in data compression. Our main result is that we can build memory-based learning systems using new clustering storage in typical situations.
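The nearest-neighbor technique mentioned in this abstract can be sketched minimally (a hypothetical example, not the paper's model: memorize samples of a smooth target on a grid and answer queries from the closest stored point):

```python
def nearest_neighbor_predict(memory, x):
    """Return the stored value at the closest stored point (1-NN)."""
    key = min(memory, key=lambda p: abs(p - x))
    return memory[key]

# Hypothetical smooth target f(x) = x^2, sampled on a coarse grid.
f = lambda x: x * x
memory = {i / 4: f(i / 4) for i in range(-4, 5)}  # stored examples
approx = nearest_neighbor_predict(memory, 0.6)    # nearest grid point: 0.5
```

For a smooth target, the prediction error is bounded by the target's variation over one grid cell, which is the kind of memory-versus-accuracy trade-off the paper's model quantifies.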