Results 1 - 8 of 8
Improving generalization with active learning
 Machine Learning
, 1994
"... Abstract. Active learning differs from "learning from examples " in that the learning algorithm assumes at least some control over what part of the input domain it receives information about. In some situations, active learning is provably more powerful than learning from examples alone, g ..."
Abstract

Cited by 419 (1 self)
Abstract. Active learning differs from "learning from examples" in that the learning algorithm assumes at least some control over what part of the input domain it receives information about. In some situations, active learning is provably more powerful than learning from examples alone, giving better generalization for a fixed number of training examples. In this article, we consider the problem of learning a binary concept in the absence of noise. We describe a formalism for active concept learning called selective sampling and show how it may be approximately implemented by a neural network. In selective sampling, a learner receives distribution information from the environment and queries an oracle on parts of the domain it considers "useful." We test our implementation, called an SG network, on three domains and observe significant improvement in generalization.
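The selective-sampling loop described above can be made concrete with a small pool-based sketch. This is not the paper's SG network; it is a generic uncertainty-driven active learner, and the scikit-learn classifier and helper names below are assumptions made only for illustration.

# Minimal pool-based selective-sampling sketch (illustration only, not the SG network).
import numpy as np
from sklearn.linear_model import LogisticRegression

def selective_sampling(pool_X, oracle, n_seed=10, n_queries=50, seed=0):
    """Label a small random seed set, then repeatedly query the oracle on the
    unlabeled pool point where the current model is least certain."""
    rng = np.random.default_rng(seed)
    labeled = set(rng.choice(len(pool_X), size=n_seed, replace=False).tolist())
    y = {i: oracle(pool_X[i]) for i in labeled}
    model = LogisticRegression()
    for _ in range(n_queries):
        idx = sorted(labeled)
        ys = np.array([y[i] for i in idx])
        unlabeled = [i for i in range(len(pool_X)) if i not in labeled]
        if not unlabeled:
            break
        if len(np.unique(ys)) < 2:
            q = int(rng.choice(unlabeled))      # cannot fit yet: query at random
        else:
            model.fit(pool_X[idx], ys)
            proba = model.predict_proba(pool_X[unlabeled])[:, 1]
            q = unlabeled[int(np.argmin(np.abs(proba - 0.5)))]  # most uncertain point
        y[q] = oracle(pool_X[q])
        labeled.add(q)
    idx = sorted(labeled)
    return model.fit(pool_X[idx], np.array([y[i] for i in idx]))

The paper's formulation draws queries from a region of uncertainty rather than scoring a fixed pool, but the principle of concentrating labels where the learner is unsure is the same.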
Theory and Applications of Agnostic PAC-Learning with Small Decision Trees
, 1995
"... We exhibit a theoretically founded algorithm T2 for agnostic PAClearning of decision trees of at most 2 levels, whose computation time is almost linear in the size of the training set. We evaluate the performance of this learning algorithm T2 on 15 common "realworld" datasets, and show that for mo ..."
Abstract

Cited by 75 (2 self)
We exhibit a theoretically founded algorithm T2 for agnostic PAC-learning of decision trees of at most 2 levels, whose computation time is almost linear in the size of the training set. We evaluate the performance of this learning algorithm T2 on 15 common "real-world" datasets, and show that for most of these datasets T2 provides simple decision trees with little or no loss in predictive power (compared with C4.5). In fact, for datasets with continuous attributes its error rate tends to be lower than that of C4.5. To the best of our knowledge this is the first time that a PAC-learning algorithm is shown to be applicable to "real-world" classification problems. Since one can prove that T2 is an agnostic PAC-learning algorithm, T2 is guaranteed to produce close to optimal 2-level decision trees from sufficiently large training sets for any (!) distribution of data. In this regard T2 differs strongly from all other learning algorithms that are considered in applied machine learning, for w...
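For readers who want a feel for what the 2-level hypothesis class buys on real data, here is a rough, hedged analogue using scikit-learn: a depth-2 tree compared against an unrestricted tree standing in for C4.5. It is not the T2 algorithm itself (T2 is a specific agnostic PAC-learning procedure), and the dataset choice is an arbitrary assumption.

# Rough analogue only: compare a 2-level tree with an unrestricted tree.
# This is NOT the T2 algorithm; it only illustrates the hypothesis class.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

shallow = DecisionTreeClassifier(max_depth=2).fit(X_tr, y_tr)    # at most 2 levels
deep = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)    # unrestricted depth

print("2-level tree test error:     ", round(1 - shallow.score(X_te, y_te), 3))
print("unrestricted tree test error:", round(1 - deep.score(X_te, y_te), 3))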
Rigorous learning curve bounds from statistical mechanics
 Machine Learning
, 1994
"... Abstract In this paper we introduce and investigate a mathematically rigorous theory of learning curves that is based on ideas from statistical mechanics. The advantage of our theory over the wellestablished VapnikChervonenkis theory is that our bounds can be considerably tighter in many cases, an ..."
Abstract

Cited by 53 (9 self)
Abstract. In this paper we introduce and investigate a mathematically rigorous theory of learning curves that is based on ideas from statistical mechanics. The advantage of our theory over the well-established Vapnik-Chervonenkis theory is that our bounds can be considerably tighter in many cases, and are also more reflective of the true behavior (functional form) of learning curves. This behavior can often exhibit dramatic properties such as phase transitions, as well as power law asymptotics not explained by the VC theory. The disadvantages of our theory are that its application requires knowledge of the input distribution, and it is limited so far to finite cardinality function classes. We illustrate our results with many concrete examples of learning curve bounds derived from our theory.

1 Introduction

According to the Vapnik-Chervonenkis (VC) theory of learning curves [27, 26], minimizing empirical error within a function class F on a random sample of m examples leads to generalization error bounded by Õ(d/m) (in the case that the target function is contained in F) or Õ(√(d/m)) plus the optimal generalization error achievable within F (in the general case). These bounds are universal: they hold for any class of hypothesis functions F, for any input distribution, and for any target function. The only problem-specific quantity remaining in these bounds is the VC dimension d, a measure of the complexity of the function class F. It has been shown that these bounds are essentially the best distribution-independent bounds possible, in the sense that for any function class, there exists an input distribution for which matching lower bounds on the generalization error can be given [5, 7, 22].
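For concreteness, the sketch below evaluates one common textbook form of the distribution-independent VC deviation term (the Õ(√(d/m)) regime mentioned above) as the sample size m grows. The exact constants differ between statements of the theorem, so treat the formula as illustrative rather than as the bound used in this paper.

# One common textbook form of the uniform VC deviation term (constants vary by statement):
# with probability >= 1 - delta, for every h in a class of VC dimension d,
#   true_error(h) <= empirical_error(h) + sqrt((d * (ln(2m/d) + 1) + ln(4/delta)) / m)
import math

def vc_deviation(m, d, delta=0.05):
    """Uniform deviation term for a class of VC dimension d on m i.i.d. samples."""
    return math.sqrt((d * (math.log(2 * m / d) + 1) + math.log(4 / delta)) / m)

for m in (100, 1_000, 10_000, 100_000):
    print(f"m = {m:>6}, d = 10: deviation <= {vc_deviation(m, d=10):.3f}")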
Distribution-Dependent Vapnik-Chervonenkis Bounds
, 1999
"... . VapnikChervonenkis (VC) bounds play an important role in statistical learning theory as they are the fundamental result which explains the generalization ability of learning machines. There have been consequent mathematical works on the improvement of VC rates of convergence of empirical mean ..."
Abstract

Cited by 6 (1 self)
Vapnik-Chervonenkis (VC) bounds play an important role in statistical learning theory as they are the fundamental result which explains the generalization ability of learning machines. Over the years there have been a number of mathematical works improving the VC rates of convergence of empirical means to their expectations. The result obtained by Talagrand in 1994 seems to provide more or less the final word on this issue as far as universal bounds are concerned. For fixed distributions, however, this bound can be outperformed in practice. We show indeed that it is possible to replace the 2ε² under the exponential of the deviation term by the corresponding Cramér transform, as given by large deviations theorems. We then formulate rigorous distribution-sensitive VC bounds, and we also explain why these theoretical results can lead to practical estimates of the effective VC dimension of learning structures.

1 Introduction and motivations

One of t...
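As a reminder of the quantity involved, here is the generic Cramér (Legendre) transform from large-deviations theory for i.i.d. variables; this is the standard statement, not the paper's specific distribution-sensitive bound. For [0,1]-valued variables, Hoeffding's lemma gives the weaker exponent 2ε², which is why replacing it by the Cramér transform can only tighten the exponential term.

% Generic Chernoff/Cramér bound for i.i.d. variables X_1, ..., X_m (standard statement,
% not the bound derived in this paper):
\Lambda^*(\epsilon) = \sup_{\lambda > 0}\Bigl(\lambda\epsilon - \log \mathbb{E}\,e^{\lambda (X - \mathbb{E}X)}\Bigr),
\qquad
\Pr\Bigl[\tfrac{1}{m}\textstyle\sum_{i=1}^m X_i - \mathbb{E}X \ge \epsilon\Bigr] \le e^{-m\,\Lambda^*(\epsilon)},
\qquad
\Lambda^*(\epsilon) \ge 2\epsilon^2 \ \text{for } X \in [0,1].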
Generalization errors of the simple perceptron
 Journal of Physics A
, 1998
"... Abstract. To find an exact form for the generalization error of a learning machine is an open ..."
Abstract

Cited by 3 (3 self)
Abstract. To find an exact form for the generalization error of a learning machine is an open ...
Rigorous Learning Curve Bounds from Statistical Mechanics
, 1996
"... . In this paper we introduce and investigate a mathematically rigorous theory of learning curves that is based on ideas from statistical mechanics. The advantage of our theory over the wellestablished VapnikChervonenkis theory is that our bounds can be considerably tighter in many cases, and are al ..."
Abstract
 Add to MetaCart
In this paper we introduce and investigate a mathematically rigorous theory of learning curves that is based on ideas from statistical mechanics. The advantage of our theory over the well-established Vapnik-Chervonenkis theory is that our bounds can be considerably tighter in many cases, and are also more reflective of the true behavior of learning curves. This behavior can often exhibit dramatic properties such as phase transitions, as well as power law asymptotics not explained by the VC theory. The disadvantages of our theory are that its application requires knowledge of the input distribution, and it is limited so far to finite cardinality function classes. We illustrate our results with many concrete examples of learning curve bounds derived from our theory.

Keywords: learning curves, statistical mechanics, phase transitions, VC dimension

1. Introduction

According to the Vapnik-Chervonenkis (VC) theory of learning curves (Vapnik, 1982; Vapnik & Chervonenkis, 1971), minimizing e...
Computing the Maximum Bichromatic Discrepancy, with Applications to Computer Graphics and Machine Learning
, 1995
"... Computing the maximum bichromatic discrepancy is an interesting theoretical problem with important applications in computational learning theory, computational geometry and computer graphics. In this paper we give algorithms to compute the maximum bichromatic discrepancy for simple geometric ranges, ..."
Abstract
 Add to MetaCart
Computing the maximum bichromatic discrepancy is an interesting theoretical problem with important applications in computational learning theory, computational geometry and computer graphics. In this paper we give algorithms to compute the maximum bichromatic discrepancy for simple geometric ranges, including rectangles and halfspaces. In addition, we give extensions to other discrepancy problems.

1 Introduction

The main theme of this paper is to present efficient algorithms that solve the problem of computing the maximum bichromatic discrepancy for axis-oriented rectangles. This problem arises naturally in different areas of computer science, such as computational learning theory, computational geometry and computer graphics ([Ma], [DG]), and has applications in all these areas. In computational learning theory, the problem of agnostic PAC-learning with simple geometric hypotheses can be reduced to the problem of computing the maximum bichromatic discrepancy for simple geometric ra...
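To make the quantity concrete, the brute-force sketch below computes the maximum bichromatic discrepancy of red/blue point sets over axis-aligned rectangles spanned by input coordinates. It is a naive enumeration for illustration only and bears no resemblance to the paper's efficient algorithms; the function name and the tiny example points are assumptions.

# Brute-force maximum bichromatic discrepancy over axis-aligned rectangles
# (illustration only; the paper gives far more efficient algorithms).
from itertools import combinations_with_replacement

def max_bichromatic_discrepancy(red, blue):
    """red, blue: lists of (x, y) points. Returns the maximum, over all axis-aligned
    rectangles spanned by input coordinates, of |#red inside - #blue inside|."""
    pts = red + blue
    xs = sorted({x for x, _ in pts})
    ys = sorted({y for _, y in pts})
    best = 0
    for x1, x2 in combinations_with_replacement(xs, 2):
        for y1, y2 in combinations_with_replacement(ys, 2):
            inside = lambda p: x1 <= p[0] <= x2 and y1 <= p[1] <= y2
            r = sum(1 for p in red if inside(p))
            b = sum(1 for p in blue if inside(p))
            best = max(best, abs(r - b))
    return best

print(max_bichromatic_discrepancy(red=[(0, 0), (1, 1), (2, 2)],
                                  blue=[(1, 0), (3, 3)]))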
Tutorial: Neural Networks in Clinical Medicine
 Medical Decision Making
, 1996
"... Additional services and information for Medical Decision Making can be found at: ..."
Abstract
 Add to MetaCart
Additional services and information for Medical Decision Making can be found at: