Results 1-10 of 188
Boosting the margin: A new explanation for the effectiveness of voting methods
In Proceedings International Conference on Machine Learning, 1997
Cited by 721 (52 self)
Abstract. One of the surprising recurring phenomena observed in experiments with boosting is that the test error of the generated classifier usually does not increase as its size becomes very large, and often is observed to decrease even after the training error reaches zero. In this paper, we show that this phenomenon is related to the distribution of margins of the training examples with respect to the generated voting classification rule, where the margin of an example is simply the difference between the number of correct votes and the maximum number of votes received by any incorrect label. We show that techniques used in the analysis of Vapnik’s support vector classifiers and of neural networks with small weights can be applied to voting methods to relate the margin distribution to the test error. We also show theoretically and experimentally that boosting is especially effective at increasing the margins of the training examples. Finally, we compare our explanation to those based on the bias-variance decomposition.
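The margin definition in the abstract above can be illustrated with a minimal sketch. It assumes unweighted votes, one per base classifier; the function name `voting_margin` and the example labels are hypothetical and do not come from the paper, which may weight or normalize votes differently.

```python
from collections import Counter


def voting_margin(votes, true_label):
    """Margin of one example under an unweighted vote (an assumption):
    votes for the correct label minus the largest number of votes
    received by any single incorrect label."""
    counts = Counter(votes)
    correct = counts.get(true_label, 0)
    # Largest vote count among incorrect labels; 0 if none received votes.
    best_wrong = max(
        (c for label, c in counts.items() if label != true_label),
        default=0,
    )
    return correct - best_wrong


# Five base classifiers vote on an example whose true label is "a".
print(voting_margin(["a", "a", "b", "c", "a"], "a"))
```

A positive margin means the vote classifies the example correctly, and a larger value means the vote is further from being overturned.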
Selective sampling using the Query by Committee algorithm
Machine Learning, 1997
Cited by 336 (7 self)
We analyze the "query by committee" algorithm, a method for filtering informative queries from a random stream of inputs. We show that if the two-member committee algorithm achieves information gain with positive lower bound, then the prediction error decreases exponentially with the number of queries. We show that, in particular, this exponential decrease holds for query learning of perceptrons.
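The filtering idea in the abstract above can be sketched on a toy problem. The sketch assumes one-dimensional threshold classifiers on [0, 1] with a uniform prior, so that drawing a committee member amounts to sampling uniformly from the interval of thresholds still consistent with the labels seen so far. The function `qbc_thresholds`, the `oracle` callback, and the threshold setting are illustrative assumptions, not the paper's construction, which treats perceptrons.

```python
import random


def qbc_thresholds(stream, oracle, rng):
    """Two-member query-by-committee filter for thresholds on [0, 1].

    A hypothesis with threshold t labels x as positive when x >= t.
    [lo, hi] is the set of thresholds consistent with all labels so far.
    """
    lo, hi = 0.0, 1.0
    queries = 0
    for x in stream:
        # Draw two committee members from the consistent thresholds.
        t1 = rng.uniform(lo, hi)
        t2 = rng.uniform(lo, hi)
        # Ask for a label only when the two members disagree on x.
        if (x >= t1) != (x >= t2):
            queries += 1
            if oracle(x):
                hi = min(hi, x)  # positive label: true threshold <= x
            else:
                lo = max(lo, x)  # negative label: true threshold > x
    return lo, hi, queries


rng = random.Random(0)
stream = [rng.random() for _ in range(2000)]
lo, hi, queries = qbc_thresholds(stream, lambda x: x >= 0.3, rng)
print(lo, hi, queries)
```

Inputs on which both members agree are discarded without a label request, so labels are spent only on points inside the remaining interval of uncertainty.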
Scale-sensitive Dimensions, Uniform Convergence, and Learnability
1997
Cited by 208 (1 self)
Learnability in Valiant's PAC learning model has been shown to be strongly related to the existence of uniform laws of large numbers. These laws define a distribution-free convergence property of means to expectations uniformly over classes of random variables. Classes of real-valued functions enjoying such a property are also known as uniform Glivenko-Cantelli classes. In this paper we prove, through a generalization of Sauer's lemma that may be interesting in its own right, a new characterization of uniform Glivenko-Cantelli classes. Our characterization yields Dudley, Giné, and Zinn's previous characterization as a corollary. Furthermore, it is the first based on a simple combinatorial quantity generalizing the Vapnik-Chervonenkis dimension. We apply this result to obtain the weakest combinatorial condition known to imply PAC learnability in the statistical regression (or "agnostic") framework. Furthermore, we show a characterization of learnability in the probabilistic concept model, solving an open problem posed by Kearns and Schapire. These results show that the accuracy parameter plays a crucial role in determining the effective complexity of the learner's hypothesis class.
A Model of Inductive Bias Learning
Journal of Artificial Intelligence Research, 2000
Cited by 143 (0 self)
A major problem in machine learning is that of inductive bias: how to choose a learner's hypothesis space so that it is large enough to contain a solution to the problem being learnt, yet small enough to ensure reliable generalization from reasonably-sized training sets. Typically such bias is supplied by hand through the skill and insights of experts. In this paper a model for automatically learning bias is investigated. The central assumption of the model is that the learner is embedded within an environment of related learning tasks. Within such an environment the learner can sample from multiple tasks, and hence it can search for a hypothesis space that contains good solutions to many of the problems in the environment. Under certain restrictions on the set of all hypothesis spaces available to the learner, we show that a hypothesis space that performs well on a sufficiently large number of training tasks will also perform well when learning novel tasks in the same environment. Exp...
Which Problems Have Strongly Exponential Complexity?
Journal of Computer and System Sciences, 1998
Cited by 128 (5 self)
For several NP-complete problems, there has been a progression of better but still exponential algorithms. In this paper, we address the relative likelihood of subexponential algorithms for these problems. We introduce a generalized reduction which we call Sub-Exponential Reduction Family (SERF) that preserves subexponential complexity. We show that Circuit-SAT is SERF-complete for all NP-search problems, and that for any fixed k, k-SAT, k-Colorability, k-Set Cover, Independent Set, Clique, and Vertex Cover are SERF-complete for the class SNP of search problems expressible by second order existential formulas whose first order part is universal. In particular, subexponential complexity for any one of the above problems implies the same for all others. We also look at the issue of proving strongly exponential lower bounds for AC^0; that is, bounds of the form 2^{Ω(n)}. This problem is even open for depth-3 circuits. In fact, such a bound for depth-3 circuits with even l...
Bounds on the Sample Complexity of Bayesian Learning Using Information Theory and the VC Dimension
Machine Learning, 1994
Cited by 108 (12 self)
In this paper we study a Bayesian or average-case model of concept learning with a twofold goal: to provide more precise characterizations of learning curve (sample complexity) behavior that depend on properties of both the prior distribution over concepts and the sequence of instances seen by the learner, and to smoothly unite in a common framework the popular statistical physics and VC dimension theories of learning curves. To achieve this, we undertake a systematic investigation and comparison of two fundamental quantities in learning and information theory: the probability of an incorrect prediction for an optimal learning algorithm, and the Shannon information gain. This study leads to a new understanding of the sample complexity of learning in several existing models. 1 Introduction Consider a simple concept learning model in which the learner attempts to infer an unknown target concept f, chosen from a known concept class F of {0,1}-valued functions over an instance space X....
Mechanism design via differential privacy
Proceedings of the 48th Annual Symposium on Foundations of Computer Science, 2007
Cited by 103 (3 self)
We study the role that privacy-preserving algorithms, which prevent the leakage of specific information about participants, can play in the design of mechanisms for strategic agents, which must encourage players to honestly report information. Specifically, we show that the recent notion of differential privacy [15, 14], in addition to its own intrinsic virtue, can ensure that participants have limited effect on the outcome of the mechanism, and as a consequence have limited incentive to lie. More precisely, mechanisms with differential privacy are approximate dominant strategy under arbitrary player utility functions, are automatically resilient to coalitions, and easily allow repeatability. We study several special cases of the unlimited supply auction problem, providing new results for digital goods auctions, attribute auctions, and auctions with arbitrary structural constraints on the prices. As an important prelude to developing a privacy-preserving auction mechanism, we introduce and study a generalization of previous privacy work that accommodates the high sensitivity of the auction setting, where a single participant may dramatically alter the optimal fixed price, and a slight change in the offered price may take the revenue from optimal to zero.
On Linear-Time Deterministic Algorithms for Optimization Problems in Fixed Dimension
1992
Cited by 94 (11 self)
We show that with recently developed derandomization techniques, one can convert Clarkson's randomized algorithm for linear programming in fixed dimension into a linear-time deterministic one. The constant of proportionality is d^{O(d)}, which is better than for previously known such algorithms. We show that the algorithm works in a fairly general abstract setting, which allows us to solve various other problems (such as finding the maximum volume ellipsoid inscribed into the intersection of n halfspaces) in linear time.