Results 1–10 of 100
Boosting a Weak Learning Algorithm By Majority
, 1995
Abstract

Cited by 516 (15 self)
We present an algorithm for improving the accuracy of algorithms for learning binary concepts. The improvement is achieved by combining a large number of hypotheses, each of which is generated by training the given learning algorithm on a different set of examples. Our algorithm is based on ideas presented by Schapire in his paper "The strength of weak learnability", and represents an improvement over his results. The analysis of our algorithm provides general upper bounds on the resources required for learning in Valiant's polynomial PAC learning framework, which are the best general upper bounds known today. We show that the number of hypotheses that are combined by our algorithm is the smallest number possible. Other outcomes of our analysis are results regarding the representational power of threshold circuits, the relation between learnability and compression, and a method for parallelizing PAC learning algorithms. We provide extensions of our algorithms to cases in which the conc...
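The core idea — train each weak hypothesis on a different set of examples and combine the hypotheses by majority vote — can be sketched as below. This is a deliberately naive illustration (disjoint slices of the data, an unweighted vote); Freund's algorithm selects and filters the per-round training sets far more carefully.

```python
def majority_vote(hypotheses, x):
    # Unweighted majority over {-1, +1}-valued hypotheses (ties go to +1).
    return 1 if sum(h(x) for h in hypotheses) >= 0 else -1

def boost_by_majority_sketch(weak_learn, examples, rounds):
    # Train each weak hypothesis on a different slice of the data,
    # then combine by majority vote.  (A sketch only: Freund's algorithm
    # chooses the training sets adaptively, not by naive slicing.)
    chunk = max(1, len(examples) // rounds)
    hypotheses = [weak_learn(examples[i * chunk:(i + 1) * chunk])
                  for i in range(rounds)]
    return lambda x: majority_vote(hypotheses, x)
```

Any weak learner can be plugged in; a decision stump over one real feature is a common hypothetical choice for experiments.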
An Efficient Membership-Query Algorithm for Learning DNF with Respect to the Uniform Distribution
, 1994
Abstract

Cited by 173 (13 self)
We present a membership-query algorithm for efficiently learning DNF with respect to the uniform distribution. In fact, the algorithm properly learns with respect to uniform the class TOP of Boolean functions expressed as a majority vote over parity functions. We also describe extensions of this algorithm for learning DNF over certain nonuniform distributions and for learning a class of geometric concepts that generalizes DNF. Furthermore, we show that DNF is weakly learnable with respect to uniform from noisy examples. Our strong learning algorithm utilizes one of Freund's boosting techniques and relies on the fact that boosting does not require a completely distribution-independent weak learner. The boosted weak learner is a nonuniform extension of a parity-finding algorithm discovered by Goldreich and Levin.
1 Introduction
Consider the following 20-questions-like game between two players, Bob and Alice. Bob has a Disjunctive Normal Form (DNF) expression f in mind. Alice is allo...
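The class TOP mentioned above consists of majority votes over parity functions. A minimal sketch of evaluating such a function (the representation of parities as index sets is our illustrative choice):

```python
def parity(subset, x):
    # chi_S(x) = +1 if the bits of x indexed by S have even sum, else -1.
    return 1 - 2 * (sum(x[i] for i in subset) % 2)

def top_eval(parity_subsets, x):
    # A TOP function: unweighted majority vote over parity functions
    # (ties broken toward +1).
    return 1 if sum(parity(s, x) for s in parity_subsets) >= 0 else -1
```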
Game Theory, Online Prediction and Boosting
 PROCEEDINGS OF THE NINTH ANNUAL CONFERENCE ON COMPUTATIONAL LEARNING THEORY
, 1996
Abstract

Cited by 161 (14 self)
We study the close connections between game theory, online prediction and boosting. After a brief review of game theory, we describe an algorithm for learning to play repeated games based on the online prediction methods of Littlestone and Warmuth. The analysis of this algorithm yields a simple proof of von Neumann’s famous minimax theorem, as well as a provable method of approximately solving a game. We then show that the online prediction model is obtained by applying this game-playing algorithm to an appropriate choice of game and that boosting is obtained by applying the same algorithm to the “dual” of this game.
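The Littlestone–Warmuth method the abstract refers to keeps one weight per expert, predicts with the weighted vote, and shrinks the weight of every expert that errs. A minimal sketch (the penalty factor `beta` and the data layout are illustrative choices, not the paper's exact formulation):

```python
def weighted_majority(expert_predictions, outcomes, beta=0.5):
    # expert_predictions[i][t] in {-1, +1}: expert i's prediction at step t.
    # Predict by weighted vote; multiply each wrong expert's weight by beta.
    n = len(expert_predictions)
    weights = [1.0] * n
    mistakes = 0
    for t, outcome in enumerate(outcomes):
        vote = sum(w if expert_predictions[i][t] == 1 else -w
                   for i, w in enumerate(weights))
        if (1 if vote >= 0 else -1) != outcome:
            mistakes += 1
        for i in range(n):
            if expert_predictions[i][t] != outcome:
                weights[i] *= beta
    return mistakes, weights
```

The standard analysis bounds the algorithm's mistakes in terms of the best expert's mistakes, which is the engine behind both the minimax proof and the boosting connection.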
Extracting Comprehensible Models from Trained Neural Networks
, 1996
Abstract

Cited by 83 (3 self)
To Mom, Dad, and Susan, for their support and encouragement.
On the size of weights for threshold gates
 SIAM JOURNAL ON DISCRETE MATHEMATICS
, 1994
Abstract

Cited by 74 (0 self)
We prove that if n is a power of 2 then there is a threshold function on n inputs that requires weights of size around 2^((n log n)/2 − n). This almost matches the known upper bounds.
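For context, a threshold gate computes 1 iff ∑ w_i x_i ≥ θ. A standard example of a threshold function whose natural representation uses exponentially large weights is COMPARISON of two n-bit numbers; a sketch (little-endian bit lists, helper names are ours):

```python
def threshold_gate(weights, theta, x):
    # A linear threshold gate: 1 iff sum_i w_i * x_i >= theta.
    return 1 if sum(w * xi for w, xi in zip(weights, x)) >= theta else 0

def comparison_gate(n_bits):
    # COMPARISON(a, b) = 1 iff a >= b, realized as one threshold gate whose
    # natural weights are +/- powers of two, i.e. exponentially large.
    weights = [2 ** i for i in range(n_bits)] + [-(2 ** i) for i in range(n_bits)]
    return lambda a_bits, b_bits: threshold_gate(weights, 0, a_bits + b_bits)
```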
Bounds for the Computational Power and Learning Complexity of Analog Neural Nets
 Proc. of the 25th ACM Symp. Theory of Computing
, 1993
Abstract

Cited by 63 (17 self)
It is shown that high order feedforward neural nets of constant depth with piecewise polynomial activation functions and arbitrary real weights can be simulated for Boolean inputs and outputs by neural nets of a somewhat larger size and depth with heaviside gates and weights from {−1, 0, 1}. This provides the first known upper bound for the computational power of the former type of neural nets. It is also shown that in the case of first order nets with piecewise linear activation functions one can replace arbitrary real weights by rational numbers with polynomially many bits, without changing the Boolean function that is computed by the neural net. In order to prove these results we introduce two new methods for reducing nonlinear problems about weights in multilayer neural nets to linear problems for a transformed set of parameters. These transformed parameters can be interpreted as weights in a somewhat larger neural net. As another application of our new proof technique we s...
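As a toy illustration of computing a Boolean function with heaviside gates and weights from {−1, 0, 1} (the thresholds here are small integer constants; this is our own example, not the paper's simulation construction):

```python
def heaviside(z):
    # H(z) = 1 if z >= 0 else 0.
    return 1 if z >= 0 else 0

def xor_net(x1, x2):
    # Depth-2 net of heaviside gates, all weights in {-1, 0, 1}:
    # XOR = OR and not AND.
    g_or  = heaviside(x1 + x2 - 1)      # fires iff at least one input is 1
    g_and = heaviside(x1 + x2 - 2)      # fires iff both inputs are 1
    return heaviside(g_or - g_and - 1)  # fires iff OR fires and AND does not
```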
Software Abstractions
, 2006
Abstract

Cited by 50 (2 self)
We give an algorithm that with high probability properly learns random monotone t(n)-term DNF under the uniform distribution on the Boolean cube {0, 1}^n. For any polynomially bounded function t(n) ≤ poly(n) the algorithm runs in time poly(n, 1/ɛ) and with high probability outputs an ɛ-accurate monotone DNF hypothesis. This is the first algorithm that can learn monotone DNF of arbitrary polynomial size in a reasonable average-case model of learning from random examples only.
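A monotone DNF is an OR of terms, each term an AND of unnegated variables. A minimal evaluator (representing each term as a set of variable indices is our choice):

```python
def eval_monotone_dnf(terms, x):
    # terms: iterable of index sets; the formula is the OR over terms
    # of the AND of the (unnegated) variables x[i] for i in the term.
    return any(all(x[i] for i in term) for term in terms)
```

Monotonicity is visible directly: flipping any input bit from 0 to 1 can only turn the output from False to True, never the reverse.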
Separating AC^0 from depth-2 majority circuits
 In Proc. of the 39th Symposium on Theory of Computing (STOC)
, 2007
Abstract

Cited by 49 (19 self)
We construct a function in AC^0 that cannot be computed by a depth-2 majority circuit of size less than exp(Θ(n^{1/5})). This solves an open problem due to Krause and Pudlák (1994) and matches Allender’s classic result (1989) that AC^0 can be efficiently simulated by depth-3 majority circuits. To obtain our result, we develop a novel technique for proving lower bounds on communication complexity. This technique, the Degree/Discrepancy Theorem, is of independent interest. It translates lower bounds on the threshold degree of a Boolean function into upper bounds on the discrepancy of a related function. Upper bounds on the discrepancy, in turn, immediately imply lower bounds on communication and circuit size. In particular, our work yields the first known function in AC^0 with exponentially small discrepancy, exp(−Ω(n^{1/5})).
Key words. Majority circuits, constant-depth AND/OR/NOT circuits, communication complexity, discrepancy, threshold degree of Boolean functions.
AMS subject classifications. 03D15, 68Q15, 68Q17
Cryptographic hardness for learning intersections of halfspaces
 J. Comput. Syst. Sci
Bounded Independence Fools Halfspaces
 In Proc. 50th Annual Symposium on Foundations of Computer Science (FOCS), 2009
Abstract

Cited by 43 (17 self)
We show that any distribution on {−1, +1}^n that is k-wise independent fools any halfspace (a.k.a. linear threshold function) h: {−1, +1}^n → {−1, +1}, i.e., any function of the form h(x) = sign(∑_{i=1}^n w_i x_i − θ) where w_1, ..., w_n, θ are arbitrary real numbers, with error ɛ for k = O(ɛ^{−2} log^2(1/ɛ)). Our result is tight up to log(1/ɛ) factors. Using standard constructions of k-wise independent distributions, we obtain the first explicit pseudorandom generators G: {−1, +1}^s → {−1, +1}^n that fool halfspaces. Specifically, we fool halfspaces with error ɛ and seed length s = k · log n = O(log n · ɛ^{−2} log^2(1/ɛ)). Our approach combines classical tools from real approximation theory with structural results on halfspaces by Servedio (Comput. Complexity 2007).
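Concretely, h(x) = sign(∑ w_i x_i − θ), and a distribution D fools h with error ɛ when |E_D[h] − E_U[h]| ≤ ɛ. The sketch below builds a standard pairwise-independent (k = 2) distribution from a short seed and shows it can still have large error against the majority halfspace — consistent with the theorem's requirement that k grow roughly like ɛ^{−2} (the construction and parameters are illustrative):

```python
from itertools import product

def halfspace(w, theta, x):
    # h(x) = sign(sum_i w_i * x_i - theta), with sign(0) := +1.
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) - theta >= 0 else -1

def pairwise_independent_points(m):
    # From each m-bit seed s, output one +/-1 bit per nonempty subset S of
    # {0..m-1}, namely (-1)^(<s, 1_S>).  Any two distinct output bits are
    # uniform and independent, but higher-order correlations remain.
    subsets = [S for S in product([0, 1], repeat=m) if any(S)]
    for s in product([0, 1], repeat=m):
        yield tuple(1 - 2 * (sum(a * b for a, b in zip(s, S)) % 2)
                    for S in subsets)

def mean_of(h, points):
    # Empirical mean of h over an explicit support with uniform weights.
    pts = list(points)
    return sum(h(x) for x in pts) / len(pts)
```

For the majority halfspace on n = 7 inputs, the uniform mean is 0, while this pairwise-independent distribution gives mean −0.75: pairwise independence alone is far from fooling even majority.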