Learning Decision Trees using the Fourier Spectrum
1991
Cited by 182 (10 self)
This work gives a polynomial-time algorithm for learning decision trees with respect to the uniform distribution. (This algorithm uses membership queries.) The decision tree model that is considered is an extension of the traditional boolean decision tree model that allows linear operations in each node (i.e., summation of a subset of the input variables over GF(2)). This paper shows how to learn in polynomial time any function that can be approximated (in the L_2 norm) by a polynomially sparse function (i.e., a function with only polynomially many nonzero Fourier coefficients). The authors demonstrate that any function f whose L_1 norm (i.e., the sum of the absolute values of the Fourier coefficients) is polynomial can be approximated by a polynomially sparse function, and prove that boolean decision trees with linear operations are a subset of this class of functions. Moreover, it is shown that the functions with polynomial L_1 norm can be learned deterministically. The algorithm can a...
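As an illustration of the quantities named in the abstract above, the following minimal sketch computes the Fourier coefficients of a small function f: {0,1}^n -> {-1,+1} by brute force and sums their absolute values to get the L_1 norm. It is not the paper's learning algorithm; the function names (`fourier_coefficients`, `spectral_l1_norm`, `parity_13`, `and_2`) are hypothetical, and the exhaustive enumeration is only feasible for very small n.

```python
from itertools import product


def fourier_coefficients(f, n):
    """Return {S: f_hat(S)} for f: {0,1}^n -> {-1,+1}.

    S is encoded as a 0/1 tuple (its indicator vector), and
    f_hat(S) = E_x[f(x) * chi_S(x)] with chi_S(x) = (-1)^(sum of x_i for i in S).
    Brute force over all 2^n inputs: an illustrative assumption, not an
    efficient method.
    """
    points = list(product((0, 1), repeat=n))
    coeffs = {}
    for s in points:
        total = 0
        for x in points:
            parity = sum(si & xi for si, xi in zip(s, x)) % 2
            total += f(x) * (-1 if parity else 1)
        coeffs[s] = total / len(points)
    return coeffs


def spectral_l1_norm(coeffs):
    """Sum of the absolute values of the Fourier coefficients."""
    return sum(abs(c) for c in coeffs.values())


def parity_13(x):
    """A single GF(2)-linear test: parity of the first and third inputs."""
    return -1 if (x[0] ^ x[2]) else 1


def and_2(x):
    """AND of two inputs, with -1 encoding 'true'."""
    return -1 if (x[0] and x[1]) else 1


if __name__ == "__main__":
    print(spectral_l1_norm(fourier_coefficients(parity_13, 3)))
    print(spectral_l1_norm(fourier_coefficients(and_2, 2)))
```

A parity over a subset of the variables has a single nonzero coefficient, which is why a node computing a GF(2) sum does not by itself increase the L_1 norm in this example.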
PP is Closed Under Intersection
Journal of Computer and System Sciences, 1991
Cited by 89 (9 self)
In his seminal paper on probabilistic Turing machines, Gill [13] asked whether the class PP is closed under intersection and union. We give a positive answer to this question. We also show that PP is closed under a variety of polynomial-time truth-table reductions. Consequences in complexity theory include the definite collapse and (assuming P ≠ PP) separation of certain query hierarchies over PP. Similar techniques allow us to combine several threshold gates into a single threshold gate. Consequences in the study of circuits include the simulation of circuits with a small number of threshold gates by circuits having only a single threshold gate at the root (perceptrons), and a lower bound on the number of threshold gates needed to compute the parity function. 1. Introduction The class PP was defined in 1972 by John Gill [13, 14] and independently by Janos Simon [26] in 1974. PP is the class of languages accepted by a polynomial-time bounded nondeterministic Turing machine t...
The Polynomial Method in Circuit Complexity
In Proceedings of the 8th IEEE Structure in Complexity Theory Conference, 1993
Cited by 70 (4 self)
The representation of functions as low-degree polynomials over various rings has provided many insights in the theory of small-depth circuits. We survey some of the closure properties, upper bounds, and lower bounds obtained via this approach. 1. Introduction There is a long history of using polynomials in order to prove complexity bounds. Minsky and Papert [39] used polynomials to prove early lower bounds on the order of perceptrons. Razborov [46] and Smolensky [49] used them to prove lower bounds on the size of AND-OR circuits. Other lower bounds via polynomials are due to [50, 4, 10, 51, 9, 55]. Paturi and Saks [44] discovered that rational functions could be used for lower bounds on the size of threshold circuits. Toda [53] used polynomials to prove upper bounds on the power of the polynomial hierarchy. This led to a series of upper bounds on the power of the polynomial hierarchy [54, 52], AC^0 [2, 3, 52, 19], and ACC [58, 20, 30, 37], and related classes [21, 42]. Beigel and Gi...
Learning Intersections and Thresholds of Halfspaces
Cited by 65 (22 self)
We give the first polynomial-time algorithm to learn any function of a constant number of halfspaces under the uniform distribution to within any constant error parameter. We also give the first quasipolynomial-time algorithm for learning any function of a polylog number of polynomial-weight halfspaces under any distribution. As special cases of these results we obtain algorithms for learning intersections and thresholds of halfspaces. Our uniform distribution learning algorithms involve a novel non-geometric approach to learning halfspaces; we use Fourier techniques together with a careful analysis of the noise sensitivity of functions of halfspaces. Our algorithms for learning under any distribution use techniques from real approximation theory to construct low-degree polynomial threshold functions.
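To make the notion of noise sensitivity in the abstract above concrete, here is a small sketch that computes it exactly for functions on a few bits. It assumes the standard definition NS_eps(f) = Pr[f(x) != f(y)], where x is uniform on {0,1}^n and y is x with each bit flipped independently with probability eps. The helper names (`noise_sensitivity`, `halfspace`) are hypothetical, and the double enumeration over 4^n pairs is for illustration only; it is not the analysis used in the paper.

```python
from itertools import product


def noise_sensitivity(f, n, eps):
    """Exact NS_eps(f) for f on {0,1}^n, by enumerating inputs and flip patterns."""
    total = 0.0
    for x in product((0, 1), repeat=n):
        for z in product((0, 1), repeat=n):
            flips = sum(z)
            # Probability of this particular flip pattern z.
            weight = (eps ** flips) * ((1 - eps) ** (n - flips))
            y = tuple(xi ^ zi for xi, zi in zip(x, z))
            if f(x) != f(y):
                total += weight
    return total / 2 ** n


def halfspace(weights, threshold):
    """Return the +/-1-valued halfspace x -> sign(w . x - threshold)."""
    def f(x):
        return 1 if sum(w * xi for w, xi in zip(weights, x)) >= threshold else -1
    return f


dictator = halfspace((1, 0, 0), 1)    # depends on the first bit only
majority3 = halfspace((1, 1, 1), 2)   # majority of three bits

if __name__ == "__main__":
    print(noise_sensitivity(dictator, 3, 0.1))
    print(noise_sensitivity(majority3, 3, 0.1))
```

A function of several halfspaces can be passed in the same way, for example `lambda x: min(h1(x), h2(x))` for an intersection under this +/-1 encoding.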
A Polynomial-time Algorithm for Learning Noisy Linear Threshold Functions
1996
Cited by 61 (12 self)
In this paper we consider the problem of learning a linear threshold function (a halfspace in n dimensions, also called a "perceptron"). Methods for solving this problem generally fall into two categories. In the absence of noise, this problem can be formulated as a Linear Program and solved in polynomial time with the Ellipsoid Algorithm or Interior Point methods. Alternatively, simple greedy algorithms such as the Perceptron Algorithm are often used in practice and have certain provable noise-tolerance properties, but their running time depends on a separation parameter, which quantifies the amount of "wiggle room" available for a solution, and can be exponential in the description length of the input. In this paper, we show how simple greedy methods can be used to find weak hypotheses (hypotheses that correctly classify noticeably more than half of the examples) in polynomial time, without dependence on any separation parameter. Suitably combining these hypotheses results in a polynomial-time algorithm for learning linear threshold functions in the PAC model in the presence of random classification noise. (Also, a polynomial-time algorithm for learning linear threshold functions in the Statistical Query model of Kearns.) Our algorithm is based on a new method for removing outliers in data. Specifically, for any set S of points in R^n, each given to b bits of precision, we show that one can remove only a small fraction of S so that in the remaining set T, for every vector v, max_{x ∈ T} (v · x)^2 ≤ poly(n, b) · E_{x ∈ T} (v · x)^2; i.e., for any hyperplane through the origin, the maximum distance (squared) from a point in T to the plane is at most polynomially larger than the average. After removing these outliers, we are able to show that a modified v...
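The outlier condition in the abstract above compares, for a direction v, the largest squared projection (v · x)^2 over the point set with the average squared projection. The sketch below only evaluates that ratio for one given direction. It does not search over all v and is not the paper's outlier-removal procedure; the function name `projection_ratio` and the sample points are hypothetical.

```python
def projection_ratio(points, v):
    """Return max_x (v . x)^2 divided by the mean of (v . x)^2 over the points.

    A large ratio means some point lies much farther from the hyperplane
    {x : v . x = 0} than is typical. Returns 0.0 when every point lies on
    that hyperplane (an assumption made here to avoid dividing by zero).
    """
    squared = [sum(vi * xi for vi, xi in zip(v, x)) ** 2 for x in points]
    mean = sum(squared) / len(squared)
    if mean == 0:
        return 0.0
    return max(squared) / mean


if __name__ == "__main__":
    cluster = [(1, 0), (1, 0), (1, 0)]
    with_outlier = cluster + [(10, 0)]
    v = (1, 0)
    print(projection_ratio(with_outlier, v))  # dominated by the far point
    print(projection_ratio(cluster, v))       # ratio after dropping it
```

In this toy example, dropping the single far point brings the ratio down to 1, which is the kind of effect the condition asks for in every direction at once.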
Unprovability of Lower Bounds on the Circuit Size in Certain Fragments of Bounded Arithmetic
In Izvestiya of the Russian Academy of Science, Mathematics, 1995
Cited by 54 (6 self)
To appear in Izvestiya of the RAN. We show that if strong pseudorandom generators exist, then the statement “α encodes a circuit of size n^(log* n) for SATISFIABILITY” is not refutable in S^2_2(α). For refutation in S^1_2(α), this is proven under the weaker assumption of the existence of generators secure against the attack by small-depth circuits, and for another system which is strong enough to prove exponential lower bounds for constant-depth circuits, this is shown without using any unproven hardness assumptions. These results can also be viewed as direct corollaries of interpolation-like theorems for certain “split versions” of classical systems of Bounded Arithmetic introduced in this paper.
On computation and communication with small bias
In Proc. of the 22nd Conf. on Computational Complexity (CCC), 2007
Cited by 38 (5 self)
We present two results for computational models that allow error probabilities close to 1/2. First, most computational complexity classes have an analogous class in communication complexity. The class PP in fact has two, a version with weakly restricted bias called PP^cc, and a version with unrestricted bias called UPP^cc. Ever since their introduction by Babai, Frankl, and Simon in 1986, it has been open whether these classes are the same. We show that PP^cc ⊊ UPP^cc. Our proof combines a query complexity separation due to Beigel with a technique of Razborov that translates the acceptance probability of quantum protocols to polynomials. Second, we study how small the bias of minimal-degree polynomials that sign-represent Boolean functions needs to be. We show that the worst-case bias is at worst double-exponentially small in the sign-degree (which was very recently shown to be optimal by Podolski), while the average-case bias can be made single-exponentially small in the sign-degree (which we show to be close to optimal).
Lower Bounds for Polynomial Calculus: Non-Binomial Case
2001
Cited by 37 (9 self)
We generalize recent linear lower bounds for Polynomial Calculus based on binomial ideals. We produce a general hardness criterion (that we call immunity) which is satisfied by a random function and prove linear lower bounds on the degree of PC refutations for a wide class of tautologies based on immune functions. As some applications of our techniques, we introduce mod p Tseitin tautologies in the Boolean case (e.g. in the presence of axioms x_i^2 = x_i), prove that they are hard for PC over fields with characteristic different from p, and generalize them to Flow tautologies which are based on the MAJORITY function and are proved to be hard over any field. We also show the Ω(n) lower bound for random k-CNF's over fields of characteristic 2.
Separating AC^0 from depth-2 majority circuits
In Proc. of the 39th Symposium on Theory of Computing (STOC), 2007
Cited by 36 (17 self)
Abstract. We construct a function in AC^0 that cannot be computed by a depth-2 majority circuit of size less than exp(Θ(n^(1/5))). This solves an open problem due to Krause and Pudlák (1994) and matches Allender’s classic result (1989) that AC^0 can be efficiently simulated by depth-3 majority circuits. To obtain our result, we develop a novel technique for proving lower bounds on communication complexity. This technique, the Degree/Discrepancy Theorem, is of independent interest. It translates lower bounds on the threshold degree of a Boolean function into upper bounds on the discrepancy of a related function. Upper bounds on the discrepancy, in turn, immediately imply lower bounds on communication and circuit size. In particular, our work yields the first known function in AC^0 with exponentially small discrepancy, exp(−Ω(n^(1/5))). Key words. Majority circuits, constant-depth AND/OR/NOT circuits, communication complexity, discrepancy, threshold degree of Boolean functions. AMS subject classifications. 03D15, 68Q15, 68Q17
Learnability Beyond AC^0
Cited by 35 (15 self)
We give an algorithm to learn constant-depth polynomial-size circuits augmented with majority gates under the uniform distribution using random examples only. For circuits which contain a polylogarithmic number of majority gates the algorithm runs in quasipolynomial time. This is the first algorithm for learning a more expressive circuit class than the class AC^0 of constant-depth polynomial-size circuits, a class which was shown to be learnable in quasipolynomial time by Linial, Mansour and Nisan in 1989. Our approach combines an extension of some of the Fourier analysis from Linial et al. with hypothesis boosting. We also show that under a standard cryptographic assumption our algorithm is essentially optimal with respect to both running time and expressiveness (number of majority gates) of the circuits being learned.