## On learning monotone DNF under product distributions (2001)

### Download Links

- [www.cs.columbia.edu]
- [www1.cs.columbia.edu]
- [www.deas.harvard.edu]
- DBLP

Venue: Proceedings of the Fourteenth Annual Conference on Computational Learning Theory

Citations: 32 (15 self)

### BibTeX

```bibtex
@INPROCEEDINGS{Servedio01onlearning,
  author    = {Rocco A. Servedio},
  title     = {On learning monotone DNF under product distributions},
  booktitle = {Proceedings of the Fourteenth Annual Conference on Computational Learning Theory},
  year      = {2001},
  pages     = {473--489}
}
```

### Abstract

We show that the class of monotone 2^{O(√log n)}-term DNF formulae can be PAC learned in polynomial time under the uniform distribution from random examples only. This is an exponential improvement over the best previous polynomial-time algorithms in this model, which could learn monotone o(log² n)-term DNF. We also show that various classes of small constant-depth circuits which compute monotone functions are PAC learnable in polynomial time under the uniform distribution. All of our results extend to learning under any constant-bounded product distribution.

### Citations

1690 | A Theory of the Learnable
- Valiant
- 1984
Citation Context: ...ive normal form formula, or DNF, is a disjunction of conjunctions of Boolean literals. The size of a DNF is the number of conjunctions (also known as terms) which it contains. In a seminal 1984 paper [25] Valiant introduced the distribution-free model of Probably Approximately Correct (PAC) learning from random examples and posed the question of whether polynomial-size DNF are PAC learnable in polynom...

644 |
Types of Queries for Concept Learning
- Angluin
- 1986
Citation Context: ...intriguing question since, as described below, efficient algorithms are known for several related problems. It is known that if membership queries are allowed, then Angluin's exact learning algorithm [2] for monotone DNF yields an efficient algorithm for PAC learning polynomial-size monotone DNF under any probability distribution. On the other hand, if membership queries are not allowed then a simple...
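The query-based algorithm this context refers to can be sketched in a few lines. The sketch below is an illustration, not Angluin's exact pseudocode: given a membership oracle for a monotone DNF target, each positive counterexample is walked down to a minterm by flipping 1-bits to 0 whenever the example stays positive, and that minterm becomes a new hypothesis term. The toy target and the exhaustive equivalence check are assumptions for the demo.

```python
from itertools import product

n = 4
# Hypothetical target: the monotone DNF (x0 AND x1) OR x2, stored as sets of indices
target_terms = [frozenset({0, 1}), frozenset({2})]

def evaluate(terms, x):
    """Evaluate a monotone DNF (list of index sets) on assignment x."""
    return any(all(x[i] for i in t) for t in terms)

def mq(x):
    """Membership oracle for the hidden target."""
    return evaluate(target_terms, x)

def minimize_positive(x):
    """Walk a positive example down to a minterm: flip each 1-bit to 0,
    keeping the flip only if the example remains positive."""
    x = list(x)
    for i in range(n):
        if x[i]:
            x[i] = 0
            if not mq(x):
                x[i] = 1
    return frozenset(i for i in range(n) if x[i])

hyp = []                                      # learned terms
points = list(product((0, 1), repeat=n))
while True:
    # exhaustive search stands in for an equivalence oracle on this toy domain
    cex = next((x for x in points if evaluate(hyp, x) != mq(x)), None)
    if cex is None:
        break
    # every hypothesis term is a minterm of the target, so hyp <= target
    # and any counterexample is positive for the target
    hyp.append(minimize_positive(cex))
print(sorted(sorted(t) for t in hyp))         # recovers the target's terms
```

Each counterexample adds one genuine term of the target, so the loop ends after at most one iteration per term.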

278 |
Constant depth circuits, Fourier transform, and learnability
- Linial, Mansour, et al.
- 1993
Citation Context: ...h circuits which compute monotone functions. All of our results extend to learning under any constant-bounded product distribution. Our algorithm combines ideas from Linial et al.'s influential paper [21] on learning AC⁰ functions using the Fourier transform and Bshouty and Tamon's paper [10] on learning monotone functions using the Fourier transform. By analyzing the Fourier transform of AC⁰ function...

222 | The influence of variables on Boolean functions
- Kahn, Kalai, et al.
- 1988
Citation Context: ...(1/δ) time steps for all ε, δ > 0, and with probability at least 1 − δ outputs a set S_f ⊆ [n] such that i ∈ S_f implies Σ_{A: i∈A} f̂(A)² ≥ ε/2 and i ∉ S_f implies Σ_{A: i∈A} f̂(A)² ≤ ε. Proof: Kahn et al. ([16], Section 3) have shown that I_{U,i}(f) = Σ_{A: i∈A} f̂(A)². (3) To prove the lemma it thus suffices to show that I_{U,i}(f) can be estimated to within accuracy ε/4 with high probability. By Equation (1) from...
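The estimation step this context relies on, approximating the influence I_{U,i}(f) by sampling, is easy to illustrate. A minimal Monte Carlo sketch under the uniform distribution; the sample size and the AND example function are assumptions for the demo, not values from the paper.

```python
import random

def influence(f, n, i, samples=20000, rng=None):
    """Monte Carlo estimate of the influence of bit i on f under the
    uniform distribution: Pr[f(x) != f(x with bit i flipped)]."""
    rng = rng or random.Random(0)
    hits = 0
    for _ in range(samples):
        x = [rng.randint(0, 1) for _ in range(n)]
        y = x[:]
        y[i] ^= 1                     # flip the i-th bit
        if f(x) != f(y):
            hits += 1
    return hits / samples

# Example: f = x0 AND x1 on {0,1}^3.  Flipping x0 changes f exactly when
# x1 = 1, so the exact influence of x0 is 1/2; x2 is irrelevant.
f = lambda x: x[0] & x[1]
print(abs(influence(f, 3, 0) - 0.5) < 0.05)   # True
print(influence(f, 3, 2) == 0.0)              # True
```

By the Kahn et al. identity quoted above, this same estimate approximates Σ_{A: i∈A} f̂(A)² without computing any Fourier coefficients.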

202 | A Guided Tour of Chernoff Bounds - Hagerup, Rüb - 1990 |

182 |
Computational limitations of small-depth circuits
- Håstad
- 1987
Citation Context: ...ρ_{p,D} has been chosen, f↾ρ is a specific deterministic function; the randomness stems entirely from the choice of ρ_{p,D} as described above. The following variant of Håstad's well-known switching lemma [14] follows directly from the argument in Section 4 of [3]: Lemma 11: Let D be a product distribution with parameters μ_i and β as defined above, let f be a CNF formula where each clause has at most t lite...
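The random restriction ρ_{p,D} discussed in this context can be sampled directly. A hedged sketch of the standard construction (the function name and parameter values are illustrative, not the paper's code): each coordinate is independently left free with probability p, and otherwise fixed to a bit drawn from the product distribution D with means μ_i.

```python
import random

def random_restriction(n, p, mu, rng):
    """Sample a restriction rho_{p,D} over n variables: coordinate i is
    left free ('*') with probability p, and otherwise fixed to 1 with
    probability mu[i], i.e. drawn from the product distribution D."""
    rho = []
    for i in range(n):
        if rng.random() < p:
            rho.append('*')                      # variable stays free
        else:
            rho.append(1 if rng.random() < mu[i] else 0)
    return rho

rng = random.Random(0)
rho = random_restriction(8, 0.25, [0.5] * 8, rng)
print(all(v in ('*', 0, 1) for v in rho))        # True: every slot set or free
```

Applying such a ρ to a CNF f fixes some variables and leaves the rest free; the switching lemma bounds the probability that the restricted function still requires a deep decision tree.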

177 |
The Complexity of Finite Functions
- Boppana, Sipser
- 1989
Citation Context: ...size 2^{O((log n)^{1/(d+1)})} monotone circuits (i.e. circuits of the stated size and depth which contain only AND and OR gates). This follows from results of Okol'nishnikova [23] and Ajtai and Gurevich [1] (see also [7], Section 3.6) which show that there are monotone functions which can be computed by AC⁰ circuits but are not computable by AC⁰ circuits which have no negations. 4 Product Distributions: A product dis...

167 | On the learnability of boolean formulae
- Kearns, Li, et al.
- 1987
Citation Context: ...ity distribution. On the other hand, if membership queries are not allowed then a simple reduction shows that PAC learning monotone DNF under any distribution is as hard as PAC learning arbitrary DNF [17]. This equivalence is not preserved for distribution-specific learning, though, and thus it is possible that monotone DNF are efficiently learnable under the uniform distribution while general DNF are...

164 | An efficient membership-query algorithm for learning DNF with respect to the uniform distribution
- Jackson
- 1997
Citation Context: ...implies that Pr[|p̂ − p| > ε] ≤ δ. Our learning model is a distribution-specific version of Valiant's Probably Approximately Correct (PAC) model [25] which has been studied by many researchers, e.g. [4, 6, 9, 10, 11, 13, 15, 19, 20, 21, 22, 26]. Let C be a class of Boolean functions over {0,1}^n, let D be a probability distribution over {0,1}^n, and let f ∈ C be an unknown target function. A learning algorithm A for C takes as input an...

118 | Weakly learning DNF and characterizing statistical query learning using Fourier analysis
- Blum, Furst, et al.
- 1994
Citation Context: ...implies that Pr[|p̂ − p| > ε] ≤ δ. Our learning model is a distribution-specific version of Valiant's Probably Approximately Correct (PAC) model [25] which has been studied by many researchers, e.g. [4, 6, 9, 10, 11, 13, 15, 19, 20, 21, 22, 26]. Let C be a class of Boolean functions over {0,1}^n, let D be a probability distribution over {0,1}^n, and let f ∈ C be an unknown target function. A learning algorithm A for C takes as input an...

82 | Exact learning via the monotone theory - Bshouty - 1993 |

52 |
Learning DNF under the uniform distribution in quasipolynomial time
- Verbeurgt
- 1990
Citation Context: ...ce is not preserved for distribution-specific learning, though, and thus it is possible that monotone DNF are efficiently learnable under the uniform distribution while general DNF are not. Verbeurgt [26] gave an algorithm which can learn polynomial-size DNF (including monotone DNF) under the uniform distribution in time n^{O(log n)}. In the model of weak learning, Kearns et al. [18] showed that the class...

51 |
A Representation of the Joint Distribution of Responses to n Dichotomous Items
- Bahadur
- 1961
Citation Context: ...and a corresponding norm ‖f‖_D = √⟨f, f⟩_D. We refer to this norm as the D-norm. For i = 1, …, n let z_i = (x_i − μ_i)/σ_i. Given A ⊆ [n], let φ_A be defined as φ_A(x) = Π_{i∈A} z_i. As noted by Bahadur [5] and Furst et al. [11], the 2^n functions φ_A form an orthonormal basis for the vector space of real-valued functions on {0,1}^n with respect to the D-norm, i.e. ⟨φ_A, φ_B⟩_D is 1 if A = B and is 0 otherw...
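The orthonormality claim in this context is easy to check numerically. A small sketch (the specific μ_i values are arbitrary choices for the demo): it builds z_i = (x_i − μ_i)/σ_i with σ_i = √(μ_i(1 − μ_i)), forms φ_A(x) = Π_{i∈A} z_i, and verifies that ⟨φ_A, φ_B⟩_D is 1 when A = B and 0 otherwise.

```python
from itertools import product, combinations
from math import sqrt

n = 3
mu = [0.3, 0.5, 0.7]                 # Pr[x_i = 1] under the product distribution D
sigma = [sqrt(m * (1 - m)) for m in mu]

def D(x):
    """Probability of assignment x under D."""
    p = 1.0
    for xi, m in zip(x, mu):
        p *= m if xi == 1 else 1 - m
    return p

def phi(A, x):
    """phi_A(x) = prod over i in A of (x_i - mu_i) / sigma_i."""
    p = 1.0
    for i in A:
        p *= (x[i] - mu[i]) / sigma[i]
    return p

def inner(A, B):
    """<phi_A, phi_B>_D = E_{x ~ D}[phi_A(x) phi_B(x)] by exhaustive sum."""
    return sum(D(x) * phi(A, x) * phi(B, x) for x in product((0, 1), repeat=n))

subsets = [S for r in range(n + 1) for S in combinations(range(n), r)]
for A in subsets:
    for B in subsets:
        expect = 1.0 if A == B else 0.0
        assert abs(inner(A, B) - expect) < 1e-9
print("orthonormal")
```

With μ_i = 1/2 for all i this basis reduces to the usual Fourier (parity) basis over the uniform distribution.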

51 | Fast learning of k-term DNF formulas with queries - Blum, Rudich - 1995 |

49 | Exact learning boolean functions via the monotone theory
- Bshouty
- 1995
Citation Context: ...rned in polynomial time under arbitrary distributions. More recently Sakai and Maruoka [24] gave a polynomial-time algorithm for learning monotone O(log n)-term DNF under the uniform distribution. In [8] Bshouty gave a polynomial-time uniform-distribution algorithm for learning a class which includes monotone O(log n)-term DNF. Later Bshouty and Tamon [10] gave a polynomial-time algorithm for learnin...

48 | On the Fourier spectrum of monotone functions
- Bshouty, Tamon
- 1996
Citation Context: ...)-term DNF under the uniform distribution. In [8] Bshouty gave a polynomial-time uniform-distribution algorithm for learning a class which includes monotone O(log n)-term DNF. Later Bshouty and Tamon [10] gave a polynomial-time algorithm for learning (under any constant-bounded product distribution) a class which includes monotone O(log² n/(log log n)³)-term DNF. 1.2 Our Results: We give an algori...

48 | An O(n^{log log n}) learning algorithm for DNF under the uniform distribution - Mansour - 1995 |

47 |
A switching lemma primer
- Beame
- 1994
Citation Context: ...nction; the randomness stems entirely from the choice of ρ_{p,D} as described above. The following variant of Håstad's well-known switching lemma [14] follows directly from the argument in Section 4 of [3]: Lemma 11: Let D be a product distribution with parameters μ_i and β as defined above, let f be a CNF formula where each clause has at most t literals, and let ρ_{p,D} be a random restriction. Then with p...

38 | More efficient PAC-learning of DNF with membership queries under the uniform distribution
- Bshouty, Jackson, et al.
- 2004
Citation Context: ...implies that Pr[|p̂ − p| > ε] ≤ δ. Our learning model is a distribution-specific version of Valiant's Probably Approximately Correct (PAC) model [25] which has been studied by many researchers, e.g. [4, 6, 9, 10, 11, 13, 15, 19, 20, 21, 22, 26]. Let C be a class of Boolean functions over {0,1}^n, let D be a probability distribution over {0,1}^n, and let f ∈ C be an unknown target function. A learning algorithm A for C takes as input an...

38 | Boosting and hard-core sets
- Klivans, Servedio
- 2003


34 | Monotone versus positive
- Ajtai, Gurevich
- 1987
Citation Context: ...size 2^{O((log n)^{1/(d+1)})} monotone circuits (i.e. circuits of the stated size and depth which contain only AND and OR gates). This follows from results of Okol'nishnikova [23] and Ajtai and Gurevich [1] (see also [7], Section 3.6) which show that there are monotone functions which can be computed by AC⁰ circuits but are not computable by AC⁰ circuits which have no negations. 4 Product Distributions...

28 | Simple learning algorithms using divide and conquer - Bshouty - 1997 |

27 | Improved learning of AC⁰ functions
- Furst, Jackson, et al.
- 1991

25 | Learning sub-classes of monotone dnf on the uniform distribution - Verbeurgt - 1998 |

23 | On using the Fourier transform to learn disjoint DNF
- Khardon
- 1994

21 |
Learning monotone log-term DNF formulas under the uniform distribution
- Sakai, Maruoka
Citation Context: ...niform distribution. It has long been known [25] that DNF formulas with a constant number of terms can be PAC learned in polynomial time under arbitrary distributions. More recently Sakai and Maruoka [24] gave a polynomial-time algorithm for learning monotone O(log n)-term DNF under the uniform distribution. In [8] Bshouty gave a polynomial-time uniform-distribution algorithm for learning a class whic...

18 | On learning monotone DNF formulae under uniform distributions - Kucera, Marchetti-Spaccamela, et al. - 1994 |


16 | A technique for upper bounding the spectral norm with applications to learning
- Bellare
- 1992


12 |
Learning monotone k-µ DNF formulas on product distributions
- Hancock, Mansour
- 1991
Citation Context: ...→ {−1, 1} is monotone if changing the value of an input bit from 0 to 1 never causes the value of f to change from 1 to −1. If D is a distribution and f is a Boolean function on {0,1}^n, then as in [10, 13] we say that the influence of x_i on f with respect to D is the probability that f(x) differs from f(y), where y is x with the i-th bit flipped and x is drawn from D. For ease of notation let f_{i,0} deno...

8 | Queries and concept learning. Machine learning 2(4):319–342 - Angluin - 1988 |

7 | The complexity of learning formulas and decision trees that have restricted reads - Hancock - 1992 |

7 | The complexity of functions - Boppana, Sipser - 1990 |



2 | On the applications of multiplicity automata - Beimel, Bergadano, et al. - 1996 |

1 |
On the influence of negations on the complexity of a realization of monotone Boolean functions by formulas of bounded depth, Metody Diskret
- Okol'nishnikova
- 1982
Citation Context: ...cludes the class of depth d, size 2^{O((log n)^{1/(d+1)})} monotone circuits (i.e. circuits of the stated size and depth which contain only AND and OR gates). This follows from results of Okol'nishnikova [23] and Ajtai and Gurevich [1] (see also [7], Section 3.6) which show that there are monotone functions which can be computed by AC⁰ circuits but are not computable by AC⁰ circuits which have no negatio...

1 | A simple algorithm for learning O(log n)-term DNF - Kushilevitz - 1997