## Unsupervised Learning of Probabilistic Context-Free Grammar using Iterative Biclustering

Citations: 6 (3 self)

### BibTeX

@MISC{Tu_unsupervisedlearning,
  author = {Kewei Tu and Vasant Honavar},
  title = {Unsupervised Learning of Probabilistic Context-Free Grammar using Iterative Biclustering},
  year = {}
}

### Abstract

This paper presents PCFG-BCL, an unsupervised algorithm that learns a probabilistic context-free grammar (PCFG) from positive samples. The algorithm acquires rules of an unknown PCFG through iterative biclustering of bigrams in the training corpus. Our analysis shows that this procedure takes a greedy approach to adding rules, such that each set of rules added to the grammar yields the largest increase in the posterior of the grammar given the training corpus. Results of our experiments on several benchmark datasets show that PCFG-BCL is competitive with existing methods for unsupervised CFG learning.
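The core loop the abstract describes — tabulating bigram counts and greedily extracting a dense bicluster whose rows and columns become new nonterminals — can be sketched roughly as follows. This is a simplified illustration, not the authors' implementation: the toy corpus and function names are invented, and the bicluster score (total covered bigram mass) is a stand-in for the paper's posterior-gain criterion.

```python
from collections import Counter

def bigram_table(corpus):
    """Count adjacent symbol pairs (bigrams) over all sentences."""
    t = Counter()
    for sent in corpus:
        for x, y in zip(sent, sent[1:]):
            t[(x, y)] += 1
    return t

def grow_bicluster(table, seed):
    """Greedily grow a fully dense bicluster (a submatrix of the
    bigram table with no empty cells) around a seed pair."""
    rows, cols = {seed[0]}, {seed[1]}
    lefts = {x for x, _ in table}
    rights = {y for _, y in table}
    changed = True
    while changed:
        changed = False
        for x in lefts - rows:
            if all(table[(x, y)] > 0 for y in cols):
                rows.add(x)
                changed = True
        for y in rights - cols:
            if all(table[(x, y)] > 0 for x in rows):
                cols.add(y)
                changed = True
    return rows, cols

corpus = [
    ["the", "dog", "saw", "a", "cat"],
    ["a", "dog", "saw", "the", "cat"],
    ["the", "cat", "saw", "a", "dog"],
]
table = bigram_table(corpus)
# Greedy step: among all seeds, keep the bicluster covering the most
# bigram occurrences (standing in for the posterior-gain score).
rows, cols = max(
    (grow_bicluster(table, s) for s in table),
    key=lambda rc: sum(table[(x, y)] for x in rc[0] for y in rc[1]),
)
# (rows, cols) corresponds to an AND-OR group: N -> A B, with
# A -> x for each x in rows and B -> y for each y in cols.
print(sorted(rows), sorted(cols))  # ['a', 'the'] ['cat', 'dog']
```

Each extracted bicluster becomes one AND-OR group of rules, its pairs are collapsed to the new nonterminal, and the loop repeats on the rewritten corpus.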

### Citations

382 | The estimation of stochastic context-free grammars using the Inside-Outside algorithm, Computer Speech and Language
- Lari, Young
- 1990
Citation context: ...substitutability heuristic to learn “equivalence classes” (OR symbols). In comparison, our algorithm learns the two kinds of symbols simultaneously in a more unified manner. The inside-outside algorithm [13, 14], one of the earliest algorithms for learning PCFG, assumes a fixed, usually fully connected grammar structure and tries to maximize the likelihood, making it very likely to overfit the training corpus...

316 | Biclustering algorithms for biological data analysis: a survey
- Madeira, Oliveira
- 2004
Citation context: ...{x|A → x} corresponds to a set of rows in the table T, and the set {y|B → y} corresponds to a set of columns in T. Therefore, the AND-OR group that contains N, A and B is represented by a bicluster [10] (i.e., a submatrix) in T, and each pair xy in this bicluster can be reduced to N. See Fig. 1 (a), (b) for an example, where the AND-OR group shown in Fig. 1(a) corresponds to the bicluster shown in Fig...
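The reduction step this snippet describes — collapsing each adjacent pair xy from the bicluster into the new nonterminal N — might look like the following sketch. The helper name and toy data are hypothetical, and the naive left-to-right scan for overlapping pairs is an assumption, not necessarily the paper's policy.

```python
def reduce_corpus(corpus, rows, cols, new_sym="N"):
    """Rewrite each sentence by collapsing every adjacent pair x y
    with x in rows and y in cols into the single symbol new_sym."""
    out = []
    for sent in corpus:
        reduced, i = [], 0
        while i < len(sent):
            if i + 1 < len(sent) and sent[i] in rows and sent[i + 1] in cols:
                reduced.append(new_sym)  # pair lies in the bicluster
                i += 2
            else:
                reduced.append(sent[i])
                i += 1
        out.append(reduced)
    return out

print(reduce_corpus([["the", "dog", "saw", "a", "cat"]],
                    {"the", "a"}, {"dog", "cat"}))
# [['N', 'saw', 'N']]
```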

178 | Corpus-based induction of syntactic structure: models of dependency and constituency
- Klein, Manning
- 2004
Citation context: ...there is no structure search, the prior used tends to concentrate the probability mass on a small number of rules, thereby biasing the learning in favor of compact grammars. Some unsupervised methods [15, 16] for learning grammatical structures other than CFG with the goal of parsing natural language sentences also employ some techniques similar to those used in CFG learning...

73 | The infinite pcfg using hierarchical Dirichlet processes
- Liang, Petrov, et al.
- 2007
Citation context: ...completeness (the ability to find any CFG); and the posterior re-estimation in PCFG-BCL is more straightforward and efficient (by using Eq. 2 and 3). An interesting recent proposal within the Bayesian framework [9] involves maximizing the posterior using a non-parametric model. Although there is no structure search, the prior used tends to concentrate the probability mass on a small number of rules, thereby biasing...

71 | Unsupervised learning of natural languages
- Solan, Horn, et al.
- 2005
Citation context: ...CFGs were converted into CNF with uniform probabilities assigned to the grammar rules. Training corpora were then generated from the resulting grammars. We compared PCFG-BCL with EMILE [1] and ADIOS [5]. Both EMILE and ADIOS produce a CFG from a training corpus, so we again assigned uniform distributions to the rules of the learned CFG in order to evaluate them...
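Assigning uniform probabilities to the rules of a learned CFG, as described here, amounts to giving each rule probability 1/k, where k is the number of rules sharing its left-hand side. A minimal sketch, assuming rules are represented as (lhs, rhs) pairs; the function name and toy grammar are illustrative, not from the paper's datasets.

```python
from collections import Counter

def uniform_pcfg(rules):
    """Give each rule probability 1/k, where k is the number of
    rules with the same left-hand side."""
    k = Counter(lhs for lhs, _ in rules)
    return {(lhs, rhs): 1.0 / k[lhs] for lhs, rhs in rules}

# Toy CNF-style grammar (illustrative only).
rules = [
    ("S", ("NP", "VP")),
    ("NP", ("the", "dog")),
    ("NP", ("a", "cat")),
    ("VP", ("barks",)),
]
probs = uniform_pcfg(rules)
print(probs[("NP", ("the", "dog"))])  # 0.5
```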

54 | Bayesian grammar induction for language modeling
- Chen
- 1995
Citation context: ...to maximize the likelihood, making it very likely to overfit the training corpus. Subsequent work has adopted the Bayesian framework to maximize the posterior of the learned grammar given the corpus [6, 7], and has incorporated grammar structure search [6, 8]. Our choice of prior over the set of candidate grammars is inspired by [6]. However, compared with the approach used in [6], PCFG-BCL adds more grammar rules...

49 | Unsupervised induction of stochastic context-free grammars using distributional clustering
- Clark
- 2001
Citation context: ...kinds of rules. ABL [2] employs the substitutability heuristic to group possible constituents to nonterminals. Clark’s algorithm [4] uses the “substitution-graph” heuristic or distributional clustering [3] to induce new nonterminals and rules. These techniques could be less robust than the biclustering method, especially in the presence of ambiguity as discussed in Section 1 and also in [1]. Both ABL and...

40 | Trainable Grammars for Speech Recognition. Speech Communication Papers for the 97th Meeting of the Acoustical Society of America
- Baker
- 1979
Citation context: ...substitutability heuristic to learn “equivalence classes” (OR symbols). In comparison, our algorithm learns the two kinds of symbols simultaneously in a more unified manner. The inside-outside algorithm [13, 14], one of the earliest algorithms for learning PCFG, assumes a fixed, usually fully connected grammar structure and tries to maximize the likelihood, making it very likely to overfit the training corpus...

38 | ABL: Alignment-Based Learning
- Zaanen
Citation context: ...− ∑_{x∈A,y∈B} a_xy log a_xy + (∑_{p∈EC-row} r′_p log r′_p + ∑_{q∈EC-col} c′_q log c′_q − s′ log s′ − ∑_{p∈EC-row, q∈EC-col} a′_pq log a′_pq) + α(4 ∑_{x∈A,y∈B} a_xy − 2|A| − 2|B| − 8) (2), where LP_G(BC) denotes the logarithmic posterior gain resulting from extraction of the bicluster BC; α is a parameter in the prior that specifies how much the prior favors compact grammars, and...

31 | An all-subtrees approach to unsupervised parsing
- Bod
Citation context: ...there is no structure search, the prior used tends to concentrate the probability mass on a small number of rules, thereby biasing the learning in favor of compact grammars. Some unsupervised methods [15, 16] for learning grammatical structures other than CFG with the goal of parsing natural language sentences also employ some techniques similar to those used in CFG learning...

25 | Variational Bayesian grammar induction for natural language
- Kurihara, Sato
- 2006
Citation context: ...overfit the training corpus. Subsequent work has adopted the Bayesian framework to maximize the posterior of the learned grammar given the corpus [6, 7], and has incorporated grammar structure search [6, 8]. Our choice of prior over the set of candidate grammars is inspired by [6]. However, compared with the approach used in [6], PCFG-BCL adds more grammar rules at each step without sacrificing completeness...

23 | An application of the variational bayesian approach to probabilistic contextfree grammars
- Kurihara, Sato
- 2004
Citation context: ...to maximize the likelihood, making it very likely to overfit the training corpus. Subsequent work has adopted the Bayesian framework to maximize the posterior of the learned grammar given the corpus [6, 7], and has incorporated grammar structure search [6, 8]. Our choice of prior over the set of candidate grammars is inspired by [6]. However, compared with the approach used in [6], PCFG-BCL adds more grammar rules...

18 | Towards high speed grammar induction on large text corpora
- Adriaans, Trautwein, et al.
Citation context: ...especially in the presence of ambiguity, e.g., when a symbol can be reduced to different nonterminals in different contexts, or when a context can contain symbols of different nonterminals, as illustrated in [1]. PCFG-BCL can be understood within a Bayesian structure search framework. Specifically, it uses a greedy approach to adding rules to a partially constructed grammar, choosing at each step a set of rules...

7 | Learning deterministic context free grammars: The Omphalos competition
- Clark
- 2007
Citation context: ...In contrast, PCFG-BCL performs iterative biclustering that finds both kinds of rules. ABL [2] employs the substitutability heuristic to group possible constituents to nonterminals. Clark’s algorithm [4] uses the “substitution-graph” heuristic or distributional clustering [3] to induce new nonterminals and rules. These techniques could be less robust than the biclustering method, especially in the presence...

1 | ftp://ftp.icsi.berkeley.edu/pub/ai/stolcke/software/boogie.shar.z
- Stolcke
- 1993
Citation context: ...we again assigned uniform distributions to the rules of the learned CFG in order to evaluate them.

| Grammar Name | Size (in CNF) | Recursion | Source |
| --- | --- | --- | --- |
| Num-agr | 19 terminals, 15 nonterminals, 30 rules | No | Boogie [12] |
| Langley1 | 9 terminals, 9 nonterminals, 18 rules | Yes | Boogie [12] |
| Langley2 | 8 terminals, 9 nonterminals, 14 rules | Yes | Boogie [12] |
| Emile2k | 29 terminals, 15 nonterminals, 42 rules | Yes | EMILE [1] |
| TA1 | 47 terminals, ... | | |