## Variational Bayesian grammar induction for natural language (2006)

### Download Links

- [mi.cs.titech.ac.jp]
- [sato-www.cs.titech.ac.jp]
- DBLP

Venue: International Colloquium on Grammatical Inference

Citations: 25 (0 self)

### BibTeX

@INPROCEEDINGS{Kurihara06variationalbayesian,
  author    = {Kenichi Kurihara and Taisuke Sato},
  title     = {Variational Bayesian grammar induction for natural language},
  booktitle = {International Colloquium on Grammatical Inference},
  year      = {2006},
  pages     = {84--96}
}

### Abstract

This paper presents a new grammar induction algorithm for probabilistic context-free grammars (PCFGs). One approach to PCFG induction is based on parameter estimation; following this approach, we apply variational Bayes (VB), an approximation of Bayesian learning, to PCFGs. It has been shown empirically that VB is less likely to cause overfitting, and the free energy of VB has been used successfully for model selection. Our algorithm can be seen as a generalization of previously proposed PCFG induction algorithms. In the experiments, we show empirically that the induced grammars achieve better parsing results than those of other PCFG induction algorithms, and based on these results we give examples of recursive grammatical structures found by the proposed algorithm.
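The core VB step the abstract alludes to can be sketched as follows. This is a minimal illustration, not the paper's exact algorithm: it assumes a symmetric Dirichlet prior over each nonterminal's rule distribution and takes the expected rule counts (normally produced by an Inside-Outside E-step) as given; the function names and data layout are invented for the example.

```python
import math

def digamma(x):
    """Digamma via the recurrence psi(x) = psi(x+1) - 1/x plus an
    asymptotic series once the argument is large enough."""
    r = 0.0
    while x < 6.0:
        r -= 1.0 / x
        x += 1.0
    f = 1.0 / (x * x)
    return r + math.log(x) - 0.5 / x - f * (1.0 / 12 - f * (1.0 / 120 - f / 252))

def vb_rule_weights(expected_counts, alpha=1.0):
    """One VB M-step for a PCFG: for each nonterminal, the variational
    posterior over its rule distribution is Dirichlet(alpha + counts),
    and the weights fed to the next E-step are the sub-normalized
    exp(psi(alpha + c_r) - psi(sum over rules)).

    expected_counts: {nonterminal: {rule: expected count}}
    """
    weights = {}
    for nt, counts in expected_counts.items():
        total = digamma(alpha * len(counts) + sum(counts.values()))
        weights[nt] = {r: math.exp(digamma(alpha + c) - total)
                       for r, c in counts.items()}
    return weights
```

Unlike the EM/Inside-Outside update, these weights sum to strictly less than one for each nonterminal; that mass deficit is what penalizes rules with little evidence and makes VB less prone to overfitting.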

### Citations

382 citations: The estimation of stochastic context-free grammars using the Inside-Outside algorithm. Lari, Young (1990), Computer Speech and Language.

Citation context: "...supports that induced grammars are well-organized, we give the examples of grammatically meaningful structures in induced grammars. 2 Parameter-Estimation-Based Grammar Induction Since Lari and Young [11] empirically showed the possibility of statistical induction of PCFGs using the Inside-Outside algorithm [2], parameter-estimation-based grammar induction has received a great deal of attention. Largel..."
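As a concrete illustration of the machinery this entry refers to, here is a minimal sketch of the inside pass (the bottom-up half of the Inside-Outside algorithm) for a PCFG in Chomsky normal form; the function name and data layout are made up for the example.

```python
from collections import defaultdict

def inside_probs(words, lexical, binary, start="S"):
    """Inside probabilities beta[(i, j, A)] = P(A =>* words[i:j]) for a
    PCFG in Chomsky normal form, computed by CKY-style dynamic
    programming.

    lexical: {(A, word): prob} for preterminal rules A -> word
    binary:  {A: [(B, C, prob), ...]} for rules A -> B C
    Returns the probability of the whole sentence under `start`.
    """
    n = len(words)
    beta = defaultdict(float)
    # Width-1 spans come from the lexical rules.
    for i, w in enumerate(words):
        for (A, word), p in lexical.items():
            if word == w:
                beta[(i, i + 1, A)] += p
    # Wider spans combine two adjacent inside probabilities.
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for A, rules in binary.items():
                for B, C, p in rules:
                    for k in range(i + 1, j):
                        beta[(i, j, A)] += p * beta[(i, k, B)] * beta[(k, j, C)]
    return beta[(0, n, start)]
```

The outside pass and the expected rule counts used for reestimation build on exactly these quantities.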

282 citations: Discriminative reranking for natural language parsing. Collins.

Citation context: "...parse, r*, is given by summing out parameters: r* = arg max_{r ∈ Φ(x)} ∫ dθ p(r, x|θ) q(θ|u*) (Eqn. 14). Note that it is impossible to apply Viterbi-style parsing to Eqn. 14. We therefore exploit reranking [5]. First, 10 Viterbi parses with θ̂ are collected, then the most likely derivation is chosen by calculating Eqn. 14 for each derivation. 5.1 Comparison with Other Grammar Induction We compared our algor..."
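The reranking step described in this context can be sketched as follows. This is only an illustration under assumptions not in the source: each candidate Viterbi derivation is represented by its rule-usage counts, and "summing out parameters" is done in closed form for a Dirichlet posterior over each nonterminal's rule probabilities (the multinomial-Dirichlet marginal), which stands in for the paper's Eqn. 14; all names are invented.

```python
import math

def summed_out_logscore(derivation_counts, posterior):
    """Log score of one derivation with rule probabilities integrated
    out under a per-nonterminal Dirichlet posterior.

    posterior:         {nonterminal: {rule: pseudo-count u}}
    derivation_counts: {nonterminal: {rule: times used in this parse}}
    """
    logp = 0.0
    for nt, counts in derivation_counts.items():
        u = posterior[nt]
        total = sum(u.values())
        used = sum(counts.values())
        # Multinomial-Dirichlet marginal: Gamma ratios over the
        # nonterminal's rule distribution.
        logp += math.lgamma(total) - math.lgamma(total + used)
        for r, c in counts.items():
            logp += math.lgamma(u[r] + c) - math.lgamma(u[r])
    return logp

def rerank(candidates, posterior):
    """Pick the candidate derivation with the highest summed-out score
    (the candidates would be the 10 Viterbi parses)."""
    return max(candidates, key=lambda d: summed_out_logscore(d, posterior))
```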

278 citations: Trainable grammars for speech recognition. Baker (1979).

Citation context: "...edures. One is to estimate parameters, and the other is to choose a better grammatical structure based on a criterion. In previous work, parameter estimation is done with the Inside-Outside algorithm [2], which is an EM algorithm for PCFGs, and grammars are chosen by an approximate Bayesian posterior probability [16, 4]. Our algorithm can be seen as a Bayesian extension of parameter-estimation-base..."

278 citations: Inside-outside reestimation from partially bracketed corpora. Pereira, Schabes (1992).

Citation context: "...g grammar induction. There is one practical approach to induce context-free grammars in natural language processing, which exploits parameter estimation of probabilistic context-free grammars (PCFGs) [13, 15, 4, 7]. Although this approach is not optimal, the empirical results showed good performance, e.g. over 90% bracketing accuracy on the Wall Street Journal [15, 7]. They also utilize bracketed sentences. Br..."

199 citations: A Variational Bayesian Framework for Graphical Models. Attias (2000).

Citation context: "...orithm proposed by Hogenhout and Matsumoto [7] and the generalization of Bayesian induction studied by Stolcke and Omohundro [16], Chen [4]. 3 Variational Bayesian Learning The variational Bayes (VB) [1, 6] has succeeded in many applications [12, 17]. It is empirically shown that VB is less likely to cause overfitting than the EM algorithm, and the free energy calculated by VB can be exploited as a crit..."

178 citations: Corpus-based induction of syntactic structure: models of dependency and constituency. Klein, Manning (2004).

Citation context: "...onstituents. One may criticize using brackets because making brackets has been expensive in terms of time and cost. However, unsupervised induction algorithms have been proposed to annotate brackets [8, 9]. This paper presents a variational Bayesian PCFG induction algorithm. Parameter-estimation-based induction has mainly two procedures. One is to estimate parameters, and the other is to choose a better..."

155 citations: Variational inference for Bayesian mixture of factor analysers. Ghahramani, Beal (2000).

Citation context: "...orithm proposed by Hogenhout and Matsumoto [7] and the generalization of Bayesian induction studied by Stolcke and Omohundro [16], Chen [4]. 3 Variational Bayesian Learning The variational Bayes (VB) [1, 6] has succeeded in many applications [12, 17]. It is empirically shown that VB is less likely to cause overfitting than the EM algorithm, and the free energy calculated by VB can be exploited as a crit..."

134 citations: Inducing probabilistic grammars by Bayesian model merging. Stolcke, Omohundro (1994).

Citation context: "...ion. In previous work, parameter estimation is done with the Inside-Outside algorithm [2], which is an EM algorithm for PCFGs, and grammars are chosen by an approximate Bayesian posterior probability [16, 4]. Our algorithm can be seen as a Bayesian extension of parameter-estimation-based grammar induction. Moreover, our criterion to choose a grammar generalizes the approximate Bayesian posterior. We ex..."

97 citations: A generative constituent-context model for improved grammar induction. Klein, Manning (2002).

83 citations: Ensemble learning for hidden Markov models. MacKay (1997).

Citation context: "...[7] and the generalization of Bayesian induction studied by Stolcke and Omohundro [16], Chen [4]. 3 Variational Bayesian Learning The variational Bayes (VB) [1, 6] has succeeded in many applications [12, 17]. It is empirically shown that VB is less likely to cause overfitting than the EM algorithm, and the free energy calculated by VB can be exploited as a criterion of model selection. (Footnote 1: The minimum d...)"

54 citations: Bayesian grammar induction for language modeling. Chen (1995).

Citation context: "...g grammar induction. There is one practical approach to induce context-free grammars in natural language processing, which exploits parameter estimation of probabilistic context-free grammars (PCFGs) [13, 15, 4, 7]. Although this approach is not optimal, the empirical results showed good performance, e.g. over 90% bracketing accuracy on the Wall Street Journal [15, 7]. They also utilize bracketed sentences. Br..."

43 citations: Parsing the Wall Street Journal with the Inside-Outside Algorithm. Schabes (1993).

Citation context: "...g grammar induction. There is one practical approach to induce context-free grammars in natural language processing, which exploits parameter estimation of probabilistic context-free grammars (PCFGs) [13, 15, 4, 7]. Although this approach is not optimal, the empirical results showed good performance, e.g. over 90% bracketing accuracy on the Wall Street Journal [15, 7]. They also utilize bracketed sentences. Br..."

42 citations: On-line model selection based on the variational Bayes. Sato (2001).

Citation context: "...test corpora. However, this might be intractable when test corpora are very large. On-line learning is another approach. It is straightforward to apply on-line learning based on the variational Bayes [14]. This would turn the proposed algorithm into an efficient incremental semi-supervised induction algorithm. 6 Discussion: Induced Grammars So far, we have shown the parsing results of our grammar indu..."

23 citations: An application of the variational Bayesian approach to probabilistic context-free grammars. Kurihara, Sato (2004).

Citation context: "...gorithm 4.1 Variational Bayes for PCFGs In our previous work, we proposed a VB algorithm for PCFGs, and empirically showed that VB is less likely to cause overfitting than the Inside-Outside algorithm [10]. In this section, we briefly explain the VB for PCFGs, then derive the free energy as a criterion to search for a grammatical structure. Let G = (V_N, V_T, R, S, θ) be a PCFG, where V_N, V_T, R and θ are the s..."

12 citations: Refining the structure of a stochastic context-free grammar. Bockhorst, Craven (2001).

Citation context: "...as Chomsky's "the poverty of the stimulus" says. Nonetheless, applications have already been proposed. For example, van Zaanen [18] applied grammar induction to build treebanks. Bockhorst and Craven [3] improved models of RNA sequences using grammar induction. There is one practical approach to induce context-free grammars in natural language processing, which exploits parameter estimation of probab..."

2 citations: Bayesian model search for mixture models based on optimizing variational bounds. Ueda, Ghahramani (2002).

1 citation: A fast method for statistical grammar induction. Hogenhout, Matsumoto (1998).

1 citation: ABL: Alignment-based learning. van Zaanen (2000).

Citation context: "...tion is one of the most challenging tasks in natural language processing, as Chomsky's "the poverty of the stimulus" says. Nonetheless, applications have already been proposed. For example, van Zaanen [18] applied grammar induction to build treebanks. Bockhorst and Craven [3] improved models of RNA sequences using grammar induction. There is one practical approach to induce context-free grammars in nat..."