## Language Modeling with Tree Substitution Grammars

Citations: 6 (1 self)

### BibTeX

@MISC{Post_languagemodeling,

author = {Matt Post and Daniel Gildea},

title = {Language Modeling with Tree Substitution Grammars},

year = {}

}

### Abstract

We show that a tree substitution grammar (TSG) induced with a collapsed Gibbs sampler results in lower perplexity on test data than both a standard context-free grammar and other heuristically trained TSGs, suggesting that it is better suited to language modeling. Training a more complicated bilexical parsing model across TSG derivations shows further (though nuanced) improvement. We conduct analysis and point to future areas of research using TSGs as language models.

### Citations

1023 | Head-driven Statistical Models for Natural Language Parsing
- Collins
- 1999
Citation Context ...ures behind the text. In this section, we explore this question, using the same grammars from the previous section to train a bilexicalized, Markovized parser. This model is based on Collins Model 1 (Collins, 1999) and is similar to Charniak’s bihead model (Charniak, 2001). The generative model proceeds as follows: given nonterminal P (initially the top-level symbol), we 1. generate the head word and tag (h,...

301 | Learning accurate, compact, and interpretable tree annotation
- Petrov, Barrett, et al.
- 2006
Citation Context ...ur vocabulary the set of 23,767 case-sensitive tokens appearing more than once in the training data. All other tokens were converted to a set of eighty unknown word classes based on surface features (Petrov et al., 2006). Trace nodes and node annotations (e.g., temporal, subject, and locative markers) were removed from all of our data. Underlying our experiments are three grammars: ...

134 | A smorgasbord of features for statistical machine translation
- Och, Gildea, et al.
- 2004
Citation Context ... Table 2 contains results of an additional experiment. A number of research groups have shown that PCFGs are not very helpful in improving BLEU scores for machine translation (Charniak et al., 2003; Och et al., 2004; Post and Gildea, 2008). Furthermore, they do not even appear to be very useful in distinguishing grammatical from ungrammatical text. Cherry and Quirk (2008) used model scores produced by a maximum-...

128 | Exploiting Syntactic Structure for Language Modeling
- Chelba, Jelinek
- 1998
Citation Context ...ting that perplexity scores for all the grammars are well above the n-gram baselines. This is in contrast to previous work on syntax-based language modeling which has improved upon a trigram baseline (Chelba and Jelinek, 1998; Roark, 2001; Charniak, 2001). It is difficult to compare directly to this body of work: the vocabularies used were much smaller (10K), punctuation was removed and numbers were collapsed to a single ...

93 | Efficient parsing for bilexical context-free grammars and head automaton grammars
- Eisner, Satta
- 1999
Citation Context ...exity of induced tree substitution grammars in the standard CFG model is encouraging, because parsing in that model is cubic in the size of the input, as opposed to being O(n^4) for bilexical parsing (Eisner and Satta, 1999). With the perplexity scores of these TSG grammars under the simple parsing model approaching those of the more complicated bilexicalized parsing models, this kind of modeling could be feasible for a...

54 | A Bayesian Framework for Word Segmentation: Exploring the Effects of Context
- Goldwater, Griffiths, et al.
- 2009

34 | Bayesian learning of a tree substitution grammar
- Post, Gildea
- 2009
Citation Context ...y a number of groups have had success parsing with tree substitution grammars (TSGs) that were induced from the Penn Treebank with collapsed Gibbs samplers in a Bayesian framework (Cohn et al., 2009; Post and Gildea, 2009). Compared to past heuristic approaches, these grammars are compact and intuitive, have a more natural distribution over rule size, and perform well on parsing accuracy relative to the Treebank gramm...

29 | Syntax-based language models for machine translation
- Charniak, Knight, et al.
- 2003
Citation Context ...r for language modeling. Table 2 contains results of an additional experiment. A number of research groups have shown that PCFGs are not very helpful in improving BLEU scores for machine translation (Charniak et al., 2003; Och et al., 2004; Post and Gildea, 2008). Furthermore, they do not even appear to be very useful in distinguishing grammatical from ungrammatical text. Cherry and Quirk (2008) used model scores prod...

17 | Discriminative, syntactic language modeling through latent SVMs
- Cherry, Quirk
- 2008

17 | Inducing compact but accurate tree-substitution grammars
- Cohn, Goldwater, et al.
- 2009

8 | Parsers as language models for statistical machine translation
- Post, Gildea
- 2008
Citation Context ... results of an additional experiment. A number of research groups have shown that PCFGs are not very helpful in improving BLEU scores for machine translation (Charniak et al., 2003; Och et al., 2004; Post and Gildea, 2008). Furthermore, they do not even appear to be very useful in distinguishing grammatical from ungrammatical text. Cherry and Quirk (2008) used model scores produced by a maximum-likelihood estimated pa...