Three Dependency-and-Boundary Models for Grammar Induction
Cited by 1 (1 self)
We present a new family of models for unsupervised parsing, Dependency and Boundary models, that use cues at constituent boundaries to inform head-outward dependency tree generation. We build on three intuitions that are explicit in phrase-structure grammars but only implicit in standard dependency formulations: (i) Distributions of words that occur at sentence boundaries, such as English determiners, resemble constituent edges. (ii) Punctuation at sentence boundaries further helps distinguish full sentences from fragments like headlines and titles, allowing us to model grammatical differences between complete and incomplete sentences. (iii) Sentence-internal punctuation boundaries help with longer-distance dependencies, since punctuation correlates with constituent edges. Our models induce state-of-the-art dependency grammars for many languages without special knowledge of optimal input sentence lengths or biased, manually-tuned initializers.
Capitalization Cues Improve Dependency Grammar Induction (2012)
- In WILS
Cited by 1 (1 self)
We show that orthographic cues can be helpful for unsupervised parsing. In the Penn Treebank, transitions between upper- and lowercase tokens tend to align with the boundaries of base (English) noun phrases. Such signals can be used as partial bracketing constraints to train a grammar inducer: in our experiments, directed dependency accuracy increased by 2.2% (average over 14 languages having case information). Combining capitalization with punctuation-induced constraints in inference further improved parsing performance, attaining state-of-the-art levels for many languages.
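The idea of reading case transitions as candidate bracket boundaries can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `is_capitalized` and `case_spans` are hypothetical, the rule used here (a boundary wherever the capitalization of adjacent tokens differs) is a simplification, and it ignores complications such as sentence-initial capitalization.

```python
from typing import List, Tuple


def is_capitalized(token: str) -> bool:
    """Return True if the token's first character is an uppercase letter."""
    return token[:1].isupper()


def case_spans(tokens: List[str]) -> List[Tuple[int, int]]:
    """Split a token sequence into maximal runs of same-case tokens.

    Each run is returned as a half-open (start, end) index pair. In this
    simplified sketch, every run is a candidate bracket that a grammar
    inducer could treat as a partial bracketing constraint.
    """
    spans: List[Tuple[int, int]] = []
    start = 0
    for i in range(1, len(tokens) + 1):
        # Close the current run at the end of the sentence or at a
        # transition between capitalized and non-capitalized tokens.
        at_end = i == len(tokens)
        if at_end or is_capitalized(tokens[i]) != is_capitalized(tokens[start]):
            spans.append((start, i))
            start = i
    return spans


tokens = ["the", "New", "York", "Stock", "Exchange", "fell"]
print(case_spans(tokens))  # [(0, 1), (1, 5), (5, 6)]
```

In this toy sentence the capitalized run at indices 1 to 4 ("New York Stock Exchange") is isolated as one span, which is the kind of alignment with noun-phrase boundaries the abstract describes.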

