Unsupervised Dependency Parsing without Gold Part-of-Speech Tags
BibTeX
@MISC{Spitkovsky_unsuperviseddependency,
author = {Valentin I. Spitkovsky and Angel X. Chang and Hiyan Alshawi and Daniel Jurafsky},
title = {Unsupervised Dependency Parsing without Gold Part-of-Speech Tags},
year = {}
}
Abstract
We show that categories induced by unsupervised word clustering can surpass the performance of gold part-of-speech tags in dependency grammar induction. Unlike classic clustering algorithms, our method allows a word to have different tags in different contexts. In an ablative analysis, we first demonstrate that this context-dependence is crucial to the superior performance of gold tags: requiring a word to always have the same part-of-speech significantly degrades the performance of manual tags in grammar induction, eliminating the advantage that human annotation has over unsupervised tags. We then introduce a sequence modeling technique that combines the output of a word clustering algorithm with context-colored noise, to allow words to be tagged differently in different contexts. With these new induced tags as input, our state-of-the-art dependency grammar inducer achieves 59.1% directed accuracy on Section 23 (all sentences) of the Wall Street Journal (WSJ) corpus, 0.7% higher than using gold tags.
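
The ablation described in the abstract, in which every word is required to keep a single part-of-speech, can be illustrated with a small sketch. The abstract does not say how the single tag is chosen; the version below assumes each word is assigned its most frequent gold tag, which is one plausible reading and not necessarily the authors' procedure. The function names and the toy corpus are hypothetical.

```python
from collections import Counter, defaultdict


def most_frequent_tag_map(tagged_sentences):
    """Map each word to the tag it carries most often in the corpus.

    Assumption: "one tag per word" is realized by picking the most
    frequent gold tag; the abstract does not specify this choice.
    """
    counts = defaultdict(Counter)
    for sentence in tagged_sentences:
        for word, tag in sentence:
            counts[word][tag] += 1
    return {word: tag_counts.most_common(1)[0][0]
            for word, tag_counts in counts.items()}


def collapse_to_one_tag_per_word(tagged_sentences):
    """Re-tag the corpus so that every word type has exactly one tag."""
    tag_of = most_frequent_tag_map(tagged_sentences)
    return [[(word, tag_of[word]) for word, _ in sentence]
            for sentence in tagged_sentences]


# Hypothetical toy corpus: "can" is a modal twice and a noun once.
toy_corpus = [
    [("they", "PRP"), ("can", "MD"), ("fish", "VB")],
    [("we", "PRP"), ("can", "MD"), ("swim", "VB")],
    [("open", "VB"), ("the", "DT"), ("can", "NN")],
]

collapsed = collapse_to_one_tag_per_word(toy_corpus)
```

In the collapsed corpus the noun reading of "can" in the third sentence is overwritten by the modal tag, which is the kind of lost context-dependence the abstract credits for the degraded performance of manual tags.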







