@MISC{_unsuperviseddiscovery,
  author = {},
  title  = {Unsupervised Discovery of a Statistical Verb Lexicon},
  year   = {}
}
Abstract
This paper demonstrates how unsupervised techniques can be used to learn models of deep linguistic structure. Disambiguating the semantic roles of a verb’s dependents is a necessary natural language understanding task, and the supervised approach, while successful, is inherently limited by the sparsity of observations in the training set. We present an unsupervised method for learning models of verb behavior directly from unannotated text corpora. Our learned lexicons are similar to resources such as VerbNet and PropBank, but also include statistics about the linkings allowed by each verb and the head words in each role. Our method is based on a structured probabilistic model of the domain, and learning is performed with the EM algorithm. We evaluate performance in two ways. First, we evaluate the learned model as a semantic role labeler, relative to the PropBank annotation, and find that it closes 30% of the precision gap between an informed baseline and the best supervised system. Second, we inspect the linking models learned for each verb and show that the model learns the correct linking behavior.
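The abstract describes a structured probabilistic model in which each clause's mapping from syntactic positions to semantic roles (a "linking") is a hidden variable, learned with EM from unannotated text. The sketch below is a toy illustration of that idea, not the paper's actual model: the verb, the two candidate linkings, the data, and the smoothing constant are all invented for the example.

```python
from collections import defaultdict

# Toy unlabeled data: (verb, {syntactic_position: head_word}).
# Illustrative only; "break" alternates between a transitive and an
# inchoative frame, so the subject's role is ambiguous in isolation.
DATA = [
    ("break", {"subj": "boy", "obj": "window"}),
    ("break", {"subj": "window"}),   # inchoative: subject is the patient
    ("break", {"subj": "girl", "obj": "vase"}),
    ("break", {"subj": "vase"}),
]

# Candidate linkings: maps from syntactic position to semantic role.
LINKINGS = [
    {"subj": "agent", "obj": "patient"},   # transitive linking
    {"subj": "patient"},                   # inchoative linking
]

def em(data, linkings, iters=20, smooth=0.1):
    vocab = {w for _, deps in data for w in deps.values()}
    roles = {r for l in linkings for r in l.values()}
    # Initialize uniform linking prior and role -> head-word distributions.
    prior = [1.0 / len(linkings)] * len(linkings)
    emit = {r: {w: 1.0 / len(vocab) for w in vocab} for r in roles}
    for _ in range(iters):
        lc = [smooth] * len(linkings)                    # expected linking counts
        wc = {r: defaultdict(lambda: smooth) for r in roles}  # expected word counts
        for verb, deps in data:
            # E-step: posterior over linkings for this clause.
            scores = []
            for i, l in enumerate(linkings):
                if set(deps) != set(l):  # linking must cover the observed positions
                    scores.append(0.0)
                    continue
                p = prior[i]
                for pos, w in deps.items():
                    p *= emit[l[pos]][w]
                scores.append(p)
            z = sum(scores) or 1.0
            post = [s / z for s in scores]
            # Accumulate expected counts under the posterior.
            for i, l in enumerate(linkings):
                lc[i] += post[i]
                for pos, w in deps.items():
                    if pos in l:
                        wc[l[pos]][w] += post[i]
        # M-step: renormalize counts into probabilities.
        prior = [c / sum(lc) for c in lc]
        for r in roles:
            tot = sum(wc[r][w] for w in vocab)
            emit[r] = {w: wc[r][w] / tot for w in vocab}
    return prior, emit

prior, emit = em(DATA, LINKINGS)
```

After training, `emit["patient"]` concentrates on "window" and "vase" (which appear both as transitive objects and inchoative subjects), while `emit["agent"]` concentrates on "boy" and "girl" — the kind of per-verb linking statistics the learned lexicon records.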