Novel Estimation Methods for Unsupervised Discovery of Latent Structure in Natural Language Text (2006)
Citations: 20 (7 self)
BibTeX
@TECHREPORT{Smith06novelestimation,
author = {Noah Ashton Smith},
title = {Novel Estimation Methods for Unsupervised Discovery of Latent Structure in Natural Language Text},
institution = {},
year = {2006}
}
Abstract
This thesis is about estimating probabilistic models to uncover useful hidden structure in data; specifically, we address the problem of discovering syntactic structure in natural language text. We present three new parameter estimation techniques that generalize the standard approach, maximum likelihood estimation, in different ways. Contrastive estimation maximizes the conditional probability of the observed data given a “neighborhood” of implicit negative examples. Skewed deterministic annealing locally maximizes likelihood using a cautious parameter search strategy that starts with an easier optimization problem than likelihood, and iteratively moves to harder problems, culminating in likelihood. Structural annealing is similar, but starts with a heavy bias toward simple syntactic structures and gradually relaxes the bias. Our estimation methods do not make use of annotated examples. We consider their performance in both an unsupervised model selection setting, where models trained under different initialization and regularization settings are compared by evaluating the training objective on a small set of unseen, unannotated development data, and supervised model selection, where the most accurate model on the development set (now with annotations)







