ParaMor: From Paradigm Structure to Natural Language Morphology Induction (2008)
Citations: 4 (0 self)
BibTeX
@TECHREPORT{Monson08paramor:from,
author = {Christian Monson and Lori Levin},
title = {{ParaMor}: From Paradigm Structure to Natural Language Morphology Induction},
institution = {},
year = {2008}
}
Abstract
Most of the world’s natural languages have complex morphology, but the expense of building morphological analyzers by hand has prevented the development of morphological analysis systems for the large majority of languages. Unsupervised induction techniques, which learn from unannotated text data, can facilitate the development of computational morphology systems for new languages. Such unsupervised morphological analysis systems have been shown to help natural language processing tasks including speech recognition (Creutz, 2006) and information retrieval (Kurimo and Turunen, 2008). This thesis describes ParaMor, an unsupervised induction algorithm for learning morphological paradigms from large collections of words in any natural language. Paradigms are sets of mutually substitutable morphological operations that organize the inflectional morphology of natural languages. ParaMor focuses on the most common morphological process, suffixation. ParaMor learns paradigms in a three-step algorithm. First, a recall-centric search scours a space of candidate partial paradigms for those which possibly model suffixes of true paradigms. Second, ParaMor merges selected candidates that appear to model portions







