Weighting Finite-State Morphological Analyzers using HFST Tools
2010
Abstract
In a language with very productive compounding and a rich inflectional system, such as Finnish, new words are to a large extent formed by compounding. Lindén and Pirinen [7] found a probabilistic strategy to be effective for disambiguating between the possible compound segmentations. In this article, we present a method for implementing that probabilistic framework as a separate process, which can be combined with a lexical transducer through composition to create a weighted morphological analyzer. To implement the analyzer, we use HFST-LexC and related command-line tools, which are part of the open-source Helsinki Finite-State Technology (HFST) package. Using Finnish as a test language, we show how to use the weighted finite-state lexicon to build a simple unigram tagger with 97% precision for Finnish words and word segments belonging to the vocabulary of the lexicon.
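The abstract does not give the weighting formula. A common way to turn unigram frequencies into finite-state weights is the tropical semiring. In that scheme each segment receives the weight -log(count / total), weights add along a path, and the lowest-weight path is preferred. The Python sketch below assumes that scheme and uses hypothetical segment counts. It enumerates segmentations directly instead of building and composing HFST transducers, so it illustrates only the scoring idea and not the HFST-LexC workflow described in the article.

```python
import math
from functools import lru_cache

# Hypothetical unigram counts for lexicon segments (illustrative only, not data from the article).
COUNTS = {"kello": 50, "taulu": 30, "kellotaulu": 40}
TOTAL = sum(COUNTS.values())


def weight(segment):
    """Tropical weight of a segment: the negative log of its relative frequency."""
    return -math.log(COUNTS[segment] / TOTAL)


def best_segmentation(word):
    """Return (total_weight, segments) for the cheapest split of `word` into
    known segments, or None if no split uses only lexicon entries."""

    @lru_cache(maxsize=None)
    def best(start):
        # An empty suffix costs nothing (the tropical "one" is 0.0).
        if start == len(word):
            return (0.0, ())
        candidates = []
        for end in range(start + 1, len(word) + 1):
            segment = word[start:end]
            if segment in COUNTS:
                rest = best(end)
                if rest is not None:
                    # Weights add along a path, i.e. probabilities multiply.
                    candidates.append((weight(segment) + rest[0], (segment,) + rest[1]))
        # Keep the lowest-weight (most probable) path, as a tropical "plus" would.
        return min(candidates) if candidates else None

    return best(0)


if __name__ == "__main__":
    for w in ("kellotaulu", "taulukello", "xyz"):
        print(w, best_segmentation(w))
```

With these made-up counts, the lexicalized entry "kellotaulu" (weight log 3) beats the split "kello" + "taulu" (weight log 2.4 + log 4), because every additional segment adds weight. Selecting the lowest-weight analysis per word form is also the basic decision a unigram tagger makes.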

