Results 1 - 4 of 4
LEARNING POS TAGGING FROM A TAGGED MACEDONIAN TEXT CORPUS
"... This paper presents several new linguistic resources for the Macedonian language, in particular a language corpus consisting of the digitized and annotated Orwell's “1984” in the Macedonian translation. The produced resources (morphosyntactic specification, lexicon, and corpus) are compatible ..."
Cited by 3 (1 self)
East meets West: Producing Multilingual Resources in a European Context
- First International Language Resources and Evaluation Conference, 1998
"... The EU concerted action TELRI has released a two-volume CD-ROM, which contains multilingual language resources, namely corpora, lexica, and tools for language engineering. This CD-ROM provides harmonised resources for unprecedented numbers and kinds of languages, mainly from non-EU countri ..."
Cited by 10 (4 self)
... and tagged novel '1984' by George Orwell and accompanying lexica in seven languages. The paper presents the CD-ROM, the methods employed in its creation and its prospective uses.
Learning to Lemmatise Slovene Words
- Cussens and S. Džeroski, Learning Language in Logic, Number 1925 in Lecture Notes in Artificial Intelligence, 2000
"... Automatic lemmatisation is a core application for many language processing tasks. In inflectionally rich languages, such as Slovene, assigning the correct lemma to each word in a running text is not trivial: nouns and adjectives, for instance, inflect for number and case, with a complex config ..."
Cited by 6 (1 self)
... the word form given the correct morphosyntactic tag. A statistics-based trigram tagger is used to learn to perform morphosyntactic tagging and a first-order decision list learning system is used to learn rules for morphological analysis. The dataset used is the 90.000 word Slovene translation of Orwell's ...
Learning to Lemmatise Slovene Words
"... Automatic lemmatisation is a core application for many language processing tasks. In inflectionally rich languages, such as Slovene, assigning the correct lemma to each word in a running text is not trivial: nouns and adjectives, for instance, inflect for number and case, with a complex co ..."
... form given the correct morphosyntactic tag. A statistics-based trigram tagger is used to learn to perform morphosyntactic tagging and a first-order decision list learning system is used to learn rules for morphological analysis. The dataset used is the 90.000 word Slovene translation of Orwell’s ‘1984 ...