Probabilistic Part-of-Speech Tagging Using Decision Trees
, 1994
Cited by 413 (4 self)
In this paper, a new probabilistic tagging method is presented which avoids problems that Markov Model based taggers face when they have to estimate transition probabilities from sparse data. In this tagging method, transition probabilities are estimated using a decision tree. Based on this method, a part-of-speech tagger (called TreeTagger) has been implemented which achieves 96.36% accuracy on Penn-Treebank data, which is better than that of a trigram tagger (96.06%) on the same data. Keywords: Corpus-based NLP, Statistical NLP, Part-of-Speech Tagging. 1 Introduction Word forms are often ambiguous in their part-of-speech (POS). The English word form store, for example, can be either a noun, a finite verb or an infinitive. In an utterance, this ambiguity is normally resolved by the context of a word: e.g. in the sentence "The 1977 PCs could store two pages of data.", store can only be an infinitive. The predictability of the part-of-speech from the context is used by automatic part-...
Part-of-Speech Tagging with Neural Networks
, 1994
Cited by 61 (2 self)
Text corpora which are tagged with part-of-speech information are useful in many areas of linguistic research. In this paper, a new part-of-speech tagging method based on neural networks (Net-Tagger) is presented and its performance is compared to that of an HMM-tagger (Cutting et al., 1992) and a trigram-based tagger (Kempe, 1993). It is shown that the Net-Tagger performs as well as the trigram-based tagger and better than the HMM-tagger.
Tagging accurately - Don't guess if you know
- In Proceedings of ANLP '94
, 1994
Cited by 25 (4 self)
We discuss combining knowledge-based (or rule-based) and statistical part-of-speech taggers. We use two mature taggers, ENGCG and Xerox Tagger, to independently tag the same text and combine the results to produce a fully disambiguated text. In a 27,000-word test sample taken from a previously unseen corpus, we achieve 98.5% accuracy. This paper presents the data in detail. We describe the problems we encountered in the course of combining the two taggers and discuss the problem of evaluating taggers.
CORIS/CODIS: A corpus of written Italian based on a defined and a dynamic model
- A Rainbow of Corpora: Corpus Linguistics and the Languages of the World
, 2001
Cited by 3 (2 self)
A corpus of written Italian -- CORIS -- has been under construction at the Centre for Theoretical and Applied Linguistics of Bologna University (CILTA) since 1998 and will soon be completed and made available on-line. The project aims at creating a representative and sizeable general reference corpus of contemporary Italian designed to be easily accessible and user-friendly.
A Contribution to the Question of Authenticity of Rhesus Using Part-of-Speech Tagging
This paper presents the results of an experiment to decide the question of authenticity of the supposedly spurious Rhesus, an Attic tragedy sometimes credited to Euripides. The experiment involves the use of statistics in order to test whether significant deviations in the distribution of word categories between Rhesus and the other works of Euripides can or cannot be found. To count frequencies of word categories in the corpus, a part-of-speech tagger for Greek has been implemented. Some special techniques for reducing the problem of sparse data are used, resulting in an accuracy of ca. 96.6%. 1 Introduction 1.1 The Philological Problem In the tradition of ancient Greek texts it sometimes happens that, due to a number of different reasons, a text is credited incorrectly to a certain author. It is the (sometimes very difficult) task of classical philology to detect these erroneous assignments and, if possible, to correct them. The methods used for this task and the results achieve...
Tagging a Norwegian Speech Corpus
This paper describes work on the grammatical tagging of a newly created Norwegian speech corpus: the first corpus of modern Norwegian speech. We use an iterative procedure to perform computer-aided manual tagging of a part of the corpus. This material is then used to train the final taggers, which are applied to the rest of the corpus. We experiment with taggers that are based on three different data-driven methods: memory-based learning, decision trees, and hidden Markov models, and find that the decision tree tagger performs best. We also test the effects of removing pauses and/or hesitations from the material before training and applying the taggers. We conclude that these attempts at cleaning up hurt the performance of the taggers, indicating that such material, rather than functioning as noise, actually contributes important information about the grammatical function of the words in their nearest context.
Stefan Trausan-Matu, Philippe Dessus (Eds.) Natural Language Processing in Support of Learning: Metrics, Feedback and Connectivity
In supporting Lifelong Learning (LLL) on the Social Web (Web2.0), Natural Language Technologies (LT) increasingly play a central role because text is the leading medium of communication and collaboration. LT now cover a wide range of topics, including advanced semantic resources and applications like ontologies, knowledge extraction, text mining, Natural Language Processing (NLP) and Latent Semantic Analysis (LSA). The peculiarities of Web2.0 also call for considering the use of LT for social software (social network analysis) and collaborative interactions on chats and forums. Pragmatics, discourse and conversation analysis are very important analysis domains. For LLL, providing feedback entails measuring differences among learners; between learners and their desired characteristics (e.g., knowledge, competences, motivation, self-regulation processes); or between learners and their looked-for resources (e.g. web-links, articles, courses). Difference measurement has often been performed by computing and analyzing 'distances' using several techniques like factorial analysis, instance-based learning, clustering, and so on. Corpora on which
unknown title
, 1996
1.1 The philological problem In the tradition of ancient Greek texts it sometimes happens that, due to a number of different reasons, a text is credited incorrectly to a certain author.

