Automatic Rule Induction for Unknown Word Guessing
- Computational Linguistics
, 1997
"... Words unknown to the lexicon present a substantial problem to NLP modules that rely on mor-phosyntactic information, such as part-of-speech taggers or syntactic parsers. In this paper we present a technique for fully automatic acquisition of rules that guess possible part-of-speech tags for unknown ..."
Abstract
-
Cited by 104 (6 self)
Words unknown to the lexicon present a substantial problem to NLP modules that rely on morphosyntactic information, such as part-of-speech taggers or syntactic parsers. In this paper we present a technique for fully automatic acquisition of rules that guess possible part-of-speech tags for unknown words using their starting and ending segments. The learning is performed from a general-purpose lexicon and word frequencies collected from a raw corpus. Three complementary sets of word-guessing rules are statistically induced: prefix morphological rules, suffix morphological rules and ending-guessing rules. Using the proposed technique, unknown-word-guessing rule sets were induced and integrated into a stochastic tagger and a rule-based tagger, which were then applied to texts with unknown words.
Classifier Combination for Improved Lexical Disambiguation
, 1998
"... One of the most exciting recent directions in machine learning is the discovery that the combination of multiple classifiers often results in significantly better performance than what can be achieved with a single classifier. In this paper, we first show that the errors made from three differ ..."
Abstract
-
Cited by 82 (1 self)
One of the most exciting recent directions in machine learning is the discovery that the combination of multiple classifiers often results in significantly better performance than what can be achieved with a single classifier. In this paper, we first show that the errors made by three different state-of-the-art part-of-speech taggers are strongly complementary. Next, we show how this complementary behavior can be used to our advantage. By using contextual cues to guide tagger combination, we are able to derive a new tagger that achieves performance significantly greater than any of the individual taggers.
Unsupervised Learning of Word-Category Guessing Rules
, 1996
"... Words unknown to the lexicon present a substantial problem to part-of-speech tagging. ..."
Abstract
-
Cited by 15 (2 self)
Words unknown to the lexicon present a substantial problem to part-of-speech tagging.
POS Tagging Using Relaxation Labelling
- Proceedings of the 16th International Conference on Computational Linguistics, COLING
, 1996
"... Relaxation labelling is an optimization technique used in many fields to solve constraint satisfaction problems. The algorithm finds a combination of values for a set of variables such that satisfies -- to the maximum possible degree -- a set of given constraints. This pat)er scribes some experiment ..."
Abstract
-
Cited by 10 (5 self)
Relaxation labelling is an optimization technique used in many fields to solve constraint satisfaction problems. The algorithm finds a combination of values for a set of variables that satisfies, to the maximum possible degree, a set of given constraints. This paper describes some experiments performed applying it to POS tagging, and the results obtained. It also ponders the possibility of applying it to Word Sense Disambiguation.
Resolving Part-of-Speech Ambiguity in the Greek Language Using Learning Techniques
- In "Proceedings of the ECCAI Advanced Course on Artificial Intelligence (ACAI
, 1999
"... This article investigates the use of Transformation-Based Error-Driven learning for resolving part-of-speech ambiguity in the Greek language. The aim is not only to study the performance, but also to examine its dependence on different thematic domains. Results are presented here for two different t ..."
Abstract
-
Cited by 7 (1 self)
This article investigates the use of Transformation-Based Error-Driven learning for resolving part-of-speech ambiguity in the Greek language. The aim is not only to study the performance, but also to examine its dependence on different thematic domains. Results are presented here for two different test cases: a corpus on “management succession events” and a general-theme corpus. The two experiments show that the performance of this method does not depend on the thematic domain of the corpus, and its accuracy for the Greek language is around 95%.
Part Of Speech Tagging Using A Hybrid System
"... A procedure is proposed for tagging part of speech using a hybrid system that consists of a statistical based rule finder and a genetic algorithm which decides how to use those rules. This procedure will try to improve upon an already very good method of part of speech tagging. 1 ..."
Abstract
- Add to MetaCart
A procedure is proposed for part-of-speech tagging using a hybrid system that consists of a statistically based rule finder and a genetic algorithm which decides how to use those rules. This procedure will try to improve upon an already very good method of part-of-speech tagging.
A Constraint Satisfaction Alternative for POS Tagging
, 1996
"... Relaxation labelling is an optimization technique used in many fields to solve constraint satisfaction problems (CSP). The algorithm finds a combination of values for a set of variables such that satisfies-to the maximum possible degree- a set of given constraints. This paper describes some experime ..."
Abstract
- Add to MetaCart
Relaxation labelling is an optimization technique used in many fields to solve constraint satisfaction problems (CSP). The algorithm finds a combination of values for a set of variables that satisfies, to the maximum possible degree, a set of given constraints. This paper describes some experiments performed applying it to POS tagging and the constraints used.
Kannada Part-Of-Speech Tagging with Probabilistic Classifiers
"... Part-Of-Speech (POS) tagging is defined as the Natural Language Processing (NLP) task in which each word in a sentence is labeled with a tag indicating its appropriate part of speech. Of the entire supervised machine learning classification algorithms, second order Hidden Markov Model (HMM) and Cond ..."
Abstract
- Add to MetaCart
Part-Of-Speech (POS) tagging is defined as the Natural Language Processing (NLP) task in which each word in a sentence is labeled with a tag indicating its appropriate part of speech. Of the supervised machine learning classification algorithms, the second-order Hidden Markov Model (HMM) and Conditional Random Fields (CRF) are chosen in this work for POS tagging of the Kannada language. The training data comprise 51,269 words and the test data around 2,932 tokens; the two sets are disjoint and both are taken from the EMILLE corpus. Experiments show that the accuracy of the tools based on HMM and CRF is 79.9% and 84.58%, respectively.
A Maximum Entropy Approach to Kannada Part Of Speech Tagging
"... Part Of Speech (POS) tagging is the most important preprocessing step in almost all Natural Language Processing (NLP) applications. It is defined as the process of classifying each word in a text with its appropriate part of speech. In this paper, the probabilistic classifier technique of Maximum En ..."
Abstract
- Add to MetaCart
Part Of Speech (POS) tagging is the most important preprocessing step in almost all Natural Language Processing (NLP) applications. It is defined as the process of classifying each word in a text with its appropriate part of speech. In this paper, the probabilistic classifier technique of the Maximum Entropy model is applied to the tagging of Kannada sentences. The Kannada language is agglutinative and morphologically very rich but resource-poor. Hence 51,267 words from the EMILLE corpus were manually tagged and used as training data. The tagset included 25 tags as defined for Indian languages. The best-suited feature set for the language was finalised after rigorous experiments. A test set of 2,892 word forms was downloaded from Kannada websites. An accuracy of 81.6% was obtained in the experiments, which shows that Maximum Entropy is well suited to the Kannada language.

