TnT - A Statistical Part-Of-Speech Tagger
, 2000
Cited by 293 (3 self)
Trigrams'n'Tags (TnT) is an efficient statistical part-of-speech tagger. Contrary to claims found elsewhere in the literature, we argue that a tagger based on Markov models performs at least as well as other current approaches, including the Maximum Entropy framework. A recent comparison has even shown that TnT performs significantly better for the tested corpora. We describe the basic model of TnT, the techniques used for smoothing and for handling unknown words. Furthermore, we present evaluations on two corpora.
CoNLL-X shared task on multilingual dependency parsing
- In Proc. of CoNLL
, 2006
Cited by 161 (2 self)
Each year the Conference on Computational Natural Language Learning (CoNLL) features a shared task, in which participants train and test their systems on exactly the same data sets, in order to better compare systems. The tenth CoNLL (CoNLL-X) saw a shared task on Multilingual Dependency Parsing. In this paper, we describe how treebanks for 13 languages were converted into the same dependency format and how parsing performance was measured. We also give an overview of the parsing approaches that participants took and the results that they achieved. Finally, we try to draw general conclusions about multilingual parsing: What makes a particular language, treebank or annotation scheme easier or harder to parse and which phenomena are challenging for any dependency parser? Acknowledgement: Many thanks to Amit Dubey and Yuval Krymolowski, the other two organizers of the shared task, for discussions, converting treebanks, writing software and helping with the papers.
Forgetting Exceptions is Harmful in Language Learning
- MACHINE LEARNING, SPECIAL ISSUE ON NATURAL LANGUAGE LEARNING
, 1999
Cited by 94 (38 self)
We show that in language learning, contrary to received wisdom, keeping exceptional training instances in memory can be beneficial for generalization accuracy. We investigate this phenomenon empirically on a selection of benchmark natural language processing tasks: grapheme-to-phoneme conversion, part-of-speech tagging, prepositional-phrase attachment, and base noun phrase chunking. In a first series of experiments we combine memory-based learning with training set editing techniques, in which instances are edited based on their typicality and class prediction strength. Results show that editing exceptional instances (with low typicality or low class prediction strength) tends to harm generalization accuracy. In a second series of experiments we compare memory-based learning and decision-tree learning methods on the same selection of tasks, and find that decision-tree learning often performs worse than memory-based learning. Moreover, the decrease in performance can be linked to the degree of abstraction from exceptions (i.e., pruning or eagerness). We provide explanations for both results in terms of the properties of the natural language processing tasks and the learning algorithms.
Empirical Methods in Information Extraction
- AI magazine
, 1997
Cited by 92 (7 self)
This article surveys the use of empirical methods for a particular natural language understanding task that is inherently domain-specific. The task is information extraction. Very generally, an information extraction system takes as input an unrestricted text and "summarizes" the text with respect to a prespecified topic or domain of interest: it finds useful information about the domain and encodes that information in a structured form, suitable for populating databases. In contrast to in-depth natural language understanding tasks, information extraction systems effectively skim a text to find relevant sections and then focus only on these sections in subsequent processing. The information extraction system in Figure 1, for example, summarizes stories about natural disasters, extracting for each such event the type of disaster, the date and time that it occurred, and data on any property damage or human injury caused by the event. Infor
Classifier Combination for Improved Lexical Disambiguation
, 1998
Cited by 82 (1 self)
One of the most exciting recent directions in machine learning is the discovery that the combination of multiple classifiers often results in significantly better performance than what can be achieved with a single classifier. In this paper, we first show that the errors made by three different state-of-the-art part-of-speech taggers are strongly complementary. Next, we show how this complementary behavior can be used to our advantage. By using contextual cues to guide tagger combination, we are able to derive a new tagger that achieves performance significantly greater than any of the individual taggers.
Memory-Based Shallow Parsing
- In Proceedings of CoNLL
, 1999
Cited by 66 (13 self)
We present a memory-based learning (MBL) approach to shallow parsing in which POS tagging, chunking, and identification of syntactic relations are formulated as memory-based modules. The experiments reported in this paper show competitive results: the F_{β=1} for the Wall Street Journal (WSJ) treebank is 93.8% for NP chunking, 94.7% for VP chunking, 77.1% for subject detection and 79.0% for object detection.
Improving data driven wordclass tagging by system combination
, 1998
Cited by 58 (8 self)
In this paper we examine how the differences in modelling between different data driven systems performing the same NLP task can be exploited to yield a higher accuracy than the best individual system. We do this by means of an experiment involving the task of morpho-syntactic wordclass tagging. Four well-known tagger generators (Hidden Markov Model, Memory-Based, Transformation Rules and Maximum Entropy)
The Interaction of Knowledge Sources for Word Sense Disambiguation
- Computational Linguistics
, 2001
Cited by 58 (2 self)
Word sense disambiguation (WSD) is a computational linguistics task likely to benefit from the tradition of combining different knowledge sources in artificial intelligence research. An important step in the exploration of this hypothesis is to determine which linguistic knowledge sources are most useful and whether their combination leads to improved results. We present a sense tagger which uses several knowledge sources. Tested accuracy exceeds 94% on our evaluation corpus. Our system attempts to disambiguate all content words in running text rather than limiting itself to treating a restricted vocabulary of words. It is argued that this approach is more likely to assist the creation of practical systems.
A second-order hidden Markov model for part-of-speech tagging
- In Proceedings of the 37th Annual Meeting of the ACL
, 1999
Cited by 51 (5 self)
This paper describes an extension to the hidden Markov model for part-of-speech tagging using second-order approximations for both contextual and lexical probabilities. This model increases the accuracy of the tagger to state-of-the-art levels. These approximations make use of more contextual information than standard statistical systems. New methods of smoothing the estimated probabilities are also introduced to address the sparse data problem.
A memory-based approach to learning shallow natural language patterns
, 1998
Cited by 49 (4 self)
Recognizing shallow linguistic patterns, such as basic syntactic relationships between words, is a common task in applied natural language and text processing. The common practice for approaching this task is by tedious manual definition of possible pattern structures, often in the form of regular expressions or finite automata. This paper presents a novel memory-based learning method that recognizes shallow patterns in new text based on a bracketed training corpus. The training data are stored as-is, in efficient suffix-tree data structures. Generalization is performed on-line at recognition time by comparing subsequences of the new text to positive and negative evidence in the corpus. This way, no information in the training is lost, as can happen in other learning systems that construct a single generalized model at the time of training. The paper presents experimental results for recognizing noun phrase, subject-verb and verb-object patterns in English. Since the learning approach enables easy porting to new domains, we plan to apply it to syntactic patterns in other languages and to sub-language patterns for information extraction.

