MBT: A Memory-Based Part of Speech Tagger-Generator
- PROC. OF FOURTH WORKSHOP ON VERY LARGE CORPORA
, 1996
Cited by 168 (47 self)
We introduce a memory-based approach to part of speech tagging. Memory-based learning is a form of supervised learning based on similarity-based reasoning. The part of speech tag of a word in a particular context is extrapolated from the most similar cases held in memory. Supervised learning approaches are useful when a tagged corpus is available as an example of the desired output of the tagger. Based on such a corpus, the tagger-generator automatically builds a tagger which is able to tag new text the same way, considerably reducing the development time needed to construct a tagger. Memory-based tagging shares this advantage with other statistical or machine learning approaches. Additional advantages specific to a memory-based approach include (i) the relatively small tagged corpus size sufficient for training, (ii) incremental learning, (iii) explanation capabilities, (iv) flexible integration of information in case representations, (v) its non-parametric nature, (vi) reasonably good results on unknown words without morphological analysis, and (vii) fast learning and tagging. In this paper we show that a large-scale application of the memory-based approach is feasible: we obtain a tagging accuracy that is on a par with that of known statistical approaches, and with attractive space and time complexity properties when using IGTree, a tree-based formalism for indexing and searching huge case bases. The use of IGTree has the additional advantage that the optimal context size for disambiguation is computed dynamically.
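The core idea of the abstract above, extrapolating a tag from the most similar stored cases, can be illustrated with a minimal sketch. It assumes a plain overlap metric and a tiny hand-made case base; the feature layout (previous tag, focus word, next word), the names `CASES`, `overlap` and `tag`, and the data are hypothetical and are not taken from MBT, and no IGTree indexing is attempted.

```python
from collections import Counter

# Hypothetical toy case base: (previous tag, focus word, next word) -> tag.
CASES = [
    (("DT", "can", "of"), "NN"),
    (("PRP", "can", "see"), "MD"),
    (("DT", "dog", "barks"), "NN"),
    (("MD", "see", "the"), "VB"),
]


def overlap(a, b):
    """Count the feature positions on which two cases agree."""
    return sum(x == y for x, y in zip(a, b))


def tag(features, cases=CASES):
    """Return the majority tag among the most similar stored cases."""
    best = max(overlap(features, stored) for stored, _ in cases)
    votes = Counter(t for stored, t in cases if overlap(features, stored) == best)
    return votes.most_common(1)[0][0]
```

With this case base, `tag(("PRP", "can", "hear"))` picks the modal reading because the stored case sharing both the previous tag and the focus word is the closest match. Learning here is just storing cases, which is one way to read the "incremental learning" advantage listed in the abstract.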
Generalization Performance Of Backpropagation Learning On A Syllabification Task
- ENSCHEDE. TWENTE UNIVERSITY
, 1992
Cited by 58 (36 self)
We investigated the generalization capabilities of backpropagation learning in feed-forward and recurrent feed-forward connectionist networks on the assignment of syllable boundaries to orthographic representations in Dutch (hyphenation). This is a difficult task because phonological and morphological constraints interact, leading to ambiguity in the input patterns. We compared the results to different symbolic pattern matching approaches, and to an exemplar-based generalization scheme, related to a k-nearest neighbour approach, but using a similarity metric weighted by the relative information entropy of positions in the training patterns. Our results indicate that the generalization performance of backpropagation learning for this task is not better than that of the best symbolic pattern matching approaches or of exemplar-based generalization.
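One way to read the entropy-weighted similarity metric mentioned above is as an overlap metric in which each pattern position counts in proportion to its information gain. The sketch below assumes the standard information-gain definition (class entropy minus the expected class entropy after splitting on a position); the abstract does not give the exact formula, so this definition, the function names, and the toy data are assumptions.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(patterns, labels, position):
    """Reduction in class entropy from knowing the value at one position."""
    by_value = defaultdict(list)
    for pattern, label in zip(patterns, labels):
        by_value[pattern[position]].append(label)
    remainder = sum(len(sub) / len(labels) * entropy(sub) for sub in by_value.values())
    return entropy(labels) - remainder


def weighted_similarity(a, b, weights):
    """Overlap between two patterns, each matching position counted by its weight."""
    return sum(w for x, y, w in zip(a, b, weights) if x == y)


# Hypothetical toy data: position 0 determines the class, position 1 is noise.
patterns = [("a", "x"), ("a", "y"), ("b", "x"), ("b", "y")]
labels = [1, 1, 0, 0]
weights = [information_gain(patterns, labels, i) for i in range(2)]
```

On this toy data the first position receives all the weight, so two patterns that agree only on the uninformative second position are treated as dissimilar.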
Memory-Based Lexical Acquisition and Processing
- MACHINE TRANSLATION AND THE LEXICON
, 1995
Cited by 47 (23 self)
Current approaches to computational lexicology in language technology are knowledge-based (competence-oriented) and try to abstract away from specific formalisms, domains, and applications. This results in severe complexity, acquisition and reusability bottlenecks. As an alternative, we propose a particular performance-oriented approach to Natural Language Processing based on automatic memory-based learning of linguistic (lexical) tasks. The consequences of the approach for computational lexicology are discussed, and its application to a number of lexical acquisition and disambiguation tasks in phonology, morphology and syntax is described.
The acquisition of stress: a data-oriented approach
- COMPUTATIONAL LINGUISTICS
, 1994
Cited by 47 (20 self)
A data-oriented (empiricist) alternative to the currently pervasive (nativist) Principles and Parameters approach to the acquisition of stress assignment is investigated. A similarity-based algorithm, viz. an augmented version of Instance-Based Learning, is used to learn the system of main stress assignment in Dutch. In this nontrivial task a comprehensive lexicon of Dutch monomorphemes is used instead of the idealized and highly simplified description of the empirical data used in previous approaches. It is demonstrated that a similarity-based learning method is effective in learning the complex stress system of Dutch. The task is accomplished without the a priori knowledge assumed to pre-exist in the learner in a Principles and Parameters framework. A comparison of the system's behavior with a consensus linguistic analysis (in the framework of Metrical Phonology) shows that ease of learning correlates with decreasing degrees of markedness of metrical phenomena. It is also shown that the learning algorithm captures subregularities within the stress system of Dutch that cannot be described without going beyond some of the theoretical assumptions of metrical phonology.
Fast NP Chunking Using Memory-Based Learning Techniques
- In Proceedings of BENELEARN'98
, 1998
Cited by 26 (1 self)
In this paper we discuss the application of Memory-Based Learning (MBL) to fast NP chunking. We first discuss the application of a fast decision tree variant of MBL (IGTree) to the dataset described in (Ramshaw and Marcus, 1995), which consists of roughly 50,000 test and 200,000 training items. In a second series of experiments we used an architecture of two cascaded IGTrees. In the second level of this cascaded classifier we added context predictions as extra features so that incorrect predictions from the first level can be corrected, yielding a 97.2% generalisation accuracy with training and testing times on the order of seconds to minutes.
Submission Type: regular paper. Topic Areas: robust parsing, NP chunking, memory-based learning. Author of Record: Jorn Veenstra.
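The two-level cascade described in this abstract can be sketched as follows. This is only an illustration under stated assumptions: a plain 1-nearest-neighbour overlap classifier stands in for IGTree, the features are three-tag windows, the chunk labels are a simplified inside/outside scheme, and all names and data are hypothetical. The second level sees the same window plus the first level's predictions for the two neighbouring positions.

```python
from collections import Counter


def nearest_label(features, cases):
    """1-nearest-neighbour vote under a plain overlap metric."""
    def overlap(stored):
        return sum(x == y for x, y in zip(features, stored))
    best = max(overlap(f) for f, _ in cases)
    votes = Counter(label for f, label in cases if overlap(f) == best)
    return votes.most_common(1)[0][0]


def windows(seq, pad="_"):
    """Return (previous, current, next) triples over a sequence."""
    padded = [pad] + list(seq) + [pad]
    return [tuple(padded[i:i + 3]) for i in range(len(seq))]


def train_cascade(sentences):
    """Build both case bases from (pos_tags, chunk_labels) pairs."""
    level1 = [(w, c) for tags, chunks in sentences
              for w, c in zip(windows(tags), chunks)]
    level2 = []
    for tags, chunks in sentences:
        # Simplification: first-level guesses are made on the training data itself.
        guesses = [nearest_label(w, level1) for w in windows(tags)]
        for w, g, c in zip(windows(tags), windows(guesses), chunks):
            # Extra features: first-level predictions for the left and right neighbour.
            level2.append((w + (g[0], g[2]), c))
    return level1, level2


def chunk(tags, level1, level2):
    """Label a tag sequence with the two-level cascade."""
    guesses = [nearest_label(w, level1) for w in windows(tags)]
    return [nearest_label(w + (g[0], g[2]), level2)
            for w, g in zip(windows(tags), windows(guesses))]


# Hypothetical toy training data: "I" = inside an NP, "O" = outside.
TRAIN = [
    (["DT", "NN", "VB"], ["I", "I", "O"]),
    (["PRP", "VB", "DT", "NN"], ["I", "O", "I", "I"]),
]
LEVEL1, LEVEL2 = train_cascade(TRAIN)
```

Because the first-level guesses here are produced on the same sentences used for training, the second level rarely sees first-level errors in this toy setup; a more careful setup would generate them on held-out data.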
Islands of Reliability for Regular Morphology: Evidence from Italian
- Language
, 2002
Cited by 18 (4 self)
The representation of regular morphological processes has been the subject of much controversy, particularly in the debate between single and dual route models of morphology.
Non-Hybrid Example-Based Machine Translation Architectures
- Proceedings of TMI-92. Montreal
, 1992
Cited by 10 (0 self)
A general definition of rationalist and empiricist natural language processing is attempted. A classification of empiricist machine translation systems is given based on the rationalist/empiricist distinction. Examples of approaches falling into the two different strategies are discussed. Research results are reported from attempts to break new ground in what is referred to as "pure" or non-hybrid example-based machine translation.
Word-based morphology
, 2006
Cited by 9 (0 self)
This paper examines two contrasting perspectives on morphological analysis, and considers inflectional patterns that bear on the choice between these alternatives. On what is termed an ABSTRACTIVE perspective, surface word forms are regarded as basic morphotactic units of a grammatical system, with roots, stems and exponents treated as abstractions over a lexicon of word forms. This traditional standpoint is contrasted with the more CONSTRUCTIVE perspective of post-Bloomfieldian models, in which surface word forms are ‘built’ from sub-word units. Part of the interest of this contrast is that it cuts across conventional divisions of morphological models. Thus, realization-based models are morphosyntactically ‘word-based’ in the sense that they regard words as the minimal meaningful units of a grammatical system. Yet morphotactically, these models tend to adopt a constructive ‘root-based’ or ‘stem-based’ perspective. An examination of some form-class patterns in Saami, Estonian and Georgian highlights advantages of an abstractive model, and suggests that these advantages derive from the fact that sets of words often predict other word forms and determine a morphotactic analysis of their parts, whereas sets of sub-word units are

