Enriching the Knowledge Sources Used in a Maximum Entropy Part-of-Speech Tagger
In EMNLP/VLC 2000
Cited by 60 (4 self)
Abstract
This paper presents results for a maximum-entropy-based part-of-speech tagger, which achieves superior performance principally by enriching the information sources used for tagging. In particular, we get improved results by incorporating these features: (i) more extensive treatment of capitalization for unknown words; (ii) features for the disambiguation of the tense forms of verbs; (iii) features for disambiguating particles from prepositions and adverbs. The best resulting accuracy for the tagger on the Penn Treebank is 96.86% overall, and 86.91% on previously unseen words.
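To make the phrase "enriching the information sources" concrete, here is a minimal sketch of how a maximum-entropy tagger turns a token into binary features and scores tags with the log-linear form p(t | h) = exp(sum_i lambda_i f_i(h, t)) / Z(h). The feature templates (capitalization, digits, hyphens, suffixes, previous tag), the weights, and the function names `unknown_word_features` and `maxent_distribution` are hypothetical illustrations, not the paper's actual feature set; weight estimation (training) is omitted.

```python
import math


def unknown_word_features(word, prev_tag):
    """Return binary feature names for one token (hypothetical templates)."""
    feats = [f"prev_tag={prev_tag}"]
    # Capitalization templates (illustrative, not the paper's exact set).
    if word[:1].isupper():
        feats.append("init_cap")
    if word.isupper():
        feats.append("all_caps")
    if any(ch.isdigit() for ch in word):
        feats.append("has_digit")
    if "-" in word:
        feats.append("has_hyphen")
    # Suffix templates of length 1 to 3, a common choice for unknown words.
    for k in range(1, 4):
        if len(word) > k:
            feats.append(f"suffix={word[-k:]}")
    return feats


def maxent_distribution(features, weights, tags):
    """p(t | h) = exp(sum_i lambda_i * f_i(h, t)) / Z(h), with binary features."""
    scores = {t: sum(weights.get((f, t), 0.0) for f in features) for t in tags}
    top = max(scores.values())  # subtract the max score for numerical stability
    exps = {t: math.exp(s - top) for t, s in scores.items()}
    z = sum(exps.values())  # the normalizer Z(h)
    return {t: e / z for t, e in exps.items()}


if __name__ == "__main__":
    # Hypothetical weights; a real tagger estimates these from annotated data.
    weights = {("init_cap", "NNP"): 2.0, ("suffix=ed", "VBD"): 1.5}
    feats = unknown_word_features("Walked", "DT")
    probs = maxent_distribution(feats, weights, ["NNP", "VBD", "NN"])
    print(max(probs, key=probs.get))
```

Because each feature is paired with a tag, the same surface cue (here a capitalized initial) can push probability toward one tag and away from others without any hand-written rule.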
Interference in Short-term Memory: The Magical Number Two (or Three) in Sentence Processing
1996
Cited by 41 (7 self)
Abstract
Many theories have been proposed to explain difficulty with center-embedded constructions, most attributing the problem to some kind of limited-capacity short-term memory. However, these theories have developed for the most part independently of more traditional memory research, which has focused on uncovering general principles such as chunking and interference. This article attempts to achieve some unification with this research by suggesting that an interesting range of core sentence processing phenomena can be explained as interference effects in a sharply limited syntactic working memory. These include difficult and acceptable embeddings, as well as certain limitations on ambiguity resolution, length effects in garden-path structures, and the requirement for locality in syntactic structure. The theory takes the form of an architecture for parsing that can index no more than two constituents under the same syntactic relation. A limitation of two or three items shows up in a variety of ...
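The core capacity claim, that at most two constituents can be indexed under the same syntactic relation, can be illustrated with a toy memory. This is a simplified sketch under stated assumptions: the class name `SyntacticWorkingMemory`, the relation label `"subject"`, and the most-recent-first retrieval are hypothetical choices for illustration, not the article's full architecture.

```python
from collections import defaultdict


class SyntacticWorkingMemory:
    """Toy memory that indexes constituents by the structural relation they may fill."""

    def __init__(self, capacity_per_relation=2):
        self.capacity = capacity_per_relation
        self.index = defaultdict(list)  # relation -> constituents awaiting it

    def store(self, relation, constituent):
        """Index a constituent under a relation; return False on overload (interference)."""
        slots = self.index[relation]
        if len(slots) >= self.capacity:
            return False
        slots.append(constituent)
        return True

    def retrieve(self, relation):
        """Remove and return the most recently stored constituent for a relation."""
        slots = self.index[relation]
        return slots.pop() if slots else None


if __name__ == "__main__":
    # Double center-embedding: "The rat the cat the dog chased bit died."
    memory = SyntacticWorkingMemory()
    for np in ["the rat", "the cat", "the dog"]:
        print(np, memory.store("subject", np))
```

In this toy model, three subject NPs that are all waiting for their verbs compete for the same index, so the third store fails, mirroring the difficulty of classic double center-embeddings; NPs waiting for different relations do not interfere with one another.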
Specifying Architectures for Language Processing: Process, Control, and Memory in Parsing and Interpretation
1997
Cited by 10 (6 self)
Abstract
...ing away from irrelevant details is a theoretical virtue, but the kinds of abstractions that module geography makes can lead to incorrect inferences from data. That such a possibility exists is clearly demonstrated by the working memory research of Just & Carpenter (1992). Briefly, Just and Carpenter have argued that some garden-path effects that were previously interpreted in terms of a syntactically encapsulated module can instead be explained by individual differences in working memory capacity. Such an explanation is not considered in a theoretical framework that systematically ignores the role of memory structures in parsing. This point stands regardless of whether one is convinced by the current body of empirical support for this particular model: the fact remains that such an explanation could in principle account for the data, and such alternative explanations are discovered only by developing functionally complete architectures. The next few sections describe what ...
Hybrid Natural Language Generation from Lexical Conceptual Structures
Machine Translation, 2003
Cited by 8 (3 self)
Abstract
This paper describes Lexogen, a system for generating natural-language sentences from Lexical Conceptual Structure, an interlingual representation. The system has been developed as part of a Chinese-English Machine Translation (MT) system; however, it is designed to be used for many other MT language pairs and natural language applications. The contributions of this work include: (1) development of a large-scale Hybrid Natural Language Generation system with language-independent components; (2) enhancements to an interlingual representation and associated algorithm for generation from ambiguous input; (3) development of an efficient reusable language-independent linearization module with a grammar description language that can be used with other systems; (4) improvements to an earlier algorithm for hierarchically mapping thematic roles to surface positions; and (5) development of a diagnostic tool for lexicon coverage and correctness and use of the tool for verification of English, Spanish, and Chinese lexicons. An evaluation of Chinese-English translation quality shows comparable performance with a commercial translation system. The generation system can also be extended to other languages, and this is demonstrated and evaluated for Spanish.
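One way to read contribution (3), a language-independent linearizer driven by a grammar description, is that a single tree-walking routine is reused while the word-order rules live in per-language tables. The sketch below assumes that design; the `Node` structure, the role names, and the `ENGLISH_ORDER` table are hypothetical and do not reproduce Lexogen's actual grammar description language.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    category: str
    word: str = ""
    children: dict = field(default_factory=dict)  # role name -> Node


# Hypothetical ordering rules: category -> sequence of child roles.
# "HEAD" marks where the node's own word is placed.
ENGLISH_ORDER = {
    "S": ["subj", "HEAD", "obj"],
    "NP": ["det", "HEAD"],
}


def linearize(node, order):
    """Flatten a tree into words using per-category order rules.

    Categories without a rule default to emitting just their own word;
    children whose role is not listed in the rule are not emitted.
    """
    words = []
    for slot in order.get(node.category, ["HEAD"]):
        if slot == "HEAD":
            if node.word:
                words.append(node.word)
        elif slot in node.children:
            words.extend(linearize(node.children[slot], order))
    return words


if __name__ == "__main__":
    tree = Node("S", "chased", {
        "subj": Node("NP", "dog", {"det": Node("DET", "the")}),
        "obj": Node("NP", "cat", {"det": Node("DET", "the")}),
    })
    print(" ".join(linearize(tree, ENGLISH_ORDER)))
    # Swapping only the rule table yields a verb-final order.
    print(" ".join(linearize(tree, {"S": ["subj", "obj", "HEAD"], "NP": ["det", "HEAD"]})))
```

Keeping the rules in data rather than code is what lets the same module serve several target languages.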
Efficient Language Independent Generation from Lexical Conceptual Structures
Machine Translation, 2002
Cited by 7 (5 self)
Abstract
This paper describes a system for generating natural-language sentences from an interlingual representation, Lexical Conceptual Structure (LCS). The system has been developed as part of a Chinese-English Machine Translation system; however, it is designed to be used for many other MT language pairs and natural language applications. The contributions of this work include: (1) development of a language-independent generation system that maximizes efficiency through the use of a hybrid rule-based/statistical module; (2) enhancements to an interlingual representation and associated algorithms for interpretation of multiply ambiguous input sentences; (3) development of an efficient reusable language-independent linearization module with a grammar description language that can be used with other systems; (4) improvements to an earlier algorithm for hierarchically mapping thematic roles to surface positions; (5) development of a diagnostic tool for lexicon coverage and correctness and use of the tool for verification of English, Spanish, and Chinese lexicons. An evaluation of translation quality shows comparable performance with a commercial translation system. The generation system can also be straightforwardly extended to other languages, and this is demonstrated and evaluated for Spanish.
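A common pattern for a hybrid rule-based/statistical generator is to let symbolic rules propose candidate realizations and let a statistical language model choose among them. The sketch below assumes that pattern with an add-one smoothed bigram model; it is an illustration of the general idea, not necessarily how this system's hybrid module works, and the function names and toy corpus are hypothetical.

```python
import math
from collections import Counter


def train_bigram(corpus):
    """Count bigrams and their left contexts, with sentence boundary markers."""
    contexts, bigrams = Counter(), Counter()
    for sentence in corpus:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        contexts.update(tokens[:-1])
        bigrams.update(zip(tokens, tokens[1:]))
    return contexts, bigrams


def log_prob(sentence, contexts, bigrams, vocab_size):
    """Add-one smoothed bigram log-probability of one candidate string."""
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    return sum(
        math.log((bigrams[(a, b)] + 1) / (contexts[a] + vocab_size))
        for a, b in zip(tokens, tokens[1:])
    )


def choose_realization(candidates, corpus):
    """The rule-based step supplies candidates; the statistical step picks one."""
    contexts, bigrams = train_bigram(corpus)
    vocab = {"</s>"}
    for text in list(corpus) + list(candidates):
        vocab.update(text.split())
    return max(candidates, key=lambda c: log_prob(c, contexts, bigrams, len(vocab)))


if __name__ == "__main__":
    corpus = ["the dog chased the cat", "the cat saw the dog"]
    candidates = ["dog the chased cat the", "the dog chased the cat"]
    print(choose_realization(candidates, corpus))
```

Dividing the work this way keeps the rules simple (they may overgenerate) while the model supplies fluency preferences learned from text.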
A Theory of Grammatical But Unacceptable Embeddings
1996
Cited by 7 (1 self)
Abstract
What precisely is the universal nature of the human syntactic parser, such that it copes easily with some embedded structures, yet fails so dramatically on others (e.g., classic double center-embeddings)? A theory is proposed in the form of an architecture for parsing based on two simple ideas. The first is that human short-term memory (STM) is an indexing structure that can give rise to interference effects (storage limitations) when its contents overlap with respect to the indices. For parsing, the contents are syntactic structures, and the indices are potential structural relations. The second idea is that the capacity of STM is the minimum capacity required to support the basic functions of parsing. The theory successfully accounts for the contrasts between over 50 difficult and acceptable constructions from English, French, German, Hebrew, Japanese, Mandarin, and Spanish. The theory has independent psychological and computational motivation, and is a functional part of a broader cognitive ...
On the Proper Treatment of Tense
SALT 5, 1995
Cited by 6 (0 self)
Abstract
This paper is mainly concerned with tense in embedded constructions. I believe that recent research, notably the work by Ogihara (1989) and Abusch (1993), has contributed much to a better understanding of its semantics. The proposals made by the two authors are, however, still too simplistic in some regards. Among other things, they neglect the interplay of tense with temporal adverbs of quantification and with frame-setters. Getting this composition right is a touchstone for every theory of tense, and tense semanticists have been concerned with this problem from the beginning, as witnessed by the analyses in Kratzer (1978), Bäuerle (1979), and Dowty (1979/1982), to mention a few. The claim I want to stress in this article is that in complements of attitudes we can never have a "referential" tense, i.e., an absolute or anaphoric tense. Every tense occurring there will turn out to be a bound tense. I think this claim is implicit in Ogihara's (1989) analysis, and it is made explicit in Abusch's (1993) approach. Composing bound tense with the two kinds of adverbs mentioned requires rather elaborate techniques, and I am not sure whether I have been entirely successful, but I hope that the solution is basically correct.
Syntactic information hiding in plain text
Master’s thesis, 2001
Cited by 5 (2 self)
Abstract
I declare that this dissertation has not been submitted as an exercise for a degree at this or any other university and that it is entirely my own work. I agree that the Library may lend or copy this dissertation on request.
Large Scale Language Independent Generation Using Thematic Hierarchies
In MT Summit VIII: Machine Translation in the Information Age, Santiago de ..., 2001
Cited by 2 (2 self)
Abstract
This paper describes a large-scale language-independent evaluation of the use of Thematic Hierarchies in natural language generation. We translate from a corpus of sentences reflecting the full variety of behavior of Levin-based verb classes. The corpus is used as input to a generation system that utilizes the same thematic hierarchy for realizing relative argument surface positions in two languages: English and Spanish. The output was manually evaluated by English and Spanish speakers. The contributions of this work include: (1) an improved thematic hierarchy over an earlier implementation; (2) a large-scale evaluation of the use of thematic hierarchies in two languages; (3) an implementation of a language-independent module for natural language generation; and (4) the creation of a single tool for incremental development of multilingual lexicons.
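The basic mechanism of using a thematic hierarchy to realize relative argument positions is a ranking: arguments whose roles sit higher in the hierarchy surface earlier. The sketch below assumes that simple ranking; the role list in `THEMATIC_HIERARCHY` is a hypothetical example, not the improved hierarchy the paper proposes, and verb placement is left to other parts of a generator.

```python
# Hypothetical thematic hierarchy, highest-ranked role first.
THEMATIC_HIERARCHY = ["agent", "experiencer", "instrument", "theme", "goal", "source", "location"]


def order_arguments(args, hierarchy=THEMATIC_HIERARCHY):
    """Sort (role, phrase) pairs so that higher-ranked roles surface earlier.

    Roles missing from the hierarchy are placed last; Python's sort is stable,
    so ties keep their input order.
    """
    rank = {role: i for i, role in enumerate(hierarchy)}
    return sorted(args, key=lambda pair: rank.get(pair[0], len(hierarchy)))


if __name__ == "__main__":
    args = [("theme", "the book"), ("goal", "to Mary"), ("agent", "John")]
    print([phrase for _, phrase in order_arguments(args)])
```

Because only the ranking is shared, the same routine can serve English and Spanish, with language-specific decisions (such as where the verb goes) handled separately.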

