Results 11 - 20
of
73
Subcategorization Acquisition
, 2002
"... Manual development of large subcategorised lexicons has proved difficult because predicates change behaviour between sublanguages, domains and over time. Yet access to a comprehensive subcategorization lexicon is vital for successful parsing capable of recovering predicate-argument relations, and pr ..."
Abstract
-
Cited by 64 (13 self)
- Add to MetaCart
Manual development of large subcategorised lexicons has proved difficult because predicates change behaviour between sublanguages, domains and over time. Yet access to a comprehensive subcategorization lexicon is vital for successful parsing capable of recovering predicate-argument relations, and probabilistic parsers would greatly benefit from accurate information concerning the relative likelihood of different subcategorisation frames (scfs) of a given predicate. Acquisition of subcategorization lexicons from textual corpora has recently become increasingly popular. Although this work has met with some success, resulting lexicons indicate a need for greater accuracy. One significant source of error lies in the statistical filtering used for hypothesis selection, i.e. for removing noise from automatically acquired scfs. This thesis builds on earlier work in verbal subcategorization acquisition, taking as a starting point the problem with statistical filtering. Our investigation shows that statistical filters tend to work poorly because not only is the underlying distribution zipfian, but there is also very little correlation between conditional distribution of
Automatic stochastic tagging of natural language texts
- Computational Linguistics
, 1995
"... Five language and tagset independent stochastic taggers, handling morphological and contextual information, are presented and tested in corpora of seven European languages (Dutch, English, French, German, Greek, Italian and Spanish), using two sets of grammatical tags; a small set containing the ele ..."
Abstract
-
Cited by 48 (4 self)
- Add to MetaCart
Five language and tagset independent stochastic taggers, handling morphological and contextual information, are presented and tested in corpora of seven European languages (Dutch, English, French, German, Greek, Italian and Spanish), using two sets of grammatical tags; a small set containing the eleven main grammatical classes and a large set of grammatical categories common to all languages. The unknown words are tagged using an experimentally proven stochastic hypothesis that links the stochastic behavior of the unknown words with that of the less probable known words. A fully automatic training and tagging program has been implemented on an IBM PC-compatible 80386-based computer. Measurements of error rate, time response, and memory requirements have shown that the taggers " performance is satisfactory, even though a small training text is available. The error rate is improved when new texts are used to update the stochastic model parameters. 1.
Practical Unification-based Parsing of Natural Language
, 1993
"... The thesis describes novel techniques and algorithms for the practical parsing of realistic Natural Language (NL) texts with a wide-coverage unification-based grammar of English. The thesis tackles two of the major problems in this area: firstly, the fact that parsing realistic inputs with such gr ..."
Abstract
-
Cited by 46 (7 self)
- Add to MetaCart
The thesis describes novel techniques and algorithms for the practical parsing of realistic Natural Language (NL) texts with a wide-coverage unification-based grammar of English. The thesis tackles two of the major problems in this area: firstly, the fact that parsing realistic inputs with such grammars can be computationally very expensive, and secondly, the observation that many analyses are often assigned to an input, only one of which usually forms the basis of the correct interpretation. The thesis starts by presenting a new unification algorithm, justifies why it is well-suited to practical NL parsing, and describes a bottom-up active chart parser which employs this unification algorithm together with several other novel processing and optimisation techniques. Empirical results demonstrate that an implementation of this parser has significantly better practical
Robust Stochastic Parsing Using the Inside-Outside Algorithm
, 1992
"... this paper, we discuss the application of the Viterbi algorithm and the Baum-Welch algorithm (in wide use for speech recognition) to the parsing problem and describe a recent experiment designed to produce a simple, robust, probabilistic parser which selects an appropriate analysis frequently enough ..."
Abstract
-
Cited by 38 (0 self)
- Add to MetaCart
this paper, we discuss the application of the Viterbi algorithm and the Baum-Welch algorithm (in wide use for speech recognition) to the parsing problem and describe a recent experiment designed to produce a simple, robust, probabilistic parser which selects an appropriate analysis frequently enough to be useful and deals effectively with the problem of undergeneration. We focus on the application of these stochastic algorithms here because, although other statistically based approaches have been proposed (e.g. Sampson et al., 1989; Garside & Leech, 1985; Magerman & Marcus, 1991a,b), these appear most promising as they are computationally-tractable (in principle) and well-integrated with formal language / automata theory. The Viterbi algorithm and Baum-Welch algorithm are optimised algorithms (with polynomial computational complexity) which can be used in conjunction with stochastic regular grammars (finite-state automata, i.e. (hidden) markov models, Baum, 1972) and with probabilistic context-free grammars (Baker, 1982; Fujisaki
Ambiguity Resolution In A Reductionistic Parser
, 1993
"... We ae concerned with dependencyoriented morphosyntactic parsing of run- ning text. While a parsing grammar should avoid introducing structurally unresolvable distinctions in order to optimise on the accuracy of the parser, it also is beneficial for the grammaxian to have as expressive a struc ..."
Abstract
-
Cited by 29 (4 self)
- Add to MetaCart
We ae concerned with dependencyoriented morphosyntactic parsing of run- ning text. While a parsing grammar should avoid introducing structurally unresolvable distinctions in order to optimise on the accuracy of the parser, it also is beneficial for the grammaxian to have as expressive a structural representation available as possible.
Text Mining: A new frontier for lossless compression
- In Data Compression Conference
, 1999
"... This paper aims to promote text compression as a key technology for text mining ..."
Abstract
-
Cited by 28 (5 self)
- Add to MetaCart
This paper aims to promote text compression as a key technology for text mining
Detecting and Correcting Malapropisms with Lexical Chains
, 1995
"... Because chains of semantically related words express semantic continuity, such lexical chains can play an important role in the detection of malapropisms. A malapropism is a correctly spelled word that does not fit in the context where it is used because it is the result of a spelling error on a dif ..."
Abstract
-
Cited by 25 (0 self)
- Add to MetaCart
Because chains of semantically related words express semantic continuity, such lexical chains can play an important role in the detection of malapropisms. A malapropism is a correctly spelled word that does not fit in the context where it is used because it is the result of a spelling error on a different word that was intended. I first assume that such a word has much less probability of being inserted in any chain with other words. If this assumption is correct, words that failed to be inserted with other words can be considered as potential malapropisms. A mechanism that generates spelling replacements can then be used to generate replacement candidates. The second assumption is that whenever a spelling replacement can be inserted in a chain with other words, this replacement is likely to be the intended word for which a malapropism has been substituted. The algorithm proposed here to detect lexical chains uses the on-line thesaurus WordNet to automatically quantify semantic relatio...
Incremental Syntactic Parsing of Natural Language Corpora with Simple Synchrony Networks
- IEEE Transactions on Knowledge and Data Engineering
, 2001
"... This article explores the use of Simple Synchrony Networks (SSNs) for learning to parse English sentences drawn from a corpus of naturally occurring text. Parsing natural language sentences requires taking a sequence of words and outputting a hierarchical structure representing how those words fi ..."
Abstract
-
Cited by 21 (4 self)
- Add to MetaCart
This article explores the use of Simple Synchrony Networks (SSNs) for learning to parse English sentences drawn from a corpus of naturally occurring text. Parsing natural language sentences requires taking a sequence of words and outputting a hierarchical structure representing how those words fit together to form constituents. Feed-forward and Simple Recurrent Networks have had great difficulty with this task, in part because the number of relationships required to specify a structure is too large for the number of unit outputs they have available. SSNs have the representational power to output the necessary O(n 2 ) possible structural relationships, because SSNs extend the O(n) incremental outputs of Simple Recurrent Networks with the O(n) entity outputs provided by Temporal Synchrony Variable Binding. This article presents an incremental representation of constituent structures which allows SSNs to make effective use of both these dimensions. Experiments on learning to ...
A Connectionist Architecture for Learning to Parse
- University of Montreal
, 1998
"... We present a connectionist architecture and demonstrate that it can learn syntactic parsing from a corpus of parsed text. The architecture can represent syntactic constituents, and can learn generalizations over syntactic constituents, thereby addressing the sparse data problems of previous connecti ..."
Abstract
-
Cited by 20 (7 self)
- Add to MetaCart
We present a connectionist architecture and demonstrate that it can learn syntactic parsing from a corpus of parsed text. The architecture can represent syntactic constituents, and can learn generalizations over syntactic constituents, thereby addressing the sparse data problems of previous connectionist architectures. We apply these Simple Synchrony Networks to mapping sequences of word tags to parse trees. After training on parsed samples of the Brown Corpus, the networks achieve precision and recall on constituents that approaches that of statistical methods for this task. 1 Introduction Connectionist networks are popular for many of the same reasons as statistical techniques. They are robust and have effective learning algorithms. They also have the advantage of learning their own internal representations, so they are less constrained by the way the system designer formulates the problem. These properties and their prevalence in cognitive modeling has generated significant intere...
Robust, Applied Morphological Generation
- IN IN PROCEEDINGS OF THE FIRST INTERNATIONAL NATURAL LANGUAGE GENERATION CONFERENCE
, 2000
"... In practical natural language generation systems it is often advantageous to have a separate component that deals purely with morphological processing. We present such a component: a fast and robust morphological generator for English based on finite-state techniques that generates a word form given ..."
Abstract
-
Cited by 19 (1 self)
- Add to MetaCart
In practical natural language generation systems it is often advantageous to have a separate component that deals purely with morphological processing. We present such a component: a fast and robust morphological generator for English based on finite-state techniques that generates a word form given a specification of the lemma, part-of-speech, and the type of inflection required. We describe how this morphological generator is used in a prototype system for automatic simplification of English newspaper text, and discuss practical morphological and orthographic issues we have encountered in generation of unrestricted text within this application.

