Automatic extraction of subcategorization from corpora
- In Proceedings of the 5th ACL Conference on Applied Natural Language Processing
, 1997
Cited by 176 (7 self)
We describe a novel technique and implemented system for constructing a subcategorization dictionary from textual corpora. Each dictionary entry encodes the relative frequency of occurrence of a comprehensive set of subcategorization classes for English. An initial experiment, on a sample of 14 verbs which exhibit multiple complementation patterns, demonstrates that the technique achieves accuracy comparable to previous approaches, which are all limited to a highly restricted set of subcategorization classes. We also demonstrate that a subcategorization dictionary built with the system improves the accuracy of a parser by an appreciable amount.
Developing and evaluating a probabilistic LR parser of part-of-speech and punctuation labels
- In Proceedings of the 4th ACL/SIGPARSE International Workshop on Parsing Technologies
, 1995
Cited by 52 (9 self)
We describe an approach to robust domain-independent syntactic parsing of unrestricted naturally-occurring (English) input. The technique involves parsing sequences of part-of-speech and punctuation labels using a unification-based grammar coupled with a probabilistic LR parser. We describe the coverage of several corpora using this grammar and report the results of a parsing experiment using probabilities derived from bracketed training data. We report the first substantial experiments to assess the contribution of punctuation to deriving an accurate syntactic analysis, by parsing identical texts both with and without naturally-occurring punctuation marks.
Practical Unification-based Parsing of Natural Language
, 1993
Cited by 46 (7 self)
The thesis describes novel techniques and algorithms for the practical parsing of realistic Natural Language (NL) texts with a wide-coverage unification-based grammar of English. The thesis tackles two of the major problems in this area: firstly, the fact that parsing realistic inputs with such grammars can be computationally very expensive, and secondly, the observation that many analyses are often assigned to an input, only one of which usually forms the basis of the correct interpretation. The thesis starts by presenting a new unification algorithm, justifies why it is well-suited to practical NL parsing, and describes a bottom-up active chart parser which employs this unification algorithm together with several other novel processing and optimisation techniques. Empirical results demonstrate that an implementation of this parser has significantly better practical
Enjoy the Paper: Lexical Semantics via Lexicology
- Proceedings of the 13th International Conference on Computational Linguistics (COLING-90)
, 1990
Cited by 43 (13 self)
Current research being undertaken at both Cambridge and IBM is aimed at the construction of substantial lexicons containing lexical semantic information capable of use in automated natural language processing (NLP) applications. This work extends previous research on the semi-automatic extraction of lexical information from machine-readable versions of conventional dictionaries (MRDs) (see e.g. the papers and references in Boguraev & Briscoe, 1989; Walker et al., 1988). The motivation for this and previous research using MRDs is that entirely manual development of lexicons for practical NLP applications is infeasible, given the labour-intensive nature of lexicography (e.g. Atkins, 1988) and the resources likely to be allocated to NLP in the foreseeable future. In this paper, we motivate a particular approach to lexical semantics, briefly demonstrate its computational tractability, and explore the possibility of extracting the lexical information this approach requires from MRDs and, to some extent, textual corpora.
Robust Stochastic Parsing Using the Inside-Outside Algorithm
, 1992
Cited by 38 (0 self)
In this paper, we discuss the application of the Viterbi algorithm and the Baum-Welch algorithm (in wide use for speech recognition) to the parsing problem and describe a recent experiment designed to produce a simple, robust, probabilistic parser which selects an appropriate analysis frequently enough to be useful and deals effectively with the problem of undergeneration. We focus on the application of these stochastic algorithms here because, although other statistically based approaches have been proposed (e.g. Sampson et al., 1989; Garside & Leech, 1985; Magerman & Marcus, 1991a,b), these appear most promising as they are computationally-tractable (in principle) and well-integrated with formal language / automata theory. The Viterbi algorithm and Baum-Welch algorithm are optimised algorithms (with polynomial computational complexity) which can be used in conjunction with stochastic regular grammars (finite-state automata, i.e. (hidden) Markov models, Baum, 1972) and with probabilistic context-free grammars (Baker, 1982; Fujisaki
The ACQUILEX LKB: representation issues in semi-automatic acquisition of large lexicons
- Proceedings of the 3rd Conference on Applied Natural Language Processing (ANLP-92)
, 1992
Cited by 35 (12 self)
We describe the lexical knowledge base system (LKB) which has been designed and implemented as part of the ACQUILEX project to allow the representation of multilingual syntactic and semantic information extracted from machine readable dictionaries (MRDs), in such a way that it is usable by natural language processing (NLP) systems. The LKB's lexical representation language (LRL) augments typed graph-based unification with default inheritance, formalised in terms of default unification of feature structures. We evaluate how well the LRL meets the practical requirements arising from the semi-automatic construction of a large scale, multilingual lexicon. The system as described is fully implemented and is being used to represent substantial amounts of information automatically extracted from MRDs.
Apportioning Development Effort in a Probabilistic LR Parsing System through Evaluation
- UNIVERSITY OF PENNSYLVANIA
, 1996
Cited by 32 (10 self)
We describe an implemented system for robust domain-independent syntactic parsing of English, using a unification-based grammar of part-of-speech and punctuation labels coupled with a probabilistic LR parser. We present evaluations of the system's performance along several different dimensions; these enable us to assess the contribution that each individual part is making to the success of the system as a whole, and thus prioritise the effort to be devoted to its further enhancement. Currently, the system is able to parse around 80% of sentences in a substantial corpus of general text containing a number of distinct genres. On a random sample of 250 such sentences the system has a mean crossing bracket rate of 0.71 and recall and precision of 83% and 84% respectively when evaluated against manually-disambiguated analyses.
The Representation of Lexical Semantic Information
- University of Sussex
, 1992
Cited by 21 (1 self)
This thesis is an investigation of the representation of lexical semantic information from a computational linguistic perspective. An implemented representation language is described which is not specific to lexical semantics, but is based on the use of typed feature structures augmented with default operations. This language, which is formally specified, allows the lexical semantic representations to be tightly integrated with the syntactic component of the lexical sign, capturing generalisations by use of inheritance, while allowing for exceptions with the default mechanism. Default inheritance and default unification are discussed in detail. Grammar rules and lexical rules can be specified in the same formalism and thus the paradigmatic treatment of lexical semantics can be integrated with an account at the syntagmatic level. The use of the language is illustrated with some examples of the representation of verbs, the treatment of logical metonymy and of sense extension. This is followe...
Probabilistic Normalisation and Unpacking of Packed Parse Forests for Unification-based Grammars
- IN PROCEEDINGS OF THE AAAI FALL SYMPOSIUM ON PROBABILISTIC APPROACHES TO NATURAL LANGUAGE
, 1992
Cited by 18 (3 self)
The research described below forms part of a wider programme to develop a practical parser for naturally-occurring natural language input which is capable of returning the n-best syntactically-determinate analyses, containing that which is semantically and pragmatically most appropriate (preferably as the highest ranked) from the exponential (in sentence length) syntactically legitimate possibilities (Church & Patil 1983), which can frequently run into the thousands with realistic sentences and grammars. We have opted to develop a domain-independent solution to this problem based on integrating statistical Markov modelling techniques, which offer the potential for rapid tuning to different sublanguages / corpora on the basis of supervised training, with linguistically-adequate grammatical (language) models, capable of returning analyses detailed enough to support semantic interpretation.
Parsing (with) Punctuation etc.
- Rank Xerox Research Laboratory
, 1994
Cited by 15 (0 self)
In this paper, I describe an approach to robust domain-independent syntactic parsing of unrestricted naturally-occurring (English) input. The technique involves parsing sequences of part-of-speech and punctuation labels using a unification-based grammar coupled with a probabilistic LR parser. I describe the coverage of several corpora using this grammar and report an experiment to derive a probabilistic LR parser for the grammar from bracketed training data. I describe a systematic and declarative text grammar for English and its (modular) integration with the syntactic grammar. I evaluate the contribution of punctuation to deriving an accurate syntactic analysis through experiments with the trained parser on identical texts either with or without naturally-occurring punctuation marks. I briefly outline how the resulting system might be used to acquire an accurate valency / argument structure dictionary. 1 Introduction. This paper is part of a continuing effort to develop a robus...

