Automatic extraction of subcategorization from corpora
- In Proceedings of the 5th ACL Conference on Applied Natural Language Processing
, 1997
Cited by 176 (7 self)
We describe a novel technique and implemented system for constructing a subcategorization dictionary from textual corpora. Each dictionary entry encodes the relative frequency of occurrence of a comprehensive set of subcategorization classes for English. An initial experiment, on a sample of 14 verbs which exhibit multiple complementation patterns, demonstrates that the technique achieves accuracy comparable to previous approaches, which are all limited to a highly restricted set of subcategorization classes. We also demonstrate that a subcategorization dictionary built with the system improves the accuracy of a parser by an appreciable amount.
Robust Accurate Statistical Annotation of General Text
, 2002
Cited by 146 (11 self)
We describe a robust accurate domain-independent approach to statistical parsing incorporated into the new release of the ANLT toolkit, and publicly available as a research tool. The system has been used to parse many well known corpora in order to produce data for lexical acquisition efforts; it has also been used as a component in an open-domain question answering project. The performance of the system is competitive with that of statistical parsers using highly lexicalised parse selection models. However, we plan to extend the system to improve parse coverage, depth and accuracy.
Subcategorization Acquisition
, 2002
Cited by 64 (13 self)
Manual development of large subcategorised lexicons has proved difficult because predicates change behaviour between sublanguages, domains and over time. Yet access to a comprehensive subcategorization lexicon is vital for successful parsing capable of recovering predicate-argument relations, and probabilistic parsers would greatly benefit from accurate information concerning the relative likelihood of different subcategorisation frames (scfs) of a given predicate. Acquisition of subcategorization lexicons from textual corpora has recently become increasingly popular. Although this work has met with some success, resulting lexicons indicate a need for greater accuracy. One significant source of error lies in the statistical filtering used for hypothesis selection, i.e. for removing noise from automatically acquired scfs. This thesis builds on earlier work in verbal subcategorization acquisition, taking as a starting point the problem with statistical filtering. Our investigation shows that statistical filters tend to work poorly because not only is the underlying distribution zipfian, but there is also very little correlation between conditional distribution of
Developing and evaluating a probabilistic LR parser of part-of-speech and punctuation labels
- In Proceedings of the 4th ACL/SIGPARSE International Workshop on Parsing Technologies
, 1995
Cited by 52 (9 self)
We describe an approach to robust domain-independent syntactic parsing of unrestricted naturally-occurring (English) input. The technique involves parsing sequences of part-of-speech and punctuation labels using a unification-based grammar coupled with a probabilistic LR parser. We describe the coverage of several corpora using this grammar and report the results of a parsing experiment using probabilities derived from bracketed training data. We report the first substantial experiments to assess the contribution of punctuation to deriving an accurate syntactic analysis, by parsing identical texts both with and without naturally-occurring punctuation marks.
An Efficient Implementation of the Head-Corner Parser
- COMPUTATIONAL LINGUISTICS
, 1996
Cited by 32 (2 self)
This paper describes an efficient and robust implementation of a bidirectional, head-driven parser for constraint-based grammars. This parser is developed for the OVIS system: a Dutch spoken dialogue system in which information about public transport can be obtained by telephone. After a Review
Apportioning Development Effort in a Probabilistic LR Parsing System through Evaluation
- UNIVERSITY OF PENNSYLVANIA
, 1996
Cited by 32 (10 self)
We describe an implemented system for robust domain-independent syntactic parsing of English, using a unification-based grammar of part-of-speech and punctuation labels coupled with a probabilistic LR parser. We present evaluations of the system's performance along several different dimensions; these enable us to assess the contribution that each individual part is making to the success of the system as a whole, and thus prioritise the effort to be devoted to its further enhancement. Currently, the system is able to parse around 80% of sentences in a substantial corpus of general text containing a number of distinct genres. On a random sample of 250 such sentences the system has a mean crossing bracket rate of 0.71 and recall and precision of 83% and 84% respectively when evaluated against manually-disambiguated analyses.
Towards Systematic Grammar Profiling Test Suite Technology Ten Years After
- Special Issue on Evaluation), 411
, 1998
Cited by 26 (9 self)
An experiment with recent test suite and grammar (engineering) resources is outlined: a critical assessment of the EU-funded TSNLP (Test Suites for Natural Language Processing) package as a diagnostic and benchmarking facility for a distributed (multi-site) large-scale HPSG grammar engineering effort. This paper argues for a generalized, systematic, and fully automated testing and diagnosis facility as an integral part of the linguistic engineering cycle and gives a practical assessment of existing resources; both a flexible methodology and tools for competence and performance profiling are presented. By comparison to earlier evaluation work as reflected in the Hewlett-Packard test suite data, released exactly ten years before TSNLP, it is judged where test suite-based evaluation has improved (and where not) over time.
1 Motivation
[...] the study and optimisation of unification-based parsing must rely on empirical data until complexity theory can more accurately p...
Practical Simplification of English Newspaper Text to Assist Aphasic Readers
- In Proc. of AAAI-98 Workshop on Integrating Artificial Intelligence and Assistive Technology
, 1998
Cited by 22 (0 self)
Aphasia is a disability of language processing often suffered by people as a result of a stroke or head injury. In order to assist aphasic readers we are developing a system which automatically simplifies English newspaper texts as available on the Internet. The system combines state-of-the-art natural language processing tools with innovative research on text simplification. We present the architecture of the system, discuss the analysis of newspaper text and a number of criteria for simplification. In addition, we provide some initial implementation details and propose an evaluation method.
Keywords: robust parsing, text simplification, aphasia, reading assistance
Introduction
Recently, there has been increasing interest in the use of results from natural language processing for the development of assistive technology. Here, we address this topic by reporting preliminary work carried out in the research project "PSET: Practical Simplification of English Text". The aim of t...
Continuous or discontinuous constituents? a comparison between syntactic analyses for constituent order and their processing systems
- Research on Language and Computation
, 2004
Cited by 16 (1 self)
In this paper I discuss several possible analyses for constituent order in German. Approaches that assume continuous constituents are compared with an approach that assumes discontinuous constituents. I will show that certain proposals that have been made to analyze constituent order are either not adequate or cannot be implemented with currently available systems. For the proposals that can be implemented I will discuss the amount of work a parser has to do. I then compare two implementations of larger fragments of German: the Verbmobil grammar and the Babel grammar. It is shown that the amount of work to be done to parse the Verbmobil grammar is significantly higher than the work that has to be done parsing with the Babel grammar.
Key words: German, HPSG, implementation, linearization, parsing
Measure For Measure: Parser Cross-Fertilization - Towards Increased Component Comparability and Exchange
, 2000
Cited by 10 (5 self)
Over the past few years significant progress was accomplished in efficient processing with wide-coverage HPSG grammars. HPSG-based parsing systems are now available that can process medium-complexity sentences (of ten to twenty words, say) in average parse times equivalent to real (i.e. human reading) time. A large number of engineering improvements in current HPSG systems were achieved through collaboration of multiple research centers and mutual exchange of experience, encoding techniques, algorithms, and even pieces of software. This article presents an approach to grammar and system engineering, termed competence & performance profiling, that makes systematic experimentation and the precise empirical study of system properties a focal point in development. Adapting the profiling metaphor familiar from software engineering to constraint-based grammars and parsers enables developers to maintain an accurate record of system evolution, identify grammar and system deficiencies quickl...

