Automatic extraction of subcategorization from corpora
- In Proceedings of the 5th ACL Conference on Applied Natural Language Processing
, 1997
Cited by 176 (7 self)
We describe a novel technique and implemented system for constructing a subcategorization dictionary from textual corpora. Each dictionary entry encodes the relative frequency of occurrence of a comprehensive set of subcategorization classes for English. An initial experiment, on a sample of 14 verbs which exhibit multiple complementation patterns, demonstrates that the technique achieves accuracy comparable to previous approaches, which are all limited to a highly restricted set of subcategorization classes. We also demonstrate that a subcategorization dictionary built with the system improves the accuracy of a parser by an appreciable amount.
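As a rough illustration of what "each dictionary entry encodes the relative frequency of occurrence" of subcategorization classes could look like, here is a minimal sketch. It is not the authors' system: the function name, the frame labels (`NP_NP`, `NP_PP`, `INTRANS`), and the toy observations are all hypothetical, and the step that extracts (verb, frame) pairs from parsed text is assumed to have already happened.

```python
from collections import Counter, defaultdict


def build_subcat_dictionary(observations):
    """Map each verb to the relative frequency of its observed frame classes.

    `observations` is an iterable of (verb, frame_class) pairs; both the
    pair format and the frame labels are assumptions made for this sketch.
    """
    counts = defaultdict(Counter)
    for verb, frame in observations:
        counts[verb][frame] += 1

    dictionary = {}
    for verb, frame_counts in counts.items():
        total = sum(frame_counts.values())
        # Normalise raw counts so each verb's frequencies sum to 1.
        dictionary[verb] = {frame: n / total for frame, n in frame_counts.items()}
    return dictionary


# Hypothetical toy data standing in for frames extracted from a corpus.
observations = [
    ("give", "NP_NP"),
    ("give", "NP_PP"),
    ("give", "NP_NP"),
    ("give", "NP_NP"),
    ("sleep", "INTRANS"),
]
subcat = build_subcat_dictionary(observations)
```

A real system would also need to filter out noisy frame hypotheses before normalising; this sketch only shows the shape of the resulting entries.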
Designing Statistical Language Learners: Experiments on Noun Compounds
, 1995
Cited by 65 (0 self)
Statistical language learning research takes the view that many traditional natural language processing tasks can be solved by training probabilistic models of language on a sufficient volume of training data. The design of statistical language learners therefore involves answering two questions: (i) Which of the multitude of possible language models will most accurately reflect the properties necessary to a given task? (ii) What will constitute a sufficient volume of training data? Regarding the first question, though a variety of successful models have been discovered, the space of possible designs remains largely unexplored. Regarding the second, exploration of the design space has so far proceeded without an adequate answer. The goal of this thesis is to advance the exploration of the statistical language learning design space. In pursuit of that goal, the thesis makes two main theoretical contributions: it identifies a new class of designs by providing a novel theory of statistical natural language processing, and it presents the foundations for a predictive theory of data requirements to assist in future design explorations. The first of these contributions is called the meaning distributions theory. This theory
Parsing (with) Punctuation etc.
- Rank Xerox Research Laboratory
, 1994
Cited by 15 (0 self)
In this paper, I describe an approach to robust domain-independent syntactic parsing of unrestricted naturally-occurring (English) input. The technique involves parsing sequences of part-of-speech and punctuation labels using a unification-based grammar coupled with a probabilistic LR parser. I describe the coverage of several corpora using this grammar and report an experiment to derive a probabilistic LR parser for the grammar from bracketed training data. I describe a systematic and declarative text grammar for English and its (modular) integration with the syntactic grammar. I evaluate the contribution of punctuation to deriving an accurate syntactic analysis through experiments with the trained parser on identical texts either with or without naturally-occurring punctuation marks. I briefly outline how the resulting system might be used to acquire an accurate valency / argument structure dictionary. 1 Introduction This paper is part of a continuing effort to develop a robus...
Extraction of Predicate-Argument Structures from Texts
- Proceedings of the 2nd Conference on Recent Advances in Natural Language Processing (RANLP-97), Tzigov Chark (Bulgaria)
, 1997
Cited by 5 (3 self)
We consider extraction of predicate-argument structures from a single text with a substantial narrative part. Working with such texts rather than large corpora requires detailed syntactic analyses, a learning mechanism and a cooperating user who confirms automatically generated results. We describe a system with such capabilities. The system has been tested on a variety of texts and has recently undergone an experimental evaluation. User participation appears not onerous at all, and there is a clear learning pattern. 1 Introduction Work on lexicons has arisen as a central issue in Natural Language Processing (NLP). (Wilks et al. 96; Guthrie et al. 96; Saint-Dizier & Viegas 95) give an up-to-date overview. Despite common acceptance of the pivotal role of lexicons, debates continue on the function, contents and organization of life-size lexicons, and the methods of their creation and maintenance. In this paper, we address some of these questions, and offer a few practical so...
Survey of Parallel Context-Free Parsing Techniques
, 1997
Cited by 4 (3 self)
This report describes research done in the context of a subproject of the HPCN project IMPACT. The IMPACT project is headed by the ING bank and is funded by the organization for High Performance Computing and Networking (HPCN). The aim of the specific subproject, in the context of which this report has been written, is to develop (techniques for) natural language interfaces to information resources, focusing on the use of high-performance computers to achieve acceptable response times. This report is part of the "Parallel Parsing I" research topic (IMPACT-NLI-1997-1).
Towards Automatic Extraction of Argument Structure from Corpora
- ACQUILEX II Working Paper
, 1995
Cited by 3 (0 self)
... from substantial quantities of corpus material, or the less ambitious and more frequent construction of 'disposable' dictionaries or augmentation of 'self-updating' dictionaries as and when new corpora need to be parsed. We have developed a system which is potentially capable of delivering putative lexical entries for predicates extracted from textual corpora, focussing on the acquisition of 'argument structure' (defined as valency, semantic selectional restrictions/preferences, diathesis alternations, bounded dependency rules, such as passive or particle movement, and control of understood arguments in predicative complements), though so far we have mostly explored predictions with respect to valency. The approach we have adopted is to construct a 'shallow' syntactic but global analysis of sentences for corpus material annotated with part-of-speech and punctuation mark sequences disambiguated by a tagger. We then extract relevant competing subanalyses surrounding a given predicate f
SDP - Spoken Dialogue Parser
, 1998
Cited by 1 (0 self)
This report describes work done on part-of-speech tagging and parsing the MapTask Corpus in the "Robust Parsing and Part-of-Speech Tagging of Transcribed Speech Corpora" project, funded by the ESRC (project R000236800). This report concentrates on the implementation of the software developed in the project and the format of the SGML annotation of the parse trees. An overview of the project's aims and results can be found in [McKelvie 98b] and an analysis of the speech disfluencies found while parsing the corpus can be found in [McKelvie 98a].
Architectural Aspects of Natural Language Processing Systems
, 1998
Cited by 1 (0 self)
This report is intended to support design decisions that have to be made in the development of the NLP system of the IMPACT project. It is the result of investigating and comparing many different existing NLP systems. The main purpose of this research was to see what techniques for natural language processing have successfully been used in practical, existing natural language processing systems, and which of these approaches seem suitable candidates for efficient parallel natural language processing.
Parsing and Case Analysis in TANKA
- Proc. of COLING-92
, 1992
The TANKA project seeks to build a model of a technical domain by semi-automatically processing unedited English text that describes this domain. Each sentence is parsed and conceptual elements are extracted from the parse. Concepts are derived from the Case structure of a sentence, and added to a conceptual network that represents knowledge about the domain. The DIPETT parser has a particularly broad coverage of English syntax; its newest version can also process sentence fragments. The HAIKU subsystem is responsible for user-assisted semantic interpretation. It contains a Case Analyzer module that extracts phrases marking concepts from the parse and uses its past processing experience to derive the most likely Case realizations of each with almost no a priori semantic knowledge. The user must validate these selections. A key issue in our research is minimizing the number of interactions with the user by intelligently generating the alternatives offered.

