• Documents
  • Authors
  • Tables
  • Other Seers ▼
    RefSeer AckSeer CollabSeer SeerSeer
  • Log in
  • Sign up
  • MetaCart

CiteSeerX logo

Advanced Search Include Citations
Advanced Search Include Citations | Disambiguate

Automatic Acquisition of Subcategorization Frames from Untagged Text (1991)

by M R Brent
Venue:Proc. of the 29th ACL
Add To MetaCart

Tools

Sorted by:
Results 1 - 10 of 76
Next 10 →

Automatic Acquisition of Hyponyms from Large Text Corpora

by Marti A. Hearst , 1992
"... We describe a method for the automatic acquisition of the hyponymy lexical relation from unrestricted text. Two goals motivate the approach: (i) avoidante of the need for pre-encoded knowledge and (ii) applicability across a wide range of text. We identify a set of lexico-syntactic patterns that are ..."
Abstract - Cited by 673 (4 self) - Add to MetaCart
We describe a method for the automatic acquisition of the hyponymy lexical relation from unrestricted text. Two goals motivate the approach: (i) avoidante of the need for pre-encoded knowledge and (ii) applicability across a wide range of text. We identify a set of lexico-syntactic patterns that are easily recognizable, that occur frequently and across text genre boundaries, and that indisputably indicate the lexical relation of interest. We describe a method for discovering these patterns and suggest that other lexical relations will also he acquirable iu this way. A subset of the acquisitiou algorithm is implemented and the results are used to augment and critique the structure of a large hand-built thesaurus. Extensions and applications to areas such as information retrieval are suggested.

Statistical Parsing with a Context-free Grammar and Word Statistics

by Eugene Charniak , 1997
"... We describe a parsing system based upon a language model for English that is, in turn, based upon assigning probabilities to possible parses for a sentence. This model is used in a parsing system by finding the parse for the sentence with the highest probability. This system outperforms previou ..."
Abstract - Cited by 324 (17 self) - Add to MetaCart
We describe a parsing system based upon a language model for English that is, in turn, based upon assigning probabilities to possible parses for a sentence. This model is used in a parsing system by finding the parse for the sentence with the highest probability. This system outperforms previous schemes. As this is the third in a series of parsers by different authors that are similar enough to invite detailed comparisons but different enough to give rise to different levels of performance, we also report on some experiments designed to identify what aspects of these systems best explain their relative performance. Introduction We present a statistical parser that induces its grammar and probabilities from a hand-parsed corpus (a tree-bank). Parsers induced from corpora are of interest both as simply exercises in machine learning and also because they are often the best parsers obtainable by any method. That is, if one desires a parser that produces trees in the tree-bank ...

Generalized Probabilistic LR Parsing of Natural Language (Corpora) with Unification-Based Grammars

by Ted Briscoe, John Carroll - COMPUTATIONAL LINGUISTICS , 1993
"... ..."
Abstract - Cited by 178 (7 self) - Add to MetaCart
Abstract not found

Automatic extraction of subcategorization from corpora

by Ted Briscoe, John Carroll - In Proceedings of the 5th ACL Conference on Applied Natural Language Processing , 1997
"... We describe a novel technique and implemented system for constructing a subcategorization dictionary from textual corpora. Each dictionary entry encodes the relative frequency of occurrence of a comprehensive set of subcategorization classes for English. An initial experiment, on a sample of 14 verb ..."
Abstract - Cited by 176 (7 self) - Add to MetaCart
We describe a novel technique and implemented system for constructing a subcategorization dictionary from textual corpora. Each dictionary entry encodes the relative frequency of occurrence of a comprehensive set of subcategorization classes for English. An initial experiment, on a sample of 14 verbs which exhibit multiple complementation patterns, demonstrates that the technique achieves accuracy comparable to previous approaches, which are all limited to a highly restricted set of subcategorization classes. We also demonstrate that a subcategorization dictionary built with the system improves the accuracy of a parser by an appreciable amount 1. 1

A corpus-based approach to Language learning

by Eric Brill , 1993
"... ..."
Abstract - Cited by 158 (5 self) - Add to MetaCart
Abstract not found

Towards History-based Grammars: Using Richer Models for Probabilistic Parsing

by Ezra Black, Fred Jelinek, John Lafferty, David M. Magerman, Robert Mercer, Salim Roukos - In Proceedings of the 31st Annual Meeting of the Association for Computational Linguistics , 1993
"... We describe a generarive probabilistic model of natural language, which we call HBG, that takes advantage of detailed linguistic information to resolve ambiguity. HBG incorporates lexical, syntactic, semantic, and structural information from the parse tree into the disambiguation process in a novel ..."
Abstract - Cited by 148 (6 self) - Add to MetaCart
We describe a generarive probabilistic model of natural language, which we call HBG, that takes advantage of detailed linguistic information to resolve ambiguity. HBG incorporates lexical, syntactic, semantic, and structural information from the parse tree into the disambiguation process in a novel way. We use a corpus of bracketed sentences, called a Treebank, in combination with decision tree building to tease out the relevant aspects of a parse tree that will determine the correct parse of a sentence. This stands in contrast to the usual approach of further grammar tailoring via the usual linguistic introspection in the hope of generating the correct parse. In head-to-head tests against one of the best existing robust probabilistic parsing models, which we call P-CFG, the HBG model significantly outperforms P-CFG, increasing the parsing accuracy rate from 60% to 75%, a 37% reduction in error.

Automatic Acquisition Of A Large Subcategorization Dictionary From Corpora

by Christopher D. Manning , 1993
"... This paper presents a new method for producing a dictionary of subcategorization frames from un- labelled text corpora. It is shown that statistical filtering of the results of a finite state parser running on the output of a stochastic tagger produces high quality results, despite the error rates o ..."
Abstract - Cited by 145 (4 self) - Add to MetaCart
This paper presents a new method for producing a dictionary of subcategorization frames from un- labelled text corpora. It is shown that statistical filtering of the results of a finite state parser running on the output of a stochastic tagger produces high quality results, despite the error rates of the tagger and the parser. Further, it is argued that this method can be used to learn all subcategorization frames, whereas previous methods are not extensible to a general solution to the problem.

From Grammar to Lexicon: Unsupervised Learning of Lexical Syntax

by Michael R. Brent - , 1993
"... ... This paper describes an approach based on two principles. First, rely on local morpho-syntactic cues to structure rather than trying to parse entire sentences. Second, treat these cues as probabilistic rather than absolute indicators of syntactic structure. Apply inferential statistics to the da ..."
Abstract - Cited by 123 (4 self) - Add to MetaCart
... This paper describes an approach based on two principles. First, rely on local morpho-syntactic cues to structure rather than trying to parse entire sentences. Second, treat these cues as probabilistic rather than absolute indicators of syntactic structure. Apply inferential statistics to the data collected using the cues, rather than drawing a categorical conclusion from a single occurrence of a cue. The effectiveness of this approach for inferring the syntactic frames of verbs is supported by experiments on an English corpus using a program called Lerner. Lerner starts out with no knowledge of content words--it bootstraps from determiners, auxiliaries, modals, prepositions, pronouns, complementizers, coordinating conjunctions, and punctuation.

Generalizing Case Frames Using a Thesaurus and the MDL Principle

by Hang Li, Naoki Abe - Computational Linguistics , 1998
"... this paper, we confine ourselves to the former issue, and refer the interested reader to Li and Abe (1996), which deals with the latter issue ..."
Abstract - Cited by 95 (4 self) - Add to MetaCart
this paper, we confine ourselves to the former issue, and refer the interested reader to Li and Abe (1996), which deals with the latter issue

Part-of-Speech Tagging and Partial Parsing

by Steven Abney - Corpus-Based Methods in Language and Speech , 1996
"... m we can carve o# next. `Partial parsing' is a cover term for a range of di#erent techniques for recovering some but not all of the information contained in a traditional syntactic analysis. Partial parsing techniques, like tagging techniques, aim for reliability and robustness in the face of the va ..."
Abstract - Cited by 85 (0 self) - Add to MetaCart
m we can carve o# next. `Partial parsing' is a cover term for a range of di#erent techniques for recovering some but not all of the information contained in a traditional syntactic analysis. Partial parsing techniques, like tagging techniques, aim for reliability and robustness in the face of the vagaries of natural text, by sacrificing completeness of analysis and accepting a low but non-zero error rate. 1 Tagging The earliest taggers [35, 51] had large sets of hand-constructed rules for assigning tags on the basis of words' character patterns and on the basis of the tags assigned to preceding or following words, but they had only small lexica, primarily for exceptions to the rules. TAGGIT [35] was used to generate an initial tagging of the Brown corpus, which was then hand-edited. (Thus it provided the data that has since been used to train other taggers [20].) The tagger described by Garside [56, 34], CLAWS, was a probabilistic version of TAGGIT, and the DeRose tagger improved on
The National Science Foundation
  • About CiteSeerX
  • Submit Documents
  • Privacy Policy
  • Help
  • Data
  • Source
  • Contact Us

Developed at and hosted by The College of Information Sciences and Technology

© 2007-2010 The Pennsylvania State University