Results 1  10
of
43
Statistical Parsing with a Contextfree Grammar and Word Statistics
, 1997
"... We describe a parsing system based upon a language model for English that is, in turn, based upon assigning probabilities to possible parses for a sentence. This model is used in a parsing system by finding the parse for the sentence with the highest probability. This system outperforms previou ..."
Abstract

Cited by 410 (18 self)
 Add to MetaCart
We describe a parsing system based upon a language model for English that is, in turn, based upon assigning probabilities to possible parses for a sentence. This model is used in a parsing system by finding the parse for the sentence with the highest probability. This system outperforms previous schemes. As this is the third in a series of parsers by different authors that are similar enough to invite detailed comparisons but different enough to give rise to different levels of performance, we also report on some experiments designed to identify what aspects of these systems best explain their relative performance. Introduction We present a statistical parser that induces its grammar and probabilities from a handparsed corpus (a treebank). Parsers induced from corpora are of interest both as simply exercises in machine learning and also because they are often the best parsers obtainable by any method. That is, if one desires a parser that produces trees in the treebank ...
Supertagging: An Approach to Almost Parsing
 Computational Linguistics
, 1999
"... this paper, we have proposed novel methods for robust parsing that integrate the flexibility of linguistically motivated lexical descriptions with the robustness of statistical techniques. Our thesis is that the computation of linguistic structure can be localized if lexical items are associated wit ..."
Abstract

Cited by 167 (23 self)
 Add to MetaCart
(Show Context)
this paper, we have proposed novel methods for robust parsing that integrate the flexibility of linguistically motivated lexical descriptions with the robustness of statistical techniques. Our thesis is that the computation of linguistic structure can be localized if lexical items are associated with rich descriptions (Supertags) that impose complex constraints in a local context. The supertags are designed such that only those elements on which the lexical item imposes constraints appear within a given supertag. Further, each lexical item is associated with as many supertags as the number of different syntactic contexts in which the lexical item can appear. This makes the number of different descriptions for each lexical item much larger, than when the descriptions are less complex; thus increasing the local ambiguity for a parser. But this local ambiguity can be resolved by using statistical distributions of supertag cooccurrences collected from a corpus of parses. We have explored these ideas in the context of Lexicalized TreeAdjoining Grammar (LTAG) framework. The supertags in LTAG combine both phrase structure information and dependency information in a single representation. Supertag disambiguation results in a representation that is effectively a parse (almost parse), and the parser needs `only' combine the individual supertags. This method of parsing can also be used to parse sentence fragments such as in spoken utterances where the disambiguated supertag sequence may not combine into a single structure. 1 Introduction In this paper, we present a robust parsing approach called supertagging that integrates the flexibility of linguistically motivated lexical descriptions with the robustness of statistical techniques. The idea underlying the approach is that the ...
Parsing InsideOut
, 1998
"... Probabilistic ContextFree Grammars (PCFGs) and variations on them have recently become some of the most common formalisms for parsing. It is common with PCFGs to compute the inside and outside probabilities. When these probabilities are multiplied together and normalized, they produce the probabili ..."
Abstract

Cited by 99 (2 self)
 Add to MetaCart
Probabilistic ContextFree Grammars (PCFGs) and variations on them have recently become some of the most common formalisms for parsing. It is common with PCFGs to compute the inside and outside probabilities. When these probabilities are multiplied together and normalized, they produce the probability that any given nonterminal covers any piece of the input sentence. The traditional use of these probabilities is to improve the probabilities of grammar rules. In this thesis we show that these values are useful for solving many other problems in Statistical Natural Language Processing. We give a framework for describing parsers. The framework generalizes the inside and outside values to semirings. It makes it easy to describe parsers that compute a wide variety of interesting quantities, including the inside and outside probabilities, as well as related quantities such as Viterbi probabilities and nbest lists. We also present three novel uses for the inside and outside probabilities. T...
Efficient Algorithms for Parsing the DOP Model
, 1996
"... Excellent results have been reported for DataOriented Parsing (DOP) of natural language texts (Bod, 1993c). Unfortunately, existing algorithms are both computationally intensive and difficult to implement. Previous algorithms are expensive due to two factors: the exponential number of rules that mus ..."
Abstract

Cited by 65 (4 self)
 Add to MetaCart
Excellent results have been reported for DataOriented Parsing (DOP) of natural language texts (Bod, 1993c). Unfortunately, existing algorithms are both computationally intensive and difficult to implement. Previous algorithms are expensive due to two factors: the exponential number of rules that must be generated and the use of a Monte Carlo p arsing algorithm. In this paper we solve the first problem by a novel reduction of the DOP model toga small, equivalent probabilistic contextfree grammar. We solve the second problem by a novel deterministic parsing strategy that maximizes the expected number of correct con stituents, rather than the probability of a correct parse tree. Using ithe optimizations, experiments yield a 97% crossing brackets rate and 88% zero crossing brackets rate. This differs significantly from the results reported by Bod, and is compara ble to results from a duplication of Pereira and Schabes's (1992) experiment on the same data. We show that Bod's results are at least partially due to an extremely fortuitous choice of test data, and partially due to using cleaner data than other researchers.
Developing and evaluating a probabilistic LR parser of partofspeech and punctuation labels
 Proceedings of the ACL/SIGPARSE 4th International Workshop on Parsing Technologies
, 1995
"... ..."
A Lexicalized Tree Adjoining Grammar for English
, 1995
"... This document describes a sizable grammar of English written in the TAG formalism and implemented for use with the XTAG system. This report and the grammar described herein supersedes the TAG grammar described in [Abeill'e et al., 1990]. The English grammar described in this report is based on ..."
Abstract

Cited by 46 (0 self)
 Add to MetaCart
This document describes a sizable grammar of English written in the TAG formalism and implemented for use with the XTAG system. This report and the grammar described herein supersedes the TAG grammar described in [Abeill'e et al., 1990]. The English grammar described in this report is based on the TAG formalism developed in [Joshi et al., 1975], which has been extended to include lexicalization ([Schabes et al., 1988]), and unificationbased feature structures ([VijayShanker and Joshi, 1991]). The grammar discussed in this report extends the grammar presented in [Abeill'e et al., 1990] in at least two ways. First, this grammar has more detailed linguistic analyses, and second, the grammar presented in this paper is fully implemented. The range of syntactic phenomena that can be handled is large and includes auxiliaries (including inversion), copula, raising and small clause constructions, topicalization, relative clauses, infinitives, gerunds, passives, adjuncts, itclefts, whclefts,...
Stochastic Lexicalized ContextFree Grammar
, 1993
"... Stochastic lexicalized contextfree grammar (SLCFG) is an attractive compromise between the parsing efficiency of stochastic contextfree grammar (SCFG) and the lexical sensitivity of stochastic lexicalized treeadjoining grammar (SLTAG). SLCFG is a restricted form of SLTAG that can only generate ..."
Abstract

Cited by 44 (7 self)
 Add to MetaCart
Stochastic lexicalized contextfree grammar (SLCFG) is an attractive compromise between the parsing efficiency of stochastic contextfree grammar (SCFG) and the lexical sensitivity of stochastic lexicalized treeadjoining grammar (SLTAG). SLCFG is a restricted form of SLTAG that can only generate contextfree languages and can be parsed in cubic time. However, SLCFG retains the lexical sensitivity of SLTAG and is therefore a much better basis for capturing distributional information about words than SCFG.
Apportioning Development Effort in a Probabilistic LR Parsing System through Evaluation
 UNIVERSITY OF PENNSYLVANIA
, 1996
"... We describe an implemented system for robust domainindependent syntactic parsing of English, using a unificationbased grammar of partofspeech and punctuation labels coupled with a probabilistic LR parser. We present evaluations of the system's performance along several different dimensions; ..."
Abstract

Cited by 38 (11 self)
 Add to MetaCart
We describe an implemented system for robust domainindependent syntactic parsing of English, using a unificationbased grammar of partofspeech and punctuation labels coupled with a probabilistic LR parser. We present evaluations of the system's performance along several different dimensions; these enable us to assess the contribution that each individual part is making to the success of the system as a whole, and thus prioririse the effort to be devoted to its further enhancement. Currently, the system is able to parse around 80% of sentences in a substantial corpus of general text containing a number of distinct genres. On a random sample of 250 such sentences the system has a mean crossing bracket rate of 0.71 and recall and precision of 83% and 84% respectively when evaluated against manuallydisambiguated analyses.