Results 1 - 10 of 103
Principles and implementation of deductive parsing
- JOURNAL OF LOGIC PROGRAMMING
, 1995
Cited by 150 (4 self)
We present a system for generating parsers based directly on the metaphor of parsing as deduction. Parsing algorithms can be represented directly as deduction systems, and a single deduction engine can interpret such deduction systems so as to implement the corresponding parser. The method generalizes easily to parsers for augmented phrase structure formalisms, such as definite-clause grammars and other logic grammar formalisms, and has been used for rapid prototyping of parsing algorithms for a variety of formalisms including variants of tree-adjoining grammars, categorial grammars, and lexicalized context-free grammars.
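The idea of a single engine interpreting a deduction system can be illustrated with a small sketch. This is not the system described in the paper: it is a minimal agenda-driven recognizer for one deduction system only (CKY over a grammar in Chomsky normal form), and the names `cky_deduce`, `TOY_LEXICAL`, and `TOY_BINARY` are hypothetical. Items have the form (category, start, end); lexical rules supply the axioms, and a single inference rule combines two adjacent items licensed by a binary grammar rule.

```python
from collections import defaultdict

# Toy CNF grammar, invented purely for illustration.
TOY_LEXICAL = {"she": ["NP"], "eats": ["V"], "fish": ["NP"]}
TOY_BINARY = {("NP", "VP"): ["S"], ("V", "NP"): ["VP"]}


def cky_deduce(words, lexical, binary, goal="S"):
    """Return True if the goal item (goal, 0, len(words)) is derivable."""
    # Axioms: one item per lexical category of each word.
    agenda = [(cat, i, i + 1)
              for i, word in enumerate(words)
              for cat in lexical.get(word, ())]
    chart = set()
    by_start = defaultdict(set)  # start position -> proven items starting there
    by_end = defaultdict(set)    # end position -> proven items ending there

    while agenda:
        item = agenda.pop()
        if item in chart:
            continue  # already proven; do not process twice
        chart.add(item)
        cat, i, j = item
        by_start[i].add(item)
        by_end[j].add(item)

        # New item as the left antecedent: combine with items starting at j.
        for right_cat, _, k in list(by_start[j]):
            for parent in binary.get((cat, right_cat), ()):
                agenda.append((parent, i, k))

        # New item as the right antecedent: combine with items ending at i.
        for left_cat, h, _ in list(by_end[i]):
            for parent in binary.get((left_cat, cat), ()):
                agenda.append((parent, h, j))

    return (goal, 0, len(words)) in chart
```

Calling `cky_deduce("she eats fish".split(), TOY_LEXICAL, TOY_BINARY)` checks whether the goal item can be deduced from the axioms. Swapping in different item forms and inference rules, while keeping the agenda and chart loop, is the kind of reuse the abstract refers to.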
Wide-coverage efficient statistical parsing with CCG and log-linear models
- COMPUTATIONAL LINGUISTICS
, 2007
Cited by 87 (20 self)
This paper describes a number of log-linear parsing models for an automatically extracted lexicalized grammar. The models are "full" parsing models in the sense that probabilities are defined for complete parses, rather than for independent events derived by decomposing the parse tree. Discriminative training is used to estimate the models, which requires incorrect parses for each sentence in the training data as well as the correct parse. The lexicalized grammar formalism used is Combinatory Categorial Grammar (CCG), and the grammar is automatically extracted from CCGbank, a CCG version of the Penn Treebank. The combination of discriminative training and an automatically extracted grammar leads to a significant memory requirement (over 20 GB), which is satisfied using a parallel implementation of the BFGS optimisation algorithm running on a Beowulf cluster. Dynamic programming over a packed chart, in combination with the parallel implementation, allows us to solve one of the largest-scale estimation problems in the statistical parsing literature in under three hours. A key component of the parsing system, for both training and testing, is a Maximum Entropy supertagger which assigns CCG lexical categories to words in a sentence. The supertagger makes the discriminative training feasible, and also leads to a highly efficient parser. Surprisingly,
Parsing Inside-Out
, 1998
Cited by 65 (2 self)
Probabilistic Context-Free Grammars (PCFGs) and variations on them have recently become some of the most common formalisms for parsing. It is common with PCFGs to compute the inside and outside probabilities. When these probabilities are multiplied together and normalized, they produce the probability that any given non-terminal covers any piece of the input sentence. The traditional use of these probabilities is to improve the probabilities of grammar rules. In this thesis we show that these values are useful for solving many other problems in Statistical Natural Language Processing. We give a framework for describing parsers. The framework generalizes the inside and outside values to semirings. It makes it easy to describe parsers that compute a wide variety of interesting quantities, including the inside and outside probabilities, as well as related quantities such as Viterbi probabilities and n-best lists. We also present three novel uses for the inside and outside probabilities. T...
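To make the semiring generalization concrete, here is a minimal sketch, assuming a PCFG in Chomsky normal form and a plain CKY loop rather than the thesis's own framework. The names `Semiring`, `INSIDE`, `VITERBI`, and `chart_value` are hypothetical. The same chart code yields the inside probability or the Viterbi probability depending only on which "plus" operation is supplied.

```python
from collections import defaultdict, namedtuple

# A semiring is given here by its two operations and its additive identity.
Semiring = namedtuple("Semiring", "plus times zero")

INSIDE = Semiring(lambda a, b: a + b, lambda a, b: a * b, 0.0)   # sum over parses
VITERBI = Semiring(max, lambda a, b: a * b, 0.0)                 # best single parse


def chart_value(words, lexical, binary, semiring, goal="S"):
    """Compute the semiring value of `goal` spanning the whole input.

    lexical: word -> list of (category, probability)
    binary:  (parent, left, right) -> probability
    """
    n = len(words)
    chart = defaultdict(lambda: semiring.zero)

    # Width-1 spans come from lexical rules.
    for i, word in enumerate(words):
        for cat, prob in lexical.get(word, ()):
            chart[cat, i, i + 1] = semiring.plus(chart[cat, i, i + 1], prob)

    # Wider spans combine two adjacent sub-spans under a binary rule.
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            k = i + width
            for j in range(i + 1, k):
                for (parent, left, right), prob in binary.items():
                    value = semiring.times(
                        prob,
                        semiring.times(chart[left, i, j], chart[right, j, k]),
                    )
                    chart[parent, i, k] = semiring.plus(chart[parent, i, k], value)

    return chart[goal, 0, n]


# Toy ambiguous grammar, invented for illustration: S -> S S (0.5), S -> a (0.5).
TOY_LEX = {"a": [("S", 0.5)]}
TOY_BIN = {("S", "S", "S"): 0.5}
```

For the input `["a", "a", "a"]` this toy grammar has two parses of equal probability, so the inside value is twice the Viterbi value. Other quantities mentioned in the abstract, such as n-best lists, would need richer semiring elements than the floats used here.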
The Equivalence Of Four Extensions Of Context-Free Grammars
- Mathematical Systems Theory
, 1994
Cited by 64 (5 self)
There is currently considerable interest among computational linguists in grammatical formalisms with highly restricted generative power. This paper concerns the relationship between the class of string languages generated by several such formalisms viz. Combinatory Categorial Grammars, Head Grammars, Linear Indexed Grammars and Tree Adjoining Grammars. Each of these formalisms is known to generate a larger class of languages than Context-Free Grammars. The four formalisms under consideration were developed independently and appear superficially to be quite different from one another. The result presented in this paper is that all four of the formalisms under consideration generate exactly the same class of string languages. 1 Introduction There is currently considerable interest among computational linguists in grammatical formalisms with highly restricted generative power. This is based on the argument that a grammar formalism should not merely be viewed as a notation, but as part o...
Practical Unification-based Parsing of Natural Language
, 1993
Cited by 46 (7 self)
The thesis describes novel techniques and algorithms for the practical parsing of realistic Natural Language (NL) texts with a wide-coverage unification-based grammar of English. The thesis tackles two of the major problems in this area: firstly, the fact that parsing realistic inputs with such grammars can be computationally very expensive, and secondly, the observation that many analyses are often assigned to an input, only one of which usually forms the basis of the correct interpretation. The thesis starts by presenting a new unification algorithm, justifies why it is well-suited to practical NL parsing, and describes a bottom-up active chart parser which employs this unification algorithm together with several other novel processing and optimisation techniques. Empirical results demonstrate that an implementation of this parser has significantly better practical
Efficiency, Robustness and Accuracy in Picky Chart Parsing
- UNIVERSITY OF DELAWARE
, 1992
Cited by 43 (2 self)
This paper describes Picky, a probabilistic agenda-based chart parsing algorithm which uses a technique called probabilistic prediction to predict which grammar rules are likely to lead to an acceptable parse of the input. Using a suboptimal search method, Picky significantly reduces the number of edges produced by CKY-like chart parsing algorithms, while maintaining the robustness of pure bottom-up parsers and the accuracy of existing probabilistic parsers. Experiments using Picky demonstrate how probabilistic modelling can impact upon the efficiency, robustness and accuracy of a parser.
Statistical language model adaptation: review and perspectives
- Speech Communication
, 2004
Cited by 35 (0 self)
Speech recognition performance is severely affected when the lexical, syntactic, or semantic characteristics of the discourse in the training and recognition tasks differ. The aim of language model adaptation is to exploit specific, albeit limited, knowledge about the recognition task to compensate for this mismatch. More generally, an adaptive language model seeks to maintain an adequate representation of the current task domain under changing conditions involving potential variations in vocabulary, syntax, content, and style. This paper presents an overview of the major approaches proposed to address this issue, and offers some perspectives regarding their comparative merits and associated tradeoffs. © 2003 Elsevier B.V. All rights reserved.
Recognition can be Harder than Parsing
- Computational Intelligence
, 1992
Cited by 34 (0 self)
this paper is to discuss the scope and limitations of this approach, and to examine the suitability of several syntactic formalisms on the criterion of their ability to handle it. 2 Parsing as intersection
An optimized algorithm for Data Oriented Parsing
, 1996
Cited by 31 (5 self)
This paper presents an optimization of a syntactic disambiguation algorithm for Data Oriented Parsing (DOP) (Bod 93) in particular, and for Stochastic Tree-Substitution Grammars (STSGs) in general. The main advantage of this algorithm over existing alternatives ((Bod 93), (Schabes & Waters 93), (Sima'an et al. 94)) is that its time-complexity is linear, instead of square, in grammar size (and cubic in sentence length). It is particularly suitable for natural language STSGs which have many deep elementary-trees and a small underlying Context-Free Grammar (CFG). A first implementation of this algorithm is operational and is exhibiting substantial speed-up in comparison to the unoptimized version. In addition to presenting the optimized algorithm, the paper reports experiments for measuring the disambiguation-accuracy, the expected sizes and the execution-times of various DOP models, which are projected from the ATIS domain. Keywords: Corpus-based statistical NLP, syntactic disambiguation...

