Results 1 - 10
of
45
Parsing the WSJ using CCG and log-linear models
- In Proceedings of the 42nd Meeting of the ACL
, 2004
"... This paper describes and evaluates log-linear parsing models for Combinatory Categorial Grammar (CCG). A parallel implementation of the L-BFGS optimisation algorithm is described, which runs on a Beowulf cluster allowing the complete Penn Treebank to be used for estimation. We also develop a new eff ..."
Abstract
-
Cited by 131 (16 self)
- Add to MetaCart
This paper describes and evaluates log-linear parsing models for Combinatory Categorial Grammar (CCG). A parallel implementation of the L-BFGS optimisation algorithm is described, which runs on a Beowulf cluster allowing the complete Penn Treebank to be used for estimation. We also develop a new efficient parsing algorithm for CCG which maximises expected recall of dependencies. We compare models which use all CCG derivations, including nonstandard derivations, with normal-form models. The performances of the two models are comparable and the results are competitive with existing wide-coverage CCG parsers.
Max-margin parsing
- In Proceedings of EMNLP
, 2004
"... We present a novel discriminative approach to parsing inspired by the large-margin criterion underlying support vector machines. Our formulation uses a factorization analogous to the standard dynamic programs for parsing. In particular, it allows one to efficiently learn a model which discriminates ..."
Abstract
-
Cited by 114 (14 self)
- Add to MetaCart
We present a novel discriminative approach to parsing inspired by the large-margin criterion underlying support vector machines. Our formulation uses a factorization analogous to the standard dynamic programs for parsing. In particular, it allows one to efficiently learn a model which discriminates among the entire space of parse trees, as opposed to reranking the top few candidates. Our models can condition on arbitrary features of input sentences, thus incorporating an important kind of lexical information without the added algorithmic complexity of modeling headedness. We provide an efficient algorithm for learning such models and show experimental evidence of the model’s improved performance over a natural baseline model and a lexicalized probabilistic context-free grammar. 1
Contrastive estimation: Training log-linear models on unlabeled data
- In Proc. of ACL
, 2005
"... Conditional random fields (Lafferty et al., 2001) are quite effective at sequence labeling tasks like shallow parsing (Sha and Pereira, 2003) and namedentity extraction (McCallum and Li, 2003). CRFs are log-linear, allowing the incorporation of arbitrary features into the model. To train on unlabele ..."
Abstract
-
Cited by 89 (11 self)
- Add to MetaCart
Conditional random fields (Lafferty et al., 2001) are quite effective at sequence labeling tasks like shallow parsing (Sha and Pereira, 2003) and namedentity extraction (McCallum and Li, 2003). CRFs are log-linear, allowing the incorporation of arbitrary features into the model. To train on unlabeled data, we require unsupervised estimation methods for log-linear models; few exist. We describe a novel approach, contrastive estimation. We show that the new technique can be intuitively understood as exploiting implicit negative evidence and is computationally efficient. Applied to a sequence labeling problem—POS tagging given a tagging dictionary and unlabeled text—contrastive estimation outperforms EM (with the same feature set), is more robust to degradations of the dictionary, and can largely recover by modeling additional features. 1
Wide-coverage efficient statistical parsing with CCG and log-linear models
- COMPUTATIONAL LINGUISTICS
, 2007
"... This paper describes a number of log-linear parsing models for an automatically extracted lexicalized grammar. The models are "full" parsing models in the sense that probabilities are defined for complete parses, rather than for independent events derived by decomposing the parse tree. Discriminativ ..."
Abstract
-
Cited by 87 (20 self)
- Add to MetaCart
This paper describes a number of log-linear parsing models for an automatically extracted lexicalized grammar. The models are "full" parsing models in the sense that probabilities are defined for complete parses, rather than for independent events derived by decomposing the parse tree. Discriminative training is used to estimate the models, which requires incorrect parses for each sentence in the training data as well as the correct parse. The lexicalized grammar formalism used is Combinatory Categorial Grammar (CCG), and the grammar is automatically extracted from CCGbank, a CCG version of the Penn Treebank. The combination of discriminative training and an automatically extracted grammar leads to a significant memory requirement (over 20 GB), which is satisfied using a parallel implementation of the BFGS optimisation algorithm running on a Beowulf cluster. Dynamic programming over a packed chart, in combination with the parallel implementation, allows us to solve one of the largest-scale estimation problems in the statistical parsing literature in under three hours. A key component of the parsing system, for both training and testing, is a Maximum Entropy supertagger which assigns CCG lexical categories to words in a sentence. The supertagger makes the discriminative training feasible, and also leads to a highly efficient parser. Surprisingly,
Speed and accuracy in shallow and deep stochastic parsing
- IN PROCEEDINGS OF HLT-NAACL’04
, 2004
"... This paper reports some initial experiments that compare the accuracy and performance of two stochastic parsing systems. The currently popular Collins parser is a shallow parser whose output contains more detailed semantically-relevant information than other such parsers. The XLE parser is a deep-pa ..."
Abstract
-
Cited by 39 (3 self)
- Add to MetaCart
This paper reports some initial experiments that compare the accuracy and performance of two stochastic parsing systems. The currently popular Collins parser is a shallow parser whose output contains more detailed semantically-relevant information than other such parsers. The XLE parser is a deep-parsing system that couples a Lexical Functional Grammar to a loglinear disambiguation component and provides the much richer representations of LFG theory. We measured the accuracy of both systems against a gold standard of the PARC 700 dependency bank, and also measured their processing times. We found that the deep-parsing system to be significantly more accurate than the Collins parser with only a slight reduction in parsing speed.
Probabilistic disambiguation models for wide-coverage HPSG parsing
- University of Michigan, Ann Arbor
, 2005
"... This paper reports the development of loglinear models for the disambiguation in wide-coverage HPSG parsing. The estimation of log-linear models requires high computational cost, especially with widecoverage grammars. Using techniques to reduce the estimation cost, we trained the models using 20 sec ..."
Abstract
-
Cited by 34 (11 self)
- Add to MetaCart
This paper reports the development of loglinear models for the disambiguation in wide-coverage HPSG parsing. The estimation of log-linear models requires high computational cost, especially with widecoverage grammars. Using techniques to reduce the estimation cost, we trained the models using 20 sections of Penn Treebank. A series of experiments empirically evaluated the estimation techniques, and also examined the performance of the disambiguation models on the parsing of real-world sentences.
Learning for Semantic Parsing with Statistical Machine Translation
, 2006
"... We present a novel statistical approach to semantic parsing, WASP, for constructing a complete, formal meaning representation of a sentence. A semantic parser is learned given a set of sentences annotated with their correct meaning representations. The main innovation of WASP is its use of state-of- ..."
Abstract
-
Cited by 34 (8 self)
- Add to MetaCart
We present a novel statistical approach to semantic parsing, WASP, for constructing a complete, formal meaning representation of a sentence. A semantic parser is learned given a set of sentences annotated with their correct meaning representations. The main innovation of WASP is its use of state-of-the-art statistical machine translation techniques. A word alignment model is used for lexical acquisition, and the parsing model itself can be seen as a syntax-based translation model. We show that WASP performs favorably in terms of both accuracy and coverage compared to existing learning methods requiring similar amount of supervision, and shows better robustness to variations in task complexity and word order.
High Efficiency Realization for a Wide-Coverage Unification Grammar
, 2005
"... We give a detailed account of an algorithm for efficient tactical generation from underspecified logical-form semantics, using a wide-coverage grammar and a corpus of real-world target utterances. Some earlier claims about chart realization are critically reviewed and corrected in the light of a ..."
Abstract
-
Cited by 28 (4 self)
- Add to MetaCart
We give a detailed account of an algorithm for efficient tactical generation from underspecified logical-form semantics, using a wide-coverage grammar and a corpus of real-world target utterances. Some earlier claims about chart realization are critically reviewed and corrected in the light of a series of practical experiments. As well as a set of algorithmic refinements, we present two novel techniques: the integration of subsumption-based local ambiguity factoring, and a procedure to selectively unpack the generation forest according to a probability distribution given by a conditional, discriminative model.
Parsing with soft and hard constraints on dependency length
- In Proceedings of the International Workshop on Parsing Technologies (IWPT
, 2005
"... In lexicalized phrase-structure or dependency parses, a word’s modifiers tend to fall near it in the string. We show that a crude way to use dependency length as a parsing feature can substantially improve parsing speed and accuracy in English and Chinese, with more mixed results on German. We then ..."
Abstract
-
Cited by 27 (4 self)
- Add to MetaCart
In lexicalized phrase-structure or dependency parses, a word’s modifiers tend to fall near it in the string. We show that a crude way to use dependency length as a parsing feature can substantially improve parsing speed and accuracy in English and Chinese, with more mixed results on German. We then show similar improvements by imposing hard bounds on dependency length and (additionally) modeling the resulting sequence of parse fragments. This simple “vine grammar ” formalism has only finite-state power, but a context-free parameterization with some extra parameters for stringing fragments together. We exhibit a linear-time chart parsing algorithm with a low grammar constant. 1
Log-Linear Models for Wide-Coverage CCG Parsing
, 2003
"... This paper describes log-linear parsing models for Combinatory Categorial Grammar (CCG). Log-linear models can easily encode the long-range dependencies inherent in coordination and extraction phenomena, which CCG was designed to handle. Log-linear models have previously been applied to stati ..."
Abstract
-
Cited by 24 (5 self)
- Add to MetaCart
This paper describes log-linear parsing models for Combinatory Categorial Grammar (CCG). Log-linear models can easily encode the long-range dependencies inherent in coordination and extraction phenomena, which CCG was designed to handle. Log-linear models have previously been applied to statistical parsing, under the assumption that all possible parses for a sentence can be enumerated. Enumerating all

