Wide-coverage efficient statistical parsing with CCG and log-linear models (2007)

by Stephen Clark, James R. Curran
Venue: Computational Linguistics
Citations: 216 (43 self)
BibTeX

@ARTICLE{Clark07wide-coverageefficient,
    author = {Stephen Clark and James R. Curran},
    title = {Wide-coverage efficient statistical parsing with CCG and log-linear models},
    journal = {COMPUTATIONAL LINGUISTICS},
    year = {2007},
    volume = {33}
}


Abstract

This paper describes a number of log-linear parsing models for an automatically extracted lexicalized grammar. The models are "full" parsing models in the sense that probabilities are defined for complete parses, rather than for independent events derived by decomposing the parse tree. Discriminative training is used to estimate the models, which requires incorrect parses for each sentence in the training data as well as the correct parse. The lexicalized grammar formalism used is Combinatory Categorial Grammar (CCG), and the grammar is automatically extracted from CCGbank, a CCG version of the Penn Treebank. The combination of discriminative training and an automatically extracted grammar leads to a significant memory requirement (over 20 GB), which is satisfied using a parallel implementation of the BFGS optimisation algorithm running on a Beowulf cluster. Dynamic programming over a packed chart, in combination with the parallel implementation, allows us to solve one of the largest-scale estimation problems in the statistical parsing literature in under three hours. A key component of the parsing system, for both training and testing, is a Maximum Entropy supertagger which assigns CCG lexical categories to words in a sentence. The supertagger makes the discriminative training feasible, and also leads to a highly efficient parser.
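The "full" log-linear model the abstract describes assigns each complete parse an unnormalized score, the weighted sum of its features, and normalizes over all candidate parses of the sentence. A minimal sketch of that scoring scheme is below; the feature names and weights are illustrative placeholders, not the paper's actual feature set, and the real system computes the normalization over a packed chart rather than an explicit list of parses.

```python
import math

# Hypothetical feature weights, as would be learned by discriminative
# training (BFGS in the paper); names here are illustrative only.
weights = {
    "rule:S->NP_VP": 1.2,
    "lexcat:saw:(S\\NP)/NP": 0.8,
    "dep:saw-man": -0.3,
}

def score(features):
    """Unnormalized log-linear score: dot product of weights and feature counts."""
    return sum(weights.get(f, 0.0) * count for f, count in features.items())

def parse_probabilities(candidate_parses):
    """Probability of each candidate parse of one sentence.

    Normalizes exp(score) over all candidates; the denominator is the
    partition function Z for that sentence.
    """
    exp_scores = [math.exp(score(f)) for f in candidate_parses]
    z = sum(exp_scores)
    return [s / z for s in exp_scores]

# Two toy candidate parses, each represented as a feature-count dict.
parses = [
    {"rule:S->NP_VP": 1, "lexcat:saw:(S\\NP)/NP": 1},
    {"rule:S->NP_VP": 1, "dep:saw-man": 1},
]
probs = parse_probabilities(parses)
```

Discriminative training adjusts the weights so the correct parse of each training sentence gets higher probability than the incorrect ones, which is why the estimation requires enumerating (in packed form) the incorrect parses as well.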

Keyphrases

discriminative training, wide-coverage efficient statistical parsing, log-linear model, parallel implementation, correct parse, BFGS optimisation algorithm, maximum entropy supertagger, complete parse, parse tree, statistical parsing literature, Beowulf cluster, independent event, log-linear parsing model, parsing system, Penn Treebank, Combinatory Categorial Grammar, packed chart, significant memory requirement, lexicalized grammar, largest-scale estimation problem, CCG version, efficient parser, dynamic programming, extracted grammar, training data, lexicalized grammar formalism, incorrect parse, CCG lexical category, key component
