Results 1 - 5 of 5
Chart Pruning for Fast Lexicalised-Grammar Parsing
Abstract
Cited by 2 (1 self)
Given the increasing need to process massive amounts of textual data, efficiency of NLP tools is becoming a pressing concern. Parsers based on lexicalised grammar formalisms, such as TAG and CCG, can be made more efficient using supertagging, which for CCG is so effective that every derivation consistent with the supertagger output can be stored in a packed chart. However, wide-coverage CCG parsers still produce a very large number of derivations for typical newspaper or Wikipedia sentences. In this paper we investigate two forms of chart pruning, and develop a novel method for pruning complete cells in a parse chart. The result is a wide-coverage CCG parser that can process almost 100 sentences per second, with little or no loss in accuracy over the baseline with no pruning.
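To make the idea of chart pruning concrete, here is a minimal sketch of two generic filters applied to a CKY chart once each cell is complete: a beam that drops categories scoring far below the best in their cell, and a predicate that discards a whole cell. This is not the paper's method; the chart layout (span to category-to-log-probability map), the `beta` beam width and the `keep_span` predicate are all illustrative assumptions.

```python
import math
from typing import Callable, Dict, Tuple

# Assumed chart layout: (start, end) span -> {CCG category: log-probability of its best derivation}.
Cell = Dict[str, float]
Chart = Dict[Tuple[int, int], Cell]


def beam_prune(cell: Cell, beta: float) -> Cell:
    """Keep categories whose probability is at least beta times the best in the cell."""
    if not cell:
        return {}
    best = max(cell.values())
    threshold = best + math.log(beta)  # log(beta * p_best)
    return {cat: score for cat, score in cell.items() if score >= threshold}


def prune_chart(chart: Chart, beta: float, keep_span: Callable[[int, int], bool]) -> Chart:
    """Drop whole cells rejected by keep_span (hypothetical predicate), then beam-prune the rest."""
    pruned: Chart = {}
    for (i, j), cell in chart.items():
        if not keep_span(i, j):
            continue  # the entire cell is pruned, so nothing built from it survives
        pruned[(i, j)] = beam_prune(cell, beta)
    return pruned
```

With `beta=0.1`, a cell holding `NP` (p=0.6), `N` (p=0.3) and `S/NP` (p=0.05) keeps only `NP` and `N`, since 0.05 falls below 0.1 × 0.6. In a real parser these filters would run inside the CKY loop so that pruned edges never feed larger spans.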
Efficient CCG Parsing: A* versus Adaptive Supertagging
Abstract
We present a systematic comparison and combination of two orthogonal techniques for efficient parsing of Combinatory Categorial Grammar (CCG). First we consider adaptive supertagging, a widely used approximate search technique that prunes most lexical categories from the parser’s search space using a separate sequence model. Next we consider several variants of A*, a classic exact search technique which, to our knowledge, has not been applied to more expressive grammar formalisms like CCG. In addition to standard hardware-independent measures of parser effort, we also present what we believe is the first evaluation of A* parsing on the more realistic but more stringent metric of CPU time. By itself, A* substantially reduces parser effort as measured by the number of edges considered during parsing, but we show that for CCG this does not always correspond to improvements in CPU time over a CKY baseline. Combining A* with adaptive supertagging decreases CPU time by 15% for our best model.
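The adaptive supertagging loop described above can be sketched as follows: for each word, keep only the lexical categories whose probability is within a factor `beta` of that word's most likely category, try to parse, and relax `beta` only if no analysis is found. The per-word distributions, the `parse` callable and the `betas` sequence are illustrative assumptions, not the paper's code or settings.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple

# Assumed supertagger output: one {category: probability} distribution per word.
Dist = Dict[str, float]
Parser = Callable[[List[List[str]]], Optional[object]]


def supertag(dists: Sequence[Dist], beta: float) -> List[List[str]]:
    """For each word, keep categories with probability >= beta * (that word's best probability)."""
    tagged = []
    for dist in dists:
        best = max(dist.values())
        kept = [cat for cat, p in dist.items() if p >= beta * best]
        kept.sort(key=lambda cat: dist[cat], reverse=True)  # most probable first
        tagged.append(kept)
    return tagged


def adaptive_parse(
    dists: Sequence[Dist],
    parse: Parser,  # hypothetical parser: returns an analysis or None
    betas: Sequence[float] = (0.075, 0.03, 0.01, 0.005, 0.001),  # illustrative values
) -> Optional[Tuple[float, object]]:
    """Start with the tightest beam and loosen it only when the parser fails."""
    for beta in betas:
        result = parse(supertag(dists, beta))
        if result is not None:
            return beta, result
    return None  # no analysis even at the loosest beam
```

For example, if the correct category for a verb has probability 0.02 against a competitor at 0.98, it is excluded at `beta=0.075` and `beta=0.03` and first admitted at `beta=0.01`, so a parser that needs it succeeds only on the third attempt. Most sentences are parsed at the tight settings, which is why this approximate search saves so much work.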
Punctuation normalisation for cleaner treebanks and parsers
Abstract
Although punctuation is pervasive in written text, its treatment in parsers and corpora is often second-class. We examine the treatment of commas in CCGbank, a wide-coverage corpus for Combinatory Categorial Grammar (CCG), reanalysing its comma structures in order to eliminate a class of redundant rules, obtaining a more consistent treebank. We then eliminate these rules from C&C, a wide-coverage statistical CCG parser, obtaining a 37% increase in parsing speed on the standard CCGbank test set and a considerable reduction in memory consumed, without affecting parser accuracy.
Artificial
Abstract
In previous work ([Zettlemoyer and Collins, 2007]), a system was constructed that uses a small set of mappings between syntactic categories and logical forms in order to learn to convert natural language into a formal language. The data used originate from database queries, offering a simple semantic form for the task. The methods employed allow the system to be scaled up to larger and more complex domains by extending the set of mappings. This paper describes building a semantic parser around this previously used framework and attempting to learn to convert natural language using a richer source of semantically annotated data. This data set contains many complex linguistic constructions that must be accounted for, and the system is assessed specifically on its ability to capture a range of them. The results reveal a need for more specialised features in order to control the multitude of powerful rules that such complex semantic descriptions demand.
Acknowledgements
I would like to thank those who have given me help, advice and support while completing this project, especially my supervisor, Alex Lascarides, who formulated this
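As a rough illustration of what a "small set of mappings between syntactic categories and logical forms" can look like, the sketch below pairs kinds of logical constants with CCG categories and lambda-term templates, then generates candidate lexical entries for a phrase. The `TEMPLATES` table, the constant kinds and `candidate_entries` are hypothetical, written in the general style of this line of work rather than copied from the cited system.

```python
from typing import Dict, List, Tuple

# Hypothetical mapping: (kind of logical constant, arity) -> [(CCG category, logical-form template)].
TEMPLATES: Dict[Tuple[str, int], List[Tuple[str, str]]] = {
    ("entity", 0): [("NP", "{c}")],
    ("predicate", 1): [("N", "lambda x.{c}(x)"), ("S\\NP", "lambda x.{c}(x)")],
    ("predicate", 2): [("(S\\NP)/NP", "lambda x.lambda y.{c}(y,x)")],
}


def candidate_entries(phrase: str, constants: List[Tuple[str, str, int]]) -> List[Tuple[str, str, str]]:
    """Pair a phrase with every (category, logical form) the templates allow for the given constants."""
    entries = []
    for name, kind, arity in constants:  # each constant is (name, kind, arity)
        for category, template in TEMPLATES.get((kind, arity), []):
            entries.append((phrase, category, template.format(c=name)))
    return entries
```

For instance, `candidate_entries("borders", [("next_to", "predicate", 2)])` yields the single entry `("borders", "(S\\NP)/NP", "lambda x.lambda y.next_to(y,x)")`. Extending the table is how such a system scales to richer domains, but every added template multiplies the candidate entries, which is consistent with the abstract's finding that more specialised features are needed to control the resulting rules.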

