Results 1 - 7 of 7
Creating Robust Supervised Classifiers via Web-Scale N-gram Data
Cited by 5 (3 self)
In this paper, we systematically assess the value of using web-scale N-gram data in state-of-the-art supervised NLP classifiers. We compare classifiers that include or exclude features for the counts of various N-grams, where the counts are obtained from a web-scale auxiliary corpus. We show that including N-gram count features can advance the state-of-the-art accuracy on standard data sets for adjective ordering, spelling correction, noun compound bracketing, and verb part-of-speech disambiguation. More importantly, when operating on new domains, or when labeled training data is not plentiful, we show that using web-scale N-gram features is essential for achieving robust performance.
Unbounded Dependency Recovery for Parser Evaluation
Cited by 4 (2 self)
This paper introduces a new parser evaluation corpus containing around 700 sentences annotated with unbounded dependencies, from seven different grammatical constructions. We run a series of off-the-shelf parsers on the corpus to evaluate how well state-of-the-art parsing technology is able to recover such dependencies. The overall results range from 25% accuracy to 59%. These low scores call into question the validity of using Parseval scores as a general measure of parsing capability. We discuss the importance of parsers being able to recover unbounded dependencies, given their relatively low frequency in corpora. We also analyse the various errors made on these constructions by one of the more successful parsers.
Unsupervised Acquisition of Lexical Knowledge From N-grams: Final Report of the 2009 JHU CLSP Workshop
Cited by 1 (0 self)
This report describes a variety of work that uses web-scale N-gram data. This
Unsupervised Parse Selection for HPSG
Parser disambiguation with precision grammars generally takes place via statistical ranking of the parse yield of the grammar using a supervised parse selection model. In the standard process, the parse selection model is trained over a hand-disambiguated treebank, meaning that without a significant investment of effort to produce the treebank, parse selection is not possible. Furthermore, as treebanking is generally streamlined with parse selection models, creating the initial treebank without a model requires more resources than subsequent treebanks. In this work, we show that, by taking advantage of the constrained nature of these HPSG grammars, we can learn a discriminative parse selection model from raw text in a purely unsupervised fashion. This allows us to bootstrap the treebanking process and provide better parsers faster, and with fewer resources.
Large-Scale Syntactic Processing . . .
, 2009
Scalable syntactic processing will underpin the sophisticated language technology needed for next generation information access. Companies are already using NLP tools to create web-scale question answering and “semantic search” engines. Massive amounts of parsed web data will also allow the automatic creation of semantic knowledge resources on an unprecedented scale. The web is a challenging arena for syntactic parsing, because of its scale and variety of styles, genres, and domains. The goals of our workshop were to scale and adapt an existing wide-coverage parser to Wikipedia text; improve the efficiency of the parser through various methods of chart pruning; use self-training to improve the efficiency and accuracy of the parser; use the parsed wiki data for an innovative form of bootstrapping to make the parser both more efficient and more accurate; and finally use the parsed web data for improved disambiguation of coordination structures, using a variety of syntactic and semantic knowledge sources. The focus of the research was the C&C parser (Clark and Curran, 2007c), a state-of-the-art statistical parser based on Combinatory Categorial Grammar (CCG). The parser has been evaluated on a number of standard test sets achieving state-of-the-art accuracies. It has also recently been adapted successfully to the biomedical domain (Rimell and Clark, 2009). The parser is surprisingly efficient, given its detailed output, processing tens of sentences per second. For web-scale text processing, we aimed to make the parser an order of magnitude faster still. The C&C parser is one of only very few parsers currently available which has the potential to produce detailed, accurate analyses at the scale we were considering.
Parsing Natural Language Queries for Life Science Knowledge
This paper presents our preliminary work on adapting parsing technology to natural language query processing for the biomedical domain. We built a small treebank of natural language queries and tested a state-of-the-art parser; the results revealed that a parser trained on Wall-Street-Journal articles and Medline abstracts did not work well on query sentences. We then experimented with an adaptive learning technique, seeking to improve the parsing performance on query sentences. Despite the small scale of the experiments, the results are encouraging and indicate a direction for effective improvement.
Improved Parsing and POS Tagging Using Inter-Sentence Consistency Constraints
State-of-the-art statistical parsers and POS taggers perform very well when trained with large amounts of in-domain data. When training data is out-of-domain or limited, accuracy degrades. In this paper, we aim to compensate for the lack of available training data by exploiting similarities between test set sentences. We show how to augment sentence-level models for parsing and POS tagging with inter-sentence consistency constraints. To deal with the resulting global objective, we present an efficient and exact dual decomposition decoding algorithm. In experiments, we add consistency constraints to the MST parser and the Stanford part-of-speech tagger and demonstrate significant error reduction in the domain adaptation and the lightly supervised settings across five languages.

