Creating Robust Supervised Classifiers via Web-Scale N-gram Data
Cited by 5 (3 self)
In this paper, we systematically assess the value of using web-scale N-gram data in state-of-the-art supervised NLP classifiers. We compare classifiers that include or exclude features for the counts of various N-grams, where the counts are obtained from a web-scale auxiliary corpus. We show that including N-gram count features can advance the state-of-the-art accuracy on standard data sets for adjective ordering, spelling correction, noun compound bracketing, and verb part-of-speech disambiguation. More importantly, when operating on new domains, or when labeled training data is not plentiful, we show that using web-scale N-gram features is essential for achieving robust performance.
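As a rough illustration of what an N-gram count feature could look like for the adjective-ordering task named above, here is a minimal sketch. The counts in `TOY_COUNTS`, the function names, and the log-count feature design are all assumptions made for illustration; they are not taken from the paper, which draws its counts from a web-scale corpus.

```python
import math

# Hypothetical toy counts standing in for a web-scale N-gram corpus.
TOY_COUNTS = {
    ("big", "red"): 12000,
    ("red", "big"): 300,
}


def log_count(ngram, counts):
    """Return log(count + 1), so an unseen N-gram maps to 0.0."""
    return math.log(counts.get(tuple(ngram), 0) + 1)


def ordering_features(adj1, adj2, counts):
    """Build count-based features for the candidate order 'adj1 adj2'."""
    forward = log_count((adj1, adj2), counts)
    backward = log_count((adj2, adj1), counts)
    return {"log_fwd": forward, "log_bwd": backward, "diff": forward - backward}


features = ordering_features("big", "red", TOY_COUNTS)
# A positive difference means the given order is the more frequent one.
prefers_given_order = features["diff"] > 0
```

In a supervised setting, a feature dictionary like this would presumably be passed to a classifier alongside lexical features rather than thresholded directly.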
Using Web-scale N-grams to Improve Base NP Parsing Performance
Cited by 4 (3 self)
We use web-scale N-grams in a base NP parser that correctly analyzes 95.4% of the base NPs in natural text. Web-scale data improves performance. That is, there is no data like more data. Performance scales log-linearly with the number of parameters in the model (the number of unique N-grams). The web-scale N-grams are particularly helpful in harder cases, such as NPs that contain conjunctions.
Learning bilingual lexicons using the visual similarity of labeled web images
In Proceedings of the International Joint Conference on Artificial Intelligence, 2011
Cited by 3 (2 self)
Speakers of many different languages use the Internet. A common activity among these users is uploading images and associating these images with words (in their own language) as captions, filenames, or surrounding text. We use these explicit, monolingual, image-to-word connections to successfully learn implicit, bilingual, word-to-word translations. Bilingual pairs of words are proposed as translations if their corresponding images have similar visual features. We generate bilingual lexicons in 15 language pairs, focusing on words that have been automatically identified as physical objects. The use of visual similarity substantially improves performance over standard approaches based on string similarity: for generated lexicons with 1000 translations, including visual information leads to an absolute improvement in accuracy of 8-12% over string edit distance alone.
Paraphrastic sentence compression with a character-based metric: Tightening without deletion
In Proceedings of ACL, Workshop on Monolingual Text-To-Text Generation, 2011
Cited by 3 (2 self)
We present a substitution-only approach to sentence compression which “tightens” a sentence by reducing its character length. Replacing phrases with shorter paraphrases yields paraphrastic compressions as short as 60% of the original length. In support of this task, we introduce a novel technique for re-ranking paraphrases extracted from bilingual corpora. At high compression rates, paraphrastic compressions outperform a state-of-the-art deletion model in an oracle experiment. For further compression, deleting from oracle paraphrastic compressions preserves more meaning than deletion alone. In either setting, paraphrastic compression shows promise for surpassing deletion-only methods.
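The substitution-only idea, with compression measured in characters, can be sketched as below. This is only a toy: the `PARAPHRASES` table and both function names are hypothetical, and the greedy shortest-paraphrase choice ignores the re-ranking and meaning preservation that the abstract describes.

```python
# Hypothetical paraphrase table: phrase -> candidate paraphrases.
PARAPHRASES = {
    "in order to": ["to"],
    "a large number of": ["many", "numerous"],
    "due to the fact that": ["because"],
}


def tighten(sentence, table):
    """Greedily replace each known phrase with its shortest paraphrase.

    Longer phrases are tried first. Matching is plain substring
    replacement, which is enough for a sketch but not for real text.
    """
    for phrase in sorted(table, key=len, reverse=True):
        shortest = min(table[phrase], key=len)
        if len(shortest) < len(phrase):
            sentence = sentence.replace(phrase, shortest)
    return sentence


def compression_rate(original, compressed):
    """Character-based rate: compressed length over original length."""
    return len(compressed) / len(original)


original = "We left early in order to meet a large number of people"
compressed = tighten(original, PARAPHRASES)
rate = compression_rate(original, compressed)
```

No words are deleted outright; the sentence only gets shorter where a shorter paraphrase exists, which is the sense of "tightening without deletion" in the title.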
Using Large Monolingual and Bilingual Corpora to Improve Coordination Disambiguation
Cited by 2 (0 self)
Resolving coordination ambiguity is a classic hard problem. This paper looks at coordination disambiguation in complex noun phrases (NPs). Parsers trained on the Penn Treebank are reporting impressive numbers these days, but they don’t do very well on this problem (79%). We explore systems trained using three types of corpora: (1) annotated (e.g. the Penn Treebank), (2) bitexts (e.g. Europarl), and (3) unannotated monolingual (e.g. Google N-grams). Size matters: (1) is a million words, (2) is potentially billions of words and (3) is potentially trillions of words. The unannotated monolingual data is helpful when the ambiguity can be resolved through associations among the lexical items. The bilingual data is helpful when the ambiguity can be resolved by the order of words in the translation. We train separate classifiers with monolingual and bilingual features and iteratively improve them via co-training. The co-trained classifier achieves close to 96% accuracy on Treebank data and makes 20% fewer errors than a supervised system trained with Treebank annotations.
Shared Components Topic Models with Application to Selectional Preference
Cited by 2 (1 self)
Predicate argument selectional preference is the notion that the roles, or argument positions, of a given predicate tend to prefer some arguments to others. Automatically inferring these preferences has been a topic of interest within the computational linguistics community since the early 1990s, with Resnik [3] giving examples such as: Mary drank some {wine, gasoline, pencils, sadness}, where the provided nouns in the syntactic object position of the verb drink are of
Predicting the Semantic Compositionality of Prefix Verbs
Cited by 1 (0 self)
In many applications, replacing a complex word form by its stem can reduce sparsity, revealing connections in the data that would not otherwise be apparent. In this paper, we focus on prefix verbs: verbs formed by adding a prefix to an existing verb stem. A prefix verb is considered compositional if it can be decomposed into a semantically equivalent expression involving its stem. We develop a classifier to predict compositionality via a range of lexical and distributional features, including novel features derived from web-scale N-gram data. Results on a new annotated corpus show that prefix verb compositionality can be predicted with high accuracy. Our system also performs well when trained and tested on conventional morphological segmentations of prefix verbs.
Unsupervised Acquisition of Lexical Knowledge From N-grams: Final Report of the 2009 JHU CLSP Workshop
Cited by 1 (0 self)
This report describes a variety of work that uses web-scale N-gram data. This
Reranking Bilingually Extracted Paraphrases Using Monolingual Distributional Similarity
Cited by 1 (1 self)
This paper improves an existing bilingual paraphrase extraction technique using monolingual distributional similarity to rerank candidate paraphrases. Raw monolingual data provides a complementary and orthogonal source of information that lessens the commonly observed errors in bilingual pivot-based methods. Our experiments reveal that monolingual scoring of bilingually extracted paraphrases has a significantly stronger correlation with human judgment for grammaticality than the probabilities assigned by the bilingual pivoting method do. The results also show that monolingual distributional similarity can serve as a threshold for high-precision paraphrase selection.
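One common way to compute monolingual distributional similarity is the cosine between context-count vectors. The sketch below reranks candidate paraphrases by that score and applies a threshold. The choice of cosine, the toy vectors, and the names `cosine` and `rerank` are assumptions for illustration; the abstract does not say which similarity measure or context representation the paper uses.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two sparse context-count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rerank(phrase_vec, candidates, threshold=0.0):
    """Sort candidates by similarity to the phrase, dropping low scorers."""
    scored = [(cosine(phrase_vec, vec), cand) for cand, vec in candidates.items()]
    kept = [(score, cand) for score, cand in scored if score >= threshold]
    return [cand for score, cand in sorted(kept, reverse=True)]


# Hypothetical context counts for a source phrase and two candidates.
source = Counter({"the": 3, "a": 2, "car": 4})
candidates = {
    "buy": Counter({"the": 3, "a": 2, "car": 3}),
    "sell to": Counter({"price": 5, "the": 1}),
}

ranked = rerank(source, candidates)
high_precision = rerank(source, candidates, threshold=0.5)
```

Raising `threshold` trades recall for precision, which mirrors the abstract's point about using similarity as a cutoff for paraphrase selection.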
Monolingual Distributional Similarity for Text-to-Text Generation
Cited by 1 (1 self)
Previous work on paraphrase extraction and application has relied on either parallel datasets, or on distributional similarity metrics over large text corpora. Our approach combines these two orthogonal sources of information and directly integrates them into our paraphrasing system’s log-linear model. We compare different distributional similarity feature-sets and show significant improvements in grammaticality and meaning retention on the example text-to-text generation task of sentence compression, achieving state-of-the-art quality.

