Using the web as an implicit training set: application to structural ambiguity resolution
- In: Proceedings of HLT-EMNLP, Vancouver, British
, 2005
Cited by 16 (1 self)
Recent work has shown that very large corpora can act as training data for NLP algorithms even without explicit labels. In this paper we show how the use of surface features and paraphrases in queries against search engines can be used to infer labels for structural ambiguity resolution tasks. Using unsupervised algorithms, we achieve 84% precision on PP-attachment and 80% on noun compound coordination.
Measuring the usefulness of function words for authorship attribution
- In Proceedings of the 2005 ACH/ALLC Conference
, 2005
Using verbs to characterize noun-noun relations
- In Proc. of the 12th International Conference on Artificial Intelligence: Methodology, Systems, Applications (AIMSA), Bulgaria
, 2006
Cited by 14 (8 self)
We present a novel, simple, unsupervised method for characterizing the semantic relations that hold between nouns in noun-noun compounds. The main idea is to discover predicates that make explicit the hidden relations between the nouns. This is accomplished by writing Web search engine queries that restate the noun compound as a relative clause containing a wildcard character to be filled in with a verb. A comparison to results from the literature suggests this is a promising approach.
Bootstrapping Coreference Classifiers with Multiple Machine Learning Algorithms
, 2003
Cited by 13 (0 self)
Successful application of multi-view co-training algorithms relies on the ability to factor the available features into views that are compatible and uncorrelated. This can potentially preclude their use on problems such as coreference resolution that lack an obvious feature split. To bootstrap coreference classifiers, we propose and evaluate a single-view weakly supervised algorithm that relies on two different learning algorithms in lieu of the two different views required by co-training. In addition, we investigate a method for ranking unlabeled instances to be fed back into the bootstrapping loop as labeled data, aiming to alleviate the problem of performance deterioration that is commonly observed in the course of bootstrapping.
Testing the Efficacy of Part-of-Speech Information in Word Completion
- Proceedings of the 10th Conference of the European Chapter of the Association for Computational Linguistics
, 2003
Cited by 12 (0 self)
We investigate the effect of incorporating syntactic information into a word-completion algorithm. We introduce two new algorithms that combine part-of-speech tag trigrams with word bigrams, and evaluate them with a testbench constructed for the purpose. The results show a small but statistically significant improvement in keystroke savings for one of our algorithms over baselines that use only word n-grams.
Combined Optimization of Feature Selection and Algorithm Parameters in Machine Learning of Language
- In Proc
, 2003
Cited by 12 (8 self)
Comparative machine learning experiments have become an important methodology in empirical approaches to natural language processing (i) to investigate which machine learning algorithms have the 'right bias' to solve specific natural language processing tasks, and (ii) to investigate which sources of information add to accuracy in a learning approach.
An Empirical Study of Active Learning with Support Vector Machines for Japanese Word Segmentation
, 2002
Cited by 11 (0 self)
We explore how active learning with Support Vector Machines works well for a non-trivial task in natural language processing.
Exploring web scale language models for search query processing
- In Proceedings of WWW 2010
Cited by 11 (7 self)
It has been widely observed that search queries are composed in a very different style from that of the body or the title of a document. Many techniques explicitly accounting for this language style discrepancy have shown promising results for information retrieval, yet a large-scale analysis of the extent of the language differences has been lacking. In this paper, we present an extensive study on this issue by examining the language model properties of search queries and the three text streams associated with each web document: the body, the title, and the anchor text. Our information theoretical analysis shows that queries seem to be composed in a way most similar to how authors summarize documents in anchor texts or titles, offering a quantitative explanation for the observations in past work. We apply these web-scale n-gram language models to three search query processing (SQP) tasks: query spelling correction, query bracketing, and long query segmentation. By controlling the size and the order of different language models, we find the perplexity metric to be a good accuracy indicator for these query processing tasks. We show that using smoothed language models yields significant accuracy gains for query bracketing, for instance, compared to using web counts as in the literature. We also demonstrate that applying web-scale language models can have a marked accuracy advantage over smaller ones.
Web Text Corpus for Natural Language Processing
, 2006
Cited by 10 (0 self)
Web text has been successfully used as training data for many NLP applications.

