Results 1 - 10 of 19
A unified architecture for natural language processing: Deep neural networks with multitask learning
2008
Cited by 52 (3 self)
We describe a single convolutional neural network architecture that, given a sentence, outputs a host of language processing predictions: part-of-speech tags, chunks, named entity tags, semantic roles, semantically similar words and the likelihood that the sentence makes sense (grammatically and semantically) using a language model. The entire network is trained jointly on all these tasks using weight-sharing, an instance of multitask learning. All the tasks use labeled data except the language model which is learnt from unlabeled text and represents a novel form of semi-supervised learning for the shared tasks. We show how both multitask learning and semi-supervised learning improve the generalization of the shared tasks, resulting in state-of-the-art performance.
Finding advertising keywords on web pages
In Proceedings of WWW, 2006
Cited by 37 (2 self)
A large and growing number of web pages display contextual advertising based on keywords automatically extracted from the text of the page, and this is a substantial source of revenue supporting the web today. Despite the importance of this area, little formal, published research exists. We describe a system that learns how to extract keywords from web pages for advertisement targeting. The system uses a number of features, such as term frequency of each
Search advertising using web relevance feedback
In Proc. 17th Intl. Conf. on Information and Knowledge Management, 2008
Cited by 25 (10 self)
The business of Web search, a $10 billion industry, relies heavily on sponsored search, where a few carefully-selected paid advertisements are displayed alongside algorithmic search results. A key technical challenge in sponsored search is to select ads that are relevant for the user’s query. Identifying relevant ads is challenging because queries are usually very short, and because users, consciously or not, choose terms intended to lead to optimal Web search results and not to optimal ads. Furthermore, the ads themselves are short and usually formulated to capture the reader’s attention rather than to facilitate query matching. Traditionally, matching of ads to queries employed standard information retrieval techniques using the bag of words approach. Here we propose to go beyond the bag of words, and augment both queries and ads with additional knowledge-rich features. We use Web search results initially returned for the query to create a pool of relevant documents. Classifying these documents with respect to an external taxonomy and identifying salient named entities give rise to two new feature types. Empirical evaluation based on over 9,000 query-ad pairwise judgments confirms that using augmented queries produces highly relevant ads. Our methodology also relaxes the requirement for each ad to explicitly specify the exhaustive list of queries (“bid phrases”) that can trigger it.
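The query augmentation described in this abstract can be illustrated with a minimal sketch. It assumes the search results for a query have already been classified against a taxonomy and tagged with named entities; the function name `augment_query`, the feature prefixes, the dictionary keys, and the sample data are all hypothetical and are not taken from the paper.

```python
from collections import Counter


def augment_query(query, result_docs, top_classes=2):
    """Build a feature bag for a query: words, taxonomy classes, entities.

    result_docs is a list of dicts with hypothetical keys "classes" and
    "entities", standing in for the output of a classifier and an entity
    tagger run over the search results returned for the query.
    """
    features = Counter()
    # Plain bag-of-words features from the query text itself.
    for word in query.lower().split():
        features["word:" + word] += 1
    # Count how many result documents fall into each taxonomy class,
    # then keep only the most frequent classes as features.
    class_votes = Counter()
    for doc in result_docs:
        class_votes.update(set(doc["classes"]))
    for cls, votes in class_votes.most_common(top_classes):
        features["class:" + cls] = votes
    # Count each named entity once per result document that mentions it.
    for doc in result_docs:
        for entity in set(doc["entities"]):
            features["entity:" + entity.lower()] += 1
    return features


# Made-up result pool for the query "prius mpg" (illustration only).
results = [
    {"classes": ["Autos/Hybrid"], "entities": ["Toyota", "Prius"]},
    {"classes": ["Autos/Hybrid", "Shopping"], "entities": ["Prius"]},
    {"classes": ["Travel"], "entities": []},
]
features = augment_query("prius mpg", results)
```

An ad could be expanded the same way, after which query and ad feature bags can be compared with any standard similarity measure; the choice of measure is left open here.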
To transfer or not to transfer
In NIPS’05 Workshop, Inductive Transfer: 10 Years Later, 2005
Cited by 25 (0 self)
With transfer learning, one set of tasks is used to bias learning and improve performance on another task. However, transfer learning may actually hinder performance if the tasks are too dissimilar. As described in this paper, one challenge for transfer learning research is to develop approaches that detect and avoid negative transfer using very little data from the target task.
A comparative study of methods for transductive transfer learning
In ICDM Workshop on Mining and Management of Biological Data, 2007
Cited by 13 (3 self)
The problem of transfer learning, where information gained in one learning task is used to improve performance in another related task, is an important new area of research. In this paper we address the subproblem of domain adaptation, in which a model trained over a source domain is generalized to perform well on a related target domain, where the two domains’ data are distributed similarly, but not identically. Previous work has studied the supervised version of this problem, in which labeled data from both source and target domains are available for training. In this work, however, we study the more challenging problem of unsupervised transductive transfer learning, where no labeled data from the target domain are available at training time, but instead, unlabeled target test data are available during training. We describe some current state-of-the-art inductive and transductive approaches involving three popular learning models, namely the maximum entropy, support vector machine and naive Bayes models. We then adapt these models to the problem of transfer learning for protein name extraction. In the process, we introduce a novel maximum entropy based technique, Iterative Feature Transformation (IFT), and show that it achieves performance comparable to state-of-the-art transductive SVMs. Finally, we compare the relative strengths and weaknesses of these models across the various learning settings, shedding light both on the algorithms examined and the difficulty of the respective problems. In addition, we show how simple relaxations, such as providing additional information like the proportion of positive examples in the test data, can significantly improve the performance of some of the transductive transfer learners.
A dual-layer CRF based joint decoding method for cascade segmentation and labelling tasks
In Proceedings of IJCAI, 2007
Cited by 11 (0 self)
Many problems in NLP require solving a cascade of subtasks. Traditional pipeline approaches lead to error propagation and prohibit joint training/decoding between subtasks. Existing solutions to this problem do not guarantee non-violation of hard constraints imposed by subtasks and thus give rise to inconsistent results, especially in cases where the segmentation task precedes the labeling task. We present a method that performs joint decoding of separately trained Conditional Random Field (CRF) models, while guarding against violations of hard constraints. Evaluated on Chinese word segmentation and part-of-speech (POS) tagging tasks, our proposed method achieved state-of-the-art performance on both the Penn Chinese Treebank and First SIGHAN Bakeoff datasets. On both segmentation and POS tagging tasks, the proposed method consistently improves over baseline methods that do not perform joint decoding.
Harnessing the Expertise of 70,000 Human Editors: Knowledge-Based Feature Generation for Text Categorization
Cited by 9 (2 self)
Most existing methods for text categorization employ induction algorithms that use the words appearing in the training documents as features. While they perform well in many categorization tasks, these methods are inherently limited when faced with more complicated tasks where external knowledge is essential. Recently, there have been efforts to augment these basic features with external knowledge, including semi-supervised learning and transfer learning. In this work, we present a new framework for automatic acquisition of world knowledge and methods for incorporating it into the text categorization process. Our approach enhances machine learning algorithms with features generated from domain-specific and common-sense knowledge. This knowledge is represented by ontologies that contain hundreds of thousands of concepts, further enriched through controlled Web crawling. Prior to text categorization, a feature generator analyzes the documents and maps them onto appropriate ontology concepts that augment the bag of words used in simple supervised learning. Feature generation is accomplished through contextual analysis of document text, thus implicitly performing word sense disambiguation. Coupled with the ability to generalize concepts using the ontology, this approach addresses two significant problems in natural language processing: synonymy and polysemy. Categorizing documents with the aid of knowledge-based features leverages information that cannot be deduced from the training documents alone. We applied our methodology using the Open Directory Project, the largest existing Web directory, built by over 70,000 human editors. Experimental results over a range of datasets confirm improved performance compared to the bag of words document representation.
Two Algorithms for Transfer Learning
Cited by 6 (0 self)
Transfer learning aims at improving the performance on a target task given some degree of learning on one or more source tasks. This chapter introduces two transfer learning algorithms that can be employed when the source and target domains share the same feature space and class labels. The first algorithm is a hierarchical Bayesian extension of naive Bayes; the second is a version of logistic regression in which the prior distribution over the weight values is learned from an ensemble of source tasks. The methods are tested on a real-world task of predicting whether a person will accept or decline a meeting invitation. The results demonstrate consistent successful transfer of learning when there is an ensemble of source tasks.
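The second algorithm mentioned in this abstract, logistic regression with a prior learned from source tasks, can be sketched roughly as follows. The abstract does not state the form of the prior, so this sketch assumes an independent Gaussian per weight, estimated from weight vectors already fitted on the source tasks; the function names, the variance floor `min_var`, and the toy data are hypothetical.

```python
import math


def learn_prior(source_weights, min_var=0.1):
    """Estimate a per-weight Gaussian prior from source-task weight vectors.

    Assumption: one independent Gaussian per weight. min_var is a floor
    that keeps the prior from becoming degenerate with few source tasks.
    """
    n = len(source_weights)
    dim = len(source_weights[0])
    mean = [sum(w[j] for w in source_weights) / n for j in range(dim)]
    var = [
        max(sum((w[j] - mean[j]) ** 2 for w in source_weights) / n, min_var)
        for j in range(dim)
    ]
    return mean, var


def fit_target(X, y, prior_mean, prior_var, steps=500, lr=0.01):
    """MAP logistic regression on the target task by gradient descent."""
    w = list(prior_mean)  # start at the prior mean
    for _ in range(steps):
        # Gradient of the negative log prior: (w - mean) / variance.
        grad = [(w[j] - prior_mean[j]) / prior_var[j] for j in range(len(w))]
        # Add the gradient of the negative log likelihood: (p - y) * x.
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi))
            p = 1.0 / (1.0 + math.exp(-z))
            for j, xj in enumerate(xi):
                grad[j] += (p - yi) * xj
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w


# Toy illustration: three source-task weight vectors, two target examples.
source_weights = [[2.0, -1.0], [1.8, -1.2], [2.2, -0.8]]
prior_mean, prior_var = learn_prior(source_weights)
target_w = fit_target([[1.0, 0.0], [0.0, 1.0]], [1, 0], prior_mean, prior_var)
```

With only two target examples the fitted weights stay close to the prior mean, which is the intended effect: the source tasks supply most of the information until the target task has enough data of its own.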
Exploiting feature hierarchy for transfer learning in named entity recognition
In ACL:HLT ’08, 2008
Cited by 6 (2 self)
We present a novel hierarchical prior structure for supervised transfer learning in named entity recognition, motivated by the common structure of feature spaces for this task across natural language data sets. The problem of transfer learning, where information gained in one learning task is used to improve performance in another related task, is an important new area of research. In the subproblem of domain adaptation, a model trained over a source domain is generalized to perform well on a related target domain, where the two domains’ data are distributed similarly, but not identically. We introduce the concept of groups of closely-related domains, called genres, and show how inter-genre adaptation is related to domain adaptation. We also examine multitask learning, where two domains may be related, but where the concept to be learned in each case is distinct. We show that our prior conveys useful information across domains, genres and tasks, while remaining robust to spurious signals not related to the target domain and concept. We further show that our model generalizes a class of similar hierarchical priors, smoothed to varying degrees, and lay the groundwork for future exploration in this area.
Leveraging machine readable dictionaries in discriminative sequence models
In Language Resources and Evaluation Conference (LREC), 2006
Cited by 5 (2 self)
Many natural language processing tasks make use of a lexicon – typically the words collected from some annotated training data along with their associated properties. We demonstrate here the utility of corpora-independent lexicons derived from machine readable dictionaries. Lexical information is encoded in the form of features in a Conditional Random Field tagger, providing improved performance in cases where (i) limited training data is available, (ii) the data is case-less, and (iii) the test data genre or domain differs from that of the training data. We show substantial error reductions, especially on unknown words, for the tasks of part-of-speech tagging and shallow parsing, achieving up to a 20% error reduction on Penn TreeBank part-of-speech tagging and up to a 15.7% error reduction for shallow parsing using the CoNLL 2000 data. Our results here point towards a simple but effective methodology for increasing the adaptability of text processing systems by training models with annotated data in one genre augmented with general lexical information or lexical information pertinent to the target genre (or domain).
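Encoding lexicon entries as tagger features, as this abstract describes, might look like the sketch below. The feature templates, the function name `token_features`, and the tiny lexicon are hypothetical; the paper's actual feature set is not given in the abstract. The resulting feature lists could be passed to any CRF toolkit that accepts per-token string features.

```python
def token_features(tokens, i, lexicon):
    """Return feature strings for tokens[i], including lexicon-derived ones.

    lexicon maps a lowercased word to the set of part-of-speech tags a
    machine readable dictionary lists for it (hypothetical format).
    """
    lower = tokens[i].lower()
    feats = ["word=" + lower, "suffix3=" + lower[-3:]]
    if i > 0:
        feats.append("prev=" + tokens[i - 1].lower())
    else:
        feats.append("BOS")  # beginning-of-sentence marker
    # The lookup is case-insensitive, so it still fires on case-less text
    # and on words never seen in the annotated training data.
    tags = lexicon.get(lower)
    if tags:
        feats.extend("lex=" + tag for tag in sorted(tags))
    else:
        feats.append("lex=NONE")
    return feats


# Toy lexicon and an all-caps sentence to mimic case-less input.
lexicon = {"run": {"NN", "VB"}, "the": {"DT"}}
sentence = ["THE", "RUN", "ENDED"]
features = [token_features(sentence, i, lexicon) for i in range(len(sentence))]
```

Because the lexicon features do not depend on the training corpus, a word such as "run" contributes the same `lex=` features whether or not it appeared in the annotated data.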

