Results 1 -
8 of
8
Automatic Keyphrase Extraction by Bridging Vocabulary Gap
, 2011
"... Keyphrase extraction aims to select a set of terms from a document as a short summary of the document. Most methods extract keyphrases according to their statistical properties in the given document. Appropriate keyphrases, however, are not always statistically significant or even do not appear in t ..."
Abstract
-
Cited by 1 (1 self)
- Add to MetaCart
Keyphrase extraction aims to select a set of terms from a document as a short summary of the document. Most methods extract keyphrases according to their statistical properties in the given document. Appropriate keyphrases, however, are not always statistically significant or even do not appear in the given document. This makes a large vocabulary gap between a document and its keyphrases. In this paper, we consider that a document and its keyphrases both describe the same object but are written in two different languages. By regarding keyphrase extraction as a problem of translating from the language of documents to the language of keyphrases, we use word alignment models in statistical machine translation to learn translation probabilities between the words in documents and the words in keyphrases. According to the translation model, we suggest keyphrases given a new document. The suggested keyphrases are not necessarily statistically frequent in the document, which indicates that our method is more flexible and reliable. Experiments on news articles demonstrate that our method outperforms existing unsupervised methods on precision, recall and F-measure.
Learning Lexicon Models from Search Logs for Query Expansion
"... This paper explores log-based query expansion (QE) models for Web search. Three lexicon models are proposed to bridge the lexical gap between Web documents and user queries. These models are trained on pairs of user queries and titles of clicked documents. Evaluations on a real world data set show t ..."
Abstract
-
Cited by 1 (0 self)
- Add to MetaCart
This paper explores log-based query expansion (QE) models for Web search. Three lexicon models are proposed to bridge the lexical gap between Web documents and user queries. These models are trained on pairs of user queries and titles of clicked documents. Evaluations on a real world data set show that the lexicon models, integrated into a ranker-based QE system, not only significantly improve the document retrieval performance but also outperform two state-of-the-art log-based QE methods. 1
PARAPHRASE AND TEXTUAL ENTAILMENT RECOGNITION AND GENERATION
"... Paraphrasing methods recognize, generate, or extract phrases, sentences, or longer natural language expressions that convey almost the same information. Textual entailment methods, on the other hand, recognize, generate, or extract pairs of natural language expressions, such that a human who reads ( ..."
Abstract
- Add to MetaCart
Paraphrasing methods recognize, generate, or extract phrases, sentences, or longer natural language expressions that convey almost the same information. Textual entailment methods, on the other hand, recognize, generate, or extract pairs of natural language expressions, such that a human who reads (and trusts) the first element of a pair would most likely infer that the other element is also true. Paraphrasing can be seen as bidirectional textual entailment and methods from the two areas are often very similar. Both kinds of methods are useful, at least in principle, in a wide range of natural language processing applications, including question answering, summarization, text generation, and machine translation. In this thesis, we focus on paraphrase and textual entailment recognition, as well as paraphrase generation. We propose three paraphrase and textual entailment recognition methods, experimentally evaluated on existing benchmarks. The key idea is that by capturing similarities at various abstractions of the inputs, we can recognize paraphrases and textual entailment reasonably well. Additionally, we exploit WordNet and use features that operate on the syntactic level of the language expressions. The best
A Generate and Rank Approach to Sentence Paraphrasing
"... We present a method that paraphrases a given sentence by first generating candidate paraphrases and then ranking (or classifying) them. The candidates are generated by applying existing paraphrasing rules extracted from parallel corpora. The ranking component considers not only the overall quality o ..."
Abstract
- Add to MetaCart
We present a method that paraphrases a given sentence by first generating candidate paraphrases and then ranking (or classifying) them. The candidates are generated by applying existing paraphrasing rules extracted from parallel corpora. The ranking component considers not only the overall quality of the rules that produced each candidate, but also the extent to which they preserve grammaticality and meaning in the particular context of the input sentence, as well as the degree to which the candidate differs from the input. We experimented with both a Maximum Entropy classifier and an SVR ranker. Experimental results show that incorporating features from an existing paraphrase recognizer in the ranking component improves performance, and that our overall method compares well against a state of the art paraphrase generator, when paraphrasing rules apply to the input sentences. We also propose a new methodology to evaluate the ranking components of generate-and-rank paraphrase generators, which evaluates them across different combinations of weights for grammaticality, meaning preservation, and diversity. The paper is accompanied by a paraphrasing dataset we constructed for evaluations of this kind. 1
A Simple Word Trigger Method for Social Tag Suggestion
"... It is popular for users in Web 2.0 era to freely annotate online resources with tags. To ease the annotation process, it has been great interest in automatic tag suggestion. We propose a method to suggest tags according to the text description of a resource. By considering both the description and t ..."
Abstract
- Add to MetaCart
It is popular for users in Web 2.0 era to freely annotate online resources with tags. To ease the annotation process, it has been great interest in automatic tag suggestion. We propose a method to suggest tags according to the text description of a resource. By considering both the description and tags of a given resource as summaries to the resource written in two languages, we adopt word alignment models in statistical machine translation to bridge their vocabulary gap. Based on the translation probabilities between the words in descriptions and the tags estimated on a large set of description-tags pairs, we build a word trigger method (WTM) to suggest tags according to the words in a resource description. Experiments on real world datasets show that WTM is effective and robust compared with other methods. Moreover, WTM is relatively simple and efficient, which is practical for Web applications. 1
Answer Sentence Retrieval by Matching Dependency Paths Acquired from Question/Answer Sentence Pairs
"... ..."
Voice Query Refinement
"... We describe a system for the refinement of spoken search queries. Given an initial query (Northern Italian restaurants in New York), instead of requiring a fully-specified followup query (Korean restaurants in New York), a more natural, abbreviated update query (Korean instead) may be spoken. The sy ..."
Abstract
- Add to MetaCart
We describe a system for the refinement of spoken search queries. Given an initial query (Northern Italian restaurants in New York), instead of requiring a fully-specified followup query (Korean restaurants in New York), a more natural, abbreviated update query (Korean instead) may be spoken. The system consists of a parsing step to identify the type and arguments of the refinement, a candidate generation step to enumerate the possible refinements, and a model classification step to select the best refinement. We present results on test query refinements given both to this system and to human judges that show the automated system outperforms the human judges on that data set. Index terms: spoken dialog systems, voice search, query refinement 1.
Towards Concept-Based Translation Models Using Search Logs for Query Expansion
"... Query logs have been successfully used to improve Web search. One of the directions exploits user clickthrough data to extract related terms to a query to perform query expansion (QE). However, term relations have been created between isolated terms without considering their context, giving rise to ..."
Abstract
- Add to MetaCart
Query logs have been successfully used to improve Web search. One of the directions exploits user clickthrough data to extract related terms to a query to perform query expansion (QE). However, term relations have been created between isolated terms without considering their context, giving rise to the problem of term ambiguity. To solve this problem, we propose several ways to place terms in their contexts. On the one hand, contiguous terms can form a phrase; and on the other hand, terms at proximity can provide less strict but useful contextual constraints mutually. Relations extracted between such more constrained groups of terms are expected to be less noisy than those between single terms. In this paper, the constrained groups of terms are called concepts. We exploit user query logs to build statistical translation models between concepts, which are then used for QE. We perform experiments on the Web search task using a real world data set. Results show that the concept-based statistical translation model trained on clickthrough data outperforms significantly other state-of-the-art QE systems.

