Statistical Machine Translation for Query Expansion in Answer Retrieval
Cited by 25 (2 self)
We present an approach to query expansion in answer retrieval that uses Statistical Machine Translation (SMT) techniques to bridge the lexical gap between questions and answers. SMT-based query expansion is done by i) using a full-sentence paraphraser to introduce synonyms in the context of the entire query, and ii) translating query terms into answer terms using a full-sentence SMT model trained on question-answer pairs. We evaluate these global, context-aware query expansion techniques on tf-idf retrieval from 10 million question-answer pairs extracted from FAQ pages. Experimental results show that SMT-based expansion improves retrieval performance over local expansion and over retrieval without expansion.
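The idea of expanding a query with answer-side terms before tf-idf retrieval can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the `TRANSLATIONS` table is a hand-written stand-in for the SMT model described in the abstract, and the scoring function is a plain summed tf-idf weight, not the paper's retrieval setup.

```python
import math
from collections import Counter

# Hypothetical question-term -> answer-term table. In the abstract this role
# is played by an SMT model trained on question-answer pairs.
TRANSLATIONS = {"fix": ["repair", "replace"], "flat": ["puncture"]}


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


def expand(query_terms, table):
    """Append translated terms to the query, keeping order and skipping duplicates."""
    expanded = list(query_terms)
    for term in query_terms:
        for candidate in table.get(term, []):
            if candidate not in expanded:
                expanded.append(candidate)
    return expanded


def tfidf_scores(query_terms, docs):
    """Score each document by the summed tf-idf weight of the query terms it contains."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(t for doc in tokenized for t in set(doc))
    scores = []
    for doc in tokenized:
        tf = Counter(doc)
        scores.append(sum(tf[t] * math.log(n / df[t]) for t in query_terms if t in tf))
    return scores


ANSWERS = [
    "repair the puncture with a patch kit",
    "inflate the tire to the recommended pressure",
    "replace the chain when it is worn",
]

query = tokenize("fix flat tire")
plain = tfidf_scores(query, ANSWERS)
expanded = tfidf_scores(expand(query, TRANSLATIONS), ANSWERS)
```

Without expansion, the first answer shares no term with the query and scores zero; with the hypothetical translations added, it becomes the top-ranked answer.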
Flexible UIMA Components for Information Retrieval Research
Cited by 3 (2 self)
In this paper, we present a suite of flexible UIMA-based components for information retrieval research which have been successfully used (and re-used) in several projects in different application domains. Implementing the whole system as UIMA components is beneficial for configuration management, component reuse, implementation costs, analysis and visualization.
Answering Learners' Questions by Retrieving Question Paraphrases from Social Q&A Sites
Cited by 1 (1 self)
Information overload is a well-known problem which can be particularly detrimental to learners. In this paper, we propose a method to support learners in the information seeking process which consists in answering their questions by retrieving question paraphrases and their corresponding answers from social Q&A sites. Given the novelty of this kind of data, it is crucial to get a better understanding of how questions in social Q&A sites can be automatically analysed and retrieved. We discuss and evaluate several pre-processing strategies and question similarity metrics, using a new question paraphrase corpus collected from the WikiAnswers Q&A site. The results show that viable performance levels of more than 80% accuracy can be obtained for the task of question paraphrase retrieval.
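As a rough illustration of what a pre-processing step combined with a question similarity metric might look like, the sketch below uses stopword removal and Jaccard overlap. Both choices, the `STOPWORDS` set, and the function names are assumptions for illustration; the abstract does not say which strategies or metrics the paper evaluates.

```python
# Hypothetical stopword list; a real system would use a fuller one.
STOPWORDS = {"a", "an", "the", "is", "do", "i", "how", "what", "to", "of"}


def preprocess(question):
    """Lowercase, strip surrounding punctuation, and drop stopwords; returns a set of terms."""
    tokens = [t.strip("?.,!").lower() for t in question.split()]
    return {t for t in tokens if t and t not in STOPWORDS}


def jaccard(a, b):
    """Jaccard overlap between the preprocessed term sets of two questions."""
    sa, sb = preprocess(a), preprocess(b)
    if not sa and not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def best_paraphrase(query, candidates):
    """Return the candidate question most similar to the query."""
    return max(candidates, key=lambda c: jaccard(query, c))


CANDIDATES = [
    "How do I bake bread at home?",
    "What is the capital of France?",
]
match = best_paraphrase("How to bake bread?", CANDIDATES)
```

In a retrieval setting, the answer attached to the best-matching stored question would then be returned to the learner.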
Collecting a Why-question corpus for development and evaluation of an automatic QA-system
Cited by 1 (0 self)
Question answering research has only recently started to spread from short factoid questions to more complex ones. One significant challenge is the evaluation: manual evaluation is a difficult, time-consuming process and not applicable within efficient development of systems. Automatic evaluation requires a corpus of questions and answers, a definition of what is a correct answer, and a way to compare the correct answers to automatic answers produced by a system. For this purpose we present a Wikipedia-based corpus of Why-questions and corresponding answers and articles. The corpus was built by a novel method: paid participants were contacted through a Web interface, a procedure which allowed dynamic, fast and inexpensive development of data collection methods. Each question in the corpus has several corresponding, partly overlapping answers, which is an asset when estimating the correctness of answers. In addition, the corpus contains information related to the corpus collection process. We believe this additional information can be used to post-process the data, and to develop an automatic approval system for further data collection projects conducted in a similar manner.

