Web Question Answering: Is More Always Better?
, 2002
Cited by 107 (9 self)
This paper describes a question answering system that is designed to capitalize on the tremendous amount of data that is now available online. Most question answering systems use a wide variety of linguistic resources. We focus instead on the redundancy available in large corpora as an important resource. We use this redundancy to simplify the query rewrites that we need to use, and to support answer mining from returned snippets. Our system performs quite well given the simplicity of the techniques being utilized. Experimental results show that question answering accuracy can be greatly improved by analyzing more and more matching passages. Simple passage ranking and n-gram extraction techniques work well in our system, making it efficient to use with many backend retrieval engines.
IBM's Statistical Question Answering System
- In Proceedings of the Tenth Text REtrieval Conference (TREC
, 2000
Cited by 81 (0 self)
We describe the IBM Statistical Question Answering for TREC-9 system in detail and look at several examples and errors. The system is an application of maximum entropy classification for question/answer type prediction and named entity marking. We describe our system for information retrieval which in the first step did document retrieval from a local encyclopedia, and in the second step performed an expansion of the query words and finally did passage retrieval from the TREC collection. We will also discuss the answer selection algorithm which determines the best sentence given both the question and the occurrence of a phrase belonging to the answer class desired by the question. Results at the 250 byte and 50 byte levels for the overall system as well as results on each subcomponent are presented.
Offline Strategies for Online Question Answering: Answering Questions Before . . .
, 2003
Cited by 47 (9 self)
Recent work in Question Answering has focused on web-based systems that extract answers using simple lexicosyntactic patterns. We present an alternative strategy in which patterns are used to extract highly precise relational information offline, creating a data repository that is used to efficiently answer questions. We evaluate our strategy on a challenging subset of questions, i.e. "Who is ..." questions, against a state-of-the-art web-based Question Answering system. Results indicate that the extracted relations answer 25% more questions correctly and do so three orders of magnitude faster than the state-of-the-art system.
KnowItNow: Fast, scalable information extraction from the web
- IN PROCEEDINGS OF THE HUMAN LANGUAGE TECHNOLOGY CONFERENCE (HLT-EMNLP-05
, 2005
Cited by 46 (6 self)
Numerous NLP applications rely on search-engine queries, both to extract information from and to compute statistics over the Web corpus. But search engines often limit the number of available queries. As a result, query-intensive NLP applications such as Information Extraction (IE) distribute their query load over several days, making IE a slow, offline process. This paper introduces a novel architecture for IE that obviates queries to commercial search engines. The architecture is embodied in a system called KNOWITNOW that performs high-precision IE in minutes instead of days. We compare KNOWITNOW experimentally with the previously published KNOWITALL system, and quantify the tradeoff between recall and speed. KNOWITNOW’s extraction rate is two to three orders of magnitude higher than KNOWITALL’s.
A Noisy-Channel Approach to Question Answering
, 2003
Cited by 42 (3 self)
We introduce a probabilistic noisy-channel model for question answering and we show how it can be exploited in the context of an end-to-end QA system. Our noisy-channel system outperforms a state-of-the-art rule-based QA system that uses similar resources. We also show that the model we propose is flexible enough to accommodate within one mathematical framework many QA-specific resources and techniques, which range from the exploitation of WordNet, structured, and semi-structured databases to reasoning and paraphrasing.
Question answering from the web using knowledge annotation and knowledge mining techniques
- In Proc. CIKM
, 2003
Cited by 40 (6 self)
We present a strategy for answering fact-based natural language questions that is guided by a characterization of real-world user queries. Our approach, implemented in a system called Aranea, extracts answers from the Web using two different techniques: knowledge annotation and knowledge mining. Knowledge annotation is an approach to answering large classes of frequently occurring questions by utilizing semistructured and structured Web sources. Knowledge mining is a statistical approach that leverages massive amounts of Web data to overcome many natural language processing challenges. We have integrated these two different paradigms into a question answering system capable of providing users with concise answers that directly address their information needs.
Towards terascale knowledge acquisition
- In Proceedings of Conference on Computational Linguistics (COLING-04
, 2004
Cited by 40 (10 self)
Although vast amounts of textual data are freely available, many NLP algorithms exploit only a minute percentage of it. In this paper, we study the challenges of working at the terascale. We present an algorithm, designed for the terascale, for mining is-a relations that achieves similar performance to a state-of-the-art linguistically-rich method. We focus on the accuracy of these two systems as a function of processing time and corpus size.
Unsupervised Activity Recognition Using Automatically Mined Common Sense
- In AAAI
, 2005
Cited by 34 (5 self)
A fundamental difficulty in recognizing human activities is obtaining the labeled data needed to learn models of those activities. Given emerging sensor technology, however, it is possible to view activity data as a stream of natural language terms. Activity models are then mappings from such terms to activity names, and may be extracted from text corpora such as the web.
What Makes a Good Answer? The Role of Context in Question Answering
- PROCEEDINGS OF INTERACT 2003
, 2003
Cited by 31 (3 self)
Question answering systems have proven to be helpful to users because they can provide succinct answers that do not require users to wade through a large number of documents. However, despite recent advances in the underlying question answering technology, the problem of designing effective interfaces has been largely unexplored. We conducted a user study to investigate this area and discovered that, overall, users prefer paragraph-sized chunks of text over just an exact phrase as the answer to their questions. Furthermore, users generally prefer answers embedded in context, regardless of the perceived reliability of the source documents. When researching a topic, increasing the amount of text returned to users significantly decreases the number of queries that they pose to the system, suggesting that users utilize supporting text to answer related questions. We believe that these results can serve to guide future developments in question answering interfaces.
Base noun phrase translation using web data and the EM algorithm
- In Proceedings of CoLing
, 2002
Cited by 31 (4 self)
We consider here the problem of Base Noun Phrase translation. We propose a new method to perform the task. For a given Base NP, we first search its translation candidates from the web. We next determine the possible translation(s) from among the candidates using one of the two methods that we have developed. In one method, we employ an ensemble of Naïve Bayesian Classifiers constructed with the EM Algorithm. In the other method, we use TF-IDF vectors also constructed with the EM Algorithm. Experimental results indicate that the coverage and accuracy of our method are significantly better than those of the baseline methods relying on existing technologies.

