Results 1 - 10 of 30
Organizing and searching the World Wide Web of facts - step one: the one-million fact extraction challenge
- In Proceedings of the 21st National Conference on Artificial Intelligence (AAAI-06), 2006
Cited by 48 (4 self)
Due to the inherent difficulty of processing noisy text, the potential of the Web as a decentralized repository of human knowledge remains largely untapped during Web search. The access to billions of binary relations among named entities would enable new search paradigms and alternative methods for presenting the search results. A first concrete step towards building large searchable repositories of factual knowledge is to derive such knowledge automatically at large scale from textual documents. Generalized contextual extraction patterns allow for fast iterative progression towards extracting one million facts of a given type (e.g., Person-BornIn-Year) from 100 million Web documents of arbitrary quality. The extraction starts from as few as 10 seed facts, requires no additional input knowledge or annotated text, and emphasizes scale and coverage by avoiding the use of syntactic parsers, named entity recognizers, gazetteers, and similar text processing tools and resources.
Towards terascale knowledge acquisition
- In Proceedings of Conference on Computational Linguistics (COLING-04), 2004
Cited by 40 (10 self)
Although vast amounts of textual data are freely available, many NLP algorithms exploit only a minute percentage of it. In this paper, we study the challenges of working at the terascale. We present an algorithm, designed for the terascale, for mining is-a relations that achieves similar performance to a state-of-the-art linguistically-rich method. We focus on the accuracy of these two systems as a function of processing time and corpus size.
Answer Selection in a Multi-Stream Open Domain Question Answering System
- Proceedings of the 26th European Conference on Information Retrieval (ECIR’04), volume 2997 of LNCS, 2004
Cited by 20 (11 self)
Question answering systems aim to meet users' information needs by returning exact answers in response to a question. Traditional open domain question answering systems are built around a single pipeline architecture. In an attempt to exploit multiple resources as well as multiple answering strategies, systems based on a multi-stream architecture have recently been introduced. Such systems face the challenging problem of having to select a single answer from pools of answers obtained using essentially different techniques. We report on experiments aimed at understanding and evaluating the effect of different options for answer selection in a multi-stream question answering system.
The Omega Ontology
- In prep, 2005
Cited by 19 (5 self)
We present the Omega ontology, a large terminological ontology obtained by remerging WordNet and Mikrokosmos, adding information from various other sources, and subordinating the result to a newly designed feature-oriented upper model. We explain the organizing principles of the representation used for Omega and discuss the methodology used to merge the constituent conceptual hierarchies. We survey a range of auxiliary knowledge sources (including instances, verb frame annotations, and domain-specific sub-ontologies) incorporated into the basic conceptual structure and applications that have benefited from Omega. Omega is available for browsing at
Instance-based question answering: A data driven approach
- In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP-04), 2004
Cited by 18 (2 self)
Anticipating the availability of large question-answer datasets, we propose a principled, data-driven Instance-Based approach to Question Answering. Most question answering systems incorporate three major steps: classify questions according to answer types, formulate queries for document retrieval, and extract actual answers. Under our approach, strategies for answering new questions are directly learned from training data. We learn models of answer type, query content, and answer extraction from clusters of similar questions. We view the answer type as a distribution, rather than a class in an ontology. In addition to query expansion, we learn general content features from training data and use them to enhance the queries. Finally, we treat answer extraction as a binary classification problem in which text snippets are labeled as correct or incorrect answers. We present a basic implementation of these concepts that achieves good performance on TREC test data.
The University of Amsterdam at the TREC 2003 Question Answering Track
- In Proceedings of TREC 2003, 2003
Cited by 17 (12 self)
We describe our participation in the TREC 2003 Question Answering track. We explain the ideas underlying our approaches to the task, report on our results, provide an error analysis, and give a summary of our findings so far.
Information Extraction for Question Answering: Improving Recall Through Syntactic Patterns
- In Coling 2004, 2004
Cited by 17 (6 self)
We investigate the impact of the precision/recall trade-off of information extraction on the performance of an offline corpus-based question answering (QA) system. One of our findings is that, because of the robust final answer selection mechanism of the QA system, recall is more important.
Multi-document person name resolution
- In Proceedings of ACL-42, Reference Resolution Workshop, 2004
Cited by 17 (0 self)
Multi-document person name resolution focuses on the problem of determining if two instances with the same name and from different documents refer to the same individual. We present a two-step approach in which a Maximum Entropy model is trained to give the probability that two names refer to the same individual. We then apply a modified agglomerative clustering technique to partition the instances according to their referents. 1 Intro Artists and philosophers have long noted that multiple distinct entities are often referred to by one and the same name (Cohen and Cohen, 1998; Martinich, 2000). Recently, this referential ambiguity of names has become of increasing concern to computational linguists, as well. As the Internet increases in size and coverage, it becomes less and less likely that a single name will refer to the same individual on two different web sites. This poses a great challenge to information retrieval (IR) and question-answering (QA) applications, which often rely on little data when responding to user queries. Another area in which referential ambiguity is problematic involves the automatic population of ontologies with instances. For such tasks, concept-instance pairs (such as Paul Simon/pop star) are extracted from the web, cleaned of noise, and then inserted into an already existing ontology. The
A Bootstrapping Algorithm for automatically harvesting semantic relations
- In Proceedings of Inference in Computational Semantics (ICoS-06), 2006
Cited by 12 (0 self)
In this paper, we present Espresso, a weakly-supervised iterative algorithm combined with a web-based knowledge expansion technique, for extracting binary semantic relations. Given a small set of seed instances for a particular relation, the system learns lexical patterns, applies them to extract new instances, and then uses the Web to filter and expand the instances. Preliminary experiments show that Espresso extracts highly precise lists of a wide variety of semantic relations when compared with two state-of-the-art systems.
Automatic discovery of attribute words from Web documents
- In Proceedings of the 2nd International Joint Conference on Natural Language Processing (IJCNLP-05), pages 106–118, Jeju Island, Korea, 2005
Cited by 11 (0 self)
We propose a method of acquiring attribute words for a wide range of objects from Japanese Web documents. The method is a simple unsupervised method that utilizes the statistics of words, lexico-syntactic patterns, and HTML tags. To evaluate the attribute words, we also establish criteria and a procedure based on question-answerability about the candidate word.

