Extracting Relations from Large Plain-Text Collections, 2000
Cited by 275 (21 self)
Text documents often contain valuable structured data that is hidden in regular English sentences. This data is best exploited if available as a relational table that we could use for answering precise queries or for running data mining tasks. We explore a technique for extracting such tables from document collections that requires only a handful of training examples from users. These examples are used to generate extraction patterns, which in turn result in new tuples being extracted from the document collection. We build on this idea and present our Snowball system. Snowball introduces novel strategies for generating patterns and extracting tuples from plain-text documents. At each iteration of the extraction process, Snowball evaluates the quality of these patterns and tuples without human intervention, and keeps only the most reliable ones for the next iteration. In this paper we also develop a scalable evaluation methodology and metrics for our task, and present a t...
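The iterate-and-filter loop described in this abstract can be illustrated with a deliberately simplified sketch. This is not Snowball's actual method: the (organization, location) relation, the exact-string patterns, the capitalized-phrase matcher, and the `min_support` threshold (a crude stand-in for the paper's quality evaluation) are all assumptions made for illustration, as are the function names.

```python
import re
from collections import Counter

# Hypothetical matcher for a capitalized name such as "Exxon" or "New York".
NAME = r"([A-Z][\w&]*(?: [A-Z][\w&]*)*)"

def find_patterns(sentences, tuples):
    """Count the short text spans that connect a known (org, loc) pair."""
    patterns = Counter()
    for sentence in sentences:
        for org, loc in tuples:
            m = re.search(re.escape(org) + r"(.{1,30}?)" + re.escape(loc), sentence)
            if m:
                patterns[m.group(1)] += 1
    return patterns

def apply_patterns(sentences, patterns):
    """Extract candidate (org, loc) pairs that surround a known pattern."""
    found = set()
    for sentence in sentences:
        for pattern in patterns:
            for m in re.finditer(NAME + re.escape(pattern) + NAME, sentence):
                found.add((m.group(1), m.group(2)))
    return found

def bootstrap(sentences, seeds, iterations=2, min_support=1):
    """Alternate between pattern generation and tuple extraction."""
    tuples = set(seeds)
    for _ in range(iterations):
        counts = find_patterns(sentences, tuples)
        # Keep only patterns seen often enough (assumed reliability filter).
        kept = [p for p, c in counts.items() if c >= min_support]
        tuples |= apply_patterns(sentences, kept)
    return tuples

sentences = [
    "Microsoft, based in Redmond, released a new product.",
    "Exxon, based in Irving, reported earnings.",
    "Boeing is headquartered in Seattle.",
]
seeds = {("Microsoft", "Redmond")}
result = bootstrap(sentences, seeds)
```

With the single seed pair, the sketch learns the connecting span ", based in " and uses it to pick up the Exxon sentence; the Boeing sentence is missed because no seed exhibits its phrasing.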
A Logic-Based Theory of Deductive Arguments, 2001
Cited by 69 (16 self)
We explore a framework for argumentation (based on classical logic) in which an argument is a pair where the first item in the pair is a minimal consistent set of formulae that proves the second item (which is a formula). We provide some basic definitions for arguments, and various kinds of counter-arguments (defeaters). This leads us to the definition of canonical undercuts which we argue are the only defeaters that we need to take into account. We then motivate and formalise the notion of argument trees and argument structures which provide a way of exhaustively collating arguments and counter-arguments. We use argument structures as the basis of our general proposal for argument aggregation.
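The definition of an argument given in this abstract can be written compactly as follows. The symbols are notational assumptions rather than taken from the listing: $\Delta$ for the underlying set of formulae, $\Phi$ for the support, $\alpha$ for the claim, and $\vdash$ for classical consequence (`\nvdash` needs the `amssymb` package).

```latex
\[
\langle \Phi, \alpha \rangle \text{ is an argument from } \Delta
\iff
\Phi \subseteq \Delta, \quad
\Phi \nvdash \bot, \quad
\Phi \vdash \alpha, \quad
\text{and there is no } \Phi' \subset \Phi \text{ such that } \Phi' \vdash \alpha .
\]
```

The three conditions on $\Phi$ correspond to "consistent", "proves the second item", and "minimal" in the prose definition.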
A Web Browser for Small Terminals - In Proc. UIST, 1999
Cited by 52 (7 self)
We describe WEST, a WEb browser for Small Terminals, that aims to solve some of the problems associated with accessing web pages on hand-held devices. Through a novel combination of text reduction and focus+context visualization, users can access web pages from a very limited display environment, since the system will provide an overview of the contents of a web page even when it is too large to be displayed in its entirety. To make maximum use of the limited resources available on a typical hand-held terminal, much of the most demanding work is done by a proxy server, allowing the terminal to concentrate on the task of providing responsive user interaction. The system makes use of some interaction concepts reminiscent of those defined in the Wireless Application Protocol (WAP), making it possible to utilize the techniques described here for WAP-compliant devices and services that may become available in the near future.
Keywords. Hand-held devices, web browser, proxy systems, focus+context visualization, text reduction, flip zooming, WAP (wireless application protocol)
Querying Text Databases for Efficient Information Extraction - In Proceedings of the 19th IEEE International Conference on Data Engineering (ICDE), 2003
Cited by 37 (9 self)
A wealth of information is hidden within unstructured text. This information is often best exploited in structured or relational form, which is suited for sophisticated query processing, for integration with relational databases, and for data mining. Current information extraction techniques extract relations from a text database by examining every document in the database, or use filters to select promising documents for extraction. The exhaustive scanning approach is not practical or even feasible for large databases, and the current filtering techniques require human involvement to maintain and to adapt to new databases and domains. In this paper, we develop an automatic query-based technique to retrieve documents useful for the extraction of user-defined relations from large text databases, which can be adapted to new domains, databases, or target relations with minimal human effort. We report a thorough experimental evaluation over a large newspaper archive that shows that we significantly improve the efficiency of the extraction process by focusing only on promising documents.
NYU: Description of the Proteus/PET system as used for MUC-7 - In Proceedings of the Seventh Message Understanding Conference (MUC-7), 1998
Cited by 29 (4 self)
Through the history of the MUCs, adapting Information Extraction (IE) systems to a new class of events has continued to be a time-consuming and expensive task. Since MUC-6, the Information Extraction effort at NYU has focused on the problem of portability and customization, especially at the scenario level. To begin to address this problem, we have built a set of tools, which allow the user to adapt the system to new
Mining Reference Tables for Automatic Text Segmentation - In Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2004
Cited by 28 (1 self)
Automatically segmenting unstructured text strings into structured records is necessary for importing the information contained in legacy sources and text collections into a data warehouse for subsequent querying, analysis, mining and integration. In this paper, we mine tables present in data warehouses and relational databases to develop an automatic segmentation system. Thus, we overcome limitations of existing supervised text segmentation approaches, which require comprehensive manually labeled training data. Our segmentation system is robust, accurate, and efficient, and requires no additional manual effort. Thorough evaluation on real datasets demonstrates the robustness and accuracy of our system, with segmentation accuracy exceeding that of state-of-the-art supervised approaches.
Protein Names And How To Find Them, 2002
Cited by 26 (0 self)
A prerequisite for all higher level information extraction tasks is the identification of unknown names in text. Today, when large corpora can consist of billions of words, it is of utmost importance to develop accurate techniques for the automatic detection, extraction and categorization of named entities in these corpora. Although named entity recognition might be regarded as a solved problem in some domains, it still poses a significant challenge in others. In this work we focus on one of the more difficult tasks, the identification of protein names in text.
Ontology Research and Development. Part 2 - a Review of Ontology Mapping and Evolving, 2002
Cited by 25 (1 self)
This is the second of a two-part paper to review ontology research and development, in particular, ontology mapping and evolving. Ontology is defined as a formal explicit specification of a shared conceptualization. Ontology itself is not a static model, so it must have the potential to capture changes of meanings and relations. As such, mapping and evolving ontologies is part of an essential task of ontology learning and development. Ontology mapping is concerned with reusing existing ontologies, expanding and combining them by some means and enabling a larger pool of information and knowledge in different domains to be integrated to support new communication and use. Ontology evolving, likewise, is concerned with maintaining existing ontologies and extending them as appropriate when new information or knowledge is acquired. It is apparent from the reviews that current research into semi-automatic or automatic ontology research in all three aspects of generation, mapping and evolving has so far achieved limited success. Expert
Evaluating high accuracy retrieval techniques - In Proceedings of SIGIR, 2004
Cited by 21 (0 self)
Although information retrieval research has always been concerned with improving the effectiveness of search, in some applications, such as information analysis, a more specific requirement exists for high accuracy retrieval. This means that achieving high precision in the top document ranks is paramount. In this paper we present work aimed at achieving high accuracy in ad-hoc document retrieval by incorporating approaches from question answering (QA). We focus on getting the first relevant result as high as possible in the ranked list and argue that traditional precision and recall are not appropriate measures for evaluating this task. We instead use the mean reciprocal rank (MRR) of the first relevant result. We evaluate three different methods for modifying queries to achieve high accuracy. The experiments done on TREC data provide support for the approach of using MRR and incorporating QA techniques for getting high accuracy in the ad-hoc retrieval task.
Categories and Subject Descriptors: H.3.4 [Information Storage and Retrieval]: Systems and Software--Performance evaluation (efficiency and effectiveness); H.3.3 [Information Storage and Retrieval]: Information Search and Retrieval--Query formulation
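The mean reciprocal rank of the first relevant result, the measure this abstract adopts, can be computed as in the following minimal sketch. The function names, the document identifiers, and the convention of scoring a query with no relevant result as 0 are assumptions for illustration, not details taken from the paper.

```python
from typing import List, Sequence, Set, Tuple

def reciprocal_rank(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Return 1/rank of the first relevant document, or 0.0 if none appears."""
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0

def mean_reciprocal_rank(runs: List[Tuple[Sequence[str], Set[str]]]) -> float:
    """Average the reciprocal rank over all queries (0.0 for an empty run list)."""
    if not runs:
        return 0.0
    return sum(reciprocal_rank(r, rel) for r, rel in runs) / len(runs)

# Each entry pairs a ranked result list with the set of relevant documents.
runs = [
    (["d3", "d1", "d7"], {"d1"}),        # first relevant at rank 2
    (["d2", "d5"], {"d2", "d5"}),        # first relevant at rank 1
    (["d9", "d8"], {"d4"}),              # no relevant document retrieved
]
mrr = mean_reciprocal_rank(runs)
```

Here the per-query scores are 1/2, 1 and 0, so `mrr` is 0.5. Unlike precision and recall, the score depends only on where the first relevant document lands, which is the property the abstract argues for.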
A Hybrid Approach for QA Track Definitional Questions - In Proc. of the 12th Annual Text Retrieval Conference, 2003
Cited by 16 (1 self)
We present an overview of DefScriber, a system developed at Columbia University that combines knowledge-based and statistical methods to answer definitional questions of the form, “What is X?” We discuss how DefScriber was applied to the definition questions in the TREC 2003 QA track main task. We conclude with an analysis of our system’s results on the definition questions.

