Declaration (2008)
BibTeX
@MISC{Williams08declaration,
author = {Dean Williams},
title = {Declaration},
year = {2008}
}
Abstract
Improving the ability of computer systems to process text is a significant research challenge. Many applications are based on partially structured databases, where structured data conforming to a schema is combined with free text. Information is stored as text in these applications because the queries required are not all known in advance – allowing for text is an attempt to capture information that could be relevant in the future but cannot be anticipated when the database schema is being designed. Text is also used due to the limitations of conventional databases, where the schema cannot easily be extended as new entity types and relationships arise in the future. Information Extraction (IE) is the process of finding instances of pre-defined entity types within text, while Data Integration systems build a virtual global schema from available structured data sources. We argue that combining techniques from IE and data integration is a promising approach for supporting applications that access partially structured data: the virtual global schema and associated metadata can be used to partially configure an IE process, and the information extracted by the IE process can then be integrated into the virtual global database, supporting queries which could not otherwise be answered. In this thesis we describe the design and implementation of the Experimental System To Extract Structure from Text (ESTEST) that investigates this approach. We give examples of its use and experimental results from a number of application domains.







