Results 1 - 10
of
281
Data Integration: A Theoretical Perspective
- Symposium on Principles of Database Systems
, 2002
"... Data integration is the problem of combining data residing at different sources, and providing the user with a unified view of these data. The problem of designing data integration systems is important in current real world applications, and is characterized by a number of issues that are interestin ..."
Abstract
-
Cited by 965 (45 self)
- Add to MetaCart
Data integration is the problem of combining data residing at different sources, and providing the user with a unified view of these data. The problem of designing data integration systems is important in current real world applications, and is characterized by a number of issues that are interesting from a theoretical point of view. This document presents on overview of the material to be presented in a tutorial on data integration. The tutorial is focused on some of the theoretical issues that are relevant for data integration. Special attention will be devoted to the following aspects: modeling a data integration application, processing queries in data integration, dealing with inconsistent data sources, and reasoning on queries.
Answering Queries Using Views: A Survey
, 2000
"... The problem of answering queries using views is to find efficient methods of answering a query using a set of previously defined materialized views over the database, rather than accessing the database relations. The problem has recently received significant attention because of its relevance to a w ..."
Abstract
-
Cited by 562 (32 self)
- Add to MetaCart
The problem of answering queries using views is to find efficient methods of answering a query using a set of previously defined materialized views over the database, rather than accessing the database relations. The problem has recently received significant attention because of its relevance to a wide variety of data management problems. In query optimization, finding a rewriting of a query using a set of materialized views can yield a more efficient query execution plan. To support the separation of the logical and physical views of data, a storage schema can be described using views over the logical schema. As a result, finding a query execution plan that accesses the storage amounts to solving the problem of answering queries using views. Finally, the problem arises in data integration systems, where data sources can be described as precomputed views over a mediated schema. This article surveys the state of the art on the problem of answering queries using views, and synthesizes the disparate works into a coherent framework. We describe the different applications of the problem, the algorithms proposed to solve it and the relevant theoretical results.
A Query Language for XML
, 1998
"... An important application of XML is the interchange of electronic data (EDI) between multiple data sources on the Web. As XML data proliferates on the Web, applications will need to integrate and aggregate data from multiple source and clean and transform data to facilitate exchange. Data extraction, ..."
Abstract
-
Cited by 340 (22 self)
- Add to MetaCart
An important application of XML is the interchange of electronic data (EDI) between multiple data sources on the Web. As XML data proliferates on the Web, applications will need to integrate and aggregate data from multiple source and clean and transform data to facilitate exchange. Data extraction, conversion, transformation, and integration are all well-understood database problems, and their solutions rely on a query language. We present a query language for XML, called XML-QL, which we argue is suitable for performing the above tasks. XML-QL is a declarative, "relational complete" query language and is simple enough that it can be optimized. XML-QL can extract data from existing XML documents and construct new XML documents. Keywords: XML, query languages, electronic-data interchange (EDI) 1. Introduction The goal of XML is to provide many of SGML's benefits not available in HTML and to provide them in a language that is easier to learn and use than complete SGML. These benefits...
The state of the art in distributed query processing
- ACM Computing Surveys
, 2000
"... Distributed data processing is fast becoming a reality. Businesses want to have it for many reasons, and they often must have it in order to stay competitive. While much of the infrastructure for distributed data processing is already in place (e.g., modern network technology), there are a number of ..."
Abstract
-
Cited by 320 (3 self)
- Add to MetaCart
Distributed data processing is fast becoming a reality. Businesses want to have it for many reasons, and they often must have it in order to stay competitive. While much of the infrastructure for distributed data processing is already in place (e.g., modern network technology), there are a number of issues which still make distributed data processing a complex undertaking: (1) distributed systems can become very large involving thousands of heterogeneous sites including PCs and mainframe server machines � (2) the state of a distributed system changes rapidly because the load of sites varies over time and new sites are added to the system� (3) legacy systems need to be integrated|such legacy systems usually have not been designed for distributed data processing and now need to interact with other (modern) systems in a distributed environment. This paper presents the state of the art of query processing for distributed database and information systems. The paper presents the \textbook " architecture for distributed query processing and a series of techniques that are particularly useful for distributed database systems. These techniques include special join techniques, techniques to exploit intra-query parallelism, techniques to reduce communication costs, and techniques to exploit caching and replication of data. Furthermore, the paper discusses di erent kinds of distributed systems such as client-server, middleware (multi-tier), and heterogeneous database systems and shows how query processing works in these systems. Categories and subject descriptors: E.5 [Data]:Files � H.2.4 [Database Management Systems]: distributed databases, query processing � H.2.5 [Heterogeneous Databases]: data translation General terms: algorithms � performance Additional key words and phrases: query optimization � query execution � client-server databases � middleware � multi-tier architectures � database application systems � wrappers� replication � caching � economic models for query processing � dissemination-based information systems 1
Wrapper Induction: Efficiency and Expressiveness
- Artificial Intelligence
, 2000
"... The Internet presents numerous sources of useful information---telephone directories, product catalogs, stock quotes, event listings, etc. Recently, many systems have been built that automatically gather and manipulate such information on a user's behalf. However, these resources are usually fo ..."
Abstract
-
Cited by 267 (11 self)
- Add to MetaCart
(Show Context)
The Internet presents numerous sources of useful information---telephone directories, product catalogs, stock quotes, event listings, etc. Recently, many systems have been built that automatically gather and manipulate such information on a user's behalf. However, these resources are usually formatted for use by people (e.g., the relevant content is embedded in HTML pages), so extracting their content is difficult. Most systems use customized wrapper procedures to perform this extraction task. Unfortunately, writing wrappers is tedious and error-prone. As an alternative, we advocate wrapper induction, a technique for automatically constructing wrappers. In this article, we describe six wrapper classes, and use a combination of empirical and analytical techniques to evaluate the computational tradeoffs among them. We first consider expressiveness: how well the classes can handle actual Internet resources, and the extent to which wrappers in one class can mimic those in another. We then...
Catching the Boat with Strudel: Experiences with a Web-Site Management System
, 1998
"... The Strudel system applies concepts from database management systems to the process of building Web sites. Strudel's key idea is separating the management of the site's data, the creation and management of the site's structure, and the visual presentation of the site's pages. Fir ..."
Abstract
-
Cited by 211 (23 self)
- Add to MetaCart
The Strudel system applies concepts from database management systems to the process of building Web sites. Strudel's key idea is separating the management of the site's data, the creation and management of the site's structure, and the visual presentation of the site's pages. First, the site builder creates a uniform model of all data available at the site. Second, the builder uses this model to declaratively define the Web site's structure by applying a "site-definition query" to the underlying data. The result of evaluating this query is a "site graph", which represents both the site's content and structure. Third, the builder specifies the visual presentation of pages in Strudel's HTML-template language. The data model underlying Strudel is a semi-structured model of labeled directed graphs. We describe Strudel's key characteristics, report on our experiences using Strudel, and present the technical problems that arose from our experience. We describe our experience constructing sev...
Query Optimization for XML
- In Proceedings of VLDB
, 1999
"... XML is an emerging standard for data representation and exchange on the World-Wide Web. Due to the nature of information on the Web and the inherent flexibility of XML, we expect that much of the data encoded in XML will be semistructured:the data may be irregular or incomplete, and its structu ..."
Abstract
-
Cited by 208 (3 self)
- Add to MetaCart
XML is an emerging standard for data representation and exchange on the World-Wide Web. Due to the nature of information on the Web and the inherent flexibility of XML, we expect that much of the data encoded in XML will be semistructured:the data may be irregular or incomplete, and its structure may change rapidly or unpredictably. This paper describes the query processor of Lore,aDBMS for XML-based data supporting an expressive query language. We focus primarily on Lore's cost-based query optimizer. While all of the usual problems associated with cost-based query optimization apply to XML-based query languages, a number of additional problems arise, such as new kinds of indexing, more complicated notions of database statistics, and vastly different query execution strategies for different databases. We define appropriate logical and physical query plans, database statistics, and a cost model, and we describe plan enumeration including heuristics for reducing the large search space. Our optimizer is fully implemented in Lore and preliminary performance results are reported.