Results 1 - 10
of
60
Data Integration: A Theoretical Perspective
- Symposium on Principles of Database Systems
, 2002
"... Data integration is the problem of combining data residing at different sources, and providing the user with a unified view of these data. The problem of designing data integration systems is important in current real world applications, and is characterized by a number of issues that are interestin ..."
Abstract
-
Cited by 585 (35 self)
- Add to MetaCart
Data integration is the problem of combining data residing at different sources, and providing the user with a unified view of these data. The problem of designing data integration systems is important in current real world applications, and is characterized by a number of issues that are interesting from a theoretical point of view. This document presents on overview of the material to be presented in a tutorial on data integration. The tutorial is focused on some of the theoretical issues that are relevant for data integration. Special attention will be devoted to the following aspects: modeling a data integration application, processing queries in data integration, dealing with inconsistent data sources, and reasoning on queries.
Eddies: Continuously Adaptive Query Processing
- In SIGMOD
, 2000
"... In large federated and shared-nothing databases, resources can exhibit widely fluctuating characteristics. Assumptions made at the time a query is submitted will rarely hold throughout the duration of query processing. As a result, traditional static query optimization and execution techniques are i ..."
Abstract
-
Cited by 301 (19 self)
- Add to MetaCart
In large federated and shared-nothing databases, resources can exhibit widely fluctuating characteristics. Assumptions made at the time a query is submitted will rarely hold throughout the duration of query processing. As a result, traditional static query optimization and execution techniques are ineffective in these environments. In this paper we introduce a query processing mechanism called an eddy, which continuously reorders operators in a query plan as it runs. We characterize the moments of symmetry during which pipelined joins can be easily reordered, and the synchronization barriers that require inputs from different sources to be coordinated. By combining eddies with appropriate join algorithms, we merge the optimization and execution phases of query processing, allowing each tuple to have a flexible ordering of the query operators. This flexibility is controlled by a combination of fluid dynamics and a simple learning algorithm. Our initial implementation demonstrates prom...
Navigational Plans For Data Integration
- In Proceedings of the National Conference on Artificial Intelligence (AAAI
, 1999
"... We consider the problem of building data integration systems when the data sources are webs of data, rather than sets of relations. Previous approaches to modeling data sources are inappropriate in this context because they do not capture the relationships between linked data and the need to navigat ..."
Abstract
-
Cited by 112 (3 self)
- Add to MetaCart
We consider the problem of building data integration systems when the data sources are webs of data, rather than sets of relations. Previous approaches to modeling data sources are inappropriate in this context because they do not capture the relationships between linked data and the need to navigate through paths in the data source in order to obtain the data. We describe a language for modeling data sources in this new context. We show that our language has the required expressive power, and that minor extensions to it would make query answering intractable. We provide a sound and complete algorithm for reformulating a user query into a query over the data sources, and we show how to create query execution plans that both query and navigate the data sources. Introduction The purpose of data integration is to provide a uniform interface to a multitude of data sources. Data integration applications arise frequently as corporations attempt to provide their customers and employees wit...
UnQL: A Query Language and Algebra for Semistructured Data Based on Structural Recursion
, 2000
"... This paper presents structural recursion as the basis of the syntax and semantics of query languages for semistructured data and XML. We describe a simple and powerful query language based on pattern matching and show that it can be expressed using structural recursion, which is introduced as a top- ..."
Abstract
-
Cited by 103 (4 self)
- Add to MetaCart
This paper presents structural recursion as the basis of the syntax and semantics of query languages for semistructured data and XML. We describe a simple and powerful query language based on pattern matching and show that it can be expressed using structural recursion, which is introduced as a top-down, recursive function, similar to the way XSL is defined on XML trees. On cyclic data, structural recursion can be defined in two equivalent ways: as a recursive function which evaluates the data top-down and remembers all its calls to avoid infinite loops, or as a bulk evaluation which processes the entire data in parallel using only traditional relational algebra operators. The latter makes it possible for optimization techniques in relational queries to be applied to structural recursion. We show that the composition of two structural recursion queries can be expressed as a single such query, and this is used as the basis of an optimization method for mediator systems. Several other fo...
Efficient and Extensible Algorithms for Multi Query Optimization
, 2000
"... Complex queries are becoming commonplace, with the growing use of decision support systems. These complex queries often have a lot of common sub-expressions, either within a single query, or across multiple such queries run as a batch. Multi-query optimization aims at exploiting common subexpressi ..."
Abstract
-
Cited by 99 (4 self)
- Add to MetaCart
Complex queries are becoming commonplace, with the growing use of decision support systems. These complex queries often have a lot of common sub-expressions, either within a single query, or across multiple such queries run as a batch. Multi-query optimization aims at exploiting common subexpressions to reduce evaluation cost. Multi-query optimization has hither-to been viewed as impractical, since earlier algorithms were exhaustive, and explore a doubly exponential search space. In this paper we demonstrate that multi-query optimization using heuristics is practical, and provides significant benefits. We propose three cost-based heuristic algorithms: Volcano-SH and Volcano-RU, which are based on simple modifications to the Volcano search strategy, and a greedy heuristic. Our greedy heuristic incorporates novel optimizations that improve efficiency greatly. Our algorithms are designed to be easily added to existing optimizers. We present a performance study comparing the algo...
Answering XML Queries over Heterogeneous Data Sources
, 2001
"... This work describes an architecture for integrating heterogeneous data sources under an XML global schema, following the local-as-view approach (local sources' schemas are described as views over the global schema). In this context, we focus on the problem of translating the user's query against the ..."
Abstract
-
Cited by 67 (2 self)
- Add to MetaCart
This work describes an architecture for integrating heterogeneous data sources under an XML global schema, following the local-as-view approach (local sources' schemas are described as views over the global schema). In this context, we focus on the problem of translating the user's query against the XML global schema into a SQL query over the local data sources.
Integrating Keyword Search into XML Query Processing
, 2000
"... Due to the popularity of the XML data format, several query languages for XML have been proposed, specially devised to handle data whose structure is unknown, loose, or absent. While these languages are rich enough to allow for querying the content and structure of an XML document, a varying or unkn ..."
Abstract
-
Cited by 65 (1 self)
- Add to MetaCart
Due to the popularity of the XML data format, several query languages for XML have been proposed, specially devised to handle data whose structure is unknown, loose, or absent. While these languages are rich enough to allow for querying the content and structure of an XML document, a varying or unknown structure can make formulating queries a very difficult task. We propose an extension to XML query languages that enables keyword search at the granularity of XML elements, that helps novice users formulate queries, and also yields new optimization opportunities for the query processor. We present an implementation of this extension on top of a commercial RDBMS; we then discuss implementation choices and performance results. Keywords XML query processing, full-text index 1 Introduction There is no doubt that XML is rapidly becoming one of the most important data formats. It is already used for scientific data (e.g., DNA sequences), in linguistics (e.g., the Treebank database at the U...
On Wrapping Query Languages and Efficient XML Integration
, 2000
"... Modern applications (Web portals, digital libraries, etc.) require integrated access to various information sources (from traditional DBMS to semistructured Web repositories), fast deployment and low maintenance cost in a rapidly evolving environment. Because of its flexibility, there is an increasi ..."
Abstract
-
Cited by 56 (10 self)
- Add to MetaCart
Modern applications (Web portals, digital libraries, etc.) require integrated access to various information sources (from traditional DBMS to semistructured Web repositories), fast deployment and low maintenance cost in a rapidly evolving environment. Because of its flexibility, there is an increasing interest in using XML as a middleware model for such applications. XML enables fast wrapping and declarative integration. However, query processing in XML-based integration systems is still penalized by the lack of an algebra with adequate optimization properties and the difficulty to understand source query capabilities. In this paper, we propose an algebraic approach to support efficient XML query evaluation. We define a general purpose algebra suitable for semistructured or XML query languages. We show how this algebra can be used, with appropriate type information, to also wrap more structured query languages such as OQL or SQL. Finally, we develop new optimization techniques for XML-...
Declarative Specification of Web Sites with STRUDEL
, 2000
"... Strudel is a system for implementing data-intensive Web sites, which typically integrate information from multiple data sources and have complex structure. Strudel's key idea is separating the management of a Web site's data, the specification of its content and structure, and the visual representat ..."
Abstract
-
Cited by 54 (1 self)
- Add to MetaCart
Strudel is a system for implementing data-intensive Web sites, which typically integrate information from multiple data sources and have complex structure. Strudel's key idea is separating the management of a Web site's data, the specification of its content and structure, and the visual representation of its pages. Strudel provides a declarative query language for specifying a site's content and structure, and a simple template language for specifying a site's HTML representation. This paper contains a comprehensive description of the Strudel system and details the benefits of declarative site specification. We describe our experiences using Strudel in a production application and describe three different, but complementary, systems that extend and improve upon Strudel's original ideas. Key words Web-site management, declarative query languages 1 Introduction Web sites have become the principal mechanism for disseminating and accessing information on the Internet and on corporation...

