Data Integration: A Theoretical Perspective
- Symposium on Principles of Database Systems, 2002
- Cited by 585 (35 self)
Data integration is the problem of combining data residing at different sources, and providing the user with a unified view of these data. The problem of designing data integration systems is important in current real world applications, and is characterized by a number of issues that are interesting from a theoretical point of view. This document presents an overview of the material to be presented in a tutorial on data integration. The tutorial is focused on some of the theoretical issues that are relevant for data integration. Special attention will be devoted to the following aspects: modeling a data integration application, processing queries in data integration, dealing with inconsistent data sources, and reasoning on queries.
Data Integration under Integrity Constraints
- Information Systems, 2002
- Cited by 69 (18 self)
Data integration systems provide access to a set of heterogeneous, autonomous data sources through a so-called global schema. There are basically two approaches for designing a data integration system. In the global-centric approach, one defines the elements of the global schema as views over the sources, whereas in the local-centric approach, one characterizes the sources as views over the global schema. It is well known that processing queries in the latter approach is similar to query answering with incomplete information, and, therefore, is a complex task. On the other hand, it is a common opinion that query processing is much easier in the former approach. In this paper we show the surprising result that, when the global schema is expressed in the relational model with integrity constraints, even of simple types, the problem of incomplete information implicitly arises, making query processing difficult in the global-centric approach as well. We then focus on global schemas with key and foreign key constraints, which represents a situation which is very common in practice, and we illustrate techniques for effectively answering queries posed to the data integration system in this case.
Description logics for information integration
- Computational Logic: Logic Programming and Beyond. LNCS, 2002
- Cited by 31 (4 self)
Information integration is the problem of combining the data residing at different, heterogeneous sources, and providing the user with a unified view of these data, called the mediated schema. The mediated schema is therefore a reconciled view of the information, which can be queried by the user. It is the task of the system to free the user from needing to know where the data are and how they are structured at the sources. In this chapter, we discuss data integration in general, and describe a logic-based approach to data integration. A logic of the Description Logics family is used to model the information managed by the integration system, to formulate queries posed to the system, and to perform several types of automated reasoning supporting both the modeling and the query answering process. We focus, in particular, on a specific Description Logic, called DLR, specifically designed for database applications. In the chapter, we illustrate how DLR is used to model a mediated schema of an integration system, to specify the semantics of the data sources, and finally to support the query answering process by means of the associated reasoning methods.
On the expressive power of data integration systems
- 2002
- Cited by 26 (6 self)
There are basically two approaches for designing a data integration system. In the global-as-view (GAV) approach, one maps the concepts in the global schema to views over the sources, whereas in the local-as-view (LAV) approach, one maps the sources into views over the global schema. The goal of this paper is to relate the two approaches with respect to their expressive power. The analysis is carried out in a relational database setting, where both the queries on the global schema and the views in the mapping are conjunctive queries. We introduce the notions of query-preserving transformation and query-reducibility between data integration systems, and we show that, when no integrity constraints are allowed in the global schema, the LAV and the GAV approaches are incomparable. We then consider the addition of integrity constraints in the global schema, and present techniques for query-preserving transformations in both directions. Finally, we show that our results imply that we can always transform any system following the GLAV approach (a generalization of both LAV and GAV) into a query-preserving GAV system.
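The two mapping styles described in this abstract can be written schematically as follows. The symbols are assumptions chosen for illustration and do not come from the abstract: $\mathcal{G}$ is the global schema, $\mathcal{S}$ the source schema, and $q_{\mathcal{S}}$, $q_{\mathcal{G}}$ are conjunctive queries over the sources and over the global schema respectively.

```latex
\begin{align*}
% GAV: every global element is associated with a view over the sources
\text{GAV:}\quad & g \;\mapsto\; q_{\mathcal{S}} && \text{for each } g \in \mathcal{G} \\
% LAV: every source is associated with a view over the global schema
\text{LAV:}\quad & s \;\mapsto\; q_{\mathcal{G}} && \text{for each } s \in \mathcal{S}
\end{align*}
```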
On Answering Queries in the Presence of Limited Access Patterns
- In Proc. of ICDT 2001, 2001
- Cited by 26 (2 self)
In information-integration systems, source relations often have limitations on access patterns to their data; i.e., one must provide values for certain attributes of a relation in order to retrieve its tuples. In this paper we consider the following fundamental problem: can we compute the complete answer to a query by accessing the relations with legal patterns? The complete answer to a query is the answer that we could compute if we could retrieve all the tuples from the relations. We give algorithms for solving the problem for various classes of queries, including conjunctive queries, unions of conjunctive queries, and conjunctive queries with arithmetic comparisons. We prove the problem is undecidable for datalog queries. If the complete answer to a query cannot be computed, we often need to compute its maximal answer. The second problem we study is, given two conjunctive queries on relations with limited access patterns, how to test whether the maximal answer to...
Accessing data integration systems through conceptual schemas
- In Proc. of the 20th Int. Conf. on Conceptual Modeling (ER 2001), 2001
- Cited by 18 (10 self)
Data integration systems provide access to a set of heterogeneous, autonomous data sources through a so-called global, or mediated view. There is a general consensus that the best way to describe the global view is through a conceptual data model, and that there are basically two approaches for designing a data integration system. In the global-as-view approach, one defines the concepts in the global schema as views over the sources, whereas in the local-as-view approach, one characterizes the sources as views over the global schema. It is well known that processing queries in the latter approach is similar to query answering with incomplete information, and, therefore, is a complex task. On the other hand, it is a common opinion that query processing is much easier in the former approach. In this paper we show the surprising result that, when the global schema is expressed in terms of a conceptual data model, even a very simple one, query processing becomes difficult in the global-as-view approach also. We demonstrate that the problem of incomplete information arises in this case too, and we illustrate some basic techniques for effectively answering queries posed to the global schema of the data integration system.
Computing Complete Answers to Queries in the Presence of Limited Access Patterns
- Journal of VLDB, 1999
- Cited by 15 (3 self)
In data applications such as information integration, there can be limited access patterns to relations, i.e., binding patterns require values to be specified for certain attributes in order to retrieve data from a relation. As a consequence, we cannot retrieve all tuples from these relations. In this article we study the problem of computing the complete answer to a query, i.e., the answer that could be computed if all the tuples could be retrieved. A query is stable if for any instance of the relations in the query, its complete answer can be computed using the access patterns permitted by the relations. We study the problem of testing stability of various classes of queries, including conjunctive queries, unions of conjunctive queries, and conjunctive queries with arithmetic comparisons.
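As a rough illustration of what a binding pattern restricts, the sketch below simulates which tuples can be reached at all when every access must supply values for the bound attributes. It is not the stability test from the article. The function name `obtainable_tuples`, the `'b'`/`'f'` pattern strings, and the single shared domain of values are all assumptions made for this example.

```python
def obtainable_tuples(relations, patterns, seed_values):
    """Simulate access to sources with binding patterns.

    relations:   name -> set of tuples (the full, hidden source contents)
    patterns:    name -> string with one 'b' (bound) or 'f' (free) per attribute
    seed_values: constants known up front, e.g. constants from the query

    Assumption: all attributes share one domain, so any value seen in a
    retrieved tuple may be used to bind any attribute later.
    """
    known = set(seed_values)
    retrieved = {name: set() for name in relations}
    changed = True
    while changed:  # repeat until no new tuple can be fetched
        changed = False
        for name, tuples in relations.items():
            bound = [i for i, c in enumerate(patterns[name]) if c == "b"]
            for t in tuples:
                if t in retrieved[name]:
                    continue
                # A tuple is reachable only if every bound position
                # holds a value we already know.
                if all(t[i] in known for i in bound):
                    retrieved[name].add(t)
                    known.update(t)
                    changed = True
    return retrieved


# Hypothetical sources: r and s both require their first attribute to be bound.
sources = {"r": {("a", "b"), ("c", "d")}, "s": {("b", "c")}}
access = {"r": "bf", "s": "bf"}
result = obtainable_tuples(sources, access, {"a"})
```

Starting from the constant `"a"`, every tuple in this small instance becomes reachable. Starting from `"b"` instead, the tuple `("a", "b")` of `r` is never retrieved, which is the kind of gap between the obtainable answer and the complete answer that the abstract refers to.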
Survey on Methods for Query Rewriting and Query Answering Using Views
- 2001
- Cited by 14 (0 self)
A Data Integration System is constituted by three main components: source schemas, a global schema and a mapping between the two. There exist two main approaches for specifying the mapping: in the local-as-view (LAV) approach the source structures are defined as views over the global schema; on the contrary in the global-as-view (GAV) approach each global concept is defined in terms of a view over the source schemas. The problem of query processing is to find efficient methods for answering queries posed to the global schema on the basis of the data stored at sources. In LAV there exist two approaches to query processing: by query rewriting, in which one tries to compute a rewriting of the query in terms of the views and then evaluates such a rewriting, and by query answering, in which one aims at directly answering the query based on the view extensions. In GAV, existing systems deal with query processing by simply unfolding each global concept in the query with its definition in terms of the sources. In this paper, we survey the most important query processing algorithms proposed in the literature for LAV, and we describe the principal GAV data integration systems and the form of query processing they adopt.
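The GAV unfolding step mentioned in this abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not any surveyed system's algorithm: the function name `unfold`, the tuple-based encoding of atoms, and the relation names `Emp`, `s1`, `s2` are hypothetical. It also assumes one conjunctive view per global relation, view bodies that contain only variables, and query variable names that do not clash with the generated fresh names.

```python
from itertools import count


def unfold(query_body, gav_mapping):
    """Replace each global atom in a conjunctive query by its GAV definition.

    query_body:  list of (predicate, args) atoms over the global schema
    gav_mapping: global predicate -> (head_vars, view_body), where view_body
                 is a list of (source_predicate, args) atoms
    """
    fresh = count()
    result = []
    for pred, args in query_body:
        head_vars, view_body = gav_mapping[pred]
        # Bind the view's head variables to the arguments used in the query.
        subst = dict(zip(head_vars, args))
        # Variables that occur only in the view body are existential; give
        # them a fresh name per unfolded atom so occurrences do not collide.
        suffix = next(fresh)
        for src_pred, src_args in view_body:
            new_args = tuple(subst.get(v, f"{v}_{suffix}") for v in src_args)
            result.append((src_pred, new_args))
    return result


# Hypothetical mapping: Emp(N, D) is defined as s1(N, X), s2(X, D).
mapping = {"Emp": (("N", "D"), [("s1", ("N", "X")), ("s2", ("X", "D"))])}
query = [("Emp", ("n", "d")), ("Emp", ("m", "d"))]
unfolded = unfold(query, mapping)
```

The unfolded query mentions only source relations, so it can be evaluated directly over the sources, which is why unfolding is the simple case compared with LAV rewriting.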
Minimizing View Sets without Losing Query-Answering Power
- 2001
- Cited by 13 (4 self)
The problem of answering queries using views has been studied extensively due to its relevance in a wide variety of data-management applications. In these applications, we often need to select a subset of views to maintain due to limited resources. In this paper, we show that traditional query containment is not a good basis for deciding whether or not a view should be selected. Instead, we should minimize the view set without losing its query-answering power. To formalize this notion, we first introduce the concept of "p-containment." That is, a view set V is p-contained in another view set W, if W can answer all the queries that can be answered by V. We show that p-containment and the traditional query containment are not related. We then discuss how to minimize a view set while retaining its query-answering power. We develop the idea further by considering p-containment of two view sets with respect to a given set of queries, and consider their relationship in terms o...
Answering Queries with Useful Bindings
- ACM Transactions on Database Systems (TODS), 2001
- Cited by 10 (1 self)
In this paper, we propose a query-planning framework to answer queries in the presence of limited access patterns. In the framework, a query and source descriptions are translated to a recursive datalog program. We then solve optimization problems in this framework, including how to decide whether accessing off-query sources is necessary, how to choose useful sources for a query, and how to test query containment. We develop algorithms to solve these problems, and thus construct an efficient program to answer a query

