Data Integration: A Theoretical Perspective
- Symposium on Principles of Database Systems
, 2002
Cited by 585 (35 self)
Data integration is the problem of combining data residing at different sources, and providing the user with a unified view of these data. The problem of designing data integration systems is important in current real-world applications, and is characterized by a number of issues that are interesting from a theoretical point of view. This document presents an overview of the material to be presented in a tutorial on data integration. The tutorial is focused on some of the theoretical issues that are relevant for data integration. Special attention will be devoted to the following aspects: modeling a data integration application, processing queries in data integration, dealing with inconsistent data sources, and reasoning on queries.
Query containment for data integration systems
- In Proc. of PODS 2000
, 2000
Cited by 49 (1 self)
The problem of query containment is fundamental to many aspects of database systems, including query optimization, determining independence of queries from updates, and rewriting queries using views. In the data-integration framework, however, the standard notion of query containment does not suffice. We define relative containment, which formalizes the notion of query containment relative to the sources available to the data-integration system. First, we provide optimal bounds for relative containment for several important classes of datalog queries, including the common case of conjunctive queries. Next, we provide bounds for the case when sources enforce access restrictions in the form of binding pattern constraints. Surprisingly, we show that relative containment for conjunctive queries is still decidable in this case, even though it is known that finding all answers to such queries may require a recursive datalog program over the sources. Finally, we provide tight bounds for variants of relative containment when the queries and source descriptions may contain comparison predicates.
Description Logics for Information Integration
- Computational Logic: Logic Programming and Beyond. LNCS
, 2002
Cited by 31 (4 self)
Information integration is the problem of combining the data residing at different, heterogeneous sources, and providing the user with a unified view of these data, called the mediated schema. The mediated schema is therefore a reconciled view of the information, which can be queried by the user. It is the task of the system to free the user from needing to know where the data are and how they are structured at the sources. In this chapter, we discuss data integration in general, and describe a logic-based approach to data integration. A logic of the Description Logics family is used to model the information managed by the integration system, to formulate queries posed to the system, and to perform several types of automated reasoning supporting both the modeling and the query answering process. We focus, in particular, on a specific Description Logic, called DLR, specifically designed for database applications. In the chapter, we illustrate how DLR is used to model a mediated schema of an integration system, to specify the semantics of the data sources, and finally to support the query answering process by means of the associated reasoning methods.
Lossless Regular Views
- In Proc. of PODS 2002
, 2002
Cited by 17 (6 self)
If the only information we have on a certain database is through a set of views, the question arises of whether this is sufficient to answer completely a given query. We say that the set of views is lossless with respect to the query if, no matter what the database is, we can answer the query by solely relying on the content of the views. The question of losslessness has various applications, for example in query optimization, mobile computing, data warehousing, and data integration. We study this problem in a context where the database is semistructured, and both the query and the views are expressed as regular path queries. The form of recursion present in this class prevents us from applying known results to our case.
View-Based Query Containment
- In Proc. of PODS 2003
, 2003
Cited by 16 (4 self)
Query containment is the problem of checking whether for all databases the answer to a query is a subset of the answer to a second query. In several data management tasks, such as data integration, mobile computing, etc., the data of interest are only accessible through a given set of views. In this case, containment of queries should be determined relative to the set of views, as already noted in the literature. Such a form of containment, which we call view-based query containment, is the subject of this paper. The problem comes in various forms, depending on whether each of the two queries is expressed over the base alphabet or the alphabet of the view names. We present a thorough analysis of view-based query containment, by discussing all possible combinations from a semantic point of view, and by showing their mutual relationships. In particular, for the two settings of conjunctive queries and two-way regular path queries, we provide both techniques and complexity bounds for the different variants of the problem. Finally, we study the relationship between view-based query containment and view-based query rewriting.
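As an illustration of the plain containment notion defined in the first sentence of this abstract (not of the view-based variants the paper studies), the following is a minimal sketch of the classical homomorphism test for conjunctive queries. The query encoding (a head tuple plus a list of `(relation, args)` atoms), the convention that variables start with an uppercase letter, and the names `is_var` and `contained_in` are assumptions made for this example; they do not come from the paper.

```python
def is_var(term):
    # Assumed convention: variables are strings starting with an uppercase letter.
    return isinstance(term, str) and term[:1].isupper()


def _extend(mapping, todo, target_atoms):
    """Backtracking search: map every atom in `todo` onto some atom of the target."""
    if not todo:
        return True
    (rel, args), rest = todo[0], todo[1:]
    for target_rel, target_args in target_atoms:
        if target_rel != rel or len(target_args) != len(args):
            continue
        candidate = dict(mapping)
        consistent = True
        for a, b in zip(args, target_args):
            if is_var(a):
                # A variable must be mapped to the same term everywhere.
                if candidate.setdefault(a, b) != b:
                    consistent = False
                    break
            elif a != b:
                # A constant can only be mapped to itself.
                consistent = False
                break
        if consistent and _extend(candidate, rest, target_atoms):
            return True
    return False


def contained_in(q1, q2):
    """True iff q1 is contained in q2, i.e. a homomorphism from q2 into q1 exists.

    The variables of q1 are treated as frozen constants (its canonical database).
    """
    head1, body1 = q1
    head2, body2 = q2
    if len(head1) != len(head2):
        return False
    mapping = {}
    for t2, t1 in zip(head2, head1):  # the head of q2 must map onto the head of q1
        if is_var(t2):
            if mapping.setdefault(t2, t1) != t1:
                return False
        elif t2 != t1:
            return False
    return _extend(mapping, list(body2), list(body1))


# Q(X) :- edge(X, Y), edge(Y, Z)   and   Q(X) :- edge(X, Y)
two_hop = (("X",), [("edge", ("X", "Y")), ("edge", ("Y", "Z"))])
one_hop = (("X",), [("edge", ("X", "Y"))])

print(contained_in(two_hop, one_hop))  # True
print(contained_in(one_hop, two_hop))  # False
```

Every node that starts a two-edge path also starts a one-edge path, so the first check succeeds; the converse fails because no atom of `one_hop` can receive the second edge.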
Processing Unions of Conjunctive Queries with Negation under Limited Access Patterns
- In EDBT
, 2004
Cited by 15 (8 self)
We study the problem of answering queries over sources with limited access patterns. The problem is to decide whether a given query Q is feasible, i.e., equivalent to an executable query Q′ that observes the limited access patterns given by the sources. We characterize the complexity of deciding feasibility for the classes CQ¬ (conjunctive queries with negation) and UCQ¬ (unions of CQ¬ queries): testing feasibility is just as hard as testing containment and therefore Π₂ᴾ-complete. We also provide a uniform treatment for CQ, UCQ, CQ¬, and UCQ¬ by devising a single algorithm which is optimal for each of these classes. In addition, we show how one can often avoid the worst-case complexity by certain approximations: at compile-time, even if a query Q is not feasible, we can efficiently find the minimal executable query containing Q. For query answering at runtime, we devise an algorithm which may report complete answers even in the case of infeasible plans and which can indicate to the user the degree of completeness for certain incomplete answers.
Survey on Methods for Query Rewriting and Query Answering Using Views
, 2001
Cited by 14 (0 self)
A Data Integration System is constituted by three main components: source schemas, a global schema, and a mapping between the two. There exist two main approaches for specifying the mapping: in the local-as-view (LAV) approach the source structures are defined as views over the global schema; conversely, in the global-as-view (GAV) approach each global concept is defined in terms of a view over the source schemas. The problem of query processing is to find efficient methods for answering queries posed to the global schema on the basis of the data stored at sources. In LAV there exist two approaches to query processing: by query rewriting, in which one tries to compute a rewriting of the query in terms of the views and then evaluates such a rewriting, and by query answering, in which one aims at directly answering the query based on the view extensions. In GAV, existing systems deal with query processing by simply unfolding each global concept in the query with its definition in terms of the sources. In this paper, we survey the most important query processing algorithms proposed in the literature for LAV, and we describe the principal GAV data integration systems and the form of query processing they adopt.
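The GAV unfolding step mentioned in this abstract can be sketched as follows. This is a toy illustration under stated assumptions, not the procedure of any surveyed system: each global relation has exactly one conjunctive definition over the sources, variables are strings starting with an uppercase letter, and the relation names (`movie`, `review`, `s1`, `s2`) and the function `unfold` are hypothetical.

```python
from itertools import count


def is_var(term):
    # Assumed convention: variables are strings starting with an uppercase letter.
    return isinstance(term, str) and term[:1].isupper()


def unfold(query_body, gav_mapping):
    """Replace each global atom by its source-level definition.

    gav_mapping maps a global relation name to (params, definition), where
    definition is a list of (source_relation, args) atoms. Variables that
    occur only in a definition are renamed apart with a per-atom suffix.
    The sketch assumes these suffixed names do not clash with query variables.
    """
    fresh = count()
    unfolded = []
    for rel, args in query_body:
        params, definition = gav_mapping[rel]
        subst = dict(zip(params, args))
        suffix = next(fresh)
        for src_rel, src_args in definition:
            new_args = tuple(
                subst[a] if a in subst
                else (f"{a}_{suffix}" if is_var(a) else a)
                for a in src_args
            )
            unfolded.append((src_rel, new_args))
    return unfolded


# Hypothetical GAV mapping:
#   movie(T, D)  :- s1(T, Y, D)
#   review(T, R) :- s2(T, R)
gav = {
    "movie": (("T", "D"), [("s1", ("T", "Y", "D"))]),
    "review": (("T", "R"), [("s2", ("T", "R"))]),
}

# Global query body: movie(X, lynch), review(X, Z)
query = [("movie", ("X", "lynch")), ("review", ("X", "Z"))]
print(unfold(query, gav))
# [('s1', ('X', 'Y_0', 'lynch')), ('s2', ('X', 'Z'))]
```

The result mentions only source relations, so it can be evaluated directly against the sources. LAV mappings do not admit this simple substitution, which is why the survey treats rewriting and answering algorithms for LAV separately.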
Answering Queries with Useful Bindings
- ACM Transactions on Database Systems (TODS)
, 2001
Cited by 10 (1 self)
In this paper, we propose a query-planning framework to answer queries in the presence of limited access patterns. In the framework, a query and source descriptions are translated to a recursive datalog program. We then solve optimization problems in this framework, including how to decide whether accessing off-query sources is necessary, how to choose useful sources for a query, and how to test query containment. We develop algorithms to solve these problems, and thus construct an efficient program to answer a query.
Processing First-Order Queries under Limited Access Patterns
- In PODS
, 2004
Cited by 10 (2 self)
We study the problem of answering queries over sources with limited access patterns. Given a first-order query Q, the problem is to decide whether there is an equivalent query which can be executed observing the access pattern restrictions. If so, we say that Q is feasible. We define feasibility for first-order queries (previous definitions handled only some existential cases) and characterize the complexity of many first-order query classes. For each of them, we show that deciding feasibility is as hard as deciding containment. Since feasibility is undecidable in many cases and hard to decide in some others, we also define an approximation to it which can be computed in NP for any first-order query and in P for unions of conjunctive queries with negation. Finally, we outline a practical overall strategy for processing first-order queries under limited access patterns.
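To make the idea of executing a query while observing access patterns concrete, here is a minimal sketch for the simplest setting only: a positive conjunctive query with one access pattern per relation. It checks whether the atoms can be ordered so that every input position is bound before the source is called. This is a weaker, purely syntactic property than the feasibility notion of the paper, which asks for an equivalent executable query. The pattern encoding (`"i"` for input, `"o"` for output), the relation names, and the function `executable_order` are assumptions of this example.

```python
def is_var(term):
    # Assumed convention: variables are strings starting with an uppercase letter.
    return isinstance(term, str) and term[:1].isupper()


def executable_order(body, patterns):
    """Return an ordering of the atoms that respects the access patterns, or None.

    patterns maps each relation to a string over {"i", "o"}, one character per
    argument position. An atom can be called once every variable in an "i"
    position has been bound by an earlier atom; constants are always bound.
    Greedy selection suffices here because the set of bound variables only grows.
    """
    bound = set()
    remaining = list(body)
    plan = []
    while remaining:
        for atom in remaining:
            rel, args = atom
            inputs_ready = all(
                (not is_var(a)) or a in bound
                for a, mode in zip(args, patterns[rel])
                if mode == "i"
            )
            if inputs_ready:
                plan.append(atom)
                bound.update(a for a in args if is_var(a))
                remaining.remove(atom)
                break
        else:
            return None  # no remaining atom can be called
    return plan


# Hypothetical sources: book(Isbn, Author) requires the ISBN as input,
# catalog(Isbn) can be scanned freely.
patterns = {"book": "io", "catalog": "o"}

body = [("book", ("I", "A")), ("catalog", ("I",))]
print(executable_order(body, patterns))
# [('catalog', ('I',)), ('book', ('I', 'A'))]

print(executable_order([("book", ("I", "A"))], patterns))
# None
```

The second call returns `None` because nothing in the query can supply an ISBN to `book`.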

