Data Integration: A Theoretical Perspective
 Symposium on Principles of Database Systems
, 2002
Cited by 718 (45 self)
Data integration is the problem of combining data residing at different sources, and providing the user with a unified view of these data. The problem of designing data integration systems is important in current real-world applications, and is characterized by a number of issues that are interesting from a theoretical point of view. This document presents an overview of the material to be presented in a tutorial on data integration. The tutorial is focused on some of the theoretical issues that are relevant for data integration. Special attention will be devoted to the following aspects: modeling a data integration application, processing queries in data integration, dealing with inconsistent data sources, and reasoning on queries.
Constraint Query Languages
, 1992
Cited by 335 (35 self)
We investigate the relationship between programming with constraints and database query languages. We show that efficient, declarative database programming can be combined with efficient constraint solving. The key intuition is that the generalization of a ground fact, or tuple, is a conjunction of constraints over a small number of variables. We describe the basic Constraint Query Language design principles and illustrate them with four classes of constraints: real polynomial inequalities, dense linear order inequalities, equalities over an infinite domain, and boolean equalities. For the analysis, we use quantifier elimination techniques from logic and the concept of data complexity from database theory. This framework is applicable to managing spatial data and can be combined with existing multidimensional searching algorithms and data structures.
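The key intuition in the abstract above (a tuple generalized to a conjunction of constraints) can be illustrated with a minimal sketch for dense linear order constraints. The representation below (triples of the form `(lhs, op, rhs)`, the names `satisfies` and `contains`, and the sample relation) is an assumption made for illustration and is not taken from the paper.

```python
from fractions import Fraction
import operator

# Comparison operators allowed in this sketch (dense linear order constraints).
OPS = {"<": operator.lt, "<=": operator.le, "=": operator.eq}

def term_value(term, point):
    """A term is either a variable name (str) or a rational constant."""
    return point[term] if isinstance(term, str) else term

def satisfies(generalized_tuple, point):
    """True if the point satisfies every constraint in the conjunction."""
    return all(
        OPS[op](term_value(lhs, point), term_value(rhs, point))
        for lhs, op, rhs in generalized_tuple
    )

def contains(generalized_relation, point):
    """A generalized relation is a finite set (disjunction) of generalized tuples."""
    return any(satisfies(t, point) for t in generalized_relation)

# One generalized tuple over (x, y): the open triangle 0 < x < y < 1.
triangle = [(Fraction(0), "<", "x"), ("x", "<", "y"), ("y", "<", Fraction(1))]
# A ground tuple is the special case of equality constraints: (x, y) = (2, 3).
ground = [("x", "=", Fraction(2)), ("y", "=", Fraction(3))]
relation = [triangle, ground]
```

Here a single generalized tuple finitely represents infinitely many points, while an ordinary ground tuple appears as the special case built from equality constraints only.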
Data Exchange: Semantics and Query Answering
 In ICDT
, 2003
Cited by 314 (34 self)
Data exchange is the problem of taking data structured under a source schema and creating an instance of a target schema that reflects the source data as accurately as possible. In this paper, we address foundational and algorithmic issues related to the semantics of data exchange and to query answering in the context of data exchange. These issues arise because, given a source instance, there may be many target instances that satisfy the constraints of the data exchange problem. We give an algebraic specification that selects, among all solutions to the data exchange problem, a special class of solutions that we call universal. A universal solution has no more and no less data than required for data exchange and it represents the entire space of possible solutions. We then identify fairly general, and practical, conditions that guarantee the existence of a universal solution and yield algorithms to compute a canonical universal solution efficiently. We adopt the notion of "certain answers" in indefinite databases for the semantics for query answering in data exchange. We investigate the computational complexity of computing the certain answers in this context and also study the problem of computing the certain answers of target queries by simply evaluating them on a canonical universal solution.
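As a rough sketch of the ideas in the abstract above, the following builds a target instance with labeled nulls for one dependency and then keeps only the null-free query answers. The schema (`Emp`, `Works`), the dependency Emp(n) -> exists d. Works(n, d), and the function names are hypothetical and chosen for illustration; the null-dropping step is assumed here to be applied to conjunctive queries only.

```python
from itertools import count

class Null:
    """A labeled null: a placeholder for an unknown value."""
    def __init__(self, label):
        self.label = label

    def __repr__(self):
        return f"N{self.label}"

def chase_emp_to_works(emp_rows, known_depts):
    """Apply the hypothetical dependency Emp(n) -> exists d. Works(n, d).

    known_depts maps some names to a department copied from the source;
    every other employee gets a fresh labeled null as department.
    """
    fresh = count(1)
    works = []
    for name in emp_rows:
        dept = known_depts.get(name)
        works.append((name, dept if dept is not None else Null(next(fresh))))
    return works

def certain_answers(rows):
    """Keep only answer tuples that contain no labeled nulls."""
    return [r for r in rows if not any(isinstance(v, Null) for v in r)]

works = chase_emp_to_works(["ann", "bob"], {"ann": "sales"})
# Query q(n, d) :- Works(n, d): bob's department is unknown, so his row is dropped.
pairs = certain_answers(works)
# Query q(n) :- Works(n, d): the projection contains no nulls, so both names survive.
names = certain_answers([(n,) for n, _ in works])
```

The two queries show the distinction: an answer is kept only when it holds regardless of how the nulls are later filled in.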
Complexity of Answering Queries Using Materialized Views
 In PODS
, 1998
Cited by 283 (5 self)
We study the complexity of the problem of answering queries using materialized views. This problem has attracted a lot of attention recently because of its relevance in data integration. Previous work considered only conjunctive view definitions. We examine the consequences of allowing more expressive view definition languages. The languages we consider for view definitions and user queries are: conjunctive queries with inequality, positive queries, datalog, and first-order logic. We show that the complexity of the problem depends on whether views are assumed to store all the tuples that satisfy the view definition, or only a subset of them. Finally, we apply the results to the view consistency and view self-maintainability problems which arise in data warehousing. 1 Introduction The notion of materialized view is essential in databases [34] and is attracting more and more attention with the popularity of data warehouses [28]. The problem of answering queries using materialized views [24...
On the Decidability of Query Containment under Constraints
 In Proc. of the 17th ACM SIGACT-SIGMOD-SIGART Symp. on Principles of Database Systems (PODS'98)
, 1998
Cited by 243 (60 self)
Query containment under constraints is the problem of checking whether for every database satisfying a given set of constraints, the result of one query is a subset of the result of another query. Recent research points out that this is a central problem in several database applications, and we address it within a setting where constraints are specified in the form of special inclusion dependencies over complex expressions, built by using intersection and difference of relations, special forms of quantification, regular expressions over binary relations, and cardinality constraints. These types of constraints capture a great variety of data models, including the relational, the entity-relational, and the object-oriented model. We study the problem of checking whether q is contained in q' with respect to the constraints specified in a schema S, where q and q' are nonrecursive Datalog programs whose atoms are complex expressions. We present the following results on query containme...
Temporal Query Languages: a Survey
, 1995
Cited by 106 (11 self)
We define formal notions of temporal domain and temporal database, and use them to survey a wide spectrum of temporal query languages. We distinguish between an abstract temporal database and its concrete representations, and accordingly between abstract and concrete temporal query languages. We also address the issue of incomplete temporal information. 1 Introduction A temporal database is a repository of temporal information. A temporal query language is any query language for temporal databases. In this paper we propose a formal notion of temporal database and use this notion in surveying a wide spectrum of temporal query languages. The need to store temporal information arises in many computer applications. Consider, for example, records of various kinds: financial [37], personnel, medical [98], or judicial. Also, monitoring data, e.g., in telecommunications network management [4] or process control, often has a temporal dimension. There has been a lot of research in temporal dat...
Constraint Programming and Database Query Languages
 In Proc. 2nd Conference on Theoretical Aspects of Computer Software (TACS
, 1994
Cited by 60 (3 self)
The declarative programming paradigms used in constraint languages can lead to powerful extensions of Codd's relational data model. The development of constraint database query languages from logical database query languages has many similarities with the development of constraint logic programming from logic programming, but with the additional requirements of data-efficient, set-at-a-time, and bottom-up evaluation. In this overview of constraint query languages (CQLs) we first present the framework of [41]. The principal idea is that: "the k-tuple (or record) data type can be generalized by a conjunction of quantifier-free constraints over k variables". The generalization must preserve various language properties of the relational data model, e.g., the calculus/algebra equivalence, and have time complexity polynomial in the size of the data. We next present an algebra for dense order constraints that is simpler to evaluate than the calculus described in [41], and we sharpen some of...
The Complexity of Querying Indefinite Data about Linearly Ordered Domains
 In The Proceedings of the Eleventh ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems
, 1992
Cited by 40 (2 self)
In applications dealing with ordered domains, the available data is frequently indefinite. While the domain is actually linearly ordered, only some of the order relations holding between points in the data are known. Thus, the data provides only a partial order, and query answering involves determining what holds under all the compatible linear orders. In this paper we study the complexity of evaluating queries in logical databases containing such indefinite information. We show that in this context queries are intractable even under the data complexity measure, but identify a number of PTIME subproblems. Data complexity in the case of monadic predicates is one of these PTIME cases, but for disjunctive queries the proof is nonconstructive, using well-quasi-order techniques. We also show that the query problem we study is equivalent to the problem of containment of conjunctive relational database queries containing inequalities. One of our results implies that the latter is \Pi^p_2 ...
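The notion of "what holds under all the compatible linear orders" in the abstract above can be made concrete with a brute-force sketch. It assumes, for illustration only, that the points are pairwise distinct and that only strict order facts are known; the function names are hypothetical. The enumeration is exponential in the number of points and is not an algorithm from the paper.

```python
from itertools import permutations

def compatible_orders(points, known_less_than):
    """Yield every linear order (as a rank dict) consistent with the known facts."""
    for perm in permutations(points):
        rank = {p: i for i, p in enumerate(perm)}
        if all(rank[a] < rank[b] for a, b in known_less_than):
            yield rank

def certainly_holds(points, known_less_than, query):
    """True if the query holds in all compatible linear orders.

    Vacuously True when the known facts admit no linear order at all.
    """
    return all(query(rank) for rank in compatible_orders(points, known_less_than))

# Indefinite data: a precedes b and a precedes c, but b and c are unordered.
points = ["a", "b", "c"]
known = [("a", "b"), ("a", "c")]

a_before_c = certainly_holds(points, known, lambda r: r["a"] < r["c"])
b_before_c = certainly_holds(points, known, lambda r: r["b"] < r["c"])
# A disjunctive query can be certain even though neither disjunct is.
b_or_c_first = certainly_holds(
    points, known, lambda r: r["b"] < r["c"] or r["c"] < r["b"]
)
```

The last query shows why disjunction is delicate in this setting: it holds in every compatible order without either disjunct being certain on its own.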
On the complexity of XPath containment in the presence of disjunction, DTDs, and variables
 Logical Methods in Computer Science
Relational Queries over Interpreted Structures
 Journal of the ACM
Cited by 22 (12 self)
We rework parts of the classical relational theory when the underlying domain is a structure with some interpreted operations that can be used in queries. We identify parts of the classical theory that go through "as before" when interpreted structure is present, parts that go through only for classes of nicely-behaved structures, and parts that only arise in the interpreted case. The first category includes a number of results on language equivalence and expressive power characterizations for the active-domain semantics for a variety of logics. Under this semantics, quantifiers range over elements of a relational database. The main kind of results we prove here are generic collapse results: for generic queries, adding operations beyond order does not give us extra power. The second category includes results on the natural semantics, under which quantifiers range over the entire interpreted structure. We prove, for a variety of structures, natural-active collapse results, s...