Results 1–10 of 26
J.B.: The chase revisited
 In: PODS (2008)
Abstract

Cited by 54 (6 self)
We revisit the classical chase procedure, studying its properties as well as its applicability to standard database problems. We settle (in the negative) the open problem of decidability of termination of the standard chase, and we provide sufficient termination conditions which are strictly less overconservative than the best previously known. We investigate the adequacy of the standard chase for checking query containment under constraints, constraint implication, and computing certain answers in data exchange. We find room for improvement after gaining a deeper understanding of the chase by separating the algorithm from its result. We identify the properties of the chase result that are essential to the above applications, and we introduce the more general notion of an F-universal model set, which supports query and constraint languages that are closed under a class F of mappings. By choosing F appropriately, we extend prior results all the way to existential first-order queries and ∀∃-first-order constraints (and various standard sublanguages). We show that the standard chase is incomplete for finding universal model sets, and we introduce the extended core chase which is complete, i.e. finds an F-universal model set when it exists. A key advantage of the new chase is that the same algorithm can be applied for the mapping classes F of interest, by simply modifying appropriately the set of constraints given as input. Even when restricted to the typical input in prior work (unions of conjunctive queries and embedded dependencies), the new chase supports certain answer computation and containment/implication tests in strictly more cases than the incomplete standard chase.
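As a rough illustration of the standard chase the abstract discusses, here is a minimal sketch for tuple-generating dependencies. The fact encoding, helper names, and the step cap are all invented for the example; the cap is there precisely because, as the paper shows, termination of the standard chase is undecidable in general.

```python
import itertools

def matches(instance, atoms, sub=None):
    """Yield every substitution mapping the given atoms into the instance."""
    sub = sub or {}
    if not atoms:
        yield dict(sub)
        return
    (rel, args), rest = atoms[0], atoms[1:]
    for r, vals in list(instance):
        if r != rel or len(vals) != len(args):
            continue
        s = dict(sub)
        if all(s.setdefault(a, v) == v for a, v in zip(args, vals)):
            yield from matches(instance, rest, s)

def satisfied(instance, head, sub):
    """True if some extension of sub maps every head atom into the instance."""
    return next(matches(instance, head, sub), None) is not None

def chase(instance, tgds, max_steps=100):
    """Standard chase over tgds, capped because termination is undecidable.

    instance: set of (relation, (values...)) facts.
    tgds: list of (body, head) pairs of (relation, (variables...)) atoms;
          variables appearing only in the head are existential and are
          instantiated with fresh labeled nulls "N0", "N1", ...
    """
    nulls = itertools.count()
    for _ in range(max_steps):
        fired = False
        for body, head in tgds:
            for sub in list(matches(instance, body)):
                if satisfied(instance, head, sub):
                    continue  # this trigger is already satisfied
                ext = dict(sub)
                for rel, args in head:
                    for v in args:
                        ext.setdefault(v, f"N{next(nulls)}")  # fresh null
                    instance.add((rel, tuple(ext[v] for v in args)))
                fired = True
        if not fired:
            return instance  # chase terminated
    raise RuntimeError("step cap hit; the chase may run forever")
```

For instance, chasing {E(a, b)} with the tgd E(x, y) → ∃z F(x, z) adds one F-fact with a labeled null and stops, whereas F(x, y) → ∃z F(y, z) fires forever and hits the cap.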
Logical foundations of relational data exchange
 SIGMOD Record
Abstract

Cited by 26 (1 self)
Data exchange has been defined as the problem of taking data structured under a source schema and materializing an instance of a target schema that reflects as accurately as possible the source data [19]. In recent years, the need …
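A toy rendering of the data exchange problem just defined: one source-to-target dependency is applied to materialize a target instance, with labeled nulls standing in for values the source does not provide. The schema and mapping here are invented purely for illustration.

```python
import itertools

def exchange(source_emps):
    """Materialize a target instance for one invented st-tgd:
    Emp(name, dept) -> exists mgr. DeptEmp(dept, name, mgr).
    Each missing manager value becomes a fresh labeled null."""
    nulls = itertools.count()
    return {("DeptEmp", (dept, name, f"N{next(nulls)}"))
            for name, dept in sorted(source_emps)}

target = exchange({("alice", "sales"), ("bob", "it")})
```

The resulting instance contains DeptEmp("sales", "alice", N0) and DeptEmp("it", "bob", N1): an accurate reflection of the source, with the unknown managers left open.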
The Recovery of a Schema Mapping: Bringing Exchanged Data Back
 In Proceedings of the 28th ACM Symposium on Principles of Database Systems (PODS), 2008
Abstract

Cited by 26 (5 self)
A schema mapping is a specification that describes how data from a source schema is to be mapped to a target schema. Once the data has been transferred from the source to the target, a natural question is whether one can undo the process and recover the initial data, or at least part of it. In fact, it would be desirable to find a reverse schema mapping from target to source that specifies how to bring the exchanged data back. In this paper, we introduce the notion of a recovery of a schema mapping: it is a reverse mapping M′ for a mapping M that recovers sound data with respect to M. We further introduce an order relation on recoveries. This allows us to choose mappings that recover the maximum amount of sound information. We call such mappings maximum recoveries. We study maximum recoveries in detail, providing a necessary and sufficient condition for their existence. In particular, we prove that maximum recoveries exist for the class of mappings specified by FO-TO-CQ source-to-target dependencies. This class subsumes the class of source-to-target tuple-generating dependencies used in previous work on data exchange. For the class of mappings specified by FO-TO-CQ dependencies, we provide an exponential-time algorithm for computing maximum recoveries, and a simplified version for full dependencies that works in quadratic time. We also characterize the language needed to express maximum recoveries, and we include a detailed comparison with the notion of inverse (and quasi-inverse) mapping previously proposed in the data exchange literature. In particular, we show that maximum recoveries strictly generalize inverses. We finally study the complexity of some decision problems related to the notions of recovery and maximum recovery.
On Reconciling Data Exchange, Data Integration, and Peer Data Management
Abstract

Cited by 18 (2 self)
Data exchange and virtual data integration have been the subject of several investigations in the recent literature. At the same time, the notion of peer data management has emerged as a powerful abstraction of many forms of flexible and dynamic data-centered distributed systems. Although research on the above issues has progressed considerably in recent years, a clear understanding of how to combine data exchange and data integration in peer data management is still missing. This is the subject of the present paper. We start our investigation by first proposing a novel framework for peer data exchange, showing that it is a generalization of the classical data exchange setting. We also present algorithms for all the relevant data exchange tasks, and show that they can all be done in polynomial time with respect to data complexity. Based on the motivation that typical mappings and integrity constraints found in data integration are not captured by peer data exchange, we extend the framework to incorporate these features. One of the main difficulties is that the constraints of this new class are not amenable to materialization. We address this issue by resorting to a suitable combination of virtual and materialized data exchange, showing that the resulting framework is a generalization of both classical data exchange and classical data integration, and that the new setting incorporates the most expressive types of mappings and constraints considered in the two contexts. Finally, we present algorithms for all the relevant data management tasks also in the new setting, and show that, again, their data complexity is polynomial.
Muse: Mapping Understanding and deSign by Example
Abstract

Cited by 16 (4 self)
A fundamental problem in information integration is that of designing the relationships, called schema mappings, between two schemas. The specification of a semantically correct schema mapping is typically a complex task. Automated tools can suggest potential mappings, but few tools are available for helping a designer understand mappings and design alternative mappings. We describe Muse, a mapping design wizard that uses data examples to assist designers in understanding and refining a schema mapping towards the desired specification. We present the novel algorithms behind Muse and show how Muse systematically guides the designer on two important components of a mapping design: the specification of the desired grouping semantics for sets of data and the choice among alternative interpretations for semantically ambiguous mappings. In every component, Muse infers the desired semantics based on the designer’s actions on a short sequence of small examples. Whenever possible, Muse draws examples from a familiar database, thus facilitating the design process even further. We report our experience with Muse on some publicly available schemas.
Peer data exchange
 In Proc. of PODS, 2005
Abstract

Cited by 13 (1 self)
In this paper, we introduce and study a framework, called peer data exchange, for sharing and exchanging data between peers. This framework is a special case of a full-fledged peer data management system and a generalization of data exchange between a source schema and a target schema. The motivation behind peer data exchange is to model authority relationships between peers, where a source peer may contribute data to a target peer, specified using source-to-target constraints, and a target peer may use target-to-source constraints to restrict the data it is willing to receive, but cannot modify the data of the source peer. A fundamental algorithmic problem in this framework is that of deciding the existence of a solution: given a source instance and a target instance for a fixed peer data exchange setting, can the target instance be augmented in such a way that the source instance and the augmented target instance satisfy all constraints of the setting? We investigate the computational complexity of the problem for peer data exchange settings in which the constraints are given by tuple-generating dependencies. We show that this problem is always in NP, and that it can be NP-complete even for “acyclic” peer data exchange settings. We also show that the data complexity of the certain answers of target conjunctive queries is in coNP, and that it can be coNP-complete even for …
CWA-solutions for data exchange settings with target dependencies
 In PODS, 2007
Abstract

Cited by 13 (2 self)
Data exchange deals with the following problem: given an instance over a source schema, a specification of the relationship between the source and the target, and dependencies on the target, construct an instance over a target schema that satisfies the given relationships and dependencies. Recently, for data exchange settings without target dependencies, Libkin (PODS’06) introduced a new concept of solutions based on the closed world assumption (so-called CWA-solutions), and showed that, in some respects, this new notion behaves better than the standard notion of solutions considered in previous papers on data exchange. The present paper extends Libkin’s notion of CWA-solutions to data exchange settings with target dependencies. We show that, when restricting attention to data exchange settings with weakly acyclic target dependencies, this new notion behaves similarly as before: the core is the unique “minimal” CWA-solution, and computing CWA-solutions as well as certain answers to positive queries is possible in polynomial time and can be PTIME-hard. However, there may be more than one “maximal” CWA-solution. And going beyond the class of positive queries, we obtain that there are conjunctive queries with (just) one inequality for which evaluating the certain answers is coNP-hard. Finally, we consider the Existence-of-CWA-Solutions problem: while the problem is tractable for data exchange settings with weakly acyclic target dependencies, it turns out to be undecidable for general data exchange settings. As a consequence, we obtain that also the Existence-of-Universal-Solutions problem is undecidable in general.
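The certain-answer computation mentioned here (for positive queries over a suitable solution) can be sketched as naive conjunctive-query evaluation followed by discarding any answer containing a labeled null. The instance encoding and the "N"-prefix null convention are assumptions made for this example only.

```python
def answers(instance, atoms, head_vars, sub=None):
    """Naive conjunctive-query evaluation over a set of facts.
    atoms: list of (relation, (variables...)); head_vars: answer variables."""
    sub = sub or {}
    if not atoms:
        yield tuple(sub[v] for v in head_vars)
        return
    (rel, args), rest = atoms[0], atoms[1:]
    for r, vals in instance:
        if r != rel:
            continue
        s = dict(sub)
        if all(s.setdefault(a, v) == v for a, v in zip(args, vals)):
            yield from answers(instance, rest, head_vars, s)

def certain_answers(instance, atoms, head_vars):
    # Keep only null-free answers: tuples containing a labeled null
    # (here, by convention, any value starting with "N") hold only in
    # this particular solution, not in all of them.
    return {t for t in answers(instance, atoms, head_vars)
            if not any(str(v).startswith("N") for v in t)}
```

For example, over {DeptEmp("sales", "alice", N0), DeptEmp("it", "bob", "carol")}, the query q(m) :- DeptEmp(d, n, m) has the certain answer {("carol",)} only: the null N0 is filtered out.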
Data Exchange in the Presence of Arithmetic Comparisons
 In EDBT'08, 2008
Abstract

Cited by 7 (1 self)
Data exchange is the problem of transforming data structured under a schema (called source) into data structured under a different schema (called target). The emphasis of data exchange is to materialize a target instance (called solution) that satisfies the relationship between the schemas. Universal solutions were shown to be the most suitable solutions, mainly because they can be used to answer conjunctive queries posed over the target schema. Trying to extend this result to more expressive query languages fails, even if we only add inequalities (≠) to conjunctive queries. In this work we study data exchange in the presence of general arithmetic comparisons (<, ≤, >, ≥, =, ≠): (a) We consider queries posed over the target schema that belong to the class of unions of conjunctive queries with arithmetic comparisons (in short, CQACs). (b) We exploit arithmetic comparisons to define more expressive data exchange settings, called DEAC settings. In particular, DEAC settings consist of constraints that involve arithmetic comparisons. For that, two new classes of dependencies (tgdACs and acgds) are introduced, to capture the need for arithmetic comparisons in source-to-target and target constraints. We show that in DEAC settings the existence-of-solution problem is in NP. We define a novel chase procedure, called AC-chase, which is a tree, and we prove that it produces a universal solution (appropriately defined to deal with arithmetic comparisons). We show that the new concept of universal solution is the right tool for query answering in the case of unions of CQACs. The complexity of computing certain answers for unions of CQACs is shown to be coNP …
Probabilistic XML: Models and complexity
 2011
Abstract

Cited by 7 (7 self)
Uncertainty in data naturally arises in various applications, such as data integration and Web information extraction. Probabilistic XML is one of the concepts that have been proposed to model and manage various kinds of uncertain data. In essence, a probabilistic XML document is a compact representation of a probability distribution over ordinary XML documents. Various models of probabilistic XML provide different languages, with various degrees of expressiveness, for such compact representations. Beyond representation, probabilistic XML systems are expected to support data management in a way that properly reflects the uncertainty. For instance, query evaluation entails probabilistic inference, and update operations need to properly change the entire probability space. Efficiently and effectively accomplishing data-management tasks in that manner is a major technical challenge. This chapter reviews the literature on probabilistic XML. Specifically, this chapter discusses the probabilistic XML models that have been proposed, and the complexity of query evaluation therein. Also discussed are other data-management tasks like updates and compression, as well as systemic and implementation aspects.
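To make the "distribution over ordinary documents" idea concrete, here is a tiny sketch with independence ("ind") nodes, one of the probabilistic XML constructs this kind of survey covers: each child of an ind node is kept independently with its annotated probability. The tuple-based tree encoding and function name are assumptions for this example, and it relies on distinct subtrees being probabilistically independent.

```python
def prob_label(node, label):
    """Probability that `label` appears somewhere in the random document.

    node is ("ord", label, [children]) for an ordinary node, or
    ("ind", [(p, child), ...]) for an independence node that keeps each
    child independently with probability p. Since subtrees are mutually
    independent in this model, "label is absent" probabilities multiply.
    """
    if node[0] == "ord":
        _, lab, children = node
        if lab == label:
            return 1.0
        miss = 1.0
        for c in children:
            miss *= 1.0 - prob_label(c, label)
        return 1.0 - miss
    # ind node: child c survives with probability p AND must contain label
    miss = 1.0
    for p, c in node[1]:
        miss *= 1.0 - p * prob_label(c, label)
    return 1.0 - miss
```

For a root with one ind child offering two "a" leaves at probability 0.5 each, the label "a" is present with probability 1 − 0.5 · 0.5 = 0.75, matching the four possible ordinary documents this p-document represents.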