Data Exchange: Getting to the Core
2003. Cited by 168 (19 self).
Abstract:
Data exchange is the problem of taking data structured under a source schema and creating an instance of a target schema that reflects the source data as accurately as possible. Given a source instance, there may be many solutions to the data exchange problem, that is, many target instances that satisfy the constraints of the data exchange problem. In an earlier paper, we identified a special class of solutions that we call universal. A universal solution has homomorphisms into every possible solution, and hence is a "most general possible" solution. Nonetheless, given a source instance, there may be many universal solutions. This naturally raises the question of whether there is a "best" universal solution, and hence a best solution for data exchange. We answer this question by considering the well-known notion of the core of a structure, a notion that was first studied in graph theory, but has also played a role in conjunctive-query processing. The core of a structure is the smallest substructure that is also a homomorphic image of the structure. All universal solutions have the same core (up to isomorphism); we show that this core is also a universal solution, and hence the smallest universal solution. The uniqueness of the core of a universal solution together with its minimality make the core an ideal solution for data exchange. Furthermore, we show that the core is the best among all universal solutions for answering unions of conjunctive queries with inequalities. After this, we investigate the computational complexity of producing the core. Well-known results by Chandra and Merlin imply that, unless P = NP, there is no polynomial-time algorithm that, given a structure as input, returns the core of that structure as output. In contrast, in the context of data exchange ...
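To make the notion of a core concrete, here is a minimal sketch (not the paper's algorithm) that computes the core of a tiny relational instance, encoded as a single binary relation (a digraph), by brute-force search for the smallest induced substructure that the whole instance maps into homomorphically. Function names and the encoding are invented for this illustration; the exhaustive search is exponential and only suitable for toy inputs.

from itertools import combinations, product

def homomorphism_exists(edges, domain, codomain, allowed_edges):
    # Brute force: try every map h: domain -> codomain and check that
    # every edge (a, b) is sent to an edge (h(a), h(b)) of the substructure.
    domain = sorted(domain)
    for values in product(sorted(codomain), repeat=len(domain)):
        h = dict(zip(domain, values))
        if all((h[a], h[b]) in allowed_edges for (a, b) in edges):
            return True
    return False

def core(edges):
    # Smallest induced substructure the whole structure maps into;
    # per the result above, it is unique up to isomorphism.
    nodes = {x for e in edges for x in e}
    for size in range(1, len(nodes) + 1):
        for subset in combinations(sorted(nodes), size):
            sub = {(a, b) for (a, b) in edges
                   if a in subset and b in subset}
            if homomorphism_exists(edges, nodes, subset, sub):
                return sub
    return edges

# A directed triangle with a pendant edge retracts onto the triangle:
instance = {(1, 2), (2, 3), (3, 1), (4, 1)}
print(core(instance))   # the triangle {(1, 2), (2, 3), (3, 1)} (set order may vary)
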
On the decidability and complexity of query answering over inconsistent and incomplete databases
In Proc. of PODS 2003. Cited by 149 (29 self).
Abstract:
In databases with integrity constraints, data may not satisfy the constraints. In this paper, we address the problem of obtaining consistent answers in such a setting, when key and inclusion dependencies are expressed on the database schema. We establish decidability and complexity results for query answering under different assumptions on data (soundness and/or completeness). In particular, after showing that the problem is in general undecidable, we identify the maximal class of inclusion dependencies under which query answering is decidable in the presence of key dependencies. Although obtained in a single database context, such results are directly applicable to data integration, where multiple information sources may provide data that are inconsistent with respect to the global view of the sources.
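The repair semantics this line of work builds on can be illustrated on a finite toy case (the paper's actual decision procedures handle key and inclusion dependencies, where the space of repairs can be infinite; none of that machinery is attempted here). With a single key constraint, the repairs are the maximal consistent subsets, and the consistent answers are those returned on every repair. All names below are invented for the illustration.

from itertools import product

def repairs(tuples, key_index=0):
    # All repairs of a relation with a single-attribute key: pick exactly
    # one tuple from each group of tuples that agree on the key.
    groups = {}
    for t in tuples:
        groups.setdefault(t[key_index], []).append(t)
    return [set(choice) for choice in product(*groups.values())]

def consistent_answers(tuples, query):
    # Certain answers: tuples returned by `query` on *every* repair.
    answer_sets = [query(r) for r in repairs(tuples)]
    return set.intersection(*answer_sets)

emp = [("alice", "sales"), ("alice", "hr"), ("bob", "hr")]   # key on name is violated

names = lambda r: {(n,) for (n, d) in r}
alice_dept = lambda r: {(d,) for (n, d) in r if n == "alice"}

print(consistent_answers(emp, names))       # {('alice',), ('bob',)}
print(consistent_answers(emp, alice_dept))  # set() -- the repairs disagree
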
Composing Mappings among Data Sources
In VLDB, 2003. Cited by 140 (9 self).
Abstract:
Semantic mappings between data sources play a key role in several data sharing architectures. Mappings provide the relationships between data stored in different sources, and therefore enable answering queries that require data from other nodes in a data sharing network. Composing mappings is one of the core problems that lies at the heart of several optimization methods in data sharing networks, such as caching frequently traversed paths and redundancy analysis.
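To give a sense of what composing mappings involves in the simplest case: when both mappings are GAV-style view definitions (each relation of the next schema defined by a conjunctive query over the previous one), composition is query unfolding. The sketch below covers only that easy case; the paper studies a richer setting where composition is considerably subtler. The rule encoding and all names are invented.

import itertools

fresh = itertools.count()

def instantiate(body, head_vars, args):
    # Bind head variables to the caller's arguments; every other
    # (existential) variable in the body gets a fresh name.
    subst = dict(zip(head_vars, args))
    def sub(v):
        if v not in subst:
            subst[v] = f"v{next(fresh)}"
        return subst[v]
    return [(pred, tuple(sub(v) for v in vs)) for (pred, vs) in body]

def compose(m_ab, m_bc):
    # Unfold every B-atom in the B->C rule bodies using its definition
    # over A, yielding a direct A->C mapping.
    m_ac = {}
    for c_pred, (head_vars, body) in m_bc.items():
        new_body = []
        for b_pred, args in body:
            b_head, b_body = m_ab[b_pred]
            new_body.extend(instantiate(b_body, b_head, args))
        m_ac[c_pred] = (head_vars, new_body)
    return m_ac

# Works(n, d) :- Emp(n, e), Assign(e, d)            (mapping A -> B)
m_ab = {"Works": (("n", "d"),
                  [("Emp", ("n", "e")), ("Assign", ("e", "d"))])}
# Colleague(x, y) :- Works(x, d), Works(y, d)       (mapping B -> C)
m_bc = {"Colleague": (("x", "y"),
                      [("Works", ("x", "d")), ("Works", ("y", "d"))])}

print(compose(m_ab, m_bc))
# Colleague(x, y) :- Emp(x, v0), Assign(v0, d), Emp(y, v1), Assign(v1, d)
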
A general Datalog-based framework for tractable query answering over ontologies
In Proc. of PODS 2009, ACM. Cited by 135 (24 self).
Abstract:
Ontologies play a key role in the Semantic Web [4], data modeling, and information integration [16]. Recent trends in ontological reasoning have shifted from decidability issues to tractability ones, as e.g. reflected by the work on the DL-Lite family of tractable description logics (DLs) [11, 19]. An important result of these works is that the main ...
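The syntactic restriction at the heart of this Datalog-based line of work is easy to state and to check: a TGD is guarded if some body atom (the guard) contains every variable occurring in the body. A small sketch of that check, with atoms encoded as (predicate, variables) pairs; constants and richer encodings are ignored here.

def is_guarded(body):
    # body: list of atoms (predicate, tuple_of_variables).
    # Guarded: some atom mentions every variable occurring in the body.
    body_vars = {v for _, args in body for v in args}
    return any(body_vars <= set(args) for _, args in body)

# supervises(x, y), employee(y) -> exists z. reports(y, z): guarded
print(is_guarded([("supervises", ("x", "y")), ("employee", ("y",))]))  # True
# works(x, d), works(y, d) -> colleague(x, y): no atom covers {x, y, d}
print(is_guarded([("works", ("x", "d")), ("works", ("y", "d"))]))      # False
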
Schema Mappings, Data Exchange, and Metadata Management
2005. Cited by 127 (11 self).
Abstract:
Schema mappings are high-level specifications that describe the relationship between database schemas. Schema mappings are prominent in several different areas of database management, including database design, information integration, data exchange, metadata management, and peer-to-peer data management systems. Our main aim in this paper is to present an overview of recent advances in data exchange and metadata management, where the schema mappings are between relational schemas. In addition, we highlight some research issues and directions for future work.
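For readers new to the area, the relational case is easy to demonstrate. A schema mapping given by a single source-to-target tgd, say Emp(n, c) -> exists d. Worker(n, d), Dept(d, c), is executed by one chase step per source tuple, inventing a labeled null for each existential variable. The encoding below (integer positions for copied attributes, strings for existentials) is invented for this sketch; the result for a one-tgd mapping like this is a universal solution in the sense of the data exchange papers above.

import itertools

null_ids = itertools.count(1)

def apply_st_tgd(source_tuples, head_atoms):
    # head_atoms: list of (target_pred, args); each arg is either an
    # integer position into the source tuple or a string naming an
    # existential variable, which becomes one labeled null per tuple.
    target = []
    for t in source_tuples:
        nulls = {}
        for pred, args in head_atoms:
            row = tuple(t[a] if isinstance(a, int)
                        else nulls.setdefault(a, f"N{next(null_ids)}")
                        for a in args)
            target.append((pred, row))
    return target

emp = [("alice", "paris"), ("bob", "oslo")]
head = [("Worker", (0, "d")), ("Dept", ("d", 1))]
for fact in apply_st_tgd(emp, head):
    print(fact)
# ('Worker', ('alice', 'N1'))   ('Dept', ('N1', 'paris'))
# ('Worker', ('bob', 'N2'))     ('Dept', ('N2', 'oslo'))
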
Model management 2.0: manipulating richer mappings
In SIGMOD, 2007. Cited by 127 (3 self).
Abstract:
Model management is a generic approach to solving problems of data programmability where precisely engineered mappings are required. Applications include data warehousing, e-commerce, object-to-relational wrappers, enterprise information integration, database portals, and report generators. The goal is to develop a model management engine that can support tools for all of these applications. The engine supports operations to match schemas, compose mappings, diff schemas, merge schemas, translate schemas into different data models, and generate data transformations from mappings. Much has been learned about model management since it was proposed seven years ago. This leads us to a revised vision that differs from the original in two main respects: the operations must handle more expressive mappings, and the runtime that executes mappings should be added as an important model management component. We review what has been learned from recent experience, explain the revised model management vision based on that experience, and identify the research problems that the revised vision opens up.
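The operator catalog in the abstract can be pictured as an engine interface. The skeleton below is only a guess at the shape such an engine might take; none of these signatures come from the paper or from any shipped system, and the bodies are deliberately left as stubs.

from dataclasses import dataclass

@dataclass
class Schema:
    name: str
    relations: dict          # relation name -> list of attribute names

@dataclass
class Mapping:
    source: Schema
    target: Schema
    rules: list              # e.g. correspondences or tgd-style rules

class ModelManagementEngine:
    # Illustrative operator surface only; every signature is hypothetical.

    def match(self, s1: Schema, s2: Schema) -> Mapping:
        ...                  # propose correspondences between two schemas

    def compose(self, m1: Mapping, m2: Mapping) -> Mapping:
        ...                  # derive a direct mapping m1.source -> m2.target

    def diff(self, s: Schema, m: Mapping) -> Schema:
        ...                  # the part of s not covered by mapping m

    def merge(self, s1: Schema, s2: Schema, m: Mapping) -> Schema:
        ...                  # combine schemas, unifying elements related by m

    def translate(self, s: Schema, model: str) -> Schema:
        ...                  # re-express s in another data model (e.g. XML)
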
Clio Grows Up: From Research Prototype to Industrial Tool
In ACM SIGMOD International Conference on Management of Data (SIGMOD), 2005. Cited by 112 (11 self).
Abstract:
Clio, the IBM Research system for expressing declarative schema mappings, has progressed in the past few years from a research prototype into a technology that is behind some of IBM's mapping technology. Clio provides a declarative way of specifying schema mappings between either XML or relational schemas. Mappings are compiled into an abstract query graph representation that captures the transformation semantics of the mappings. The query graph can then be serialized into different query languages, depending on the kind of schemas and systems involved in the mapping. Clio currently produces XQuery, XSLT, SQL, and SQL/XML queries. In this paper, we revisit the architecture and algorithms behind Clio. We then discuss some implementation issues, optimizations needed for scalability, and general lessons learned on the road towards creating an industrial-strength tool.
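To illustrate only the final compilation step (from a mapping representation down to executable SQL), here is a toy serializer. It is not Clio's query-graph algorithm, and every name in it is made up; real mappings also need grouping, Skolem terms for unknown values, and nested output, all of which this ignores.

def mapping_to_sql(target_table, columns, sources, joins):
    # columns: target column -> qualified source column;
    # sources: source tables; joins: equi-join conditions between them.
    select = ", ".join(f"{src} AS {tgt}" for tgt, src in columns.items())
    sql = (f"INSERT INTO {target_table} ({', '.join(columns)})\n"
           f"SELECT {select}\nFROM {', '.join(sources)}")
    if joins:
        sql += "\nWHERE " + " AND ".join(f"{a} = {b}" for a, b in joins)
    return sql

print(mapping_to_sql(
    "Worker",
    {"name": "Emp.name", "dept": "Assign.dept"},
    ["Emp", "Assign"],
    [("Emp.id", "Assign.emp_id")],
))
# INSERT INTO Worker (name, dept)
# SELECT Emp.name AS name, Assign.dept AS dept
# FROM Emp, Assign
# WHERE Emp.id = Assign.emp_id
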
Logical foundations of peer-to-peer data integration
In Proc. of the 23rd ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems (PODS 2004). Cited by 107 (13 self).
Abstract:
In peer-to-peer data integration, each peer exports data in terms of its own schema, and data interoperation is achieved by means of mappings among the peer schemas. Peers are autonomous systems and mappings are dynamically created and changed. One of the challenges in these systems is answering queries posed to one peer taking into account the mappings. Obviously, query answering strongly depends on the semantics of the overall system. In this paper, we compare the commonly adopted approach of interpreting peer-to-peer systems using a first-order semantics, with an alternative approach based on epistemic logic. We consider several central properties of peer-to-peer systems: modularity, generality, and decidability. We argue that the approach based on epistemic logic is superior with respect to all the above properties. In particular, we show that, in systems in which peers have decidable schemas and conjunctive mappings, but are arbitrarily interconnected, the first-order approach may lead to undecidability of query answering, while the epistemic approach always preserves decidability. This is a fundamental property, since the actual interconnections among peers are not under the control of any actor in the system.
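One way to build intuition for why the epistemic reading behaves well: under it, only facts a peer actually knows are exported along mappings, so query answering resembles a fixpoint propagation of known facts, which reaches a fixpoint even over cyclic interconnections. The sketch below loosely mirrors that intuition on ground facts; it is not the paper's formal semantics, and all names are invented.

def propagate(peer_facts, mappings):
    # Fixpoint propagation: each mapping (src, dst, rule) copies the
    # facts that `rule` derives from what src knows into dst's store.
    # Cycles are harmless; with finitely many facts this terminates.
    changed = True
    while changed:
        changed = False
        for src, dst, rule in mappings:
            for fact in rule(peer_facts[src]):
                if fact not in peer_facts[dst]:
                    peer_facts[dst].add(fact)
                    changed = True
    return peer_facts

peers = {"p1": {("r", "a")}, "p2": set()}
mappings = [   # two peers mapped to each other in a cycle
    ("p1", "p2", lambda facts: {("s", x) for (p, x) in facts if p == "r"}),
    ("p2", "p1", lambda facts: {("r", x) for (p, x) in facts if p == "s"}),
]
print(propagate(peers, mappings))
# {'p1': {('r', 'a')}, 'p2': {('s', 'a')}}
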
Taming the infinite chase: Query answering under expressive relational constraints
In Proc. of KR 2008. Cited by 104 (16 self).
Abstract:
The chase algorithm is a fundamental tool for query evaluation and for testing query containment under tuple-generating dependencies (TGDs) and equality-generating dependencies (EGDs). So far, most of the research on this topic has focused on cases where the chase procedure terminates. This paper introduces expressive classes of TGDs defined via syntactic restrictions: guarded TGDs (GTGDs) and weakly guarded sets of TGDs (WGTGDs). For these classes, the chase procedure is not guaranteed to terminate and thus may have an infinite outcome. Nevertheless, we prove that the problems of conjunctive-query answering and query containment under such TGDs are decidable. We provide decision procedures and tight complexity bounds for these problems. Then we show how EGDs can be incorporated into our results by providing conditions under which EGDs do not harmfully interact with TGDs and do not affect the decidability and complexity of query answering. We show applications of the aforesaid classes of constraints to the problem of answering conjunctive queries in F-Logic Lite, an object-oriented ontology language, and in some tractable Description Logics.
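A minimal demonstration of the non-termination the paper confronts: the single (guarded) TGD R(x, y) -> exists z. R(y, z) chases forever, because each fresh labeled null retriggers the rule. The step bound below is purely for demonstration; the paper's point is precisely that query answering remains decidable without ever materializing this infinite chase.

import itertools

def chase_r(facts, max_steps=10):
    # Chase the TGD  R(x, y) -> exists z. R(y, z).  Every y without an
    # outgoing R-edge gets a fresh null; each null retriggers the rule,
    # so without the step bound this loop would never terminate.
    facts = set(facts)
    nulls = itertools.count(1)
    for _ in range(max_steps):
        firing = next(((x, y) for (x, y) in facts
                       if not any(a == y for (a, _) in facts)), None)
        if firing is None:
            return facts                  # chase terminated
        facts.add((firing[1], f"z{next(nulls)}"))
    return facts                          # gave up: chase may be infinite

print(sorted(chase_r({("a", "b")}, max_steps=4), key=str))
# [('a', 'b'), ('b', 'z1'), ('z1', 'z2'), ('z2', 'z3'), ('z3', 'z4')]
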
XML Data Exchange: Consistency and Query Answering
2005. Cited by 88 (12 self).
Abstract:
Data exchange is the problem of finding an instance of a target schema, given an instance of a source schema and a specification of the relationship between the source and the target. Theoretical foundations of data exchange have recently been investigated for relational data. In this paper, we start looking into the basic properties of XML data exchange, that is, restructuring of XML documents that conform to a source DTD under a target DTD, and answering queries written over the target schema. We define XML data exchange settings in which source-to-target dependencies refer to the hierarchical structure of the data. Combining DTDs and dependencies makes some XML data exchange settings inconsistent. We investigate the consistency problem and determine its exact complexity. We then move to query answering, and prove a dichotomy theorem that classifies data exchange settings into those over which query answering is tractable, and those over which it is coNP-complete, depending on classes of regular expressions used in DTDs. Furthermore, for all tractable cases we give polynomial-time algorithms that compute target XML documents over which queries can be answered.
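The role regular expressions play here is easy to see in miniature. Treating a DTD as a map from each element to a regular expression over the names of its allowed children, conformance is a recursive full-match test; the paper's dichotomy turns on which class those expressions are drawn from. A toy checker follows, with the encoding (space-separated child names, trees as name/children pairs) invented for this sketch.

import re

# Toy DTD: element name -> regex over the space-separated child names.
dtd = {
    "db":     r"(book ?)*",            # book*
    "book":   r"title( author)+",      # title followed by one or more authors
    "title":  r"",
    "author": r"",
}

def conforms(tree, dtd):
    # tree = (element_name, [child trees]); check each element's child
    # sequence against its DTD expression, requiring a full match.
    name, children = tree
    seq = " ".join(child[0] for child in children)
    if not re.fullmatch(dtd[name], seq):
        return False
    return all(conforms(c, dtd) for c in children)

doc = ("db", [("book", [("title", []), ("author", [])]),
              ("book", [("title", []), ("author", []), ("author", [])])])
print(conforms(doc, dtd))   # True
bad = ("db", [("book", [("author", [])])])   # missing title
print(conforms(bad, dtd))   # False
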