ULDBs: Databases with uncertainty and lineage
- In VLDB, 2006
Cited by 310 (32 self)
This paper introduces ULDBs, an extension of relational databases with simple yet expressive constructs for representing and manipulating both lineage and uncertainty. Uncertain data and data lineage are two important areas of data management that have been considered extensively in isolation; however, many applications require the features in tandem. Fundamentally, lineage enables simple and consistent representation of uncertain data, it correlates uncertainty in query results with uncertainty in the input data, and query processing with lineage and uncertainty together presents computational benefits over treating them separately. We show that the ULDB representation is complete, and that it permits straightforward implementation of many relational operations. We define two notions of ULDB minimality, data-minimal and lineage-minimal, and study minimization of ULDB representations under both notions. With lineage, derived relations are no longer self-contained: their uncertainty depends on uncertainty in the base data. We provide an algorithm for the new operation of extracting a database subset in the presence of interconnected uncertainty. Finally, we show how ULDBs enable a new approach to query processing in probabilistic databases. ULDBs form the basis of the Trio system under development at Stanford.
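To make the claim that lineage "correlates uncertainty in query results with uncertainty in the input data" concrete, here is a minimal Python sketch. It is not the Trio implementation: the relation names, tuple ids, and data are invented for illustration, and the representation (a dict of mutually exclusive alternatives per tuple id) is an assumption. Each derived tuple records which base alternatives it came from, and it appears in a possible instance only when exactly those alternatives were chosen.

```python
from itertools import product

# Hypothetical base relations: tuple id -> list of mutually exclusive alternatives.
SAW = {11: [("Cathy", "Honda"), ("Cathy", "Mazda")]}
DRIVES = {21: [("Jimmy", "Toyota"), ("Jimmy", "Mazda")], 22: [("Billy", "Honda")]}


def join_with_lineage(saw, drives):
    """Join on car; each result carries the base alternatives it was derived from."""
    derived = []
    for sid, s_alts in saw.items():
        for si, (witness, car) in enumerate(s_alts):
            for did, d_alts in drives.items():
                for di, (person, d_car) in enumerate(d_alts):
                    if car == d_car:
                        lineage = {("saw", sid, si), ("drives", did, di)}
                        derived.append(((witness, person), lineage))
    return derived


def possible_instances(saw, drives, derived):
    """Pick one alternative per base tuple; keep derived tuples whose lineage was picked."""
    base = [("saw", xid, alts) for xid, alts in saw.items()]
    base += [("drives", xid, alts) for xid, alts in drives.items()]
    instances = []
    for choice in product(*(range(len(alts)) for _, _, alts in base)):
        chosen = {(rel, xid, i) for (rel, xid, _), i in zip(base, choice)}
        instances.append(sorted(t for t, lineage in derived if lineage <= chosen))
    return instances


ACCUSES = join_with_lineage(SAW, DRIVES)
INSTANCES = possible_instances(SAW, DRIVES, ACCUSES)
```

In this toy data the two derived accusations never occur in the same instance, because their lineage refers to different alternatives of the same base tuple. That dependency would be lost if the derived relation were stored without lineage.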
Combinators for bi-directional tree transformations: A linguistic approach to the view update problem
- In ACM SIGPLAN–SIGACT Symposium on Principles of Programming Languages (POPL), 2005
Cited by 205 (17 self)
We propose a novel approach to the view update problem for tree-structured data: a domain-specific programming language in which all expressions denote bi-directional transformations on trees. In one direction, these transformations (dubbed lenses) map a “concrete” tree into a simplified “abstract view”; in the other, they map a modified abstract view, together with the original concrete tree, to a correspondingly modified concrete tree. Our design emphasizes both robustness and ease of use, guaranteeing strong well-behavedness and totality properties for well-typed lenses. We identify a natural mathematical space of well-behaved bi-directional transformations over arbitrary structures, study definedness and continuity in this setting, and state a precise connection with the classical theory of “update translation under a constant complement” from databases. We then instantiate this semantic framework in the form of a collection of lens combinators that can be assembled to describe transformations on trees. These combinators include familiar constructs from functional programming (composition, mapping, projection, conditionals, recursion) together with some novel primitives for manipulating trees (splitting, pruning, copying, merging, etc.). We illustrate the expressiveness of these combinators by developing a number of bi-directional list-processing transformations as derived forms. An extended example shows how our combinators can be used to define a lens that translates between a native HTML representation of browser bookmarks and a generic abstract bookmark format.
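The get/put pairing described above can be sketched in a few lines of Python. This is an illustration over plain dicts, not the paper's tree combinators or its type system: the names `Lens`, `keep_keys`, `focus`, and `compose`, and the bookmark data, are all hypothetical. The two round-trip properties checked at the end are the usual well-behavedness laws (put after get restores the original; get after put returns the updated view).

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Lens:
    get: Callable[[Any], Any]       # concrete -> abstract view
    put: Callable[[Any, Any], Any]  # (updated view, original concrete) -> new concrete


def keep_keys(keys):
    """Lens whose view keeps only the given keys of a dict."""
    keys = frozenset(keys)

    def get(concrete):
        return {k: v for k, v in concrete.items() if k in keys}

    def put(view, concrete):
        # Restore the hidden part of the original, then overlay the edited view.
        hidden = {k: v for k, v in concrete.items() if k not in keys}
        return {**hidden, **view}

    return Lens(get, put)


def focus(key):
    """Lens whose view is the value stored under a single key."""
    return Lens(
        get=lambda concrete: concrete[key],
        put=lambda view, concrete: {**concrete, key: view},
    )


def compose(outer, inner):
    """Sequential composition: view through outer first, then through inner."""
    return Lens(
        get=lambda c: inner.get(outer.get(c)),
        put=lambda a, c: outer.put(inner.put(a, outer.get(c)), c),
    )


# Hypothetical bookmark record; "visits" is hidden from the abstract view.
bookmark = {"name": "POPL", "url": "http://old.example", "visits": 3}
url = compose(keep_keys({"name", "url"}), focus("url"))
updated = url.put("http://new.example", bookmark)

assert url.put(url.get(bookmark), bookmark) == bookmark  # GetPut
assert url.get(updated) == "http://new.example"          # PutGet
```

Note that `put` takes the original concrete value as a second argument. That is what lets the hidden `visits` field survive an edit made through the view.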
An annotation management system for relational databases
- In VLDB, 2004
Cited by 129 (8 self)
We present an annotation management system for relational databases. In this system, every piece of data in a relation is assumed to have zero or more annotations associated with it and annotations are propagated along, from the source to the output, as data is being transformed through a query. Such an annotation management system is important for understanding the provenance and quality of data, especially in applications that deal with integration of scientific and biological data. We present an extension, pSQL, of a fragment of SQL that has three different types of annotation propagation schemes, each useful for different purposes. The default scheme propagates annotations according to where data is copied from. The default-all scheme propagates annotations according to where data is copied from among all equivalent formulations of a given query. The custom scheme allows a user to specify how annotations should propagate. We present a storage scheme for the annotations and describe algorithms for translating a pSQL query under each propagation scheme into one or more SQL queries that would correctly retrieve the relevant annotations according to the specified propagation scheme. For the default-all scheme, we also show how we generate finitely many queries that can simulate the annotation propagation behavior of the set of all equivalent queries, which is possibly infinite. The algorithms are implemented and the feasibility of the system is demonstrated by a set of experiments that we have conducted.
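The idea behind the default scheme, propagating annotations according to where data is copied from, can be illustrated with a small Python sketch. It does not use pSQL syntax or the paper's storage scheme. The cell representation (a value paired with a set of annotations), the function names, and the sample data are assumptions made for this example, as is the choice to union annotations when duplicate output tuples are merged.

```python
def select(rows, predicate):
    """Keep rows whose plain values satisfy the predicate; annotations ride along."""
    return [row for row in rows if predicate({a: v for a, (v, _) in row.items()})]


def project(rows, attrs):
    """Copy the chosen attributes; duplicate output tuples pool their annotations."""
    merged = {}
    for row in rows:
        key = tuple(row[a][0] for a in attrs)
        cells = merged.setdefault(key, {a: set() for a in attrs})
        for a in attrs:
            cells[a] |= set(row[a][1])
    return [
        {a: (v, frozenset(cells[a])) for a, v in zip(attrs, key)}
        for key, cells in merged.items()
    ]


# Hypothetical relation: each cell is (value, set of annotations).
GENES = [
    {"id": (1, {"checked 2004"}), "name": ("p53", {"curated"})},
    {"id": (2, set()), "name": ("p53", {"imported"})},
    {"id": (3, set()), "name": ("brca1", {"unverified"})},
]

RESULT = project(select(GENES, lambda r: r["name"] == "p53"), ["name"])
```

The output cell for `p53` carries both `curated` and `imported`, since it was copied from two source cells. The `checked 2004` annotation does not appear, because the `id` cell it is attached to was never copied into the result.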
A Cost-Based Model and Effective Heuristic for Repairing Constraints by Value Modification
- In ACM SIGMOD International Conference on Management of Data, 2005
Cited by 107 (16 self)
Data integrated from multiple sources may contain inconsistencies that violate integrity constraints. The constraint repair problem attempts to find “low cost” changes that, when applied, will cause the constraints to be satisfied. While in most previous work repair cost is stated in terms of tuple insertions and deletions, we follow recent work to define a database repair as a set of value modifications. In this context, we introduce a novel cost framework that allows for the application of techniques from record-linkage to the search for good repairs. We prove that finding minimal-cost repairs in this model is NP-complete in the size of the database, and introduce an approach to heuristic repair-construction based on equivalence classes of attribute values. Following this approach, we define two greedy algorithms. While these simple algorithms take time cubic in the size of the database, we develop optimizations inspired by algorithms for duplicate-record detection that greatly improve scalability. We evaluate our framework and algorithms on synthetic and real data, and show that our proposed optimizations greatly improve performance at little or no cost in repair quality.
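As a rough illustration of repair by value modification, the Python sketch below handles one functional dependency. It is not either of the paper's greedy algorithms, and it sidesteps the interactions between constraints that make the general problem hard. The cost function (tuple weight times a string distance from the standard library's `difflib`), the column names, and the data are all assumptions. Cells that the dependency forces to agree are grouped, and each group is assigned the existing value that is cheapest to change the others into.

```python
from collections import defaultdict
from difflib import SequenceMatcher


def distance(old, new):
    """Illustrative string distance in [0, 1]; 0 means identical."""
    return 1.0 - SequenceMatcher(None, old, new).ratio()


def repair_fd(rows, lhs, rhs, weight="w"):
    """Make the dependency lhs -> rhs hold by modifying rhs values only."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[lhs]].append(row)

    targets = {}
    for key, members in groups.items():
        def cost(value, members=members):
            # Weighted cost of changing every cell in the group to `value`.
            return sum(m[weight] * distance(m[rhs], value) for m in members)

        candidates = sorted({m[rhs] for m in members})  # sorted for stable ties
        targets[key] = min(candidates, key=cost)

    repaired = [{**row, rhs: targets[row[lhs]]} for row in rows]
    total = sum(
        row[weight] * distance(row[rhs], fixed[rhs])
        for row, fixed in zip(rows, repaired)
    )
    return repaired, total


# Hypothetical data violating zip -> city because of one misspelling.
ROWS = [
    {"zip": "10001", "city": "New York", "w": 1.0},
    {"zip": "10001", "city": "New Yrok", "w": 1.0},
    {"zip": "10001", "city": "New York", "w": 1.0},
    {"zip": "60601", "city": "Chicago", "w": 1.0},
]
REPAIRED, COST = repair_fd(ROWS, "zip", "city")
```

Using a similarity-based distance rather than a flat per-change penalty means correcting a near-miss spelling costs less than replacing a value with an unrelated one. This is the kind of record-linkage intuition the abstract refers to.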
Curated databases
- PODS'08, 2008
Cited by 105 (12 self)
Curated databases are databases that are populated and updated with a great deal of human effort. Most reference works that one traditionally found on the reference shelves of libraries – dictionaries, encyclopedias, gazetteers, etc. – are now curated databases. Since it is now easy to publish databases on the web, there has been an explosion in the number of new curated databases used in scientific research. The value of curated databases lies in the organization and the quality of the data they contain. Like the paper reference works they have replaced, they usually represent the efforts of a dedicated group of people to produce a definitive description of some subject area. Curated databases present a number of challenges for database research. The topics of annotation, provenance, and citation are central, because curated databases are heavily cross-referenced with, and include data from, other databases, and much of the work of a curator is annotating existing data. Evolution of structure is important because these databases often evolve from semistructured representations, and because they have to accommodate new scientific discoveries. Much of the work in these areas is in its infancy, but it is beginning to suggest new research for both theory and practice. We discuss some of this research and emphasize the need to find appropriate models of the processes associated with curated databases.
Relational Lenses: A Language for Updatable Views
- Principles of Database Systems, 2006
Cited by 88 (12 self)
We propose a novel approach to the classical view update problem. The view update problem arises from the fact that modifications to a database view may not correspond uniquely to modifications on the underlying database; we need a means of determining an “update policy” that guides how view updates are reflected in the database. Our approach is to define a bi-directional query language, in which every expression can be read both (from left to right) as a view definition and (from right to left) as an update policy. The primitives of this language are based on standard relational operators. Its type system, which includes record-level predicates and functional dependencies, plays a crucial role in guaranteeing that update policies are well-behaved, in a precise sense, and that they are total, i.e., able to handle arbitrary changes to the view.
Bidirectional Transformations: A Cross-Discipline Perspective GRACE meeting notes, state of the art, and outlook
Cited by 72 (23 self)
The GRACE meeting was held in December 2008 near Tokyo, Japan. The meeting brought together researchers and practitioners from a variety of subdisciplines of computer science to share research efforts and help create a new community. In this report, we survey the state of the art and summarize the technical presentations delivered at the meeting. We also describe some insights gathered from our discussions and introduce a new effort to establish a benchmark for bidirectional transformations.
Mondrian: Annotating and querying databases through colors and blocks
- In ICDE ’06: Proceedings of the 22nd International Conference on Data Engineering (ICDE’06), 2006
Cited by 61 (2 self)
Annotations play a central role in the curation of scientific databases. Despite their importance, data formats and schemas are not designed to manage the increasing variety of annotations. Moreover, DBMSs often lack support for storing and querying annotations. Furthermore, annotations and data are only loosely coupled. This paper introduces an annotation-oriented data model for the manipulation and querying of both data and annotations. In particular, the model allows for the specification of annotations on sets of values and for effectively querying the information on their association. We use the concept of block to represent an annotated set of values. Different colors applied to the blocks represent different annotations. We introduce a color query language for our model and prove it to be both complete (it can express all possible queries over the class of annotated databases), and minimal (all the algebra operators are primitive). We present MONDRIAN, a prototype implementation of our annotation mechanism, and we conduct experiments that investigate the set of parameters which influence the evaluation cost for color queries.
Provenance in Databases: Past, Current, and Future
- 2007
Cited by 53 (0 self)
The need to understand and manage provenance arises in almost every scientific application. In many cases, information about provenance constitutes the proof of correctness of results that are generated by scientific applications. It also determines the quality and amount of trust one places in the results. For these reasons, the knowledge of provenance of a scientific result is typically regarded as being as important as the result itself. In this paper, we provide an overview of research in provenance in databases and discuss some future research directions. The content of this paper is largely based on the tutorial presented at SIGMOD 2007 [11].
A Programmable Editor for Developing Structured Documents Based on Bidirectional Transformations
- In Partial Evaluation and Program Manipulation (PEPM), 2004
Cited by 47 (20 self)
This paper presents a novel editor supporting interactive refinement in the development of structured documents. The user performs a sequence of editing operations on the document view, and the editor automatically derives an efficient and reliable document source and a transformation that produces the document view. The editor is unique in its programmability, in the sense that transformations can be obtained through editing operations. The main techniques behind it are the use of the view-updating technique developed in the database community and a new bidirectional transformation language that can describe not only the relationship between the document source and its view, but also data dependency in the view.