Why and Where: A Characterization of Data Provenance
In ICDT, 2001
Cited by 337 (18 self)
With the proliferation of database views and curated databases, the issue of data provenance (where a piece of data came from and the process by which it arrived in the database) is becoming increasingly important, especially in scientific databases where understanding provenance is crucial to the accuracy and currency of data. In this paper we describe an approach to computing provenance when the data of interest has been created by a database query. We adopt a syntactic approach and present results for a general data model that applies to relational databases as well as to hierarchical data such as XML. A novel aspect of our work is a distinction between "why" provenance (refers to the source data that had some influence on the existence of the data) and "where" provenance (refers to the location(s) in the source databases from which the data was extracted).
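To make the why/where distinction concrete, here is a minimal Python sketch under simplifying assumptions: a single flat relation whose tuples carry hypothetical identifiers (`t1`, `t2`, ...) and a select-project query. It is not the paper's syntactic approach or its data model; it only records, for each output value, which source tuples witnessed it (why) and which source fields it was copied from (where).

```python
from collections import defaultdict

# Hypothetical source relation R: tuple identifier -> record.
R = {
    "t1": {"name": "alice", "dept": "CS"},
    "t2": {"name": "bob", "dept": "Math"},
    "t3": {"name": "alice", "dept": "CS"},
}


def select_project(rel_name, rel, pred, attr):
    """Evaluate SELECT attr FROM rel WHERE pred while tracking provenance.

    For each output value, "why" holds one witness (a set of source tuple ids)
    per derivation, and "where" holds the (relation, tuple id, attribute)
    locations the value was copied from.
    """
    out = defaultdict(lambda: {"why": set(), "where": set()})
    for tid, row in rel.items():
        if pred(row):
            value = row[attr]
            out[value]["why"].add(frozenset({tid}))
            out[value]["where"].add((rel_name, tid, attr))
    return dict(out)


result = select_project("R", R, lambda r: r["dept"] == "CS", "name")
print(result["alice"]["why"])    # two witnesses: {'t1'} and {'t3'}
print(result["alice"]["where"])  # the name fields of t1 and t3
```

The witnesses name whole tuples, including the `dept` field that influenced the output only through the selection, whereas the where-provenance points only at the `name` fields the value was copied from. With a join, each witness would contain one tuple per joined relation.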
Principles of Programming with Complex Objects and Collection Types
Theoretical Computer Science, 1995
Cited by 132 (28 self)
We present a new principle for the development of database query languages: the primitive operations should be organized around types. Viewing a relational database as consisting of sets of records, this principle dictates that we should investigate separately operations for records and sets. There are two immediate advantages of this approach, which is partly inspired by basic ideas from category theory. First, it provides a language for structures in which record and set types may be freely combined: nested relations or complex objects. Second, the fundamental operations for sets are closely related to those for other "collection types" such as bags or lists, and this suggests how database languages may be uniformly extended to these new types. The most general operation on sets, that of structural recursion, is one in which not all programs are well-defined. In looking for limited forms of this operation that always give rise to well-defined operations, we find a number of close ...
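The well-definedness problem can be seen with a small experiment. The Python sketch below assumes one common presentation of structural recursion on sets, the insert presentation sri(e, i)(∅) = e and sri(e, i)({x} ∪ S) = i(x, sri(e, i)(S)), and represents a finite set by a list of insertions that may arrive in any order or repeat an element. A program is well-defined only if every representation of the same set yields the same result. The helper names are hypothetical, and checking finitely many representations is a test, not a proof.

```python
from itertools import permutations


def sri(e, i, insertions):
    """Structural recursion on the insert presentation: fold i over the insertions."""
    acc = e
    for x in reversed(insertions):  # the last-listed element is inserted first
        acc = i(x, acc)
    return acc


def representations(elems):
    """Lists denoting the same set: every ordering, and each ordering with a repeat."""
    reps = []
    for p in permutations(elems):
        reps.append(list(p))
        if p:
            reps.append([p[0], *p])  # duplicate insertion of the first element
    return reps


def looks_well_defined(elems, e, i):
    """True if all tested representations of the set give one result."""
    return len({sri(e, i, r) for r in representations(elems)}) == 1


s = [1, 2, 3]
print(looks_well_defined(s, 0, lambda x, n: n + 1))                  # False: counting
print(looks_well_defined(s, 0, max))                                 # True: maximum
print(looks_well_defined(s, frozenset(), lambda x, t: t | {2 * x}))  # True: a map
```

Counting fails because its step function is not idempotent: inserting an element twice changes the result. Restricting the step function, for example to ones satisfying i(x, i(x, y)) = i(x, y) and i(x, i(y, z)) = i(y, i(x, z)), is the kind of limitation that rules such programs out.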
A data transformation system for biological data sources
In Proc. of Intl. Conf. on Very Large Data Bases, 1995
Cited by 76 (21 self)
Scientific data of importance to biologists in the Human Genome Project resides not only in conventional databases, but in structured files maintained in a number of different formats (e.g. ASN.1 and ACE) as well as in sequence analysis packages (e.g. BLAST and FASTA). These formats and packages contain a number of data types not found in conventional databases, such as lists and variants, and may be deeply nested. We present in this paper techniques for querying and transforming such data, and illustrate their use in a prototype system developed in conjunction with the Human Genome Center for Chromosome 22. We also describe optimizations performed by the system, a crucial issue for bulk data.
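The abstract names lists and variants as data types that conventional databases lack. As a hedged illustration only, the Python sketch below models a hypothetical nested entry, loosely in the style of an ASN.1 record, whose features carry a variant sequence that is either DNA or protein, and flattens it with a nested comprehension. The record layout and names are assumptions; this is neither the system's query language nor the actual file formats.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Dna:
    bases: str


@dataclass(frozen=True)
class Protein:
    residues: str


# Variant type: each sequence is exactly one of the two cases.
Seq = Union[Dna, Protein]


@dataclass(frozen=True)
class Entry:
    accession: str
    features: Tuple[Tuple[str, Seq], ...]  # nested list of (feature name, sequence)


def dna_features(entries):
    """Flatten nested entries into (accession, feature, length) rows, keeping only the Dna case."""
    return [
        (entry.accession, name, len(seq.bases))
        for entry in entries
        for name, seq in entry.features
        if isinstance(seq, Dna)
    ]


entries = [
    Entry("X1", (("exon1", Dna("ACGT")), ("product", Protein("MK")))),
    Entry("X2", (("exon1", Dna("GG")),)),
]
print(dna_features(entries))  # [('X1', 'exon1', 4), ('X2', 'exon1', 2)]
```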
Towards Tractable Algebras for Bags
1993
Cited by 58 (4 self)
Bags, i.e. sets with duplicates, are often used to implement relations in database systems. In this paper, we study the expressive power of algebras for manipulating bags. The algebra we present is a simple extension of the nested relation algebra. Our aim is to investigate how the use of bags in the language extends its expressive power and increases its complexity. We consider two main issues, namely (i) the impact of the depth of bag nesting on the expressive power, and (ii) the complexity and the expressive power induced by the algebraic operations. We show that the bag algebra is more expressive than the nested relation algebra (at all levels of nesting), and that the difference may be subtle. We establish a hierarchy based on the structure of algebra expressions. This hierarchy is shown to be closely related to the properties of the powerset operator. Invited to a special issue of the Journal of Computer and System Sciences selected from ACM Princ. of Database Systems,...
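Several operators that distinguish bags from sets can be tried out with Python's `collections.Counter`, which stores each element with its multiplicity. The mapping below is an assumption for illustration, not the paper's operator set: `+` as additive union, `-` as subtraction truncated at zero (monus), `|` as maximal union, `&` as minimal intersection, and a hypothetical `dup_elim` helper for duplicate elimination. It covers flat bags only, not the nested bags the algebra allows.

```python
from collections import Counter


def dup_elim(bag):
    """Duplicate elimination: keep each element of the bag exactly once."""
    return Counter({x: 1 for x in bag})


b = Counter({"a": 2, "b": 1})
c = Counter({"a": 1, "c": 1})

print(b + c)            # additive union: a:3, b:1, c:1
print(b - c)            # monus: a:1, b:1 (c drops out; counts never go negative)
print(b | c)            # maximal union: a:2, b:1, c:1
print(b & c)            # minimal intersection: a:1
print(dup_elim(b + c))  # a:1, b:1, c:1
```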
The Power of Languages for the Manipulation of Complex Values
VLDB Journal, 1995
Cited by 50 (0 self)
Abstract. Various models and languages for describing and manipulating hierarchically structured data have been proposed. Algebraic, calculus-based, and logic-programming oriented languages have all been considered. This article presents a general model for complex values (i.e., values with hierarchical structures), and languages for it based on the three paradigms. The algebraic language generalizes those presented in the literature; it is shown to be related to the functional style of programming advocated by Backus (1978). The notion of domain independence (from relational databases) is defined, and syntactic restrictions (referred to as safety conditions) on calculus queries are formulated to guarantee domain independence. The main results are as follows: the domain-independent calculus, the safe calculus, the algebra, and the logic-programming oriented language have equivalent expressive power. In particular, recursive queries, such as the transitive closure, can be expressed in each of the languages. For this result, the algebra needs the powerset operation. A more restricted version of safety is presented, such that the restricted safe calculus is equivalent to the algebra without the powerset. The results are extended to the case where arbitrary functions and predicates are used in the languages. Key Words. Database, query language, complex value, complex object, database model.
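Why powerset lets an algebra express transitive closure can be illustrated directly. The Python sketch below (helper names hypothetical) defines the closure of a binary relation R as the intersection of all transitive relations over R's active domain that contain R, enumerating the candidates with a powerset, and compares it with an ordinary fixpoint iteration. The powerset version is exponential and only meant for tiny inputs; it mirrors the idea rather than the paper's exact algebra expression.

```python
from itertools import chain, combinations


def powerset(xs):
    """All subsets of xs, as frozensets."""
    xs = list(xs)
    return (frozenset(c) for c in chain.from_iterable(
        combinations(xs, k) for k in range(len(xs) + 1)))


def is_transitive(rel):
    return all((a, d) in rel for (a, b) in rel for (c, d) in rel if b == c)


def tc_via_powerset(rel):
    """Intersection of every transitive superset of rel over its active domain."""
    dom = {x for pair in rel for x in pair}
    pairs = [(a, b) for a in dom for b in dom]
    candidates = [s for s in powerset(pairs) if rel <= s and is_transitive(s)]
    return frozenset.intersection(*candidates)  # dom x dom is always a candidate


def tc_fixpoint(rel):
    """Add composed pairs until nothing new appears."""
    closure = set(rel)
    while True:
        new = {(a, d) for (a, b) in closure for (c, d) in closure if b == c} - closure
        if not new:
            return frozenset(closure)
        closure |= new


print(sorted(tc_via_powerset({(1, 2), (2, 3)})))  # [(1, 2), (1, 3), (2, 3)]
```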
Optimizing Object Queries Using an Effective Calculus
ACM Transactions on Database Systems, 1998
Cited by 47 (2 self)
This paper concentrates on query unnesting (also known as query decorrelation), an optimization that, even though it improves performance considerably, is not treated properly (if at all) by most OODB systems. Our framework generalizes many unnesting techniques proposed recently in the literature and is capable of removing any form of query nesting using a very simple and efficient algorithm. The simplicity of our method is due to the use of the monoid comprehension calculus as an intermediate form for OODB queries. The monoid comprehension calculus treats operations over multiple collection types, aggregates, and quantifiers in a similar way, resulting in a uniform way of unnesting queries, regardless of their type of nesting.
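The uniform treatment of collections, aggregates, and quantifiers can be sketched in Python. In this hypothetical encoding, a monoid is a zero, a unit function, and an associative merge, and a single-generator comprehension m{ head(x) | x <- source, pred(x) } is evaluated as a fold. The names `Monoid`, `comp`, and the monoid instances are illustrative assumptions; the sketch shows the style of calculus the paper uses as an intermediate form, not its unnesting algorithm.

```python
from dataclasses import dataclass
from functools import reduce
from typing import Any, Callable


@dataclass(frozen=True)
class Monoid:
    zero: Any                       # identity element
    unit: Callable[[Any], Any]      # lifts one value into the monoid
    merge: Callable[[Any, Any], Any]  # associative combination


SUM = Monoid(0, lambda x: x, lambda a, b: a + b)                        # aggregate
SET = Monoid(frozenset(), lambda x: frozenset([x]), lambda a, b: a | b)  # collection
LIST = Monoid((), lambda x: (x,), lambda a, b: a + b)                   # collection
ALL = Monoid(True, lambda x: x, lambda a, b: a and b)                   # universal quantifier
SOME = Monoid(False, lambda x: x, lambda a, b: a or b)                  # existential quantifier


def comp(m, head, source, pred=lambda _x: True):
    """Evaluate m{ head(x) | x <- source, pred(x) } by folding with monoid m."""
    return reduce(m.merge, (m.unit(head(x)) for x in source if pred(x)), m.zero)


emps = [("ann", "cs", 100), ("bo", "cs", 80), ("cy", "math", 90)]
print(comp(SUM, lambda e: e[2], emps, lambda e: e[1] == "cs"))  # 180
print(comp(SET, lambda e: e[1], emps) == {"cs", "math"})       # True
print(comp(ALL, lambda e: e[2] > 50, emps))                     # True
print(comp(SOME, lambda e: e[2] > 95, emps))                    # True
```

Only the monoid argument changes between a sum, a set, a list, and a quantifier, which is the kind of uniformity the abstract attributes to the calculus.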
New Techniques for Studying Set Languages, Bag Languages and Aggregate Functions
1994
Cited by 41 (24 self)
We provide new techniques for the analysis of the expressive power of query languages for nested collections. These languages may use set or bag semantics and may be further complicated by the presence of aggregate functions. We exhibit certain classes of graphs and prove that the properties of these graphs that can be tested in such languages are either finite or cofinite. This result settles the conjectures of Grumbach, Milo, and Paredaens that parity test, transitive closure, and balanced binary tree test are not expressible in bag languages like the PTIME fragment of BALG of Grumbach and Milo and BQL of Libkin and Wong. Moreover, it implies that many recursive queries, including simple ones like the test for a chain, cannot be expressed in a nested relational language even when aggregate functions are available. In an attempt to generalize the finite-cofiniteness result, we study the bounded degree property which says that the number of distinct in- and out-degrees in the output of...
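The quantity behind the bounded degree property can be computed directly. The Python sketch below (helper names hypothetical) takes a chain, whose nodes only have in- and out-degrees 0 or 1, and its transitive closure, whose distinct degrees grow with the chain's length. It collects the distinct values among all in- and out-degrees together, which is one reading of the property's measure; roughly, a query whose outputs on such bounded-degree inputs have an unbounded number of distinct degrees cannot have the property, which is the intuition for why transitive closure is out of reach.

```python
from collections import Counter


def chain(n):
    """Edges of the chain 0 -> 1 -> ... -> n-1."""
    return {(i, i + 1) for i in range(n - 1)}


def transitive_closure(edges):
    closure = set(edges)
    while True:
        new = {(a, d) for (a, b) in closure for (c, d) in closure if b == c} - closure
        if not new:
            return closure
        closure |= new


def degree_values(edges):
    """Distinct values among the in- and out-degrees of all nodes."""
    nodes = {x for e in edges for x in e}
    out_deg = Counter(a for a, _ in edges)
    in_deg = Counter(b for _, b in edges)
    return {out_deg[v] for v in nodes} | {in_deg[v] for v in nodes}


print(len(degree_values(chain(10))), len(degree_values(transitive_closure(chain(10)))))  # 2 10
```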
Deciding Containment for Queries with Complex Objects and Aggregations
1997
Cited by 41 (5 self)
We address the problem of query containment and query equivalence for complex objects. We show that for a certain conjunctive query language for complex objects, query containment and weak query equivalence are decidable. Our results have two consequences. First, when the answers of the two queries are guaranteed not to contain empty sets, then weak equivalence coincides with equivalence, and our result partially answers an open problem about the equivalence of nest/unnest queries for complex objects [GPG90]. Second, we derive an NP-complete algorithm for checking the equivalence of certain conjunctive queries with grouping and aggregates. Our results rely on a translation of the containment and equivalence conditions for complex objects into novel conditions on conjunctive queries, which we call simulation and strong simulation. These conditions are more complex than containment of conjunctive queries, because they involve arbitrary numbers of quantifier alternations. We prove that c...
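For contrast with simulation, the classic containment test for plain conjunctive queries (Chandra and Merlin) searches for a containment mapping: Q1 ⊆ Q2 holds iff some mapping of Q2's variables to Q1's sends Q2's head to Q1's head and every atom of Q2's body to an atom of Q1's body. The Python sketch below is a brute-force version of that test under simplifying assumptions (variables only, no constants, no complex objects or aggregates); it is not the paper's simulation check, and its search is exponential in the number of variables.

```python
from itertools import product


def contained_in(q1, q2):
    """True if q1 is contained in q2, i.e. a containment mapping from q2 to q1 exists.

    A query is (head, body): head is a tuple of variables and body a list of
    (predicate, args) atoms whose args are variables.
    """
    head1, body1 = q1
    head2, body2 = q2
    if len(head1) != len(head2):
        return False
    vars1 = sorted({v for _, args in body1 for v in args} | set(head1))
    vars2 = sorted({v for _, args in body2 for v in args} | set(head2))
    atoms1 = {(p, tuple(args)) for p, args in body1}
    for image in product(vars1, repeat=len(vars2)):
        h = dict(zip(vars2, image))
        if tuple(h[v] for v in head2) != tuple(head1):
            continue  # heads must correspond
        if all((p, tuple(h[v] for v in args)) in atoms1 for p, args in body2):
            return True
    return False


q_edge = (("X",), [("E", ("X", "Y"))])                       # X has an outgoing edge
q_path2 = (("X",), [("E", ("X", "Y")), ("E", ("Y", "Z"))])   # X starts a 2-step path
print(contained_in(q_path2, q_edge), contained_in(q_edge, q_path2))  # True False
```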
Some Properties of Query Languages for Bags
In Proceedings of the 4th International Workshop on Database Programming Languages, 1993
Cited by 39 (26 self)
In this paper we study the expressive power of query languages for nested bags. We define the ambient bag language by generalizing the constructs of the relational language of Breazu-Tannen, Buneman and Wong, which is known to have precisely the power of the nested relational algebra. The relative strength of additional polynomial constructs is studied, and the ambient language endowed with the strongest combination of those constructs is chosen as a candidate for the basic bag language, which is called BQL (Bag Query Language). We prove that achieving the power of BQL in the relational language amounts to adding simple arithmetic to the latter. We show that BQL shares the shortcomings of the relational algebra: it cannot express recursive queries. In particular, the parity test is not definable in BQL. We consider augmenting BQL with powerbag and structural recursion to overcome this deficiency. In contrast to the relational case, where powerset and structural recursion are equivalent...
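The parity test that BQL cannot express becomes easy once structural recursion over bags is available. In the Python sketch below, a bag is a `collections.Counter` and the hypothetical `sr_bag` folds a step function over its elements one insertion at a time. Flipping a boolean on each insertion does not depend on insertion order, so the result is well-defined for bags; it would not be for sets, where inserting the same element twice must not change the answer.

```python
from collections import Counter
from itertools import permutations


def sr_bag(e, i, insertions):
    """Structural recursion over a sequence of bag insertions."""
    acc = e
    for x in insertions:
        acc = i(x, acc)
    return acc


def is_even(bag):
    """Parity test: True iff the bag has an even number of elements, counting duplicates."""
    return sr_bag(True, lambda _x, even: not even, bag.elements())


bag = Counter({"a": 2, "b": 1})
print(is_even(bag))  # False: three elements
# Every insertion order gives the same answer, so the recursion is well-defined on bags.
print({sr_bag(True, lambda _x, even: not even, p) for p in permutations(bag.elements())})  # {False}
```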