SEQ: A Model for Sequence Databases
- University of Wisconsin-Madison
, 1995
Cited by 59 (5 self)
This paper presents the model which is the basis for a system to manage various kinds of sequence data. The model separates the data from the ordering information, and includes operators based on two distinct abstractions of a sequence. The main contributions of the model are: (a) it can deal with different types of sequence data, (b) it supports an expressive range of sequence queries, and (c) it draws from many of the diverse existing approaches to modeling sequence data.
Sequences, Datalog and Transducers
, 1996
Cited by 24 (5 self)
This paper develops a query language for sequence databases, such as genome databases and text databases. The language, called Sequence Datalog, extends classical Datalog with interpreted function symbols for manipulating sequences. It has both a clear operational and declarative semantics, based on a new notion called the extended active domain of a database. The extended domain contains all the sequences in the database and all their subsequences. This idea leads to a clear distinction between safe and unsafe recursion over sequences: safe recursion stays inside the extended active domain, while unsafe recursion does not. By carefully limiting the amount of unsafe recursion, the paper develops a safe and expressive subset of Sequence Datalog. As part of the development, a new type of transducer is introduced, called a generalized sequence transducer. Unsafe recursion is allowed only within these generalized transducers. Generalized transducers extend ordinary transducers by allowing them to invoke other transducers as "subroutines." Generalized transducers can be implemented in Sequence Datalog in a straightforward way. Moreover, their introduction into the language leads to simple conditions that guarantee safety and finiteness. This paper develops two such conditions. The first condition expresses exactly the class of PTIME sequence functions; and the second expresses exactly the class of elementary sequence functions.
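The notion of an extended active domain in the abstract above can be illustrated with a small Python sketch. It is not taken from the paper: the function names are hypothetical, sequences are modeled as plain strings, and "subsequence" is assumed to mean a contiguous subsequence (with the empty sequence included).

```python
def subsequences(seq: str) -> set[str]:
    """Return every contiguous subsequence of seq (assumption: contiguous,
    and the empty sequence counts as a subsequence)."""
    n = len(seq)
    return {seq[i:j] for i in range(n + 1) for j in range(i, n + 1)}


def extended_active_domain(database: set[str]) -> set[str]:
    """Collect all sequences in the database together with their subsequences."""
    domain: set[str] = set()
    for seq in database:
        domain |= subsequences(seq)
    return domain


db = {"acgt", "gt"}
domain = extended_active_domain(db)

# Extraction never leaves the domain: every subsequence of a member is a member.
closed_under_extraction = all(subsequences(s) <= domain for s in domain)

# Construction (here, concatenation) can leave the domain.
constructed = "acgt" + "gt"
escapes_domain = constructed not in domain
```

Under these assumptions, extraction keeps results inside a finite set, while concatenation can produce a sequence outside it, which is the intuition behind the safe/unsafe distinction.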
A Query Language for List-Based Complex Objects
- In Thirteenth ACM SIGMOD Intern. Symposium on Principles of Database Systems (PODS'94)
, 1994
Cited by 15 (5 self)
We present a language for querying list-based complex objects. The language is shown to express precisely the polynomial-time generic list-object functions. The iteration mechanism of the language is based on a new approach wherein, in addition to the list over which the iteration is performed, a second list is used to control the number of iteration steps. During the iteration, the intermediate results can be moved to the output list as well as re-inserted into the list being iterated over. A simple syntactic constraint allows the growth rate of the intermediate results to be tightly controlled, which in turn restricts the expressiveness of the language to PTIME.
Author affiliations: Data Parallel Systems Inc., 4617 Morningside Dr., Bloomington, IN 47408, email: colby@dpsi.com; University of Regina, Dept. of Comp. Science, Regina, Saskatchewan S4S 0A2, Canada, email: saxton@cs.uregina.ca; Indiana University, Comp. Science Dept., Bloomington, IN 47405-4101, email: vgucht@cs.indiana.edu.
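The iteration mechanism described in this abstract, where a second list bounds the number of steps, might be pictured roughly as follows. This is only a loose Python sketch under stated assumptions, not the paper's language or semantics: `bounded_iterate`, `halve_step`, and the shape of the `step` callback are all hypothetical.

```python
from collections import deque
from typing import Callable, Iterable, List, Tuple


def bounded_iterate(
    work: Iterable[int],
    control: Iterable[object],
    step: Callable[[int], Tuple[List[int], List[int]]],
) -> List[int]:
    """Process items from `work`, taking at most one step per element of `control`.

    `step(item)` returns (emitted, reinserted): items moved to the output list
    and items put back into the list being iterated over.
    """
    queue = deque(work)
    output: List[int] = []
    for _ in control:  # the control list bounds the number of iteration steps
        if not queue:
            break
        item = queue.popleft()
        emitted, reinserted = step(item)
        output.extend(emitted)
        queue.extend(reinserted)
    return output


def halve_step(n: int) -> Tuple[List[int], List[int]]:
    """Emit n, and re-insert n // 2 while n is still greater than 1."""
    return [n], ([n // 2] if n > 1 else [])


full_run = bounded_iterate([8], range(10), halve_step)
cut_short = bounded_iterate([8], range(2), halve_step)
```

In this sketch the length of `control` caps the work done even though `step` keeps re-inserting items, which mirrors the idea that a bound on iteration steps limits the growth of intermediate results.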
A Query Language for NC
- In Proceedings of 13th ACM Symposium on Principles of Database Systems
, 1994
Cited by 14 (9 self)
We show that a form of divide and conquer recursion on sets together with the relational algebra expresses exactly the queries over ordered relational databases which are NC-computable. At a finer level, we relate k nested uses of recursion exactly to $AC^k$, $k \geq 1$. We also give corresponding results for complex objects.
1 Introduction
NC is the complexity class of functions that are computable in poly-logarithmic time with polynomially many processors on a parallel random access machine (PRAM). The query language for NC discussed here is centered around a form of divide and conquer recursion (dcr) on finite sets which has obvious potential for parallel evaluation and can easily express, for example, transitive closure and parity. Divide and conquer with parameters $e, f, u$ defines the unique function $\varphi$, notation $dcr(e, f, u)$, taking finite sets as arguments, such that:
$\varphi(\emptyset) \overset{\mathrm{def}}{=} e$
$\varphi(\{y\}) \overset{\mathrm{def}}{=} f(y)$
$\varphi(s_1 \cup s_2) \overset{\mathrm{def}}{=} u(\varphi(s_1), \varphi(s_2))$ when $s_1 \cap s_2 = \emptyset$
For parity, we t...
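The three defining equations of dcr above can be read as a recursive function on finite sets. The Python sketch below is an illustration only, not the paper's query language: it assumes the set elements are orderable so that a deterministic split can be chosen, and the names `dcr`, `parity`, and `cardinality` are chosen here for the example. The result does not depend on the chosen split when `u` is associative and commutative with `e` as its identity, which the sketch does not check.

```python
import operator
from typing import Callable, FrozenSet, TypeVar

A = TypeVar("A")
B = TypeVar("B")


def dcr(e: B, f: Callable[[A], B], u: Callable[[B, B], B]) -> Callable[[FrozenSet[A]], B]:
    """Build phi with phi({}) = e, phi({y}) = f(y),
    phi(s1 U s2) = u(phi(s1), phi(s2)) for disjoint s1, s2."""

    def phi(s: FrozenSet[A]) -> B:
        if len(s) == 0:
            return e
        if len(s) == 1:
            (y,) = s
            return f(y)
        # Split into two disjoint halves; the two recursive calls are independent,
        # which is where the potential for parallel evaluation comes from.
        items = sorted(s)  # assumption: elements are orderable
        mid = len(items) // 2
        return u(phi(frozenset(items[:mid])), phi(frozenset(items[mid:])))

    return phi


# Parity: is the number of elements odd?
parity = dcr(False, lambda y: True, operator.xor)

# Cardinality, as a second instance of the same scheme.
cardinality = dcr(0, lambda y: 1, operator.add)
```

Because each level halves the set, the recursion depth is logarithmic in the size of the input set.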
A Uniform Calculus for Collection Types
- OREGON GRADUATE INSTITUTE OF SCIENCE & TECHNOLOGY
, 1994
Cited by 9 (0 self)
We present a new algebra for collection types based on monoids and monoid homomorphisms. The types supported in this algebra can be any nested composition of collection types, including lists, sets, multisets (bags), vectors, and matrices. We also define a new calculus for this algebra, called monoid comprehensions, that captures operations involving multiple collection types in declarative form. This algebra can easily capture the semantics of many object-oriented database query languages that support mixed collection types, such as the OQL language of the ODMG-93 standard. In addition, it is ideal for expressing data parallelism and nested parallelism and can be effectively translated onto many parallel architectures. We present a normalization algorithm that reduces any expression in our algebra to a canonical form which, when evaluated, generates very few intermediate data structures. These canonical forms are amenable to a higher degree of parallelism than the original...
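To give a rough feel for what a monoid homomorphism over a collection looks like, here is a minimal Python sketch. It is an assumption-laden illustration rather than the algebra defined in the paper: the `Monoid` class, the `hom` function, and the three sample monoids are hypothetical names, and the sketch performs none of the well-formedness checks that such an algebra may impose on source and target monoids.

```python
from dataclasses import dataclass
from typing import Any, Callable, Iterable


@dataclass(frozen=True)
class Monoid:
    """A monoid given by its identity element and an associative merge."""
    zero: Any
    merge: Callable[[Any, Any], Any]


SUM = Monoid(0, lambda a, b: a + b)
LIST = Monoid([], lambda a, b: a + b)  # merge builds a new list, zero is never mutated
SET = Monoid(frozenset(), lambda a, b: a | b)


def hom(target: Monoid, f: Callable[[Any], Any], collection: Iterable[Any]) -> Any:
    """Map each element into the target monoid with f and merge the results."""
    result = target.zero
    for x in collection:
        result = target.merge(result, f(x))
    return result


# A comprehension-like query, "the set of 2*x for x in the list with x > 1":
doubled = hom(SET, lambda x: frozenset({2 * x}) if x > 1 else frozenset(), [1, 2, 2, 3])

# The same scheme with different target monoids.
total = hom(SUM, lambda x: x, [1, 2, 3])
repeated = hom(LIST, lambda x: [x, x], [1, 2])
```

The point of the sketch is that one operator, parameterized by the target monoid, covers aggregation, set construction, and list construction alike.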
CoPa: a Parallel Programming Language for Collections
- University of Pennsylvania, Institute for
, 1998
Cited by 8 (1 self)
In this paper we propose a new framework for parallel processing of collections. We define a high-level language called CoPa for processing nested sets, bags, and sequences (a generalization of arrays and lists). CoPa includes most features found in query languages for object-oriented or object-relational databases, and has, in addition, a powerful form of recursion not found in query languages. CoPa has a formal declarative definition of parallel complexity, as part of its specification. We prove the existence of a complexity-preserving compilation for CoPa, i.e. one which offers upper-bound guarantees for the parallel complexity of the compiled code. The majority of the compilation process is architecture-independent, using a parallel vector machine model (BVRAM). The BVRAM instructions form a sequence-algebra which is of independent interest, and have been carefully chosen to reconcile two conflicting demands: supporting the complexity-preserving compilation of CoPa's high-level con...
Finite Query Languages for Sequence Databases
, 1995
Cited by 4 (1 self)
This paper develops a query language for sequence databases, such as genome databases and text databases. Unlike relational data, queries over sequential data can easily produce infinite answer sets, since the universe of sequences is infinite, even for a finite alphabet. The challenge is to develop query languages that are both highly expressive and finite. This paper develops such a language. It is a subset of a recently developed logic called Sequence Datalog [22]. Sequence Datalog distinguishes syntactically between subsequence extraction and sequence construction. Extraction creates sequences of bounded length, and leads to safe recursion; while construction can create sequences of arbitrary length, and leads to unsafe recursion. In this paper, we develop syntactic restrictions for Sequence Datalog that allow sequence construction but preserve finiteness. The main idea is to use safe recursion to control and limit unsafe recursion. The main results are a finite language, called We...
Management Of Sequence Data
, 1996
Cited by 3 (1 self)
One of the challenges facing today's database systems is the need to support complex data types, which are of growing importance in new application areas. The thesis addresses this problem, with a specific focus on supporting sequence data. A large part of the thesis deals with the details of sequences. Issues covered include the model for sequence data, an algebra of operators to query the data, a query language to express the queries, optimization techniques and query processing algorithms. Performance results are presented from an implementation of these ideas, demonstrating the effects of the various optimizations. This detailed exploration of sequence data is one contribution of the thesis. The second contribution is a solution to the problem of integrating different data types, including sequences and relations, in a general-purpose database system. The thesis discusses the drawbacks of existing solutions, and then proposes a solution based on a novel E-ADT paradigm. This parad...
An Object Based Algebra for Parallel Query Processing and Optimization
, 1992
Cited by 2 (1 self)
The Tarski algebra provides an algebraic foundation for object-based query languages. This is demonstrated by showing how queries expressed in a graph-oriented query language (based on the functional data model) can be translated into the Tarski algebra. The graphical representation of queries in combination with the Tarski algebra is a convenient mechanism to study optimization in the context of object based query languages. We then propose extensions to the Tarski algebra that facilitate parallel query processing and address the issue of parallel query optimization in this algebraic framework. We also show how our framework helps in the study of non-monotonic query optimization.
1 Introduction
Over the last decade, a variety of new database models [10] have been introduced to deal with data applications involving objects with a complex external and/or internal structure. These database models can be classified into three main categories: the complex object models, the function-bas...
Querying Sequence Databases with Transducers
- In International Workshop on Database Programming Languages (DBPL), number 1369 in Lecture Notes in Computer Science
, 1997
Cited by 2 (1 self)
This paper develops a database query language called Transducer Datalog motivated by the needs of a new and emerging class of database applications. In these applications, such as text databases and genome databases, the storage and manipulation of long character sequences is a crucial feature. The issues involved in managing this kind of data are not addressed by traditional database systems, either in theory or in practice. To address these issues, we recently introduced a new machine model called a generalized sequence transducer. These generalized transducers extend ordinary transducers by allowing them to invoke other transducers as "subroutines." This paper establishes the computational properties of Transducer Datalog, a query language based on this new machine model. In the process, we develop a hierarchy of time-complexity classes based on the Ackermann function. The lower levels of this hierarchy correspond to well-known complexity classes, such as polynomial time...

