Using SQL to Build New Aggregates and Extenders for Object-Relational Systems
- In Proceedings of the 26th International Conference on Very Large Data Bases, 2000
Cited by 42 (12 self)
User-defined Aggregates (UDAs) provide a versatile mechanism for extending the power and applicability of Object-Relational Databases (O-R DBs). In this paper, we describe the AXL system that supports an SQL-based language for introducing new UDAs. AXL is easy to learn and use for database programmers because it preserves the constructs, programming paradigm and data types of SQL (whereas there is an 'impedance mismatch' between SQL and the procedural languages of user-defined functions currently used in O-R DBs). AXL will also inherit the benefits of database query languages, such as scalability, data independence and parallelizability. In this paper, we show that, while adding only minimal extensions to SQL, AXL is very powerful and capable of expressing complex algorithms efficiently. We demonstrate this by coding data mining functions and other advanced applications that, previously, had been a major problem for SQL databases. Due to its flexibility, SQL-compati...
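To make the notion of a user-defined aggregate concrete, here is a minimal sketch using Python's standard `sqlite3` module. It is not AXL: the aggregate body is written in a procedural host language, which is exactly the style the abstract contrasts AXL with. The aggregate name `value_range`, the `readings` table, and the helper `range_per_group` are illustrative assumptions.

```python
import sqlite3


class ValueRange:
    """Hypothetical UDA: difference between the largest and smallest input."""

    def __init__(self):
        # Per-group state, created once for each group.
        self.lo = None
        self.hi = None

    def step(self, value):
        # Called once per input row; NULLs are ignored.
        if value is None:
            return
        self.lo = value if self.lo is None else min(self.lo, value)
        self.hi = value if self.hi is None else max(self.hi, value)

    def finalize(self):
        # Called once after the last row of the group.
        return None if self.lo is None else self.hi - self.lo


def range_per_group(rows):
    """Load (sensor, value) rows and apply the UDA with GROUP BY."""
    conn = sqlite3.connect(":memory:")
    conn.create_aggregate("value_range", 1, ValueRange)
    conn.execute("CREATE TABLE readings (sensor TEXT, value INTEGER)")
    conn.executemany("INSERT INTO readings VALUES (?, ?)", rows)
    result = conn.execute(
        "SELECT sensor, value_range(value) FROM readings "
        "GROUP BY sensor ORDER BY sensor"
    ).fetchall()
    conn.close()
    return result
```

Once registered, the aggregate is used in a query like any built-in one, for example `range_per_group([("a", 3), ("a", 10), ("b", 7)])`.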
Autopart: Automating schema design for large scientific databases using data partitioning
- In Proceedings of the 16th International Conference on Scientific and Statistical Database Management, 2004
Cited by 21 (4 self)
Database applications that use multi-terabyte datasets are becoming increasingly important for scientific fields such as astronomy and biology. Scientific databases are particularly suited for the application of automated physical design techniques, because of their data volume and the complexity of the scientific workloads. Current automated physical design tools focus on the selection of indexes and materialized views. In large-scale scientific databases, however, the data volume and the continuous insertion of new data allow for only limited indexes and materialized views. By contrast, data partitioning does not replicate data, thereby reducing space requirements and minimizing update overhead. In this paper we propose AutoPart, an algorithm that automatically partitions database tables to optimize sequential access assuming prior knowledge of a representative workload. The resulting schema is indexed using a fraction of the space required for indexing the original schema. To evaluate AutoPart, we build an automated schema design tool that interfaces to commercial database systems. We experiment with AutoPart in the context of the Sloan Digital Sky Survey database, a real-world astronomical database, running on SQL Server 2000. Our experiments corroborate the benefits of partitioning for large-scale systems: partitioning alone improves query execution performance by a factor of two on average. Combined with indexes, the new schema also outperforms the indexed original schema by 20% (for queries) and a factor of five (for updates), while using only half the original index space.
Reusing Invariants: A New Strategy for Correlated Queries
- In SIGMOD, 1998
Cited by 15 (1 self)
Correlated queries are very common and important in decision support systems. Traditional nested iteration evaluation methods for such queries can be very time consuming. When they apply, query rewriting techniques have been shown to be much more efficient. But query rewriting is not always possible. When query rewriting does not apply, can we do something better than the traditional nested iteration methods? In this paper, we propose a new invariant technique to evaluate correlated queries efficiently. The basic idea is to recognize the part of the subquery that is not related to the outer references and cache the result of that part after its first execution. Later, we can reuse the result and combine it with the result of the rest of the subquery that is changing for each iteration. Our technique applies to arbitrary correlated subqueries. This paper introduces algorithms to recognize the invariant part of a data flow tree, and to restructure the evaluation plan to reuse the stored ...
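The basic idea described in this abstract can be sketched in a few lines of Python. This is a simplified illustration, not the paper's algorithm: it assumes the subquery is a filter over one inner table whose predicate splits cleanly into an invariant part and a correlated part, and all function and parameter names are hypothetical.

```python
def count_matches_naive(outer_rows, inner_rows, invariant_pred, correlated_pred):
    """Nested iteration: re-evaluate the whole subquery for every outer row."""
    results = []
    for outer in outer_rows:
        matches = [r for r in inner_rows
                   if invariant_pred(r) and correlated_pred(outer, r)]
        results.append((outer, len(matches)))
    return results


def count_matches_reusing_invariant(outer_rows, inner_rows,
                                    invariant_pred, correlated_pred):
    """Evaluate the invariant part once, cache it, and reuse it per iteration."""
    cached = None  # result of the part that ignores the outer reference
    results = []
    for outer in outer_rows:
        if cached is None:
            # First execution: compute and store the invariant result.
            cached = [r for r in inner_rows if invariant_pred(r)]
        # Only the correlated part is recomputed for each outer row.
        matches = [r for r in cached if correlated_pred(outer, r)]
        results.append((outer, len(matches)))
    return results
```

Both functions return the same answer; the second evaluates `invariant_pred` once per inner row instead of once per pair of outer and inner rows.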
On relational support for XML publishing: Beyond sorting and tagging
- In Proceedings of SIGMOD 2003
Cited by 11 (1 self)
In this paper, we study whether the need for efficient XML publishing brings any new requirements for relational query engines, or if sorting query results in the relational engine and tagging them in middleware is sufficient. We observe that the mismatch between the XML data model and the relational model requires relational engines to be enhanced for efficiency. Specifically, they need to support relation-valued variables. We discuss how such support can be provided through the addition of an operator, GApply, with minimal extensions to existing relational engines. We discuss how the operator may be exposed in SQL syntax and provide a comprehensive study of optimization rules that govern this operator. We report the results of a preliminary performance evaluation showing the speedup obtained through our approach and the effectiveness of our optimization rules.
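One way to picture a group-wise apply operator with a relation-valued variable is the Python sketch below. It is only an analogy under stated assumptions: the function `gapply`, its signature, the sorted group order, and the example query `above_average` are hypothetical and do not reproduce the paper's operator definition or its SQL syntax.

```python
from collections import defaultdict


def gapply(rows, group_key, per_group_query):
    """Partition rows by key, run a query on each group, concatenate results.

    Each group is handed to per_group_query as a whole relation (a list of
    rows), so the query may scan it more than once.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[group_key(row)].append(row)
    output = []
    for key in sorted(groups):  # deterministic order, chosen for the sketch
        output.extend(per_group_query(key, groups[key]))
    return output


def above_average(_dept, group):
    """Example per-group query: employees paid more than their group average."""
    average = sum(salary for _, _, salary in group) / len(group)
    return [row for row in group if row[2] > average]
```

With rows of the form `(dept, name, salary)`, the call `gapply(rows, lambda r: r[0], above_average)` returns the above-average earners of every department in one pass over the groups.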
Monad Comprehensions: A Versatile Representation for Queries
- 2003
Cited by 6 (1 self)
eeply, you will note that we generate query results solely through the side-effect free construction of values from simpler constituents and that functional composition will be the predominant way of forming complex queries. Referential transparency is the key to transformational programming and equational reasoning. Relatively few components are needed in our initial query language core. We grow this language through function definitions of the form f # e where e is an expression built from components we have already introduced. The functions f so defined will get more complex as we go on until we are ready to give the meaning of SQL, OQL [1.4], or XPath [1.1] query clauses such as select-from-where, exists-in, flatten, or path expressions.
1.1.1 Notation, Types, and Values
If you are familiar with notational conventions of functional programming languages such as Haskell [1.14] you will feel at home right away. Figure 1.1 introduces the core expression forms e and their notation.
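As a rough illustration of building query clauses by composing side-effect free functions, the sketch below uses Python lists as the collection type. The excerpt itself uses Haskell-like notation; the names `unit`, `bind`, `select_from_where`, `exists_in`, and `flatten` and their definitions here are assumptions made for the example, not the chapter's own definitions.

```python
def unit(x):
    """Wrap a single value in a collection."""
    return [x]


def bind(xs, f):
    """Apply f (value -> collection) to every element and concatenate."""
    return [y for x in xs for y in f(x)]


def select_from_where(source, predicate, projection):
    """select projection(x) from x in source where predicate(x)."""
    return bind(source, lambda x: unit(projection(x)) if predicate(x) else [])


def exists_in(source, predicate):
    """True if some element of source satisfies predicate."""
    return len(select_from_where(source, predicate, lambda x: x)) > 0


def flatten(collections):
    """Concatenate a collection of collections."""
    return bind(collections, lambda xs: xs)
```

Every clause above is defined only in terms of `unit` and `bind`, so each can be replaced by its definition without changing the result of a query.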
Primitives for Workload Summarization and Implications for SQL
Cited by 2 (0 self)
Workload information has proved to be a crucial component for database-administration tasks as well as for analysis of query logs to understand user behavior and system usage. These tasks require the ability to summarize large SQL workloads. In this paper, we identify primitives that are needed to enable many important workload-summarization tasks. These primitives also appear to be useful in a variety of practical scenarios besides workload summarization. Today’s SQL is inadequate to express these primitives conveniently. We discuss possible extensions to SQL and the relational engine to efficiently support such summarization primitives.
Evaluation of Ad Hoc OLAP: In-Place Computation
- In SSDBM, 1999
Cited by 1 (0 self)
Large scale data analysis and mining activities, such as identifying interesting trends, making unusual patterns stand out and verifying hypotheses, require sophisticated information extraction queries. Being able to express these data mining queries concisely is of major importance not only from the user's, but also from the system's point of view. Recent research in OLAP has focused on datacubes and their applications; however, expression and processing of ad hoc decision support queries have been given very little attention. In this paper we present an appropriate framework for these queries and introduce a syntactic construct to support it. This SQL extension allows most OLAP queries, such as complex intra- and inter-group comparisons, trends and hierarchical comparisons, to be expressed in a compact, intuitive and simple manner. However, this syntactic extension is not the focus of this paper. This succinct representation of a complex OLAP query translates immediately to a novel...
How to Query Network Traffic Data Using Data Streams
- 2002
Cited by 1 (0 self)
In this paper, we show how ad-hoc queries can be made in real time on network traffic data using a data stream model. We define a rich class of ordering properties and use them to label the attributes of a stream. The attribute ordering properties can be used to optimize data stream queries, enabling the efficient evaluation of complex ad-hoc queries. The rich set of ordering properties allows the output of many operators on data streams to also be labeled with ordering properties, allowing the composability of data stream queries. We show how an SQL-like language can express many network analysis queries, and be translated into data stream operators. We use the data stream model described in this paper as the query processing architecture for Gigascope, a fast and flexible network
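A small sketch can show why an ordering property on a stream attribute helps query evaluation. It assumes a stream of `(timestamp, payload)` pairs whose timestamp is labeled as non-decreasing; the function `count_per_window` and its parameters are hypothetical and are not taken from the paper or from Gigascope's query language.

```python
def count_per_window(stream, window):
    """Count tuples per time window over a timestamp-ordered stream.

    Because timestamps never decrease, a window can be emitted as soon as a
    tuple from a later window arrives, and only one counter is kept in memory.
    """
    current = None  # index of the window being accumulated
    count = 0
    for timestamp, _payload in stream:
        bucket = timestamp // window
        if current is not None and bucket < current:
            raise ValueError("stream violates the assumed timestamp ordering")
        if current is not None and bucket != current:
            yield (current * window, count)  # window closed: emit and reset
            count = 0
        current = bucket
        count += 1
    if current is not None:
        yield (current * window, count)  # flush the last open window
```

The output is itself ordered by window start, so a downstream operator could rely on the same kind of ordering property.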
Non-Linear Prefixes in Query Languages
Cited by 1 (0 self)
In first order logic there are two main extensions to quantification: generalized quantifiers and non-linear prefixes. While generalized quantifiers have been explored from a database perspective, non-linear prefixes have not – most likely because of complexity concerns. In this paper we first illustrate the usefulness of non-linear prefixes in query languages by means of example queries. We then introduce the subject formally, distinguishing between two forms of nonlinearity: branching and cumulation. To escape complexity concerns, we focus on monadic quantifiers. In this context, we show that branching does not extend the expressive power of first order logic when it is interpreted over finite models, while cumulation does not extend the expressive power when it is interpreted over bounded models. Branching and cumulation do, however, allow us to formulate some queries in a succinct and elegant manner. When branching and cumulation are interpreted over infinite models, we show that the resulting language can be embedded in an infinitary logic proposed by Libkin. We also discuss non-linear prefixes from an algorithmic point of view.
Comprehending Queries
- 1999
Cited by 1 (0 self)
There are no compelling reasons why database-internal query representations have to be designated by operators. This text describes a world in which datatypes determine the comprehension of queries. In this world, a datatype is characterized by its algebra of value constructors. These algebras are principal. Query operators are secondary in the sense that they simply box (recursive) programs that describe how to form a query result by application of datatype constructors. Often, operators will be unboxed to inspect and possibly rewrite these programs. Query optimization then means to deal with the transformation of programs. The predominant role of the constructor algebras suggests that this model understands queries as mappings between such algebras. The key observation that makes the whole approach viable is that (a) homomorphic mappings are expressive enough to cover declarative user query languages like OQL or recent SQL dialects, and, at the same time, (b) a single program form suffices
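The idea of a query as a mapping between constructor algebras can be illustrated with a fold over lists. This is a sketch under assumptions: lists are treated as built from an empty-list constructor and a cons constructor, and the names `fold`, `count`, `select`, and `exists` are chosen for the example rather than taken from the text.

```python
def fold(nil, cons, xs):
    """Replace the list constructors of xs: [] by nil, (x : rest) by cons."""
    result = nil
    for x in reversed(xs):
        result = cons(x, result)
    return result


def count(xs):
    """Map the list algebra to the integers with 0 and 'add one'."""
    return fold(0, lambda _x, acc: acc + 1, xs)


def select(predicate, xs):
    """Map the list algebra to itself, keeping only qualifying elements."""
    return fold([], lambda x, acc: [x] + acc if predicate(x) else acc, xs)


def exists(predicate, xs):
    """Map the list algebra to the booleans with False and 'or'."""
    return fold(False, lambda x, acc: predicate(x) or acc, xs)
```

All three queries share the single program form `fold`; they differ only in the target algebra that replaces the list constructors.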

