Query optimization in database systems
ACM Computing Surveys, 1984
Abstract

Cited by 207 (0 self)
Efficient methods of processing unanticipated queries are a crucial prerequisite for the success of generalized database management systems. A wide variety of approaches to improve the performance of query evaluation algorithms have been proposed: logic-based and semantic transformations, fast implementations of basic operations, and combinatorial or heuristic algorithms for generating alternative access plans and choosing among them. These methods are presented in the framework of a general query evaluation procedure using the relational calculus representation of queries. In addition, nonstandard query optimization issues such as higher-level query evaluation, query optimization in distributed databases, and use of database machines are addressed. The focus, however, is on query optimization in centralized database systems.
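To make "generating alternative access plans and choosing among them" concrete, here is a minimal sketch that exhaustively enumerates left-deep join orders and keeps the cheapest under a toy cost model. The cardinalities, the selectivities, and the cost formula (sum of estimated intermediate result sizes) are illustrative assumptions, not a method taken from the survey.

```python
from itertools import permutations


def plan_cost(order, card, sel):
    """Toy cost of a left-deep plan: the sum of estimated intermediate result sizes.

    card: relation name -> cardinality (assumed statistics)
    sel:  frozenset({a, b}) -> join selectivity; a missing pair means a cross product
    """
    size = card[order[0]]
    joined = [order[0]]
    cost = 0.0
    for rel in order[1:]:
        # combined selectivity of every predicate linking `rel` to the relations already joined
        s = 1.0
        for prev in joined:
            s *= sel.get(frozenset((prev, rel)), 1.0)
        size = size * card[rel] * s
        cost += size
        joined.append(rel)
    return cost


def best_left_deep_plan(card, sel):
    """Enumerate every left-deep join order; return (cheapest order, its cost)."""
    return min(
        ((order, plan_cost(order, card, sel)) for order in permutations(card)),
        key=lambda pair: pair[1],
    )
```

With `card = {"R": 1000, "S": 100, "T": 10}` and selectivities 0.01 for R-S and 0.1 for S-T, the search joins the two small relations first and the large one last. Exhaustive enumeration grows factorially with the number of relations, which is why heuristic pruning matters in practice.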
Join Indices
ACM Transactions on Database Systems, 1987
Abstract

Cited by 201 (3 self)
In new application areas of relational database systems, such as artificial intelligence, the join operator is used more extensively than in conventional applications. In this paper, we propose a simple data structure, called a join index, for improving the performance of joins in the context of complex queries. For most of the joins, updates to join indices incur very little overhead. Some properties of a join index are (i) its efficient use of memory and adaptiveness to parallel execution, [...] data type join predicates, (iv) its support for multirelation clustering, and (v) its use in representing directed graphs and in evaluating recursive queries. Finally, the analysis of the join algorithm using join indices shows its excellent performance.
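A join index is commonly described as a precomputed binary relation of tuple surrogates: one (r_id, s_id) pair for every pair of tuples that satisfies the join predicate. The sketch below assumes that formulation, an equi-join on one attribute, and tuples stored as dicts keyed by surrogate; the function names are hypothetical.

```python
from collections import defaultdict


def build_join_index(r, s, r_attr, s_attr):
    """Return the sorted (r_id, s_id) pairs whose tuples satisfy r[r_attr] == s[s_attr].

    r, s: dicts mapping a tuple surrogate (id) to a tuple, represented as a dict of attributes.
    """
    # group S surrogates by join-attribute value so each R tuple needs one lookup
    s_by_value = defaultdict(list)
    for sid, row in s.items():
        s_by_value[row[s_attr]].append(sid)
    return sorted(
        (rid, sid)
        for rid, row in r.items()
        for sid in s_by_value.get(row[r_attr], [])
    )


def join_via_index(index, r, s):
    """Materialize the join by fetching only the tuples named in the join index."""
    return [(r[rid], s[sid]) for rid, sid in index]
```

Because the index stores only surrogate pairs, it stays small relative to the base relations, and the join itself reduces to fetching the referenced tuples.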
Parallel sorting on a shared-nothing architecture using probabilistic splitting
1991
Abstract

Cited by 80 (1 self)
We consider the problem of external sorting in a shared-nothing multiprocessor. A critical step in the algorithms we consider is to determine the range of sort keys to be handled by each processor. We consider two techniques for determining these ranges of sort keys: exact splitting, using a parallel version of the algorithm proposed by Iyer, Ricard, and Varman; and probabilistic splitting, which uses sampling to estimate quantiles. We present analytic results showing that probabilistic splitting performs better than exact splitting. Finally, we present experimental results from an implementation of sorting via probabilistic splitting in the Gamma parallel database machine.
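Probabilistic splitting can be sketched as follows: draw a random sample of the keys, take its empirical quantiles as the p - 1 range boundaries, route every key to the processor owning its range, and sort each range locally. This is a single-process simulation under assumed parameters (`sample_size`, `seed`); it does not model Gamma's disks or network, and exact splitting would compute the true quantiles instead.

```python
import bisect
import random


def choose_splitters(keys, p, sample_size, rng):
    """Estimate the p - 1 quantile boundaries of `keys` from a random sample."""
    sample = sorted(rng.sample(keys, min(sample_size, len(keys))))
    # the i-th splitter is the (i/p)-quantile of the sample
    return [sample[(i * len(sample)) // p] for i in range(1, p)]


def parallel_sort_sim(keys, p, sample_size=64, seed=0):
    """Simulate range-partitioned sorting on p shared-nothing nodes."""
    rng = random.Random(seed)
    splitters = choose_splitters(keys, p, sample_size, rng)
    partitions = [[] for _ in range(p)]
    for k in keys:
        # node i receives the keys in (splitters[i-1], splitters[i]]
        partitions[bisect.bisect_left(splitters, k)].append(k)
    for part in partitions:
        part.sort()  # each node sorts its own range independently
    return partitions
```

Concatenating the partitions in node order yields the fully sorted output; a poor sample only unbalances partition sizes, it never breaks correctness.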
SVP - a Model Capturing Sets, Streams, and Parallelism
In Proceedings of the 18th VLDB Conference, 1992
Abstract

Cited by 22 (0 self)
We describe the SVP data model. The goal of SVP is to model both set and stream data, and to model parallelism in bulk data processing. SVP also shows promise for other parallel processing applications. SVP models collections, which include sets and streams as special cases. Collections are represented as ordered tree structures, and divide-and-conquer mappings are easily defined on these structures. We show that many useful database mappings (queries) have a divide-and-conquer format when specified using collections, and that this specification exposes parallelism. We formalize a class of divide-and-conquer mappings on collections called SVP-transducers. SVP-transducers generalize aggregates, set mappings, stream transductions, and scan computations. At the same time, they have a rigorous semantics based on continuity with respect to collection orderings, and permit implicit specification of both independent and pipeline parallelism.
1 Introduction
Achieving parallelism in bulk data...
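The idea of a collection as an ordered tree with divide-and-conquer mappings can be sketched as below. The node types `Empty`, `Unit`, and `Concat` and the function `dc_map` are hypothetical names, and this is a simplified illustration, not the paper's formal definition of SVP-transducers. It shows an aggregate, an order-preserving set mapping, and a scan all expressed through one divide-and-conquer scheme.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Empty:
    pass


@dataclass(frozen=True)
class Unit:
    value: object


@dataclass(frozen=True)
class Concat:
    left: Collection
    right: Collection


Collection = Union[Empty, Unit, Concat]


def from_list(xs):
    """Build a balanced ordered tree from a Python list."""
    if not xs:
        return Empty()
    if len(xs) == 1:
        return Unit(xs[0])
    mid = len(xs) // 2
    return Concat(from_list(xs[:mid]), from_list(xs[mid:]))


def dc_map(coll, empty, unit, combine):
    """Divide-and-conquer mapping. The two recursive calls on a Concat node are
    independent, so they could run on different processors."""
    if isinstance(coll, Empty):
        return empty
    if isinstance(coll, Unit):
        return unit(coll.value)
    return combine(dc_map(coll.left, empty, unit, combine),
                   dc_map(coll.right, empty, unit, combine))


def total(coll):
    """Aggregate: sum of all elements."""
    return dc_map(coll, 0, lambda x: x, lambda a, b: a + b)


def evens(coll):
    """Set mapping (selection) that preserves the collection order."""
    return dc_map(coll, [], lambda x: [x] if x % 2 == 0 else [], lambda a, b: a + b)


def prefix_sums(coll):
    """Scan: each subtree returns (its total, its prefix sums); the right half's
    prefixes are shifted by the left half's total."""
    return dc_map(coll, (0, []), lambda x: (x, [x]),
                  lambda a, b: (a[0] + b[0], a[1] + [a[0] + y for y in b[1]]))
```

For `from_list([1, 2, 3, 4, 5])`, `prefix_sums` returns `(15, [1, 3, 6, 10, 15])`: the scan is computed with no sequential left-to-right pass over the whole input.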
Parallel Query Processing Using Shared Memory Multiprocessors and Disk Arrays
1992
Abstract

Cited by 16 (0 self)
my research. I will also remember that it was from them that I learned how to appreciate a good beer and enjoy a good party. I would like to thank my fellow students Yongdong Wang and Chuentsai Sun for their valuable friendship and for all their help. I also would like to thank Guangrui Zhu and Yan Wei for being two special friends and making my life more interesting. Many thanks also go to my college friends Yuzheng Ding and Jiyang Liu. Our communications have always been a source of inspiration in my life. Although my parents and my sister are an ocean away, they have offered me their constant love and encouragement throughout my study. I would like to take this opportunity to thank them for everything they have done for me. Last, and most of all, I would like to thank my dear wife, Nanyan Xiong. Without her love, understanding and support throughout my Ph.D. program, this thesis would not have been possible. This thesis is dedicated to her as a small token of my deep appreciation.
An Efficient Permutation-Based Parallel Algorithm for Range-Join in Hypercubes
Griffith University, 1993
Abstract

Cited by 9 (9 self)
The range-join of sets R and S is defined to be the set containing all tuples $(r, s)$ that satisfy $e_1 \le |r - s| \le e_2$, where $r \in R$, $s \in S$, and $e_1$ and $e_2$ are fixed constants. This paper proposes an efficient parallel range-join algorithm in hypercubes. To compute the range-join of two sets R and S on a hypercube of $p$ processors ($p \le |R| = m \le |S| = n$), the proposed algorithm simply permutes the elements of R to obtain their possible combinations with the elements of S and thus all possible local range-joins. Requiring only $O((m+n)/p)$ local memory at each processor, our algorithm has a time complexity of $O((n/p + m)\log(n/p))$ in the best case, when no element in S matches any element in R, and $O(T^{k}_{\mathrm{sort}} + mn/p)$ in the worst case, when all elements in S match each element in R, where $T^{k}_{\mathrm{sort}} = O(k \log k)$ when all elements in S are distinct, $T^{k}_{\mathrm{sort}} = O(k)$ when all elements in S are equal, and $k = n/p$. The general-case time complexity of the algorithm is also shown. Keywords: Range...
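The permute-and-join-locally idea can be simulated sequentially: S stays distributed and sorted on each processor, while blocks of R move between processors so that every R block meets every S block once, and each local range-join uses binary search on the sorted S block. As a simplifying assumption this sketch uses a plain cyclic rotation instead of the paper's hypercube permutation, distributes elements by striding, and requires $0 \le e_1 \le e_2$; all function names are hypothetical.

```python
import bisect


def local_range_join(r_block, s_sorted, e1, e2):
    """All (r, s) with e1 <= |r - s| <= e2, via binary search on a sorted S block.
    Assumes 0 <= e1 <= e2."""
    out = []
    for r in r_block:
        # matches below r: s in [r - e2, r - e1]
        lo = bisect.bisect_left(s_sorted, r - e2)
        hi = bisect.bisect_right(s_sorted, r - e1)
        out.extend((r, s) for s in s_sorted[lo:hi])
        # matches above r: s in [r + e1, r + e2]; if e1 == 0, skip s == r (already reported)
        lo = (bisect.bisect_right if e1 == 0 else bisect.bisect_left)(s_sorted, r + e1)
        hi = bisect.bisect_right(s_sorted, r + e2)
        out.extend((r, s) for s in s_sorted[lo:hi])
    return out


def rotating_range_join(R, S, e1, e2, p):
    """Simulate p processors: S blocks stay put (sorted once); R blocks rotate cyclically."""
    r_blocks = [R[i::p] for i in range(p)]
    s_blocks = [sorted(S[i::p]) for i in range(p)]
    result = []
    for step in range(p):
        for proc in range(p):
            # in round `step`, processor `proc` holds R block (proc + step) mod p
            result.extend(local_range_join(r_blocks[(proc + step) % p], s_blocks[proc], e1, e2))
    return result
```

After p rounds each (R block, S block) pair has met exactly once, so the union of the local results equals the full range-join; comparing against a nested-loop check is an easy way to validate it.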
Towards a Scalable Parallel Object Database - The Bulk Synchronous Parallel Approach
1996
Abstract

Cited by 8 (2 self)
Parallel computers have been successfully deployed in many scientific and numerical application areas, although their use in non-numerical and database applications has been scarce. In this report, we first survey the architectural advancements beginning to make general-purpose parallel computing cost-effective, the requirements for non-numerical (or symbolic) applications, and the previous attempts to develop parallel databases. The central theme of the Bulk Synchronous Parallel model is to provide a high-level abstraction of parallel computing hardware whilst providing a realisation of a parallel programming model that enables architecture independent programs to deliver scalable performance on diverse hardware platforms. Therefore, the primary objective of this report is to investigate the feasibility of developing a portable, scalable, parallel object database, based on the Bulk Synchronous Parallel model of computation. In particular, we devise a way of providing high-level abstra...
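The portability claim of the BSP model rests on its standard cost formula: a program is a sequence of supersteps, and each superstep costs $w + h g + l$, where $w$ is the maximum local work on any processor, $h$ the maximum number of words any processor sends or receives, $g$ the machine's per-word communication cost, and $l$ the barrier synchronisation cost. The sketch below evaluates that formula; the `Superstep` record and the example numbers are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Superstep:
    work: List[float]    # local computation per processor
    sent: List[int]      # words sent per processor
    received: List[int]  # words received per processor


def bsp_cost(steps: List[Superstep], g: float, l: float) -> float:
    """Sum over supersteps of w + h*g + l (the standard BSP cost model)."""
    total = 0.0
    for st in steps:
        w = max(st.work)                              # slowest processor's computation
        h = max(max(st.sent), max(st.received))       # the superstep's h-relation
        total += w + h * g + l
    return total
```

Only `g` and `l` depend on the machine, so the same program can be costed on different hardware by plugging in that platform's two parameters.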
Parallel Hash-Based Join Algorithms for a Shared-Everything Environment
1996
Abstract

Cited by 7 (0 self)
We analyze the costs, and describe the implementation, of three hash-based join algorithms for a general-purpose shared-memory multiprocessor. The three algorithms considered are the Hashed Loops, GRACE and Hybrid algorithms. We also describe the results of a set of experiments which validate the cost models presented and demonstrate the relative performance of the three algorithms. Index Terms: algorithms, cost models, parallel systems, query processing, relational database systems.
1 Introduction
The LauRel project [10] is aimed at extending current relational database technology in three areas: data modelling, exploitation of parallelism, and structuring of the stored database. As part of this project, we are building a prototype database system which runs on a general-purpose shared-memory multiprocessor system. A key objective of this implementation is to exploit the parallelism offered by the underlying architecture. One of the ways to help achieve this objective is to develo...
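Of the three algorithms, GRACE is the easiest to sketch: partition both inputs by a hash of the join key, then join each pair of corresponding buckets with an in-memory hash table. This single-threaded sketch keeps partitions in Python lists rather than on disk, assumes tuples joined on a positional key, and uses a hypothetical `num_buckets` parameter; it is not the paper's shared-memory implementation. (The Hybrid variant additionally keeps the first bucket's hash table in memory during partitioning.)

```python
from collections import defaultdict


def grace_hash_join(r, s, r_key, s_key, num_buckets=4):
    """GRACE-style equi-join of two lists of tuples on r[r_key] == s[s_key]."""
    # Phase 1: partition both inputs by a hash of the join key
    # (a real system writes these buckets to disk)
    r_parts = [[] for _ in range(num_buckets)]
    s_parts = [[] for _ in range(num_buckets)]
    for t in r:
        r_parts[hash(t[r_key]) % num_buckets].append(t)
    for u in s:
        s_parts[hash(u[s_key]) % num_buckets].append(u)

    # Phase 2: matching keys always land in the same bucket index, so bucket pairs
    # are independent and could be assigned to different processors
    out = []
    for rb, sb in zip(r_parts, s_parts):
        table = defaultdict(list)
        for t in rb:                       # build on the R bucket
            table[t[r_key]].append(t)
        for u in sb:                       # probe with the S bucket
            for t in table.get(u[s_key], []):
                out.append((t, u))
    return out
```

The independence of bucket pairs in the second phase is what makes this family of algorithms a natural fit for a shared-memory multiprocessor.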
Multiple Query Optimization for Data Analysis Applications on Clusters of SMPs
Abstract

Cited by 7 (6 self)
This paper is concerned with the efficient execution of multiple query workloads on a cluster of SMPs. We target applications that access and manipulate large scientific datasets. Queries in these applications involve user-defined processing operations on data and distributed data structures to hold intermediate and final results. Our goal is to implement system components to leverage previously computed query results and to effectively utilize processing power and aggregated I/O bandwidth on SMP nodes so that both single queries and multi-query batches can be efficiently executed.
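One simple way to "leverage previously computed query results" is a semantic result cache: if a new range selection is contained in a range that was already evaluated, answer it by filtering the cached rows instead of rescanning the dataset. The `ResultCache` class below is a hypothetical, single-node illustration of that reuse idea, not the paper's system components.

```python
from typing import Dict, List, Tuple


class ResultCache:
    """Hypothetical cache of range-selection results over a one-dimensional dataset."""

    def __init__(self, dataset: List[float]):
        self.dataset = dataset
        self.cache: Dict[Tuple[float, float], List[float]] = {}
        self.scans = 0  # number of full dataset scans, to make reuse visible

    def select(self, lo: float, hi: float) -> List[float]:
        """Return all values in [lo, hi], reusing a cached superset range if one exists."""
        for (clo, chi), rows in self.cache.items():
            if clo <= lo and hi <= chi:
                # containment: filter the cached intermediate result
                return [x for x in rows if lo <= x <= hi]
        self.scans += 1
        rows = [x for x in self.dataset if lo <= x <= hi]
        self.cache[(lo, hi)] = rows
        return rows
```

In a batch of overlapping queries, ordering the widest query first lets the narrower ones be answered from its cached result, which is the kind of scheduling decision multiple-query optimization has to make.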
A parallel algorithm for record clustering
ACM Trans. on Database Systems, 1990
Abstract

Cited by 7 (1 self)
We present an efficient heuristic algorithm for record clustering that can run on a SIMD machine. We introduce the P-tree, and its associated numbering scheme, which in the split phase allows each processor independently to compute the unique cluster number of a record satisfying an arbitrary query. We show that by restricting ourselves in the merge phase to combining only sibling clusters, we obtain a parallel algorithm whose speedup ratio is optimal in the number of processors used. Finally, we report on experiments showing that our method produces substantial savings in an environment with relatively little overlap among the queries.
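The property that each processor can compute a record's cluster number independently can be illustrated with a much simpler numbering than the P-tree: set bit i of the number when the record satisfies query i, so records answering the same set of queries share a cluster. This is an assumed stand-in, not the paper's P-tree scheme, and only the split phase is sketched; the function names are hypothetical.

```python
from collections import defaultdict


def cluster_number(record, queries):
    """Bit i is set iff `record` satisfies queries[i]. The result depends only on the
    record itself, so every processor can compute it without coordination."""
    n = 0
    for i, q in enumerate(queries):
        if q(record):
            n |= 1 << i
    return n


def split_phase(records, queries):
    """Group records by cluster number (a sequential stand-in for the parallel split)."""
    clusters = defaultdict(list)
    for rec in records:
        clusters[cluster_number(rec, queries)].append(rec)
    return dict(clusters)
```

With k queries this scheme can produce up to 2^k clusters, which is why a merge phase that combines small clusters is needed afterwards.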