CMD: A Multidimensional Declustering Method for Parallel Database Systems
- In Proceedings of the Int. Conf. on Very Large Data Bases, 1992
Cited by 38 (4 self)
I/O parallelism appears to be a promising approach to achieving high performance in parallel database systems. In such systems, it is essential to decluster database files into fragments and spread them across multiple disks so that the DBMS software can exploit the I/O bandwidth by reading and writing the disks in parallel. In this paper, we consider the problem of declustering multidimensional data on a parallel disk system. Since the multidimensional range query is the main workhorse for applications accessing such data, our aim is to provide efficient support for it. A new declustering method for parallel disk systems, called coordinate modulo distribution (CMD), is proposed. Our analysis shows that the method achieves optimum parallelism for a very high percentage of range queries on multidimensional data, if the distribution of data on each dimension is stationary. We have derived the exact conditions under which optimality is achieved. Also provided are the worst and average case bounds ...
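The abstract names the method but does not spell out the mapping. As a rough illustration only, the sketch below assumes the usual reading of "coordinate modulo distribution": each dimension is cut into intervals, and a grid cell goes to the disk given by the sum of its interval indices modulo the number of disks. The function names `cmd_disk` and `range_query_load` are hypothetical and not taken from the paper.

```python
from itertools import product


def cmd_disk(cell, num_disks):
    """Map a grid cell (tuple of per-dimension interval indices) to a disk.

    Assumption: disk = (sum of interval indices) mod num_disks.
    """
    return sum(cell) % num_disks


def range_query_load(ranges, num_disks):
    """Count how many cells of a range query fall on each disk.

    `ranges` holds one inclusive (low, high) pair of interval indices
    per dimension.
    """
    load = [0] * num_disks
    for cell in product(*(range(lo, hi + 1) for lo, hi in ranges)):
        load[cmd_disk(cell, num_disks)] += 1
    return load


# A 2-D range query spanning intervals 0..3 in x and 2..3 in y, on 4 disks.
load = range_query_load([(0, 3), (2, 3)], 4)
print(load)  # [2, 2, 2, 2]: 8 cells spread evenly, 2 per disk
```

Under this assumed mapping, the query touches 8 cells and no disk holds more than 8 / 4 = 2 of them, which is the kind of balanced retrieval the abstract calls optimum parallelism.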
A survey of parallel execution strategies for transitive closure and logic programs
- Distributed and Parallel Databases, 1993
Cited by 20 (5 self)
An important feature of database technology of the nineties is the use of parallelism for speeding up the execution of complex queries. This technology is being tested in several experimental database architectures and a few commercial systems for conventional select-project-join queries. In particular, hash-based fragmentation is used to distribute data to disks under the control of different processors in order to perform selections and joins in parallel. With the development of new query languages, and in particular with the definition of transitive closure queries and of more general logic programming queries, the new dimension of recursion has been added to query processing. Recursive queries are complex; at the same time, their regular structure is particularly suited for parallel execution, and parallelism may give a high efficiency gain. We survey the approaches to parallel execution of recursive queries that have been presented in the recent literature. We observe that research on parallel execution of recursive queries is separated into two distinct subareas, one focused on the transitive closure of Relational Algebra expressions, the other one focused on optimization of more general Datalog queries. Though the subareas seem radically different because of the approach and formalism used, they have many common features. This is not surprising, because most typical Datalog queries can be solved by means of the transitive closure of simple
LH*lh: A Scalable High Performance Data Structure for Switched Multicomputers
- 1995
Cited by 19 (6 self)
LH*lh is a new data structure for scalable high-performance hash files on the increasingly popular switched multicomputers, i.e., MIMD multiprocessor machines with distributed RAM memory and without shared memory. An LH*lh file scales up gracefully over available processors and the distributed memory, easily reaching Gbytes. Address calculus does not require any centralized component that could lead to a hot spot. Access times to the file can be under a millisecond and the file can be used in parallel by several client processors. We show the LH*lh design, and report on the performance analysis. This includes experiments on the Parsytec GC/PowerPlus multicomputer with up to 128 Power PCs and 32 MB of distributed RAM per node. We prove the efficiency of the method and justify various algorithmic choices that were made. LH*lh opens a new perspective for high-performance applications, especially for the database management of new types of data and in real-time environments.
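The abstract does not describe the address calculus itself. For orientation, the sketch below shows only the classic linear hashing address computation that the LH family builds on; it is not the LH*lh algorithm, and the distributed parts (client images, forwarding between servers) are omitted. The name `lh_address` and its parameters are hypothetical.

```python
def lh_address(key, level, split_pointer, initial_buckets=1):
    """Classic linear hashing address for an integer key.

    level          -- current file level i (each full round of splits doubles the file)
    split_pointer  -- index n of the next bucket to split
    Buckets below the split pointer have already been split in this round,
    so keys that hash there are rehashed with the next-level function.
    """
    address = key % (initial_buckets * 2 ** level)
    if address < split_pointer:
        address = key % (initial_buckets * 2 ** (level + 1))
    return address


# Level 2 with split pointer 1: buckets 0..4 exist (bucket 0 has split into 0 and 4).
print(lh_address(8, 2, 1))   # 0
print(lh_address(12, 2, 1))  # 4
print(lh_address(5, 2, 1))   # 1
```

Because the address depends only on the key and two small integers, no central directory is needed, which is consistent with the abstract's point about avoiding a hot spot.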
A Performance Evaluation of Load Balancing Techniques for Join Operations on Multicomputer Database Systems
- In Proc. of the 11th International Conference on Data Engineering, 1995
Cited by 9 (0 self)
There has been a wealth of research in the area of parallel join algorithms. Among them, hash-based algorithms are particularly suitable for shared-nothing database systems. The effectiveness of these techniques depends on the uniformity in the distribution of the join attribute values. When this condition is not met, a severe fluctuation may occur among the bucket sizes, causing uneven workload for the processing nodes. Many parallel join algorithms with load balancing capability have been proposed to address this problem. Among them, the sampling and incremental approaches have been shown to provide improvement over the more conventional methods. The comparison between these two approaches, however, has not been investigated. In this paper, we improve these techniques, and implement them on an nCUBE/2 parallel computer to compare their performance. Our study indicates that the sampling technique is the better approach.
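The skew problem described above can be made concrete with a small sketch. It is an illustration under simple assumptions (integer join keys, partitioning by key modulo the node count) and does not reproduce the sampling or incremental algorithms evaluated in the paper. The helper names `bucket_sizes` and `imbalance` are hypothetical.

```python
from collections import Counter


def bucket_sizes(join_keys, num_nodes):
    """Partition tuples by join key (key mod num_nodes) and count tuples per node."""
    counts = Counter(key % num_nodes for key in join_keys)
    return [counts.get(node, 0) for node in range(num_nodes)]


def imbalance(sizes):
    """Ratio of the largest bucket to the mean bucket size (1.0 means perfectly even)."""
    return max(sizes) / (sum(sizes) / len(sizes))


uniform = list(range(1000))              # every join key occurs once
skewed = [7] * 700 + list(range(300))    # one hot key dominates the relation

print(bucket_sizes(uniform, 4))             # [250, 250, 250, 250]
print(bucket_sizes(skewed, 4))              # [75, 75, 75, 775]
print(imbalance(bucket_sizes(skewed, 4)))   # 3.1
```

With the skewed input, one node receives more than three times the average load, so the join finishes only when that node does. Load balancing schemes aim to detect and correct this kind of imbalance.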
Performance of Load Balancing Techniques for Join Operations in Shared-Nothing Database Management Systems
Cited by 1 (0 self)
We investigate various load balancing approaches for hash-based join techniques popular in multicomputer-based shared-nothing database systems. When the tuples are not uniformly distributed among the hash buckets, redistribution of these buckets among the processors is necessary to maintain good system performance. Two recent load balancing techniques, which rely on sampling and incremental balancing, respectively, have been shown to be more robust than conventional methods. The comparison of these two approaches, however, has not been investigated. In this study, we improve these two schemes and implement them on an nCUBE/2 parallel computer, along with a conventional method and a standard join technique that does no load balancing, to compare their performance. Our experimental results indicate that the sampling technique is the better approach. To further evaluate the performance of these techniques under diverse hardware conditions, we also develop a cost model, and implemen...
using Transparent Distribution
and their applications. SMC is sponsored by the Netherlands Organization for Scientific Research (NWO). CWI is a member of

