Practical Skew Handling in Parallel Joins
- In Proceedings of the 18th VLDB Conference, 1992
Abstract - Cited by 85 (8 self)
We present an approach to dealing with skew in parallel joins in database systems. Our approach is easily implementable within current parallel DBMS, and performs well on skewed data without degrading the performance of the system on non-skewed data. The main idea is to use multiple algorithms, each specialized for a different degree of skew, and to use a small sample of the relations being joined to determine which algorithm is appropriate. We developed, implemented, and experimented with four new skew-handling parallel join algorithms; one, which we call virtual processor range partitioning, was the clear winner in high skew cases, while traditional hybrid hash join was the clear winner in lower skew or no skew cases. We present experimental results from an implementation of all four algorithms on the Gamma parallel database machine. To our knowledge, these are the first reported skew-handling numbers from an actual implementation.
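The sample-then-choose idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the skew measure (how much more often the most frequent sampled key occurs than the average sampled key), the threshold of 4.0, and the function names `estimate_skew` and `choose_join_algorithm` are all assumptions made for illustration.

```python
import random
from collections import Counter


def estimate_skew(keys, sample_size=1000, seed=0):
    """Estimate skew from a small random sample of join-key values.

    Returns the count of the most frequent sampled key divided by the
    average count per distinct sampled key (1.0 means no visible skew).
    This measure is an illustrative assumption, not the paper's.
    """
    if not keys:
        return 1.0
    rng = random.Random(seed)
    sample = keys if len(keys) <= sample_size else rng.sample(keys, sample_size)
    counts = Counter(sample)
    top_count = counts.most_common(1)[0][1]
    average_count = len(sample) / len(counts)
    return top_count / average_count


def choose_join_algorithm(keys, threshold=4.0, sample_size=1000, seed=0):
    """Pick a join strategy name from the sampled skew estimate.

    The threshold is a hypothetical tuning knob chosen for this sketch.
    """
    skew = estimate_skew(keys, sample_size=sample_size, seed=seed)
    if skew > threshold:
        return "virtual_processor_range_partitioning"
    return "hybrid_hash"


uniform_keys = list(range(10_000))
skewed_keys = [0] * 9_000 + list(range(1, 1_001))
uniform_choice = choose_join_algorithm(uniform_keys)
skewed_choice = choose_join_algorithm(skewed_keys)
```

With these inputs the uniform relation selects hybrid hash and the heavily skewed relation selects range partitioning. A real system would calibrate the threshold against measured join times.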
LH*lh: A Scalable High Performance Data Structure for Switched Multicomputers
- 1995
Abstract - Cited by 19 (6 self)
LH*lh is a new data structure for scalable high-performance hash files on the increasingly popular switched multicomputers, i.e., MIMD multiprocessor machines with distributed RAM memory and without shared memory. An LH*lh file scales up gracefully over available processors and the distributed memory, easily reaching Gbytes. Address calculus does not require any centralized component that could lead to a hot spot. Access times to the file can be under a millisecond and the file can be used in parallel by several client processors. We show the LH*lh design, and report on the performance analysis. This includes experiments on the Parsytec GC/PowerPlus multicomputer with up to 128 Power PCs and 32 MB of distributed RAM per node. We prove the efficiency of the method and justify various algorithmic choices that were made. LH*lh opens a new perspective for high-performance applications, especially for the database management of new types of data and in real-time environments.
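As background for the address calculus mentioned in this abstract, the sketch below shows the classic linear hashing address rule that the LH* family is generally described as building on. It is an illustration under stated assumptions, not the LH*lh algorithm itself: the abstract gives no formulas, and the names `lh_address`, `level`, `split_pointer`, and `initial_buckets` are chosen here. The point is that a client can compute a bucket address from two small integers it holds locally, with no central directory lookup.

```python
def lh_address(key, level, split_pointer, initial_buckets=1):
    """Classic linear hashing address computation (illustrative sketch).

    key            -- non-negative integer key
    level          -- current file level i
    split_pointer  -- next bucket to split, n
    Buckets below the split pointer have already been split at this
    level, so their keys are addressed with the next-level hash.
    """
    address = key % (initial_buckets * 2 ** level)
    if address < split_pointer:
        address = key % (initial_buckets * 2 ** (level + 1))
    return address


# File at level 2 with bucket 0 already split: buckets 0..4 exist.
addresses = [lh_address(k, level=2, split_pointer=1) for k in (5, 8, 12)]
```

Here key 5 maps to bucket 1 using the level-2 hash, while keys 8 and 12 fall on the already-split bucket 0 and are rehashed to buckets 0 and 4 respectively.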

