Results 1 -
1 of
1
Data Placement in Bubba
- PROCEEDINGS OF THE ACM-SIGMOD INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA
, 1988
"... Thus paper examines the problem of data placement in Bubba, a highly-parallel system for data-intensive applications bemg developed at MCC “Highly-parallel” lmplres that load balancmng IS a cntlcal performance issue “Data-mtenave” means data IS so large that operatrons should be executed where the d ..."
Abstract
-
Cited by 96 (0 self)
- Add to MetaCart
Thus paper examines the problem of data placement in Bubba, a highly-parallel system for data-intensive applications bemg developed at MCC “Highly-parallel” lmplres that load balancmng IS a cntlcal performance issue “Data-mtenave” means data IS so large that operatrons should be executed where the data resides As a result, data placement becomes a cntlcal performance issue In general, determmmg the optimal placement of data across processmg nodes for performance IS a difficult problem We describe our heuristic approach to solvmg the data placement problem w Bubba We then present expenmental results using a specific workload to provide msrght into the problem Several researchers have argued the benefits of deelustering (1 e, spreading each base relation over many nodes) We show that as declustermg IS increased. load balancing contmues to improve However, for transactions mvolvmg complex Joins, further declusterrng reduces throughput because of communications, startup and termmatron overhead We argue that data placement, especially declustermg, m a highly-parallel system must be considered early in the design, so that mechanrsms can be included for supportmg variable declustermg, for mmtmlzmg the most significant overheads associated with large-scale declustenng, and for gathering the required statistics.

