Results 1 - 5 of 5
GYM: A Multiround Join Algorithm In MapReduce And Its Analysis
Cited by 1 (0 self)
We study the problem of computing the join of n relations in multiple rounds of MapReduce. We introduce a distributed and generalized version of Yannakakis’s algorithm, called GYM. GYM takes as input any generalized hypertree decomposition (GHD) of a query of width w and depth d, and computes the query in O(d + log(n)) rounds and O(n(IN^w + OUT)^2 / M) communication cost, where M is the memory available per machine in the cluster and IN and OUT are the sizes of the input and output of the query, respectively. M is assumed to be IN^(1/ε), for some constant ε > 1. Using GYM we achieve two main results: (1) Every width-w query can be computed in O(n) rounds of MapReduce with O(n(IN^w + OUT)^2 / M) ...
From Theory to Practice: Efficient Join Query Evaluation in a Parallel Database System
Cited by 1 (0 self)
Big data analytics often requires processing complex queries using massive parallelism, where the main performance metric is the communication cost incurred during data reshuffling. In this paper, we describe a system that can efficiently compute complex join queries, including queries with cyclic joins, on a massively parallel architecture. We build on two independent lines of work for multi-join query evaluation: a communication-optimal algorithm for distributed evaluation, and a worst-case optimal algorithm for sequential evaluation. We evaluate these algorithms together, then describe novel, practical optimizations for both algorithms.
Changing the Face of Database Cloud Services with Personalized Service Level Agreements
We develop and evaluate an approach for generating Personalized Service Level Agreements (PSLAs) that separate cloud users from the details of compute resources behind a cloud database management service. PSLAs retain the possibility to trade off performance for cost and do so in a manner specific to the user’s database.
GYM: A Multiround Join Algorithm In MapReduce
We study the problem of computing the join of n relations in multiple rounds of MapReduce. We introduce a distributed and generalized version of Yannakakis’s algorithm, called GYM. GYM takes as input any generalized hypertree decomposition (GHD) of a query of width w and depth d, and computes the query in O(d) rounds and O(n(IN^w + OUT)) communication and computation cost. Using GYM we achieve two main results: (1) Every width-w query can be computed in O(n) rounds of MapReduce with O(n(IN^w + OUT)) cost; (2) Every width-w query can be computed in O(log(n)) rounds of MapReduce with O(n(IN^(3w) + OUT)) cost. We achieve our second result by showing how to construct an O(log(n))-depth, width-3w GHD of a query of width w. We describe another general technique to construct even shorter-depth GHDs with larger widths, effectively showing a spectrum of tradeoffs one can make between communication and computation and the number of rounds of MapReduce. By simulating MapReduce in the PRAM model, our second main result also implies the result of Gottlob et al. [12] that computing acyclic and constant-width queries is in NC. In fact, for certain queries, our approach yields significantly fewer PRAM steps than does the construction of the latter paper. However, we achieve our results using only Yannakakis’s algorithm, which has been perceived to have a sequential nature. Instead, we surprisingly show that Yannakakis’s algorithm can be parallelized significantly by giving it as input short-depth GHDs of queries.
Towards an Analytics Query Engine
This vision paper presents new challenges and opportunities in the area of distributed data analytics, at the core of which are data mining and machine learning. At first, we provide an overview of the current state of the art in the area and then analyse two aspects of data analytics systems, semantics and optimization. We argue that these aspects will emerge as important issues for the data management community in the next years and propose promising research directions for solving them.