## A New Computation Model for Rack-Based Computing

Citations: 1 (0 self)

### BibTeX

```bibtex
@misc{Afrati_anew,
  author = {Foto N. Afrati and Jeffrey D. Ullman},
  title  = {A New Computation Model for Rack-Based Computing},
  year   = {}
}
```

### Abstract

Implementations of map-reduce are being used to perform many operations on very large data sets. We explore alternative ways that a system could use the environment and capabilities of map-reduce implementations such as Hadoop, yet perform operations that are not identical to map-reduce. In particular, we look at strategies for taking the join of several relations and for sorting large sets. The centerpiece of this exploration is a computational model that captures the essentials of the environment in which systems like Hadoop operate: files are unordered sets of tuples that can be read and/or written in parallel; processes are limited in the amount of input/output they can perform; and processors are available in essentially unlimited supply. In our study, we focus on communication among processes and on processing-time costs, both total and elapsed. We show tradeoffs among them depending on the computational limits we place on the processes.

### Citations

3237 | The anatomy of a large-scale hypertextual Web search engine
- Brin, Page
- 1998
Citation Context: ...her data-intensive applications mainly process large amounts of data that need special-purpose computations. The canonical problem today is the sparse-matrix-vector calculation involved with PageRank [5], where the dimension of the matrix and vector can be in the 10’s of billions. Most of these computations are conceptually simple but their size has led implementors to distribute them across hundreds...
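The sparse matrix-vector computation mentioned in this context can be sketched compactly. The following toy example (the link graph, damping factor, and function names are all illustrative, not taken from the paper) shows one power-iteration step of PageRank over a link matrix stored sparsely as adjacency lists:

```python
from collections import defaultdict

def pagerank_step(out_links, rank, beta=0.85):
    """One step r' = beta * M r + (1 - beta)/n, where M is the
    column-stochastic link matrix stored as adjacency lists."""
    n = len(rank)
    new_rank = defaultdict(float)
    for src, dests in out_links.items():
        share = beta * rank[src] / len(dests)  # src splits its rank evenly
        for d in dests:
            new_rank[d] += share
    # add the teleport (damping) term so ranks still sum to 1
    return {v: new_rank[v] + (1 - beta) / n for v in rank}

# Toy 3-page web: page 0 links to 1 and 2; 1 links to 2; 2 links to 0.
links = {0: [1, 2], 1: [2], 2: [0]}
r = {0: 1 / 3, 1: 1 / 3, 2: 1 / 3}
r = pagerank_step(links, r)
```

At the scale the context describes (dimensions in the tens of billions), each such step is exactly the kind of computation that gets distributed across hundreds of machines, with the adjacency lists partitioned across files.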

2333 | Computational Complexity
- Papadimitriou
- 1994
Citation Context: ...that are incorporated to the model and the measures by which algorithms are evaluated. Communication Complexity. There have been several interesting models that address communication among processes. [18] is a central work in this area, although the first studies were based on VLSI complexity, e.g. [22] — the development of lower bounds on chip speed and area for chips that solve common problems such ...

1693 | MapReduce: simplified data processing on large clusters
- Dean, Ghemawat
- 2004
Citation Context: ...PODS’09, Providence, RI, USA. A powerful tool for building applications on such a file system is map-reduce [10] or its open-source equivalent Hadoop [2]. Briefly, map-reduce allows a Map function to be applied to data stored in one or more files, resulting in key-value pairs. Many instantiations of the Map fun...
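The Map-then-Reduce pattern this context describes can be illustrated with a minimal single-machine word-count sketch (the function names are ours for illustration, not Hadoop's actual API): Map emits key-value pairs, the pairs are grouped by key, and Reduce combines each group.

```python
from itertools import groupby
from operator import itemgetter

def map_fn(line):
    # Map: emit a (word, 1) key-value pair for each word.
    for word in line.split():
        yield (word, 1)

def reduce_fn(key, values):
    # Reduce: combine all values that share a key.
    return (key, sum(values))

def map_reduce(lines):
    # Shuffle stage: sorting brings equal keys together for grouping.
    pairs = sorted(kv for line in lines for kv in map_fn(line))
    return [reduce_fn(k, [v for _, v in group])
            for k, group in groupby(pairs, key=itemgetter(0))]

counts = dict(map_reduce(["a b a", "b c"]))
# counts == {'a': 2, 'b': 2, 'c': 1}
```

In a real Hadoop job the Map instantiations run in parallel over file blocks and the grouping is done by a distributed shuffle rather than a local sort.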

1129 | A bridging model for parallel computation
- Valiant
Citation Context: ...logarithmic factor. On the other hand, because we permit parallel execution of processes, the elapsed time would be much less under our model than under Kung-Hong. The Valiant Model. In 1990, Valiant [24] introduced a bridging model between software and hardware having in mind such applications as those where communication was enabled by packet switching networks or optical crossbars, although the mod...

908 | The Google File System
- Ghemawat, Gobioff, et al.
- 2003
Citation Context: ...place of file systems, operating systems, and database-management systems. 1.1 The New Software Stack and our Contribution. Central to this stack is a file system such as the Google File System (GFS) [13] or Hadoop File System (HFS) [2]. Such file systems are characterized by: • Block sizes that are perhaps 1000 times larger than those in conventional file systems — multimegabyte instead of multikilob...

694 | The Art of Computer Programming, Volume 3: Sorting and Searching (2nd Edition)
- Knuth
- 1998
Citation Context: ...cost. For example, we can use a Batcher sorting network [4] to get O(log^2 n) elapsed cost. A similar approach is to use the O(n log^2 n) version of Shell sort [21] developed by V. Pratt ([19] or see [15]). While we shall not go into the details of exactly how Batcher’s sorting algorithm works, at a high level, it implements merge sort with a recursive parallel merge, to make good use of parallelism. ...

511 | Bigtable: A distributed storage system for structured data
- Chang, Dean, et al.
Citation Context: ...ller manages Map and Reduce processes and is able to redo them if a process fails. The new software stack includes higher-level, more database-like facilities, as well. Examples are Google’s BigTable [7], or Yahoo!’s PNUTS [9]. At a still higher level, Yahoo!’s PIG/PigLatin [17] translates relational operations such as joins into map-reduce computations. However, there are concerns that as effective ...

501 | Sorting networks and their applications
- Batcher
- 1968
Citation Context: ...O(n) elapsed communication/processing costs. Known parallel sorting algorithms allow us to get close to the optimal O(log n) elapsed processing cost. For example, we can use a Batcher sorting network [4] to get O(log^2 n) elapsed cost. A similar approach is to use the O(n log^2 n) version of Shell sort [21] developed by V. Pratt ([19] or see [15]). While we shall not go into the details of exactly ho...
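As a rough illustration of the network this context refers to, here is a single-machine sketch of Batcher's odd-even mergesort written as a sequence of compare-exchange operations (the iterative formulation below is the standard one, not code from the paper). The network has O(log^2 n) levels, and within a level all compare-exchanges are independent, which is what would allow parallel processes to achieve O(log^2 n) elapsed cost.

```python
def batcher_sort(a):
    """Batcher's odd-even mergesort as compare-exchange steps on a
    list whose length n is a power of two."""
    a = list(a)
    n = len(a)
    assert n > 0 and n & (n - 1) == 0, "length must be a power of two"
    p = 1
    while p < n:                       # merge sorted runs of length p
        k = p
        while k >= 1:                  # sub-steps of the odd-even merge
            for j in range(k % p, n - k, 2 * k):
                # all compare-exchanges in this level are independent
                for i in range(min(k, n - j - k)):
                    # compare only within the same block of size 2p
                    if (i + j) // (2 * p) == (i + j + k) // (2 * p):
                        if a[i + j] > a[i + j + k]:
                            a[i + j], a[i + j + k] = a[i + j + k], a[i + j]
            k //= 2
        p *= 2
    return a

result = batcher_sort([5, 1, 4, 2, 8, 7, 3, 6])
# result == [1, 2, 3, 4, 5, 6, 7, 8]
```

Counting levels: the outer loop runs log n times and the inner one at most log n times per outer iteration, giving the O(log^2 n) depth cited above.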

349 | Pig latin: a not-so-foreign language for data processing
- Olston, Reed, et al.
Citation Context: ...fails. The new software stack includes higher-level, more database-like facilities, as well. Examples are Google’s BigTable [7], or Yahoo!’s PNUTS [9]. At a still higher level, Yahoo!’s PIG/PigLatin [17] translates relational operations such as joins into map-reduce computations. However, there are concerns that, as effective as the map-reduce framework might be with certain tasks, there are issues that ...

167 | I/O complexity: the red-blue pebbling game
- Hong, Kung
- 1981
Citation Context: ...at different from what has been looked at previously, and these differences naturally change what the best algorithms are for many problems. The Kung-Hong Model. A generation ago, the Kung-Hong model [14] examined the amount of I/O (transfer between main and secondary memory) that was needed on a processor that had a limited amount of main memory. They gave a lower bound for matrix-multiplication in t...

165 | Direct bulk-synchronous parallel algorithms
- Gerbessiotis, Valiant
- 1994
Citation Context: ...ical crossbars, although the model goes arguably beyond that. One of the concerns was to compare with sequential or PRAM algorithms and show competitiveness for several problems including sorting. In [12], a probabilistic algorithm for sorting is developed for this model. The goal in this algorithm is to optimize the competitive ratio of the total number of operations and the ratio between communicati...

124 | Map-Reduce-Merge: Simplified Relational Data Processing on Large Clusters
- Yang, Dasdan, et al.
- 2007
Citation Context: ...e computations. However, there are concerns that, as effective as the map-reduce framework might be with certain tasks, there are issues that are not effectively addressed by this framework. For example, [8] talks about adding to map-reduce a “merge” phase and demonstrates how this can express relational algebra operators. In [11] it is argued that the efficiency of a DBMS, as embodied in tools s...

119 | PNUTS: Yahoo!’s hosted data serving platform
- Cooper, Ramakrishnan, et al.
- 2008
Citation Context: ...duce processes and is able to redo them if a process fails. The new software stack includes higher-level, more database-like facilities, as well. Examples are Google’s BigTable [7], or Yahoo!’s PNUTS [9]. At a still higher level, Yahoo!’s PIG/PigLatin [17] translates relational operations such as joins into map-reduce computations. However, there are concerns that, as effective as the map-reduce framewor...

87 | Area-time complexity for VLSI
- Thompson
- 1979
Citation Context: ...Complexity. There have been several interesting models that address communication among processes. [18] is a central work in this area, although the first studies were based on VLSI complexity, e.g. [22] — the development of lower bounds on chip speed and area for chips that solve common problems such as sorting. Our model is quite different from VLSI models, since we place no constraint on where pro...

86 | High-Performance Sorting on Networks of Workstations
- Arpaci-Dusseau, Arpaci-Dusseau, et al.
- 1997
Citation Context: ...even small deviations from the standard map-reduce framework. The sorting task has often been used for testing computational environments for data-management applications. For example, the goal in [3, 6, 20] is to explore the viability of commercial technologies for utilizing cluster resources, racks of computers and disks; in these works, algorithms for external sorting are implemented with the focus on...

60 | AlphaSort: A RISC Machine Sort
- Nyberg, Barclay, et al.
- 1994
Citation Context: ...er resources, racks of computers and disks; in these works, algorithms for external sorting are implemented with the focus on I/O efficiency. These algorithms are tested against well-known benchmarks [16, 20]. The map-reduce framework does not lend itself well to the sorting task due to its high degree of sequentiality. • We develop the first sorting algorithm for computer environments with the same assumptions ...

54 | Joulesort: a balanced energy-efficiency benchmark
- Rivoire, Shah, et al.
- 2007
Citation Context: ...even small deviations from the standard map-reduce framework. The sorting task has often been used for testing computational environments for data-management applications. For example, the goal in [3, 6, 20] is to explore the viability of commercial technologies for utilizing cluster resources, racks of computers and disks; in these works, algorithms for external sorting are implemented with the focus on...

40 | The Input/Output Complexity of Transitive Closure
- Ullman, Yannakakis
- 1991
Citation Context: ...needed on a processor that had a limited amount of main memory. They gave a lower bound for matrix-multiplication in this model. The same model was used to explore transitive closure algorithms ([1], [23]) later. One important difference between the Kung-Hong model and the model we present here is that we place a limit on communication, not local memory. However, in current map-reduce implementation...

31 | A high-speed sorting procedure
- Shell
Citation Context: ...the optimal O(log n) elapsed processing cost. For example, we can use a Batcher sorting network [4] to get O(log^2 n) elapsed cost. A similar approach is to use the O(n log^2 n) version of Shell sort [21] developed by V. Pratt ([19] or see [15]). While we shall not go into the details of exactly how Batcher’s sorting algorithm works, at a high level, it implements merge sort with a recursive parallel ...

26 | Direct algorithms for computing the transitive closure of database relations
- Agrawal, Jagadish
- 1987
Citation Context: ...was needed on a processor that had a limited amount of main memory. They gave a lower bound for matrix-multiplication in this model. The same model was used to explore transitive closure algorithms ([1], [23]) later. One important difference between the Kung-Hong model and the model we present here is that we place a limit on communication, not local memory. However, in current map-reduce implemen...

20 | Shellsort and Sorting Networks
- Pratt
- 1972
Citation Context: ...processing cost. For example, we can use a Batcher sorting network [4] to get O(log^2 n) elapsed cost. A similar approach is to use the O(n log^2 n) version of Shell sort [21] developed by V. Pratt ([19] or see [15]). While we shall not go into the details of exactly how Batcher’s sorting algorithm works, at a high level, it implements merge sort with a recursive parallel merge, to make good use of p...

13 | Millennium Sort: A cluster-based application for Windows NT using DCOM, River primitives and the Virtual Interface Architecture
- Buonadonna, Coates, et al.
- 1999
Citation Context: ...even small deviations from the standard map-reduce framework. The sorting task has often been used for testing computational environments for data-management applications. For example, the goal in [3, 6, 20] is to explore the viability of commercial technologies for utilizing cluster resources, racks of computers and disks; in these works, algorithms for external sorting are implemented with the focus on...

7 | MapReduce: a major step backward
- DeWitt, Stonebraker
- 2008
Citation Context: ...e issues that are not effectively addressed by this framework. For example, [8] talks about adding to map-reduce a “merge” phase and demonstrates how this can express relational algebra operators. In [11] it is argued that the efficiency mechanisms of a DBMS, as embodied in tools such as indexes, are missing from the map-reduce framework. • This paper presents a model in which we can improve the efficien...