## A New Computation Model for Rack-Based Computing

### Abstract

Implementations of map-reduce are being used to perform many operations on very large data sets. We explore alternative ways that a system could use the environment and capabilities of map-reduce implementations such as Hadoop, yet perform operations that are not identical to map-reduce. In particular, we look at strategies for taking the join of several relations and for sorting large sets. The centerpiece of this exploration is a computational model that captures the essentials of the environment in which systems like Hadoop operate. Files are unordered sets of tuples that can be read and/or written in parallel; processes are limited in the amount of input/output they can perform, and processors are available in essentially unlimited supply. In our study, we focus on two costs: communication among processes and processing time, the latter measured both as total time and as elapsed time. We show tradeoffs among these costs depending on the computational limits we impose on the processes.

