Results 1 - 10
of
21
MapReduce: simplified data processing on large clusters
- OSDI’04: PROCEEDINGS OF THE 6TH CONFERENCE ON SYMPOSIUM ON OPERATING SYSTEMS DESIGN AND IMPLEMENTATION
, 2004
"... MapReduce is a programming model and an associated implementation for processing and generating large data sets. Users specify a map function that processes a key/value pair to generate a set of intermediate key/value pairs, and a reduce function that merges all intermediate values associated with t ..."
Abstract
-
Cited by 913 (3 self)
- Add to MetaCart
MapReduce is a programming model and an associated implementation for processing and generating large data sets. Users specify a map function that processes a key/value pair to generate a set of intermediate key/value pairs, and a reduce function that merges all intermediate values associated with the same intermediate key. Many real world tasks are expressible in this model, as shown in the paper. Programs written in this functional style are automatically parallelized and executed on a large cluster of commodity machines. The run-time system takes care of the details of partitioning the input data, scheduling the program’s execution across a set of machines, handling machine failures, and managing the required inter-machine communication. This allows programmers without any experience with parallel and distributed systems to easily utilize the resources of a large distributed system. Our implementation of MapReduce runs on a large cluster of commodity machines and is highly scalable: a typical MapReduce computation processes many terabytes of data on thousands of machines. Programmers find the system easy to use: hundreds of MapReduce programs have been implemented and upwards of one thousand MapReduce jobs are executed on Google’s clusters every day. 1
Google’s MapReduce Programming Model — Revisited
"... Google’s MapReduce programming model serves for processing large data sets in a massively parallel manner. We deliver the first rigorous description of the model including its advancement as Google’s domain-specific language Sawzall. To this end, we reverse-engineer the seminal papers on MapReduce a ..."
Abstract
-
Cited by 29 (1 self)
- Add to MetaCart
Google’s MapReduce programming model serves for processing large data sets in a massively parallel manner. We deliver the first rigorous description of the model including its advancement as Google’s domain-specific language Sawzall. To this end, we reverse-engineer the seminal papers on MapReduce and Sawzall, and we capture our findings as an executable specification. We also identify and resolve some obscurities in the informal presentation given in the seminal papers. We use typed functional programming (specifically Haskell) as a tool for design recovery and executable specification. Our development comprises three components: (i) the basic program skeleton that underlies MapReduce computations; (ii) the opportunities for parallelism in executing MapReduce computations; (iii) the fundamental characteristics of Sawzall’s aggregators as an advancement of the MapReduce approach. Our development does not formalize the more implementational aspects of an actual, distributed execution of MapReduce computations.
Optimization Rules for Programming with Collective Operations
- IPPS/SPDP'99. 13th Int. Parallel Processing Symp. & 10th Symp. on Parallel and Distributed Processing
, 1999
"... We study how several collective operations like broadcast, reduction, scan, etc. can be composed efficiently in complex parallel programs. Our specific contributions are: (1) a formal framework for reasoning about collective operations; (2) a set of optimization rules which save communications by fu ..."
Abstract
-
Cited by 19 (6 self)
- Add to MetaCart
We study how several collective operations like broadcast, reduction, scan, etc. can be composed efficiently in complex parallel programs. Our specific contributions are: (1) a formal framework for reasoning about collective operations; (2) a set of optimization rules which save communications by fusing several collective operations into one; (3) performance estimates, which guide the application of optimization rules depending on the machine characteristics; (4) a simple case study with the first results of machine experiments.
A Transformational Framework for Skeletal Programs: Overview and Case Study
- Parallel and Distributed Processing. IPPS/SPDP’99 Workshops Proceedings. Lecture Notes in Computer Science 1586
, 1999
"... A structured approach to parallel programming allows to construct applications by composing skeletons, i.e., recurring patterns of task- and data-parallelism. First academic and commercial experience with skeleton-based systems has demonstrated both the benefits of the approach and the lack of a spe ..."
Abstract
-
Cited by 13 (2 self)
- Add to MetaCart
A structured approach to parallel programming allows to construct applications by composing skeletons, i.e., recurring patterns of task- and data-parallelism. First academic and commercial experience with skeleton-based systems has demonstrated both the benefits of the approach and the lack of a special methodology for algorithm design and performance prediction. In the paper, we take a first step toward such a methodology, by developing a general transformational framework named FAN, and integrating it with an existing skeleton-based programming system, P3L. The framework includes a new functional abstract notation for expressing parallel algorithms, a set of semantics-preserving transformation rules, and analytical estimates of the rules' impact on the program performance. The use of FAN is demonstrated on a case study: we design a parallel algorithm for the maximum segment sum problem, translate the algorithm in P3L, and experiment with the target C+MPI code on a Fujitsu AP1000 parallel machine.
Parallelization of Divide-and-Conquer by Translation to Nested Loops
- J. Functional Programming
, 1997
"... We propose a sequence of equational transformations and specializations which turns a divide-and-conquer skeleton in Haskell into a parallel loop nest in C. Our initial skeleton is often viewed as general divide-and-conquer. The specializations impose a balanced call tree, a fixed degree of the prob ..."
Abstract
-
Cited by 12 (6 self)
- Add to MetaCart
We propose a sequence of equational transformations and specializations which turns a divide-and-conquer skeleton in Haskell into a parallel loop nest in C. Our initial skeleton is often viewed as general divide-and-conquer. The specializations impose a balanced call tree, a fixed degree of the problem division, and elementwise operations. Our goal is to select parallel implementations of divide-and-conquer via a space-time mapping, which can be determined at compile time. The correctness of our transformations is proved by equational reasoning in Haskell; recursion and iteration are handled by induction. Finally, we demonstrate the practicality of the skeleton by expressing Strassen's matrix multiplication in it.
A library of constructive skeletons for sequential style of parallel programming
- In InfoScale ’06: Proceedings of the 1st international conference on Scalable information systems, volume 152 of ACM International Conference Proceeding Series
, 2006
"... With the increasing popularity of parallel programming environments such as PC clusters, more and more sequential programmers, with little knowledge about parallel architectures and parallel programming, are hoping to write parallel programs. Numerous attempts have been made to develop high-level pa ..."
Abstract
-
Cited by 10 (7 self)
- Add to MetaCart
With the increasing popularity of parallel programming environments such as PC clusters, more and more sequential programmers, with little knowledge about parallel architectures and parallel programming, are hoping to write parallel programs. Numerous attempts have been made to develop high-level parallel programming libraries that use abstraction to hide low-level concerns and reduce difficulties in parallel programming. Among them, libraries of parallel skeletons have emerged as a promising way towards this direction. Unfortunately, these libraries are not well accepted by sequential programmers, because of incomplete elimination of lower-level details, ad-hoc selection of library functions, unsatisfactory performance, or lack of convincing application examples. This paper addresses principle of designing skeleton libraries of parallel programming and reports implementation details and practical applications of a skeleton library SkeTo. The SkeTo library is unique in its feature that it has a solid theoretical foundation based on the theory of Constructive Algorithmics, and is practical to be used to describe various parallel computations in a sequential manner. 1.
Optimizing Compositions of Scans and Reductions in Parallel Program Derivation
, 1997
"... Introduction We study two popular programming schemas: scan (also known as prefix sums, parallel prefix, etc.) and reduction (also known as fold). Originally from the functional world [3], they are becoming increasingly popular as primitives of parallel programming. The reasons are that, first, such ..."
Abstract
-
Cited by 9 (2 self)
- Add to MetaCart
Introduction We study two popular programming schemas: scan (also known as prefix sums, parallel prefix, etc.) and reduction (also known as fold). Originally from the functional world [3], they are becoming increasingly popular as primitives of parallel programming. The reasons are that, first, such higher-order combinators are adequate and useful for a broad class of applications [4], second, they encourage well-structured, coarse-grained parallel programming and, third, their implementation in the MPI standard [14] makes the target programs portable across different parallel architectures with predictable performance. Our contributions are as follows: -- We formally prove two optimization rules: the first rule transforms a sequential composition of scan and reduction into a single reduction, the second rule transforms a composition of two scans into a single scan. -- We apply the first rule in the formal derivation of a parallel algorithm for the
(De)Composition Rules for Parallel Scan and Reduction
- In Proc. 3rd Int. Working Conf. on Massively Parallel Programming Models (MPPM'97
, 1998
"... We study the use of well-defined building blocks for SPMD programming of machines with distributed memory. Our general framework is based on homomorphisms, functions that capture the idea of dataparallelism and have a close correspondence with collective operations of the MPI standard, e.g., scan an ..."
Abstract
-
Cited by 8 (1 self)
- Add to MetaCart
We study the use of well-defined building blocks for SPMD programming of machines with distributed memory. Our general framework is based on homomorphisms, functions that capture the idea of dataparallelism and have a close correspondence with collective operations of the MPI standard, e.g., scan and reduction. We prove two composition rules: under certain conditions, a composition of a scan and a reduction can be transformed into one reduction, and a composition of two scans into one scan. As an example of decomposition, we transform a segmented reduction into a composition of partial reduction and all-gather. The performance gain and overhead of the proposed composition and decomposition rules are assessed analytically for the hypercube and compared with the estimates for some other parallel models.
Diffusion: Calculating Efficient Parallel Programs
- IN 1999 ACM SIGPLAN WORKSHOP ON PARTIAL EVALUATION AND SEMANTICS-BASED PROGRAM MANIPULATION (PEPM ’99
, 1999
"... Parallel primitives (skeletons) intend to encourage programmers to build a parallel program from ready-made components for which efficient implementations are known to exist, making the parallelization process easier. However, programmers often suffer from the difficulty to choose a combination of p ..."
Abstract
-
Cited by 8 (7 self)
- Add to MetaCart
Parallel primitives (skeletons) intend to encourage programmers to build a parallel program from ready-made components for which efficient implementations are known to exist, making the parallelization process easier. However, programmers often suffer from the difficulty to choose a combination of proper parallel primitives so as to construct efficient parallel programs. To overcome this difficulty, we shall propose a new transformation, called diffusion, which can efficiently decompose a recursive definition into several functions such that each function can be described by some parallel primitive. This allows programmers to describe algorithms in a more natural recursive form. We demonstrate our idea with several interesting examples. Our diffusion transformation should be significant not only in development of new parallel algorithms, but also in construction of parallelizing compilers.
Parallelizing Functional Programs by Generalization
- Journal of Functional Programming
, 1997
"... List homomorphisms are functions that are parallelizable using the divide-and-conquer paradigm. We study the problem of finding a homomorphic representation of a given function, based on the Bird-Meertens theory of lists. A previous work proved that to each pair of leftward and rightward sequential ..."
Abstract
-
Cited by 7 (1 self)
- Add to MetaCart
List homomorphisms are functions that are parallelizable using the divide-and-conquer paradigm. We study the problem of finding a homomorphic representation of a given function, based on the Bird-Meertens theory of lists. A previous work proved that to each pair of leftward and rightward sequential representations of a function, based on cons- and snoc-lists, respectively, there is also a representation as a homomorphism. Our contribution is a mechanizable method to extract the homomorphism representation from a pair of sequential representations. The method is decomposed to a generalization problem and an inductive claim, both solvable by term rewriting techniques. To solve the former we present a sound generalization procedure which yields the required representation, and terminates under reasonable assumptions. We illustrate the method and the procedure by the parallelization of the scan-function (parallel prefix). The inductive claim is provable automatically.

