Results 11–20 of 33
Financial Software on GPUs: Between Haskell and Fortran
 In Proceedings of the 1st ACM SIGPLAN Workshop on Functional High-Performance Computing, FHPC '12
, 2012
Abstract

Cited by 3 (2 self)
This paper presents a real-world pricing kernel for financial derivatives and evaluates the language and compiler tool chain that would allow expressive, hardware-neutral algorithm implementation and efficient execution on graphics processing units (GPUs). The language issues refer to preserving algorithmic invariants, e.g., inherent parallelism made explicit by map-reduce-scan functional combinators. Efficient execution is achieved by manually applying a series of generally applicable compiler transformations that allow the generated OpenCL code to yield speedups as high as 70× and 540× on a commodity mobile and desktop GPU, respectively. Apart from the concrete speedups attained, our contributions are twofold: First, from a language perspective, we illustrate that even state-of-the-art auto-parallelization techniques are incapable of discovering all the requisite data parallelism when rendering the ...
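The map-reduce-scan combinators the abstract mentions can be sketched as follows. This is not the paper's kernel; the payoff function, strike, and discount factor are assumptions chosen purely to illustrate how each stage of such a pricing computation is one data-parallel combinator.

```python
from functools import reduce
from itertools import accumulate

STRIKE = 100.0    # hypothetical strike price (assumed, not from the paper)
DISCOUNT = 0.95   # hypothetical discount factor (assumed)

def price(paths):
    """Average discounted payoff over simulated terminal prices."""
    payoffs = map(lambda s: max(0.0, s - STRIKE) * DISCOUNT, paths)  # map: per path
    total = reduce(lambda a, b: a + b, payoffs, 0.0)                 # reduce: sum
    return total / len(paths)

def running_max(path):
    """scan: running maximum along a single price path."""
    return list(accumulate(path, max))
```

Because every stage is a map, reduce, or scan, the parallelism is explicit in the program text rather than something a compiler must rediscover, which is the language point the abstract makes.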
The Static Parallelization of Loops and Recursions
 In Proc. 11th Int. Symp. on High Performance Computing Systems (HPCS'97)
, 1997
Abstract

Cited by 3 (2 self)
We demonstrate approaches to the static parallelization of loops and recursions on the example of the polynomial product. Phrased as a loop nest, the polynomial product can be parallelized automatically by applying a space-time mapping technique based on linear algebra and linear programming. One can choose a parallel program that is optimal with respect to some objective function like the number of execution steps, processors, channels, etc. However, at best, linear execution time complexity can be attained. Through phrasing the polynomial product as a divide-and-conquer recursion, one can obtain a parallel program with sublinear execution time. In this case, the target program is not derived by an automatic search but given as a program skeleton, which can be deduced by a sequence of equational program transformations. We discuss the use of such skeletons, compare and assess the models in which loops and divide-and-conquer recursions are parallelized, and comment on the performance pr...
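The two formulations the abstract contrasts can be sketched on coefficient lists (lowest degree first); the names are ours, not the paper's. The loop nest is what a space-time mapping would parallelize, while the divide-and-conquer version exposes independent recursive halves explicitly.

```python
def poly_loop(a, b):
    """Loop-nest formulation: c[i+j] accumulates a[i]*b[j]."""
    c = [0] * (len(a) + len(b) - 1)
    for i in range(len(a)):        # the nest a space-time mapping would target
        for j in range(len(b)):
            c[i + j] += a[i] * b[j]
    return c

def poly_dc(a, b):
    """Divide-and-conquer formulation: split a, recurse on both halves."""
    if len(a) == 1:
        return [a[0] * x for x in b]
    h = len(a) // 2
    lo = poly_dc(a[:h], b)         # the two recursive calls are independent
    hi = poly_dc(a[h:], b)         # and could run in parallel
    out = [0] * (len(a) + len(b) - 1)
    for k, v in enumerate(lo):
        out[k] += v
    for k, v in enumerate(hi):
        out[k + h] += v            # the high half is shifted by x^h
    return out
```

Note this simple split does not by itself reach the sublinear time the paper obtains; it only shows the structural difference between the loop-nest and skeleton formulations.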
Systematic Derivation of Tree Contraction Algorithms
 In Proceedings of INFOCOM '90
, 2005
Abstract

Cited by 3 (3 self)
While tree contraction algorithms play an important role in efficient parallel tree computation, it is difficult to develop such algorithms due to the strict conditions imposed on contracting operators. In this paper, we propose a systematic method of deriving efficient tree contraction algorithms from recursive functions on trees of any shape. We identify a general recursive form that can be parallelized to obtain efficient tree contraction algorithms, and present a derivation strategy for transforming general recursive functions into the parallelizable form. We illustrate our approach by deriving a novel parallel algorithm for the maximum connected-set sum problem on arbitrary trees, the tree version of the famous maximum segment sum problem.
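The maximum connected-set sum problem the abstract ends with can be written as exactly the kind of recursive tree function such a derivation starts from. This is our own minimal formulation, not the paper's derived contraction algorithm: trees are `(value, [children])` pairs, and each node returns both the best connected set containing it and the best anywhere in its subtree.

```python
def mcs(tree):
    """Maximum sum over any connected subset of nodes of a rooted tree.
    tree = (value, [children])."""
    def go(node):
        value, children = node
        results = [go(c) for c in children]
        # best connected set that contains this node: take a child's set
        # only when it helps (i.e. its sum is positive)
        with_node = value + sum(max(0, r) for r, _ in results)
        # best connected set anywhere in this subtree
        best = max([with_node] + [b for _, b in results])
        return with_node, best
    return go(tree)[1]
```

The sequential recursion visits children one subtree at a time; the paper's contribution is deriving from such definitions a tree contraction algorithm that evaluates them in parallel regardless of the tree's shape.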
Parallelizing Functional Programs by Term Rewriting
, 1997
Abstract

Cited by 2 (2 self)
List homomorphisms are functions that can be computed in parallel using the divide-and-conquer paradigm. We study the problem of finding a homomorphic representation of a given function, based on the Bird-Meertens theory of lists. A previous work proved that for each pair of leftward and rightward sequential representations of a function, based on cons and snoc lists, respectively, there is also a representation as a homomorphism. Our contribution is a mechanizable method to extract the homomorphic representation from a pair of sequential representations. The method is decomposed into a generalization problem and an inductive claim, both solvable by term rewriting techniques. To solve the former, we present a sound generalization procedure which yields the required representation and terminates under reasonable assumptions. We illustrate the method and the procedure by the parallelization of the scan function (parallel prefix). The inductive claim is provable automatically. Keywords: P...
Towards polytypic parallel programming
, 1998
Abstract

Cited by 2 (2 self)
Data parallelism is currently one of the most successful models for programming massively parallel computers. The central idea is to evaluate a uniform collection of data in parallel by simultaneously manipulating each data element in the collection. Despite many of its promising features, the current approach suffers from two problems. First, the main parallel data structures that most data parallel languages currently support are restricted to simple collection data types like lists, arrays or similar structures; other useful data structures like trees have not been well addressed. Second, parallel programming relies on a set of parallel primitives that capture parallel skeletons of interest. However, these primitives are not well structured, and efficient parallel programming with these primitives is difficult. In this paper, we propose a polytypic framework for developing efficient parallel programs on most data structures. We show how a set of polytypic parallel primitives can be formally defined for manipulating most data structures, how these primitives can be successfully structured into a uniform recursive definition, and how an efficient combination of primitives can be derived from a naive specification program. Our framework should be significant not only in development of new parallel algorithms, but also in construction of parallelizing compilers.
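The "one primitive, many data structures" idea can be illustrated with a reduce written once against a generic shape interface. This is a loose Python stand-in for a polytypic definition, with names of our choosing, not the paper's formal framework.

```python
def poly_reduce(op, value_of, children_of, node):
    """Reduce any tree-shaped structure: combine this node's value with
    the reductions of all its children, whatever the branching shape."""
    acc = value_of(node)
    for child in children_of(node):
        acc = op(acc, poly_reduce(op, value_of, children_of, child))
    return acc

# The same primitive serves different shapes, e.g. a rose tree ...
rose = (1, [(2, []), (3, [(4, [])])])
total = poly_reduce(lambda a, b: a + b, lambda n: n[0], lambda n: n[1], rose)
# ... and a list viewed as a degenerate (unary) tree:
chain = (1, [(2, [(3, [])])])
chain_total = poly_reduce(lambda a, b: a + b,
                          lambda n: n[0], lambda n: n[1], chain)
```

Defining the primitive once over the structure's shape, rather than once per data type, is the polytypic move the abstract argues for.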
Data Structures for Parallel Recursion
, 1997
Abstract

Cited by 2 (0 self)
Chapter 1 Introduction
1.1 Synchronous Parallel Programming
1.2 Basic Definitions and Notations
1.2.1 Types
1.2.2 Operator Priority
1.2.3 Notation and Proof Style
1.3 Cost Calculus
1.3.1 Parallel Algorithm Complexity
1.3.2 Parallel Computation Models
Chapter 2 Powerlists
2.1 Introduction
2.1.1 Induction Principle for PowerLists
2.1.2 Data Movement and Permutation Functions
2.2 Hypercubes
2.3 A Cost Calculus for P...
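The PowerList data structure named in the table of contents is a list of length 2^n built from singletons by a balanced concatenation ("tie"), so recursion always splits evenly and maps naturally onto hypercube-style parallelism. A minimal Python stand-in, with our own encoding rather than the thesis's notation:

```python
# A powerlist is either a singleton ('s', x) or a tie ('t', left, right),
# where left and right have equal (power-of-two) length.

def rev(p):
    """Reversal by structural recursion on tie: swap halves and recurse.
    The two recursive calls are independent, giving O(log n) parallel depth."""
    if p[0] == 's':
        return p
    _, u, v = p
    return ('t', rev(v), rev(u))

def to_list(p):
    """Flatten a powerlist to an ordinary Python list."""
    if p[0] == 's':
        return [p[1]]
    return to_list(p[1]) + to_list(p[2])

example = ('t', ('t', ('s', 1), ('s', 2)), ('t', ('s', 3), ('s', 4)))
```

The induction principle listed in section 2.1.1 is exactly what such definitions follow: prove (or compute) for singletons, then for the tie of two equal halves.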
A Calculational Framework for Parallelization of Sequential Programs
 In International Symposium on Information Systems and Technologies for Network Society
, 1997
"... this paper, we propose ..."
List Homomorphism with Accumulation
 In Proceedings of Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing (SNPD)
, 2003
Abstract

Cited by 1 (0 self)
This paper introduces accumulation into list homomorphisms for the systematic development of both efficient and correct parallel programs. A new parallelizable recursive pattern is given, and transformations from sequential patterns into (H)-homomorphisms are shown. We illustrate the power of our formalization by developing a novel and general parallel program for a class of interesting and challenging problems, known as maximum marking problems.
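The flavor of combining a homomorphism with an accumulating parameter can be sketched on prefix sums: the two halves are computed independently, and the right half is then corrected by the left half's accumulated total. This is our own minimal instance, not the paper's (H)-homomorphism formalization.

```python
def prefix_sums(xs):
    """Prefix sums as a divide-and-conquer homomorphism with accumulation."""
    if len(xs) <= 1:
        return list(xs)
    mid = len(xs) // 2
    left = prefix_sums(xs[:mid])      # the two halves are independent ...
    right = prefix_sums(xs[mid:])     # ... and combinable in parallel
    carry = left[-1]                  # accumulated total of the left half
    return left + [carry + r for r in right]
```

A plain homomorphism cannot express prefix sums directly because each result element depends on everything to its left; threading the accumulated value through the combine step is what restores parallelizability.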
Functional Bulk Synchronous Parallel Programs
, 2010
Abstract
 Add to MetaCart
(Show Context)
With the current generalization of parallel architectures arises the concern of applying formal methods to parallelism, which allows specifications of parallel programs to be precisely stated and the correctness of an implementation to be verified. However, the complexity of parallel programs, compared to sequential ones, makes them more error-prone and difficult to verify. This calls for a strongly structured form of parallelism, which should not only ease programming by providing abstractions that conceal much of the complexity of parallel computation, but also provide a systematic way of developing practical programs from specifications. Bulk Synchronous Parallelism (BSP) is a model of computation which offers a high degree of abstraction like PRAM models and yet a realistic cost model based on structured parallelism. We propose a framework for refining a sequential specification toward a functional BSP program, the whole process being done with the help of a proof assistant. The main technical contributions of this paper are as follows: we define BH, a new homomorphic skeleton, which captures the essence of BSP computation at an algorithmic level, and ...
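A single BSP superstep can be sketched as a sequential simulation: every processor computes locally and posts messages, and all communication is delivered at once at the barrier. The names (`superstep`, `compute`, `absorb`) and the broadcast example are ours, not the paper's BH skeleton.

```python
def superstep(compute, absorb, states):
    """One BSP superstep over a list of per-processor states.
    compute(pid, state) -> (new_state, [(dest_pid, message)])
    absorb(messages, state) -> state after the barrier synchronization."""
    results = [compute(p, s) for p, s in enumerate(states)]
    outbox = [m for _, msgs in results for m in msgs]       # all posted messages
    inboxes = [[msg for dest, msg in outbox if dest == p]   # delivered at the
               for p in range(len(states))]                 # barrier, per pid
    return [absorb(inbox, s) for inbox, (s, _) in zip(inboxes, results)]

# Example: processor 0 broadcasts its value to processors 1 and 2.
after = superstep(
    lambda p, s: (s, [(q, s) for q in range(1, 3)] if p == 0 else []),
    lambda msgs, s: s + sum(msgs),
    [10, 0, 0])
```

The structure (local computation, then global exchange, then barrier) is what gives BSP its predictable cost model, and it is this discipline that makes refinement from a sequential specification tractable in a proof assistant.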