Single Assignment C: efficient support for high-level array operations in a functional setting

by S-B Scholz
Venue: J. Funct. Program.

Results 1 - 10 of 22

Data Parallel Haskell: a status report

by Manuel M. T. Chakravarty, Roman Leshchinskiy, Simon Peyton Jones, Gabriele Keller, Simon Marlow, 2007
Cited by 56 (14 self)
We describe the design and current status of our effort to implement the programming model of nested data parallelism in the Glasgow Haskell Compiler. We extended the original programming model and its implementation, both of which were first popularised by the NESL language, in terms of expressiveness as well as efficiency. Our current aim is to provide a convenient programming environment for SMP parallelism, and especially multicore architectures. Preliminary benchmarks show that we are, at least for some programs, able to achieve good absolute performance and excellent speedups.

Language Virtualization for Heterogeneous Parallel Computing

by Hassan Chafi, Zach Devito, Adriaan Moors, Tiark Rompf, Arvind K. Sujeeth, Pat Hanrahan, Martin Odersky, Kunle Olukotun
Cited by 12 (6 self)
As heterogeneous parallel systems become dominant, application developers are being forced to turn to an incompatible mix of low-level programming models (e.g. OpenMP, MPI, CUDA, OpenCL). However, these models do little to shield developers from the difficult problems of parallelization, data decomposition and machine-specific details. Most programmers are having a difficult time using these programming models effectively. To provide a programming model that addresses the productivity and performance requirements of the average programmer, we explore a domain-specific approach to heterogeneous parallel programming. We propose language virtualization as a new principle that enables the construction of highly efficient parallel domain-specific languages that are embedded in a common host language. We define criteria for language virtualization and present techniques to achieve them. We present two concrete case studies of domain-specific languages that are implemented using our virtualization approach.

Towards an efficient functional implementation of the NAS benchmark FT

by Clemens Grelck, Sven-Bodo Scholz - Proceedings of the 7th International Conference on Parallel Computing Technologies (PaCT’03), Nizhni Novgorod, Russia. Volume 2763 of Lecture Notes in Computer Science, 2003
Cited by 6 (4 self)
This paper compares a high-level implementation of the NAS benchmark FT in the functional array language SaC with traditional solutions based on Fortran-77 and C. The impact of abstraction on expressiveness, readability, and maintainability of code as well as on clarity of underlying mathematical concepts is discussed. The associated impact on runtime performance is quantified both in a uniprocessor environment and in a multiprocessor environment based on automatic parallelization and on OpenMP.
1 Introduction. Low-level sequential base languages, e.g. Fortran-77 or C, and message passing libraries, mostly MPI, form the prevailing tools for generating parallel applications, in particular for numerical problems. This choice offers almost literal control over data layout and program execution, including communication and synchronization. Expert programmers can adapt their code to hardware characteristics of target machines, e.g. properties of memory hierarchies, and enhance the runtime performance to whatever a machine is able to deliver. During the process of performance tuning, numerical code inevitably mutates from a (maybe) human-readable representation of an abstract algorithm to one that almost certainly is suitable for machines only. Ideas and concepts of underlying mathematical algorithms are completely disguised. Even minor changes to underlying algorithms may require a major redesign of the implementation. Moreover, particular demands are placed on the qualification of programmers, as they have to be experts in computer architecture and programming technique in addition to their specific application domains. As a consequence, development and maintenance of parallel code is prohibitively expensive.

With-Loop Scalarization: Merging Nested Array Operations

by Clemens Grelck, Sven-Bodo Scholz, Kai Trojahner - Proceedings of the 15th International Workshop on Implementation of Functional Languages (IFL’03), 2004
Cited by 6 (4 self)
Construction of complex array operations by composition of more basic ones allows for abstract and concise specifications of algorithms. Unfortunately, naïve compilation of such specifications leads to creation of many temporary arrays at runtime and, consequently, to poor performance characteristics. This paper elaborates on a new compiler optimization, named with-loop-scalarization, which aims at eliminating temporary arrays in the context of nested array operations. It is based on with-loops, a versatile array comprehension construct used by the functional array language SaC both for specification and for internal representation of array operations. The impact of with-loop-scalarization on the runtime performance of compiled SaC code is demonstrated by several experiments involving support for arithmetic on arrays of complex numbers and the application kernel FT from the NAS benchmark suite.

µTC – an intermediate language for programming chip multiprocessors

by Chris Jesshope - Proc. Pacific Computer Systems Architecture Conference 2006 - ACSAC06, ISBN 3-540-4005, LNCS 4186, 2006
Cited by 4 (3 self)

Axis Control in SaC

by Clemens Grelck, Sven-Bodo Scholz - Proceedings of the 14th International Workshop on Implementation of Functional Languages (IFL’02), volume 2670 of Lecture Notes in Computer Science, 2002
Cited by 3 (1 self)
High-level array processing is characterized by the composition of generic operations, which treat all array elements in a uniform way. This paper proposes a mechanism that allows programmers to direct effects of such array operations to non-scalar subarrays of argument arrays without sacrificing the high-level programming approach. A versatile notation for axis control is presented, and it is shown how the additional language constructs can be transformed into regular SaC code. Furthermore, an optimization technique is introduced which achieves the same runtime performance regardless of whether code is written using the new notation or in a substantially less elegant style employing conventional language features.

Clustered Workflow Execution of Retargeted Data Analysis Scripts

by Daniel L. Wang, Charles S. Zender, Stephen F. Jenks - Eighth IEEE International Symposium on Cluster Computing and the Grid
Cited by 3 (1 self)
Supercomputing advances have enabled computational science data volumes to grow at ever increasing rates, commonly resulting in more data produced than can be practically analyzed. Whole-dataset download costs have grown to impractical heights, even with multi-Gbps networks, forcing scientists to rely on server-side subsetting and limiting the scope of data they can analyze on a workstation. Our system supplements existing scientific data services with lightweight computational capability, providing a means of safely relocating analysis from the desktop to the server where clustered execution can be coordinated, exploiting data locality, reducing unnecessary data transfer, and providing end-users with results several times faster. We show how dataflow and other compiler-inspired analyses of shell scripts built from scientists’ most common analysis tools enable parallelization and optimizations in disk and network I/O bandwidth. We benchmark using an actual geoscience analysis script, illustrating the crucial performance gains of extracting workflows defined in scripts and optimizing their execution. Current results quantify significant improvements in performance, showing the promise of bringing transparent high-performance analysis to the scientist’s desktop.

With-Loop Fusion for Data Locality and Parallelism

by Clemens Grelck, Karsten Hinckfuß, Sven-Bodo Scholz - Implementation and Application of Functional Languages, 17th International Workshop, IFL’05, Selected Papers, volume ??? of LNCS, 2006
Cited by 3 (1 self)
With-loops are versatile array comprehensions used in the functional array language SaC to implement universally applicable array operations. We describe the fusion of with-loops as a novel optimization technique to improve the data locality of compiled code. Experiments based on selected benchmark programs show the significance of with-loop fusion for achieving competitive runtime performance figures with high-level SaC programs.

Portable High Performance and Scalability for Partitioned Global Address Space Languages

by Cristian Coarfa, 2007
Cited by 2 (1 self)
Large scale parallel simulations are fundamental tools for engineers and scientists. Consequently, it is critical to develop both programming models and tools that enhance development-time productivity, enable harnessing of massively parallel systems, and guide the diagnosis of poorly scaling programs. This thesis addresses this challenge in two ways. First, we show that Co-array Fortran (CAF), a shared-memory parallel programming model, can be used to write scientific codes that exhibit high performance on modern parallel systems. Second, we describe a novel technique for analyzing parallel program performance and identifying scalability bottlenecks, and apply it across multiple programming models. Although the message passing parallel programming model provides both portability and high performance, it is cumbersome to program. CAF eases this burden by providing a partitioned global address space, but has before now only been implemented on shared-memory machines. To significantly broaden CAF’s appeal, we show that CAF programs can deliver high performance on commodity cluster platforms. We designed and imple-

Regular, shape-polymorphic, parallel arrays in Haskell

by Gabriele Keller, Manuel M. T. Chakravarty, Roman Leshchinskiy, Simon Peyton Jones - In Proceedings of the ACM SIGPLAN International Conference on Functional Programming, ICFP 2010, 2010
Cited by 2 (1 self)
We present a novel approach to regular, multi-dimensional arrays in Haskell. The main highlights of our approach are that it (1) is purely functional, (2) supports reuse through shape polymorphism, (3) avoids unnecessary intermediate structures rather than relying on subsequent loop fusion, and (4) supports transparent parallelisation. We show how to embed two forms of shape polymorphism into Haskell’s type system using type classes and type families. In particular, we discuss the generalisation of regular array transformations to arrays of higher rank, and introduce a type-safe specification of array slices. We discuss the runtime performance of our approach for three standard array algorithms. We achieve absolute performance comparable to handwritten C code. At the same time, our implementation scales well up to 8 processor cores.
Categories and Subject Descriptors: D.3.3 [Programming Languages]: Language Constructs and Features—Concurrent programming structures; Polymorphism; Abstract data types