Results 11 - 14 of 14
DESOLA: an Active Linear Algebra Library Using Delayed Evaluation and Runtime Code Generation
Abstract
Active libraries can be defined as libraries which play an active part in the compilation, in particular the optimisation, of their client code. This paper explores the implementation of an active dense linear algebra library by delaying evaluation of expressions built using library calls, then generating code at runtime for the compositions that occur. The key optimisations in this context are loop fusion and array contraction. Our prototype C++ implementation, DESOLA, automatically fuses loops arising from different client calls, identifies unnecessary intermediate temporaries, and contracts temporary arrays to scalars. Performance is evaluated using a benchmark suite of linear solvers from ITL (Iterative Template Library), and is compared with MTL (Matrix Template Library), ATLAS (Automatically Tuned Linear Algebra Software) and IMKL (Intel Math Kernel Library). Excluding runtime compilation overheads (caching means they occur only on the first iteration), for larger matrix sizes, performance matches or exceeds MTL; when fusion of matrix operations occurs, performance exceeds that of ATLAS and IMKL.
Key words: runtime code generation, delayed evaluation, active libraries, numerical libraries
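The two optimisations named in this abstract, loop fusion and array contraction, can be illustrated with a small hand-written sketch. It is not DESOLA's generated code: the function names are hypothetical, and the example assumes `x` and `y` have the same length. The first function mimics two separate library calls joined by a temporary vector; the second shows the same computation after the loops are fused and the temporary is contracted to a scalar.

```cpp
#include <cstddef>
#include <vector>

// Unfused form: two library-style operations, each with its own loop,
// communicating through a full-length temporary vector t.
// Assumes x.size() == y.size().
double axpy_then_dot_unfused(double a, const std::vector<double>& x,
                             const std::vector<double>& y) {
    std::vector<double> t(x.size());
    for (std::size_t i = 0; i < x.size(); ++i)
        t[i] = a * x[i] + y[i];          // t = a*x + y
    double r = 0.0;
    for (std::size_t i = 0; i < t.size(); ++i)
        r += t[i] * t[i];                // r = t . t
    return r;
}

// Fused and contracted form: one loop, and the temporary array has
// shrunk to a scalar that lives only for one iteration.
double axpy_then_dot_fused(double a, const std::vector<double>& x,
                           const std::vector<double>& y) {
    double r = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        const double t = a * x[i] + y[i];
        r += t * t;
    }
    return r;
}
```

Both functions compute the same value; the fused version makes a single pass over `x` and `y` and allocates no intermediate storage, which is the kind of saving the abstract attributes to fusing loops across client calls.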
OPTIMISING COMPONENT COMPOSITION USING INDEXED DEPENDENCE METADATA
Abstract
This paper explores the use of dependence metadata for optimising composition in component-based parallel programs. The idea is for each component to carry additional information about how points in its iteration space map to memory locations associated with its input and output data structures. When two components are composed, this information can be used to implement optimisations that would otherwise require expensive analysis of the components' code at the time of composition. This dependence metadata facilitates a number of cross-component optimisations – in this paper we focus on loop fusion and array contraction. We describe a prototype framework, based on the CLooG loop generator tool, that embodies these ideas and report experimental performance results for three non-trivial parallel benchmarks. Our results show execution time reductions of up to 50% using the proposed framework on an 8-core Xeon.
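As a rough illustration of how such metadata could stand in for code analysis, the sketch below uses a deliberately simplified, hypothetical model that is not taken from the paper: a producer loop writes element `i` of an intermediate array at iteration `i`, and the consumer component carries a list of offsets saying that its iteration `i` reads elements `i + offset`. Under that assumption, both checks need only the metadata, not the component code.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical metadata carried by a 1-D consumer component:
// iteration i reads elements i + offset of the producer's output.
struct ConsumerMetadata {
    std::vector<int> read_offsets;
};

// Direct fusion (same iteration space, producer statement first) is legal
// when every element read at iteration i has already been produced,
// i.e. every offset is <= 0.
bool can_fuse(const ConsumerMetadata& c) {
    return std::all_of(c.read_offsets.begin(), c.read_offsets.end(),
                       [](int off) { return off <= 0; });
}

// The intermediate array can shrink to a scalar only if the consumer
// reads nothing but the element produced in the same iteration.
bool can_contract_to_scalar(const ConsumerMetadata& c) {
    return std::all_of(c.read_offsets.begin(), c.read_offsets.end(),
                       [](int off) { return off == 0; });
}
```

For example, a consumer with offsets `{0, -1}` can be fused but still needs the previous element kept around, whereas one with offset `{1}` reads an element the fused loop has not yet produced. A real framework would describe multi-dimensional iteration spaces and general access functions rather than a flat list of offsets.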
and Oracle Labs
Abstract
The increasing importance of graph-data based applications is fueling the need for highly efficient and parallel implementations of graph analysis software. In this paper we describe Green-Marl, a domain-specific language (DSL) whose high-level language constructs allow developers to describe their graph analysis algorithms intuitively, but expose the data-level parallelism inherent in the algorithms. We also present our Green-Marl compiler, which translates a high-level algorithmic description written in Green-Marl into an efficient C++ implementation by exploiting this exposed data-level parallelism. Furthermore, our Green-Marl compiler applies a set of optimizations that take advantage of the high-level semantic knowledge encoded in the Green-Marl DSL. We demonstrate through some examples that graph analysis algorithms can be written very intuitively with Green-Marl, and our experimental results show that the compiler-generated implementations of such descriptions perform as well as or better than highly tuned hand-coded implementations.
A Heterogeneous Parallel Framework for Domain-Specific Languages
2011 International Conference on Parallel Architectures and Compilation Techniques
Abstract
Computing systems are becoming increasingly parallel and heterogeneous, and therefore new applications must be capable of exploiting parallelism in order to continue achieving high performance. However, targeting these emerging devices often requires using multiple disparate programming models and making decisions that can limit forward scalability. In previous work, we proposed the use of domain-specific languages (DSLs) to provide high-level abstractions that enable transformations to high-performance parallel code without degrading programmer productivity. In this paper we present a new end-to-end system for building, compiling, and executing DSL applications on parallel heterogeneous hardware, the Delite Compiler Framework and Runtime. The framework lifts embedded DSL applications to an intermediate representation (IR), performs generic, parallel, and domain-specific optimizations, and generates an execution graph that targets multiple heterogeneous hardware devices. Finally, we present results comparing the performance of several machine learning applications written in OptiML, a DSL for machine learning that utilizes Delite, to C++ and MATLAB implementations. We find that the implicitly parallel OptiML applications achieve single-threaded performance comparable to C++ and outperform explicitly parallel MATLAB in nearly all cases.
Keywords: parallel programming; multicore processing; computer languages

