Results 1 - 4 of 4
Towards a holistic approach to auto-parallelization: integrating profile-driven parallelism detection and machine-learning based mapping
- In PLDI, 2009
Cited by 11 (0 self)
Compiler-based auto-parallelization is a much studied area, yet has still not found wide-spread application. This is largely due to the poor exploitation of application parallelism, subsequently resulting in performance levels far below those which a skilled expert programmer could achieve. We have identified two weaknesses in traditional parallelizing compilers and propose a novel, integrated approach, resulting in significant performance improvements of the generated parallel code. Using profile-driven parallelism detection we overcome the limitations of static analysis, enabling us to identify more application parallelism and only rely on the user for final approval. In addition, we replace the traditional target-specific and inflexible mapping heuristics with a machine-learning based prediction mechanism, resulting in better mapping decisions while providing more scope for adaptation to different target architectures.
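The profile-driven detection step mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' tool: the trace format and the function name `has_cross_iteration_dependence` are assumptions made for illustration only. The idea shown is that a loop is a parallelization candidate if, in the recorded run, no iteration touches data written by another iteration. Because one profiled input cannot prove independence in general, a result of this kind still needs the final user approval the abstract refers to.

```python
from typing import List, Set, Tuple

# Hypothetical trace format: one entry per loop iteration,
# holding the set of locations read and the set of locations written.
IterationTrace = Tuple[Set[str], Set[str]]


def has_cross_iteration_dependence(trace: List[IterationTrace]) -> bool:
    """Return True if the profiled run shows a dependence between iterations."""
    written_before: Set[str] = set()
    read_before: Set[str] = set()
    for reads, writes in trace:
        flow = reads & written_before      # read after write
        output = writes & written_before   # write after write
        anti = writes & read_before        # write after read
        if flow or output or anti:
            return True
        written_before |= writes
        read_before |= reads
    return False


# a[i] = b[i] + 1: every iteration touches its own elements only.
independent = [({f"b{i}"}, {f"a{i}"}) for i in range(4)]

# a[i] = a[i-1] + b[i]: each iteration reads what the previous one wrote.
prefix_sum = [({f"a{i-1}", f"b{i}"}, {f"a{i}"}) for i in range(1, 5)]

print(has_cross_iteration_dependence(independent))
print(has_cross_iteration_dependence(prefix_sum))
```

This sketch flags reductions such as `sum += a[i]` as dependences; recognizing them as parallelizable would need extra handling that is left out here.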
The Polyhedral Model Is More Widely Applicable Than You Think
Cited by 8 (7 self)
The polyhedral model is a powerful framework for automatic optimization and parallelization. It is based on an algebraic representation of programs, making it possible to construct and search for complex sequences of optimizations. This model is now mature and is reaching production compilers. The main limitation of the polyhedral model is known to be its restriction to statically predictable, loop-based program parts. This paper removes this limitation, allowing the model to operate on general data-dependent control flow. We embed control and exit predicates as first-class citizens of the algebraic representation, from program analysis to code generation. Complementing previous (partial) attempts in this direction, our work concentrates on extending the code generation step and does not compromise the expressiveness of the model. We present experimental evidence that our extension is relevant for program optimization and parallelization, showing performance improvements on benchmarks that were thought to be out of reach of the polyhedral model.
Towards Automatic Profile-Driven Parallelization of Embedded Multimedia Applications
Despite the availability of ample parallelism in multimedia applications, parallelizing compilers are largely unable to extract this application parallelism and map it onto existing embedded multi-core platforms. This is mainly due to the limitations of traditional auto-parallelization on static analysis and loop-level parallelism. In this paper we propose a dynamic, profile-driven approach to auto-parallelization targeting coarse-grain parallelism. We present our methodology and tools for the extraction of task graphs from sequential codes and demonstrate the various stages involved in this process based on the JPEG-2000 still image compression application. In addition, we show how the joint detection of multiple levels of parallelism and exploitation of application scenarios can lead to performance levels close to those obtained by manual parallelization. Finally, we demonstrate the applicability of our methodology to a broader set of embedded multimedia codes.
U N I V E R S I
With today’s processing hardware being multicore, and development heading toward even more cores in every system, it is crucial to take advantage of all the cores available in a system. Traditionally, parallelization assumes the system is available exclusively to that one workload. This work proposes a cooperative strategy for OpenMP to avoid system overloading when running multiple parallelized programs. It allows multiple programs to optimize towards best system performance. This is done by communicating individual estimated speedups under workloads, and using them to compute an estimated optimal thread allocation. Experiments show it achieves more than 98% of the optimal performance on average. Additional enhancements make the strategy more resistant to exploitation. Depending on the optimization target and detection probability, this can make exploitation attempts unprofitable.
Acknowledgements: Many thanks to Kousha Etessami for a chat that changed this project's perspective, and made it much more interesting. Also, I would like to thank everyone who helped
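The cooperative allocation idea in the last abstract (programs report estimated speedups, and an allocation is computed from them) can be sketched as follows. This is only an illustration under stated assumptions, not the thesis's algorithm: the objective (maximize the sum of estimated speedups), the greedy rule, and the name `allocate_threads` are all hypothetical. The greedy rule is optimal for this objective only if each speedup curve has diminishing returns.

```python
from typing import Dict, List


def allocate_threads(speedups: Dict[str, List[float]], cores: int) -> Dict[str, int]:
    """Greedily split `cores` threads among programs.

    speedups[name][k] is the program's self-reported estimated speedup
    when it runs with k + 1 threads (assumed input format).
    """
    if cores < len(speedups):
        raise ValueError("fewer cores than programs")
    # Every program gets one thread so that it can make progress.
    alloc = {name: 1 for name in speedups}
    for _ in range(cores - len(speedups)):
        best_name, best_gain = None, 0.0
        for name, curve in speedups.items():
            k = alloc[name]
            if k < len(curve):
                gain = curve[k] - curve[k - 1]  # benefit of one more thread
                if gain > best_gain:
                    best_name, best_gain = name, gain
        if best_name is None:
            break  # no program benefits from another thread
        alloc[best_name] += 1
    return alloc


reported = {
    "a": [1.0, 1.9, 2.7, 3.4],    # scales well
    "b": [1.0, 1.2, 1.3, 1.35],   # scales poorly
}
print(allocate_threads(reported, cores=4))
```

Since the inputs are self-reported, a program could overstate its speedups to obtain more threads, which is the kind of exploitation the abstract says the additional enhancements address; this sketch does nothing to guard against it.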

