Acceleration of a CFD Code with a GPU
Abstract: The CFD code Overflow includes as one of its solver options a quasi-SSOR algorithm. This is a fairly small piece of code, but it accounts for a significant portion of the total computational time. This paper studies some of the issues in accelerating the code by use of a GPU. The algorithm needs to be modified to be suitable for a GPU, and attention needs to be given to 64-bit and 32-bit arithmetic.
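For readers unfamiliar with SSOR, the following is a minimal sketch of textbook symmetric successive over-relaxation on a small dense system. It is not Overflow's quasi-SSOR variant, and the function name `ssor_solve` and its parameters are hypothetical. It shows why the abstract says the algorithm must be modified for a GPU: each row update reads values that were just overwritten earlier in the same sweep, so a sweep is inherently sequential. The abstract does not say which modification Overflow adopts.

```python
def ssor_solve(A, b, omega=1.2, tol=1e-10, max_iter=500):
    """Solve A x = b with textbook SSOR (illustrative sketch only).

    Each iteration is a forward SOR sweep (rows 0..n-1) followed by a
    backward SOR sweep (rows n-1..0). A is a dense list-of-lists matrix.
    Convergence is assumed, e.g. A symmetric positive definite and
    0 < omega < 2.
    """
    n = len(b)
    x = [0.0] * n

    def relax(i):
        # Gauss-Seidel value for row i using the most recent entries of x
        # (already-updated rows in this sweep included), then blend with the
        # old value via the relaxation factor omega.
        sigma = sum(A[i][j] * x[j] for j in range(n) if j != i)
        gs = (b[i] - sigma) / A[i][i]
        x[i] = (1.0 - omega) * x[i] + omega * gs

    for it in range(1, max_iter + 1):
        for i in range(n):            # forward sweep: row i depends on rows < i
            relax(i)
        for i in reversed(range(n)):  # backward sweep: row i depends on rows > i
            relax(i)
        # Stop when the max-norm residual |b - A x| is below tol.
        res = max(abs(b[i] - sum(A[i][j] * x[j] for j in range(n)))
                  for i in range(n))
        if res < tol:
            return x, it
    return x, max_iter
```

For example, `ssor_solve([[4, -1, 0], [-1, 4, -1], [0, -1, 4]], [2, 4, 10])` converges to approximately `[1, 2, 3]`. Running the same sweeps in 32-bit floats would limit the attainable residual to roughly single-precision rounding level, which is one reason mixed 64-bit/32-bit arithmetic needs care.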
Numerical Ocean Modeling and Simulation with CUDA
Abstract—ROMS is software that models and simulates an ocean region using a finite difference grid and time stepping. ROMS simulations can take from hours to days to complete due to the compute-intensive nature of the software. As a result, the size and resolution of simulations are constrained by the performance limitations of modern computing hardware. To address these issues, the existing ROMS code can be run in parallel with either OpenMP or MPI. In this work, we implement a new parallelization of ROMS on a graphics processing unit (GPU) using CUDA Fortran. We exploit the massive parallelism offered by modern GPUs to gain a performance benefit at a lower cost and with less power. To test our implementation, we benchmark with idealized marine conditions as well as real data collected from coastal waters near central California. Our implementation yields a speedup of up to 8x over a serial implementation and 2.5x over an OpenMP implementation, while demonstrating comparable performance to an MPI implementation.
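The phrase "finite difference grid and time stepping" can be illustrated with a minimal sketch. It shows one explicit step of 2D diffusion on a uniform grid. This is not ROMS's equations, and `diffuse_step` with its parameters is hypothetical. The point is the data dependence: every interior point is computed only from the previous time level, so each point could be assigned to its own GPU thread.

```python
def diffuse_step(u, kappa, dt, dx):
    """One explicit (forward Euler) finite-difference step of
    du/dt = kappa * (d2u/dx2 + d2u/dy2) on a uniform grid (illustrative).

    Boundary values are held fixed (Dirichlet). Interior points read only
    the old array u, so they are independent of one another.
    """
    ny, nx = len(u), len(u[0])
    r = kappa * dt / (dx * dx)
    # The 2D explicit scheme is stable only for r <= 1/4.
    if r > 0.25:
        raise ValueError("time step too large for explicit scheme")
    new = [row[:] for row in u]  # separate output buffer; boundaries copied as-is
    for j in range(1, ny - 1):
        for i in range(1, nx - 1):
            # Five-point Laplacian stencil (times dx^2).
            lap = u[j][i + 1] + u[j][i - 1] + u[j + 1][i] + u[j - 1][i] - 4.0 * u[j][i]
            new[j][i] = u[j][i] + r * lap
    return new
```

Writing into a separate buffer instead of updating `u` in place keeps every point's inputs at the old time level. This double-buffering is what lets a parallel implementation avoid read/write races between neighboring points.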
Power-Efficient Accelerators for High-Performance Applications
2011
This work would not have been possible without the help and support of a number of my colleagues, friends and family. Thanks first go to Professor Scott Mahlke, my advisor through all my years in graduate school and during the last few years of college. A constant source of ideas and always full of energy and alacrity, Scott was a great advisor and a great teacher. I would like to thank my thesis committee for providing their thoughts and suggestions for a number of my projects. Professor David Blaauw was a tremendous resource during my investigation of various power-reduction techniques. Professor Jeffrey Fessler was instrumental in providing a real-world look at medical imaging and scientific computing. Professor Trevor Mudge was effectively my co-advisor and heavily influenced the direction of my research into SIMD and power-efficient architectures. While they were not my advisors in any official capacity, Professor James Freudenberg and Professor Mark Brehob were both very positive influences on my undergraduate and graduate careers at Michigan and provided sound advice at many pivotal points over the last several years. My work in industry heavily influenced the ideas and solutions in this thesis. For providing me with a window into “the real world”, I thank Krisztian Flautner, David Bull, Sami Yehia, and Shidhartha Das at ARM; and Mikhail Smelyanskiy and Victor Lee at Intel. I had some amazing colleagues while at Michigan. Thanks first go to Ankit Sethia and Vincentius Robby who put up with my requests for the projects they helped me complete.
PEPSC: A Power-Efficient Processor for Scientific Computing
Abstract—The rapid advancements in the computational capabilities of the graphics processing unit (GPU) as well as the deployment of general programming models for these devices have made the vision of a desktop supercomputer a reality. It is now possible to assemble a system that provides several TFLOPs of performance on scientific applications for the cost of a high-end laptop computer. While these devices have clearly changed the landscape of computing, there are two central problems that arise. First, GPUs are designed and optimized for graphics applications, resulting in delivered performance that is far below peak for more general scientific and mathematical applications. Second, GPUs are power-hungry devices that often consume 100-300 watts, which restricts the scalability of the solution and requires expensive cooling. To combat these challenges, this paper presents the PEPSC architecture – an architecture customized for the domain of data-parallel scientific applications where power efficiency is the central focus. PEPSC utilizes a combination of a two-dimensional single-instruction multiple-data (SIMD) datapath, an intelligent dynamic prefetching mechanism, and a configurable SIMD control approach to increase execution efficiency over conventional GPUs. A single PEPSC core has a peak performance of 120 GFLOPs while consuming 2 W of power when executing modern scientific applications, which represents an increase in computation efficiency of more than 10X over existing GPUs.
Keywords: Low power, SIMD, GPGPU, Throughput computing, Scientific computing
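As a rough worked check of the efficiency figure, and assuming "computation efficiency" is read as peak GFLOPs per watt (the abstract does not define the metric), the quoted numbers give the following. The GPU bound is only what the "more than 10X" claim implies under that reading, not a measured value.

```latex
\eta_{\text{PEPSC}} = \frac{120\ \text{GFLOPs}}{2\ \text{W}} = 60\ \text{GFLOPs/W},
\qquad
\frac{\eta_{\text{PEPSC}}}{\eta_{\text{GPU}}} > 10
\;\Longrightarrow\;
\eta_{\text{GPU}} < 6\ \text{GFLOPs/W}.
```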

