Wait-free programming for general purpose computations on graphics processors (2008)

by P H HA, P TSIGAS, O J ANSHUS
Venue: In IPDPS, IEEE

Results 1 - 2 of 2

1 The Synchronization Power of Coalesced Memory Accesses

by Phuong Hoai Ha, Otto J. Anshus
Abstract: Multicore architectures have established themselves as the new generation of computer architectures. As part of the evolution from one core to many cores, memory access mechanisms have advanced rapidly, and several new mechanisms have been implemented in modern commodity multicore architectures. By specifying how processing cores access shared memory, memory access mechanisms directly influence the synchronization capabilities of multicore architectures. Therefore, it is crucial to investigate the synchronization power of these new memory access mechanisms. This paper investigates the synchronization power of coalesced memory accesses, a family of memory access mechanisms introduced in recent large multicore architectures such as the Compute Unified Device Architecture (CUDA). We first define three memory access models to capture the fundamental features of the new memory access mechanisms. Subsequently, we prove the exact synchronization power of these models in terms of their consensus numbers. These tight results show that the coalesced memory access mechanisms can facilitate strong synchronization between the threads of multicore architectures, without the need for synchronization primitives other than reads and writes. In the case of contemporary CUDA processors, our results imply that the coalesced memory access mechanisms have consensus numbers up to sixty-four.
Index Terms: Memory access models, consensus, multicore architectures, inter-process synchronization.

Work distribution methods on GPUs

by Christian Lauterbach, Qi Mo, Dinesh Manocha
Due to their high thread and data parallelism, commodity GPU architectures currently provide very high performance and general programmability. Many algorithms have been successfully ported to GPUs, but several limitations have prevented scalable implementations of many recursive and hierarchical algorithms that are less easily parallelized. In this paper, we investigate general approaches for dynamic work distribution and balancing on GPUs to support recursive algorithms such as hierarchy algorithms. We propose a new and simple method that employs only minimal synchronization between cores together with explicit balancing, and is better suited to the properties of the architecture. We implement several applications on a current GPU. The results show that, for applications with fine-grained parallelism, the method outperforms other currently used work distribution methods because it avoids limitations of GPU architectures, and that it provides competitive performance on coarse-grained applications.