Results 1 - 10 of 860
An analytical model for a GPU architecture with memory-level and thread-level parallelism awareness
, 2009
"... GPU architectures are increasingly important in the multi-core era due to their high number of parallel processors. Programming thousands of massively parallel threads is a big challenge for software engineers, but understanding the performance bottlenecks of those parallel programs on GPU architect ..."
Cited by 134 (5 self)
Exploiting the Computational Power of the Graphics Card: Optimal State Space Planning on the GPU
- In Proc. of the 21st Int. Conf. on Automated Planning and Scheduling (ICAPS)
, 2011
"... In this paper optimal state space planning is parallelized by exploiting the processing power of a graphics card. The two exploration steps, namely selecting the actions to be applied and generating the successors, are performed on a graphics processing unit. Duplicate detection, however, is delay ..."
Cited by 4 (0 self)
Hardware Transactional Memory for GPU Architectures
- In MICRO
, 2011
"... Graphics processor units (GPUs) are designed to efficiently exploit thread level parallelism (TLP), multiplexing execution of 1000s of concurrent threads on a relatively smaller set of single-instruction, multiple-thread (SIMT) cores to hide various long latency operations. While threads within a C ..."
Cited by 11 (1 self)
Qilin: exploiting parallelism on heterogeneous multiprocessors with adaptive mapping
- In Micro-42
, 2009
"... Heterogeneous multiprocessors are increasingly important in the multi-core era due to their potential for high performance and energy efficiency. In order for software to fully realize this potential, the step that maps computations to processing elements must be as automated as possible. However, the ..."
Cited by 89 (3 self)
Efficient parallel graph exploration for multi-core CPU and GPU
- In IEEE PACT
, 2011
"... Graphs are a fundamental data representation that has been used extensively in various domains. In graph-based applications, a systematic exploration of the graph such as a breadth-first search (BFS) often serves as a key component in the processing of their massive data sets. In this paper, we present a new method for implementing the parallel BFS algorithm on multi-core CPUs which exploits a fundamental property of randomly shaped real-world graph instances. By utilizing memory bandwidth more efficiently, our method shows improved performance over the current state ..."
Cited by 32 (1 self)
StoreGPU: Exploiting Graphics Processing Units to Accelerate Distributed Storage Systems
"... Today Graphics Processing Units (GPUs) are a largely underexploited resource on existing desktops and a possible cost-effective enhancement to high-performance systems. To date, most applications that exploit GPUs are specialized scientific applications. Little attention has been paid to harnessing t ..."
Cited by 20 (3 self)
CUDA-Lite: Reducing GPU programming complexity
- In: LCPC’08. Volume 5335 of LNCS
, 2008
"... The computer industry has transitioned into multi-core and many-core parallel systems. The CUDA programming environment from NVIDIA is an attempt to make programming many-core GPUs more accessible to programmers. However, there are still many burdens placed upon the programmer to maximize ..."
Cited by 62 (0 self)
"... memories is still left to the programmer. We believe that this task can be better performed by automated tools. We present CUDA-lite, an enhancement to CUDA, as one such tool. We leverage programmer knowledge via annotations to perform transformations and show preliminary results that indicate auto ..."
A GPU-based multi-agent system for real-time simulations
- Advances in Intelligent and Soft Computing 70/2010
, 2010
"... The huge number of cores existing in current Graphics Processor Units (GPUs) provides these devices with computing capabilities that can be exploited by distributed applications. In particular, these capabilities have been used in crowd simulations for enhancing the crowd rendering, and even ..."
Cited by 1 (0 self)
Montecito: A Dual-Core, Dual-Thread Itanium Processor
- IEEE Micro
, 2005
"... Intel’s Itanium 2 processor series has regularly delivered additional performance through increased frequency and cache, as evidenced by the 6-Mbyte and 9-Mbyte versions. Montecito is the next offering in the Itanium processor family and represents many firsts for both Intel and the computing industry. Its 1.7 billion transistors extend the Itanium 2 core with an enhanced form of temporal multithreading and a substantially improved cache hierarchy. In addition to these landmarks, designers have incorporated technologies and enhancements that target reliability and manageability, power ..."
Cited by 101 (0 self)
Exploiting multi-core architectures in clusters for enhancing the performance of the parallel Bootstrap simulation algorithm