Results 1 - 10 of 860

An analytical model for a GPU architecture with memory-level and thread-level parallelism awareness

by Sunpyo Hong, Hyesoon Kim, 2009
"... GPU architectures are increasingly important in the multi-core era due to their high number of parallel processors. Programming thousands of massively parallel threads is a big challenge for software engineers, but understanding the performance bottlenecks of those parallel programs on GPU architect ..."
Cited by 134 (5 self)

Exploiting the Computational Power of the Graphics Card: Optimal State Space Planning on the GPU

by Damian Sulewski, Stefan Edelkamp, Peter Kissmann - In Proc. of the 21st Int. Conf. on Automated Planning and Scheduling (ICAPS), 2011
"... In this paper optimal state space planning is parallelized by exploiting the processing power of a graphics card. The two exploration steps, namely selecting the actions to be applied and generating the successors, are performed on a graphics processing unit. Duplicate detection, however, is delay ..."
Cited by 4 (0 self)

Hardware Transactional Memory for GPU Architectures

by Wilson W. L. Fung, Inderpreet Singh, Andrew Brownsword, Tor M. Aamodt - In MICRO, 2011
"... Graphics processor units (GPUs) are designed to efficiently exploit thread level parallelism (TLP), multiplexing execution of 1000s of concurrent threads on a relatively smaller set of single-instruction, multiple-thread (SIMT) cores to hide various long latency operations. While threads within a C ..."
Cited by 11 (1 self)

Qilin: exploiting parallelism on heterogeneous multiprocessors with adaptive mapping

by Chi-Keung Luk, Sunpyo Hong, Hyesoon Kim - In Micro-42, 2009
"... Heterogeneous multiprocessors are growingly important in the multi-core era due to their potential for high performance and energy efficiency. In order for software to fully realize this potential, the step that maps computations to processing elements must be as automated as possible. However, the ..."
Cited by 89 (3 self)

Efficient parallel graph exploration for multi-core CPU and GPU

by Sungpack Hong, Tayo Oguntebi, Kunle Olukotun - In IEEE PACT, 2011
"... Graphs are a fundamental data representation that have been used extensively in various domains. In graph-based applications, a systematic exploration of the graph such as a breadth-first search (BFS) often serves as a key component in the processing of their massive data sets. In this pape ..."
Cited by 32 (1 self)
In this paper, we present a new method for implementing the parallel BFS algorithm on multi-core CPUs which exploits a fundamental property of randomly shaped real-world graph instances. By utilizing memory bandwidth more efficiently, our method shows improved performance over the current state

StoreGPU: Exploiting Graphics Processing Units to Accelerate Distributed Storage Systems

by Samer Al-Kiswany, Abdullah Gharaibeh, Elizeu Santos-Neto, George Yuan, Matei Ripeanu
"... Today Graphics Processing Units (GPUs) are a largely underexploited resource on existing desktops and a possible cost-effective enhancement to high-performance systems. To date, most applications that exploit GPUs are specialized scientific applications. Little attention has been paid to harnessing t ..."
Cited by 20 (3 self)

CUDA-Lite: Reducing GPU programming complexity

by Sain-Zee Ueng, Melvin Lathara, Sara S. Baghsorkhi, Wen-mei W. Hwu - In: LCPC’08. Volume 5335 of LNCS, 2008
"... The computer industry has transitioned into multi-core and many-core parallel systems. The CUDA programming environment from NVIDIA is an attempt to make programming many-core GPUs more accessible to programmers. However, there are still many burdens placed upon the programmer to maximize ..."
Cited by 62 (0 self)
memories is still left to the programmer. We believe that this task can be better performed by automated tools. We present CUDA-lite, an enhancement to CUDA, as one such tool. We leverage programmer knowledge via annotations to perform transformations and show preliminary results that indicate auto

A GPU-based multi-agent system for real-time simulations

by Guillermo Vigueras, Juan M. Orduña, Miguel Lozano - Advances in Intelligent and Soft Computing 70/2010, 2010
"... The huge number of cores existing in current Graphics Processor Units (GPUs) provides these devices with computing capabilities that can be exploited by distributed applications. In particular, these capabilities have been used in crowd simulations for enhancing the crowd rendering, and even ..."
Cited by 1 (0 self)

Montecito: A Dual-Core, Dual-Thread Itanium Processor

by Cameron McNairy, Rohit Bhatia - IEEE Micro, 2005
"... Intel’s Itanium 2 processor series has regularly delivered additional performance through the increased frequency and cache as evidenced by the 6-Mbyte and 9-Mbyte versions. Montecito is the next offering in the Itanium processor family and represents many firsts for both Intel and the computing i ..."
Cited by 101 (0 self)
industry. Its 1.7 billion transistors extend the Itanium 2 core with an enhanced form of temporal multithreading and a substantially improved cache hierarchy. In addition to these landmarks, designers have incorporated technologies and enhancements that target reliability and manageability, power

Exploiting multi-core architectures in clusters for enhancing the performance of the parallel Bootstrap simulation algorithm

by César A. F. De Rose, Paulo Fern, Antonio M. Lima, Afonso Sales, Thais Webber