Results 1 - 10 of 751

Streaming Dynamic Coarse-Grained CPU/GPU Workloads with Heterogeneous Pipelines in FastFlow

by Mehdi Goli, Michael T. Garba
"... Abstract—Software pipelines permit the decomposition of a repetitive sequential process into a succession of distinguishable sub-processes called stages, each of which can be concurrently executed on a distinct processing element. This paper presents a heterogeneous streaming pipeline implementation ..."
Cited by 1 (0 self)
... implementation using the FastFlow skeletal library for a numerical linear algebra code. By introducing minimal memory management, we implement a large-scale streaming application which allocates the different pipeline stages to multi-core CPU and multi-GPU resources in a cluster environment, demonstrating ...
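The pipeline decomposition described in this abstract can be illustrated with a minimal sketch. This is not FastFlow's API (FastFlow is a C++ library); it is a hypothetical Python analogue in which each stage runs in its own thread and neighboring stages are linked by bounded FIFO queues, so different stream items are in flight in different stages at the same time. The names `run_pipeline`, `_run_stage`, `_EOS`, and the `capacity` parameter are illustrative assumptions, and the stages are plain Python functions rather than CPU or GPU kernels. Because of CPython's global interpreter lock, the sketch shows the structure of stage-level concurrency, not a real speedup.

```python
import queue
import threading
from typing import Any, Callable, Iterable, List

_EOS = object()  # hypothetical end-of-stream marker forwarded through every stage


def _run_stage(fn: Callable[[Any], Any], inbox: queue.Queue, outbox: queue.Queue) -> None:
    # A stage loops: take one item, process it, pass the result downstream.
    while True:
        item = inbox.get()
        if item is _EOS:
            outbox.put(_EOS)  # propagate shutdown to the next stage
            return
        outbox.put(fn(item))


def run_pipeline(source: Iterable[Any], stages: List[Callable[[Any], Any]],
                 capacity: int = 8) -> List[Any]:
    # One bounded queue in front of each stage, plus one for the final output.
    queues = [queue.Queue(maxsize=capacity) for _ in range(len(stages) + 1)]
    workers = [
        threading.Thread(target=_run_stage, args=(fn, queues[i], queues[i + 1]))
        for i, fn in enumerate(stages)
    ]
    for w in workers:
        w.start()

    def feed() -> None:
        # Stream items into the first stage, then signal end of stream.
        for item in source:
            queues[0].put(item)
        queues[0].put(_EOS)

    feeder = threading.Thread(target=feed)
    feeder.start()

    # Drain the last queue while the feeder runs, so bounded buffers cannot deadlock.
    results: List[Any] = []
    while True:
        item = queues[-1].get()
        if item is _EOS:
            break
        results.append(item)

    feeder.join()
    for w in workers:
        w.join()
    return results
```

For example, `run_pipeline(range(5), [lambda x: x + 1, lambda x: x * 10])` returns `[10, 20, 30, 40, 50]`. Since each stage has a single worker and every queue is FIFO, output order matches input order; the sketch does no error handling, so a stage that raises would stall the stream.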

Modeling gpu-cpu workloads and systems

by Andrew Kerr, Gregory Diamos, Sudhakar Yalamanchili - GPGPU, 2010
"... Heterogeneous systems, systems with multiple processors tailored for specialized tasks, are challenging programming environments. While it may be possible for domain experts to optimize a high performance application for a very specific and well documented system, it may not perform as well or even ..."
Cited by 16 (0 self)
... to ease application development and automate program optimization on heterogeneous platforms. This paper reports on an empirical evaluation of 25 CUDA applications on four GPUs and three CPUs, leveraging the Ocelot dynamic compiler infrastructure which can execute and instrument the same CUDA ...

Exploiting process lifetime distributions for dynamic load balancing

by Mor Harchol-Balter, Allen B. Downey - ACM Transactions on Computer Systems, 1997
"... We consider policies for CPU load balancing in networks of workstations. We address the question of whether preemptive migration (migrating active processes) is necessary, or whether remote execution (migrating processes only at the time of birth) is sufficient for load balancing. We show that resol ..."
Cited by 364 (32 self)

Efficient CPU-GPU Work Sharing for Data-parallel JavaScript Workloads

by Xianglan Piao, Channoh Kim, Younghwan Oh, Hanjun Kim, Jae W. Lee
"... Modern web browsers are required to execute many complex, compute-intensive applications, mostly written in JavaScript. With widespread adoption of heterogeneous processors, recent JavaScript-based data-parallel programming models, such as River Trail and WebCL, support multiple types of processing ..."
... data-parallel JavaScript workloads. The work-sharing scheduler partitions the input data into smaller chunks and dynamically dispatches them to both CPU and GPU for concurrent execution. For four data-parallel programs, our framework improves performance by up to 65% with a geometric mean speedup of 33% over GPU ...
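The chunk-based work sharing described in this abstract can be sketched as a shared queue of chunks that every device drains on demand. This is an illustrative assumption, not the authors' scheduler: `work_share` and the `devices` mapping are hypothetical, and the "cpu" and "gpu" entries are ordinary Python functions standing in for real CPU and GPU kernels (the paper itself targets JavaScript models such as River Trail and WebCL). Because a device fetches a new chunk only after finishing its previous one, a faster device ends up processing more chunks without any static split of the input.

```python
import queue
import threading
from typing import Callable, Dict, List, Sequence, Tuple

Kernel = Callable[[Sequence[float]], List[float]]  # hypothetical per-device kernel type


def work_share(data: Sequence[float], chunk_size: int,
               devices: Dict[str, Kernel]) -> Tuple[List[float], Dict[str, int]]:
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")

    # Partition the input into fixed-size chunks, identified by their start offset.
    chunks: queue.Queue = queue.Queue()
    for start in range(0, len(data), chunk_size):
        chunks.put(start)

    results: Dict[int, List[float]] = {}
    counts = {name: 0 for name in devices}  # chunks processed per device
    lock = threading.Lock()

    def worker(name: str, kernel: Kernel) -> None:
        # Each device grabs the next chunk as soon as it is idle (dynamic dispatch).
        while True:
            try:
                start = chunks.get_nowait()
            except queue.Empty:
                return
            out = kernel(data[start:start + chunk_size])
            with lock:
                results[start] = out
                counts[name] += 1

    threads = [threading.Thread(target=worker, args=(n, k)) for n, k in devices.items()]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    # Reassemble the per-chunk outputs in input order.
    output: List[float] = []
    for start in sorted(results):
        output.extend(results[start])
    return output, counts
```

Chunk size is the main tuning knob in this kind of scheme: smaller chunks balance load more finely between devices, while larger chunks amortize per-dispatch overhead (on a real GPU, kernel launch and data transfer).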

Analyzing CUDA workloads using a detailed gpu simulator

by Ali Bakhoda, George L. Yuan, Wilson W. L. Fung, Henry Wong, Tor M. Aamodt - In Proceedings of the International Symposium on Performance Analysis of Systems and Software, 2009
"... Modern Graphic Processing Units (GPUs) provide sufficiently flexible programming models that understanding their performance can provide insight in designing tomorrow’s manycore processors, whether those are GPUs or otherwise. The combination of multiple, multithreaded, SIMD cores makes studying t ..."
Cited by 168 (8 self)
... applications demonstrating varying levels of performance improvement on GPU hardware (versus a CPU-only sequential version of the application). We study the performance of these applications on our GPU performance simulator with configurations comparable to contemporary high-end graphics cards. We ...

Cooperative Heterogeneous Computing for Parallel Processing on CPU/GPU Hybrids

by Changmin Lee, Won W. Ro, Jean-Luc Gaudiot
"... This paper presents a cooperative heterogeneous computing framework which enables the efficient utilization of available computing resources of host CPU cores for CUDA kernels, which are designed to run only on GPU. The proposed system exploits at runtime the coarse-grain thread-level parallelism ..."
Cited by 3 (0 self)

CPU/GPU Runtime System

by Alan Humphrey, Qingyu Meng, Martin Berzins, Todd Harman, 2012
"... The Uintah Computational Framework was developed to provide an environment for solving fluid-structure interaction problems on structured adaptive grids on large-scale, long-running, data-intensive problems. Uintah uses a combination of fluid-flow solvers and particle-based methods for solids, toget ..."
... and Pthreads, Uintah now runs on up to 262k cores on the DOE Jaguar system. In order to extend Uintah to heterogeneous systems, with ever-increasing CPU core counts and additional on-node GPUs, a new dynamic CPU-GPU task scheduler is designed and evaluated in this study. This new scheduler enables Uintah ...

Error control and analysis in coarse-graining of stochastic lattice dynamics

by Markos A. Katsoulakis, Petr Plecháč, Alexandros Sopasakis, 2005
"... The coarse-grained Monte Carlo (CGMC) algorithm was originally proposed in the series of works [15, 16]. In this paper we further investigate the approximation properties of the coarse-graining procedure and relation between the coarse-grained and microscopic processes. We provide both analytical an ..."
Cited by 4 (3 self)

Accelerating Inclusion-based Pointer Analysis on Heterogeneous CPU-GPU Systems

by Yu Su, Ding Ye, Jingling Xue
"... Abstract—This paper describes the first implementation of Andersen’s inclusion-based pointer analysis for C programs on a heterogeneous CPU-GPU system, where both its CPU and GPU cores are used. As an important graph algorithm, Andersen’s analysis is difficult to parallelise because it makes extensi ..."
Cited by 1 (1 self)
... -only speedups for certain programs (i.e., graphs) unpredictable. We observe that a naive parallel solution of Andersen’s analysis on a CPU-GPU system suffers from poor performance due to workload imbalance. We introduce a solution that is centered around a new dynamic workload distribution scheme. The novelty ...

Dynamically Managed Data for CPU-GPU Architectures

by Thomas B. Jablin, James A. Jablin, Prakash Prabhu, Feng Liu, David I. August
"... GPUs are flexible parallel processors capable of accelerating real applications. To exploit them, programmers must ensure a consistent program state between the CPU and GPU memories by managing data. Manually managing data is tedious and error-prone. In prior work on automatic CPU-GPU data managemen ..."
Cited by 12 (1 self)

Developed at and hosted by The College of Information Sciences and Technology

© 2007-2019 The Pennsylvania State University