Energy scalability of on-chip interconnection networks in multicore architectures (2008)

by T Konstantakopulos, J Eastep, J Psota, A Agarwal

Results 1 - 4 of 4

A Lightweight Streaming Layer for Multicore Execution

by David Zhang, Qiuyuan J. Li, et al., 2007
Cited by 8 (2 self)
As multicore architectures gain widespread use, it becomes increasingly important to be able to harness their additional processing power to achieve higher performance. However, exploiting parallel cores to improve single-program performance is difficult from a programmer’s perspective because most existing programming languages dictate a sequential method of execution. Stream programming, which organizes programs as independent filters communicating over explicit data channels, exposes useful types of parallelism that can be exploited. However, there is still the burden of mapping high-level stream programs to specific multicore architectures. The complexities of each architecture’s underlying details make it difficult to schedule the execution of a stream program with high performance. In this paper, we present the specifications for an intermediate layer between the stream program and the target architecture. This multicore streaming layer (MSL) provides a common level of abstraction that facilitates efficient execution of stream programs by making it easier for compilers to manage computation, and by providing automatic orchestration and optimization of communication when appropriate. We implemented a framework for one such instance of the MSL targeted to the Cell processor and the StreamIt language and achieved greater than 88% utilization on all benchmarks with relatively small amounts of code. The framework can also be applied to other architectures and stream programming languages to enhance generality and portability.

Interaction of Scaling Trends in Processor Architecture and Cooling

by Wei Huang, Mircea R. Stan, Sudhanva Gurumurthi, Robert J. Rib, Kevin Skadron
It is predicted that two important trends are likely to accompany traditional CMOS semiconductor technology scaling: chip multiprocessors and 3D integration. With ever-increasing power consumption and the consequent difficulty in heat removal, it is important to consider the limits and implications of different cooling methods for the upcoming manycore and 3D era. In this paper, we consider both technology scaling and manycore architecture scaling trends in conjunction with conventional air cooling and advanced microchannel cooling for both 2D and 3D microprocessors, and identify interesting inflection design points down the road.

System-level Optimizations for Memory Access in the Execution Migration Machine (EM²)

by Keun Sup Shim, Mieszko Lis, Myong Hyon Cho, Omer Khan, Srinivas Devadas
In this paper, we describe system-level optimizations for the Execution Migration Machine (EM²), a novel shared-memory architecture that addresses the memory wall and scalability issues for large-scale multicores. In EM², data is never replicated and threads always migrate to the core where data is statically stored. This enables EM² not only to provide cache coherence without any complex protocols or expensive directories, but also to better utilize on-chip cache and thus experience a much lower cache miss rate. However, it may incur significant execution migrations for shared data, which increase memory latency and network traffic; keeping migration rates low is therefore key under EM². We present systematic application optimization techniques that address this problem for EM² and are suitable for a compiler/OS implementation. Applying these optimizations manually to parallel benchmarks from the SPLASH-2 suite, we reduce the average migration rate for EM² by 53%, which directly improves parallel completion time by 34% on average. This allows EM² to perform competitively compared to a traditional cache-coherent architecture on a conventional electrical network.

Directoryless Shared Memory Coherence Using Execution Migration

by Mieszko Lis, Keun Sup Shim, Myong Hyon Cho, Omer Khan, Srinivas Devadas
We introduce the concept of deadlock-free migration-based coherent shared memory to the NUCA family of architectures. Migration-based architectures move threads among cores to guarantee sequential semantics in large multicores. Using an execution migration (EM) architecture, we achieve performance comparable to directory-based architectures without using directories: avoiding automatic data replication significantly reduces cache miss rates, while a fast network-level thread migration scheme takes advantage of shared data locality to reduce the remote cache accesses that limit traditional NUCA performance. EM area and energy consumption are very competitive, and, on average, it outperforms a directory-based MOESI baseline by 1.3× and a traditional S-NUCA design by 1.2×. We argue that with EM, scaling performance has much lower cost and design complexity than in directory-based coherence and traditional NUCA architectures: by merely scaling network bandwidth from 256-bit to 512-bit flits, the performance of our architecture improves by an additional 13%, while the baselines show negligible improvement.