• Documents
  • Authors
  • Tables
  • Other Seers ▼
    RefSeer AckSeer CollabSeer SeerSeer
  • Log in
  • Sign up
  • MetaCart

CiteSeerX logo

Advanced Search Include Citations
Advanced Search Include Citations | Disambiguate

K.S.: Merging head and tail duplication for convergent hyperblock formation (2006)

by B Maher, A Smith, D Burger, McKinley
Add To MetaCart

Tools

Sorted by:
Results 11 - 11 of 11

The Good Block: Hardware/Software Design for Composable, Block-Atomic Processors

by Bertrand A. Maher, Katherine E. Coons, Kathryn Mckinley, Doug Burger
"... Power consumption, complexity, and on-chip latency are forcing computer systems to exploit more parallelism efficiently. Explicit Dataflow Graph Execution (EDGE) architectures seek to expose parallelism by dividing programs into blocks of efficient dataflow operations, exposing inter and intra-block ..."
Abstract - Add to MetaCart
Power consumption, complexity, and on-chip latency are forcing computer systems to exploit more parallelism efficiently. Explicit Dataflow Graph Execution (EDGE) architectures seek to expose parallelism by dividing programs into blocks of efficient dataflow operations, exposing inter and intra-block concurrency. This paper studies the balance of complexity and capability between EDGE architectures and compilers. We address three main questions. (1) What are the appropriate block granularities for achieving high performance efficiently? (2) What are good block instruction selection policies? (3) What architecture and compiler support do these designs require? Our results show that the compiler requires multiple block sizes to adapt applications to block-atomic hardware and achieve high performance. Although the architecture for a single size is simpler, the additions for variable sizes are modest and ease hardware configuration. We propose hand-crafted and learned compiler policies for block formation. We find the best policies provide significant advantages of up to a factor of 3 in some configurations. Policies vary based on (1) the amount of parallelism inherent in the application, e.g., for integer and numerical applications, and (2) the available parallel resources. The resulting configurable architecture and compiler efficiently expose and exploit software and hardware parallelism. 1.
The National Science Foundation
  • About CiteSeerX
  • Submit Documents
  • Privacy Policy
  • Help
  • Data
  • Source
  • Contact Us

Developed at and hosted by The College of Information Sciences and Technology

© 2007-2010 The Pennsylvania State University