A Design Space Evaluation of Grid Processor Architectures (2001)

by Ramadass Nagarajan, Karthikeyan Sankaralingam, Doug Burger, Stephen W. Keckler
Citations:100 - 31 self

BibTeX

@MISC{Nagarajan01adesign,
    author = {Ramadass Nagarajan and Karthikeyan Sankaralingam and Doug Burger and Stephen W. Keckler},
    title = {A Design Space Evaluation of Grid Processor Architectures},
    year = {2001}
}

Abstract

In this paper, we survey the design space of a new class of architectures called Grid Processor Architectures (GPAs). These architectures are designed to scale with technology, allowing faster clock rates than conventional architectures while providing superior instruction-level parallelism on traditional workloads and high performance across a range of application classes. A GPA consists of an array of ALUs, each with limited control, connected by a thin operand network. Programs are executed by mapping blocks of statically scheduled instructions to the ALU array and executing them dynamically in dataflow order. This organization enables the critical paths of instruction blocks to be executed on chains of ALUs without transmitting temporary values back to the register file, avoiding most of the large, unscalable structures that limit the scalability of conventional architectures. Finally, we present simulation results of a preliminary design, the GPA-1. With a half-cycle routing delay, we obtain performance roughly equal to an ideal 8-way, 512-entry window superscalar core. With no inter-ALU delay, perfect memory, and perfect branch prediction, the IPC of the GPA-1 is more than twice that of the ideal superscalar core, achieving an average of 11 IPC across nine SPEC CPU2000 and Mediabench benchmarks.
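
The execution model described in the abstract (a block of instructions placed statically on an ALU grid, each firing as soon as its operands arrive) can be illustrated with a toy timing model. The sketch below is not the GPA-1 simulator: the names `Instr`, `run_block`, and `block_ipc`, the Manhattan-distance routing, the one-cycle ALU latency, and the five-instruction block are all assumptions made for illustration. It only shows how a per-hop routing delay stretches a block's critical path.

```python
from dataclasses import dataclass

@dataclass
class Instr:
    name: str
    pos: tuple          # (row, col) ALU slot, fixed statically (hypothetical placement)
    srcs: tuple = ()    # names of producer instructions inside the same block
    latency: int = 1    # assumed ALU latency in cycles

def hops(a, b):
    # Assumed routing model: Manhattan distance between two ALU slots.
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def run_block(block, hop_delay=0.5):
    """Fire each instruction when its last operand arrives (dataflow order).

    Returns {name: finish_time_in_cycles}. Instructions without producers
    are assumed to have their inputs available at time 0.
    """
    by_name = {i.name: i for i in block}
    finish = {}
    pending = list(block)
    while pending:
        progressed = False
        for ins in list(pending):
            if all(s in finish for s in ins.srcs):
                # Operands travel ALU-to-ALU; nothing goes through a register file.
                ready = max(
                    (finish[s] + hop_delay * hops(by_name[s].pos, ins.pos)
                     for s in ins.srcs),
                    default=0.0,
                )
                finish[ins.name] = ready + ins.latency
                pending.remove(ins)
                progressed = True
        if not progressed:
            raise ValueError("cyclic or unresolved dependence in block")
    return finish

def block_ipc(block, hop_delay):
    # Instructions per cycle for one block in isolation (toy metric).
    finish = run_block(block, hop_delay)
    return len(block) / max(finish.values())

# Hypothetical five-instruction block; a -> c -> d is the critical path.
BLOCK = [
    Instr("a", (0, 0)),
    Instr("b", (0, 1)),
    Instr("c", (1, 0), srcs=("a", "b")),
    Instr("d", (2, 0), srcs=("c",)),
    Instr("e", (1, 1), srcs=("b",)),
]

if __name__ == "__main__":
    for delay in (0.0, 0.5):
        print(delay, run_block(BLOCK, delay)["d"], block_ipc(BLOCK, delay))
```

In this toy block the last instruction finishes after 3 cycles with no inter-ALU delay and after 4.5 cycles with a half-cycle delay per hop, which is the kind of sensitivity to routing delay that the abstract's two reported configurations reflect. The absolute numbers here have no relation to the paper's measurements.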
