Tartan: Evaluating spatial computation for whole program execution
- In Proceedings of the 12th International Conference on Architectural Support for Programming Languages and Operating Systems
, 2006
Cited by 11 (1 self)
Spatial Computing (SC) has been shown to be an energy-efficient model for implementing program kernels. In this paper we explore the feasibility of using SC for more than small kernels. To this end, we evaluate the performance and energy efficiency of entire applications on Tartan, a general-purpose architecture which integrates a reconfigurable fabric (RF) with a superscalar core. Our compiler automatically partitions and compiles an application into an instruction stream for the core and a configuration for the RF. We use a detailed simulator to capture both timing and energy numbers for all parts of the system. Our results indicate that a hierarchical RF architecture, designed around a scalable interconnect, is instrumental in harnessing the benefits of spatial computation. The interconnect uses static configuration and routing at the lower levels and a packet-switched, dynamically routed network at the top level. Tartan is most energy-efficient when almost all of the application is mapped to the RF, indicating the need for the RF to support most general-purpose programming constructs. Our initial investigation reveals that such a system can provide, on average, an order of magnitude improvement in energy-delay compared to an aggressive superscalar core on single-threaded workloads.
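The energy-delay figure quoted in the abstract above is usually computed as the product of the energy a run consumes and the time it takes, with lower values being better. The sketch below assumes that conventional definition. The function names and all numbers are hypothetical illustrations, not measurements from the paper.

```python
def energy_delay_product(energy_joules, delay_seconds):
    """Energy-delay product (EDP): lower is better."""
    return energy_joules * delay_seconds


def edp_improvement(baseline, candidate):
    """Factor by which the candidate's EDP beats the baseline's.

    Each argument is an (energy_joules, delay_seconds) pair.
    """
    return energy_delay_product(*baseline) / energy_delay_product(*candidate)


# Hypothetical numbers chosen only to illustrate the arithmetic.
superscalar = (5.0, 2.0)   # 5 J consumed over 2 s
spatial = (2.0, 0.5)       # 2 J consumed over 0.5 s
print(edp_improvement(superscalar, spatial))
```

Because the metric multiplies the two quantities, a design that is modestly better on both energy and delay can still show a large combined improvement.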
Modeling the global critical path in concurrent systems
, 2006
Cited by 5 (5 self)
We show how the global critical path can be used as a practical tool for understanding, optimizing and summarizing the behavior of highly concurrent self-timed circuits. Traditionally, critical path analysis has been applied to DAGs, and thus was constrained to combinatorial sub-circuits. We formally define the global critical path (GCP) and show how it can be constructed using only local information that is automatically derived directly from the circuit. We introduce a form of Production Rules, which can accurately determine the GCP for a given input vector, even for modules which exhibit choice and early termination. The GCP provides valuable insight into the control behavior of the application, which helps in formulating new optimizations and re-formulating existing ones to use the GCP knowledge. We have constructed a fully automated framework for GCP detection and analysis, and have incorporated this framework into a high-level synthesis tool-chain. We demonstrate the effectiveness of the GCP framework by re-formulating two traditional CAD optimizations to use the GCP, yielding efficient algorithms which improve circuit power (by up to 9%) and performance (by up to 60%) in our experiments.
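For context, the traditional DAG-based critical path analysis that the abstract contrasts itself with can be sketched as a longest-path computation over a topological order. This is a minimal illustration of that textbook technique only, not the GCP construction or Production Rules from the paper. The function name, node names and delays are hypothetical.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def critical_path(delays, deps):
    """Return (length, path) of the longest-delay path through a DAG.

    delays: node -> delay of that node (arbitrary time units)
    deps:   node -> list of predecessor nodes it must wait for
    """
    graph = {node: deps.get(node, []) for node in delays}
    finish = {}  # finish time of each node
    prev = {}    # the predecessor that finished last, i.e. the critical one
    for node in TopologicalSorter(graph).static_order():
        last = max(graph[node], key=lambda p: finish[p], default=None)
        prev[node] = last
        finish[node] = delays[node] + (finish[last] if last is not None else 0)

    # Walk back from the latest-finishing node along critical predecessors.
    node = max(finish, key=finish.get)
    length = finish[node]
    path = []
    while node is not None:
        path.append(node)
        node = prev[node]
    return length, path[::-1]


# Hypothetical four-node circuit: c waits on a and b, d waits on c.
delays = {"a": 2, "b": 3, "c": 1, "d": 4}
deps = {"c": ["a", "b"], "d": ["c"]}
print(critical_path(delays, deps))  # (8, ['b', 'c', 'd'])
```

This only works when the dependency graph is acyclic, which is the restriction the abstract says limited earlier analyses to combinatorial sub-circuits.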
Design and Evaluation of a Technology-Scalable Architecture for Instruction-Level Parallelism
, 2007
"... To my parents and sisters. ..."
Reducing control overhead in dataflow architectures
- In Proceedings of the 15th International Conference on Parallel Architectures and Compilation Techniques
, 2006
Cited by 2 (1 self)
In recent years, computer architects have proposed tiled architectures in response to several emerging problems in processor design, such as design complexity, wire delay, and fabrication reliability. One of these architectures, WaveScalar, uses a dynamic, tagged-token dataflow execution model to simplify the design of the processor tiles and their interconnection network and to achieve good parallel performance. However, using a dataflow execution model reawakens old problems, including the instruction overhead required for control flow. Previous work compiling the functional language Id to the Monsoon Dataflow System found this overhead to be 2-3× that of programs written in C and targeted to a MIPS R3000. In this paper, we present and analyze three compiler optimizations that significantly reduce control overhead with minimal additional hardware. We begin by describing how to translate imperative code into dataflow assembly and analyze the resulting control overhead. We report a similar 2-4× instruction overhead, which suggests that the execution model, rather than a specific source language or target architecture, is responsible. Then, we present the compiler optimizations, each of which is designed to eliminate a particular type of control overhead, and analyze the extent to which they were able to do so. Finally, we evaluate the effect that using all optimizations together has on program performance. Together, the optimizations reduce control overhead by 80% on average, increasing application performance by 21-37%.

