Tartan: Evaluating spatial computation for whole program execution
In Proceedings of the 12th International Conference on Architectural Support for Programming Languages and Operating Systems, 2006
Cited by 11 (1 self)
Spatial Computing (SC) has been shown to be an energy-efficient model for implementing program kernels. In this paper we explore the feasibility of using SC for more than small kernels. To this end, we evaluate the performance and energy efficiency of entire applications on Tartan, a general-purpose architecture which integrates a reconfigurable fabric (RF) with a superscalar core. Our compiler automatically partitions and compiles an application into an instruction stream for the core and a configuration for the RF. We use a detailed simulator to capture both timing and energy numbers for all parts of the system. Our results indicate that a hierarchical RF architecture, designed around a scalable interconnect, is instrumental in harnessing the benefits of spatial computation. The interconnect uses static configuration and routing at the lower levels and a packet-switched, dynamically-routed network at the top level. Tartan is most energy-efficient when almost all of the application is mapped to the RF, indicating the need for the RF to support most general-purpose programming constructs. Our initial investigation reveals that such a system can provide, on average, an order of magnitude improvement in energy-delay compared to an aggressive superscalar core on single-threaded workloads.
System-level Timing Analysis and Optimizations for Hardware Compilation
2007
Cited by 1 (1 self)
Keywords: high-level synthesis, spatial computing, slack matching, operation chaining, asynchronous latch circuits, transaction level modeling, timing update
This dissertation presents a System-Level Timing Analysis (SLTA) methodology and a micro-architectural optimization framework for use within hardware compilation. As the EDA abstraction layer of preference is raised to Electronic System Level (ESL), the focus is on describing systems using Transaction Level Modeling (TLM) [CG03, Pas02, Ede06], which is amenable to high-level synthesis. The proposed SLTA methodology and ESL optimization framework are designed to complement TLM-based synthesis flows by analyzing the sequential dependency behavior of system-level transactions. Using this knowledge, control-path-altering microarchitecture optimizations are applied iteratively on a well-defined hardware Intermediate Representation (IR). There are two over-arching contributions in this dissertation. First, we describe an Intermediate Representation (IR) as a valuable addition to the infrastructure of a hardware compiler. The IR captures data/control dependencies in the source program as well as resource dependencies of the underlying circuit architecture. The IR is an abstraction of transaction events in the TLM but is also
A System-Level Timing Analysis and Optimization Methodology for Hardware Compilation (Extended Abstract)
2007
Electronic Design Automation (EDA) in the nano era faces a fresh set of challenges. Designs are getting larger and more complex, and design metrics are evolving from area and performance in the past to power and reliability in the future. In this changing landscape, the ITRS roadmap notes that it is imperative that we raise the level of abstraction in system-level design to deal with this increasing complexity, and look for reliable, manufacturing-friendly circuit architectures in order to overcome these challenges [1]. In my dissertation, I propose an optimization methodology based on system-level timing analysis, which can scale to large complex circuits. The system-level timing model allows the optimization phases to reason about global timing dependencies between circuit events [11], thereby enabling more efficient architectural design space exploration. Specifically, it computes a Global Critical Path (GCP), which traces a path through the circuit graph, indicating the most critical components in the execution. Unlike the traditional view of the critical path as an acyclic path between two clocked registers, the GCP is defined for the entire system, can cross register boundaries and can contain cycles. By computing the GCP, the optimization phases can then focus their efforts on the most important parts of the system, thus allowing us to handle large circuits without compromising efficiency, scalability or accuracy. This analysis and optimization methodology is depicted in Fig. 1, and has been incorporated into the CASH compiler (described below).
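The idea of a critical path that is defined for the whole system and may contain cycles can be illustrated with a small sketch. This is not the CASH compiler's implementation: the event-trace format, the function names `global_critical_path` and `edge_criticality`, and the rule that the critical predecessor of an event is its last-arriving input are all assumptions made for illustration. The abstract itself only states that the GCP traces a path through the circuit graph.

```python
from collections import Counter

def global_critical_path(events, final_event):
    """Walk backward from the final event, always following the
    predecessor event with the latest timestamp (assumed rule).

    events: dict mapping event id -> (circuit node, time, predecessor ids)
    Returns the list of event ids on the path, earliest first.
    """
    path = []
    current = final_event
    while current is not None:
        _node, _time, preds = events[current]
        path.append(current)
        # Hypothetical criticality rule: the last input to arrive is critical.
        current = max(preds, key=lambda p: events[p][1]) if preds else None
    path.reverse()
    return path

def edge_criticality(events, path):
    """Count how often each circuit edge appears on the traced path.
    A circuit edge inside a loop can be counted more than once."""
    nodes = [events[e][0] for e in path]
    return Counter(zip(nodes, nodes[1:]))

# Hypothetical trace: a loop (mux -> add -> cmp) that runs twice,
# plus a fast side path through "load" that is never critical.
events = {
    "e0": ("start", 0, []),
    "e1": ("mux", 1, ["e0"]),
    "e2": ("add", 3, ["e1"]),
    "e3": ("cmp", 4, ["e2"]),
    "e4": ("mux", 5, ["e3"]),
    "e5": ("add", 7, ["e4"]),
    "e6": ("cmp", 8, ["e5"]),
    "e7": ("load", 2, ["e1"]),
    "e8": ("out", 9, ["e6", "e7"]),
}

path = global_critical_path(events, "e8")
counts = edge_criticality(events, path)
```

In this made-up trace the path passes through the `mux`, `add`, and `cmp` nodes twice, so the circuit edge from `mux` to `add` is counted twice, while the `load` node never appears. An optimizer built on this kind of count could rank circuit edges by how often they are critical.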

