Results 1 - 10
of
10
Conservation Cores: Reducing the Energy of Mature Computations
"... Growing transistor counts, limited power budgets, and the breakdown of voltage scaling are currently conspiring to create a utilization wall that limits the fraction of a chip that can run at full speed at one time. In this regime, specialized, energy-efficient processors can increase parallelism by ..."
Abstract
-
Cited by 27 (6 self)
- Add to MetaCart
Growing transistor counts, limited power budgets, and the breakdown of voltage scaling are currently conspiring to create a utilization wall that limits the fraction of a chip that can run at full speed at one time. In this regime, specialized, energy-efficient processors can increase parallelism by reducing the per-computation power requirements and allowing more computations to execute under the same power budget. To pursue this goal, this paper introduces conservation cores. Conservation cores, or c-cores, are specialized processors that focus on reducing energy and energy-delay instead of increasing performance. This focus on energy makes c-cores an excellent match for many applications that would be poor candidates for hardware acceleration (e.g., irregular integer codes). We present a toolchain for automatically synthesizing c-cores from application source code and demonstrate that they can significantly reduce energy and energy-delay for a wide range of applications. The c-cores support patching, a form of targeted reconfigurability, that allows them to adapt to new versions of the software they target. Our results show that conservation cores can reduce energy consumption by up to 16.0 × for functions and by up to 2.1 × for whole applications, while patching can extend the useful lifetime of individual c-cores to match that of conventional processors.
Single-Chip Heterogeneous Computing: Does the Future Include Custom Logic, FPGAs, and GPGPUs
- In MICRO-43: Proceedings of the 43th Annual IEEE/ACM International Symposium on Microarchitecture
, 2010
"... To extend the exponential performance scaling of future chip multiprocessors, improving energy efficiency has become a first-class priority. Single-chip heterogeneous computing has the potential to achieve greater energy efficiency by combining traditional processors with unconventional cores (U-cor ..."
Abstract
-
Cited by 10 (2 self)
- Add to MetaCart
To extend the exponential performance scaling of future chip multiprocessors, improving energy efficiency has become a first-class priority. Single-chip heterogeneous computing has the potential to achieve greater energy efficiency by combining traditional processors with unconventional cores (U-cores) such as custom logic, FPGAs, or GPGPUs. Although U-cores are effective at increasing performance, their benefits can also diminish given the scarcity of projected bandwidth in the future. To understand the relative merits between different approaches in the face of technology constraints, this work builds on prior modeling of heterogeneous multicores to support U-cores. Unlike prior models that trade performance, power, and area using well-known relationships between simple and complex processors, our model must consider the less-obvious relationships between conventional processors and a diverse set of U-cores. Further, our model supports speculation of future designs from scaling trends predicted by the ITRS road map. The predictive power of our model depends upon U-core-specific parameters derived by measuring performance and power of tuned applications on today’s state-of-the-art multicores, GPUs, FPGAs, and ASICs. Our results reinforce some current-day understandings of the potential and limitations of U-cores and also provides new insights on their relative merits. 1.
Dynamically Specialized Datapaths for Energy Efficient Computing
"... Due to limits in technology scaling, energy efficiency of logic devices is decreasing in successive generations. To provide continued performance improvements without increasing power, regardless of the sequential or parallel nature of the application, microarchitectural energy efficiency must impro ..."
Abstract
-
Cited by 4 (3 self)
- Add to MetaCart
Due to limits in technology scaling, energy efficiency of logic devices is decreasing in successive generations. To provide continued performance improvements without increasing power, regardless of the sequential or parallel nature of the application, microarchitectural energy efficiency must improve. We propose Dynamically Specialized Datapaths to improve the energy efficiency of general purpose programmable processors. The key insights of this work are the following. First, applications execute in phases and these phases can be determined by creating a path-tree of basic-blocks rooted at the inner-most loop. Second, specialized datapaths corresponding to these path-trees, which we refer to as DySER blocks, can be constructed by interconnecting a set of heterogeneous computation units with a circuit-switched network. These blocks can be easily integrated with a processor pipeline. A synthesized RTL implementation using an industry 55nm technology library shows a 64-functional-unit DySER block occupies approximately the same area as a 64 KB single-ported SRAM and can execute at 2 GHz. We extend the GCC compiler to identify path-trees and code-mapping to DySER and evaluate the PAR-SEC, SPEC and Parboil benchmarks suites. Our results show that in most cases two DySER blocks can achieve the same performance (within 5%) as having a specialized hardware module for each path-tree. A 64-FU DySER block can cover 12 % to 100% of the dynamically executed instruction stream. When integrated with a dual-issue out-of-order processor, two DySER blocks provide geometric mean speedup of 2.1X (1.15X to 10X), and geometric mean energy reduction of 40 % (up to 70%), and 60 % energy reduction if no performance improvement is required. 1
Design and Evaluation of a Technology-Scalable Architecture for Instruction-Level Parallelism
, 2007
"... To my parents and sisters. ..."
Reducing the Energy Cost of Irregular Code Bases in Soft Processor Systems
"... Abstract — This paper describes an architecture and FPGA synthesis toolchain for building specialized, energy-saving coprocessors called Irregular Code Energy Reducers (ICERs) for a wide range of unmodified C programs. FPGAs are increasingly used to build large-scale systems, and many large software ..."
Abstract
-
Cited by 2 (1 self)
- Add to MetaCart
Abstract — This paper describes an architecture and FPGA synthesis toolchain for building specialized, energy-saving coprocessors called Irregular Code Energy Reducers (ICERs) for a wide range of unmodified C programs. FPGAs are increasingly used to build large-scale systems, and many large software systems contain relatively little code that is amenable to automatic, semi-automatic, or even manual parallelization. Whereas accelerator approaches have traditionally achieved energy benefits as a side effect from increasing performance via parallel execution, ICERs aim to achieve energy gains even on code with little exploitable parallelism. Traditional approaches to automatically generating accelerators from existing software rely on inferring parallel execution from serial code, so they face the same code analysis challenges as parallelizing compilers. In contrast, because the ICER approach targets energy rather than performance, it easily scales to large, irregular applications that are poor candidates for traditional acceleration. Our results show that, compared to a baseline system with soft processor cores, ICERs can reduce energy consumption by up to 9.5 × for the code they target and 2.8 × for whole applications. Keywords-Accelerator architectures; Reconfigurable architectures; Energy efficiency; High level synthesis I.
An Evaluation of Selective Depipelining for FPGA-based Energy-Reducing Irregular Code Coprocessors
"... Abstract — As the complexity of FPGA-based systems scales, the importance of efficiently handling irregular code increases. Recent work has proposed Irregular Code Energy Reducers (ICERs), a high-level synthesis approach for FPGAs that offers significant energy reduction for irregular code compared ..."
Abstract
-
Cited by 1 (0 self)
- Add to MetaCart
Abstract — As the complexity of FPGA-based systems scales, the importance of efficiently handling irregular code increases. Recent work has proposed Irregular Code Energy Reducers (ICERs), a high-level synthesis approach for FPGAs that offers significant energy reduction for irregular code compared to a soft core processor. ICERs target the hot-spots of programs, and are seamlessly connected via a shared L1 cache with a soft processor that executes the cold code. This paper evaluates the application of the selective depipelining (SDP) technique to ICERs, which greatly reduces both the execution time and energy of irregular computations. SDP enables irregular computations to be expressed as large, fast, low-power combinational blocks. SDP maintains high memory bandwidth by scheduling the many potentially dependent memory operations within these blocks onto a high-frequency, highly-multiplexed coherent memory while scheduling combinational operations at a much lower frequency. SDP is a key enabler for improving the execution properties of irregular computations that are difficult to parallelize. We show that applying SDP to ICERs reduces energy-delay by 2.62 × relative to ICERs. ICERs with SDP are up to 2.38 × faster than a soft core processor and reduce energy consumption by up to 15.83 × for a variety of irregular applications. I.
Integrated Circuit Design (E.I.S.)
"... Abstract. We present an improved method for scheduling speculative data paths which relies on cancel tokens to undo computations in misspeculated paths. Performancewise, this method is considerably faster than lenient execution, and faster than any other known approach applicable for general (includ ..."
Abstract
- Add to MetaCart
Abstract. We present an improved method for scheduling speculative data paths which relies on cancel tokens to undo computations in misspeculated paths. Performancewise, this method is considerably faster than lenient execution, and faster than any other known approach applicable for general (including non-pipelined) computation structures. We present experimental evidence obtained by implementing our method as part of the high-level language hardware/software compiler COMRADE. 1
Accelerating Speculative Execution in . . .
, 2008
"... We present an improved method for scheduling speculative data paths which relies on cancel tokens to undo computations in misspeculated paths. Performancewise, this method is considerably faster than lenient execution, and faster than any other known approach applicable for general (including non-pi ..."
Abstract
- Add to MetaCart
We present an improved method for scheduling speculative data paths which relies on cancel tokens to undo computations in misspeculated paths. Performancewise, this method is considerably faster than lenient execution, and faster than any other known approach applicable for general (including non-pipelined) computation structures. We present experimental evidence obtained by implementing our method as part of the high-level language hardware/software compiler COMRADE.
Dedication.....................................
, 2011
"... it is acceptable in quality and form for publication on microfilm and electronically: Co-Chair Co-Chair ..."
Abstract
- Add to MetaCart
it is acceptable in quality and form for publication on microfilm and electronically: Co-Chair Co-Chair
Dedication.....................................
"... and it is acceptable in quality and form for publication on microfilm and electronically: Co-Chair Co-Chair ..."
Abstract
- Add to MetaCart
and it is acceptable in quality and form for publication on microfilm and electronically: Co-Chair Co-Chair

