Results 1 -
3 of
3
Conservation Cores: Reducing the Energy of Mature Computations
"... Growing transistor counts, limited power budgets, and the breakdown of voltage scaling are currently conspiring to create a utilization wall that limits the fraction of a chip that can run at full speed at one time. In this regime, specialized, energy-efficient processors can increase parallelism by ..."
Abstract
-
Cited by 27 (6 self)
- Add to MetaCart
Growing transistor counts, limited power budgets, and the breakdown of voltage scaling are currently conspiring to create a utilization wall that limits the fraction of a chip that can run at full speed at one time. In this regime, specialized, energy-efficient processors can increase parallelism by reducing the per-computation power requirements and allowing more computations to execute under the same power budget. To pursue this goal, this paper introduces conservation cores. Conservation cores, or c-cores, are specialized processors that focus on reducing energy and energy-delay instead of increasing performance. This focus on energy makes c-cores an excellent match for many applications that would be poor candidates for hardware acceleration (e.g., irregular integer codes). We present a toolchain for automatically synthesizing c-cores from application source code and demonstrate that they can significantly reduce energy and energy-delay for a wide range of applications. The c-cores support patching, a form of targeted reconfigurability, that allows them to adapt to new versions of the software they target. Our results show that conservation cores can reduce energy consumption by up to 16.0 × for functions and by up to 2.1 × for whole applications, while patching can extend the useful lifetime of individual c-cores to match that of conventional processors.
Single-Chip Heterogeneous Computing: Does the Future Include Custom Logic, FPGAs, and GPGPUs
- In MICRO-43: Proceedings of the 43th Annual IEEE/ACM International Symposium on Microarchitecture
, 2010
"... To extend the exponential performance scaling of future chip multiprocessors, improving energy efficiency has become a first-class priority. Single-chip heterogeneous computing has the potential to achieve greater energy efficiency by combining traditional processors with unconventional cores (U-cor ..."
Abstract
-
Cited by 10 (2 self)
- Add to MetaCart
To extend the exponential performance scaling of future chip multiprocessors, improving energy efficiency has become a first-class priority. Single-chip heterogeneous computing has the potential to achieve greater energy efficiency by combining traditional processors with unconventional cores (U-cores) such as custom logic, FPGAs, or GPGPUs. Although U-cores are effective at increasing performance, their benefits can also diminish given the scarcity of projected bandwidth in the future. To understand the relative merits between different approaches in the face of technology constraints, this work builds on prior modeling of heterogeneous multicores to support U-cores. Unlike prior models that trade performance, power, and area using well-known relationships between simple and complex processors, our model must consider the less-obvious relationships between conventional processors and a diverse set of U-cores. Further, our model supports speculation of future designs from scaling trends predicted by the ITRS road map. The predictive power of our model depends upon U-core-specific parameters derived by measuring performance and power of tuned applications on today’s state-of-the-art multicores, GPUs, FPGAs, and ASICs. Our results reinforce some current-day understandings of the potential and limitations of U-cores and also provides new insights on their relative merits. 1.
Design Space Exploration of CMPs with Caches and Local Memories
"... Chip multiprocessors (CMPs) are the dominating architectures nowadays. There is a big variety of designs in current CMPs, with different number of cores and memory subsystems, because they are used in a wide spectrum of domains and so its best configuration highly depends on several design goals suc ..."
Abstract
- Add to MetaCart
Chip multiprocessors (CMPs) are the dominating architectures nowadays. There is a big variety of designs in current CMPs, with different number of cores and memory subsystems, because they are used in a wide spectrum of domains and so its best configuration highly depends on several design goals such as performance, energy consumption, scalability, area and programmability. This paper studies different chip configurations in terms of number of cores, size of the shared L3 cache and off-chip bandwidth requirements in order to find what is the most efficient design for High Performance Computing applications. In addition, it analyzes two types of CMPs: CMPs with a traditional cache hierarchy, or cache-based CMPs, and CMPs with both cache hierarchy and local memories, or hybrid memory CMPs. Results show that, for HPC workloads, cache-based cores perform better when the shared L3 cache is reduced in order to make room for additional cores and the bus is able to provide a high bandwidth, reaching a speedup of 3.31x against a baseline architecture. The best chip configurations of hybrid memory CMPs are the ones with a moderate number of cores and not so mall L3 caches and, furthermore, they don’t need a bus with high bandwidth. They achieve a maximum speedup of 3.06x against the baseline architecture. In the direct comparison between the chip configurations of the two types of CMPs, hybrid memory CMPs outperform cache-based CMPs in almost all configurations, achieving a maximum speedup of 1.74x. 1.

