Results 1 - 10
of
87
The filter cache: An energy efficient memory structure
- In Proceedings of the 1997 International Symposium on Microarchitecture
, 1997
"... Most modern microprocessors employ one or two levels of on-chip caches in order to improve performance. These caches are typically implemented with static RAM cells and often occupy a large portion of the chip area. Not surprisingly, these caches often consume a significant amount of power. In many ..."
Abstract
-
Cited by 166 (4 self)
- Add to MetaCart
Most modern microprocessors employ one or two levels of on-chip caches in order to improve performance. These caches are typically implemented with static RAM cells and often occupy a large portion of the chip area. Not surprisingly, these caches often consume a significant amount of power. In many applications, such as portable devices, low power is more important than performance. We propose to trade performance for power consumption by filtering cache references through an unusually small L1 cache. An L2 cache, which is similar in size and structure to a typical L1 cache, is positioned behind the filter cache and serves to reduce the performance loss. Experimental results across a wide range of embedded applications show that the filter cache results in improved memory system energy efficiency. For example, a direct mapped 256-byte filter cache achieves a 58 % power reduction while reducing performance by 21%, corresponding to a 51% reduction in the energy-delay product over a conventional design. 1
The Design and Use of SimplePower: A Cycle-Accurate Energy Estimation Tool
, 2000
"... In this paper, we present the design and use of a comprehensive framework, SimplePower, for evaluating the effect of high-level algorithmic, architectural, and compilation tradeoffs on energy. An execution-driven, cycle-accurate RT level energy estimation tool that uses transition sensitive energy m ..."
Abstract
-
Cited by 130 (8 self)
- Add to MetaCart
In this paper, we present the design and use of a comprehensive framework, SimplePower, for evaluating the effect of high-level algorithmic, architectural, and compilation tradeoffs on energy. An execution-driven, cycle-accurate RT level energy estimation tool that uses transition sensitive energy models forms the cornerstone of this framework. SimplePower also provides the energy consumed in the memory system and on-chip buses using analytical energy models.
System-Level Power Optimization: Techniques and Tools
- ACM TRANSACTIONS ON DESIGN AUTOMATION OF ELECTRONIC SYSTEMS
, 2000
"... ..."
Analytical Energy Dissipation Models For Low Power Caches
, 1997
"... We present detailed analytical models for estimating the energy dissipation in conventional caches as well as low energy cache architectures. The analytical models use the run time statistics such as hit/miss counts, fraction of read/write requests and assume stochastical distributions for signal va ..."
Abstract
-
Cited by 106 (3 self)
- Add to MetaCart
We present detailed analytical models for estimating the energy dissipation in conventional caches as well as low energy cache architectures. The analytical models use the run time statistics such as hit/miss counts, fraction of read/write requests and assume stochastical distributions for signal values. These models are validated by comparing the power estimated using these models against the power estimated using a detailed simulator called CAPE (CAache Power Estimator). The analytical models for conventional caches are found to be accurate to within 2% error. However, these analytical models over--predict the dissipations of low--power caches by as much as 30%. The inaccuracies can be attributed to correlated signal values and locality of reference, both of which are exploited in making some cache organizations energy efficient.
An Integrated Circuit/Architecture Approach to Reducing Leakage in Deep-Submicron High-Performance I-Caches
, 2001
"... Deep-submicron CMOS designs maintain high transistor switching speeds by scaling down the supply voltage and proportionately reducing the transistor threshold voltage. Lowering the threshold voltage increases leakage energy dissipation due to subthreshold leakage current even when the transistor is ..."
Abstract
-
Cited by 103 (6 self)
- Add to MetaCart
Deep-submicron CMOS designs maintain high transistor switching speeds by scaling down the supply voltage and proportionately reducing the transistor threshold voltage. Lowering the threshold voltage increases leakage energy dissipation due to subthreshold leakage current even when the transistor is not switching. Estimates suggest a five-fold increase in leakage energy in every future generation. In modern microarchitectures, much of the leakage energy is dissipated in large on-chip cache memory structures with high transistor densities. While cache utilization varies both within and across applications, modern cache designs are fixed in size resulting in transistor leakage inefficiencies. This paper
Reducing power in superscalar processor caches using subbanking, multiple line buffers and bit-line segmentation
- In International Symposium on Low Power Electronics and Design
, 1999
"... Modern microprocessors employ one or two levels of on–chip caches to bridge the burgeoning speed disparities between the processor and the RAM. These SRAM caches are a major source of power dissipation. We investigate architectural techniques, that do not compromise the processor cycle time, for red ..."
Abstract
-
Cited by 101 (3 self)
- Add to MetaCart
Modern microprocessors employ one or two levels of on–chip caches to bridge the burgeoning speed disparities between the processor and the RAM. These SRAM caches are a major source of power dissipation. We investigate architectural techniques, that do not compromise the processor cycle time, for reducing the power dissipation within the on–chip cache hierarchy in superscalar microprocessors. We use a detailed register–level simulator of a superscalar microprocessor that simulates the execution of the SPEC benchmarks and SPICE measurements for the actual layout of a 0.5 micron, 4metal layer cache, optimized for a 300 MHz. clock. We show that a combination of subbanking, multiple line buffers and bit–line segmentation can reduce the on–chip cache power dissipation by as much as 75 % in a technology–independent manner. Key words: Low power caches, power estimation. 1.
Synthesis Techniques for Low-Power Hard Real-Time Systems on Variable Voltage Processors
, 1998
"... The energy efficiency of systems-on-a-chip can be much improved if one were to vary the supply voltage dynamically at run time. In this paper we describe the synthesis of systems-on-a-chip based on core processors, while treating voltage (and correspondingly, the clock frequency) as a variable to be ..."
Abstract
-
Cited by 96 (5 self)
- Add to MetaCart
The energy efficiency of systems-on-a-chip can be much improved if one were to vary the supply voltage dynamically at run time. In this paper we describe the synthesis of systems-on-a-chip based on core processors, while treating voltage (and correspondingly, the clock frequency) as a variable to be scheduled along with the computation tasks during the static scheduling step. In addition to describing the complete synthesis design flow for these variable voltage systems, we focus on the problem of doing the voltage scheduling while taking into account the inherent limitation on the rates at which the voltage and clock frequency can be changed by the power supply controllers and clock generators. Taking these limits on rate of change into account is crucial since changing the voltage by even a volt may take time equivalent to 100s to 10,000s of instructions on modern processors. We present both an exact but impractical formulation of this scheduling problem as a set of non-linear equations, as well as a heuristic approach based on reduction to an optimally solvable restricted ordered scheduling problem. Using various task mixes drawn from a set of nine real-life applications, our results show that we are able to reduce power consumption to within 7% of the lower bound obtained by imposing no limit at the rate of change of voltage and clock frequencies.
Power Optimization of Variable Voltage Core-Based Systems
, 1998
"... The growing class of portable systems, such as personal computing and communication devices, has resulted in a new set of system design requirements, mainly characterized by dominant importance of power minimization and design reuse. We develop the design methodology for the low power core-based rea ..."
Abstract
-
Cited by 86 (7 self)
- Add to MetaCart
The growing class of portable systems, such as personal computing and communication devices, has resulted in a new set of system design requirements, mainly characterized by dominant importance of power minimization and design reuse. We develop the design methodology for the low power core-based real-time system-onchip based on dynamically variable voltage hardware. The key challenge is to develop effective scheduling techniques that treat voltage as a variable to be determined, in addition to the conventional task scheduling and allocation. Our synthesis technique also addresses the selection of the processor core and the determination of the instruction and data cache size and configuration so as to fully exploit dynamically variable voltage hardware, which result in significantly lower power consumption for a set of target applications than existing techniques. The highlight of the proposed approach is the non-preemptive scheduling heuristic which results in solutions very close to optimal ones for many test cases. The effectiveness of the approach is demonstrated on a variety of modern industrial-strength multimedia and communication applications.
Way-Predicting Set-Associative Cache for High Performance and Low Energy Consumption
- In Proceedings of the International Symposium on Low Power Electronics and Design
, 1999
"... This paper proposes a new approach using way prediction for achieving high performance and low energy consumption of set-associative caches. By accessing only a single cache way predicted, instead of accessing all the ways in a set, the energy consumption can be reduced. This paper shows that the wa ..."
Abstract
-
Cited by 74 (4 self)
- Add to MetaCart
This paper proposes a new approach using way prediction for achieving high performance and low energy consumption of set-associative caches. By accessing only a single cache way predicted, instead of accessing all the ways in a set, the energy consumption can be reduced. This paper shows that the way-predicting set-associative cache improves the ED (energy-delay) product by 60--70% compared to a conventional set-associative cache. 1 Introduction On-chip cache has been playing an important role in achieving high memory performance and low energy consumption, because it can reduce the number of access to slower and higher energy off-chip next-level memory (usually DRAM). As on-chip cache size has increased, however, the energy dissipated by on-chip caches has become significant. There have been several proposals for reducing the power consumption of on-chip caches. MDM (Multiple-Divided Module) cache[8] attempts to reduce the power consumption by means of partitioning the cache into s...
Memory Exploration for Low Power, Embedded Systems
, 1999
"... In embedded system design, the designer has to choose an onchip memory configuration that is suitable for a specific application. To aid in this design choice, we present a memory exploration strategy based on three performance metrics, namely, cache size, the number of processor cycles and the ener ..."
Abstract
-
Cited by 73 (2 self)
- Add to MetaCart
In embedded system design, the designer has to choose an onchip memory configuration that is suitable for a specific application. To aid in this design choice, we present a memory exploration strategy based on three performance metrics, namely, cache size, the number of processor cycles and the energy consumption. We show how the performance is affected by cache parameters such as cache size, line size, set associativity and tiling, and the off-chip data organization. We show the importance of including energy in the performance metrics, since an increase in the cache line size, cache size, tiling and set associativity reduces the number of cycles but does not necessarily reduce the energy consumption. These performance metrics help us find the minimum energy cache configuration if time is the hard constraint, or the minimum time cache configuration if energy is the hard constraint.

