### Table 1. Comparison of power consumption.

"... In PAGE 4: ... In all the circuits, only one multiplier was allocated. Table 1 shows the comparison results obtained for 4 circuits: 11th order FIR filter, wavelet filter, noise canceller, and Volterra filter. The third column compares the switched capacitance values estimated for functional units and for the entire circuit.... In PAGE 4: ...1, ........, a5 to obtain the final result. For the adaptive noise cancellation using the LMS (Least Mean Square) algorithm and the Volterra filter, we observe that the power consumed by both the functional units and the entire circuit increases, contrary to the decrease in the switched capacitance shown in the third column of Table 1. This is because the latency is reduced due to the loop folding as shown in Table 2.... In PAGE 5: ... In the Volterra filter, the power consumed by functional units does not dominate the total power of the circuit. In particular, in the Volterra filter the ratio in the fifth column of Table 1 is so small that the energy reduction of 20% in the functional units results in a reduction of just 4.9% in the entire circuit.... ..."
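
The last figures in this snippet imply how small the functional units' share of total energy is. The sketch below works that out under one assumption that the snippet does not state: the energy of the rest of the circuit is unchanged. The function name `implied_share` is hypothetical and used only for illustration.

```python
def implied_share(unit_reduction: float, total_reduction: float) -> float:
    """Share of total energy attributable to the functional units.

    Assumes (not stated in the snippet) that everything outside the
    functional units consumes the same energy before and after, so that
    total_reduction = share * unit_reduction.
    """
    return total_reduction / unit_reduction


# Figures quoted for the Volterra filter: 20% in the units, 4.9% overall.
share = implied_share(unit_reduction=0.20, total_reduction=0.049)
print(f"Implied functional-unit share of total energy: {share:.3f}")
```

Under that assumption the functional units would account for roughly a quarter of the circuit's energy, which is consistent with the snippet's remark that they do not dominate total power.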

### Table 1: Simulated energy consumption (nJ) per switching Switching case

2006

"... In PAGE 2: ... For each of the above cases, we present the energy consumption when (i) the coupling capacitance was ignored (No Cc); (ii) the coupling capacitance was considered as ground capacitances on the nets (Grounded Cc); and (iii) the coupling capacitance was considered between the coupled nets (Exact Cc). From Table 1, it is observed that ignoring coupling leads to large underestimation of power in most cases (up to 53% in case of simultaneous opposite switching). In addition, considering coupling as capacitance to ground also leads to large errors: ranging from underestimating power by 26% (for the case of simultaneous opposite switching) to overestimating power by 56% (in case of simultaneous similar switching).... ..."

Cited by 2
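
The three estimates compared in this snippet can be illustrated with the textbook first-order Miller-factor model. This is not the simulation the paper ran. In this model the effective switched capacitance of a net is `Cg + k * Cc`, with `k = 2` for simultaneous opposite switching and `k = 0` for simultaneous similar switching, and energy is taken as proportional to that capacitance at a fixed supply voltage. The ratio `Cc / Cg = 0.56` is a hypothetical value chosen for illustration, and all names below are assumptions.

```python
# Hypothetical coupling-to-ground capacitance ratio (Cc / Cg); not from the paper.
RATIO = 0.56

# First-order Miller factors for the two switching cases named in the snippet.
MILLER = {"opposite": 2.0, "similar": 0.0}


def exact_cc(case: str, ratio: float = RATIO) -> float:
    """Effective capacitance (in units of Cg) with coupling modelled between nets."""
    return 1.0 + MILLER[case] * ratio


def no_cc(ratio: float = RATIO) -> float:
    """Effective capacitance when coupling is ignored entirely."""
    return 1.0


def grounded_cc(ratio: float = RATIO) -> float:
    """Effective capacitance when coupling is lumped to ground (factor 1)."""
    return 1.0 + ratio


def rel_error(estimate: float, reference: float) -> float:
    """Signed relative error; negative means underestimation."""
    return estimate / reference - 1.0


for case in MILLER:
    ref = exact_cc(case)
    print(f"{case:9s} No Cc: {rel_error(no_cc(), ref):+.1%}  "
          f"Grounded Cc: {rel_error(grounded_cc(), ref):+.1%}")
```

With this assumed ratio the simple model gives errors of similar size and sign to those quoted above. That suggests, but does not establish, that the Miller effect accounts for most of the reported discrepancy.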

### Table 1. Average power consumption of different functional units.

"... In PAGE 7: ... In Table 1, the average power consumption of some 16-bit functional units is listed (A = ripple-carry adder, B = carry-save array multiplier, C = carry-save array multiplier with two pipeline stages, D = Wallace-tree multiplier with three pipeline stages). Each functional unit has two input operands.... In PAGE 12: ... At the beginning of a computation, the whole matrix B is input simultaneously to the array, whereas the matrix A is fed sequentially row by row from the left side into the array. Since the matrix A has N1 rows, one operand of the multiplier is fixed for = N1 clock cycles, which significantly reduces the power consumption in the multipliers by 45% (see Table 1). On account of the design regularity the power savings can be multiplied by #PE (line 23 of the algorithm).... ..."

### Table 4 shows the power consumption and power-latency product for several 24-bit TCS adders. In this table, the (2,2) block was replaced by (4t) block and simulated by HEAT tool. We can find from Table 4 that 24-bit TCS adders including a larger number of 2-bit carry-select blocks have lower power consumption and a larger power-latency product.

"... In PAGE 9: ... This is because the designs including more number of smaller blocks have a small number of multiplexers and reduce the glitching. Table 4: Power Consumption and Power-latency Product of several 24-bit tree-based adders... ..."

### Table 3: Power Consumption and Power-latency Product of several 24-bit carry-select adders Design Power ( W) Area (# of mux) Latency (tmux) Power-latency Product

"... In PAGE 9: ... 3(a) with latency 2tmux is replaced by the 4-bit carry-select block of Fig. 3(b) as shown in Table 3. We can observe from Table 3 that the 24-bit carry-select adders including more number of smaller blocks lead to lower power consumption, but have a higher power-latency product.... ..."

### Table 2: Power Consumption and Power-latency Product for various types of 24-bit TCS Adders Design Power ( W) Area (# of mux) Latency (tmux) Power-latency Product

"... In PAGE 7: ... [Figure 8: Modified 8-bit tree-based carry generator for unknown C0] [Figure 9: 24-bit TCS adder with C0=1] Transition probability of each input = 0.5. Table 2 shows the power consumption and power-latency product of 24-bit TCS adders (which include input rewriting circuitry, carry generation, and output sum generation). We have designed several 24-bit hybrid adders with respect to latency and analyzed power consumption.... ..."

### Table 2: Contributors to Total Power Consumption for Evert MPEG running at 15 frames/second

2000

"... In PAGE 8: ... We plan to extend the current Cai power model [3] to include the additional information our analysis requires once we have determined the power numbers for the components. The mode switching and duration results, shown in Table 2, are again dominated by the slack time available at the end of most frames. Because we apply the same techniques for IPCM throughout the run of the application, we switch excessively during the slack time rather than taking advantage of this extended period of reduced activity.... ..."

Cited by 63

### Table 7. Instruction Power Consumption Power (Watts) Latency (Cycles) Throughput (Cycles)

2005

"... In PAGE 4: ... Table 7 [4] shows that high latency instructions, such as floating point type, consume less power than the 36W minimum predicted by our models. One possible cause is greater opportunity for clock gating.... In PAGE 4: ... Further investigation will be required to validate this hypothesis. Since Table 7 supports our conclusion that high latency instructions consume less power, we then proposed to improve our power model by including this behavior. Our approach was to note that most high latency instructions are composed of relatively long uop sequences provided by the microcode ROM.... ..."

Cited by 9