Results 1–10 of 49
A coding framework for low-power address and data busses
 IEEE Transactions on VLSI Systems
, 1999
Abstract

Cited by 44 (1 self)
This paper presents a source-coding framework for the design of coding schemes to reduce transition activity. These schemes are suited for high-capacitance busses, where the extra power dissipated by the encoder and decoder circuitry is offset by the power saved at the bus. In this framework, a data source (characterized in a probabilistic manner) is first passed through a decorrelating function; next, a variant of an entropy-coding function is employed, which reduces the transition activity. The framework is then employed to derive novel encoding schemes in which practical forms for both functions are proposed. Simulation results with an encoding scheme for data busses indicate an average reduction in transition activity of 36%. This translates into a reduction in total power dissipation for bus capacitances greater than 14 pF/b in 1.2-μm CMOS technology. For a typical bus capacitance of 50 pF/b, there is a 36% reduction in power dissipation and eight times more power savings compared to existing schemes. Simulation results with an encoding scheme for instruction address busses indicate an average reduction in transition activity by a factor of 1.5 over known coding schemes. Index Terms—CMOS VLSI, coding, high-capacitance busses, low-power design, switching activity.
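As an illustration of transition-activity reduction on a bus, the sketch below implements bus-invert coding, a simple well-known scheme in this family (it is not the source-coding framework of the paper above): a word is complemented before transmission, and an extra invert line is asserted, whenever sending it directly would toggle more than half of the bus lines.

```python
def bus_invert_encode(words, width=8):
    """Bus-invert coding: if sending the next word would toggle more
    than half of the bus lines, send its complement and raise an
    extra invert line instead. Returns (word_on_bus, invert) pairs."""
    mask = (1 << width) - 1
    prev = 0  # bus assumed to start at all zeros
    encoded = []
    for w in words:
        toggles = bin((prev ^ w) & mask).count("1")
        if toggles > width // 2:
            w_out, inv = w ^ mask, 1
        else:
            w_out, inv = w, 0
        encoded.append((w_out, inv))
        prev = w_out
    return encoded

def transitions(words, width=8):
    """Total bit transitions on a bus driven by the given word stream."""
    mask = (1 << width) - 1
    prev, total = 0, 0
    for w in words:
        total += bin((prev ^ w) & mask).count("1")
        prev = w
    return total
```

For the made-up stream [0x00, 0xFF, 0x00], the direct bus makes 16 transitions, while the encoded data lines make none and the invert line toggles only twice.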
A Mathematical Basis for Power-Reduction in Digital VLSI Systems
 IEEE Trans. Circuits Syst. II
, 1997
Abstract

Cited by 24 (15 self)
Presented in this paper is a mathematical basis for power reduction in VLSI systems. This basis is employed to 1) derive lower bounds on the power dissipation in digital systems and 2) unify existing power-reduction techniques under a common framework. The proposed basis is derived from information-theoretic arguments. In particular, a digital signal-processing algorithm is viewed as a process of information transfer with an inherent information transfer rate requirement of R bits/s. Architectures implementing a given algorithm are equivalent to communication networks, each with a certain capacity C (also in bits/s). The absolute lower bound on the power dissipation for any given architecture is then obtained by minimizing the signal power such that its channel capacity C equals the desired information transfer rate R. By including various implementation constraints, increasingly realistic lower bounds are calculated. The usefulness of the proposed theory is demonstrated via...
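For the simplest channel model, the bound described above has a closed form. The sketch below assumes an AWGN channel (my choice for illustration, not necessarily the paper's model), for which C = B log2(1 + P/(N0 B)), and solves C = R for the minimum signal power.

```python
def min_signal_power(rate_bps, bandwidth_hz, noise_psd):
    """Smallest signal power (watts) for which an AWGN channel of
    bandwidth B has capacity equal to the required information
    transfer rate R: solve B * log2(1 + P / (N0 * B)) = R for P."""
    return noise_psd * bandwidth_hz * (2 ** (rate_bps / bandwidth_hz) - 1)
```

At R = B the required SNR is exactly 1 (0 dB); doubling R while holding B fixed raises the power bound exponentially, which is the sense in which the rate requirement lower-bounds dissipation.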
High-Speed Architectures for Reed-Solomon Decoders
 IEEE Transactions on VLSI Systems
, 2001
Abstract

Cited by 22 (3 self)
New high-speed VLSI architectures for decoding Reed-Solomon codes with the Berlekamp-Massey algorithm are presented in this paper. The speed bottleneck in the Berlekamp-Massey algorithm is the iterative computation of discrepancies followed by the updating of the error-locator polynomial. This bottleneck is eliminated via a series of algorithmic transformations that result in a fully systolic architecture in which a single array of processors computes both the error-locator and the error-evaluator polynomials. In contrast to conventional Berlekamp-Massey architectures, in which the critical path passes through two multipliers and 1 + ⌈log2(t + 1)⌉ adders, the critical path in the proposed architecture passes through only one multiplier and one adder, which is comparable to the critical path in architectures based on the extended Euclidean algorithm. More interestingly, the proposed architecture requires approximately 25% fewer multipliers and a simpler control structure than the architectures based on the popular extended Euclidean algorithm. For block-interleaved Reed-Solomon codes, embedding the interleaver memory into the decoder results in a further reduction of the critical-path delay to just one XOR gate and one multiplexer, leading to speedups of as much as an order of magnitude over conventional architectures.
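The serial discrepancy-then-update dependency that the paper attacks can be seen in a plain software Berlekamp-Massey. The sketch below works over GF(2) for brevity (Reed-Solomon decoders run the same iteration over GF(2^m)); it illustrates the bottleneck, not the paper's reformulated architecture.

```python
def berlekamp_massey(bits):
    """Shortest LFSR (length L, connection polynomial c) generating a
    binary sequence. Each iteration first computes a discrepancy from
    the current polynomial, then conditionally updates it -- the
    serial dependency that limits clock rate in hardware."""
    c, b = [1], [1]  # connection polynomial and its last saved copy
    L, m = 0, 1
    for n, bit in enumerate(bits):
        # discrepancy: predicted bit XOR actual bit
        d = bit
        for i in range(1, L + 1):
            d ^= c[i] & bits[n - i]
        if d:
            t = c[:]
            # c(x) += x^m * b(x)
            if len(c) < len(b) + m:
                c += [0] * (len(b) + m - len(c))
            for i, bi in enumerate(b):
                c[i + m] ^= bi
            if 2 * L <= n:
                L, b, m = n + 1 - L, t, 1
            else:
                m += 1
        else:
            m += 1
    return L, c
```

On the m-sequence 1, 0, 0, 1, 1, 1, 0 generated by s[n] = s[n-1] XOR s[n-3], the routine recovers L = 3 and the taps of 1 + x + x^3.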
Wave digital filter structures for high-speed narrowband and wideband filtering
 IEEE Trans. Circuits Syst. II
, 1999
Abstract

Cited by 12 (9 self)
Wave digital filter (WDF) structures for high-speed narrowband and wideband filtering are introduced. The narrowband filter is composed of a periodic model filter and one or several, possibly periodic, masking filters in cascade. Lattice and bireciprocal lattice WDFs are used for the model and masking filters, respectively. The wideband filter consists of a narrowband filter in parallel with an allpass filter. The overall filters can be designed by separately designing the model and masking filters. The filters obtained in this way also serve as good initial filters for further optimization. Both nonlinear-phase and approximately linear-phase filters are considered. One major advantage of the new filters over the corresponding conventional filters is their substantially higher maximal sample frequency. In the approximately linear-phase case, the computational complexity can also be reduced. Further, the use of bireciprocal lattice wave digital (WD) masking filters makes it possible to reduce the complexity compared with the case in which FIR masking filters are used. Several design examples and a discussion of finite-wordlength effects are included to demonstrate the properties of the new filters. Index Terms—High-speed filter, narrowband filter, wave digital filters, wideband filter.
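The masking principle above (a periodic model filter H(z^L) in cascade with a masking filter) can be sketched at the coefficient level. The helpers below use plain FIR coefficient lists purely to illustrate the structure; the paper itself builds the model and masking filters from lattice and bireciprocal lattice WDFs, not FIR filters.

```python
def upsample_coeffs(h, L):
    """Periodic model filter H(z^L): insert L-1 zeros between taps,
    which compresses the frequency response into L periodic images."""
    out = []
    for c in h[:-1]:
        out += [c] + [0.0] * (L - 1)
    return out + [h[-1]]

def convolve(a, b):
    """Cascading two filters multiplies their transfer functions,
    i.e. convolves their coefficient lists."""
    y = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            y[i + j] += ai * bj
    return y
```

The masking filter then selects one image of the periodic response, so a sharp narrowband filter is obtained from two much cheaper subfilters.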
Feedforward Architectures for Parallel Viterbi Decoding
 Kluwer J. of VLSI Signal Processing
, 1991
Abstract

Cited by 12 (2 self)
The Viterbi algorithm (VA) is a common application of dynamic programming. Since it contains a nonlinear feedback loop (the ACS feedback, ACS: add-compare-select), this loop is the bottleneck in high-data-rate implementations. In this paper we show that, asymptotically, the ACS feedback no longer has to be processed recursively, i.e. there is no feedback, with negligible performance loss. This can be exploited to derive purely feedforward architectures for Viterbi decoding, yielding a modular, cascadable implementation. By designing one cascadable module, any speedup can be achieved simply by adding modules to the implementation. It is shown that different optimization criteria, e.g. minimum latency or maximum hardware efficiency, are met by very different architectures. Surviving paths can be seen to merge into a unique path, the optimum one. The survivor depth D is then defined as the depth at which it is highly probable that all paths have merged (time k-D). In a practical implementation of the VA, called a Viterbi decoder (VD), this allows the decoded transition to be output with latency D. The best path to each node of the trellis is computed through dynamic programming by calculating a path metric y_{i,k} for each state S_i at every time instant k according to the ACS recursion y_{i,k+1} = max over all j leading to i of (y_{j,k} + λ_{j,k}).
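The ACS recursion quoted at the end of the abstract can be written directly as code. The two-state trellis and branch-metric table in the test are hypothetical; the function performs one add-compare-select update of the path metrics.

```python
def acs_step(metrics, branch, predecessors):
    """One add-compare-select step of the Viterbi algorithm:
    for each state i, the new path metric is the max over its
    predecessors j of (metric[j] + branch[(j, i)])."""
    new = []
    for i, preds in enumerate(predecessors):
        new.append(max(metrics[j] + branch[(j, i)] for j in preds))
    return new
```

Because each step reads the metrics produced by the previous step, the recursion is the serial bottleneck the paper's feedforward architectures remove.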
Compile-Time Scheduling of Dataflow Program Graphs with Dynamic Constructs
 University of California, Berkeley
, 1992
Power Optimization in Programmable Processors and ASIC Implementations of Linear Systems: Transformation-Based Approach
, 1995
Abstract

Cited by 11 (3 self)
Linear computations form an important type of computation that is widely used in video and image processing, DSP, control, communications, and many other applications. With the ongoing rapid proliferation of portable computation and communication, power minimization has been gaining importance as a crucial design metric. However, while approaches for optimization of throughput and for joint optimization of latency and throughput in linear computations are available, until now no approach has been proposed that efficiently optimizes power. We introduce two approaches for power minimization in linear computations using transformations. First, we show how unfolding, combined with the procedure for maximally fast implementation of linear computations, reduces power in single-processor and multiprocessor implementations by factors of 2.7 and 15.6, respectively, the former with no hardware penalty. For custom ASIC implementations, even higher improvements are achievable using the second tran...
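The unfolding transformation mentioned above can be illustrated on a first-order linear recursion (a toy example of my own, not the paper's benchmark): unfolding by two rewrites y[n] = a*y[n-1] + x[n] as y[n] = a^2*y[n-2] + a*x[n-1] + x[n], so each output depends on the output two samples back. In hardware, the relaxed recursion permits a lower clock rate or supply voltage at the same throughput, which is where the power saving comes from.

```python
def iir_direct(a, x):
    """First-order linear recursion y[n] = a*y[n-1] + x[n]."""
    y, prev = [], 0.0
    for xn in x:
        prev = a * prev + xn
        y.append(prev)
    return y

def iir_unfolded2(a, x):
    """Two-unfolded form y[n] = a^2*y[n-2] + a*x[n-1] + x[n]:
    algebraically identical outputs, but the feedback loop now
    spans two sample periods instead of one."""
    y = [0.0] * len(x)
    for n, xn in enumerate(x):
        y_prev2 = y[n - 2] if n >= 2 else 0.0
        x_prev = x[n - 1] if n >= 1 else 0.0
        y[n] = a * a * y_prev2 + a * x_prev + xn
    return y
```

Both routines produce the same output sequence, confirming the transformation changes only the dependency structure, not the computation.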
Dynamic Algorithm Transformations (DAT): A Systematic Approach to Low-Power Reconfigurable Signal Processing
 IEEE Transactions on Very Large Scale Integration (VLSI) Systems
, 1999
Abstract

Cited by 10 (2 self)
In this paper, dynamic algorithm transformations (DATs) for designing low-power reconfigurable signal-processing systems are presented. These transformations minimize energy dissipation while maintaining a specified level of mean squared error or signal-to-noise ratio. This is achieved by modeling the nonstationarities in the input as temporal/spatial transitions between states in the input state-space. The reconfigurable hardware fabric is characterized by its configuration state-space. The configurable parameters are taken to be the filter taps, coefficient and data precisions, and supply voltage Vdd. An energy-optimal reconfiguration strategy is derived as a mapping from the input state-space to the configuration state-space. In this strategy, taps are powered down starting with the tap with the smallest value of w_k^2 / Em(w_k) (where w_k and Em(w_k) are, respectively, the coefficient and energy dissipation of the kth tap). Optimal values for the precisions and the supply voltage Vdd are subsequently computed from the roundoff-error and critical-path-delay requirements, respectively. The DAT-based adaptive filter is employed as a near-end crosstalk (NEXT) canceller in a 155.52-Mb/s asynchronous transfer mode local-area network transceiver over category-3 wiring. Simulation results indicate that the energy savings range from 2% to 87% as the cable length varies from 110 m to 40 m, with an average savings of 69%. An average savings of 62% is achieved when the supply voltage Vdd is kept fixed.
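The tap power-down rule above reduces to a sort. The sketch below (with made-up coefficients and per-tap energies) returns the order in which taps would be shut off, smallest w_k^2 / Em(w_k) first.

```python
def power_down_order(weights, energies):
    """Order in which filter taps are powered down: smallest
    w_k^2 / Em(w_k) first, i.e. taps contributing the least signal
    energy per unit of dissipated energy are sacrificed first."""
    ratios = [(w * w / e, k)
              for k, (w, e) in enumerate(zip(weights, energies))]
    return [k for _, k in sorted(ratios)]
```

With equal per-tap energies the ordering is simply by coefficient magnitude, which matches the intuition that small taps buy little SNR for their energy cost.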
High-Speed Recursive Digital Filters Based on the Frequency-Response Masking Approach
 Submitted to IEEE Trans. on Circuits and Systems II: Analog and Digital Signal Processing
Abstract

Cited by 10 (8 self)
High-speed recursive digital filters are of interest for applications requiring high speed as well as low power consumption, because excess speed can be traded for low power consumption through power-supply voltage scaling. This paper gives an overview of high-speed recursive digital filters based on frequency-masking techniques.
High-Rate Viterbi Processor: A Systolic Array Solution
 IEEE J. Selected Areas in Communications
, 1990
Abstract

Cited by 10 (2 self)
In exploiting the potential of highly parallel architectures to speed up the computation rate of systems enabled by VLSI, special attention has to be paid to designing algorithms such that they can be mapped onto parallel hardware. The main part of the Viterbi algorithm (VA) is a nonlinear feedback loop, the ACS recursion (add-compare-select recursion), which presents a bottleneck for high-speed implementations and cannot be circumvented by standard means. By identifying that the two operations of the loop form an algebraic structure called a semiring, we show that the ACS recursion of the Viterbi algorithm can be written as a linear vector recursion. This allows us to employ the powerful techniques of parallel processing and pipelining, known for conventional linear systems, to achieve high throughput rates. Since the VA can be written as a linear vector recursion, it can be implemented by systolic arrays. For the class of shuffle-exchange codes to be decoded by the Viterbi algorithm, hardware-efficient, code-optimized arrays are presented. In addition, it is shown that carry-save arithmetic can be used for the operations of the ACS recursion, allowing each word-level operation to be pipelined and carried out by an efficient bit-level systolic array.
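The semiring observation can be checked numerically: replacing (+, *) by (max, +) turns the ACS update into a matrix-vector product. The two-state trellis below is a hypothetical example of my own; transitions absent from the trellis would be encoded as -inf.

```python
NEG_INF = float("-inf")  # encodes a transition not present in the trellis

def maxplus_matvec(A, v):
    """'Linear' vector update in the (max, +) semiring: the usual
    sum-of-products with + replaced by max and * replaced by +."""
    return [max(aij + vj for aij, vj in zip(row, v)) for row in A]

def acs_direct(metrics, branch):
    """Conventional ACS over a fully connected 2-state trellis,
    where branch[j][i] is the branch metric from state j to state i."""
    return [max(metrics[j] + branch[j][i] for j in range(2))
            for i in range(2)]
```

With A[i][j] = branch[j][i], the max-plus product reproduces the ACS result exactly, which is what lets linear-system pipelining and systolic-array techniques carry over.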