Real-Time Algorithms And Architectures For Multiuser Channel Estimation And Detection In Wireless Base-Station Receivers
 Submitted to IEEE Journal on Selected Areas in Communications (JSAC)
, 2002
Abstract

Cited by 25 (14 self)
This paper presents efficient algorithms and architecture designs that can meet real-time requirements of multiuser channel estimation and detection in future wireless base-station receivers. Sophisticated algorithms proposed to implement multiuser channel estimation and detection make their real-time implementation difficult on current Digital Signal Processor (DSP)-based receivers. A maximum-likelihood-based multiuser channel estimation scheme requiring matrix inversions is redesigned from an implementation perspective for a reduced-complexity, iterative scheme with a simple fixed-point VLSI architecture. A reduced-complexity, bit-streaming multiuser detection algorithm that avoids the need for multi-shot detection is also developed for a simple, pipelined VLSI architecture. Thus, we show that real-time solutions, with 3-4 orders of magnitude performance improvements over DSPs, can be achieved for next-generation wireless systems by (1) designing the algorithms from an implementation ...
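A sketch of the general idea of replacing a matrix inversion with an iterative update, not the paper's scheme: the abstract does not give the iteration, so this example substitutes plain Richardson (gradient-style) iteration on a generic linear system R x = b, such as the normal equations a maximum-likelihood estimator would otherwise solve by inversion. The matrix R, vector b, step size mu, and the function name are illustrative assumptions. Each step needs only multiply-accumulate operations, which is what makes such iterations attractive for fixed-point pipelines.

```python
def iterative_solve(R, b, mu, iterations):
    """Approximate x = R^{-1} b without forming R^{-1}.

    Uses the Richardson update x <- x + mu * (b - R x), a stand-in for the
    paper's (unspecified) iteration. For symmetric positive-definite R it
    converges when 0 < mu < 2 / lambda_max(R).
    """
    n = len(b)
    x = [0.0] * n
    for _ in range(iterations):
        # Residual b - R x: only multiplies and adds, no division or inversion.
        residual = [b[i] - sum(R[i][j] * x[j] for j in range(n)) for i in range(n)]
        x = [x[i] + mu * residual[i] for i in range(n)]
    return x
```

With R = [[4, 1], [1, 3]] and b = [1, 2], 100 iterations with mu = 0.2 approach the exact solution [1/11, 7/11]; mu must stay below 2 divided by the largest eigenvalue of R (about 4.62 here) for the iteration to converge.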
Reduced Power Dissipation Through Truncated Multiplication
 in IEEE Alessandro Volta Memorial Workshop on Low Power Design
, 1999
Abstract

Cited by 19 (5 self)
Reducing the power dissipation of parallel multipliers is important in the design of digital signal processing systems. In many of these systems, the products of parallel multipliers are rounded to avoid growth in word size. The power dissipation and area of rounded parallel multipliers can be significantly reduced by a technique known as truncated multiplication. With this technique, the least significant columns of the multiplication matrix are not used. Instead, the carries generated by these columns are estimated. This estimate is added with the most significant columns to produce the rounded product. This paper presents the design and implementation of parallel truncated multipliers. Simulations indicate that truncated parallel multipliers dissipate between 29 and 40 percent less power than standard parallel multipliers for operand sizes of 16 and 32 bits.
1: Introduction
High-speed parallel multipliers are fundamental building blocks in digital signal processing systems [1]. In...
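A bit-level model can make the truncated-multiplication idea concrete. This is a minimal sketch under stated assumptions, not the paper's design: operands are unsigned n-bit integers, the k least-significant partial-product columns are never formed, and the omitted carries are replaced by one constant derived from the expected value of those columns (each bit a_i*b_j is 1 with probability 1/4 for uniformly random inputs). The function names are hypothetical, and the model says nothing about power or area.

```python
def bits_in_column(c: int, n: int) -> int:
    """Count of partial-product bits a_i*b_j with i + j == c (n-bit operands)."""
    return min(c, 2 * n - 2 - c) + 1


def correction_constant(n: int, k: int) -> int:
    """Rounding constant 2**(n-1) plus the expected value of the k omitted
    columns, rounded to a multiple of 2**k so it fits in the kept columns.
    Assumes uniformly random operand bits (each a_i*b_j is 1 w.p. 1/4)."""
    expected = sum(bits_in_column(c, n) * (1 << c) for c in range(k)) / 4
    return round(((1 << (n - 1)) + expected) / (1 << k)) * (1 << k)


def truncated_multiply(a: int, b: int, n: int, k: int) -> int:
    """Approximate round(a*b / 2**n) without forming columns 0..k-1."""
    total = 0
    for i in range(n):
        if not (a >> i) & 1:
            continue
        for j in range(n):
            if (b >> j) & 1 and i + j >= k:  # keep columns k and above only
                total += 1 << (i + j)
    return (total + correction_constant(n, k)) >> n


def rounded_multiply(a: int, b: int, n: int) -> int:
    """Reference: full 2n-bit product rounded (ties up) to its n upper bits."""
    return (a * b + (1 << (n - 1))) >> n
```

For example, with n = 8 and k = 4, comparing truncated_multiply against rounded_multiply over all 65,536 operand pairs shows they never differ by more than one unit in the last place; with k = 0 the model reduces to ordinary rounding.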
Efficient VLSI Architectures for Baseband Signal Processing in Wireless Base-Station Receivers
 in 12th IEEE International Conference on Application-Specific Systems, Architectures and Processors (ASAP)
, 2000
Abstract

Cited by 8 (7 self)
A real-time VLSI architecture is designed for Multiuser Channel Estimation, one of the core baseband processing operations in wireless base-station receivers. Future wireless base-station receivers will need to use sophisticated algorithms to support extremely high data rates and multimedia. Several features in these algorithms that can help meet real-time requirements are not utilized effectively in DSPs. These features, such as bit-level arithmetic and parallel structure, can be revealed and well exploited by task partitioning the algorithms. We modify the channel estimation algorithm for a reduced-complexity fixed-point hardware implementation. We show the complexity and hardware required for three different area-time tradeoffs: an area-constrained, a time-constrained and an area-time efficient architecture. The area-constrained architecture achieves low data rates with minimum hardware, which may be used in 'picocell' base stations. The time-constrained solution exploits the entire available parallelism and determines the maximum theoretical data rates. The area-time efficient architecture meets real-time requirements with minimum area overhead. The orders-of-magnitude difference between area- and time-constrained solutions reveals significant inherent parallelism in the algorithm. All VLSI solutions exhibit better time performance than a previous DSP implementation.
Automated least-significant bit datapath optimization for FPGAs
 In IEEE Symposium on Field-Programmable Custom Computing Machines
, 2004
Abstract

Cited by 6 (1 self)
In this paper we present a method for FPGA datapath precision optimization subject to user-defined area and error constraints. This work builds upon our previous research [1], which presented a methodology for optimizing for dynamic range (the most significant bit position). In this work, we present an automated optimization technique for the least-significant bit position of circuit datapaths. We present results describing the effectiveness of our methods on typical signal and image processing kernels.
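The abstract does not describe the search itself, so the following is only a hypothetical illustration of least-significant-bit sizing under an error constraint: one uniform fractional word length for a single dot-product kernel, found by exhaustive search against a fixed test vector. The paper's method additionally weighs area and optimizes individual datapath nodes; the kernel, tolerance, and function names here are assumptions.

```python
def quantize(x, frac_bits):
    """Round x to the nearest multiple of 2**-frac_bits (ties to even)."""
    scale = 1 << frac_bits
    return round(x * scale) / scale


def dot_error(coeffs, samples, frac_bits):
    """Absolute error of a dot product whose operands and products are all
    quantized to frac_bits fractional bits, versus the float result."""
    exact = sum(c * s for c, s in zip(coeffs, samples))
    approx = sum(
        quantize(quantize(c, frac_bits) * quantize(s, frac_bits), frac_bits)
        for c, s in zip(coeffs, samples)
    )
    return abs(exact - approx)


def min_frac_bits(coeffs, samples, tol, max_bits=32):
    """Smallest uniform fractional word length whose error is within tol."""
    for f in range(max_bits + 1):
        if dot_error(coeffs, samples, f) <= tol:
            return f
    raise ValueError("tolerance not reachable within max_bits")
```

The total error need not shrink monotonically as bits are added, because individual rounding errors can cancel at one word length and not at the next, which is why the search tests each candidate in turn rather than bisecting.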
Design Tradeoffs Using Truncated Multipliers in FIR Filter Implementations
, 2002
Abstract

Cited by 5 (0 self)
This paper presents a general FIR filter architecture utilizing truncated tree multipliers for computation. The average error, maximum error, and variance of error due to truncation are derived for the proposed architecture. A novel technique that reduces the average error of the filter and is independent of the number of unformed columns is presented, as well as equations describing the signal-to-noise ratio of the truncation error. A software tool written in Java is described that automatically generates structural VHDL models for specific filters based on this architecture, given parameters such as the number of taps, operand lengths, number of multipliers, and the number of truncated columns. We show that a 22.5% reduction in area can be achieved for a 24-tap filter with 16-bit coefficients. The ratio of the average error to the full-scale value is only 1.4 × 10^-9, with only an 8.4 dB reduction in SNR for this implementation.
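The truncation-error statistics the abstract mentions can be checked numerically for small word lengths. This hypothetical sketch assumes unsigned n-bit operands (the paper's tree multipliers may use a different number format) and measures the value of the partial-product bits a truncated multiplier never forms; without any correction, the product error is the negative of this value. The exhaustive mean and maximum are compared with closed forms that follow from each bit a_i*b_j being 1 with probability 1/4. All names are illustrative.

```python
def bits_in_column(c, n):
    """Count of partial-product bits a_i*b_j with i + j == c (n-bit operands)."""
    return min(c, 2 * n - 2 - c) + 1


def omitted_value(a, b, n, k):
    """Value, in units of the product LSB, of the bits in columns 0..k-1."""
    return sum(
        1 << (i + j)
        for i in range(n)
        for j in range(n)
        if i + j < k and (a >> i) & 1 and (b >> j) & 1
    )


def measured_stats(n, k):
    """Exhaustive mean, maximum, and variance over all n-bit operand pairs."""
    values = [omitted_value(a, b, n, k) for a in range(1 << n) for b in range(1 << n)]
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, max(values), variance


def closed_form_mean_and_max(n, k):
    """Mean: each omitted bit contributes 1/4 of its weight; max: all bits set."""
    weight = sum(bits_in_column(c, n) * (1 << c) for c in range(k))
    return weight / 4, weight
```

For n = 5 and k = 4 the closed forms give a mean of 49/4 = 12.25 and a maximum of 49 (reached when both operands are all ones), in units of the full product's least significant bit.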
Variable-Correction Truncated Floating Point Multipliers
 in Proceedings of the Thirty-Fourth Asilomar Conference on Signals, Systems and Computers
, 2000
Abstract

Cited by 5 (1 self)
About half the hardware for floating point multipliers is needed only to guarantee correctly rounded results. For multimedia, graphics, and DSP systems, a significant reduction in area, delay, and power can be achieved by producing results that are not correctly rounded. This paper presents an efficient method for designing variable-correction truncated floating point multipliers that produce results with a maximum error of less than one unit in the last place. With this method, several of the less significant columns of the significand multiplier and the rounding logic for floating point multiplication are eliminated.
Technical areas: (13) DSP hardware, software, and coreware; (14) ASIC and FPGA algorithm/processor design. POC: Michael Schulte, 19 Memorial Dr. West, EECS Dept., Lehigh University, Bethlehem, PA 18015. Email: mschulte@eecs.lehigh.edu, Phone: (610) 758-5036, FAX: (610) 758-6279.
Extended Abstract
Most modern processors perform floating point operations accord...
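One commonly described form of variable correction, assumed here rather than taken from the paper, adds the partial-product bits of the most significant unformed column (column k-1) back into column k, so the correction tracks the operands instead of being a fixed constant. The sketch models only the unsigned significand product; normalization, exponents, and special values are omitted, and the function name is hypothetical.

```python
def variable_correction_multiply(a: int, b: int, n: int, k: int) -> int:
    """Truncated n-bit product of unsigned n-bit significands a and b.

    Columns 0..k-1 are never summed. Each partial-product bit of column k-1
    is instead re-added with weight 2**k (twice its own weight) as a
    data-dependent estimate of the discarded carries. No separate rounding
    logic is used: a constant 2**(n-1) is added and the sum is truncated.
    """
    if not 1 <= k <= 2 * n - 1:
        raise ValueError("k must be between 1 and 2n-1")
    total = 1 << (n - 1)              # rounding constant for the n-bit result
    for i in range(n):
        for j in range(n):
            if not ((a >> i) & 1 and (b >> j) & 1):
                continue
            if i + j >= k:
                total += 1 << (i + j)  # bit in a kept column
            elif i + j == k - 1:
                total += 1 << k        # variable correction from column k-1
    return total >> n
```

For n = 6 and k from 1 to 4, the result stays within one unit of the correctly rounded product for every operand pair.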
Combined IEEE Compliant and Truncated Floating Point Multipliers for Reduced Power Dissipation
 in IEEE International Conference on Computer Design (ICCD)
, 2001
Abstract

Cited by 4 (0 self)
Truncated multiplication can be used to significantly reduce power dissipation for applications that do not require correctly rounded results. This paper presents a power-efficient method for designing floating point multipliers that can perform either correctly rounded IEEE-compliant multiplication or truncated multiplication, based on an input control signal. Compared to conventional IEEE floating point multipliers, these multipliers require only a small amount of additional area and delay, yet provide a significant reduction in power dissipation for applications that do not require IEEE-compliant results.
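The mode-selection idea can be modeled with a single partial-product array whose low-order columns are gated off by a control input. This is a hypothetical integer sketch, not the paper's circuit: with truncated=False the full product is rounded to nearest, ties to even (the IEEE default rounding mode); with truncated=True the k least-significant columns are skipped and a constant correction replaces the rounding logic. Unsigned n-bit significands and the function name are assumptions; exponent handling and normalization are not modeled.

```python
def combined_multiply(a: int, b: int, n: int, truncated: bool, k: int = 0) -> int:
    """n-bit product of unsigned n-bit significands, selected by a mode flag."""
    total = 0
    for i in range(n):
        for j in range(n):
            if truncated and i + j < k:
                continue                  # column gated off in truncated mode
            if (a >> i) & 1 and (b >> j) & 1:
                total += 1 << (i + j)
    if truncated:
        # Constant correction: rounding bias plus the expected value of the
        # skipped columns (operand bits assumed uniform), aligned to column k.
        expected = sum((min(c, 2 * n - 2 - c) + 1) << c for c in range(k)) / 4
        correction = round(((1 << (n - 1)) + expected) / (1 << k)) << k
        return (total + correction) >> n  # plain truncation, no rounding logic
    # Round to nearest, ties to even, on the full product.
    q, r = divmod(total, 1 << n)
    half = 1 << (n - 1)
    if r > half or (r == half and q & 1):
        q += 1
    return q
```

In hardware the power saving comes from the gated columns not switching; in this model they are simply not summed.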
High-Speed Inverse Square Roots
 Proceedings of the 14th IEEE Symposium on Computer Arithmetic
, 1999
Abstract

Cited by 4 (0 self)
Inverse square roots are used in several digital signal processing, multimedia, and scientific computing applications. This paper presents a high-speed method for computing inverse square roots. This method uses a table lookup, operand modification, and multiplication to obtain an initial approximation to the inverse square root. This is followed by a modified Newton-Raphson iteration, consisting of one square, one multiply-complement, and one multiply-add operation. The initial approximation and Newton-Raphson iteration employ specialized hardware to reduce the delay, area, and power dissipation. Application of this method is illustrated through the design of an inverse square root unit for operands in the IEEE single precision format. An implementation of this unit with a 4-layer metal, 2.5 Volt, 0.25 micron CMOS standard cell library has a cycle time of 6.7 ns, an area of 0.41 mm², a latency of five cycles, and a throughput of one result per cycle.
1. Introduction
Square roots a...
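The two-phase structure (a table-based initial approximation, then one Newton-Raphson step for f(y) = 1/y² - x) can be sketched in floating point. The paper's initial approximation also uses operand modification and a multiplication; this sketch substitutes a plain 16-entry table of midpoint values over x in [1, 4), which is an assumption, as are the table size and names. The update y + (y/2)(1 - x·y²) is written to mirror the square, multiply-complement, multiply-add sequence named in the abstract, as one plausible mapping.

```python
import math

TABLE_BITS = 4                         # hypothetical 16-entry table
LO, HI = 1.0, 4.0                      # assumed operand range after exponent scaling
STEP = (HI - LO) / (1 << TABLE_BITS)
# Each entry holds 1/sqrt of its interval's midpoint.
TABLE = [1.0 / math.sqrt(LO + (i + 0.5) * STEP) for i in range(1 << TABLE_BITS)]


def rsqrt_approx(x: float, iterations: int = 1) -> float:
    """Approximate 1/sqrt(x) for x in [1, 4): table lookup, then Newton-Raphson."""
    if not LO <= x < HI:
        raise ValueError("x must lie in [1, 4)")
    y = TABLE[min(int((x - LO) / STEP), len(TABLE) - 1)]
    for _ in range(iterations):
        square = y * y                 # one square: y^2
        t = 1.0 - x * square           # multiply-complement: 1 - x*y^2
        y = y + (0.5 * y) * t          # multiply-add: y + (y/2)*t
    return y
```

Each step roughly squares the relative error: if y0 = (1 + e)/√x, one step leaves a relative error of -(3/2)e² - (1/2)e³.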
High-Speed Reciprocal Approximations
 in Proceedings of the Thirty-First Asilomar Conference on Signals, Systems and Computers
, 1998
Abstract

Cited by 2 (1 self)
This paper presents a high-speed algorithm for computing reciprocal approximations. This algorithm uses two parallel table lookups and an addition to obtain an initial approximation to the reciprocal. This is followed by a modified Newton-Raphson iteration, consisting of one multiply-complement and one multiply-add operation, to obtain a more accurate reciprocal approximation. The initial approximation and Newton-Raphson iteration use specialized hardware to reduce the area requirements. Application of this algorithm is illustrated through the design of a reciprocal approximation unit that has a latency of three cycles and a peak throughput of one result per cycle for operands in the IEEE single precision format.
1. Introduction
Reciprocal approximations and division are important for several applications in digital signal and image processing, computer graphics, and scientific computing [1]-[3]. Most algorithms for performing these operations, however, have long latencies or large a...
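A comparable sketch for reciprocals applies Newton-Raphson to f(y) = 1/y - x. The paper obtains its initial approximation from two parallel table lookups and an addition; this sketch substitutes a single 16-entry table of midpoint reciprocals over x in [1, 2), an assumption made to keep the example short, as are the names. The update y + y(1 - x·y) mirrors the multiply-complement and multiply-add pair named in the abstract.

```python
TABLE_BITS = 4                         # hypothetical 16-entry table
STEP = 1.0 / (1 << TABLE_BITS)         # operands assumed normalized to [1, 2)
# Each entry holds the reciprocal of its interval's midpoint.
TABLE = [1.0 / (1.0 + (i + 0.5) * STEP) for i in range(1 << TABLE_BITS)]


def reciprocal_approx(x: float, iterations: int = 1) -> float:
    """Approximate 1/x for x in [1, 2): table lookup, then Newton-Raphson."""
    if not 1.0 <= x < 2.0:
        raise ValueError("x must lie in [1, 2)")
    y = TABLE[min(int((x - 1.0) / STEP), len(TABLE) - 1)]
    for _ in range(iterations):
        t = 1.0 - x * y                # multiply-complement: 1 - x*y
        y = y + y * t                  # multiply-add: y + y*t = y*(2 - x*y)
    return y
```

If y0 = (1 + e)/x, one step gives y1 = (1 - e²)/x, so the relative error is squared each iteration; the table's worst case |e| of about 0.03 therefore drops below 0.001 after one step.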
Baseband Architecture Design for Future Wireless Base-Station Receivers
 Rice University
, 2000