Real-Time Algorithms And Architectures For Multiuser Channel Estimation And Detection In Wireless Base-Station Receivers
- Submitted to IEEE Journal on Selected Areas in Communications (JSAC), 2002
Cited by 22 (12 self)
This paper presents efficient algorithms and architecture designs that can meet the real-time requirements of multiuser channel estimation and detection in future wireless base-station receivers. The sophisticated algorithms proposed for multiuser channel estimation and detection are difficult to implement in real time on current Digital Signal Processor (DSP)-based receivers. A maximum-likelihood-based multiuser channel estimation scheme requiring matrix inversions is redesigned from an implementation perspective as a reduced-complexity, iterative scheme with a simple fixed-point VLSI architecture. A reduced-complexity, bit-streaming multiuser detection algorithm that avoids the need for multishot detection is also developed for a simple, pipelined VLSI architecture. Thus, we show that real-time solutions, with 3-4 orders of magnitude performance improvements over DSPs, can be achieved for next-generation wireless systems by (1) designing the algorithms from an implementation ...
Reduced Power Dissipation Through Truncated Multiplication
- in IEEE Alessandro Volta Memorial Workshop on Low Power Design, 1999
Cited by 15 (5 self)
Reducing the power dissipation of parallel multipliers is important in the design of digital signal processing systems. In many of these systems, the products of parallel multipliers are rounded to avoid growth in word size. The power dissipation and area of rounded parallel multipliers can be significantly reduced by a technique known as truncated multiplication. With this technique, the least significant columns of the multiplication matrix are not used. Instead, the carries generated by these columns are estimated. This estimate is added to the most significant columns to produce the rounded product. This paper presents the design and implementation of parallel truncated multipliers. Simulations indicate that truncated parallel multipliers dissipate between 29 and 40 percent less power than standard parallel multipliers for operand sizes of 16 and 32 bits.
1: Introduction
High-speed parallel multipliers are fundamental building blocks in digital signal processing systems [1]. In...
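The idea described in this abstract (drop the least significant columns of the partial-product matrix and add an estimate of their carries) can be illustrated with a small bit-level model. This is only a sketch under stated assumptions: operands are unsigned `n`-bit integers, the function names and the parameter `k` (number of dropped columns) are hypothetical, and the correction constant is simply the expected value of the dropped bits for uniformly random operands, which is not necessarily the estimate used in the paper.

```python
def exact_rounded_product(a: int, b: int, n: int) -> int:
    """Full 2n-bit product of two unsigned n-bit operands, rounded to its top n bits."""
    return (a * b + (1 << (n - 1))) >> n


def truncated_product(a: int, b: int, n: int, k: int) -> int:
    """Approximate the rounded product without forming the k lowest columns.

    Assumption (not taken from the paper): the carries of the dropped columns
    are estimated by a constant equal to the expected sum of the dropped
    partial-product bits, quantised to a multiple of 2**k.
    """
    if not 0 <= k <= n:
        raise ValueError("k must satisfy 0 <= k <= n")

    # Sum only the partial-product bits a_i & b_j that land in column i + j >= k.
    kept = 0
    for i in range(n):
        if not (a >> i) & 1:
            continue
        for j in range(n):
            if (b >> j) & 1 and i + j >= k:
                kept += 1 << (i + j)

    # Column c (c < n) holds c + 1 bits; each is 1 with probability 1/4.
    expected_dropped = sum((c + 1) * (1 << c) for c in range(k)) / 4
    correction = round(expected_dropped / (1 << k)) << k

    # Add the correction and the rounding constant, then keep the top n bits.
    return (kept + correction + (1 << (n - 1))) >> n
```

With `k = 0` no columns are dropped and the model reduces to the exact rounded product. For `n = 8` and `k = 4` the result differs from the exact rounded product by at most one unit in the last place, because the dropped columns can contribute at most 49 before the shift by 8 bits.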
Efficient VLSI Architectures for Baseband Signal Processing in Wireless Base-Station Receivers
- in 12th IEEE International Conference on Application-Specific Systems, Architectures and Processors (ASAP), 2000
Cited by 8 (7 self)
A real-time VLSI architecture is designed for Multiuser Channel Estimation, one of the core baseband processing operations in wireless base-station receivers. Future wireless base-station receivers will need to use sophisticated algorithms to support extremely high data rates and multimedia. Several features in these algorithms that can help meet real-time requirements are not utilized effectively in DSPs. These features, such as bit-level arithmetic and parallel structure, can be revealed and well exploited by task partitioning the algorithms. We modify the channel estimation algorithm for a reduced-complexity fixed-point hardware implementation. We show the complexity and hardware required for three different area-time tradeoffs: an area-constrained, a time-constrained and an area-time efficient architecture. The area-constrained architecture achieves low data rates with minimum hardware, which may be used in 'picocell' base-stations. The time-constrained solution exploits the entire available parallelism and determines the maximum theoretical data rates. The area-time efficient architecture meets real-time requirements with minimum area overhead. The orders-of-magnitude difference between area- and time-constrained solutions reveals significant inherent parallelism in the algorithm. All VLSI solutions exhibit better time performance than a previous DSP implementation.
Variable-Correction Truncated Floating Point Multipliers
- in Proceedings of the Thirty Fourth Asilomar Conference on Signals, Circuits and Systems, 2000
Cited by 4 (1 self)
About half the hardware for floating point multipliers is needed only to guarantee correctly rounded results. For multimedia, graphics, and DSP systems, a significant reduction in area, delay, and power can be achieved by producing results that are not correctly rounded. This paper presents an efficient method for designing variable-correction truncated floating point multipliers that produce results with a maximum error of less than one unit in the last place. With this method, several of the less significant columns of the significand multiplier and the rounding logic for floating point multiplication are eliminated.
Technical areas: (13) DSP hardware, software, and coreware; (14) ASIC and FPGA algorithm/processor design.
POC: Michael Schulte, 19 Memorial Dr. West, EECS Dept., Lehigh University, Bethlehem, PA 18015. Email: mschulte@eecs.lehigh.edu, Phone: (610) 758-5036, FAX: (610) 758-6279.
Extended Abstract
Most modern processors perform floating point operations accord...
Design Tradeoffs Using Truncated Multipliers in FIR Filter Implementations
- 2002
Cited by 4 (0 self)
This paper presents a general FIR filter architecture utilizing truncated tree multipliers for computation. The average error, maximum error, and variance of error due to truncation are derived for the proposed architecture. A novel technique that reduces the average error of the filter and is independent of the number of unformed columns is presented, as well as equations describing the signal-to-noise ratio of the truncation error. A software tool written in Java is described that automatically generates structural VHDL models for specific filters based on this architecture, given parameters such as the number of taps, operand lengths, number of multipliers, and the number of truncated columns. We show that a 22.5% reduction in area can be achieved for a 24-tap filter with 16-bit coefficients. The ratio of the average error to the full scale value is only 1.4 × 10^-9, with only an 8.4 dB reduction in SNR for this implementation.
High-Speed Inverse Square Roots
- Proceedings of the 14th IEEE Symposium on Computer Arithmetic, 1999
Cited by 3 (0 self)
Inverse square roots are used in several digital signal processing, multimedia, and scientific computing applications. This paper presents a high-speed method for computing inverse square roots. This method uses a table lookup, operand modification, and multiplication to obtain an initial approximation to the inverse square root. This is followed by a modified Newton-Raphson iteration, consisting of one square, one multiply-complement, and one multiply-add operation. The initial approximation and Newton-Raphson iteration employ specialized hardware to reduce the delay, area, and power dissipation. Application of this method is illustrated through the design of an inverse square root unit for operands in the IEEE single precision format. An implementation of this unit with a 4-layer metal, 2.5 Volt, 0.25 micron CMOS standard cell library has a cycle time of 6.7 ns, an area of 0.41 mm^2, a latency of five cycles, and a throughput of one result per cycle.
1. Introduction
Square roots a...
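As a rough illustration of the structure described in this abstract (a table-based initial approximation refined by a Newton-Raphson step), the sketch below uses the textbook iteration for 1/sqrt(x), arranged as one square, one multiply-complement and one multiply-add. The listing does not show how the paper modifies the iteration or the operand, so everything here is an assumption: the table size `TABLE_BITS`, the midpoint table, the input range [1, 4), the use of Python floats rather than fixed-point hardware, and all function names are hypothetical.

```python
import math

TABLE_BITS = 6  # hypothetical table size: 2**6 entries covering [1, 4)


def build_table(bits: int = TABLE_BITS) -> list:
    """Store 1/sqrt at the midpoint of each of 2**bits equal intervals of [1, 4)."""
    size = 1 << bits
    width = 3.0 / size
    return [1.0 / math.sqrt(1.0 + (i + 0.5) * width) for i in range(size)]


_TABLE = build_table()


def initial_approx(x: float) -> float:
    """Single table lookup indexed by the position of x inside [1, 4)."""
    index = int((x - 1.0) / 3.0 * len(_TABLE))
    return _TABLE[min(index, len(_TABLE) - 1)]


def newton_step(x: float, y: float) -> float:
    """Textbook Newton-Raphson step for 1/sqrt(x), split into three operations."""
    square = y * y                    # one square
    complement = 1.0 - x * square     # one multiply-complement
    return y + (0.5 * y) * complement  # one multiply-add


def inverse_sqrt(x: float, steps: int = 1) -> float:
    """Approximate 1/sqrt(x) for x in [1, 4) with a lookup plus `steps` iterations."""
    if not 1.0 <= x < 4.0:
        raise ValueError("this sketch only handles x in [1, 4)")
    y = initial_approx(x)
    for _ in range(steps):
        y = newton_step(x, y)
    return y
```

If the initial approximation has relative error e, one step leaves a relative error of about 1.5 e^2, so a more accurate starting value reduces the number of iterations needed. The restriction to [1, 4) reflects the usual practice of handling the exponent separately, which the sketch leaves out.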
Combined IEEE Compliant and Truncated Floating Point Multipliers for Reduced Power Dissipation
- in IEEE International Conference on Computer Design (ICCD), 2001
Cited by 3 (0 self)
Truncated multiplication can be used to significantly reduce power dissipation for applications that do not require correctly rounded results. This paper presents a power efficient method for designing floating point multipliers that can perform either correctly rounded IEEE compliant multiplication or truncated multiplication, based on an input control signal. Compared to conventional IEEE floating point multipliers, these multipliers require only a small amount of additional area and delay, yet provide a significant reduction in power dissipation for applications that do not require IEEE compliant results.
High-Speed Reciprocal Approximations
- in Proceedings of the Thirty First Asilomar Conference on Signals, Circuits and Systems, 1998
Cited by 2 (1 self)
This paper presents a high-speed algorithm for computing reciprocal approximations. This algorithm uses two parallel table lookups and an addition to obtain an initial approximation to the reciprocal. This is followed by a modified Newton-Raphson iteration, consisting of one multiply-complement and one multiply-add operation to obtain a more accurate reciprocal approximation. The initial approximation and Newton-Raphson iteration use specialized hardware to reduce the area requirements. Application of this algorithm is illustrated through the design of a reciprocal approximation unit that has a latency of three cycles and a peak throughput of one result per cycle for operands in the IEEE single precision format.
1. Introduction
Reciprocal approximations and division are important for several applications in digital signal and image processing, computer graphics, and scientific computing [1]-[3]. Most algorithms for performing these operations, however, have long latencies or large a...
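The two steps named in this abstract (an initial approximation from two table lookups and an addition, followed by a Newton-Raphson refinement) can be sketched as follows. The abstract does not say how the two tables are organised, so the sketch assumes a bipartite-style split: one table holds the function value indexed by the upper and middle fraction bits, the other holds a slope correction indexed by the upper and lower bits. The bit widths `N0`, `N1`, `N2`, the input range [1, 2), the float arithmetic and all names are hypothetical, and the refinement is the textbook iteration rather than the paper's modified one.

```python
N0, N1, N2 = 3, 3, 3  # hypothetical split of the fraction bits of x in [1, 2)
N = N0 + N1 + N2


def build_tables():
    """Build a value table (upper, middle bits) and a correction table (upper, lower bits)."""
    half = 1 << (N2 - 1)
    coarse = [
        [1.0 / (1.0 + ((i0 << (N1 + N2)) + (i1 << N2) + half) / (1 << N))
         for i1 in range(1 << N1)]
        for i0 in range(1 << N0)
    ]
    fine = []
    for i0 in range(1 << N0):
        centre = 1.0 + (i0 + 0.5) / (1 << N0)
        slope = -1.0 / (centre * centre)  # derivative of 1/x at the block centre
        fine.append([slope * (i2 - half) / (1 << N) for i2 in range(1 << N2)])
    return coarse, fine


COARSE, FINE = build_tables()


def initial_reciprocal(x: float) -> float:
    """Two table lookups (independent of each other) and one addition."""
    frac = int((x - 1.0) * (1 << N))  # truncate x to N fraction bits
    i0 = frac >> (N1 + N2)
    i1 = (frac >> N2) & ((1 << N1) - 1)
    i2 = frac & ((1 << N2) - 1)
    return COARSE[i0][i1] + FINE[i0][i2]


def reciprocal_step(x: float, y: float) -> float:
    """Textbook Newton-Raphson step for 1/x, split into two operations."""
    complement = 1.0 - x * y    # one multiply-complement
    return y + y * complement   # one multiply-add


def reciprocal(x: float, steps: int = 1) -> float:
    """Approximate 1/x for x in [1, 2) with a lookup plus `steps` iterations."""
    if not 1.0 <= x < 2.0:
        raise ValueError("this sketch only handles x in [1, 2)")
    y = initial_reciprocal(x)
    for _ in range(steps):
        y = reciprocal_step(x, y)
    return y
```

Because the two lookups share no data dependency they can proceed in parallel, and a step of this form squares the relative error of its input. Two small tables of 2^(N0+N1) and 2^(N0+N2) entries replace a single table of 2^N entries, which is the usual motivation for this kind of split.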
Baseband Architecture Design for Future Wireless Base-Station Receivers
- Rice University, 2000
Automated least-significant bit datapath optimization for FPGAs
- in IEEE Symposium on Field-Programmable Custom Computing Machines, 2004
Cited by 2 (1 self)
In this paper we present a method for FPGA datapath precision optimization subject to user-defined area and error constraints. This work builds upon our previous research [1], which presented a methodology for optimizing for dynamic range, that is, the most significant bit position. In this work, we present an automated optimization technique for the least-significant bit position of circuit datapaths. We present results describing the effectiveness of our methods on typical signal and image processing kernels.

