Complex division with prescaling of operands
 PROCEEDINGS OF ASAP’03: 14TH IEEE CONFERENCE ON APPLICATION-SPECIFIC SYSTEMS, ARCHITECTURES AND PROCESSORS
, 2003
Cited by 9 (7 self)
We adapt the radix-r digit-recurrence division algorithm to complex division. By prescaling the operands, we make the selection of quotient digits simple. This leads to a simple hardware implementation, and allows correct rounding of complex quotient. To reduce large prescaling tables required for radices greater than 4, we adapt the bipartite-table method to multiple-operand functions.
Complex Square Root with Operand Prescaling
 in "Journal of VLSI Signal Processing"
, 2006
Cited by 8 (4 self)
We propose a radix-r digit-recurrence algorithm for complex square-root. The operand is prescaled to allow the selection of square-root digits by rounding of the residual. This leads to a simple hardware implementation. Moreover, the use of digit recurrence approach allows correct rounding of the result. The algorithm, compatible with the complex division, and its design are described at a high level. We also give rough comparisons of its latency and cost with respect to implementation based on standard floating-point instructions as used in software routines for complex square root.
Implementation of near Shannon Limit error-correcting codes using reconfigurable hardware
 Proc. IEEE Symp. on Field-Prog. Cust. Comput. Mach
, 2000
Cited by 8 (0 self)
Error correcting codes (ECCs) are widely used in digital communications. Recently, new types of ECCs have been proposed which permit error-free data transmission over noisy channels at rates which approach the Shannon capacity. For wireless communication, these new codes allow more data to be carried in the same spectrum, lower transmission power, and higher data security and compression. One new type of ECC, referred to as "Turbo Codes," has received a lot of attention, but is computationally expensive to decode and difficult to realize in hardware. Low Density Parity Check Codes (LDPCs), another ECC, also provide near Shannon limit error correction ability. However, LDPCs use a decoding scheme which is much more amenable to hardware implementation. This paper will first present an overview of these coding schemes, then discuss the issues involved in building an LDPC decoder using reconfigurable hardware. We present a hypothetical LDPC implementation using a commercial FPGA, which will give an idea of future research issues and performance gains.
RN-coding of numbers: definition and some properties
 in "Proceedings of the 17th IMACS World Congress on Scientific Computation, Applied Mathematics and Simulation"
, 2004
Cited by 7 (4 self)
We define RN-codings as radix-β signed representations of numbers for which rounding to the nearest is always identical to truncation. After giving characterizations of such representations, we investigate some of their properties, and we suggest algorithms for conversion to and from these codings.
The Setup for Triangle Rasterization
, 1996
Cited by 6 (0 self)
Integrating the slope and setup calculations for triangles into the rasterizer offloads the host processor from intensive calculations and can significantly increase 3D system performance. The processing on the host is greatly reduced and much less data is passed from the host to the graphics subsystem. A setup architecture handling generalized triangle meshes and computing all necessary parameters for a high-end raster pipeline to generate Gouraud shaded, texture and bump-mapped triangles is described and its benefits on the final bandwidth are shown. To efficiently compute the slopes and color gradients for each triangle, some implementation aspects on division and multiplication pipelines are discussed.
Anders Kugler, University of Tübingen, Computer Graphics Laboratory (Universität Tübingen, Wilhelm-Schickard-Institut für Informatik, Graphisch-Interaktive Systeme, Auf der Morgenstelle 10, D-72076 Tübingen, Germany), email: kugler@gris.unit...
2-D DCT Using On-Line Arithmetic
 In International Conference on Acoustics, Speech, and Signal Processing (ICASSP)
, 1995
Cited by 6 (0 self)
We present a VLSI architecture for the evaluation of the (8x8)-point 2-D DCT with on-line arithmetic. The utilization of on-line arithmetic, in combination with an algorithm based on FCT and matrix multiplication, reduces the total hardware maintaining a data rate and a latency similar to approaches based on distributed or parallel arithmetic. The architecture has been integrated in a chip using a 1 µm CMOS technology, occupying an area of 56.7 mm².
1. INTRODUCTION
The two dimensional Discrete Cosine Transform is considered an efficient technique for image compression and is being utilized as standard in several applications, including video compression, storing and transmission of still images (JPEG) and moving pictures (MPEG) and HDTV. Since direct implementation of the 2-D DCT of an NxN real matrix is computationally intensive, it is usually implemented by means of the row-column decomposition technique (separated 2-D DCT), in which the N-point 1-D DCT of each column of...
Accelerating correctly rounded floating-point division when the divisor is known in advance
 IEEE Transactions on Computers
, 2004
Cited by 6 (4 self)
We present techniques for accelerating the floating-point computation of x/y when y is known before x. The proposed algorithms are oriented towards architectures with available fused-MAC operations. The goal is to get exactly the same result as with usual division with rounding to nearest. These techniques can be used by compilers to accelerate some numerical programs without loss of accuracy.
1 Motivation of this research
We wish to provide methods for accelerating floating-point divisions of the form x/y, when y is known before x, either at compile-time, or at run time. We assume that a fused multiply-accumulator is available, and that division is done in software (this happens for instance on RS6000, PowerPC or Itanium architectures). The computed result must be the correctly rounded result. A naive approach consists in computing the reciprocal of y (with rounding to nearest), and then, once x is available, multiplying the obtained result by x. It is well known
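To illustrate why the naive reciprocal-then-multiply approach mentioned in this excerpt is not enough for correct rounding, here is a minimal Python sketch. It assumes IEEE 754 double precision (Python's `float`) and uses hypothetical helper names `naive_divide` and `count_mismatches`; it is not the paper's algorithm, only a demonstration of the double-rounding problem the paper sets out to avoid.

```python
def naive_divide(x: float, y: float) -> float:
    """Divide via a precomputed reciprocal (illustrative helper, not from the paper)."""
    reciprocal = 1.0 / y      # first rounding to nearest, done once when y is known
    return x * reciprocal     # second rounding to nearest, done when x arrives


def count_mismatches(y: float, numerators) -> int:
    """Count numerators for which the naive result differs from true division."""
    return sum(1 for x in numerators if naive_divide(x, y) != x / y)


if __name__ == "__main__":
    # x / y on floats is a single correctly rounded IEEE 754 division.
    print(naive_divide(5.0, 3.0), 5.0 / 3.0)
    print(count_mismatches(3.0, [float(n) for n in range(1, 1001)]))
```

For y = 3 and x = 5 the two results differ in the last bit, because the rounding error of the stored reciprocal is amplified by x before the second rounding takes place.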
Unified Mixed Radix 2-4 Redundant CORDIC Processor
, 1996
Cited by 5 (2 self)
We present a unified mixed radix CORDIC algorithm with carry-save arithmetic and constant scale factor. The pipelined architecture of the processor is determined by a unique sequence of microrotations for the two modes of operation (rotation and vectoring) in circular and hyperbolic coordinates. The combination of radix-2 and radix-4 microrotations allows us to reduce the latency and size of the pipeline significantly. The unified algorithm is based on the correcting microrotation method, which we have extended to the vectoring mode in hyperbolic coordinates. We have also generalized the use of radix-4 microrotations to the two operation modes and coordinate systems.
Index Terms: Unified CORDIC algorithm, redundant arithmetic, pipelined design, high speed processor.
I INTRODUCTION
CORDIC is an iterative algorithm for carrying out rotations using only addition and shift operations [7] [12] [13]. The basic iteration (microrotation-extension) is [13] x_{i+1} = x_i + m σ_i 2 ...
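As a rough illustration of the shift-and-add idea behind CORDIC, the sketch below implements the conventional non-redundant radix-2 algorithm in rotation mode for circular coordinates. It is a textbook form, not the mixed radix 2-4 redundant scheme of the paper; the function name `cordic_rotate`, the sign convention, and the use of floating-point multiplications in place of hardware shifts are all assumptions made for readability.

```python
import math


def cordic_rotate(x: float, y: float, angle: float, iterations: int = 32):
    """Rotate (x, y) by `angle` radians with radix-2 CORDIC microrotations.

    Converges for |angle| up to about 1.74 rad. Hypothetical helper for
    illustration; hardware would use shifts and a precomputed scale factor.
    """
    z = angle          # residual angle still to be rotated
    gain = 1.0         # accumulated scale factor of the microrotations
    for i in range(iterations):
        sigma = 1.0 if z >= 0.0 else -1.0      # rotation direction
        shift = 2.0 ** -i                      # a right shift by i bits in hardware
        x, y = x - sigma * shift * y, y + sigma * shift * x
        z -= sigma * math.atan(shift)          # table lookup in hardware
        gain *= math.sqrt(1.0 + shift * shift)
    return x / gain, y / gain                  # compensate the constant gain


if __name__ == "__main__":
    c, s = cordic_rotate(1.0, 0.0, 0.5)
    print(c, s, math.cos(0.5), math.sin(0.5))
```

Because every iteration uses sigma of +1 or -1, the gain is the same for every input angle, which is why a constant scale factor can be applied once at the end.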
Multiprecision Division on an 8-Bit Processor
 in Proc. 13th IEEE Symp. Computer Arithmetic, IEEE CS
, 1997
Cited by 5 (5 self)
Small processors can be especially useful in massively parallel architectures. This paper considers multiprecision division algorithms on an 8-bit processor (the Kestrel processor, currently in fabrication) that includes a small amount of memory and an 8-bit multiplier. We evaluate several variations of the Newton-Raphson reciprocal approximation methods for use with division. Our final single-precision algorithm requires 41 cycles to divide two 24-bit numbers to produce a 26-bit result. The double-precision version requires 98 cycles to divide two 53-bit numbers to produce a 55-bit result. This low cycle count is the result of several techniques including low-precision arithmetic, early introduction of dividends, and simple yet good initial reciprocal estimates.
1. Introduction
This paper presents a study of division on an 8-bit processor. It is motivated by the Kestrel architecture, an 8-bit parallel processor tuned to sequence analysis [8]. The word size is a natural choice for seq...
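The Newton-Raphson reciprocal approximation named in this abstract can be sketched generically as follows. This is the standard textbook iteration in double-precision floating point, not the 8-bit multiprecision variant the paper evaluates; the function names and the linear initial estimate 48/17 - (32/17)·d for d in [0.5, 1) are assumptions chosen for illustration.

```python
def newton_reciprocal(d: float, iterations: int = 4) -> float:
    """Approximate 1/d for a divisor normalized into [0.5, 1).

    Hypothetical helper: each step r <- r * (2 - d * r) roughly squares
    the relative error, so few iterations are needed.
    """
    if not 0.5 <= d < 1.0:
        raise ValueError("d must be normalized into [0.5, 1)")
    r = 48.0 / 17.0 - (32.0 / 17.0) * d    # initial estimate, error at most 1/17
    for _ in range(iterations):
        r = r * (2.0 - d * r)              # Newton-Raphson step for f(r) = 1/r - d
    return r


def newton_divide(n: float, d: float, iterations: int = 4) -> float:
    """Compute n / d as n times the approximated reciprocal of d."""
    return n * newton_reciprocal(d, iterations)


if __name__ == "__main__":
    print(newton_reciprocal(0.75), 1.0 / 0.75)
    print(newton_divide(0.6, 0.75))
```

The quadratic convergence is what makes a good initial estimate valuable: a better starting point removes whole iterations, which matters when each multiplication costs many cycles.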
Implementing Division and Other Floating-Point Operations: A System Perspective
 Scientific Computing and Validated Numerics (Proceedings of SCAN'95)
, 1995
Cited by 5 (1 self)
This paper has attempted to clarify the important tradeoffs in implementing an FP divider in hardware.