Results 11 - 17 of 17
Parallel Canonical Recoding
Electronics Letters, 1996
Abstract

Cited by 3 (0 self)
We introduce a parallel algorithm for generating the canonical signed-digit expansion of an n-bit number in O(log n) time using O(n) gates. The algorithm is similar to the computation of the carries in a carry-lookahead circuit. We also prove that if the binary number x + ⌊x/2⌋ is given, then the canonical signed-digit recoding of x can be computed in O(1) time using O(n) gates. 1 Introduction. Recoding techniques (Booth recoding, bit-pair recoding, etc.) for sparse signed-digit representations of binary numbers have been effectively used in multiplication [3, 4] and exponentiation algorithms [2]. For example, the original Booth recoding technique [3, 4] scans the bits of the multiplier one bit at a time, and adds or subtracts the multiplicand to or from the partial product, depending on the value of the current bit and the previous bit. The modified versions of the Booth algorithm scan the bits of the multiplier two bits or three bits at a time [4]. These techniques are equivalent ...
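The canonical signed-digit expansion (non-adjacent form) has a standard sequential recoding; the sketch below shows that textbook baseline, not the paper's parallel carry-lookahead construction, and assumes a nonnegative integer input:

```python
def csd(x):
    """Canonical signed-digit (non-adjacent form) digits of a nonnegative
    integer, least-significant first, each digit in {-1, 0, 1}.
    Sequential baseline; the paper computes this in O(log n) parallel time."""
    digits = []
    while x > 0:
        if x & 1:
            d = 2 - (x & 3)  # +1 if x mod 4 == 1, -1 if x mod 4 == 3
            x -= d           # subtracting d makes the low two bits 00
        else:
            d = 0
        digits.append(d)
        x //= 2
    return digits
```

For example, `csd(7)` yields `[-1, 0, 0, 1]`, i.e. 7 = 2³ - 1, and no two adjacent digits are nonzero, which is what makes the representation sparse.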
An Effective Load Balancing Policy for Geometric Decaying Algorithms
Abstract

Cited by 3 (3 self)
Parallel algorithms are often first designed as a sequence of rounds, where each round includes any number of independent constant-time operations. This so-called work-time presentation is then followed by a processor scheduling implementation on a more concrete computational model. Many parallel algorithms are geometric-decaying in the sense that the sequence of work loads is upper bounded by a decreasing geometric series. A standard scheduling implementation of such algorithms consists of a repeated application of load balancing. We present a more effective, yet as simple, policy for the utilization of load balancing in geometric-decaying algorithms. By making a more careful choice of when and how often load balancing should be employed, and by using a simple amortization argument, we show that the number of required applications of load balancing should be nearly constant. The policy is not restricted to any particular model of parallel computation, and, up to a constant factor, it is the best possible.
Parallel Prefix Computation with Few Processors
1992
Abstract

Cited by 2 (0 self)
We present a parallel prefix algorithm which uses...
Fast Computation of Divided Differences and Parallel Hermite Interpolation
Abstract

Cited by 1 (0 self)
We present parallel algorithms for fast polynomial interpolation. These algorithms can be used for constructing and evaluating polynomials interpolating a function's values and its derivatives of arbitrary order (Hermite interpolation). For interpolation, the parallel arithmetic complexity is O(log² M + log N) for large M and N...
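Divided differences, the core primitive here, have a simple O(N²) sequential construction; the sketch below shows that standard sequential version (the paper's contribution is its parallelization, which is not reproduced here):

```python
def divided_differences(xs, ys):
    """Newton divided-difference coefficients f[x0], f[x0,x1], ...
    computed in place, O(n^2) sequential work."""
    n = len(xs)
    coef = list(ys)
    for j in range(1, n):
        # Update from the end so coef[i-1] still holds the previous column.
        for i in range(n - 1, j - 1, -1):
            coef[i] = (coef[i] - coef[i - 1]) / (xs[i] - xs[i - j])
    return coef

def newton_eval(coef, xs, x):
    """Evaluate the Newton-form interpolating polynomial at x (Horner-like)."""
    result = coef[-1]
    for k in range(len(coef) - 2, -1, -1):
        result = result * (x - xs[k]) + coef[k]
    return result
```

For instance, interpolating x² + x + 1 through nodes 0, 1, 2 gives coefficients [1, 2, 1] and recovers the value 13 at x = 3.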
Generalized Scans and Tri-Diagonal Systems
Abstract
Motivated by the analysis of known parallel techniques for the solution of linear tridiagonal systems, we introduce generalized scans, a class of recursively defined, length-preserving, sequence-to-sequence transformations that generalize the well-known prefix computations (scans). Generalized scan functions are described in terms of three algorithmic phases: a reduction phase, which saves data for the third (expansion) phase and prepares data for the second phase, which is a recursive invocation of the same function on one fewer variable. Both the reduction and expansion phases operate on a bounded number of variables, a key feature for their parallelization. Generalized scans enjoy a property, called here proto-associativity, that gives rise to ordinary associativity when generalized scans are specialized to ordinary scans. We show that the solution of positive definite block tridiagonal linear systems can be cast as a generalized scan, thereby shedding light on the underlying structure enabling k...
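For reference, an ordinary scan (the special case that generalized scans extend) can itself be written with the same reduce/recurse/expand shape the abstract describes; a minimal sketch, assuming an associative binary operator:

```python
def scan(op, xs):
    """Recursive inclusive scan (prefix computation) with associative op,
    structured as reduction, recursion on half the data, and expansion."""
    n = len(xs)
    if n == 1:
        return list(xs)
    # Reduction phase: pairwise combine into a roughly half-length sequence.
    pairs = [op(xs[2 * i], xs[2 * i + 1]) for i in range(n // 2)]
    if n % 2:
        pairs.append(xs[-1])
    # Recursive invocation of the same function on the shorter sequence.
    sub = scan(op, pairs)
    # Expansion phase: odd positions come straight from the recursion; even
    # positions combine the previous recursive prefix with the original element.
    out = []
    for i in range(n // 2):
        out.append(op(sub[i - 1], xs[2 * i]) if i > 0 else xs[0])
        out.append(sub[i])
    if n % 2:
        out.append(sub[-1])
    return out
```

With addition, `scan(lambda a, b: a + b, [1, 2, 3, 4, 5])` produces the running sums [1, 3, 6, 10, 15]; both the reduction and expansion phases touch only a bounded number of variables per output, which is the feature the abstract highlights for parallelization.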
Optimal Parallel Prefix on Mesh Architectures
Abstract
Algorithms for the efficient computation of prefix products on mesh-connected...
On Fast Computation of Continued Fractions
1991
Abstract
We give an O(log n) algorithm to compute the n-th convergent of a periodic continued fraction. The algorithm is based on the matrix representation of continued fractions, due to Milne-Thomson. This approach also allows for the computation of the first n convergents of a general continued fraction in O(log n) time using O(n/log n) processors.
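The matrix representation makes the O(log n) bound plausible: each partial quotient a_i contributes a factor [[a_i, 1], [1, 0]] to a 2x2 product whose entries are the convergent numerators and denominators, so for a periodic fraction the repeated period becomes a matrix power, computable by repeated squaring. A minimal sketch of these standard identities (not the paper's parallel algorithm):

```python
def convergent(partial_quotients):
    """n-th convergent p/q of [a0; a1, a2, ...] via the standard recurrence,
    equivalent to multiplying the matrices [[a_i, 1], [1, 0]] in order."""
    p, q, p_prev, q_prev = 1, 0, 0, 1
    for a in partial_quotients:
        p, p_prev = a * p + p_prev, p
        q, q_prev = a * q + q_prev, q
    return p, q

def matmul(A, B):
    """Product of two 2x2 integer matrices."""
    return [[A[0][0] * B[0][0] + A[0][1] * B[1][0],
             A[0][0] * B[0][1] + A[0][1] * B[1][1]],
            [A[1][0] * B[0][0] + A[1][1] * B[1][0],
             A[1][0] * B[0][1] + A[1][1] * B[1][1]]]

def matpow(M, n):
    """2x2 matrix power by repeated squaring: O(log n) multiplications,
    which is what handles the repeated period of a periodic fraction."""
    R = [[1, 0], [0, 1]]
    while n:
        if n & 1:
            R = matmul(R, M)
        M = matmul(M, M)
        n >>= 1
    return R
```

For example, `convergent([1, 2, 2, 2])` gives (17, 12), the fourth convergent of √2, and for the purely periodic fraction [1; 1, 1, ...] the period matrix [[1, 1], [1, 0]] raised to the n-th power contains consecutive Fibonacci numbers, whose ratios are the convergents of the golden ratio.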