## An Optimal Multiplication Algorithm on Reconfigurable Mesh (1997)

Venue: | IEEE Transactions on Parallel and Distributed Systems |

Citations: | 13 - 3 self |

### BibTeX

@ARTICLE{Jang97anoptimal,

author = {Ju-wook Jang and Heonchul Park and Viktor K. Prasanna},

title = {An Optimal Multiplication Algorithm on Reconfigurable Mesh},

journal = {IEEE Transactions on Parallel and Distributed Systems},

year = {1997},

pages = {384--391}

}

### Abstract

An O(1) time algorithm to multiply two N-bit binary numbers using an N × N bit-model of reconfigurable mesh is shown. It uses optimal mesh size and improves previously known results for multiplication on the reconfigurable mesh. The result is obtained by using novel techniques for data representation and data movement and by using the multidimensional Rader Transform. The algorithm is extended to achieve AT^2 optimality over 1 ≤ T ≤ √N in a variant of the bit-model of VLSI.

Index Terms: Integer multiplication, reconfigurable mesh, optimal algorithm, area-time trade-off, VLSI architecture.

1 INTRODUCTION

The reconfigurable mesh is a two-dimensional mesh of processors connected by reconfigurable buses [17]. Though the buses outside the Processing Elements (PEs) are fixed, the internal connection between the I/O ports of each PE can be reconfigured by individual PEs during the execution of algorithms. The reconfigurable mesh cap...

### Citations

85 | The Power of Reconfiguration
- Ben-Asher
- 1991
Citation Context ...id. Parallel algorithms have been developed on the reconfigurable mesh for graph problems [17], [18], [29], for image processing [11], [19], [18], for geometric problems [25], for arithmetic problems [3], [31], and for sorting [3], [8], [15], [20], [25], [30]. In this paper, we show that multiplication of two N-bit numbers can be performed in O(1) time on an N × N bit-model of reconfigurable mesh. Pre... |

71 | Polymorphic-Torus Network - Li, Maresca - 1989 |

66 |
The image understanding architecture
- Weems, Levitan, et al.
Citation Context ...igurable meshes are being built. The Image Understanding Architecture (IUA) is a multilevel system designed for supporting research in real-time image understanding and knowledge-based computer vision [32], [33], [34]. The lowest level of the IUA is the CAAPP, a 512 × 512 square grid of bit-serial processors intended to perform low-level image processing. Each CAAPP processor is connected to its four n... |

57 |
Constant time algorithms for the transitive closure and some related graph problems on processor arrays with reconfigurable bus systems
- Wang, Chen
- 1990
Citation Context ..., called the Gate Chip, provides connections between the four I/O ports and the DSP chip at the grid. Parallel algorithms have been developed on the reconfigurable mesh for graph problems [17], [18], [29], for image processing [11], [19], [18], for geometric problems [25], for arithmetic problems [3], [31], and for sorting [3], [8], [15], [20], [25], [30]. In this paper, we show that multiplication of... |

53 |
Meshes with reconfigurable buses
- Miller
- 1988
Citation Context ...tion, reconfigurable mesh, optimal algorithm, area-time trade-off, VLSI architecture. 1 INTRODUCTION The reconfigurable mesh is a two-dimensional mesh of processors connected by reconfigurable buses [17]. Though the buses outside the Processing Elements (PEs) are fixed, the internal connection between the I/O ports of each PE can be reconfigured by individual PEs during the execution of algorithms. T... |

51 |
An optimal sorting algorithm on reconfigurable mesh
- Jang, Prasanna
- 1995
Citation Context ...n developed on the reconfigurable mesh for graph problems [17], [18], [29], for image processing [11], [19], [18], for geometric problems [25], for arithmetic problems [3], [31], and for sorting [3], [8], [15], [20], [25], [30]. In this paper, we show that multiplication of two N-bit numbers can be performed in O(1) time on an N × N bit-model of reconfigurable mesh. Previously known result for constan... |

36 | A Fast Algorithm for Computing Histograms on a Reconfigurable Mesh - Jang, Park, et al. - 1992 |

35 | Introduction to the configurable, highly parallel computer - Snyder - 1982 |

34 |
Constant time sorting on a processor array with a reconfigurable bus system
- Wang, Chen, et al.
- 1990
Citation Context ...nfigurable mesh for graph problems [17], [18], [29], for image processing [11], [19], [18], for geometric problems [25], for arithmetic problems [3], [31], and for sorting [3], [8], [15], [20], [25], [30]. In this paper, we show that multiplication of two N-bit numbers can be performed in O(1) time on an N × N bit-model of reconfigurable mesh. Previously known result for constant time multiplication on... |

28 | The area-time complexity of binary multiplication
- Brent, Kung
- 1981
Citation Context ...d by adding the three 2N-bit numbers obtained by concatenation of C_{3l}, C_{3l+1}, C_{3l+2}. This addition can be performed in O(1) time on an N × N reconfigurable mesh. □ 3.3 Area-Time Trade-off It is known [5] that for the multiplication of two N-bit numbers, AT^(1+α) = Ω(N^(1+α)), for 0 ≤ α ≤ 1, in the two-dimensional bit-model of VLSI (details of the bit-model of VLSI can be found in [28]). In the bit-model ... |

23 | Meshes with multiple buses - Stout - 1986 |

22 |
Fourier transforms in VLSI
- Thompson
- 1983
Citation Context ...rade-off It is known [5] that for the multiplication of two N-bit numbers, AT^(1+α) = Ω(N^(1+α)), for 0 ≤ α ≤ 1, in the two-dimensional bit-model of VLSI (details of the bit-model of VLSI can be found in [28]). In the bit-model of reconfigurable mesh, each PE has O(1)-bit memory and an O(1)-area control unit which can store O(1) patterns of switch settings. In addition, the ALU in each PE occupies O(1) area.... |

21 |
Fast convolution using Fermat number transforms with applications to digital filtering
- Agarwal, Burrus
- 1974
Citation Context ...the ring. If the result r from an arithmetic operation is greater than 2^B, r mod (2^B + 1) should be computed to generate a valid output. To avoid this extra modular computation, Agarwal and Burrus [2] limit their realization to B-bit arithmetic in which only those operands in the range of [0, 2^B - 1] are correctly represented. This involves some quantization error when 2^B occurs as an operand. I... |

18 |
Simplified Binary Arithmetic for the Fermat Number Transform
- Leibowitz
- 1976
Citation Context ...nds in the range of [0, 2^B - 1] are correctly represented. This involves some quantization error when 2^B occurs as an operand. In order to overcome this problem, we employ the diminished-1 notation [12]. By using this notation, we can perform modular arithmetic as simply as in [2] without any quantization error. In the diminished-1 notation, number i, 1 ≤ i ≤ 2^B, is represented by the binary repre... |
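The diminished-1 idea in the context above can be illustrated with a small sketch. It assumes a word length of B = 4 and, because the excerpt is cut off before saying how zero is handled, it assumes zero is flagged by an extra (B+1)-th bit, which is a common convention for this notation. The helper names `encode`, `decode`, and `add` are hypothetical and are not taken from the paper.

```python
B = 4                 # word length in bits (illustrative choice)
F = (1 << B) + 1      # modulus 2^B + 1
ZERO = 1 << B         # assumed code for 0: only the extra (B+1)-th bit is set


def encode(i):
    """Map a residue i in [0, 2^B] to its diminished-1 code (i - 1 for i >= 1)."""
    return ZERO if i == 0 else i - 1


def decode(code):
    """Recover the residue from a diminished-1 code."""
    return 0 if code == ZERO else code + 1


def add(a, b):
    """Add two diminished-1 codes modulo 2^B + 1 without a division step."""
    if a == ZERO:
        return b
    if b == ZERO:
        return a
    total = a + b
    carry = total >> B                    # carry out of the B-bit addition
    low = total & (ZERO - 1)              # low B bits of the sum
    # Add the complement of the carry; a result of 2^B is the code for zero.
    return low + (1 - carry)
```

With these assumptions, every residue from 0 to 2^B has a code, so the operand 2^B no longer causes a quantization error, and addition needs only a B-bit adder plus a carry correction.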

14 |
The Image Understanding Architecture and its programming environment
- Weems, Burrill
- 1991
Citation Context ...hes are being built. The Image Understanding Architecture (IUA) is a multilevel system designed for supporting research in real-time image understanding and knowledge-based computer vision [32], [33], [34]. The lowest level of the IUA is the CAAPP, a 512 × 512 square grid of bit-serial processors intended to perform low-level image processing. Each CAAPP processor is connected to its four nearest neigh... |

13 |
Discrete Convolutions via Mersenne Transforms
- Rader
- 1972
Citation Context ...rform cyclic convolution. 3.1 Rader Transform It is known that RT, which is a transform in a finite field (and more generally in a ring), has the cyclic convolution property and is without roundoff error [23]. The ring is closed under addition and multiplication modulo some integer F_t of the form 2^(2^t) + 1. Given input x(n), 0 ≤ n ≤ N - 1, where x(n) is an element in the ring, the RT is given by: ... |
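The cyclic convolution property mentioned in this context can be checked numerically. The sketch below is not the paper's definition of the RT (the excerpt is truncated before the formula); it assumes a generic number-theoretic transform modulo the Fermat number 257 = 2^(2^3) + 1, with 2 as the root of unity and transform length 16. The names `transform` and `cyclic_convolve` are hypothetical.

```python
F = 257      # Fermat number 2^(2^3) + 1 (illustrative choice)
ALPHA = 2    # 2^8 = 256 = -1 (mod 257), so 2 has multiplicative order 16
N = 16       # transform length, equal to the order of ALPHA


def transform(x, root):
    """Length-N transform over the integers modulo F using the given root."""
    return [sum(x[n] * pow(root, n * k, F) for n in range(N)) % F
            for k in range(N)]


def cyclic_convolve(x, y):
    """Cyclic convolution modulo F via pointwise products of transforms."""
    spectrum = [(a * b) % F
                for a, b in zip(transform(x, ALPHA), transform(y, ALPHA))]
    inv_root = pow(ALPHA, F - 2, F)   # inverse of ALPHA, valid since 257 is prime
    inv_len = pow(N, F - 2, F)        # inverse of N modulo F
    return [(v * inv_len) % F for v in transform(spectrum, inv_root)]
```

Because the root is a power of two, every multiplication by a power of the root is a bit shift followed by a reduction modulo F, and all arithmetic is exact, so there is no roundoff error.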

12 | A Model of Computation for VLSI with Related Complexity Results - Chazelle, Monier - 1985 |

12 |
Sorting n Numbers on n × n Reconfigurable Meshes with Buses
- Nigam, Sahni
- 1992
Citation Context ... on the reconfigurable mesh for graph problems [17], [18], [29], for image processing [11], [19], [18], for geometric problems [25], for arithmetic problems [3], [31], and for sorting [3], [8], [15], [20], [25], [30]. In this paper, we show that multiplication of two N-bit numbers can be performed in O(1) time on an N × N bit-model of reconfigurable mesh. Previously known result for constant time multi... |

10 |
Fast one-dimensional digital convolution by multidimensional techniques
- Agarwal, Burrus
- 1974
Citation Context ... of two D × D × ⋯ × D arrays, where D = (2^(d-1) P)^(1/d). The 2^(d-1) P numbers are the original P numbers padded with 0s to allow cyclic convolution along each of the d dimensions. For details, refer to [1]. For ease of explanation, we assume (2^(d-1) P)^(1/d) to be an integer. Cyclic convolution along each dimension can be performed by (2^(d-1) P)^(1-1/d) cyclic convolutions with each convolution performed on ... |
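The role of zero padding in this context can be seen in one dimension. The sketch below is only an illustration of why padding lets a cyclic convolution compute an ordinary (linear) one, applied to integer multiplication; it does not reproduce the paper's d-dimensional mapping. The digit base of 16 and the names `to_digits`, `cyclic_convolve_direct`, and `multiply` are assumptions made for the example.

```python
def to_digits(value, base):
    """Little-endian digits of a non-negative integer in the given base."""
    digits = []
    while value > 0:
        digits.append(value % base)
        value //= base
    return digits or [0]


def cyclic_convolve_direct(x, y):
    """Cyclic convolution of two equal-length sequences, straight from the definition."""
    n = len(x)
    return [sum(x[i] * y[(k - i) % n] for i in range(n)) for k in range(n)]


def multiply(a, b, base=16):
    """Multiply two non-negative integers via a zero-padded cyclic convolution."""
    da, db = to_digits(a, base), to_digits(b, base)
    size = len(da) + len(db)          # long enough that nothing wraps around
    da = da + [0] * (size - len(da))  # pad with 0s
    db = db + [0] * (size - len(db))
    coeffs = cyclic_convolve_direct(da, db)
    # Weighting coefficient k by base^k resolves the carries between digit positions.
    return sum(c * base ** k for k, c in enumerate(coeffs))
```

Without the padding, products of high-order digits would wrap around and be added to low-order positions, which is why the sequences are extended before convolving.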

10 |
Reconfigurable Mesh Algorithms for Image
- Jenq, Sahni
- 1991
Citation Context ...is model has been denoted as MRN in [20] and as LRN in [4]. The connection patterns allowed by MRN/LRN are shown in Fig. 8. In MRN, the number of possible connection patterns within each PE is 10. In [10], the RMESH is introduced, which doesn't allow the {EW, NS}, {NE, SW}, and the {NW, SE} connections that are allowed in MRN. However, the RMESH allows {NEWS}, {NEW, S}, {NES, W}, {NWS, E}, and {N, EWS... |

10 | Reconfigurable mesh algorithms for the area and perimeter of image components
- Jenq, Sahni
- 1991
Citation Context ...vides connections between the four I/O ports and the DSP chip at the grid. Parallel algorithms have been developed on the reconfigurable mesh for graph problems [17], [18], [29], for image processing [11], [19], [18], for geometric problems [25], for arithmetic problems [3], [31], and for sorting [3], [8], [15], [20], [25], [30]. In this paper, we show that multiplication of two N-bit numbers can be p... |

10 |
Area-Time Optimal VLSI Integer Multiplier with Minimum Computation Time
- Mehlhorn, Preparata
- 1983
Citation Context ...d for multiplication is based on information transfer arguments and hence it is true in the model considered here. Known VLSI designs satisfying the AT^2 lower bound fall into the range log N ≤ T ≤ √N [16]. In this section, Theorem 1 is extended to show a VLSI design which satisfies AT^2 optimality over 1 ≤ T ≤ √N. THEOREM 2. There is a VLSI design to multiply two N-bit numbers satisfying AT^2 = O(N^2)... |

6 |
An Efficient Convex Hull Computation on the Reconfigurable Mesh
- Reisis
- 1992
Citation Context ...rts and the DSP chip at the grid. Parallel algorithms have been developed on the reconfigurable mesh for graph problems [17], [18], [29], for image processing [11], [19], [18], for geometric problems [25], for arithmetic problems [3], [31], and for sorting [3], [8], [15], [20], [25], [30]. In this paper, we show that multiplication of two N-bit numbers can be performed in O(1) time on an N × N bit-mode... |

3 |
Optimal Simulations in Reconfigurable Arrays
- Ben-Asher, Gordon, et al.
- 1992
Citation Context ...n [3], the Reconfigurable Network (RN) model has been introduced. Several algorithms on this model have been shown under the mesh restriction. This model has been denoted as MRN in [20] and as LRN in [4]. The connection patterns allowed by MRN/LRN are shown in Fig. 8. In MRN, the number of possible connection patterns within each PE is 10. In [10], the RMESH is introduced, which doesn't allow the {EW... |

3 | A mesh-connected area-time optimal VLSI multiplier of large integers - Preparata - 1983 |

3 |
Configurational Computation: A New Computation Method on
- Wang, Chen, et al.
- 1991
Citation Context ...arallel algorithms have been developed on the reconfigurable mesh for graph problems [17], [18], [29], for image processing [11], [19], [18], for geometric problems [25], for arithmetic problems [3], [31], and for sorting [3], [8], [15], [20], [25], [30]. In this paper, we show that multiplication of two N-bit numbers can be performed in O(1) time on an N × N bit-model of reconfigurable mesh. Previousl... |

2 |
A Reconfigurable Processor Array with Routing LSIs and General Purpose DSPs
- Levinson, Kuroda, et al.
- 1992
Citation Context ...derstanding Benchmark has been performed on the IUA [35]. A 1/64th prototype of the system has been built at the Hughes Research Labs. Another reconfigurable mesh having 512 PEs has been built by NEC [13]. This architecture consists of an array of processors and a message passing network. Each processor consists of a DSP chip. The message passing net... • J-w. Jang is with the Department... |

2 | Parallel Computations on Meshes with Static and Reconfigurable Buses - Reisis - 1989 |

1 | Immediate Parallel Solution of the Longest - Champion, Rothstein - 1987 |

1 |
Sorting in O(1) Time on an n × n Reconfigurable Mesh, Parallel Computing: From Theory to Sound Practice
- Lin, Olariu, et al.
- 1992
Citation Context ...eloped on the reconfigurable mesh for graph problems [17], [18], [29], for image processing [11], [19], [18], for geometric problems [25], for arithmetic problems [3], [31], and for sorting [3], [8], [15], [20], [25], [30]. In this paper, we show that multiplication of two N-bit numbers can be performed in O(1) time on an N × N bit-model of reconfigurable mesh. Previously known result for constant time... |

1 |
Area-Time Optimal VLSI Networks for Computing Integer Multiplication and Discrete Fourier Transform
- Preparata, Vuillemin
- 1981
Citation Context ...tidimensional convolution. Choosing the Rader transform at the expense of long word length frees us from storing twiddle factors in advance, which is needed in other designs for multiplication [5], [16], [22]. It is also shown that our algorithm can be simulated on other (restricted) reconfigurable mesh models without asymptotic increase in time or the number of PEs used. ACKNOWLEDGEMENTS We would like to... |

1 |
Ju-wook Jang received the BS degree in electronic engineering from Seoul National University in 1983, the MS degree in electrical engineering from the Korea Advanced Institute of Science and Technology in 1985, and the PhD degree in electrical engineering
- Weems
- 1988
Citation Context ...s a coterie network for communication, which consists of a grid-shaped bus and a set of locally controllable switches. The DARPA Integrated Image Understanding Benchmark has been performed on the IUA [35]. A 1/64th prototype of the system has been built at the Hughes Research Labs. Another reconfigurable mesh having 512 PEs has been built by NEC [13]. This architecture consists of an array of processo... |