## Convolution on Splash 2 (1995)

Venue: Proceedings of IEEE Workshop on FPGAs for Custom Computing Machines

Citations: 26 (3 self)

### BibTeX

@INPROCEEDINGS{Ratha95convolutionon,

author = {Nalini K. Ratha and Anil K. Jain and Diane T. Rover},

title = {Convolution on Splash 2},

booktitle = {Proceedings of IEEE Workshop on FPGAs for Custom Computing Machines},

year = {1995},

pages = {204--213},

publisher = {CS Press}

}

### Abstract

Convolution is a fundamental operation in many signal and image processing applications. Since the computation and communication pattern in a convolution operation is regular, a number of special architectures have been designed and implemented for this operator. Von Neumann architectures cannot meet the real-time requirements of applications that use convolution as an intermediate step. We combine the advantages of systolic algorithms with the low cost of developing application-specific designs using field programmable gate arrays (FPGAs) to build a scalable convolver for use in computer vision systems. The performance of the systolic algorithm of Kung et al. [1] is compared theoretically and experimentally with many other convolution algorithms reported in the literature. The implementation of a convolution operation on Splash 2, an attached processor based on Xilinx 4010 FPGAs, is reported with impressive performance gains.

### Citations

229 | Why systolic architectures? - Kung - 1982 |
Citation Context ...he algorithm is fairly straightforward and also scalable to higher dimensions using the 1-dimensional convolution algorithm as the building block. In his landmark paper "Why systolic architectures?" [22], Kung describes many convolution algorithms on systolic structures. Based on a general inner product computation, Kulkarni and Yen [3] propose a systolic algorithm for 1-dimensional and 2-dimensional...

98 | Splash 2 - Arnold, Buell, et al. - 1992 |
Citation Context ...5 conclusions and plans for future work are given. 2 Splash 2 Architecture The Splash 2 system consists of an array of Xilinx 4010 FPGAs, improving on the design of the Splash 1 based on Xilinx 3090s [13]. The Splash 2 system is connected to the Sun host through an interface board that extends the address and data buses. The host can read/write to memories and memory-mapped control registers of Splash...

63 | Real-time image processing on a custom computing platform - Athanas, Abbott - 1995 |

37 | The Splash 2 Software Environment - Arnold - 1993 |

25 | FPGA computing in a data parallel C - Gokhale, Minnich - 1993 |
Citation Context ...h a set of routines callable by a C program. Using the host interface, the memory addressable by each PE can be initialized. Currently, efforts are being made to provide a C-like language, called dbC [10], to program Splash 2 to keep the hardware architecture and communication issues transparent to end users. The programming model supported in dbC is that of a SIMD processor array with a host processo...

13 | Parallel 2-D convolution on a mesh connected array processor - Lee, Aggarwal |
Citation Context ... Krishnan [18] proposed an algorithm with the best time complexity of $O(N^2/K^2 + \log N)$. With a fixed number of PEs, their time complexity changes to $O(k^2 \log k + \log N)$. • Mesh: Many researchers [15, 19, 2] have proposed schemes for convolution on mesh connected architectures. Lee et al. [15] use computation along a Hamiltonian path ending at the center of the convolution mask, called the convolution pa...

12 | Parallel algorithms for image template matching on hypercube SIMD computers - Fang, Li, et al. - 1985 |
Citation Context ...eger-valued General integer values Values are powers of 2 Figure 1: A taxonomy for convolution. Many approaches to perform convolution using special architectures have been reported in the literature [1, 14, 19]. A very simple distributed algorithm for convolution is to split the input image, possibly overlapping, into a set of smaller sub-images; the number of sub-images is the same as the number of processing e...
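The split-into-sub-images idea quoted in this context can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the algorithm of any cited paper: the function names `convolve_same` and `convolve_tiled` are hypothetical, the image is cut into horizontal strips (one per processing element), borders are zero-padded, and each strip carries `k // 2` overlap rows on each side so that a PE never needs data held by another PE.

```python
def convolve_same(image, mask):
    """Zero-padded neighbourhood sum; output has the same size as the input."""
    n_rows, n_cols = len(image), len(image[0])
    half = len(mask) // 2  # mask is assumed square with odd side length
    out = [[0] * n_cols for _ in range(n_rows)]
    for i in range(n_rows):
        for j in range(n_cols):
            total = 0
            for u in range(-half, half + 1):
                for v in range(-half, half + 1):
                    r, c = i + u, j + v
                    if 0 <= r < n_rows and 0 <= c < n_cols:
                        total += image[r][c] * mask[u + half][v + half]
            out[i][j] = total
    return out


def convolve_tiled(image, mask, num_pes):
    """Process the image as overlapping row strips, one strip per PE."""
    n_rows = len(image)
    half = len(mask) // 2
    rows_per_pe = -(-n_rows // num_pes)  # ceiling division
    result = []
    for start in range(0, n_rows, rows_per_pe):
        stop = min(start + rows_per_pe, n_rows)
        lo = max(0, start - half)        # first overlap row sent to this PE
        hi = min(n_rows, stop + half)    # one past the last overlap row
        strip_out = convolve_same(image[lo:hi], mask)
        # Keep only the rows this PE owns; the overlap rows are discarded.
        result.extend(strip_out[start - lo:start - lo + (stop - start)])
    return result


if __name__ == "__main__":
    image = [[(3 * r + 5 * c) % 7 for c in range(5)] for r in range(6)]
    mask = [[1, 0, -1], [2, 0, -2], [1, 0, -1]]
    print(convolve_tiled(image, mask, 3) == convolve_same(image, mask))
```

The strips are processed sequentially here; on a real multiprocessor each loop iteration would run on its own PE, and the overlap rows are the price paid for avoiding inter-PE communication.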

11 | On the communication complexity of generalized 2-D convolution on array processors - Fang, Li, et al. - 1989 |
Citation Context ... Krishnan [18] proposed an algorithm with the best time complexity of $O(N^2/K^2 + \log N)$. With a fixed number of PEs, their time complexity changes to $O(k^2 \log k + \log N)$. • Mesh: Many researchers [15, 19, 2] have proposed schemes for convolution on mesh connected architectures. Lee et al. [15] use computation along a Hamiltonian path ending at the center of the convolution mask, called the convolution pa...

10 | A Two-Level Pipelined Systolic Array for Convolutions - Kung, Ruane, et al. - 1981 |
Citation Context ...eger-valued General integer values Values are powers of 2 Figure 1: A taxonomy for convolution. Many approaches to perform convolution using special architectures have been reported in the literature [1, 14, 19]. A very simple distributed algorithm for convolution is to split the input image, possibly overlapping, into a set of smaller sub-images; the number of sub-images is the same as the number of processing e...

10 | VHDL Programming on Splash 2 - Arnold, Buell - 1993 |

7 | Building and using a highly parallel programmable logic array - Gokhale, et al. - 1991 |

7 | Two-Dimensional Convolution on a Pyramid Computer - Ibarra, Pong, et al. - 1987 |
Citation Context ...heir algorithm by an order of magnitude. • Pyramid: Pyramid architectures are useful in dealing with multi-resolution images. An $O(k^2 + \log N)$ time complexity algorithm is described by Chang et al. [17]. • VLSI/ASIC: Most of the approaches recommend the suitability of their algorithm for a VLSI implementation. Chakrabarti and Jaja use a linear array of processors in their algorithm [23]. In [3] co...

6 | Decomposition methods for convolution operators - Manseur, Wilson - 1991 |
Citation Context ...he mask size increases. We are exploring designing a fixed point multiplier on the chip to provide us with the same performance as the integer masks and also have good scalability with mask size. Manseur [16] has proposed a scheme to decompose large sized convolution masks to a set of 3 × 3 masks. We will explore this technique further to experiment with convolution of large sized masks. Because of l...
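The property that makes such a decomposition useful is the associativity of convolution: if a large mask equals the convolution of several 3 × 3 masks, applying the small masks in sequence gives the same result as applying the large one. The Python sketch below only checks that property for a 5 × 5 mask that is known to factor; it is not the decomposition scheme of the cited paper (which addresses how to find the factors), and the names `full_convolve`, `smooth` and `mask5` are hypothetical.

```python
def full_convolve(a, b):
    """Full 2-D convolution of two integer arrays given as lists of lists.

    The output has size (rows_a + rows_b - 1) x (cols_a + cols_b - 1).
    """
    rows = len(a) + len(b) - 1
    cols = len(a[0]) + len(b[0]) - 1
    out = [[0] * cols for _ in range(rows)]
    for i, a_row in enumerate(a):
        for j, a_val in enumerate(a_row):
            for u, b_row in enumerate(b):
                for v, b_val in enumerate(b_row):
                    out[i + u][j + v] += a_val * b_val
    return out


# A 3 x 3 smoothing mask, chosen only for illustration.
smooth = [[1, 2, 1],
          [2, 4, 2],
          [1, 2, 1]]

# Convolving the 3 x 3 mask with itself yields a 5 x 5 mask.
mask5 = full_convolve(smooth, smooth)

if __name__ == "__main__":
    image = [[(7 * r + 3 * c) % 11 for c in range(6)] for r in range(6)]
    two_passes = full_convolve(full_convolve(image, smooth), smooth)
    one_pass = full_convolve(image, mask5)
    print(two_passes == one_pass)
```

Two 3 × 3 passes cost 18 multiplications per output pixel against 25 for a single 5 × 5 pass, and the saving grows with mask size, which is why a fixed 3 × 3 hardware convolver can still serve larger masks.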

6 | Why linear arrays are better image processors - Jonker - 1994 |
Citation Context ...idth. Most of the algorithms proposed on special architectures assume that data are already available on the PEs. This, in a way, avoids the I/O bandwidth problem of the convolution operation. Jonker [20] argues that linear arrays are better for image processing algorithms. A linear array of PEs operating in a systolic mode offers two advantages: (i) systolic arrays can balance I/O with computations, ...

5 | Efficient parallel algorithm for image template matching on hypercube SIMD machines - Kumar, Krishnan - 1989 |
Citation Context ...l. [14] have described an $O(k^2/p^2 + k \log(N/p) + \log N \log p)$ algorithm, where $1 \le p \le k$, and using $N^2 k^2$ PEs and an $O(N^2 M^2/L^2)$ algorithm using $L^2$ PEs. Using $N^2$ PEs, Prasanna Kumar and Krishnan [18] proposed an algorithm with the best time complexity of $O(N^2/K^2 + \log N)$. With a fixed number of PEs, their time complexity changes to $O(k^2 \log k + \log N)$. • Mesh: Many researchers [15, 19, 2] ha...

5 | Convolution on Mesh Connected Multicomputers - Ranka, Sahni - 1990 |
Citation Context ...eger-valued General integer values Values are powers of 2 Figure 1: A taxonomy for convolution. Many approaches to perform convolution using special architectures have been reported in the literature [1, 14, 19]. A very simple distributed algorithm for convolution is to split the input image, possibly overlapping, into a set of smaller sub-images; the number of sub-images is the same as the number of processing e...

5 | Low level image processing operators on FPGA: Implementation examples and performance evaluation - Barros, Akil - 1994 |
Citation Context ... computing structures. Recent advances in hardware technology, enabling more logic blocks in FPGAs, make them more suitable for many complex applications needing more logic. Recently, Barros and Akil [21] have used FPGAs for low-level image processing. We use Splash 2, an attached processor on Sun SPARCstations based on Xilinx 4010 FPGA PEs. Our 2-dimensional convolution algorithm has been designed an...

4 | Systolic processing and implementation for signal and image - Kulkarni, Yen - 1982 |
Citation Context ...ding block. In his landmark paper "Why systolic architectures?" [22], Kung describes many convolution algorithms on systolic structures. Based on a general inner product computation, Kulkarni and Yen [3] propose a systolic algorithm for 1-dimensional and 2-dimensional convolution. for i = 1 to N do for j = 1 to N do for u = -k/2 to k/2 do for v = -k/2 to k/2 do sum = sum + image[i+u][j+v] × mask[u]...
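The nested loop quoted in this context is cut off after `mask[u]`, so the sketch below fills in details that are assumptions rather than the paper's text: the mask is indexed as `mask[u][v]`, the mask side `k` is odd, and border pixels whose neighbourhood would fall outside the image are left at zero. The function name `direct_convolve` is hypothetical. Note that the index pattern `image[i+u][j+v]` applies the mask without flipping it, which coincides with true convolution only for symmetric masks.

```python
def direct_convolve(image, mask):
    """Sequential four-loop neighbourhood sum over a k x k mask (k odd).

    Border pixels without a full neighbourhood are left at zero
    (an assumption; the quoted pseudocode does not show border handling).
    """
    n_rows, n_cols = len(image), len(image[0])
    half = len(mask) // 2
    out = [[0] * n_cols for _ in range(n_rows)]
    for i in range(half, n_rows - half):
        for j in range(half, n_cols - half):
            total = 0
            for u in range(-half, half + 1):
                for v in range(-half, half + 1):
                    # Shift u, v by `half` because Python lists start at 0.
                    total += image[i + u][j + v] * mask[u + half][v + half]
            out[i][j] = total
    return out


if __name__ == "__main__":
    image = [[1, 2, 3],
             [4, 5, 6],
             [7, 8, 9]]
    box = [[1, 1, 1],
           [1, 1, 1],
           [1, 1, 1]]
    print(direct_convolve(image, box))
```

For an N × N image and a k × k mask this performs on the order of N²k² multiply-accumulate steps on a single processor, which is the baseline the parallel and systolic schemes in the cited literature aim to improve on.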

4 | Convolution with separable masks for early image processing - Wiejak, Buxton, et al. - 1985 |

4 | Low level vision processing on connection machine CM-5 - Prasanna, Wang, et al. - 1993 |
Citation Context ...hm (Figure 3) running on different Sun host machines has been timed. In addition, timing on a recently developed i-860 based system from Alacron is reported. The timing results on CM-5 are taken from [16] and are for edge detection using a set of six 5 × 5 convolution masks. For implementing the convolution algorithm on Splash 2, we have chosen three standard edge detectors used in low-level computer ...

3 | An efficient VLSI architecture for template matching based on moment preserving pattern matching - Ranganathan, Venugopal - 1994 |
Citation Context ...rray of processors in their algorithm [23]. In [3] convolution is viewed as a generalized inner product and a VLSI implementation for 2-dimensional convolution is described. Ranganathan and Venugopal [11] have described a VLSI architecture for template matching using $k^2$ PEs and they achieve a time complexity of $O(N^2/2 + K^2)$. In this paper, we describe a convolution algorithm suitable for implement...

3 | VLSI architectures for template matching and block matching - Chakrabarti, Jaja - 1991 |
Citation Context ...ang et al. [17]. • VLSI/ASIC: Most of the approaches recommend the suitability of their algorithm for a VLSI implementation. Chakrabarti and Jaja use a linear array of processors in their algorithm [23]. In [3] convolution is viewed as a generalized inner product and a VLSI implementation for 2-dimensional convolution is described. Ranganathan and Venugopal [11] have described a VLSI architecture fo...

2 | Splash 2 - Arnold, et al. - 1992 |
Citation Context ...n Section 5 conclusions and future work are given. 2 Splash 2 Architecture The Splash 2 system consists of an array of Xilinx 4010 FPGAs, improving on the design of the Splash 1 based on Xilinx 3090s [7]. Figure 4 shows a system-level view of the Splash 2 architecture (taken from [5]). The host is connected to the Splash 2 through an interface board that extends the address and data buses. The Sun ho...

2 | Addressing the computational needs of high-speed image processing with a custom computing machine - Peterson, Athanas |
Citation Context ...timings for these edge detectors on Splash 2 are shown in Table 3. Our approach for implementing the convolution operation on Splash 2 is different in many ways from the approach taken by Peterson et al. [17]. The main differences are: (i) We are not limited by a fixed mask size of 8 × 8 as done in [17]. For smaller masks, Peterson et al. [17] have used the same 8 × 8 masks filled with zeros. This may be ...

1 | A Splash 2 Tutorial, Ver 1.2, SRC - Buell - 1993 |
Citation Context ...oard that extends the address and data buses. The Sun host can read/write to memories and memory-mapped control registers of Splash 2 via these buses. A detailed description of the system is given in [4, 5]. We describe the major components of the Splash 2 system below. Each Splash 2 processing board has 16 Xilinx 4010s as PEs ($X_1 - X_{16}$) in addition to a seventeenth Xilinx 4010 ($X_0$) which cont...

1 | Parallelization of computer vision algorithms on a reconfigurable multiprocessor - Bhandarkar, Arabnia - 1994 |

1 | The HIPS Image Processing Software - Software, York - 1993 |