## The Scalability of FFT on Parallel Computers (1993)

Citations: 78 (17 self)

### BibTeX

@MISC{Gupta93thescalability,
  author = {Anshul Gupta and Vipin Kumar},
  title = {The Scalability of FFT on Parallel Computers},
  year = {1993}
}

### Citations

2564 |
The Design and Analysis of Computer Algorithms
- Hopcroft, Ullman
- 1974
Citation Context: ...the ports on which it sends and receives can be different. [Section 3: The FFT Algorithm] Figure 1 outlines the serial Cooley-Tukey algorithm for an $n$-point single-dimensional unordered radix-2 FFT adapted from [2, 36]. $X$ is the input vector of length $n$ ($n = 2^r$ for some integer $r$) and $Y$ is its Fourier Transform. $\omega^k$ denotes the complex number $e^{j 2\pi k/n}$, where $j = \sqrt{-1}$. More generally, $\omega$ is the primitive $n$th ...
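The unordered radix-2 FFT described in this excerpt can be sketched as follows. This is not a reproduction of the paper's Figure 1; it is a minimal iterative decimation-in-frequency variant, assuming the excerpt's sign convention $\omega = e^{j 2\pi/n}$ and leaving the result in bit-reversed ("unordered") order. The names `unordered_fft`, `naive_dft`, and `bit_reverse` are illustrative.

```python
import cmath


def naive_dft(x):
    """Direct O(n^2) DFT with omega = e^{j*2*pi/n}, the sign used in the excerpt."""
    n = len(x)
    return [sum(x[t] * cmath.exp(2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def bit_reverse(i, r):
    """Reverse the lowest r bits of the integer i."""
    out = 0
    for _ in range(r):
        out = (out << 1) | (i & 1)
        i >>= 1
    return out


def unordered_fft(x):
    """Iterative radix-2 decimation-in-frequency FFT.

    Output position i holds Y[bit_reverse(i, r)], i.e. the result is "unordered".
    """
    n = len(x)
    assert n >= 1, "input must be non-empty"
    r = n.bit_length() - 1
    assert n == 1 << r, "n must be a power of two (n = 2^r)"
    a = [complex(v) for v in x]
    half = n // 2
    while half >= 1:                       # r iterations: half = n/2, n/4, ..., 1
        span = 2 * half
        w_step = cmath.exp(2j * cmath.pi / span)  # primitive span-th root of unity
        for start in range(0, n, span):
            w = 1
            for k in range(half):          # butterfly on elements k and k + half
                u, v = a[start + k], a[start + k + half]
                a[start + k] = u + v
                a[start + k + half] = (u - v) * w
                w *= w_step
        half //= 2
    return a


if __name__ == "__main__":
    x = [1, 2, 3, 4, 0, 0, 0, 0]
    y = unordered_fft(x)
    ordered = [y[bit_reverse(k, 3)] for k in range(len(x))]
    print(all(abs(a - b) < 1e-9 for a, b in zip(ordered, naive_dft(x))))  # True
```

With this positive-exponent convention, the reordered result equals $n$ times NumPy's `numpy.fft.ifft` of the same input; the unordered output is what an in-place butterfly naturally produces, so reordering is deferred or skipped when it is not needed.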

505 |
Introduction to Parallel Computing: Design and Analysis of Algorithms
- Kumar, Grama, et al.
- 1994
Citation Context: ...ectures [27, 26, 17, 48, 23, 49, 12, 44, 33]. In this paper, we analyze the scalability of the FFT algorithm on a few important architectures using the isoefficiency metric developed by Kumar and Rao [25, 13]. The isoefficiency function of a combination of a parallel algorithm and a parallel architecture relates the problem size to the number of processors necessary for an increase in speedup in proportio...

343 |
Computational Frameworks for the Fast Fourier Transform (SIAM
- Van Loan
- 1992
Citation Context: ...everal important insights. On the hypercube architecture, a commonly used parallel formulation of the FFT algorithm (which we shall refer to as the binary-exchange algorithm in the rest of the paper) [3, 4, 6, 11, 21, 32, 41, 36, 31] can obtain linearly increasing speedup with respect to the number of processors with only a moderate increase in problem size. This is not surprising in the light of the fact that the FFT computation...
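To make the binary-exchange communication pattern concrete, the sketch below assumes the common block mapping (processor $k$ holds elements $k\,n/p$ through $(k+1)\,n/p - 1$); the excerpt does not state the mapping, so this is an assumption. Under it, the first $\log p$ of the $\log n$ iterations pair each processor with the processor whose label differs in a single bit (a hypercube link), and the remaining iterations are local. `binary_exchange_schedule` is a hypothetical helper.

```python
def binary_exchange_schedule(n, p):
    """Partner processor of each processor in each FFT iteration (None = local work).

    Assumes a block mapping: processor k owns elements k*n/p .. (k+1)*n/p - 1.
    Iteration m pairs elements whose indices differ in bit r-1-m (most significant first).
    """
    assert n >= 1 and p >= 1, "n and p must be positive"
    r = n.bit_length() - 1
    d = p.bit_length() - 1
    assert n == 1 << r and p == 1 << d and p <= n, "n, p must be powers of two, p <= n"
    block = n // p
    schedule = []
    for m in range(r):
        stride = 1 << (r - 1 - m)
        partners = {}
        for proc in range(p):
            first = proc * block            # every element of a block maps to the same partner block
            other = (first ^ stride) // block
            partners[proc] = None if other == proc else other
        schedule.append(partners)
    return schedule


if __name__ == "__main__":
    for m, stage in enumerate(binary_exchange_schedule(16, 4)):
        print(m, stage)
```

For $n = 16$ and $p = 4$, only iterations 0 and 1 involve communication (partners `proc ^ 2`, then `proc ^ 1`); iterations 2 and 3 are local.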

257 | Reevaluating Amdahl's Law
- Gustafson
- 1988
Citation Context: ...ttempts to simply extrapolate its performance based on that for a similar smaller system. Many different measures have been developed to study the scalability of parallel algorithms and architectures [27, 26, 17, 48, 23, 49, 12, 44, 33]. In this paper, we analyze the scalability of the FFT algorithm on a few important architectures using the isoefficiency metric developed by Kumar and Rao [25, 13]. The isoefficiency function of a co...

234 |
Optimum Broadcasting and Personalized Communication in the Hypercube
- Johnsson, Ho
- 1989
Citation Context: ...n (known as all-to-all personalized communication) can be performed by executing the following code on each processor: for i = 1 to p do send data to processor number (self address $\oplus$ $i$). It is shown in [20] that on a hypercube, in each iteration of the above code, each pair of communicating processors have a contention-free communication path. On a hypercube with store-and-forward routing, this communi...
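The XOR schedule quoted above can be checked on small cases with a short sketch. It iterates $i$ from 1 to $p - 1$ (the $i = 0$ step would be a send to self) and, as an assumption about the routing scheme, routes each message with dimension-ordered (e-cube) routing, lowest differing bit first, treating each direction of a link as a separate channel. It then confirms that no directed link is used twice within one step. The helper names are hypothetical, and this checks examples rather than proving the result of [20].

```python
def xor_schedule(p):
    """Step i (1 <= i < p): every processor src sends to src XOR i."""
    assert p >= 1 and p & (p - 1) == 0, "p must be a power of two"
    return [[(src, src ^ i) for src in range(p)] for i in range(1, p)]


def ecube_links(src, dst):
    """Directed hypercube links (node, dimension) on the e-cube route from src to dst.

    Differing address bits are corrected from the lowest dimension upward.
    """
    links, node, diff, dim = [], src, src ^ dst, 0
    while diff:
        if diff & 1:
            links.append((node, dim))
            node ^= 1 << dim
        diff >>= 1
        dim += 1
    return links


def is_contention_free(step):
    """True if no directed link is used by more than one message in this step."""
    used = [link for src, dst in step for link in ecube_links(src, dst)]
    return len(used) == len(set(used))


if __name__ == "__main__":
    print(all(is_contention_free(step) for step in xor_schedule(16)))  # True
```

Each step is a perfect pairing (if $a$ sends to $b$, then $b$ sends to $a$), and over the $p - 1$ steps every processor sends to every other processor exactly once.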

180 |
The Design and Analysis of Parallel Algorithms
- Akl
- 1989
Citation Context: ...everal important insights. On the hypercube architecture, a commonly used parallel formulation of the FFT algorithm (which we shall refer to as the binary-exchange algorithm in the rest of the paper) [3, 4, 6, 11, 21, 32, 41, 36, 31] can obtain linearly increasing speedup with respect to the number of processors with only a moderate increase in problem size. This is not surprising in the light of the fact that the FFT computation...

149 |
Speedup Versus Efficiency in Parallel Systems
- Eager, Zahorjan, et al.
- 1989
Citation Context: ...ttempts to simply extrapolate its performance based on that for a similar smaller system. Many different measures have been developed to study the scalability of parallel algorithms and architectures [27, 26, 17, 48, 23, 49, 12, 44, 33]. In this paper, we analyze the scalability of the FFT algorithm on a few important architectures using the isoefficiency metric developed by Kumar and Rao [25, 13]. The isoefficiency function of a co...

141 |
Fast Fourier Transformation and Convolution Algorithms, 2nd ed
- Nussbaumer
- 1982
Citation Context: ...choice in terms of both concurrency and communication overheads. [Section 6: Impact of Variations of Cooley-Tukey Algorithm on Scalability] Several schemes of computing the DFT have been suggested in literature [34, 45, 37] that involve fewer arithmetic operations on a serial computer than the simple Cooley-Tukey FFT algorithm. Notable among these are computing the single dimensional FFTs with radix greater than 2 and c...

139 | FFTs in External or Hierarchical Memory
- Bailey
- 1990
Citation Context: ...lysis, solving Linear Partial Differential Equations, Convolution, Digital Signal Processing and Image Filtering, etc. Hence, there has been a great interest in implementing FFT on parallel computers [4, 6, 11, 14, 21, 32, 41, 5]. In this paper we analyze the scalability of the parallel FFT algorithm on mesh and hypercube connected multicomputers. We also present experimental performance results on a 1024-processor nCUBE1™...

119 | Development of Parallel Methods for a 1024-Processor Hypercube - Gustafson, Montry, et al. - 1988 |

107 |
Advanced computer architecture: parallelism, scalability, programmability
- Hwang
- 1993
Citation Context: ...ary for an increase in speedup in proportion to the number of processors. Isoefficiency analysis has been found to be very useful in characterizing the scalability of a variety of parallel algorithms [25, 16, 15, 19, 27, 28, 30, 38, 40, 47, 46, 42, 24]. An important feature of isoefficiency analysis is that it succinctly captures the effects of characteristics of the parallel algorithm as well as the parallel architecture on which it is implemented...

104 |
Designing Efficient Algorithms for Parallel Computers
- Quinn
- 1987
Citation Context: ...everal important insights. On the hypercube architecture, a commonly used parallel formulation of the FFT algorithm (which we shall refer to as the binary-exchange algorithm in the rest of the paper) [3, 4, 6, 11, 21, 32, 41, 36, 31] can obtain linearly increasing speedup with respect to the number of processors with only a moderate increase in problem size. This is not surprising in the light of the fact that the FFT computation...

98 |
Communication complexity of PRAMs
- Aggarwal, Chandra, et al.
- 1990
Citation Context: ...ommunication. Following the methodology in our paper, these expressions can be used to compute the scalability of FFT on shared memory systems for various mappings of data. Chandra, Snir and Aggarwal [1] analyze the performance of FFT and other algorithms on LPRAM, a new model for parallel computation. This model differs from the standard PRAM model as the remote accesses are more expensive than loc...

92 | Analyzing scalability of parallel algorithms and architectures - Kumar, Gupta - 1994 |

89 |
A VLSI Architecture for Concurrent Data Structures
- Dally
- 1986
Citation Context: ...ined as $O(\sqrt{p})$, where $p$ is the number of processors on the mesh, then the scalability can be improved considerably. Addition of features such as cut-through routing (also known as worm-hole routing) [9] to the mesh architecture improves the scalability of several parallel algorithms; e.g., see [30]. But these features do not improve the overall scalability characteristics of the FFT algorithm on thi...

75 |
Isoefficiency: measuring the scalability of parallel algorithms and architectures. IEEE Parallel and Distributed Technology: Systems and Applications
- Grama, Gupta, et al.
- 1993
Citation Context: ...ectures [27, 26, 17, 48, 23, 49, 12, 44, 33]. In this paper, we analyze the scalability of the FFT algorithm on a few important architectures using the isoefficiency metric developed by Kumar and Rao [25, 13]. The isoefficiency function of a combination of a parallel algorithm and a parallel architecture relates the problem size to the number of processors necessary for an increase in speedup in proportio...

69 |
Multiprocessor FFTs
- Swarztrauber
- 1987
Citation Context: ...lysis, solving Linear Partial Differential Equations, Convolution, Digital Signal Processing and Image Filtering, etc. Hence, there has been a great interest in implementing FFT on parallel computers [4, 6, 11, 14, 21, 32, 41, 5]. In this paper we analyze the scalability of the parallel FFT algorithm on mesh and hypercube connected multicomputers. We also present experimental performance results on a 1024-processor nCUBE1™...

65 |
Measuring parallel processor performance
- Karp, Flatt
- 1990
Citation Context: ...ttempts to simply extrapolate its performance based on that for a similar smaller system. Many different measures have been developed to study the scalability of parallel algorithms and architectures [27, 26, 17, 48, 23, 49, 12, 44, 33]. In this paper, we analyze the scalability of the FFT algorithm on a few important architectures using the isoefficiency metric developed by Kumar and Rao [25, 13]. The isoefficiency function of a co...

64 |
The indirect binary n-cube microprocessor array
- Pease
- 1977
Citation Context: ...espect to the number of processors with only a moderate increase in problem size. This is not surprising in the light of the fact that the FFT computation maps naturally to the hypercube architecture [35]. However, there is a limit on the achievable efficiency which is determined by the ratio of CPU speed and communication bandwidth of the hypercube channels. This limit can be raised by increasing the...

48 | Parallel depth-first search, part II: Analysis
- Kumar, Rao
- 1987

41 |
Scalable parallel formulations of depth-first search
- Kumar, Rao
- 1990
Citation Context: ...becomes quite bad. This extreme sensitivity of the isoefficiency function to hardware related constants is rather unique to this algorithm. In many other parallel algorithms (e.g., depth-first search [29]), the hardware dependent constants such as the CPU speed and communication bandwidth appear only as multiplicative factors in the isoefficiency function. Hence, if the CPU speed goes up by a factor of 1...

37 |
Wire-efficient VLSI Multiprocessor Communication Networks
- Dally
- 1987
Citation Context: ...putation will be $8p\sqrt{p}$ and $p \log p$, respectively. However, if the cost of the network is considered to be a function of the bisection width of the network, as may be the case in VLSI implementations [10], then the picture improves for the mesh. The bisection widths of a hypercube and a mesh containing $p$ processors each are $p/2$ and $\sqrt{p}$ respectively. In order to match the performance of the mesh with t...
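A quick count reproduces the bisection widths quoted in this excerpt. The sketch counts the links crossing the natural halving cut of each network: the top address bit for the hypercube, and the boundary between the middle two columns for a $\sqrt{p} \times \sqrt{p}$ mesh without wraparound. For these two topologies that cut is known to be a minimum bisection, but the code only counts links and does not itself prove minimality. Function names are illustrative.

```python
import math


def hypercube_cut_links(p):
    """Links of a p-node hypercube whose endpoints differ in the top address bit."""
    assert p >= 2 and p & (p - 1) == 0, "p must be a power of two"
    d = p.bit_length() - 1
    top = 1 << (d - 1)                     # highest address bit defines the two halves
    crossing = 0
    for a in range(p):
        for k in range(d):
            b = a ^ (1 << k)
            if a < b and (a & top) != (b & top):   # count each undirected link once
                crossing += 1
    return crossing


def mesh_cut_links(p):
    """Links of a sqrt(p) x sqrt(p) mesh (no wraparound) crossing the middle column boundary."""
    side = math.isqrt(p)
    assert side * side == p and side % 2 == 0, "need an even-sided square mesh"
    half = side // 2
    crossing = 0
    for row in range(side):
        for col in range(side):
            # Only right and down neighbours, so each link is counted once.
            for r2, c2 in ((row, col + 1), (row + 1, col)):
                if r2 < side and c2 < side and (col < half) != (c2 < half):
                    crossing += 1
    return crossing


if __name__ == "__main__":
    print(hypercube_cut_links(64), mesh_cut_links(64))  # 32 8
```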

36 | Scalability of Parallel Algorithms for the All-Pairs Shortest Path Problem
- Kumar, Singh
- 1991
Citation Context: ...ary for an increase in speedup in proportion to the number of processors. Isoefficiency analysis has been found to be very useful in characterizing the scalability of a variety of parallel algorithms [25, 16, 15, 19, 27, 28, 30, 38, 40, 47, 46, 42, 24]. An important feature of isoefficiency analysis is that it succinctly captures the effects of characteristics of the parallel algorithm as well as the parallel architecture on which it is implemented...

33 | Multiprocessor FFTs, Parallel Computing 5 - Swarztrauber - 1987 |

29 |
Hypercube Algorithms for Image Processing and Pattern Recognition
- Ranka, Sahni
- 1990
Citation Context: ...ary for an increase in speedup in proportion to the number of processors. Isoefficiency analysis has been found to be very useful in characterizing the scalability of a variety of parallel algorithms [25, 16, 15, 19, 27, 28, 30, 38, 40, 47, 46, 42, 24]. An important feature of isoefficiency analysis is that it succinctly captures the effects of characteristics of the parallel algorithm as well as the parallel architecture on which it is implemented...

28 | Performance and scalability of preconditioned conjugate gradient methods on the CM-5
- Gupta, Kumar, et al.
- 1993

25 |
The effect of time constraints on scaled speedup
- Worley
- 1990

24 |
Parallelization and performance analysis of the Cooley-Tukey FFT algorithm for shared-memory architectures
- Norton, Silberger
Citation Context: ...lysis, solving Linear Partial Differential Equations, Convolution, Digital Signal Processing and Image Filtering, etc. Hence, there has been a great interest in implementing FFT on parallel computers [4, 6, 11, 14, 21, 32, 41, 5]. In this paper we analyze the scalability of the parallel FFT algorithm on mesh and hypercube connected multicomputers. We also present experimental performance results on a 1024-processor nCUBE1™...

24 |
Fourier transform in VLSI
- Thompson
- 1983
Citation Context: ...constant efficiency; hence the algorithm is not very scalable on a simple mesh. Any different mapping of input vector $X$ on the processors does not reduce the communication overhead. It has been shown [43] that in any mapping, there will be at least one iteration in which the pairs of processors that need to communicate will be at least $\sqrt{p}/2$ hops apart. Hence the expression for $T_o$ used in the above a...
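The $\sqrt{p}/2$ figure can be seen for one concrete mapping. The sketch assumes a row-major mapping of processor labels onto a $\sqrt{p} \times \sqrt{p}$ mesh with $p$ a power of 4, and computes, for each label bit flipped by a binary-exchange iteration, the largest Manhattan (hop) distance between communicating partners. It checks only this one mapping, not the "any mapping" lower bound of [43]; `row_major_hops` is a hypothetical helper.

```python
import math


def row_major_hops(p):
    """Worst-case hop distance per label bit, for labels placed row-major on a square mesh."""
    side = math.isqrt(p)
    d = p.bit_length() - 1
    assert p >= 1 and side * side == p and 1 << d == p, "p must be a power of 4"
    hops = []
    for b in range(d):                       # binary-exchange iteration flipping label bit b
        worst = 0
        for a in range(p):
            q = a ^ (1 << b)
            ra, ca = divmod(a, side)         # row-major: label = row * side + col
            rq, cq = divmod(q, side)
            worst = max(worst, abs(ra - rq) + abs(ca - cq))
        hops.append(worst)
    return hops


if __name__ == "__main__":
    print(row_major_hops(64))  # [1, 2, 4, 1, 2, 4]
```

The low bits index columns and the high bits index rows, so the iterations that flip the most significant column bit and the most significant row bit each force partners $\sqrt{p}/2$ hops apart.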

19 | Scalability of parallel sorting on mesh multicomputers
- Singh, Kumar, et al.
- 1991

18 |
A parallel FFT on an MIMD machine
- Averbuch, Gabber, et al.
- 1990

18 |
The scalability of Matrix Multiplication Algorithms on parallel computers
- Gupta, Kumar

17 |
Load balancing on the hypercube architecture
- Kumar, Ran
- 1989

16 |
Experimental application-driven architecture analysis of an SIMD/MIMD parallel processing system
- Bronson, Casavant, et al.
- 1990
Citation Context: ...ters and studying its performance. In the following, we briefly review the work of other authors who have studied the scalability of FFT and/or have tried to do performance prediction. Jamieson et al [7] describe an implementation of parallel FFT on the PASM parallel processing system which has a hypercube interconnect. They implement a single dimensional unordered FFT on 2 and 4 processor machines w...

16 |
A new principle for fast Fourier transformation
- Rader, Brenner
- 1976
Citation Context: ...choice in terms of both concurrency and communication overheads. [Section 6: Impact of Variations of Cooley-Tukey Algorithm on Scalability] Several schemes of computing the DFT have been suggested in literature [34, 45, 37] that involve fewer arithmetic operations on a serial computer than the simple Cooley-Tukey FFT algorithm. Notable among these are computing the single dimensional FFTs with radix greater than 2 and c...

14 |
Scalability of Parallel Machines
- Nussbaum, Agarwal
- 1991

14 | Computing biconnected components on a hypercube
- Woo, Sahni
- 1991

13 |
Towards a general model for evaluating the relative performance of computer systems
- Van-Catledge
- 1989

10 |
Measuring the scalability of parallel computer systems
- Zorbas, Reble, et al.
- 1989

9 |
A radix 2 FFT on the connection machine
- Johnsson, Krawitz, et al.
- 1989

9 | Optimal granularity of grid iteration problems - Tang, Li - 1990 |

7 |
Performance analysis of the FFT algorithm on a shared-memory parallel architecture
- Cvetanovic
- 1987
Citation Context: ...ube is applicable to the PASM by equating $t_s$ to zero. Our analysis can provide more general performance predictions as it would be valid for any problem size and any number of processors. Cvetanovic [8] and Norton et al [32] give a rather comprehensive performance analysis of the FFT algorithm on pseudo-shared memory architectures such as IBM RP/3. They consider various mappings of data to memory bl...

6 |
Implementing the discrete Fourier transform on a hypercube vector-parallel computer
- Desbat, Trystram
- 1989

6 |
Probabilistic analysis of the efficiency of the dynamic load distribution
- Kimura, Ichiyoshi
- 1991

5 |
The giant-Fourier-transform
- Bershader, Kraay, et al.
- 1989

4 | A parallel FFT on an MIMD machine, Parallel Computing 15 - Averbuch, Gabber, et al. - 1990 |

3 |
Fast Fourier transform algorithm design and tradeoffs
- Kamin, Adams
- 1988
Citation Context: ... by doing the scalability analysis along the lines of Section 4. Parallel FFT algorithms and their implementation and experimental evaluation on various architectures have been pursued by many authors [21, 4, 41, 6, 11, 22, 5]. In most of these cases, analysis of scalability along the lines of this paper can predict the performance for larger number of processors as well as for different problem sizes. [Section 10: Concluding Remark...]

3 |
A new method for computing DFT
- Winograd
- 1977
Citation Context: ...choice in terms of both concurrency and communication overheads. [Section 6: Impact of Variations of Cooley-Tukey Algorithm on Scalability] Several schemes of computing the DFT have been suggested in literature [34, 45, 37] that involve fewer arithmetic operations on a serial computer than the simple Cooley-Tukey FFT algorithm. Notable among these are computing the single dimensional FFTs with radix greater than 2 and c...

2 | Reevaluating Amdahl's Law. Communications of the ACM - Gustafson - 1988 |

1 |
Hypercube computing: Connected components. Journal of Supercomputing, 1991. Also available as TR 88-50 from the
- Woo, Sahni