Multiprocessor out-of-core FFTs with distributed memory and parallel disks (1997)

by Thomas H. Cormen, Jake Wegmann, David M. Nicol

A Survey of Out-of-Core Algorithms in Numerical Linear Algebra

by Sivan Toledo - DIMACS SERIES IN DISCRETE MATHEMATICS AND THEORETICAL COMPUTER SCIENCE, 1999
Abstract - Cited by 44 (2 self)
This paper surveys algorithms that efficiently solve linear equations or compute eigenvalues even when the matrices involved are too large to fit in the main memory of the computer and must be stored on disks. The paper focuses on scheduling techniques that result in mostly sequential data accesses and in data reuse, and on techniques for transforming algorithms that cannot be effectively scheduled. The survey covers out-of-core algorithms for solving dense systems of linear equations, for the direct and iterative solution of sparse systems, for computing eigenvalues, for fast Fourier transforms, and for N-body computations. The paper also discusses reasonable assumptions on memory size, approaches for the analysis of out-of-core algorithms, and relationships between out-of-core, cache-aware, and parallel algorithms.

ViC*: A compiler for virtual-memory C

by Alex Colvin, Thomas H. Cormen - In Proceedings of the Third International Workshop on High-Level Parallel Programming Models and Supportive Environments (HIPS ’98), 1998
Abstract - Cited by 13 (3 self)
This paper describes the functionality of ViC*, a compiler for a variant of the data-parallel language C* with support for out-of-core data. The compiler translates C* programs with shapes declared outofcore, which describe parallel data stored on disk. The compiler output is a SPMD-style program in standard C with I/O and library calls added to efficiently access out-of-core parallel data. The ViC* compiler also applies several program transformations to improve out-of-core data layout and access.

Out-of-Core FFTs with Parallel Disks

by Thomas H. Cormen, David M. Nicol - ACM SIGMETRICS Performance Evaluation Review, 1997
Abstract - Cited by 10 (1 self)
We examine approaches to computing the Fast Fourier Transform (FFT) when the data size exceeds the size of main memory. Analytical and experimental evidence shows that relying on native virtual memory with demand paging can yield extremely poor performance. We then present approaches based on minimizing I/O costs with the Parallel Disk Model (PDM). Each of these approaches explicitly plans and performs disk accesses so as to minimize their number.
1 Introduction
Although in most cases, Fast Fourier Transforms (FFTs) can be computed entirely in the main memory of a computer, in a few exceptional cases, the input vector is too large to fit. One application that uses very large FFTs is seismic analysis [2]; in one industrial application, an out-of-core one-dimensional FFT is necessary (as part of a higher dimensional FFT) even when the computer memory has 16 gigabytes of available RAM. Another application is in the area of radio astronomy. The High-Speed Data Acquisition and Very Large ...

Malleable Memory Mapping: User-Level Control of Memory Bounds for Effective Program Adaptation

by Dimitrios S. Nikolopoulos - In Proc. of the 17th IEEE/ACM International Parallel and Distributed Processing Symposium (IPDPS’03), 2003
Abstract - Cited by 8 (3 self)
This paper presents a user-level runtime system which provides memory malleability to programs running on non-dedicated computational environments. Memory malleability is analogous to processor malleability in the memory space, i.e. it lets a program shrink and expand its resident set size in response to runtime events, without affecting the correct execution of the program. Malleability becomes relevant in the context of grid computing, where loosely coupled distributed programs are assumed to run on busy computational nodes with fluctuating CPU and memory loads. User-level malleable memory is proposed as a portable solution to obtain as much as possible out of the available memory of a computational node, without reverting to more drastic solutions such as job suspension or migration, and without causing the system to thrash. Malleable memory mapping is also a solution to cope with the unpredictable behavior of existing virtual memory management policies under oversized memory loads. The current prototype is simple but leaves plenty of room for application-independent or application-specific optimizations, compiler support and other extensions. Our performance evaluation is a proof of concept that grid programs with malleable memory can improve their performance by an order of magnitude as opposed to grid programs that let their memory be reclaimed and reallocated by the OS.

Determining an Out-of-Core FFT Decomposition Strategy for Parallel Disks by Dynamic Programming

by Thomas H. Cormen - ALGORITHMS FOR PARALLEL PROCESSING, VOLUME 105 OF IMA VOLUMES IN MATHEMATICS AND ITS APPLICATIONS, 1999
Abstract - Cited by 8 (3 self)
We present an out-of-core FFT algorithm based on the in-core FFT method developed by Swarztrauber. Our algorithm uses a recursive divide-and-conquer strategy, and each stage in the recursion presents several possibilities for how to split the problem into subproblems. We give a recurrence for the algorithm's I/O complexity on the Parallel Disk Model and show how to use dynamic programming to determine optimal splits at each recursive stage. The algorithm to determine the optimal splits takes only $\Theta(\lg^2 N)$ time for an N-point FFT, and it is practical. The out-of-core FFT algorithm itself takes considerably longer.
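To make the idea of "splitting the problem into subproblems" concrete, the following is a minimal in-core sketch of the standard factorization of a length-N DFT with N = n1 * n2 into n2 transforms of length n1, a twiddle-factor multiplication, and n1 transforms of length n2. It is not the algorithm from the paper above: it does no disk I/O, it does not choose splits by dynamic programming, and the names `dft` and `split_dft` are hypothetical helpers introduced only for illustration.

```python
import cmath


def dft(x):
    """Naive O(n^2) DFT, standing in for a small transform that fits in memory."""
    n = len(x)
    return [
        sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]


def split_dft(x, n1, n2):
    """Compute the DFT of x (length n1 * n2) from smaller DFTs.

    Input index is written as n2 * i1 + i2 and output index as k1 + n1 * k2.
    """
    n = n1 * n2
    assert len(x) == n
    # Step 1: one length-n1 DFT per residue i2, over the strided subsequence
    # x[i2], x[i2 + n2], x[i2 + 2 * n2], ...
    cols = [dft(x[i2::n2]) for i2 in range(n2)]  # cols[i2][k1]
    # Step 2: multiply by the twiddle factors exp(-2*pi*i * i2 * k1 / n).
    for i2 in range(n2):
        for k1 in range(n1):
            cols[i2][k1] *= cmath.exp(-2j * cmath.pi * i2 * k1 / n)
    # Step 3: one length-n2 DFT per k1, taken across i2.
    out = [0j] * n
    for k1 in range(n1):
        row = dft([cols[i2][k1] for i2 in range(n2)])  # row[k2]
        for k2 in range(n2):
            out[k1 + n1 * k2] = row[k2]
    return out


if __name__ == "__main__":
    data = [complex(i % 5, (3 * i) % 7) for i in range(12)]
    direct = dft(data)
    for a, b in [(3, 4), (4, 3), (2, 6)]:
        err = max(abs(g - r) for g, r in zip(split_dft(data, a, b), direct))
        print(a, b, err < 1e-9)
```

Every choice of (n1, n2) yields the same transform, which is why a cost model is needed to pick among the splits. In an out-of-core setting the choice would presumably be driven by I/O cost rather than arithmetic cost.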

PC-OPT: Optimal Offline Prefetching and Caching for Parallel I/O Systems

by Mahesh Kallahalla, Peter J. Varman - IEEE TRANSACTIONS ON COMPUTERS, 2002
Abstract - Cited by 8 (0 self)
We address the problem of prefetching and caching in a parallel I/O system and present a new algorithm for parallel disk scheduling. Traditional buffer management algorithms that minimize the number of block misses are substantially suboptimal in a parallel I/O system where multiple I/Os can proceed simultaneously. We show that in the offline case, where a priori knowledge of all the requests is available, PC-OPT performs the minimum number of I/Os to service the given I/O requests. This is the first parallel I/O scheduling algorithm that is provably offline optimal in the parallel disk model. In the online case, we study the context of global L-block lookahead, which gives the buffer management algorithm a lookahead consisting of L distinct requests. We show that the competitive ratio of PC-OPT, with global L-block lookahead, is $\Theta(M - L + D)$ when $L \le M$, and $\Theta(MD/L)$ when $L > M$, where the number of disks is D and buffer size is M.

Blocking in Parallel Multisearch Problems (Extended Abstract)

by Wolfgang Dittrich, David Hutchinson, Anil Maheshwari, 1998
Abstract - Cited by 5 (4 self)
Wolfgang Dittrich (Bosch Telecom GmbH, UC-ON/ERS, Gerberstraße 33, 71522 Backnang, Germany, Wolfgang.Dittrich@bk.bosch.de), David Hutchinson (School of Computer Science, Carleton University, Ottawa, Canada K1S 5B6, hutchins@scs.carleton.ca), Anil Maheshwari (School of Computer Science, Carleton University, Ottawa, Canada K1S 5B6, maheshwa@scs.carleton.ca). Abstract: External memory (EM) algorithms are designed for computational problems in which the size of the internal memory of the computer is only a small fraction of the problem size. Block-wise access to data is a central theme in the design of efficient EM algorithms. A similar requirement arises in the transmission of data between processors in certain parallel computation models such as BSP* and CGM, for which block-wise communication is a crucial issue. We consider multisearch problems, where a large number of queries are to be simultaneously processed and satisfied by navigating through large data structures on parallel ...

Parallel Algorithms in External Memory

by David Alexander Hutchinson, 2000
Abstract - Cited by 4 (1 self)
External memory (EM) algorithms are designed for computational problems in which the size of the internal memory of the computer is only a small fraction of the problem size. The Parallel Disk Model (PDM) of Vitter and Shriver is widely used to discriminate between external memory algorithms on the basis of input/output (I/O) complexity. Parallel algorithms are designed to efficiently utilize the computing power of multiple processing units, interconnected by a communication mechanism. A popular model for developing and analyzing parallel algorithms is the Bulk Synchronous Parallel (BSP) model due to Valiant. In this work we develop simulation techniques, both randomized and deterministic, which produce efficient EM algorithms from efficient algorithms developed under BSP-like parallel computing models. Our techniques can accommodate one or multiple processors on the EM target machine, each with one or more disks, and they also adapt to the disk blocking factor of the target machine. ...

I/O in Parallel and Distributed Systems

by David Kotz
Abstract - Cited by 4 (0 self)
One is scientific computing with massive datasets, such as those found in seismic processing, climate modeling, and so forth [dC94]. The second is databases [DG92]. The I/O bottleneck continues to be a serious concern for scientific computing, particularly Grand Challenge problems, where it is now commonly recognized as an obstacle. Many scientific applications generate 1 GB of I/O per run [dC94], and applications performing an order of magnitude more are not uncommon: applications in computational physics and fluid dynamics are projected to require I/O on the order of 1 TB [dC94]. It seems clear that these total I/O requirements will keep increasing as scientists continue to study phenomena at larger space and time scales, and at finer space and time resolutions. Since the response time that humans can tolerate for obtaining computational results, no matter how comprehensive and detailed, is always bounded, the I/O rates required will continue to increase also. Thus while curre

Multidimensional, Multiprocessor, Out-of-Core FFTs with Distributed Memory and Parallel Disks (Extended Abstract)

by Lauren M. Baptist, Thomas H. Cormen - In Proceedings of the Eleventh Annual ACM Symposium on Parallel Algorithms and Architectures, 1999
Abstract - Cited by 3 (2 self)
Lauren M. Baptist, Thomas H. Cormen ({lmb, thc}@cs.dartmouth.edu), Dartmouth College, Department of Computer Science. Abstract: We show how to compute multidimensional Fast Fourier Transforms (FFTs) on a multiprocessor system with distributed memory when problem sizes are so large that the data do not fit in the memory of the entire system. Instead, data reside on a parallel disk system and are brought into memory in sections. We use the Parallel Disk Model for implementation and analysis. Our method is a straightforward out-of-core variant of a well-known method for in-core, multidimensional FFTs. It performs 1-dimensional FFT computations on each dimension in turn. This method is easy to generalize to any number of dimensions, and it also readily permits the individual dimensions to be of any sizes that are integer powers of 2. The key step is an out-of-core transpose operation that places the data along each dimension into contiguous positions on the parallel disk system so that the...
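The dimension-by-dimension method mentioned in this abstract can be illustrated with a small in-core sketch for the 2-dimensional case. It assumes the whole array fits in memory and uses an ordinary list transpose where the paper uses an out-of-core transpose on parallel disks, so it shows only the structure of the computation, not the authors' implementation. The helper names `dft`, `transpose`, `dft2_by_dimension` and `dft2_direct` are hypothetical.

```python
import cmath


def dft(x):
    """Naive O(n^2) 1-dimensional DFT of a contiguous sequence."""
    n = len(x)
    return [
        sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]


def transpose(a):
    """Swap the two dimensions so that columns become contiguous rows."""
    return [list(col) for col in zip(*a)]


def dft2_by_dimension(a):
    """2-dimensional DFT as 1-dimensional DFTs on each dimension in turn."""
    a = [dft(row) for row in a]   # transform along the contiguous dimension
    a = transpose(a)              # make the other dimension contiguous
    a = [dft(row) for row in a]   # transform along that dimension
    return transpose(a)           # restore the original layout


def dft2_direct(a):
    """Reference: the 2-dimensional DFT evaluated straight from its definition."""
    r, c = len(a), len(a[0])
    return [
        [
            sum(
                a[m][n] * cmath.exp(-2j * cmath.pi * (m * u / r + n * v / c))
                for m in range(r)
                for n in range(c)
            )
            for v in range(c)
        ]
        for u in range(r)
    ]


if __name__ == "__main__":
    grid = [[complex((3 * m + n) % 5, (m * n) % 3) for n in range(8)] for m in range(4)]
    fast, ref = dft2_by_dimension(grid), dft2_direct(grid)
    err = max(abs(fast[u][v] - ref[u][v]) for u in range(4) for v in range(8))
    print(err < 1e-9)
```

The transpose is the only step that touches data non-contiguously, which is consistent with the abstract singling it out as the key step.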