Implementation of Parallel Graph Algorithms on a Massively Parallel SIMD Computer with Virtual Processing (1995)

by Tsan-sheng Hsu, Vijaya Ramachandran, Nathaniel Dean

Results 1 - 10 of 10

Evaluating Arithmetic Expressions Using Tree Contraction: A Fast and Scalable Parallel Implementation for Symmetric Multiprocessors (SMPs)

by David A. Bader, Sukanya Sreshta, Nina R. Weisse-Bernstein - Proc. 9th Int’l Conf. on High Performance Computing (HiPC 2002), volume 2552 of Lecture Notes in Computer Science, 2002
Cited by 23 (8 self)
The ability to provide uniform shared-memory access to a significant number of processors in a single SMP node brings us much closer to the ideal PRAM parallel computer. In this paper, we develop new techniques for designing a uniform shared-memory algorithm from a PRAM algorithm and present the results of an extensive experimental study demonstrating that the resulting programs scale nearly linearly across a significant range of processors and across the entire range of instance sizes tested. This linear speedup with the number of processors is one of the first ever attained in practice for intricate combinatorial problems. The example we present in detail here is for evaluating arithmetic expression trees using the algorithmic techniques of list ranking and tree contraction; this problem is not only of interest in its own right, but is representative of a large class of irregular combinatorial problems that have simple and efficient sequential implementations and fast PRAM algorithms, but have no known efficient parallel implementations. Our results thus offer promise for bridging the gap between the theory and practice of shared-memory parallel algorithms.
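
The abstract above names list ranking as one of its building blocks. As a rough illustration only, here is a minimal sequential simulation of the textbook pointer-jumping approach to list ranking; it is not the authors' SMP implementation, and the function name `list_rank` and the convention that the tail points to itself are assumptions made for this sketch.

```python
def list_rank(succ):
    """Return, for each node, its distance to the tail of a linked list.

    succ[i] is the index of node i's successor; the tail points to itself.
    This simulates synchronous PRAM rounds sequentially (an assumption of
    this sketch): every "processor" i reads the old arrays and writes new ones.
    """
    n = len(succ)
    nxt = list(succ)
    # The tail has rank 0; every other node starts one hop from its successor.
    rank = [0 if nxt[i] == i else 1 for i in range(n)]
    # Each round doubles the distance covered, so about log2(n) rounds suffice.
    for _ in range((n - 1).bit_length()):
        new_rank = [rank[i] + rank[nxt[i]] for i in range(n)]
        new_nxt = [nxt[nxt[i]] for i in range(n)]
        rank, nxt = new_rank, new_nxt
    return rank


if __name__ == "__main__":
    # The list 3 -> 1 -> 4 -> 0 -> 2, with node 2 as the tail.
    print(list_rank([2, 4, 2, 1, 0]))
```

Each round reads only the previous round's arrays, which is what lets the inner list comprehensions be executed by independent processors on a real shared-memory machine.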

Parallel Implementation of Algorithms for Finding Connected Components in Graphs

by Tsan-Sheng Hsu, Vijaya Ramachandran, Nathaniel Dean, 1997
Cited by 22 (1 self)
In this paper, we describe our implementation of several parallel graph algorithms for finding connected components. Our implementation, with virtual processing, is on a 16,384-processor MasPar MP-1 using the language MPL. We present extensive test data on our code. In our previous projects [21, 22, 23], we reported the implementation of an extensible parallel graph algorithms library. We developed general implementation and fine-tuning techniques without expending too much effort on optimizing each individual routine. We also handled the issue of implementing virtual processing. In this paper, we describe several algorithms and fine-tuning techniques that we developed for the problem of finding connected components in parallel; many of the fine-tuning techniques are of general interest, and should be applicable to code for other problems. We present data on the execution time and memory usage of our various implementations.
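
The abstract does not spell out the algorithms it implements. Purely as orientation, the following is a minimal sketch of one classic family of parallel connected-components methods (hooking roots onto smaller labels, then shortcutting by pointer jumping), simulated sequentially. It is not the authors' MPL code, and the name `connected_components` and the edge-list input format are assumptions of this sketch.

```python
def connected_components(n, edges):
    """Label each vertex 0..n-1 with the smallest vertex in its component.

    Generic hook-and-shortcut scheme, run sequentially here; on a parallel
    machine each edge (hooking) and each vertex (shortcutting) would be
    handled by its own real or virtual processor.
    """
    parent = list(range(n))
    changed = True
    while changed:
        changed = False
        # Hooking: attach the larger of the two current labels to the smaller,
        # but only if the larger label is still a root.
        for u, v in edges:
            ru, rv = parent[u], parent[v]
            if ru != rv:
                hi, lo = max(ru, rv), min(ru, rv)
                if parent[hi] == hi:
                    parent[hi] = lo
                    changed = True
        # Shortcutting: pointer jumping until every tree is a star.
        for i in range(n):
            while parent[i] != parent[parent[i]]:
                parent[i] = parent[parent[i]]
    return parent


if __name__ == "__main__":
    print(connected_components(6, [(0, 1), (1, 2), (3, 4)]))
```

Because a root is only ever hooked onto a smaller label, parent pointers never form a cycle, and the loop stops once every edge joins two vertices with the same label.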

Using PRAM Algorithms on a Uniform-Memory-Access Shared-Memory Architecture

by David A. Bader, Ajith Illendula, Bernard M. E. Moret, Nina R. Weisse-Bernstein - Proc. 5th Int’l Workshop on Algorithm Engineering (WAE 2001), volume 2141 of Lecture Notes in Computer Science, 2001
Cited by 20 (11 self)
The ability to provide uniform shared-memory access to a significant number of processors in a single SMP node brings us much closer to the ideal PRAM parallel computer. In this paper, we develop new techniques for designing a uniform shared-memory algorithm from a PRAM algorithm and present the results of an extensive experimental study demonstrating that the resulting programs scale nearly linearly across a significant range of processors (from 1 to 64) and across the entire range of instance sizes tested. This linear speedup with the number of processors is, to our knowledge, the first ever attained in practice for intricate combinatorial problems. The example we present in detail here is a graph decomposition algorithm that also requires the computation of a spanning tree; this problem is not only of interest in its own right, but is representative of a large class of irregular combinatorial problems that have simple and efficient sequential implementations and fast PRAM algorithms, but have no known efficient parallel implementations. Our results thus offer promise for bridging the gap between the theory and practice of shared-memory parallel algorithms.

Graph Augmentation And Related Problems: Theory And Practice

by Tsan-Sheng Hsu, 1993
Cited by 16 (2 self)

Connected Components Algorithms For Mesh-Connected Parallel Computers

by Steve Goddard, Subodh Kumar, Jan F. Prins - Parallel Algorithms: 3rd DIMACS Implementation Challenge October 17-19, 1994, volume 30 of DIMACS Series in Discrete Mathematics and Theoretical Computer Science, 1995
Cited by 12 (0 self)
We present a new CREW PRAM algorithm for finding connected components. For a graph G with n vertices and m edges, algorithm A_0 requires at most O(log n) parallel steps and performs O((n+m) log n) work in the worst case. The advantage our algorithm has over others in the literature is that it can be adapted to a 2-D mesh-connected communication model in which all CREW operations are replaced by O(log n) parallel row and column operations without increasing the time complexity. We present the mapping of A_0 to a mesh-connected computer and describe two implementations, A_1 and A_2. Algorithm A_1, which uses an adjacency matrix to represent the graph, performs O(n^2 log n) work. Hence, it only achieves work efficiency on dense graphs. The second implementation, A_2, uses a sparse representation of the adjacency matrix and again performs O(log n) row and column operations but reduces the work to O((m + n) log n) on all graphs. We report MasPar MP-1 performance figures for implementati...

Language and library support for practical PRAM programming

by Christoph W. Keßler, Jesper Larsson Träff - 5th EUROMICRO Workshop on Parallel and Distributed Processing, 1997
Cited by 3 (1 self)
We investigate the well-known PRAM model of parallel computation as a practical parallel programming model. The two components of this project are a general-purpose PRAM programming language called Fork95, and a library, called PAD, of efficient, basic parallel algorithms and data structures. We outline the primary features of Fork95 as they apply to the implementation of PAD. We give a brief overview of PAD and sketch the implementation of library routines for prefix-sums and bucket sorting. Both language and library can be used with the SBPRAM, an emulation of the PRAM in hardware.
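
The abstract mentions a library routine for prefix-sums. To show what such a routine computes, here is a minimal sketch of a standard data-parallel inclusive scan (the Hillis-Steele doubling scheme), simulated sequentially in Python. It is not Fork95 code and does not reproduce PAD's interface; the name `prefix_sums` is an assumption of this sketch.

```python
def prefix_sums(values):
    """Inclusive prefix sums by repeated doubling.

    In round d, position i adds in the value d places to its left. Every
    position reads the previous round's array, so all positions could be
    updated at once by separate PRAM processors.
    """
    x = list(values)
    n = len(x)
    d = 1
    while d < n:
        x = [x[i] + x[i - d] if i >= d else x[i] for i in range(n)]
        d *= 2
    return x


if __name__ == "__main__":
    print(prefix_sums([3, 1, 4, 1, 5]))
```

This variant takes about log2(n) rounds but performs O(n log n) additions in total; work-efficient scans trade a second sweep for O(n) work.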

Runtime Synthesis of Parallel and High-Performance Computational Kernels

by Christopher Mueller

A library of basic PRAM algorithms and its implementation in FORK

by Christoph W. Keßler, Jesper Larsson Träff - Proc. 8th SPAA, 1996
A library, called PAD, of basic parallel algorithms and data structures for the PRAM is currently being implemented using the PRAM programming language Fork95. The main motivations of the PAD project are to study the PRAM as a practical programming model, and to provide an organized collection of basic PRAM algorithms for the SB-PRAM under completion at the University of Saarbrücken. We give a brief survey of Fork95, and describe the main components of PAD. Finally, we report on the status of the language and library and discuss further developments.

Some Results on Ongoing Research on Parallel Implementation of Graph Algorithms

by Isabelle Guérin Lassous, Michel Morvan, 1997
In high performance computing, three recognized important points are usability, scalability and portability. Until recently, no model seemed to satisfy all three; a few proposed models now try to meet these goals. Among them, the BSP-like CGM model seemed to us well suited to ease the path from algorithm design to real implementations. Many algorithms have been designed but few implementations have been carried out to demonstrate the practical relevance of this model. In this article, we propose to test this model on an irregular problem. We present the results of implementations of permutation graph algorithms written in two different models: the PRAM and the BSP-like CGM model. These implementations have been made on a CM5 and a PC cluster. We compare the results of these implementations with the performance of sequential code for this problem. With a classical problem in graph theory, we validate the BSP-like CGM model: it is possible to write portable code o...

Finding Connected Components in Graphs

by Tsan-Sheng Hsu, Vijaya Ramachandran, Nathaniel Dean, 1996
In this paper, we describe our implementation of several parallel graph algorithms for finding connected components. Our implementation, with virtual processing, is on a 16,384-processor MasPar MP-1 using the language MPL. We present extensive test data on our code. In our previous projects [21, 22, 23], we reported the implementation of an extensible parallel graph algorithms library. We developed general implementation and fine-tuning techniques without expending too much effort on optimizing each individual routine. We also handled the issue of implementing virtual processing. In this paper, we describe several algorithms and fine-tuning techniques that we developed for the problem of finding connected components in parallel; many of the fine-tuning techniques are of general interest, and should be applicable to code for other problems. We present data on the execution time and memory usage of our various implementations.