Results 11 – 20 of 30
The Euler tour technique and parallel rooted spanning tree
 In Proc. Int’l Conf. on Parallel Processing (ICPP)
, 2004
Abstract

Cited by 7 (4 self)
Many parallel algorithms for graph problems start by finding a spanning tree and rooting it to define a structural relationship on the vertices that subsequent problem-specific computations can use. The generic procedure is to find an unrooted spanning tree and then root it using the Euler tour technique. With a randomized work-time optimal unrooted spanning tree algorithm and work-time optimal list ranking, rooted spanning trees can be found work-time optimally on the EREW PRAM w.h.p. Yet the Euler tour technique assumes a circular adjacency list as “given”; constructing the circular adjacency list on the fly for the spanning tree found by a spanning tree algorithm is not without cost. In fact, our experiments show that this “hidden” step of constructing a circular adjacency list can take as much time as the spanning tree and list ranking steps combined. In this paper we present new efficient algorithms that find rooted spanning trees without using the Euler tour technique and incur little or no overhead over the underlying spanning tree algorithms. We also present two new approaches that construct Euler tours efficiently when the circular adjacency list is not given. One is a deterministic PRAM algorithm, and the other is a randomized algorithm in the symmetric multiprocessor (SMP) model. The randomized algorithm takes a novel approach to the problems of constructing the Euler tour and rooting a tree: it computes a rooted spanning tree first, then constructs an Euler tour directly for the tree using depth-first traversal. The tour constructed is cache-friendly, with adjacent edges in the ...
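As an illustration of the Euler tour rooting step this abstract describes, the following is a minimal sequential sketch: given a tree's circular adjacency lists, the successor of arc (u, v) is (v, w) where w follows u in v's circular list; traversing the tour from the root yields each vertex's parent. Function and variable names are my own; the paper's algorithms are parallel and this is only a sketch of the underlying idea.

```python
def euler_tour_root(adj, root):
    """Root a tree via its Euler tour.

    adj: dict mapping each vertex to its neighbor list, interpreted as a
    circular adjacency list. Returns a parent map with parent[root] == root.
    """
    # Successor function of the Euler tour: arc (u, v) is followed by
    # (v, w), where w is the neighbor after u in v's circular list.
    nxt = {}
    for v, nbrs in adj.items():
        for i, u in enumerate(nbrs):
            w = nbrs[(i + 1) % len(nbrs)]
            nxt[(u, v)] = (v, w)
    # Walk the tour (2(n-1) arcs form one cycle); the first arc entering
    # a vertex identifies its parent.
    parent = {root: root}
    arc = (root, adj[root][0])
    for _ in range(len(nxt)):
        u, v = arc
        if v not in parent:
            parent[v] = u
        arc = nxt[arc]
    return parent
```

In the parallel setting the same successor function defines a linked list of arcs, and list ranking (rather than the sequential walk above) determines each vertex's position in the tour.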
Connected-Components Algorithms for Mesh-Connected Parallel Computers
 PRESENTED AT THE 3RD DIMACS IMPLEMENTATION CHALLENGE WORKSHOP
, 1995
Abstract

Cited by 6 (0 self)
We present efficient parallel algorithms for finding the connected components of sparse and dense graphs using a mesh-connected parallel computer. We start with a PRAM algorithm with work complexity O(n² log n). The algorithm performs O(log n) reduction and broadcast operations within the rows and columns of a mesh-connected computer. Next, a representation of the adjacency matrix for a sparse graph with m edges is chosen that preserves the communication structure of the algorithm but improves the work bound to O((n + m) log n). This work bound can be improved to the optimal O(n + m) bound through the use of graph contraction. In architectures like the MasPar MP-1 and MP-2, parallel row and column operations of the form described achieve high performance relative to the unrestricted concurrent accesses typically found in parallel connected-component algorithms for sparse graphs, and exhibit no locality dependence. We present MasPar MP-1 performance figures for implementations of the a...
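The PRAM-style connected-components computation underlying algorithms of this family can be sketched with the classic hooking and pointer-jumping scheme (in the style of Shiloach–Vishkin), simulated sequentially here. This is an illustrative sketch, not the paper's mesh algorithm; names are mine.

```python
def connected_components(n, edges):
    """Label each vertex with the smallest vertex id in its component.

    Repeatedly hook the larger-labeled root of an edge onto the smaller,
    then compress trees by pointer jumping, until no edge spans two trees.
    """
    parent = list(range(n))
    changed = True
    while changed:
        changed = False
        # Hooking: attach the larger root to the smaller one.
        for u, v in edges:
            ru, rv = parent[u], parent[v]
            if parent[ru] == ru and parent[rv] == rv and ru != rv:
                parent[max(ru, rv)] = min(ru, rv)
                changed = True
        # Pointer jumping: shortcut every vertex to its root.
        for v in range(n):
            while parent[v] != parent[parent[v]]:
                parent[v] = parent[parent[v]]
    return parent
```

On a mesh, the hooking and compression steps would be realized with the row/column reduction and broadcast operations the abstract describes.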
Lock-free parallel algorithms: An experimental study
 In Proceedings of the 11th International Conference High Performance Computing
, 2004
Abstract

Cited by 5 (2 self)
Abstract. Lock-free shared data structures in the setting of distributed computing have received a fair amount of attention. Major motivations for lock-free data structures include increasing the fault tolerance of a (possibly heterogeneous) system and getting rid of the problems associated with critical sections, such as priority inversion and deadlock. For parallel computers with closely coupled processors and shared memory, these issues are no longer major concerns. While many of these results are applicable, especially when the model used is shared-memory multiprocessors, no prior studies have considered improving the performance of a parallel implementation by way of lock-free programming. As a matter of fact, in practice lock-free data structures in a distributed setting often do not perform as well as those that use locks. As the data structures and algorithms for parallel computing are often drastically different from those in distributed computing, it is possible that lock-free programs perform better. In this paper we compare the similarities and differences of lock-free programming in distributed and parallel computing environments and explore the possibility of adapting lock-free programming to parallel computing to improve performance. Lock-free programming also provides a new way of simulating PRAM and asynchronous PRAM algorithms on current parallel machines.
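To make the lock-free pattern concrete, here is a sketch of a Treiber-style stack: instead of holding a lock around the critical section, each operation retries a compare-and-swap (CAS) on the head pointer. The `AtomicRef` class below emulates a hardware CAS with a lock purely for illustration in Python; a real lock-free structure would use the processor's atomic CAS instruction. All class and method names are mine, not from the paper.

```python
import threading


class AtomicRef:
    """Illustrative CAS cell; emulates hardware compare-and-swap."""

    def __init__(self, value=None):
        self._value = value
        self._lock = threading.Lock()

    def load(self):
        return self._value

    def cas(self, expected, new):
        # Atomically replace the value only if it is still `expected`.
        with self._lock:
            if self._value is expected:
                self._value = new
                return True
            return False


class Node:
    __slots__ = ("item", "next")

    def __init__(self, item, next=None):
        self.item, self.next = item, next


class LockFreeStack:
    """Treiber stack: operations retry a CAS on the head, never block."""

    def __init__(self):
        self.head = AtomicRef(None)

    def push(self, item):
        while True:
            old = self.head.load()
            if self.head.cas(old, Node(item, old)):
                return

    def pop(self):
        while True:
            old = self.head.load()
            if old is None:
                return None
            if self.head.cas(old, old.next):
                return old.item
```

The key property is that a thread delayed between `load` and `cas` simply retries; no thread's progress depends on another thread releasing a lock.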
Parallel and External List Ranking and Connected Components
, 1999
Abstract

Cited by 4 (1 self)
Improved parallel, external, and parallel-external algorithms for list ranking and computing the connected components of a graph are presented. These algorithms are implemented and tested on a cluster of workstations using the C programming language and mpich, a portable implementation of the MPI (Message-Passing Interface) standard.
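The list-ranking primitive central to this entry can be sketched with textbook pointer jumping: in each round every node adds its successor's rank to its own and shortcuts its pointer, so O(log n) synchronous rounds suffice. This is a sequential simulation of the parallel rounds, not the paper's cluster implementation; the function name is mine.

```python
def list_rank(succ):
    """Return rank[i] = distance from node i to the tail of the list.

    succ[i] is the successor of node i; the tail points to itself.
    Each round is computed from a snapshot, simulating one synchronous
    parallel step of pointer jumping.
    """
    n = len(succ)
    rank = [0 if succ[i] == i else 1 for i in range(n)]
    nxt = list(succ)
    for _ in range(n.bit_length()):  # ceil(log2 n) rounds suffice
        rank = [rank[i] + rank[nxt[i]] for i in range(n)]
        nxt = [nxt[nxt[i]] for i in range(n)]
    return rank
```

On a workstation cluster, each round of jumps becomes a bulk exchange of (pointer, rank) pairs between processors, which is where communication cost dominates.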
Techniques for Designing Efficient Parallel Graph Algorithms for SMPs and Multicore Processors
Abstract

Cited by 3 (0 self)
Abstract. Graph problems are finding increasing applications in high-performance computing disciplines. Although many regular problems can be solved efficiently in parallel, obtaining efficient implementations for irregular graph problems remains a challenge. We propose techniques for designing and implementing efficient parallel algorithms for graph problems on symmetric multiprocessors and chip multiprocessors, with a case study of parallel tree and connectivity algorithms. The problems we study represent a wide range of irregular problems that have fast theoretical parallel algorithms but no known efficient parallel implementations that achieve speedup without seriously restricting assumptions about the inputs. We believe our techniques will be of practical impact in solving large-scale graph problems.
Parallel cluster labeling for large-scale Monte Carlo Simulations
, 2008
Abstract

Cited by 2 (0 self)
We present an optimized version of a cluster labeling algorithm previously introduced by the authors. This algorithm is well suited for large-scale Monte Carlo simulations of spin models using cluster dynamics on parallel computers with large numbers of processors. The algorithm divides physical space into rectangular cells which are assigned to processors and combines a serial local labeling procedure with a relaxation process across nearest-neighbor processors. By controlling overhead and reducing interprocessor communication, this method attains good computational speedup and efficiency. Large systems of up to 65536² spins have been simulated at updating speeds of 11 nanosecs/site (90.7 × 10⁶ spin updates/sec) using state-of-the-art supercomputers. In the second part of the article we use the cluster algorithm to study the relaxation of magnetization and energy on large Ising models using Swendsen-Wang dynamics. We found evidence that exponential and power-law factors are present in the relaxation process, as has been proposed by Hackl et al. The variation of the power-law exponent λM, taken at face value, indicates that the value of zM falls in the interval 0.31 − 0.49 for the time interval analysed and appears to vanish asymptotically.
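The serial local-labeling step the abstract mentions can be sketched with a Hoshen–Kopelman-style union-find pass over a 2D lattice. This is only an illustration of the local procedure on one cell; the paper's parallel algorithm additionally merges labels across cell boundaries in the relaxation step. Names are mine.

```python
def label_clusters(grid):
    """Label 4-connected clusters of occupied sites on a 2D lattice.

    grid[r][c] is truthy for an occupied site. Returns a same-shaped
    matrix where occupied sites in the same cluster share one label
    (a representative site) and empty sites are None.
    """
    rows, cols = len(grid), len(grid[0])
    parent = {}

    def find(x):
        # Path-halving union-find lookup.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    # Single raster scan: merge with already-labeled up/left neighbors.
    for r in range(rows):
        for c in range(cols):
            if not grid[r][c]:
                continue
            site = (r, c)
            parent[site] = site
            if r and grid[r - 1][c]:
                union((r - 1, c), site)
            if c and grid[r][c - 1]:
                union((r, c - 1), site)

    return [[find((r, c)) if grid[r][c] else None for c in range(cols)]
            for r in range(rows)]
```

In the parallel version, each processor would run this scan on its own rectangular cell, after which boundary rows and columns are exchanged with nearest neighbors until labels stabilize.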
WHAT GOOD ARE SHARED-MEMORY MODELS?
 INTERNATIONAL CONFERENCE ON PARALLEL PROCESSING
, 1996
Abstract

Cited by 1 (1 self)
Shared-memory models have been criticized for years for failing to model essential realities of parallel machines. Given the current wave of popular message-passing and distributed-memory models (e.g., BSP, LogP), it is natural to ask whether shared-memory models have outlived any usefulness they may have had. In this invited position paper, we discuss the continuing importance of shared-memory models in the design and analysis of parallel algorithms. We describe a new model, the Queuing Shared Memory (QSM) model, that accounts for limited communication bandwidth while still providing a shared-memory abstraction, and provide evidence of its practicality. Finally, we discuss important areas for future models research. We argue that the compelling need for parallel computing in large-scale data analysis (e.g., decision support, data mining) implies that the most important modeling issue going forward concerns how best to model disk I/O.
DISTRIBUTION STATEMENT: Approved for public release
, 1997
Abstract
the Maui High Performance Computing Center under cooperative agreement number F29601-93-2-0001 with the