Results 1-10 of 23
Programming Parallel Algorithms
, 1996
Abstract

Cited by 191 (9 self)
In the past 20 years there has been tremendous progress in developing and analyzing parallel algorithms. Researchers have developed efficient parallel algorithms to solve most problems for which efficient sequential solutions are known. Although some of these algorithms are efficient only in a theoretical framework, many are quite efficient in practice or have key ideas that have been used in efficient implementations. This research on parallel algorithms has not only improved our general understanding of parallelism but in several cases has led to improvements in sequential algorithms. Unfortunately there has been less success in developing good languages for programming parallel algorithms, particularly languages that are well suited for teaching and prototyping algorithms. There has been a large gap between languages ...
Scans as Primitive Parallel Operations
 IEEE Transactions on Computers
, 1987
Abstract

Cited by 157 (12 self)
In most parallel random-access machine (PRAM) models, memory references are assumed to take unit time. In practice, and in theory, certain scan operations, also known as prefix computations, can be executed in no more time than these parallel memory references. This paper outlines an extensive study of the effect of including such scan operations in the PRAM models as unit-time primitives. The study concludes that the primitives improve the asymptotic running time of many algorithms by an O(lg n) factor, greatly simplify the description of many algorithms, and are significantly easier to implement than memory references. We therefore argue that the algorithm designer should feel free to use these operations as if they were as cheap as a memory reference. This paper describes five algorithms that clearly illustrate how the scan primitives can be used in algorithm design: a radix-sort algorithm, a quicksort algorithm, a minimum-spanning-tree algorithm, a line-drawing algorithm, and a mergi...
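The scan-as-primitive style the abstract describes can be illustrated with a small sequential sketch. The Python below is our illustration, not the paper's code: it builds the "split" step of a one-bit-per-pass radix sort, one of the five algorithms mentioned, out of an exclusive plus-scan.

```python
def exclusive_scan(xs):
    """Exclusive plus-scan (prefix sum): out[i] = sum(xs[:i])."""
    out, total = [], 0
    for x in xs:
        out.append(total)
        total += x
    return out

def split_radix_sort(keys, num_bits):
    """Stable LSD radix sort built from scans, one bit per pass."""
    for b in range(num_bits):
        flags = [(k >> b) & 1 for k in keys]
        zeros = [1 - f for f in flags]
        # Destinations for 0-keys: exclusive scan of the zero flags.
        left = exclusive_scan(zeros)
        n_zeros = sum(zeros)
        # Destinations for 1-keys start after all the 0-keys.
        right = exclusive_scan(flags)
        dest = [right[i] + n_zeros if flags[i] else left[i]
                for i in range(len(keys))]
        out = [0] * len(keys)
        for i, k in enumerate(keys):
            out[dest[i]] = k
        keys = out
    return keys
```

On a machine that treats scans as unit-time primitives, each pass is a constant number of scans and permutations; here everything is simulated sequentially.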
PEGASUS: A Peta-Scale Graph Mining System - Implementation and Observations
 IEEE INTERNATIONAL CONFERENCE ON DATA MINING
, 2009
Abstract

Cited by 64 (21 self)
In this paper, we describe PEGASUS, an open source Peta Graph Mining library which performs typical graph mining tasks such as computing the diameter of the graph, computing the radius of each node, and finding the connected components. As the size of graphs reaches several Giga-, Tera-, or Petabytes, the necessity for such a library grows too. To the best of our knowledge, PEGASUS is the first such library, implemented on top of the HADOOP platform, the open source version of MAPREDUCE. Many graph mining operations (PageRank, spectral clustering, diameter estimation, connected components, etc.) are essentially a repeated matrix-vector multiplication. In this paper we describe a very important primitive for PEGASUS, called GIM-V (Generalized Iterated Matrix-Vector multiplication). GIM-V is highly optimized, achieving (a) good scale-up on the number of available machines, (b) linear running time on the number of edges, and (c) more than 5 times faster performance over the non-optimized version of GIM-V. Our experiments ran on M45, one of the top 50 supercomputers in the world. We report our findings on several real graphs, including one of the largest publicly available Web graphs, thanks to Yahoo!, with ≈ 6.7 billion edges. Keywords: PEGASUS; graph mining; hadoop.
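GIM-V generalizes repeated matrix-vector multiplication via user-supplied combine and assign operations. The single-machine Python sketch below instantiates that pattern for connected components (take each neighbor's label, aggregate with min, keep the smaller of the old and new labels); it is a toy stand-in for illustration, not PEGASUS or HADOOP code, and the function name is ours.

```python
def gimv_cc(edges, n):
    """Connected-component labels via a GIM-V-style iteration:
    combine2  = take the neighbor's label,
    combineAll = minimum over neighbors,
    assign    = min(own label, smallest neighbor label).
    Iterates the 'matrix-vector product' until a fixpoint."""
    labels = list(range(n))
    adj = [[] for _ in range(n)]
    for u, v in edges:          # treat the graph as undirected
        adj[u].append(v)
        adj[v].append(u)
    changed = True
    while changed:
        changed = False
        new = labels[:]
        for u in range(n):
            candidates = [labels[v] for v in adj[u]]   # combine2
            if candidates:
                m = min(min(candidates), labels[u])    # combineAll + assign
                if m != new[u]:
                    new[u] = m
                    changed = True
        labels = new
    return labels
```

Vertices end up labeled with the smallest vertex id in their component, which is exactly the repeated matrix-vector view of connected components the abstract refers to.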
A Comparison of Data-Parallel Algorithms for Connected Components
 In Proc. 6th Ann. Symp. Parallel Algorithms and Architectures (SPAA '94)
, 1994
Abstract

Cited by 31 (1 self)
This paper presents a pragmatic comparison of three parallel algorithms for finding connected components, together with optimizations on these algorithms. Those being compared are two similar algorithms by Awerbuch and Shiloach [2] and by Shiloach and Vishkin [19], and a randomized contraction algorithm by Blelloch [7], based on algorithms by Reif [18] and Phillips [17]. Major improvements are given for the first two which significantly reduce the superlinear component of their work complexity. An improvement is also given for the randomized algorithm, and this algorithm is shown to be the fastest of those tested. These comparisons are presented with NESL data-parallel code as executed on a Connection Machine 2. This research was sponsored in part by the Defense Advanced Research Projects Agency, CSTO, under the title "The Fox Project: Advanced Development of Systems Software", ARPA Order No. 8313, issued by ESD/AVS under Contract No. F19628-91-C-0168, and in part by the ONR Graduate Fell...
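The randomized contraction idea attributed here to Blelloch, Reif, and Phillips can be sketched sequentially as "random mate": each round vertices flip coins, child representatives hook onto adjacent parent representatives, and edges are relabeled through the new representatives. The Python below is our illustration of that scheme, not the NESL code the paper measures.

```python
import random

def random_mate_cc(edges, n, seed=0):
    """Random-mate contraction: each round flips a coin per vertex;
    a 'child' representative hooks onto an adjacent 'parent'
    representative, then every vertex follows its representative's
    hook.  Rounds continue until no inter-component edge remains;
    afterwards rep[u] == rep[v] iff u and v are connected."""
    rng = random.Random(seed)
    rep = list(range(n))
    live = list(edges)
    while True:
        # Relabel edges by current representatives; drop self-loops.
        live = [(rep[u], rep[v]) for u, v in live]
        live = [(u, v) for u, v in live if u != v]
        if not live:
            return rep
        coin = [rng.random() < 0.5 for _ in range(n)]  # True = parent
        hook = list(range(n))
        for u, v in live:
            if not coin[u] and coin[v]:
                hook[u] = v          # child u joins parent v
            elif coin[u] and not coin[v]:
                hook[v] = u          # child v joins parent u
        rep = [hook[r] for r in rep]
```

Each round removes an expected constant fraction of the live edges, which is why the contraction approach runs in a logarithmic number of rounds with high probability.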
On Parallel Hashing and Integer Sorting
, 1991
Abstract

Cited by 25 (9 self)
The problem of sorting n integers from a restricted range [1..m], where m is superpolynomial in n, is considered. An o(n log n) randomized algorithm is given. Our algorithm takes O(n log log m) expected time and O(n) space. (Thus, for m = n^polylog(n) we have an O(n log log n) algorithm.) The algorithm is parallelizable. The resulting parallel algorithm achieves optimal speedup. Some features of the algorithm make us believe that it is relevant for practical applications. A result of independent interest is a parallel hashing technique. The expected construction time is logarithmic using an optimal number of processors, and searching for a value takes O(1) time in the worst case. This technique enables a drastic reduction of space requirements for the price of using randomness. Applicability of the technique is demonstrated for the parallel sorting algorithm and for some parallel string matching algorithms. The parallel sorting algorithm is designed for a strong and non-standard mo...
Connected Components on Distributed Memory Machines
 Parallel Algorithms: 3rd DIMACS Implementation Challenge, October 17-19, 1994, volume 30 of DIMACS Series in Discrete Mathematics and Theoretical Computer Science
, 1994
Abstract

Cited by 22 (1 self)
The efforts of the theory community to develop efficient PRAM algorithms often receive little attention from application programmers. Although there are PRAM algorithm implementations that perform reasonably on shared memory machines, they often perform poorly on distributed memory machines, where the cost of remote memory accesses is relatively high. We present a hybrid approach to solving the connected components problem, whereby a PRAM algorithm is merged with a sequential algorithm and then optimized to create an efficient distributed memory implementation. The sequential algorithm handles local work on each processor, and the PRAM algorithm handles interactions between processors. Our hybrid algorithm uses the Shiloach-Vishkin CRCW PRAM algorithm on a partition of the graph distributed over the processors and sequential breadth-first search within each local subgraph. The implementation uses the Split-C language developed at Berkeley, which provides a global address space and al...
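The hybrid structure can be conveyed in a single process: one BFS pass per "processor" labels each local subgraph, then the cross-processor edges are merged globally. In the sketch below a simple union-find stands in for the Shiloach-Vishkin global phase, so this illustrates the decomposition only, not the Split-C implementation; all names are ours.

```python
from collections import deque

def hybrid_cc(local_graphs, cross_edges):
    """Hybrid connected components: each 'processor' owns one local
    adjacency dict and labels it by BFS; cross-processor edges are
    then resolved globally (union-find standing in for the
    Shiloach-Vishkin phase)."""
    label = {}
    # Phase 1: sequential BFS within each local subgraph.
    for adj in local_graphs:
        for src in adj:
            if src in label:
                continue
            q = deque([src])
            label[src] = src
            while q:
                u = q.popleft()
                for v in adj[u]:
                    if v not in label:
                        label[v] = src
                        q.append(v)
    # Phase 2: merge local labels across processor boundaries.
    parent = {l: l for l in set(label.values())}
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x
    for u, v in cross_edges:
        ru, rv = find(label[u]), find(label[v])
        if ru != rv:
            parent[max(ru, rv)] = min(ru, rv)
    return {u: find(l) for u, l in label.items()}
```

The point of the decomposition is that phase 1 touches only local memory; only the usually much smaller set of cross edges incurs remote accesses.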
A Comparison of Parallel Algorithms for Connected Components
 in the Symposium on Parallel Algorithms and Architectures
, 1994
Abstract

Cited by 16 (1 self)
This paper presents a comparison of the pragmatic aspects of some parallel algorithms for finding connected components, together with optimizations on these algorithms. The algorithms being compared are two similar algorithms by Shiloach-Vishkin [22] and Awerbuch-Shiloach [2], a randomized contraction algorithm based on algorithms by Reif [21] and Phillips [20], and a hybrid algorithm [11]. Improvements are given for the first two that significantly increase performance, although without improving their asymptotic complexity. The hybrid combines features of the others and is generally the fastest of those tested. Timings were made using NESL [4] code as executed on a Connection Machine 2 and a Cray Y-MP/C90. 1 Introduction The complexity of various PRAM algorithms has received much attention, but there has been relatively little work on the implementation and pragmatic efficiency of many of these algorithms. Moreover, much of this work has been for algorithms having regular communication ...
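A sequential simulation conveys the shape of the Shiloach-Vishkin-style algorithms being compared: alternate a hooking step (roots attach themselves to a smaller-labeled neighbor's root) with pointer jumping, until nothing changes. This is a simplified variant written for illustration, not the paper's NESL code.

```python
def shiloach_vishkin_cc(edges, n):
    """Sequential simulation of a simplified Shiloach-Vishkin-style
    CRCW PRAM algorithm: hook tree roots onto smaller-labeled
    neighboring roots, then shortcut parent pointers (pointer
    jumping), repeating until the forest is stable."""
    parent = list(range(n))
    while True:
        changed = False
        # Hooking: a root attaches to a neighbor's smaller root.
        for u, v in edges:
            pu, pv = parent[u], parent[v]
            if parent[pu] == pu and pv < pu:
                parent[pu] = pv
                changed = True
            pu, pv = parent[u], parent[v]
            if parent[pv] == pv and pu < pv:
                parent[pv] = pu
                changed = True
        # Pointer jumping: flatten every tree toward its root.
        for x in range(n):
            while parent[x] != parent[parent[x]]:
                parent[x] = parent[parent[x]]
        if not changed:
            return parent
```

Hooking only ever attaches a root to a strictly smaller label, so no cycles form; on a PRAM both steps run in parallel over all edges and vertices, giving the O(log n) round bound.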
Connected Components Algorithms for Mesh-Connected Parallel Computers
 Parallel Algorithms: 3rd DIMACS Implementation Challenge, October 17-19, 1994, volume 30 of DIMACS Series in Discrete Mathematics and Theoretical Computer Science
, 1995
Abstract

Cited by 12 (0 self)
We present a new CREW PRAM algorithm for finding connected components. For a graph G with n vertices and m edges, algorithm A0 requires at most O(log n) parallel steps and performs O((n+m) log n) work in the worst case. The advantage our algorithm has over others in the literature is that it can be adapted to a 2D mesh-connected communication model in which all CREW operations are replaced by O(log n) parallel row and column operations without increasing the time complexity. We present the mapping of A0 to a mesh-connected computer and describe two implementations, A1 and A2. Algorithm A1, which uses an adjacency matrix to represent the graph, performs O(n^2 log n) work. Hence, it only achieves work efficiency on dense graphs. The second implementation, A2, uses a sparse representation of the adjacency matrix and again performs O(log n) row and column operations but reduces the work to O((m + n) log n) on all graphs. We report MasPar MP-1 performance figures for implementati...
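The row-and-column-operation flavor of the adjacency-matrix implementation can be conveyed with dense label propagation: each step replaces a vertex's label by the minimum label over its adjacency-matrix row. Note that this naive sketch needs up to diameter-many steps, whereas algorithm A0 achieves O(log n) by combining hooking with the row/column minima; the code is illustrative only and the names are ours.

```python
def matrix_cc(adj_matrix):
    """Dense-matrix label propagation: one step is a row-wise minimum
    over the adjacency matrix (a min-select 'matrix-vector product'),
    costing O(n^2) work per step, hence work-efficient only on
    dense graphs.  Iterates to a fixpoint."""
    n = len(adj_matrix)
    labels = list(range(n))
    while True:
        new = [min([labels[u]] + [labels[v] for v in range(n)
                                  if adj_matrix[u][v]])
               for u in range(n)]
        if new == labels:
            return labels
        labels = new
```

On a mesh, each such step maps naturally onto parallel row operations, which is the adaptation the abstract describes.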
Towards Modeling the Performance of a Fast Connected Components Algorithm on Parallel Machines
 In Proceedings of Supercomputing '95
, 1996
Abstract

Cited by 12 (3 self)
We present and analyze a portable, high-performance algorithm for finding connected components on modern distributed memory multiprocessors. The algorithm is a hybrid of the classic DFS on the subgraph local to each processor and a variant of the Shiloach-Vishkin PRAM algorithm on the global collection of subgraphs. We implement the algorithm in Split-C and measure performance on the Cray T3D, the Meiko CS-2, and the Thinking Machines CM-5 using a class of graphs derived from cluster dynamics methods in computational physics. On a 256-processor Cray T3D, the implementation outperforms all previous solutions by an order of magnitude. A characterization of graph parameters allows us to select graphs that highlight key performance features. We study the effects of these parameters and machine characteristics on the balance of time between the local and global phases of the algorithm and find that edge density, surface-to-volume ratio, and relative communication cost dominate perform...
Asynchronous Resource Discovery in Peer to Peer Networks
 In 21st Symp. on Reliable Distributed Systems, October 2002, Japan
, 2002
Abstract

Cited by 9 (0 self)
The resource discovery problem arises in the context of peer-to-peer (P2P) networks, where at any point of time a peer may be placed at or removed from any location over a general purpose network (e.g., an Internet site). A vertex (peer) can communicate with another vertex directly if and only if it knows certain routing information for that other vertex. Hence, a critical task is for the peers to convey this routing information to each other. The problem was formalized by Harchol-Balter, Leighton and Lewin [13]. The routing information needed for a vertex to reach another peer is that peer's identifier (e.g., IP address). A logical directed edge represents the fact that the peer at the tail of the edge knows the IP address of the one at its head. A number of algorithms were developed in [13] for this problem in the model of a synchronous network over a weakly connected directed graph. The best of these algorithms was randomized. Subsequently, a deterministic algorithm for the problem on synchronous networks with improved complexity was presented in [15]. The current paper extends the deterministic algorithm of [15] to the environment of asynchronous networks, maintaining similar complexities (translated to the asynchronous model). These are lower than the complexities that would be needed to synchronize the system. The main technical difficulty in a directed, weakly connected system is to ensure that vertices take consistent steps, even if their knowledge about each other is not symmetric, and even if there is no timeout mechanism (which does exist in synchronous systems) to assist in that. (In particular, as opposed to the case in synchronous systems, here an algorithm cannot first ...
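The pointer-doubling mechanism underlying this line of work can be sketched as a synchronous, single-process toy: each round, every peer learns the peers known by the peers it already knows, so the known set roughly squares per round. The sketch deliberately ignores everything the papers actually have to solve (weak connectivity, message costs, asynchrony), and the function name is hypothetical.

```python
def discovery_rounds(knows):
    """Synchronous pointer doubling on the 'knows' relation: in each
    round a peer adds everything known by the peers it knows.  The
    dict maps each peer id to the set of peer ids it can reach
    directly; every id mentioned must appear as a key.  Returns the
    stable knowledge map and the number of rounds taken."""
    rounds = 0
    while True:
        new = {u: set(ks) for u, ks in knows.items()}
        for u, ks in knows.items():
            for v in ks:
                new[u] |= knows[v]      # learn my neighbors' neighbors
        if new == knows:
            return knows, rounds
        knows, rounds = new, rounds + 1
```

On the directed chain 0 -> 1 -> 2 -> 3 this stabilizes after two doubling rounds, reflecting the logarithmic round counts the synchronous algorithms aim for once knowledge has been made symmetric.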