Results 1–10 of 12
Parallel crawlers
In Proceedings of the 11th International Conference on World Wide Web, 2002
Abstract

Cited by 86 (3 self)
In this paper we study how we can design an effective parallel crawler. As the size of the Web grows, it becomes imperative to parallelize a crawling process, in order to finish downloading pages in a reasonable amount of time. We first propose multiple architectures for a parallel crawler and identify fundamental issues related to parallel crawling. Based on this understanding, we then propose metrics to evaluate a parallel crawler, and compare the proposed architectures using 40 million pages collected from the Web. Our results clarify the relative merits of each architecture and provide a good guideline on when to adopt which architecture.
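The partitioning question this abstract raises — which crawling process owns which pages — can be illustrated with a minimal sketch. Site-hash partitioning, where each process is responsible for a disjoint set of hosts, is one common scheme in this line of work; the function name, the three-process setup, and the example URLs below are illustrative assumptions, not the paper's code.

```python
import hashlib
from urllib.parse import urlparse

def partition(url: str, n_procs: int) -> int:
    """Assign a URL to one of n crawling processes by hashing its host,
    so every process owns a disjoint set of sites (site-hash partitioning)."""
    host = urlparse(url).netloc
    digest = hashlib.sha1(host.encode()).hexdigest()
    return int(digest, 16) % n_procs

urls = [
    "http://example.com/a", "http://example.com/b",
    "http://example.org/x", "http://example.net/y",
]
queues = {i: [] for i in range(3)}
for u in urls:
    queues[partition(u, 3)].append(u)

# Invariant: all URLs from one host land in the same process's queue,
# so per-host politeness can be enforced without cross-process coordination.
assert partition("http://example.com/a", 3) == partition("http://example.com/b", 3)
```

A consequence of this design choice is that inter-process links (a page whose out-links hash to another process) must either be exchanged between processes or ignored, which is exactly the coordination trade-off the architectures in the paper differ on.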
Parallel Algorithmic Techniques for Combinatorial Computation
Ann. Rev. Comput. Sci., 1988
Abstract

Cited by 29 (3 self)
this paper and supplied many helpful comments. This research was supported in part by NSF grants DCR-8511713, CCR-8605353, and CCR-8814977, and by DARPA contract N00039-84-C-0165.
Connected Components Algorithms for Mesh-Connected Parallel Computers
In Parallel Algorithms: 3rd DIMACS Implementation Challenge, October 17–19, 1994, volume 30 of DIMACS Series in Discrete Mathematics and Theoretical Computer Science, 1995
Abstract

Cited by 12 (0 self)
We present a new CREW PRAM algorithm for finding connected components. For a graph G with n vertices and m edges, algorithm A0 requires at most O(log n) parallel steps and performs O((n + m) log n) work in the worst case. The advantage our algorithm has over others in the literature is that it can be adapted to a 2D mesh-connected communication model in which all CREW operations are replaced by O(log n) parallel row and column operations without increasing the time complexity. We present the mapping of A0 to a mesh-connected computer and describe two implementations, A1 and A2. Algorithm A1, which uses an adjacency matrix to represent the graph, performs O(n² log n) work. Hence, it only achieves work efficiency on dense graphs. The second implementation, A2, uses a sparse representation of the adjacency matrix and again performs O(log n) row and column operations but reduces the work to O((m + n) log n) on all graphs. We report MasPar MP-1 performance figures for implementa...
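The row-operation structure this abstract describes can be illustrated with a toy sequential sketch: label propagation over an adjacency matrix, where one round of the loop is a row-wise minimum — the kind of parallel row operation the mesh algorithms are built from. This is an illustrative simplification under assumed names, not the paper's algorithm A0; it converges in O(diameter) rounds rather than O(log n).

```python
import numpy as np

def connected_components(adj: np.ndarray) -> np.ndarray:
    """Toy label-propagation sketch: each vertex repeatedly adopts the
    minimum label among itself and its neighbors. One iteration is a
    row-wise min over the adjacency matrix."""
    n = adj.shape[0]
    label = np.arange(n)
    changed = True
    while changed:
        # Where adj[i, j] == 1, expose label[j]; elsewhere use n as a +inf sentinel.
        neigh = np.where(adj.astype(bool), label[np.newaxis, :], n)
        new = np.minimum(label, neigh.min(axis=1))
        changed = not np.array_equal(new, label)
        label = new
    return label

# Two components: {0, 1, 2} and {3, 4}
A = np.zeros((5, 5), dtype=int)
for u, v in [(0, 1), (1, 2), (3, 4)]:
    A[u, v] = A[v, u] = 1
print(connected_components(A))  # -> [0 0 0 3 3]
```

The point of the sketch is the data-access pattern: every step touches whole rows at once, which is what makes an adjacency-matrix formulation map naturally onto per-row mesh operations, at the cost of the O(n² log n) work bound the abstract notes for dense representations.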
Thinking in parallel: Some basic data-parallel algorithms and techniques
In use as class notes since 1993
Abstract

Cited by 7 (1 self)
Copyright 1992–2009, Uzi Vishkin. These class notes reflect the theoretical part in the Parallel
Connected-Components Algorithms for Mesh-Connected Parallel Computers
Presented at the 3rd DIMACS Implementation Challenge Workshop, 1995
Abstract

Cited by 6 (0 self)
We present efficient parallel algorithms for finding the connected components of sparse and dense graphs using a mesh-connected parallel computer. We start with a PRAM algorithm with work complexity O(n² log n). The algorithm performs O(log n) reduction and broadcast operations within the rows and columns of a mesh-connected computer. Next, a representation of the adjacency matrix for a sparse graph with m edges is chosen that preserves the communication structure of the algorithm but improves the work bound to O((n + m) log n). This work bound can be improved to the optimal O(n + m) bound through the use of graph contraction. In architectures like the MasPar MP-1 and MP-2, parallel row and column operations of the form described achieve high performance relative to the unrestricted concurrent accesses typically found in parallel connected-components algorithms for sparse graphs, and exhibit no locality dependence. We present MasPar MP-1 performance figures for implementations of the a...
unknown title
Abstract
A model of computation based on random access machines operating in parallel and sharing a common memory is presented. The computational power of this model is related to that of traditional models. In particular, deterministic parallel RAMs can accept in polynomial time exactly the sets accepted by polynomial-tape-bounded Turing machines; nondeterministic parallel RAMs can accept in polynomial time exactly the sets accepted by nondeterministic exponential-time-bounded Turing machines. Similar results hold for other classes. The effect of limiting the size of the common memory is also considered. The speed of serial computers has increased
COLING 82, J. Horecký (ed.), North-Holland Publishing Company, 1982
Abstract
INTRODUCTION (translated from French) Parallelism is of great interest in computer science: it is a way to increase the power of computing systems by doing as much work as possible simultaneously or concurrently. The extensive use of computing systems that do not exploit this possibility has been a major obstacle to the development of parallel programs in all application domains. Many computations possess a high degree of parallelism that is not exploited in the architectures of conventional computers, and the users of these systems cannot see the effects of parallelism on the solution of their problems. In the past, parallelism was mainly used at the operating-system level, either to take advantage of true multiprocessing architectures or to simulate this concept on conventional machines. One may expect that the best candidates for parallelism are processes that consume a great deal of time or that are...
A Novel Architecture of a Parallel Web Crawler
Abstract
Due to the explosion in the size of the WWW [1,4,5], it becomes essential to make the crawling process parallel. In this paper we present an architecture for a parallel crawler that consists of multiple crawling processes, called C-procs, which can run on a network of workstations. The proposed crawler is scalable and resilient against system crashes and other events. The aim of this architecture is to efficiently and effectively crawl the current set of publicly indexable web pages so as to maximize the download rate while minimizing the overhead from parallelization.
Fast GPU Algorithms for Graph Connectivity
Abstract
Graphics processing units provide large computational power at a very low price, which positions them as a ubiquitous accelerator. General-purpose programming on graphics processing units (GPGPU) is best suited for regular data-parallel algorithms. GPUs are not directly amenable to algorithms with irregular data access patterns, such as list ranking and finding the connected components of a graph. In this work, we present a GPU-optimized implementation for finding the connected components of a given graph. Our implementation tries to minimize the impact of irregularity, both at the data level and the functional level. It achieves a speedup of 9 to 12 times over the best sequential CPU implementation. For instance, our implementation finds the connected components of a graph with 10 million nodes and 60 million edges in about 500 milliseconds on a GPU, given a random edge list. We also draw interesting observations on why PRAM algorithms, such as the Shiloach-Vishkin algorithm, may not be a good fit for the GPU and how they should be modified.
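The Shiloach-Vishkin scheme this abstract refers to alternates "hooking" (linking the roots of two trees joined by an edge) with "pointer jumping" (replacing each parent pointer with the grandparent to flatten trees). The sequential sketch below mimics that structure under illustrative names and a simplified hooking rule; it is not the paper's GPU implementation.

```python
def shiloach_vishkin_cc(n, edges):
    """Sequential sketch of the Shiloach-Vishkin idea: repeatedly hook
    trees together along edges, then pointer-jump to flatten the trees,
    until no parent pointer changes."""
    parent = list(range(n))
    changed = True
    while changed:
        changed = False
        # Hooking: if both endpoints' parents are roots, attach the
        # larger root under the smaller one.
        for u, v in edges:
            ru, rv = parent[u], parent[v]
            if parent[ru] == ru and parent[rv] == rv and ru != rv:
                hi, lo = max(ru, rv), min(ru, rv)
                parent[hi] = lo
                changed = True
        # Pointer jumping: parent <- grandparent until every tree is flat.
        for v in range(n):
            while parent[v] != parent[parent[v]]:
                parent[v] = parent[parent[v]]
    return parent

edges = [(0, 1), (1, 2), (3, 4)]
print(shiloach_vishkin_cc(5, edges))  # -> [0, 0, 0, 3, 3]
```

On a PRAM, each hooking pass runs over all edges in parallel and each jumping pass over all vertices; the sequential loops here only mimic that structure. The scattered reads and writes through `parent` in both passes are exactly the irregular accesses the abstract identifies as a poor fit for GPU memory systems.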