GraphChi: Large-Scale Graph Computation on Just a PC
In Proceedings of the 10th USENIX Conference on Operating Systems Design and Implementation (OSDI ’12), 2012
Cited by 109 (6 self)
Current systems for graph computation require a distributed computing cluster to handle very large real-world problems, such as analysis on social networks or the web graph. While distributed computational resources have become more accessible, developing distributed graph algorithms still remains challenging, especially to non-experts. In this work, we present GraphChi, a disk-based system for computing efficiently on graphs with billions of edges. By using a well-known method to break large graphs into small parts, and a novel parallel sliding windows method, GraphChi is able to execute several advanced data mining, graph mining, and machine learning algorithms on very large graphs, using just a single consumer-level computer. We further extend GraphChi to support graphs that evolve over time, and demonstrate that, on a single computer, GraphChi can process over one hundred thousand graph updates per second, while simultaneously performing computation. We show, through experiments and theoretical analysis, that GraphChi performs well on both SSDs and rotational hard drives. By repeating experiments reported for existing distributed systems, we show that, with only a fraction of the resources, GraphChi can solve the same problems in very reasonable time. Our work makes large-scale graph computation available to anyone with a modern PC.
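As a rough illustration of the sharding idea behind the parallel sliding windows method, here is a toy in-memory sketch. All names are ours, and the real system streams shards from disk rather than holding them in Python lists:

```python
# Toy sketch of GraphChi-style sharding (illustrative names, in-memory only).

def make_shards(edges, intervals):
    """Shard s holds the in-edges of vertex interval s, sorted by source."""
    shards = []
    for lo, hi in intervals:
        shard = sorted((u, v) for (u, v) in edges if lo <= v < hi)
        shards.append(shard)
    return shards

def psw_pass(shards, intervals, update):
    """One pass: for interval i, its own shard supplies all of its in-edges;
    because every shard is sorted by source, a contiguous "sliding window"
    of each shard supplies the interval's out-edges."""
    for i, (lo, hi) in enumerate(intervals):
        in_edges = list(shards[i])
        out_edges = [(u, v) for shard in shards
                     for (u, v) in shard if lo <= u < hi]
        update(i, in_edges, out_edges)
```

On disk, the window of each shard that holds sources in `[lo, hi)` is a contiguous byte range, which is what lets one pass over an interval use only sequential reads.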
Better external memory suffix array construction
In Workshop on Algorithm Engineering & Experiments (ALENEX), 2005
Cited by 40 (4 self)
Suffix arrays are a simple and powerful data structure for text processing that can be used for full-text indexes, data compression, and many other applications, in particular in bioinformatics. However, so far it has looked prohibitive to build suffix arrays for huge inputs that do not fit into main memory. This paper presents the design, analysis, implementation, and experimental evaluation of several new and improved algorithms for suffix array construction. The algorithms are asymptotically optimal in the worst case or on average. Our implementation can construct suffix arrays for inputs of up to 4 GBytes in hours on a low-cost machine. As a tool of possible independent interest, we present a systematic way to design, analyze, and implement pipelined algorithms.
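For contrast with the external-memory algorithms the paper develops, the structure being built can be stated in a few lines. This naive internal-memory construction (our illustration, not the paper's algorithm) simply sorts the suffix start positions:

```python
def suffix_array(text):
    """Naive suffix array construction: sort suffix start positions by
    the suffix text itself. O(n^2 log n) worst case and memory-hungry --
    a baseline definition, nothing like an external-memory algorithm."""
    return sorted(range(len(text)), key=lambda i: text[i:])
```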
MCSTL: The Multi-Core Standard Template Library
Cited by 32 (11 self)
Future gains in computing performance will not stem from increased clock rates, but from even more cores in a processor. Since automatic parallelization is still limited to easily parallelizable sections of the code, most applications will soon have to support parallelism explicitly. The Multi-Core Standard Template Library (MCSTL) simplifies parallelization by providing efficient parallel implementations of the algorithms in the C++ Standard Template Library. Thus, simple recompilation will provide partial parallelization of applications that make consistent use of the STL. We present performance measurements on several architectures. For example, our sorter achieves a speedup of 21 on an 8-core, 32-thread Sun T1.
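A parallel sort of the kind MCSTL provides typically splits the input, sorts the chunks concurrently, and k-way merges the sorted runs. The following sketch only mirrors that split/merge structure (Python threads will not actually speed up CPU-bound sorting, so this is purely illustrative):

```python
import heapq
from concurrent.futures import ThreadPoolExecutor

def parallel_sort(data, workers=4):
    """Split the input into chunks, sort chunks concurrently, then do a
    k-way merge of the sorted runs -- the same overall structure a
    parallel STL-style merge sort uses. Illustrative only: the GIL keeps
    Python threads from giving a real speedup here."""
    n = max(1, len(data) // workers)
    chunks = [data[i:i + n] for i in range(0, len(data), n)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        runs = list(pool.map(sorted, chunks))       # sort chunks in parallel
    return list(heapq.merge(*runs))                 # lazy k-way merge
```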
A computational study of external-memory BFS algorithms
In SODA, 2006
Cited by 26 (4 self)
Breadth-First Search (BFS) traversal is an archetype for many important graph problems. However, computing a BFS level decomposition for massive graphs was so far considered non-viable because of the large number of I/Os it incurs. This paper presents the first experimental evaluation of recent external-memory BFS algorithms for general graphs. With our STXXL-based implementations exploiting pipelining and disk parallelism, we were able to compute the BFS level decomposition of a web-crawl-based graph of around 130 million nodes and 1.4 billion edges in less than 4 hours using a single disk and 2.3 hours using 4 disks. We demonstrate that some rather simple external-memory algorithms perform significantly better (minutes as compared to hours) than internal-memory BFS, even if more than half of the input resides internally.
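The level decomposition being computed is the standard BFS layering; a minimal internal-memory version looks like the following (the paper's contribution is computing exactly this I/O-efficiently when the graph does not fit in RAM):

```python
from collections import deque

def bfs_levels(adj, source):
    """BFS level decomposition: level[v] = hop distance of v from source.
    adj maps each vertex to an iterable of its neighbors."""
    level = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in level:          # first visit fixes v's level
                level[v] = level[u] + 1
                queue.append(v)
    return level
```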
The Cache-Oblivious Gaussian Elimination Paradigm: Theoretical Framework, Parallelization and Experimental Evaluation
2009
Cited by 24 (5 self)
We consider triply-nested loops of the type that occur in the standard Gaussian elimination algorithm, which we denote by GEP (the Gaussian Elimination Paradigm). We present two related cache-oblivious methods, I-GEP and C-GEP, both of which reduce the number of I/Os performed by the computation over that performed by standard GEP by a factor of √M, where M is the size of the cache. Cache-oblivious I-GEP computes in place and solves most of the known applications of GEP, including Gaussian elimination and LU decomposition without pivoting, and Floyd-Warshall all-pairs shortest paths. Cache-oblivious C-GEP uses a modest amount of additional space, but is completely general and applies to any code in GEP form. Both I-GEP and C-GEP produce system-independent cache-efficient code, and are potentially usable by optimizing compilers for loop transformation. We present parallel I-GEP and C-GEP that achieve good speedup and cache-obliviously match the sequential caching performance for both shared and distributed caches on sufficiently large inputs. We present extensive experimental results for both in-core and out-of-core performance of our algorithms. We consider both sequential and parallel implementations, and compare them with finely-tuned cache-aware BLAS code for matrix multiplication and Gaussian elimination without pivoting. Our results indicate that cache-oblivious GEP offers an attractive trade-off between efficiency and portability.
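The GEP form referred to above is the triply-nested k-i-j loop in which each entry is updated from itself and the k-th row and column. A minimal sketch of that loop structure, with Floyd-Warshall as one instance (our illustration of the paradigm, not the cache-oblivious transformation):

```python
def gep(x, f, n):
    """The triply-nested GEP loop: for each k, i, j, update x[i][j] using
    x[i][j], x[i][k], x[k][j], and x[k][k] via the update function f."""
    for k in range(n):
        for i in range(n):
            for j in range(n):
                x[i][j] = f(x[i][j], x[i][k], x[k][j], x[k][k])
    return x

def floyd_warshall(dist):
    """Floyd-Warshall all-pairs shortest paths, expressed as a GEP
    instance: relax each pair (i, j) through intermediate vertex k."""
    return gep(dist, lambda xij, xik, xkj, xkk: min(xij, xik + xkj),
               len(dist))
```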
Revisiting Resistance Speeds Up I/O-Efficient LTL Model Checking
, 2008
Cited by 11 (1 self)
Revisiting-resistant graph algorithms are those that can tolerate re-exploration of edges without yielding incorrect results. Revisiting-resistant I/O-efficient graph algorithms exhibit considerable speedup in practice in comparison to non-revisiting-resistant algorithms. In this paper we present a new revisiting-resistant I/O-efficient LTL model checking algorithm. We analyze its theoretical I/O complexity and experimentally compare its performance to existing I/O-efficient LTL model checking algorithms.
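The property itself is easy to illustrate: when the per-edge update is monotone, re-exploring an edge any number of times cannot corrupt the result. A minimal sketch using plain reachability (our illustration of revisiting resistance, not the paper's LTL algorithm):

```python
def reachable(edges, source):
    """Revisiting-tolerant reachability by fixpoint iteration. The edge
    list may be scanned, and any edge re-processed, arbitrarily often:
    since the marking only ever grows (monotone update), re-exploration
    changes nothing but the amount of work."""
    marked = {source}
    changed = True
    while changed:
        changed = False
        for u, v in edges:                      # edges revisited each round
            if u in marked and v not in marked:
                marked.add(v)                   # monotone: set only grows
                changed = True
    return marked
```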
Inducing suffix and LCP arrays in external memory
In Proc. ALENEX, 2013
Cited by 8 (0 self)
We consider full-text index construction in external memory (EM). Our first contribution is an inducing algorithm for suffix arrays in external memory, which utilizes an efficient EM priority queue and runs in sorting complexity. Practical tests show that this algorithm outperforms the previous best EM suffix sorter [Dementiev et al., JEA 2008] by a factor of about two in time and I/O volume. Our second contribution is to augment the first algorithm to also construct the array of longest common prefixes (LCPs). This yields the first EM construction algorithm for LCP arrays. The overhead in time and I/O volume of this extended algorithm over plain suffix array construction is roughly a factor of two. Our algorithms scale far beyond problem sizes previously considered in the literature (a text size of 80 GiB using only 4 GiB of RAM in our experiments).
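The LCP array being constructed is the standard one: `lcp[i]` is the length of the longest common prefix of the suffixes at `sa[i-1]` and `sa[i]`. For definition's sake, here is Kasai's classic linear-time internal-memory construction (the paper's contribution is an external-memory variant, which this is not):

```python
def lcp_array(text, sa):
    """Kasai's algorithm: compute the LCP array in O(n) from the text and
    its suffix array, walking suffixes in text order and reusing h-1
    matched characters from the previous step. lcp[0] = 0 by convention."""
    n = len(text)
    rank = [0] * n
    for i, s in enumerate(sa):
        rank[s] = i                     # position of suffix s in sa
    lcp = [0] * n
    h = 0
    for i in range(n):                  # suffixes in text order
        if rank[i] > 0:
            j = sa[rank[i] - 1]         # predecessor of suffix i in sa
            while i + h < n and j + h < n and text[i + h] == text[j + h]:
                h += 1
            lcp[rank[i]] = h
            if h:
                h -= 1                  # next suffix shares >= h-1 chars
        else:
            h = 0
    return lcp
```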
Computing Visibility on Terrains in External Memory
Cited by 7 (1 self)
We describe a novel application of the distribution sweeping technique to computing visibility on terrains. Given an arbitrary viewpoint v, the basic problem we address is computing the visibility map or viewshed of v, which is the set of points in the terrain that are visible from v. We give the first I/O-efficient algorithm to compute the viewshed of v on a grid terrain in external memory. Our algorithm is based on Van Kreveld’s O(n lg n) time algorithm for the same problem in internal memory. It uses O(sort(n)) I/Os, where sort(n) is the complexity of sorting n items of data in the I/O model. We present an implementation and experimental evaluation of the algorithm. Our implementation clearly outperforms the previous (in-memory) algorithms and can compute visibility for terrains of up to 4 GB in a few hours on a low-cost machine.
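The per-point visibility test underlying such viewshed algorithms is a running-maximum slope comparison along the line of sight. A 1-D simplification on an elevation profile (our sketch, counting grazing ties as visible; the actual algorithm applies this test along rays sweeping a 2-D grid, I/O-efficiently):

```python
def visible_on_profile(elev, vp):
    """1-D viewshed sketch: a point on an elevation profile is visible
    from index vp iff the slope from vp to it is at least the maximum
    slope seen so far along the ray (ties count as visible)."""
    vis = [False] * len(elev)
    vis[vp] = True
    for step in (1, -1):                  # walk right, then left, from vp
        max_slope = float("-inf")
        i = vp + step
        while 0 <= i < len(elev):
            slope = (elev[i] - elev[vp]) / abs(i - vp)
            if slope >= max_slope:        # nothing closer blocks this point
                vis[i] = True
                max_slope = slope
            i += step
    return vis
```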
The GNU libstdc++ parallel mode: Software Engineering Considerations
Cited by 5 (3 self)
The C++ Standard Library implementation provided with the free GNU C++ compiler, libstdc++, provides a “parallel mode” as of version 4.3. Using this mode enables existing serial code to take advantage of many parallelized STL algorithms, an approach to making use of multi-core processors, which are now or will soon be ubiquitous. This paper describes the software engineering issues discovered during implementation and the results of user testing, and presents possible solutions to outstanding issues. We address design issues in configuring the software environment for a wide variety of multi-core hardware options, influencing algorithm and parameter choices at compile and run time, standards compliance, and the interplay between execution speed, executable size, library code size, and compilation time.
On computational models for flash memory devices
In Experimental Algorithms, 2009
Cited by 4 (1 self)
Flash memory-based solid-state disks are fast becoming the dominant form of end-user storage devices, partly even replacing the traditional hard disks. Existing two-level memory hierarchy models fail to realize the full potential of flash-based storage devices. We propose two new computation models, the general flash model and the unit-cost model, for memory hierarchies involving these devices. Our models are simple enough for meaningful algorithm design and analysis. In particular, we show that a broad range of existing external-memory algorithms and data structures based on the merging paradigm can be adapted efficiently to the unit-cost model. Our experiments show that the theoretical analysis of algorithms in our models corresponds to the empirical behavior of algorithms when using solid-state disks as external memory.
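The merging paradigm mentioned above can be made concrete by charging one unit of cost per block transfer. A toy sketch of a k-way run merge with block-granularity I/O accounting (class and parameter names are ours, purely illustrative of the cost-model idea):

```python
import heapq

class BlockReader:
    """Reads a sorted run in blocks of B items, counting one unit of I/O
    per block read -- a toy version of charging unit cost per transfer."""
    def __init__(self, run, B, counter):
        self.run, self.B, self.counter, self.pos = run, B, counter, 0

    def blocks(self):
        while self.pos < len(self.run):
            self.counter[0] += 1                      # one I/O per block
            yield from self.run[self.pos:self.pos + self.B]
            self.pos += self.B

def merge_runs(runs, B=2):
    """k-way merge of sorted runs with block-wise reads (the merging
    paradigm), returning the merged sequence and the total block I/Os."""
    ios = [0]
    merged = list(heapq.merge(*(BlockReader(r, B, ios).blocks()
                                for r in runs)))
    return merged, ios[0]
```

With runs of length n and block size B, the merge charges ⌈n/B⌉ reads per run, which is the kind of accounting a unit-cost analysis makes explicit.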