Results 11-20 of 25
An experimental study of a parallel shortest path algorithm for solving large-scale graph instances
 Ninth Workshop on Algorithm Engineering and Experiments (ALENEX 2007)
, 2007
Cited by 11 (3 self)
We present an experimental study of the single source shortest path problem with non-negative edge weights (NSSP) on large-scale graphs using the $\Delta$-stepping parallel algorithm. We report performance results on the Cray MTA-2, a multithreaded parallel computer. The MTA-2 is a high-end shared memory system offering two unique features that aid the efficient parallel implementation of irregular algorithms: the ability to exploit fine-grained parallelism, and low-overhead synchronization primitives. Our implementation exhibits remarkable parallel speedup when compared with competitive sequential algorithms, for low-diameter sparse graphs. For instance, $\Delta$-stepping on a directed scale-free graph of 100 million vertices and 1 billion edges takes less than ten seconds on 40 processors of the MTA-2, with a relative speedup of close to 30. To our knowledge, these are the first performance results of a shortest path problem on realistic graph instances in the order of billions of vertices and edges.
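To make the bucket structure behind $\Delta$-stepping concrete, here is a minimal sequential sketch in Python. It is not the MTA-2 implementation from the paper. The function name `delta_stepping`, the adjacency-dict input format, and the helper `relax` are assumptions made for illustration. Each batch of relaxation requests that a parallel version would process concurrently is handled here in a plain loop.

```python
import math
from collections import defaultdict


def delta_stepping(graph, source, delta):
    """Sequential simulation of Delta-stepping.

    graph: dict mapping vertex -> list of (neighbor, weight), weights >= 0.
    delta: positive bucket width.
    Returns a dict of shortest distances for all reachable vertices.
    """
    dist = {}
    buckets = defaultdict(set)  # bucket i holds vertices with dist in [i*delta, (i+1)*delta)

    def relax(v, d):
        old = dist.get(v, math.inf)
        if d < old:
            if old != math.inf:
                buckets[int(old // delta)].discard(v)
            dist[v] = d
            buckets[int(d // delta)].add(v)

    relax(source, 0)
    while True:
        nonempty = [i for i, b in buckets.items() if b]
        if not nonempty:
            return dist
        i = min(nonempty)
        settled = set()
        # Light-edge phases: vertices may re-enter bucket i, so repeat until it stays empty.
        while buckets[i]:
            frontier, buckets[i] = buckets[i], set()
            settled |= frontier
            requests = [(w, dist[v] + c)
                        for v in frontier
                        for w, c in graph.get(v, ()) if c <= delta]
            for w, d in requests:
                relax(w, d)
        # Heavy edges are relaxed once, after the bucket's distances are final.
        requests = [(w, dist[v] + c)
                    for v in settled
                    for w, c in graph.get(v, ()) if c > delta]
        for w, d in requests:
            relax(w, d)


example = {
    "s": [("a", 1), ("b", 4)],
    "a": [("b", 2), ("c", 6)],
    "b": [("c", 3)],
    "c": [],
}
distances = delta_stepping(example, "s", delta=3)
```

The choice of `delta` trades work against the number of phases: a very small value makes the loop behave like Dijkstra's algorithm with many buckets, while a very large value makes it behave like a label-correcting method with repeated relaxations.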
De-amortized Cuckoo Hashing: Provable Worst-Case Performance and Experimental Results
Cited by 10 (3 self)
Cuckoo hashing is a highly practical dynamic dictionary: it provides amortized constant insertion time, worst case constant deletion time and lookup time, and good memory utilization. However, with a noticeable probability during the insertion of n elements some insertion requires Ω(log n) time. Whereas such an amortized guarantee may be suitable for some applications, in other applications (such as high-performance routing) this is highly undesirable. Kirsch and Mitzenmacher (Allerton ’07) proposed a de-amortization of cuckoo hashing using queueing techniques that preserve its attractive properties. They demonstrated a significant improvement to the worst case performance of cuckoo hashing via experimental results, but left open the problem of constructing a scheme with provable properties. In this work we present a de-amortization of cuckoo hashing that provably guarantees constant worst case operations. Specifically, for any sequence of polynomially many operations, with overwhelming probability over the randomness of the initialization phase, each operation is performed in constant time. In addition, we present a general approach for proving that the performance guarantees are preserved when using hash functions with limited independence.
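For readers unfamiliar with the baseline, the following is a minimal sketch of standard (amortized) cuckoo hashing in Python, not the de-amortized scheme the abstract describes. The class name `CuckooHashTable`, the `max_kicks` limit, and the seeded use of Python's built-in `hash` are all assumptions for illustration. It shows why lookups and deletions probe at most two slots, while a single insertion can trigger a long eviction chain or a full rehash.

```python
import random


class CuckooHashTable:
    """Standard two-table cuckoo hashing (illustrative, not de-amortized)."""

    def __init__(self, capacity=11, max_kicks=32, seed=0):
        self.capacity = capacity
        self.max_kicks = max_kicks
        self.rng = random.Random(seed)
        self._new_seeds()
        self.tables = [[None] * capacity, [None] * capacity]

    def _new_seeds(self):
        self.seeds = (self.rng.getrandbits(64), self.rng.getrandbits(64))

    def _slot(self, which, key):
        return hash((self.seeds[which], key)) % self.capacity

    def lookup(self, key, default=None):
        # Worst-case constant time: at most two probes.
        for t in (0, 1):
            entry = self.tables[t][self._slot(t, key)]
            if entry is not None and entry[0] == key:
                return entry[1]
        return default

    def delete(self, key):
        # Worst-case constant time: at most two probes.
        for t in (0, 1):
            i = self._slot(t, key)
            entry = self.tables[t][i]
            if entry is not None and entry[0] == key:
                self.tables[t][i] = None
                return True
        return False

    def insert(self, key, value):
        for t in (0, 1):  # overwrite an existing key in place
            i = self._slot(t, key)
            entry = self.tables[t][i]
            if entry is not None and entry[0] == key:
                self.tables[t][i] = (key, value)
                return
        entry, t = (key, value), 0
        for _ in range(self.max_kicks):
            i = self._slot(t, entry[0])
            # Place the entry and pick up whatever occupied the slot.
            entry, self.tables[t][i] = self.tables[t][i], entry
            if entry is None:
                return
            t = 1 - t  # the evicted entry tries its slot in the other table
        self._rehash(entry)  # the expensive case that de-amortization targets

    def _rehash(self, pending):
        items = [e for table in self.tables for e in table if e is not None]
        items.append(pending)
        self.capacity = 2 * self.capacity + 1
        self._new_seeds()
        self.tables = [[None] * self.capacity, [None] * self.capacity]
        for k, v in items:
            self.insert(k, v)


table = CuckooHashTable()
for n in range(200):
    table.insert(n, n * n)
```

The eviction loop and `_rehash` are where the occasional slow insertion comes from. A de-amortized variant would bound the work done per operation and defer the remainder, for example through a queue of pending entries as in the Kirsch and Mitzenmacher proposal mentioned above.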
Parallel Shortest Path Algorithms for Solving . . .
, 2006
Cited by 9 (3 self)
We present an experimental study of the single source shortest path problem with non-negative edge weights (NSSP) on large-scale graphs using the ∆-stepping parallel algorithm. We report performance results on the Cray MTA-2, a multithreaded parallel computer. The MTA-2 is a high-end shared memory system offering two unique features that aid the efficient parallel implementation of irregular algorithms: the ability to exploit fine-grained parallelism, and low-overhead synchronization primitives. Our implementation exhibits remarkable parallel speedup when compared with competitive sequential algorithms, for low-diameter sparse graphs. For instance, ∆-stepping on a directed scale-free graph of 100 million vertices and 1 billion edges takes less than ten seconds on 40 processors of the MTA-2, with a relative speedup of close to 30. To our knowledge, these are the first performance results of a shortest path problem on realistic graph instances in the order of billions of vertices and edges.
How to Build an Interference Graph
, 1988
Cited by 7 (4 self)
This paper examines the tradeoffs between these two approaches
On Computing the Subset Graph of a Collection of Sets
, 1995
Cited by 7 (3 self)
Let a given collection of sets have size N measured by the sum of the cardinalities. Yellin and Jutla presented an algorithm which constructed the partial order induced by the subset relation (a "subset graph") in $O(N^2 / \log N)$ operations over a dictionary ADT, and exhibited a collection whose subset graph had $\Theta(N^2 / \log^2 N)$ edges. This paper establishes a matching upper bound on the number of edges in a subset graph, shows that the known bound on Yellin and Jutla's algorithm is tight, presents a simple implementation requiring O(1) bit-parallel operations per ADT operation, and presents a variant of the algorithm with an implementation requiring $O(N^2 / \log N)$ RAM operations. 1 Introduction Yellin and Jutla [9] tackled the following problem. Our interest in it arose from the application studied in [6], but we feel the problem is a fundamental one, likely to arise in many contexts. Given is a collection $F = \{S_1, \ldots, S_k\}$, where each $S_i$ is a set over the same d...
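The subset-graph problem can be illustrated with a short Python sketch that uses arbitrary-precision integers as bit vectors. This is a simple intersection-based approach written for illustration, not the algorithm of Yellin and Jutla or the variants analysed in the paper. The function name `subset_graph` and the edge convention (an edge `(i, j)` means `sets[i]` is a subset of `sets[j]`) are assumptions. Each `&` below acts on k-bit integers, so it is a single machine operation only when k fits in a machine word.

```python
def subset_graph(sets):
    """Return all pairs (i, j), i != j, such that sets[i] is a subset of sets[j]."""
    k = len(sets)
    all_sets = (1 << k) - 1

    # containing[x] has bit j set exactly when element x belongs to sets[j].
    containing = {}
    for j, s in enumerate(sets):
        for x in s:
            containing[x] = containing.get(x, 0) | (1 << j)

    edges = []
    for i, s in enumerate(sets):
        # The supersets of sets[i] are the sets that contain every one of its elements.
        supersets = all_sets
        for x in s:
            supersets &= containing[x]
        supersets &= ~(1 << i)  # drop the trivial self-edge
        j = 0
        while supersets:
            if supersets & 1:
                edges.append((i, j))
            supersets >>= 1
            j += 1
    return edges


family = [{1}, {1, 2}, {1, 2, 3}, {4}]
edges = subset_graph(family)
```

With this convention, equal sets produce edges in both directions and the empty set is a subset of every other set in the collection.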
Fast Liveness Checking for SSA-Form Programs
Cited by 6 (2 self)
Liveness analysis is an important analysis in optimizing compilers. Liveness information is used in several optimizations and is mandatory during the code-generation phase. Two drawbacks of conventional liveness analyses are that their computations are fairly expensive and their results are easily invalidated by program transformations. We present a method to check liveness of variables that overcomes both obstacles. The major advantage of the proposed method is that the analysis result survives all program transformations except for changes in the control-flow graph. For common program sizes our technique is faster and consumes less memory than conventional data-flow approaches. Thereby, we heavily make use of SSA-form properties, which allow us to completely circumvent data-flow equation solving. We evaluate the competitiveness of our approach in an industrial-strength compiler. Our measurements use the integer part of the SPEC2000 benchmarks and investigate the liveness analysis used by the SSA destruction pass. We compare the net time spent in liveness computations of our implementation against the one provided by that compiler. The results show that in the vast majority of cases our algorithm, while providing the same quality of information, needs less time: an average speedup of 16%.
Register Allocation in the Gardens Point Compilers
 In Proceedings ACSC18, Adelaide, Australia. Australian Computer Science Society
, 1994
Cited by 3 (2 self)
Gardens Point compiler backends interface to frontends through an abstract stack machine form. Code generating backends have been constructed for several machine architectures. They produce code which compares favourably with production compilers for the same machines. Here we discuss register allocation algorithms which are based on the theory of graph colouring. These algorithms pose implementation problems which are the subject of ongoing research. We discuss our approach to implementation, and the interaction of these algorithms with the most recent advances in global program analysis. 1 Overview The Gardens Point compiler family first began as a project to make the language Modula-2 available in a consistent implementation on contemporary machines. It has become a flexible platform for research in compiler technology as well as associated areas [1, 2, 3, 4]. Currently compiler frontends exist for Modula-2, Oberon-2, C, and a presently unnamed object-oriented language. A Sather f...
Efficient data flow analysis using DJ-graphs: Elimination methods revisited
, 1995
Cited by 3 (3 self)
In this paper we present a new approach to elimination-based data flow analysis that uses a program representation called the DJ graph. The skeleton of the DJ graph of a program is the dominator tree of its flowgraph (whose edges are called D edges in this paper), and the tree skeleton is augmented with join edges (called J edges in this paper). Unlike the previous elimination methods, which first reduce a flowgraph to a single node, our approach only eliminates J edges from the DJ graph in a bottom-up fashion during the reduction process, while maintaining the dominator tree structure (which may be compressed). We propose two methods for eliminating variables: (1) eager elimination method, and (2) delayed elimination method. With eager elimination, we first perform variable elimination on the DJ graph in a bottom-up manner. Once we determine the solution for the root node, we propagate this information in a top-down fashion on the dominator tree and determine the corresponding solutio...
Iterative dataflow analysis, revisited
, 2004
Cited by 3 (0 self)
The iterative algorithm is widely used to solve instances of data-flow analysis problems. The algorithm is attractive because it is easy to implement and robust in its behavior. The theory behind the algorithm shows that, for a broad class of problems, it terminates and produces correct results. The theory also establishes a set of conditions where the algorithm runs in at most d(G) + 3 passes over the graph: a round-robin algorithm, running a "rapid" framework, on a reducible graph [22]. Fortunately, these restrictions encompass many practical analyses used in code optimization. In practice, compilers encounter situations that lie outside this carefully described region. Compilers encounter irreducible graphs probably more often than the early studies suggest. They use variations of the algorithm other than the round-robin form. They run on problems that are not rapid.
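The round-robin form of the iterative algorithm can be sketched in a few lines of Python, using live-variable analysis as the data-flow problem. The function name `live_variables`, the dictionary-based control-flow graph, and the four-block example program are assumptions for illustration and do not come from the paper. The solver sweeps the blocks in a fixed order and stops after the first pass that changes nothing, counting passes along the way.

```python
def live_variables(succ, use, defs, order):
    """Round-robin iterative solver for live variables (a backward problem).

    succ:  dict block -> list of successor blocks
    use:   dict block -> variables read before any write in the block
    defs:  dict block -> variables written in the block
    order: fixed visiting order for every pass (postorder suits backward problems)
    Returns (live_in, live_out, passes).
    """
    live_in = {n: set() for n in succ}
    live_out = {n: set() for n in succ}
    passes = 0
    changed = True
    while changed:
        changed = False
        passes += 1
        for n in order:
            out = set().union(*(live_in[s] for s in succ[n]))
            inn = use[n] | (out - defs[n])
            if out != live_out[n] or inn != live_in[n]:
                live_out[n], live_in[n] = out, inn
                changed = True
    return live_in, live_out, passes


# Hypothetical program:
#   entry: i = 0; s = 0
#   loop:  if i < n goto body else goto exit
#   body:  s = s + i; i = i + 1; goto loop
#   exit:  return s
succ = {"entry": ["loop"], "loop": ["body", "exit"], "body": ["loop"], "exit": []}
use = {"entry": set(), "loop": {"i", "n"}, "body": {"s", "i"}, "exit": {"s"}}
defs = {"entry": {"i", "s"}, "loop": set(), "body": {"s", "i"}, "exit": set()}

live_in, live_out, passes = live_variables(
    succ, use, defs, order=["exit", "body", "loop", "entry"]
)
```

On this small reducible graph the solver needs two passes to reach the fixed point and a third to confirm that nothing changes.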
Fully Persistent Graphs - Which One To Choose?
 9th Int. Workshop on Implementation of Functional Languages. LNCS 1467
, 1997
Cited by 3 (2 self)
Functional programs, by nature, operate on functional, or persistent, data structures. Therefore, persistent graphs are a prerequisite to express functional graph algorithms. In this paper we describe two implementations of persistent graphs and compare their running times on different graph problems. Both data structures essentially represent graphs as adjacency lists. The first uses the version tree implementation of functional arrays to make adjacency lists persistent. An array cache of the newest graph version together with a time stamping technique for speeding up deletions makes it asymptotically optimal for a class of graph algorithms that use graphs in a single-threaded way. The second approach uses balanced search trees to store adjacency lists. For both structures we also consider several variations, for example, ignoring edge labels or predecessor information. 1 Introduction A data structure is called persistent if it is possible to access old versions after updates. It ...