Hopscotch hashing
22nd Intl. Symp. on Distributed Computing, 2008
Cited by 24 (2 self)
We present a new class of resizable sequential and concurrent hash map algorithms directed at both uniprocessor and multicore machines. The new hopscotch algorithms are based on a novel hopscotch multi-phased probing and displacement technique that has the flavors of chaining, cuckoo hashing, and linear probing, all put together, yet avoids the limitations and overheads of these former approaches. The resulting algorithms provide tables with very low synchronization overheads and high cache hit ratios. In a series of benchmarks on a state-of-the-art 64-way Niagara II multicore machine, a concurrent version of hopscotch proves to be highly scalable, delivering in some cases 2 or even 3 times the throughput of today’s most efficient concurrent hash algorithm, Lea’s ConcurrentHashMap from java.util.concurrent. Moreover, in tests on both Intel and Sun uniprocessor machines, a sequential version of hopscotch consistently outperforms the most effective sequential hash table algorithms, including cuckoo hashing and bounded linear probing. The most interesting feature of the new class of hopscotch algorithms is that they continue to deliver good performance when the hash table is more than 90% full, increasing their advantage over other algorithms as the table density grows.
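The probing-and-displacement idea in this abstract can be illustrated with a minimal sequential sketch. It is not the authors' implementation: the class name `HopscotchSet`, the parameters `capacity` and `hop_range`, and the choice to return `False` instead of resizing are all assumptions made for illustration. The sketch assumes `capacity >= 2 * hop_range` and keeps every key within `hop_range` slots of its home bucket.

```python
class HopscotchSet:
    """Sequential, fixed-capacity hopscotch-style hash set (illustrative only)."""

    def __init__(self, capacity=16, hop_range=4):
        assert capacity >= 2 * hop_range
        self.capacity = capacity
        self.hop_range = hop_range        # neighborhood size
        self.slots = [None] * capacity    # stored keys
        self.home = [None] * capacity     # home bucket per slot; None means empty

    def _home(self, key):
        return hash(key) % self.capacity

    def contains(self, key):
        h = self._home(key)
        # A key may only live within hop_range slots of its home bucket.
        for d in range(self.hop_range):
            i = (h + d) % self.capacity
            if self.home[i] == h and self.slots[i] == key:
                return True
        return False

    def add(self, key):
        if self.contains(key):
            return True
        h = self._home(key)
        # Phase 1: linear probe for the first empty slot.
        dist = 0
        while dist < self.capacity and self.home[(h + dist) % self.capacity] is not None:
            dist += 1
        if dist == self.capacity:
            return False  # table full; a real table would resize
        # Phase 2: hop the empty slot back until it is inside h's neighborhood.
        while dist >= self.hop_range:
            free = (h + dist) % self.capacity
            for back in range(self.hop_range - 1, 0, -1):
                cand = (free - back) % self.capacity
                # cand's key may move to `free` only if it stays near its own home.
                if (free - self.home[cand]) % self.capacity < self.hop_range:
                    self.slots[free], self.home[free] = self.slots[cand], self.home[cand]
                    self.slots[cand], self.home[cand] = None, None
                    dist -= back
                    break
            else:
                return False  # no displacement possible; a real table would resize
        slot = (h + dist) % self.capacity
        self.slots[slot], self.home[slot] = key, h
        return True
```

Because a lookup touches at most `hop_range` adjacent slots, it stays within a small contiguous region of the table, which is the property the abstract associates with high cache hit ratios.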
On the Representation and Multiplication of Hypersparse Matrices
2008
Cited by 20 (9 self)
Multicore processors are marking the beginning of a new era of computing where massive parallelism is available and necessary. Slightly slower but easy to parallelize kernels are becoming more valuable than sequentially faster kernels that are unscalable when parallelized. In this paper, we focus on the multiplication of sparse matrices (SpGEMM). We first present the issues with existing sparse matrix representations and multiplication algorithms that make them unscalable to thousands of processors. Then, we develop and analyze two new algorithms that overcome these limitations. We consider our algorithms first as the sequential kernel of a scalable parallel sparse matrix multiplication algorithm and second as part of a polyalgorithm for SpGEMM that would execute different kernels depending on the sparsity of the input matrices. Such a sequential kernel requires a new data structure that exploits the hypersparsity of the individual submatrices owned by a single processor after the 2D partitioning. We experimentally evaluate the performance and characteristics of our algorithms and show that they scale significantly better than existing kernels.
Highly Parallel Sparse Matrix-Matrix Multiplication
2010
Cited by 15 (3 self)
Generalized sparse matrix-matrix multiplication is a key primitive for many high performance graph algorithms as well as some linear solvers such as multigrid. We present the first parallel algorithms that achieve increasing speedups for an unbounded number of processors. Our algorithms are based on two-dimensional block distribution of sparse matrices where serial sections use a novel hypersparse kernel for scalability. We give a state-of-the-art MPI implementation of one of our algorithms. Our experiments show scaling up to thousands of processors on a variety of test scenarios.
Octavo: an FPGA-Centric Processor Family
Cited by 8 (1 self)
Overlay processor architectures allow FPGAs to be programmed by non-experts using software, but prior designs have mainly been based on the architecture of their ASIC predecessors. In this paper we develop a new processor architecture that from the beginning accounts for and exploits the predefined widths, depths, maximum operating frequencies, and other discretizations and limits of the underlying FPGA components. The result is Octavo, a ten-pipeline-stage, eight-threaded processor that operates at the block RAM maximum of 550 MHz on a Stratix IV FPGA. Octavo is highly parameterized, allowing us to explore trade-offs in datapath and memory width, memory depth, and number of supported thread contexts.
A Model Counter For Constraints Over Unbounded Strings
Cited by 8 (1 self)
Model counting is the problem of determining the number of solutions that satisfy a given set of constraints. Model counting has numerous applications in the quantitative analyses of program execution time, information flow, combinatorial circuit designs, as well as probabilistic reasoning. We present a new approach to model counting for structured data types, specifically strings in this work. The key ingredient is a new technique that leverages generating functions as a basic primitive for combinatorial counting. Our tool SMC, which embodies this approach, can model count for constraints specified in an expressive string language efficiently and precisely, thereby outperforming previous finite-size analysis tools. SMC is expressive enough to model constraints arising in real-world JavaScript applications and UNIX C utilities. We demonstrate the practical feasibility of performing quantitative analyses arising in security applications, such as determining the comparative strengths of password strength meters and determining the information leakage via side channels.
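As a rough illustration of how generating functions can serve as a counting primitive for strings, the sketch below represents a language by a truncated coefficient list in which entry n is the number of strings of length n. It is not SMC's algorithm; the helper names `gf_add`, `gf_mul`, and `gf_star` and the truncation length `N` are assumptions. The operations are valid only when the union is disjoint and the concatenation and star are unambiguous.

```python
N = 8  # hypothetical truncation: track string lengths 0..N-1


def gf_add(a, b):
    """Disjoint union of two languages: coefficients add."""
    return [x + y for x, y in zip(a, b)]


def gf_mul(a, b):
    """Unambiguous concatenation: truncated polynomial product."""
    out = [0] * N
    for i, x in enumerate(a):
        if x:
            for j in range(N - i):
                out[i + j] += x * b[j]
    return out


def gf_star(a):
    """Unambiguous Kleene star, 1 / (1 - A(z)); requires a[0] == 0."""
    assert a[0] == 0
    out = [0] * N
    out[0] = 1  # the empty string
    for k in range(1, N):
        out[k] = sum(a[i] * out[k - i] for i in range(1, k + 1))
    return out


# One fixed symbol has generating function z.
symbol = [0, 1] + [0] * (N - 2)

# Language (a|bb)*: "a" contributes z, "bb" contributes z^2.
a_or_bb = gf_add(symbol, gf_mul(symbol, symbol))
counts = gf_star(a_or_bb)
print(counts)  # counts[n] = number of strings of length n in (a|bb)*
```

For length 3 the count is 3 (`aaa`, `abb`, `bba`), and in general the coefficients follow the Fibonacci recurrence because the generating function is 1 / (1 - z - z^2).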
Fast Distributed Coloring Algorithms for Triangle-Free Graphs
2013
Cited by 7 (3 self)
Vertex coloring is a central concept in graph theory and an important symmetry-breaking primitive in distributed computing. Whereas degree-∆ graphs may require palettes of ∆+1 colors in the worst case, it is well known that the chromatic number of many natural graph classes can be much smaller. In this paper we give new distributed algorithms to find (∆/k)-colorings in graphs of girth 4 (triangle-free graphs), girth 5, and trees, where k is at most (1/4 − o(1)) ln ∆ in triangle-free graphs and at most (1 − o(1)) ln ∆ in girth-5 graphs and trees, and o(1) is a function of ∆. Specifically, for ∆ sufficiently large we can find such a coloring in O(k + log* n) time. Moreover, for any ∆ we can compute such colorings in roughly logarithmic time for triangle-free and girth-5 graphs, and in O(log ∆ + log_∆ log n) time on trees. As a byproduct, our algorithm shows that the chromatic number of triangle-free graphs is at most (4 + o(1)) ∆/ln ∆, which improves on Jamall’s recent bound of (67 + o(1)) ∆/ln ∆. Also, we show that (∆+1)-coloring for triangle-free graphs can be obtained in sublogarithmic time for any ∆.
Amortized Resource Analysis with Polymorphic Recursion and Partial Big-Step Operational Semantics - Extended Version
Cited by 5 (4 self)
This paper studies the problem of statically determining upper bounds on the resource consumption of first-order functional programs. A previous work approached the problem with an automatic type-based amortized analysis for polynomial resource bounds. The analysis is parametric in the resource and can be instantiated to heap space, stack space, or clock cycles. Experiments with a prototype implementation have shown that programs are analyzed efficiently and that the computed bounds exactly match the measured worst-case resource behavior for many functions. This paper describes the inference algorithm that is used in the implementation of the system. It can deal with resource-polymorphic recursion, which is required in the type derivation of many functions. The computation of the bounds is fully automatic if a maximal degree of the polynomials is given. The soundness of the inference is proved with respect to a novel operational semantics for partial evaluations to show that the inferred bounds hold for terminating as well as non-terminating computations. A corollary is that runtime bounds also establish the termination of programs.
The 2nd Verified Software Competition: Experience Report
Cited by 4 (0 self)
We report on the second verified software competition. It was organized by the three authors over a 48-hour period on November 8–10, 2011. This paper describes the competition, presents the five problems that were proposed to the participants, and gives an overview of the solutions sent by the 29 teams that entered the competition.
An Experimental Study of Sorting and Branch Prediction
Cited by 4 (0 self)
Sorting is one of the most important and well-studied problems in Computer Science. Many good algorithms are known which offer various trade-offs in efficiency, simplicity, memory use, and other factors. However, these algorithms do not take into account features of modern computer architectures that significantly influence performance. Caches and branch predictors are two such features, and while there has been a significant amount of research into the cache performance of general purpose sorting algorithms, there has been little research on their branch prediction properties. In this paper we empirically examine the behaviour of the branches in all the most common sorting algorithms. We also consider the interaction of cache optimization on the predictability of the branches in these algorithms. We find insertion sort to have the fewest branch mispredictions of any comparison-based sorting algorithm, that bubble and shaker sort operate in a fashion which makes their branches highly unpredictable, that the unpredictability of shellsort’s branches improves its caching behaviour, and that several cache optimizations have little effect on mergesort’s branch mispredictions. We find also that optimizations to quicksort – for example the choice of pivot – have a strong influence on the predictability of its branches. We point out a simple way of removing branch instructions from a classic heapsort implementation, and show also that unrolling a loop in a cache-optimized heapsort implementation improves the predictability of its branches. Finally, we note that when sorting random data two-level adaptive branch predictors are usually no better than simpler bimodal predictors. This is despite the fact that two-level adaptive predictors are almost always superior to bimodal predictors in general.
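The kind of measurement this abstract describes can be approximated in a small simulation: record the outcome of a sort's comparison branch, then replay the trace through a bimodal predictor modeled as a single 2-bit saturating counter. This is only a sketch under stated assumptions. The function names, the single-counter model, and the initial counter state are illustrative choices, not the paper's experimental setup.

```python
def insertion_sort_trace(data):
    """Sort a copy of data; also return the outcomes of the inner-loop comparison."""
    a = list(data)
    trace = []
    for i in range(1, len(a)):
        x = a[i]
        j = i - 1
        while j >= 0:
            taken = a[j] > x        # the comparison branch being studied
            trace.append(taken)
            if not taken:
                break
            a[j + 1] = a[j]         # shift the larger element right
            j -= 1
        a[j + 1] = x
    return a, trace


def bimodal_mispredictions(trace, state=2):
    """Count mispredictions of one 2-bit saturating counter (0..3).

    States 2 and 3 predict 'taken'; states 0 and 1 predict 'not taken'.
    The initial state (weakly taken) is an assumption.
    """
    misses = 0
    for taken in trace:
        predicted = state >= 2
        if predicted != taken:
            misses += 1
        state = min(state + 1, 3) if taken else max(state - 1, 0)
    return misses


if __name__ == "__main__":
    import random

    random.seed(0)
    values = [random.random() for _ in range(1000)]
    _, trace = insertion_sort_trace(values)
    print(len(trace), bimodal_mispredictions(trace))
```

A steady trace is almost free for the counter, while a strictly alternating trace mispredicts on every other branch. That gap is why the distribution of comparison outcomes matters as much as the number of comparisons.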