Results 1 - 10 of 34
Cache-Oblivious Algorithms, 1999
Cited by 70 (1 self)
This thesis presents "cache-oblivious" algorithms that use asymptotically optimal amounts of work, and move data asymptotically optimally among multiple levels of cache. An algorithm is cache oblivious if no program variables dependent on hardware configuration parameters, such as cache size and cache-line length, need to be tuned to minimize the number of cache misses. We show that the ordinary algorithms for matrix transposition, matrix multiplication, sorting, and Jacobi-style multipass filtering are not cache optimal. We present algorithms for rectangular matrix transposition, FFT, sorting, and multipass filters, which are asymptotically optimal on computers with multiple levels of caches. For a cache with size $Z$ and cache-line length $L$, where $Z = \Omega(L^2)$, the number of cache misses for an $m \times n$ matrix transpose is $\Theta(1 + mn/L)$. The number of cache misses for either an $n$-point FFT or the sorting of $n$ numbers is $\Theta(1 + (n/L)(1 + \log_Z n))$. The cache complexity of computing n ...
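As a rough illustration of the divide-and-conquer idea behind a cache-oblivious transpose, the sketch below halves the larger dimension of the matrix until the piece is small, then copies it directly. It is a minimal Python sketch under stated assumptions, not the thesis's code: the names `transpose_block` and `cache_oblivious_transpose` and the `base` cutoff are hypothetical. The cutoff only limits recursion overhead and is not tuned to the cache size Z or the line length L.

```python
def transpose_block(a, b, r0, r1, c0, c1, base=4):
    """Write the transpose of a[r0:r1][c0:c1] into b, so that b[j][i] == a[i][j]."""
    rows, cols = r1 - r0, c1 - c0
    if rows <= base and cols <= base:
        # Small piece: copy element by element.
        for i in range(r0, r1):
            for j in range(c0, c1):
                b[j][i] = a[i][j]
    elif rows >= cols:
        # Split the longer dimension (rows) in half and recurse.
        mid = r0 + rows // 2
        transpose_block(a, b, r0, mid, c0, c1, base)
        transpose_block(a, b, mid, r1, c0, c1, base)
    else:
        # Split the longer dimension (columns) in half and recurse.
        mid = c0 + cols // 2
        transpose_block(a, b, r0, r1, c0, mid, base)
        transpose_block(a, b, r0, r1, mid, c1, base)


def cache_oblivious_transpose(a):
    """Return the n x m transpose of an m x n matrix given as a list of rows."""
    m = len(a)
    n = len(a[0]) if m else 0
    b = [[0] * m for _ in range(n)]
    transpose_block(a, b, 0, m, 0, n)
    return b
```

For example, `cache_oblivious_transpose([[1, 2, 3], [4, 5, 6]])` returns `[[1, 4], [2, 5], [3, 6]]`. Python lists do not expose the memory layout that the miss bounds are about, so this only shows the recursion pattern, not measured cache behavior.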
A Comparison of Sequential Delaunay Triangulation Algorithms, 1996
Cited by 47 (0 self)
This paper presents an experimental comparison of a number of different algorithms for computing the Delaunay triangulation. The algorithms examined are: Dwyer’s divide-and-conquer algorithm, Fortune’s sweepline algorithm, several versions of the incremental algorithm (including one by Ohya, Iri, and Murota, a new bucketing-based algorithm described in this paper, and Devillers’s version of a Delaunay-tree-based algorithm that appears in LEDA), an algorithm that incrementally adds a correct Delaunay triangle adjacent to a current triangle in a manner similar to gift-wrapping algorithms for convex hulls, and Barber’s convex-hull-based algorithm. Most of the algorithms examined are designed for good performance on uniformly distributed sites. However, we also test implementations of these algorithms on a number of non-uniform distributions. The experiments go beyond measuring total running time, which tends to be machine-dependent. We also analyze the major high-level primitives that algorithms use and do an experimental analysis of how often implementations of these algorithms perform each operation.
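The abstract does not name the primitives it counts. One primitive commonly used in Delaunay triangulation codes is the in-circle test, which decides whether a fourth site lies inside the circle through three others. The Python sketch below is an illustration under that assumption, with a hypothetical function name, and is not taken from the paper.

```python
def in_circle(a, b, c, d):
    """Sign of the in-circle determinant for 2D points given as (x, y) pairs.

    Assumes a, b, c are in counterclockwise order. The result is
    positive if d lies strictly inside the circle through a, b, c,
    negative if it lies outside, and zero if all four are cocircular.
    """
    # Translate so that d is the origin.
    adx, ady = a[0] - d[0], a[1] - d[1]
    bdx, bdy = b[0] - d[0], b[1] - d[1]
    cdx, cdy = c[0] - d[0], c[1] - d[1]
    # Squared distances from d.
    ad = adx * adx + ady * ady
    bd = bdx * bdx + bdy * bdy
    cd = cdx * cdx + cdy * cdy
    # 3x3 determinant expanded along the first row.
    return (adx * (bdy * cd - bd * cdy)
            - ady * (bdx * cd - bd * cdx)
            + ad * (bdx * cdy - bdy * cdx))
```

With integer coordinates the result is exact. With floating-point input the sign can be wrong for nearly cocircular points, so production codes typically use a more robust evaluation.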
Models of Computation -- Exploring the Power of Computing
Cited by 46 (3 self)
Theoretical computer science treats any computational subject for which a good model can be created. Research on formal models of computation was initiated in the 1930s and 1940s by Turing, Post, Kleene, Church, and others. In the 1950s and 1960s, programming languages, language translators, and operating systems were under development and therefore became both the subject and basis for a great deal of theoretical work. The power of computers of this period was limited by slow processors and small amounts of memory, and thus theories (models, algorithms, and analysis) were developed to explore the efficient use of computers as well as the inherent complexity of problems. The former subject is known today as algorithms and data structures, the latter computational complexity. The focus of theoretical computer scientists in the 1960s on languages is reflected in the first textbook on the subject, Formal Languages and Their Relation to Automata by John Hopcroft and Jeffrey Ullman. This influential book led to the creation of many language-centered theoretical computer science courses; many introductory theory courses today continue to reflect the content of this book and the interests of theoreticians of the 1960s and early 1970s. Although
Modeling Parallel Computers as Memory Hierarchies - In Proc. Programming Models for Massively Parallel Computers, 1993
Cited by 41 (6 self)
A parameterized generic model that captures the features of diverse computer architectures would facilitate the development of portable programs. Specific models appropriate to particular computers are obtained by specifying parameters of the generic model. A generic model should be simple, and for each machine that it is intended to represent, it should have a reasonably accurate specific model. The Parallel Memory Hierarchy (PMH) model of computation uses a single mechanism to model the costs of both interprocessor communication and memory hierarchy traffic. A computer is modeled as a tree of memory modules with processors at the leaves. All data movement takes the form of block transfers between children and their parents. This paper assesses the strengths and weaknesses of the PMH model as a generic model.
1 Introduction
The raw computing power of multiprocessor computers is exploding. The challenge is to create software that can take advantage of this computing power. The diversit...
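To make the tree-of-modules idea in the abstract above concrete, the following Python sketch models a machine as a tree in which data moves only in blocks between a module and its parent. It is a simplified illustration, not the paper's formal definition: the `Module` class, its `block_size` and `transfer_time` parameters, and the `move_cost` function are hypothetical names chosen here, and the cost rule (number of blocks times time per block on each link) is an assumption.

```python
from dataclasses import dataclass, field
from math import ceil
from typing import List, Optional


@dataclass(eq=False)  # identity-based equality; avoids recursing through the tree
class Module:
    name: str
    block_size: int     # items per block on the link to the parent (assumed parameter)
    transfer_time: int  # time per block on the link to the parent (assumed parameter)
    parent: Optional["Module"] = None
    children: List["Module"] = field(default_factory=list)

    def add_child(self, child: "Module") -> "Module":
        child.parent = self
        self.children.append(child)
        return child


def path_to_root(m: Optional[Module]) -> List[Module]:
    """Modules from m up to the root, inclusive."""
    path = []
    while m is not None:
        path.append(m)
        m = m.parent
    return path


def move_cost(src: Module, dst: Module, items: int) -> int:
    """Cost of moving items from src up to the lowest common ancestor, then down to dst."""
    up, down = path_to_root(src), path_to_root(dst)
    on_dst_path = {id(m) for m in down}
    lca = None
    cost = 0
    for m in up:
        if id(m) in on_dst_path:
            lca = m
            break
        # Child-to-parent block transfers on the link above m.
        cost += ceil(items / m.block_size) * m.transfer_time
    if lca is None:
        raise ValueError("src and dst are not in the same tree")
    for m in down:
        if m is lca:
            break
        # Parent-to-child block transfers on the link above m.
        cost += ceil(items / m.block_size) * m.transfer_time
    return cost
```

In this toy model, two processor leaves under the same cache exchange data without touching the link to main memory, while leaves under different caches pay for both the cache links and the memory links. That is the sense in which one mechanism covers both memory-hierarchy traffic and interprocessor communication.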
Can a Shared-Memory Model Serve as a Bridging Model for Parallel Computation?, 1999
Cited by 41 (11 self)
There has been a great deal of interest recently in the development of general-purpose bridging models for parallel computation. Models such as the BSP and LogP have been proposed as more realistic alternatives to the widely used PRAM model. The BSP and LogP models imply a rather different style for designing algorithms when compared with the PRAM model. Indeed, while many consider data parallelism as a convenient style, and the shared-memory abstraction as an easy-to-use platform, the bandwidth limitations of current machines have diverted much attention to message-passing and distributed-memory models (such as the BSP and LogP) that account more properly for these limitations. In this paper we consider the question of whether a shared-memory model can serve as an effective bridging model for parallel computation. In particular, can a shared-memory model be as effective as, say, the BSP? As a candidate for a bridging model, we introduce the Queuing Shared-Memory (QSM) model, which accounts for limited communication bandwidth while still providing a simple shared-memory abstraction. We substantiate the ability of the QSM to serve as a bridging model by providing a simple work-preserving emulation of the QSM on both the BSP and a related model, the (d, x)-BSP. We present evidence that the features of the QSM are essential to its effectiveness as a bridging model. In addition, we describe scenarios
Efficient External-Memory Data Structures and Applications, 1996
Cited by 38 (12 self)
In this thesis we study the Input/Output (I/O) complexity of large-scale problems arising e.g. in the areas of database systems, geographic information systems, VLSI design systems and computer graphics, and design I/O-efficient algorithms for them. A general theme in our work is to design I/O-efficient algorithms through the design of I/O-efficient data structures. One of our philosophies is to try to isolate all the I/O specific parts of an algorithm in the data structures, that is, to try to design I/O algorithms from internal memory algorithms by exchanging the data structures used in internal memory with their external memory counterparts. The results in the thesis include a technique for transforming an internal memory tree data structure into an external data structure which can be used in a batched dynamic setting, that is, a setting where we for example do not require that the result of a search operation is returned immediately. Using this technique we develop batched dynamic external versions of the (one-dimensional) range-tree and the segment-tree and we develop an external priority queue. Following our general philosophy we show how these structures can be used in standard internal memory sorting algorithms
Overcoming the Memory Bottleneck in Suffix Tree Construction
Cited by 30 (2 self)
The suffix tree of a string is the fundamental data structure of string processing. Recent focus on massive data sets has sparked interest in overcoming the memory bottlenecks of known algorithms for building suffix trees. Our main contribution is a new algorithm for suffix tree construction in which we choreograph almost all disk accesses to be via the sort and scan primitives. This algorithm achieves optimal results in a variety of sequential and parallel computational models. Two of our results are: In the traditional external memory model, in which only the number of disk accesses is counted, we achieve an optimal algorithm, both for single and multiple disk cases. This is the first optimal algorithm known for either model. Traditional disk page access counting does not differentiate between random page accesses and block transfers involving several consecutive pages. This difference is routinely exploited by expert programmers to get fast algorithms on real machines. We adopt a simple accounting scheme and show that our algorithm achieves the same optimal tradeoff for block versus random page accesses as the one we establish for sorting.
Large-Scale Sorting in Uniform Memory Hierarchies, 1992
Cited by 24 (5 self)
We present several efficient algorithms for sorting on the uniform memory hierarchy (UMH), introduced by Alpern, Carter, and Feig, and its parallelization P-UMH. We give optimal and nearly-optimal algorithms for a wide range of bandwidth degradations, including a parsimonious algorithm for constant bandwidth. We also develop optimal sorting algorithms for all bandwidths for other versions of UMH and P-UMH, including natural restrictions we introduce called RUMH and P-RUMH, which more closely correspond to current programming languages.
Models and Resource Metrics for Parallel and Distributed Computation - Proc. 28th Annual Hawaii International Conference on System Sciences, 1989
Cited by 19 (3 self)
This paper presents a framework of using resource metrics to characterize the various models of parallel computation. Our framework reflects the approach of recent models to abstract architectural details into several generic parameters, which we call resource metrics. We examine the different resource metrics chosen by different parallel models, categorizing the models into four classes: the basic synchronous models, and extensions of the basic models which more accurately reflect practical machines by incorporating notions of asynchrony, communication cost and memory hierarchy. We then present a new parallel computation model, the LogP-HMM model, as an illustration of design principles based on the framework of resource metrics. The LogP-HMM model extends an existing parameterized network model (LogP) with a sequential hierarchical memory model (HMM) characterizing each processor. The result accurately captures both network communication costs and the effects of multileveled memory ...
Optimal Parallel Sorting in Multi-Level Storage - In Proceedings of the 5th Annual ACM-SIAM Symposium on Discrete Algorithms, 1994
Cited by 17 (0 self)
We adapt the Sharesort algorithm of Cypher and Plaxton to run on various parallel models of multi-level storage, and analyze its resulting performance. Sharesort was originally defined in the context of sorting n records on an n-processor hypercubic network. In that context, it is not known whether Sharesort is asymptotically optimal. Nonetheless, we find that Sharesort achieves optimal time bounds for parallel sorting in multi-level storage, under a variety of models that have been defined in the literature.

