Best-Effort Cache Synchronization with Source Cooperation
In SIGMOD, 2002
Cited by 65 (3 self)
In environments where exact synchronization between source data objects and cached copies is not achievable due to bandwidth or other resource constraints, stale (out-of-date) copies are permitted. It is desirable to minimize the overall divergence between source objects and cached copies by selectively refreshing modified objects. We call the online process of selecting which objects to refresh in order to minimize divergence best-effort synchronization. In most approaches to best-effort synchronization, the cache coordinates the process and selects objects to refresh. In this paper, we propose a best-effort synchronization scheduling policy that exploits cooperation between data sources and the cache. We also propose an implementation of our policy that incurs low communication overhead even in environments with very large numbers of sources. Our algorithm is adaptive to wide fluctuations in available resources and data update rates. Through experimental simulation over synthetic and real-world data, we demonstrate the effectiveness of our algorithm, and we quantify the significant decrease in divergence achievable with source cooperation.
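The selection step described in this abstract can be illustrated with a small sketch. The abstract does not give the paper's actual scheduling policy, so everything here is an assumption made for illustration: divergence is taken to be the absolute difference between numeric values, the policy is a plain greedy "largest divergence first" rule, and the names `SourceObject`, `refresh_round`, and `total_divergence` are hypothetical.

```python
import heapq
from dataclasses import dataclass


@dataclass
class SourceObject:
    """One data object with its value at the source and in the cache."""
    name: str
    source_value: float
    cached_value: float

    def divergence(self) -> float:
        # Assumed divergence metric: absolute numeric difference.
        return abs(self.source_value - self.cached_value)


def refresh_round(objects, budget):
    """Refresh at most `budget` stale objects, largest divergence first.

    Returns the names of the refreshed objects in the order chosen.
    This greedy rule is an illustrative assumption, not the paper's policy.
    """
    stale = [o for o in objects if o.divergence() > 0]
    chosen = heapq.nlargest(budget, stale, key=lambda o: o.divergence())
    for obj in chosen:
        obj.cached_value = obj.source_value  # simulate the refresh
    return [obj.name for obj in chosen]


def total_divergence(objects):
    """Sum of divergence over all objects (the quantity to keep small)."""
    return sum(o.divergence() for o in objects)


if __name__ == "__main__":
    objs = [
        SourceObject("a", 10.0, 4.0),
        SourceObject("b", 5.0, 5.0),
        SourceObject("c", 3.0, 1.0),
        SourceObject("d", 8.0, 7.5),
    ]
    print(refresh_round(objs, budget=2), total_divergence(objs))
```

The `budget` parameter stands in for the bandwidth constraint: each round can afford only a fixed number of refreshes, so the remaining objects stay stale until a later round.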
A Parallelization of Dijkstra's Shortest Path Algorithm
In Proc. 23rd MFCS'98, Lecture Notes in Computer Science, 1998
Cited by 27 (6 self)
The single source shortest path (SSSP) problem lacks parallel solutions which are fast and simultaneously work-efficient. We propose simple criteria which divide Dijkstra's sequential SSSP algorithm into a number of phases, such that the operations within a phase can be done in parallel. We give a PRAM algorithm based on these criteria and analyze its performance on random digraphs with random edge weights uniformly distributed in [0, 1]. We use
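To make the idea of phases concrete, here is a minimal sequential simulation. The abstract does not state the criteria themselves, so the rule used below is an assumption: a queued node is settled in the current phase if its tentative distance is at most the smallest value of (tentative distance + cheapest outgoing edge) over all queued nodes. With non-negative weights, such nodes already hold their final distance, so they can be removed together. The function name `phased_sssp` and the graph encoding are hypothetical.

```python
def phased_sssp(graph, source):
    """Dijkstra-style SSSP that settles a batch of nodes per phase.

    graph: dict mapping every node to a list of (neighbor, weight) pairs,
           with all weights >= 0.
    Returns (dist, phases): final distances and the number of phases used.
    """
    inf = float("inf")
    dist = {v: inf for v in graph}
    dist[source] = 0.0
    # Cheapest outgoing edge per node (inf for nodes without out-edges).
    min_out = {v: min((w for _, w in graph[v]), default=inf) for v in graph}

    queued = {source}
    settled = set()
    phases = 0
    while queued:
        # Assumed removal criterion: no path leaving a queued node can
        # produce a distance below this threshold.
        threshold = min(dist[u] + min_out[u] for u in queued)
        batch = [v for v in queued if dist[v] <= threshold]

        # Settling the batch and relaxing its edges are the steps that
        # could run in parallel; here they run one after another.
        for v in batch:
            queued.discard(v)
            settled.add(v)
        for v in batch:
            for u, w in graph[v]:
                if u not in settled and dist[v] + w < dist[u]:
                    dist[u] = dist[v] + w
                    queued.add(u)
        phases += 1
    return dist, phases


if __name__ == "__main__":
    g = {
        "s": [("a", 1.0), ("b", 1.0)],
        "a": [("c", 1.0)],
        "b": [("c", 1.0)],
        "c": [],
    }
    print(phased_sssp(g, "s"))
```

In the small example graph, nodes `a` and `b` satisfy the criterion at the same time, so four nodes are settled in three phases instead of four sequential delete-min steps.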
Parallelizing NP-Complete Problems Using Tree Shaped Computations
1999
Cited by 1 (0 self)
We explain how the parallelization aspects of a large class of applications can be modeled as tree shaped computations. This model is particularly suited for NP-complete problems. One reason for this is that any computation on a nondeterministic machine can be emulated on a deterministic machine using a tree shaped computation. We then proceed to a particular example, the knapsack problem. It turns out that a parallel depth-first branch-and-bound algorithm based on tree shaped computations yields superlinear average speedup using 1024 processors.
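A sequential sketch of depth-first branch-and-bound for the 0/1 knapsack problem shows the tree shape the abstract refers to. This is not the paper's parallel algorithm: the bounding function (the standard fractional relaxation), the item ordering, and the name `knapsack_branch_and_bound` are assumptions chosen for illustration. Each recursive call is the root of an independent subtree, which is the unit a parallel scheme could hand to another processor.

```python
def knapsack_branch_and_bound(values, weights, capacity):
    """Best total value for the 0/1 knapsack, found by depth-first search.

    Assumes all weights are positive. Subtrees whose optimistic bound
    cannot beat the best solution found so far are pruned.
    """
    # Consider items in order of decreasing value density.
    items = sorted(zip(values, weights),
                   key=lambda vw: vw[0] / vw[1], reverse=True)
    n = len(items)
    best = 0

    def bound(i, value, room):
        # Optimistic estimate: fill the remaining room greedily and allow
        # a fraction of the first item that no longer fits.
        for v, w in items[i:]:
            if w <= room:
                value += v
                room -= w
            else:
                return value + v * room / w
        return value

    def dfs(i, value, room):
        nonlocal best
        if value > best:
            best = value
        if i == n or bound(i, value, room) <= best:
            return  # leaf reached, or subtree cannot improve on best
        v, w = items[i]
        if w <= room:
            dfs(i + 1, value + v, room - w)  # branch: take item i
        dfs(i + 1, value, room)              # branch: skip item i

    dfs(0, 0, capacity)
    return best


if __name__ == "__main__":
    print(knapsack_branch_and_bound([60, 100, 120], [10, 20, 30], 50))
```

Because pruning depends on the best value found so far, a parallel search that finds a good solution early in one subtree can prune work in others, which is one common explanation for superlinear speedups in this kind of search.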
A Survey on Parallel Algorithms for Priority Queue Operations
The Parallel Priority Queue (PPQ) data structure supports parallel operations for manipulating data items with keys, such as inserting n new items, deleting the n items with the smallest keys, creating a new PPQ that contains a given set of items, and melding two PPQs into one. In this article, we present some recent research on PPQs that support simultaneous operations on the k smallest elements, k being a constant.
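The operations listed in this abstract can be pinned down with a sequential reference model. The sketch below only fixes the interface and the expected results; it performs no parallel work, and the class and method names (`BatchPriorityQueue`, `insert_many`, `delete_min_many`, `meld`) are hypothetical, built on Python's standard `heapq` module.

```python
import heapq


class BatchPriorityQueue:
    """Sequential reference model of the batch operations of a PPQ."""

    def __init__(self, items=()):
        # "Create a new PPQ that contains a set of items."
        self._heap = list(items)
        heapq.heapify(self._heap)

    def __len__(self):
        return len(self._heap)

    def insert_many(self, items):
        """Insert a batch of new items."""
        for item in items:
            heapq.heappush(self._heap, item)

    def delete_min_many(self, n):
        """Remove and return up to n items with the smallest keys, ascending."""
        count = min(n, len(self._heap))
        return [heapq.heappop(self._heap) for _ in range(count)]

    def meld(self, other):
        """Return a new queue holding the items of both queues."""
        return BatchPriorityQueue(self._heap + other._heap)


if __name__ == "__main__":
    q = BatchPriorityQueue([5, 1, 9])
    q.insert_many([3, 7])
    print(q.delete_min_many(2))
    merged = q.meld(BatchPriorityQueue([2, 8]))
    print(merged.delete_min_many(3))
```

A parallel implementation would aim to return the same results while performing each batch operation with many processors at once; a model like this one is useful mainly as a correctness oracle in tests.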
Priority Queues and Sorting Methods for Parallel Simulation
2000
We examine the design, implementation, and experimental analysis of parallel priority queues for device and network simulation. We consider: a) distributed splay trees using MPI, b) concurrent heaps using shared memory atomic locks, and c) a new, more general concurrent data structure based on distributed sorted lists, which is designed to provide dynamically balanced work allocation (with automatic or manual control) and efficient use of shared memory resources. We evaluate performance for all three data structures on a Cray T3E-900 system at KFA Jülich. Our comparisons are based on simulations of single buffers and a 64 × 64 packet switch which supports multicasting. In all implementations, PEs monitor traffic at their preassigned input/output ports, while priority queue elements are distributed across the Cray T3E virtual shared memory. Our experiments with up to 60,000 packets and 2 to 64 PEs indicate that concurrent priority queues perform much better than distributed ones. Both concurrent implementations have comparable performance, while our new data structure uses less memory and has been further optimized. We also consider parallel simulation for symmetric networks by sorting integer conflict functions and implementing an interesting packet indexing scheme.
Invasive Computing—An Overview
A novel paradigm for designing and programming future parallel computing systems called invasive computing is proposed. The main idea and novelty of invasive computing is to introduce resource-aware programming support in the sense that a given program gets the ability to explore and dynamically spread its computations to neighbour processors in a phase called invasion, then to execute portions of code of high parallelism degree in parallel based on the available invasible region on a given multiprocessor architecture. Afterwards, once the program terminates or if the degree of parallelism should be lower again, the program may enter a retreat phase, deallocate resources and resume execution again, for example, sequentially on a single processor. In order to support this idea of self-adaptive and resource-aware programming, not only new programming concepts, languages, compilers and operating systems are necessary but also revolutionary architectural changes in the design of MPSoCs (Multi-Processor Systems-on-a-Chip) must be provided so as to efficiently support invasion, infection and retreat operations involving concepts for dynamic processor, interconnect and memory reconfiguration. This
Parallel and . . . Probabilistic Reasoning
2012
Scalable probabilistic reasoning is the key to unlocking the full potential of the age of big data. From untangling the biological processes that govern cancer to effectively targeting products and advertisements, probabilistic reasoning is how we make sense of noisy data and turn information into understanding and action. Unfortunately, the algorithms and tools for sophisticated structured probabilistic reasoning were developed for the sequential Von Neumann architecture and have therefore been unable to scale with big data. In this thesis we propose a simple set of design principles to guide the development of new parallel and distributed algorithms and systems for scalable probabilistic reasoning. We then apply these design principles to develop a series of new algorithms for inference in probabilistic graphical models and derive theoretical tools to characterize the parallel properties of statistical inference. We implement and assess the efficiency and scalability of the new inference algorithms in the multicore and distributed settings, demonstrating the substantial gains from applying the thesis methodology to real-world probabilistic reasoning. Based on the lessons learned in statistical inference we introduce the GraphLab parallel abstraction which generalizes the thesis methodology and enables the rapid development of
and
This paper addresses the problem of designing scalable concurrent priority queues for large scale multiprocessors – machines with up to several hundred processors. Priority queues are fundamental in the design of modern multiprocessor algorithms, with many classical applications ranging from numerical algorithms through discrete event simulation and expert systems. While highly scalable approaches have been introduced for the special case of queues with a fixed set of priorities, the most efficient designs for the general case are based on the parallelization of the heap data structure. Though numerous intricate heap-based schemes have been suggested in the literature, their scalability seems to be limited to small machines in the range of ten to twenty processors. This paper proposes an alternative approach: to base the design of concurrent priority queues on the probabilistic skiplist data structure, rather than on a heap. To this end, we show that a concurrent skiplist structure, following a simple set of modifications, provides a concurrent priority queue with a higher level of parallelism and significantly less contention than the fastest known heap-based algorithms. Our initial empirical evidence, collected on a simulated 256 node shared memory multiprocessor architecture similar to the MIT Alewife, suggests that the new skiplist-based priority queue algorithm scales significantly better than heap-based schemes throughout most of the concurrency range. With 256 processors, they are about 3 times faster in performing deletions and up to 10 times faster in performing insertions.
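To show why a skiplist is a natural basis for a priority queue, here is a single-threaded sketch. It omits everything that makes the paper's structure concurrent (the abstract does not describe those modifications), and the names `SkipListPQ`, `insert`, and `delete_min`, the maximum height of 16, and the promotion probability of 0.5 are all assumptions. The point it illustrates is structural: the minimum is always the first node of the bottom-level list, so deleting it touches only the head's links, while inserts are spread across the list.

```python
import random


class _Node:
    __slots__ = ("key", "next")

    def __init__(self, key, height):
        self.key = key
        self.next = [None] * height  # one forward pointer per level


class SkipListPQ:
    """Single-threaded skiplist used as a priority queue (illustrative only)."""

    MAX_HEIGHT = 16  # assumed cap on tower height

    def __init__(self, seed=None):
        self._rng = random.Random(seed)
        self._head = _Node(None, self.MAX_HEIGHT)  # sentinel, holds no key
        self._size = 0

    def __len__(self):
        return self._size

    def _random_height(self):
        # Each extra level is added with probability 1/2.
        height = 1
        while height < self.MAX_HEIGHT and self._rng.random() < 0.5:
            height += 1
        return height

    def insert(self, key):
        # Walk down from the top level, recording the last node before
        # the insertion point on every level.
        update = [self._head] * self.MAX_HEIGHT
        node = self._head
        for level in range(self.MAX_HEIGHT - 1, -1, -1):
            while node.next[level] is not None and node.next[level].key < key:
                node = node.next[level]
            update[level] = node
        new = _Node(key, self._random_height())
        for level in range(len(new.next)):
            new.next[level] = update[level].next[level]
            update[level].next[level] = new
        self._size += 1

    def delete_min(self):
        first = self._head.next[0]
        if first is None:
            raise IndexError("delete_min from an empty queue")
        # The smallest node is first on every level it appears in,
        # so unlinking it only rewrites the head's pointers.
        for level in range(len(first.next)):
            self._head.next[level] = first.next[level]
        self._size -= 1
        return first.key


if __name__ == "__main__":
    pq = SkipListPQ(seed=1)
    for k in [5, 3, 8, 1, 9, 2]:
        pq.insert(k)
    print([pq.delete_min() for _ in range(len(pq))])
```

In a heap, by contrast, every insertion and deletion passes through or near the root, which is one plausible source of the contention the abstract attributes to heap-based schemes.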