Results 11–20 of 29
Segmented Hash: An Efficient Hash Table Implementation for High Performance Networking Subsystems
In Proc. Symp. on Architecture for Networking and Communications Systems (ANCS'05), 2005
Abstract

Cited by 12 (2 self)
Hash tables provide efficient table implementations, achieving O(1) query, insert, and delete operations at low loads. However, at moderate or high loads collisions are quite frequent, resulting in decreased performance. In this paper, we propose the segmented hash table architecture, which ensures constant-time hash operations at high loads with high probability. To achieve this, the hash memory is divided into N logical segments so that each incoming key has N potential storage locations; the destination segment is chosen so as to minimize collisions. In this way, collisions, and the associated probe sequences, are dramatically reduced. In order to keep memory utilization minimized, probabilistic filters are kept on-chip to allow the N segments to be accessed without increasing the number of off-chip memory operations. These filters are kept small and accurate with the help of a novel algorithm, called selective filter insertion, which keeps the segments balanced while minimizing false-positive rates (i.e., incorrect filter predictions). The performance of our scheme is quantified via analytical modeling and software simulations. Moreover, we discuss efficient implementations that are easily realizable in modern device technologies. The performance benefits are significant: average search cost is reduced by 40% or more, while the likelihood of requiring more than one memory operation per search is reduced by several orders of magnitude.
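The core idea of the abstract, N candidate segments with the destination chosen to minimize collisions, can be sketched as below. This is a minimal illustration only: the paper's on-chip probabilistic filters and selective filter insertion algorithm are omitted, and all class and parameter names are hypothetical.

```python
import hashlib

class SegmentedHash:
    """Toy segmented hash table: each key hashes to one bucket in each of
    N logical segments, and an insert goes to the candidate segment whose
    target bucket currently holds the fewest keys."""

    def __init__(self, num_segments=4, buckets_per_segment=64):
        self.num_segments = num_segments
        self.buckets = buckets_per_segment
        # One array of buckets per segment; each bucket is a list of (key, value).
        self.segments = [[[] for _ in range(buckets_per_segment)]
                         for _ in range(num_segments)]

    def _bucket(self, seg, key):
        # A distinct hash function per segment, derived by salting with the segment id.
        h = hashlib.blake2b(f"{seg}:{key}".encode(), digest_size=8).digest()
        return int.from_bytes(h, "big") % self.buckets

    def insert(self, key, value):
        # Choose the segment whose destination bucket is least loaded.
        best = min(range(self.num_segments),
                   key=lambda s: len(self.segments[s][self._bucket(s, key)]))
        self.segments[best][self._bucket(best, key)].append((key, value))

    def lookup(self, key):
        # Lacking the paper's on-chip filters, this sketch probes every segment;
        # the filters exist precisely to avoid these extra probes off-chip.
        for seg in range(self.num_segments):
            for k, v in self.segments[seg][self._bucket(seg, key)]:
                if k == key:
                    return v
        return None
```

Because every key has N placement choices, bucket occupancy stays far more balanced than with a single hash function, which is what shortens the probe sequences.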
Fast Fault-Tolerant Concurrent Access to Shared Objects
, 1996
Abstract

Cited by 10 (1 self)
We consider a synchronous model of distributed computation in which n nodes communicate via point-to-point messages, subject to the following constraints: (i) in a single "step", a node can only send or receive O(log n) words, and (ii) communication is unreliable in that a constant fraction of all messages are lost at each step due to node and/or link failures. We design and analyze a simple local protocol for providing fast concurrent access to shared objects in this faulty network environment. In our protocol, clients use a hashing-based method to access shared objects. When a large number of clients attempt to read a given object at the same time, the object is rapidly replicated to an appropriate number of servers. Once the necessary level of replication has been achieved, each remaining request for the object is serviced within O(1) expected steps. Our protocol has practical potential for supporting high levels of concurrency in distributed file systems over wide-area networks.
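The replication idea described above can be modeled with a toy rule: an object lives on a hash-chosen home server, and its replica set doubles whenever per-replica demand exceeds a threshold. The threshold, server count, and growth rule here are illustrative assumptions, not the paper's actual protocol.

```python
def home_server(obj_id, num_servers=64):
    """Initial replica: the object's home server, chosen by hashing its id."""
    return hash(obj_id) % num_servers

def serve_step(replicas, num_requests, threshold=4, num_servers=64):
    """One synchronous step of a toy load-triggered replication rule:
    while pending requests exceed what the current replicas can absorb
    (threshold requests each), double the replica set by spilling onto
    further servers, capped at num_servers."""
    while num_requests > threshold * len(replicas) and len(replicas) < num_servers:
        grown = [(r + len(replicas)) % num_servers for r in replicas]
        replicas = list(dict.fromkeys(replicas + grown))  # dedupe, keep order
    return replicas
```

Starting from one replica, a burst of 40 requests forces four doublings (1 → 2 → 4 → 8 → 16 replicas), after which each replica serves at most the threshold number of requests, mirroring the O(1)-expected-steps service once replication has caught up with demand.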
Contention Resolution in Hashing-Based Shared Memory Simulations
Abstract

Cited by 9 (3 self)
In this paper we study the problem of simulating shared memory on the Distributed Memory Machine (DMM). Our approach uses multiple copies of shared memory cells, distributed among the memory modules of the DMM via universal hashing. Thus the main problem is to design strategies that resolve contention at the memory modules. Developing ideas from random graphs and very fast randomized algorithms, we present new simulation techniques that enable us to improve the previously best results exponentially. In particular, we show that an n-processor CRCW PRAM can be simulated by an n-processor DMM with delay O(log log log n · log* n), with high probability. Next we show a general technique that can be used to turn these simulations into time-processor optimal ones, in the case of EREW PRAMs to be simulated. We obtain a time-processor optimal simulation of an (n · log log log n · log* n)-processor EREW PRAM on an n-processor DMM with O(log log log n · log* n) delay. When a CRCW PRAM with (n...
Improved Optimal Shared Memory Simulations, and the Power of Reconfiguration
 In Proceedings of the 3rd Israel Symposium on Theory of Computing and Systems
Abstract

Cited by 8 (6 self)
We present time-processor optimal randomized algorithms for simulating a shared memory machine (EREW PRAM) on a distributed memory machine (DMM). The first algorithm simulates each step of an n-processor EREW PRAM on an n-processor DMM with O(log log n / log log log n) delay with high probability. This simulation is work optimal and can be made time-processor optimal. The best previous optimal simulations require O(log log n) delay. We also study reconfigurable DMMs, which are a "complete network version" of the well-studied reconfigurable meshes. We show an algorithm that simulates each step of an n-processor EREW PRAM on an n-processor reconfigurable DMM with only O(log n) delay with high probability. We further show how to make this simulation time-processor optimal.

1 Introduction
Parallel machines that communicate via a shared memory (Parallel Random Access Machines, PRAMs) are the most commonly used machine model for describing parallel algorithms [J92]. The PRAM is relative...
Applying Randomized Edge Coloring Algorithms to Distributed Communication: An Experimental Study
, 1995
Abstract

Cited by 7 (1 self)
We propose a parameterized, randomized edge coloring algorithm for use in coordinating data transfers in fully connected distributed architectures such as parallel I/O subsystems and multimedia information systems. Our approach is to preschedule I/O requests to eliminate contention for I/O ports while maintaining an efficient use of bandwidth. Request scheduling is equivalent to edge coloring a bipartite graph representing pending I/O requests. Although efficient optimal algorithms exist for centralized edge coloring where the global request graph is known, in distributed architectures heuristics must be used. We propose such heuristics and use experimental analysis to determine their ability to approach the centralized optimal. The performance of our algorithms is also compared experimentally with the work of other researchers. Our results show that our algorithms produce schedules within 5% of the optimal schedule, a substantial improvement over existing algorithms. The use of experimental analysis allows us to evaluate the appropriateness of each heuristic for a variety of different architectural models and applications.
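The equivalence between request scheduling and bipartite edge coloring can be illustrated with a simple randomized heuristic: each edge (a pending transfer between a client and a server port) repeatedly proposes a random color (time slot) free at both endpoints. This sketch is a generic heuristic of the kind the abstract describes, not the paper's specific parameterized algorithm; all names are illustrative.

```python
import random

def randomized_edge_coloring(edges, num_colors, seed=0):
    """Toy randomized heuristic for edge coloring a bipartite request
    graph (edges are (client, server) pairs with disjoint label sets).
    In each round every uncolored edge proposes a random color unused at
    both endpoints; proposals are committed in order, skipping clashes,
    so at least one edge is colored per round and the loop terminates."""
    rng = random.Random(seed)
    color, used = {}, {}
    uncolored = list(edges)
    while uncolored:
        # Each uncolored edge proposes a color still free at both endpoints.
        proposals = []
        for c, s in uncolored:
            free = [k for k in range(num_colors)
                    if k not in used.get(c, set()) and k not in used.get(s, set())]
            if free:
                proposals.append(((c, s), rng.choice(free)))
        for (c, s), k in proposals:
            # Commit in proposal order; later clashing proposals wait a round.
            if k not in used.get(c, set()) and k not in used.get(s, set()):
                used.setdefault(c, set()).add(k)
                used.setdefault(s, set()).add(k)
                color[(c, s)] = k
        uncolored = [e for e in uncolored if e not in color]
    return color
```

Each color class is a set of transfers that share no port, so it can run as one contention-free parallel I/O round; the number of colors used determines the schedule length.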
Simulating shared memory in real time: On the computation power of reconfigurable meshes
 In Proceedings of the 2nd IEEE Workshop on Reconfigurable Architectures
, 1995
Abstract

Cited by 6 (1 self)
We consider randomized simulations of shared memory on a distributed memory machine (DMM) where the n processors and the n memory modules of the DMM are connected via a reconfigurable architecture. We first present a randomized simulation of a CRCW PRAM on a reconfigurable DMM having a complete reconfigurable interconnection. It guarantees delay O(log* n), with high probability. Next we study a reconfigurable mesh DMM (RMDMM). Here the n processors and n modules are connected via an n × n reconfigurable mesh. It was already known that an n × m reconfigurable mesh can simulate in constant time an n-processor CRCW PRAM with shared memory of size m. In this paper we present a randomized step-by-step simulation of a CRCW PRAM with arbitrarily large shared memory on an RMDMM. It guarantees constant delay with high probability, i.e., it simulates in real time. Finally we prove a lower bound showing that size Ω(n²) for the reconfigurable mesh is necessary for real-time simulations. © 1997 Academic Press. Supported by the DFG-Graduiertenkolleg "Parallele Rechnernetzwerke in der Produktionstechnik."
Contention Resolution with Bounded Delay
 In Proc. FOCS'95, IEEE Computer
Abstract

Cited by 5 (0 self)
When distributed processes contend for a shared resource, we need a good distributed contention resolution protocol, e.g., for multiple-access channels (ALOHA, Ethernet), PRAM emulation, and optical routing. Under a stochastic model of request generation from n synchronous processes, Raghavan & Upfal have shown a protocol which is stable for a positive request rate; their main result is that for every resource request, its expected delay (time to get serviced) is O(log n). Assuming that the initial clock times of the processes are within a known bound of each other, we present a stable protocol wherein the expected delay for each request is O(1). We derive this by showing an analogous result for an infinite number of processes, assuming that all processes agree on the time.

1 Introduction
In scenarios where a set of distributed processes have a single shared resource that can service at most one process per time slot, the main problem is devising a "good" distributed protocol for re...
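The setting can be illustrated with the classical slotted-ALOHA resolution rule (shown here only to make the model concrete; it is not the paper's O(1)-delay protocol, which is considerably more sophisticated): in each slot every pending process transmits with probability 1/backlog, and a slot succeeds iff exactly one transmits.

```python
import random

def drain_backlog(num_pending, seed=1):
    """Simulate slotted-ALOHA-style contention resolution: per slot, each
    of the pending processes transmits with probability 1/backlog; the
    resource serves one request iff exactly one process transmitted.
    Returns the number of slots until the backlog is drained."""
    rng = random.Random(seed)
    slots = 0
    while num_pending:
        slots += 1
        transmitters = sum(rng.random() < 1.0 / num_pending
                           for _ in range(num_pending))
        if transmitters == 1:  # a collision (>1) or an idle slot (0) wastes the slot
            num_pending -= 1
    return slots
```

With transmission probability 1/backlog, a slot succeeds with probability about 1/e, so draining n requests takes roughly e·n slots, i.e., O(1) expected slots of useful throughput per request once the rate is tuned; the hard part, which the paper addresses, is achieving stable O(1) delay without such global coordination.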
ERCW PRAMs and Optical Communication
 In Proceedings of the European Conference on Parallel Processing, Euro-Par '96
, 1996
Abstract

Cited by 4 (2 self)
This paper presents algorithms and lower bounds for several fundamental problems on the Exclusive Read, Concurrent Write Parallel Random Access Machine (ERCW PRAM) and some results for unbounded fan-in, bounded fan-out (or 'BFO') circuits. Our results for these two models are of importance because of the close relationship of the ERCW model to the OCPC model, a model of parallel computing based on dynamically reconfigurable optical networks, and of BFO circuits to the OCPC model with limited dynamic reconfiguration ability. Topics: Parallel Algorithms, Theory of Parallel and Distributed Computing. This research was supported by Texas Advanced Research Projects Grant 003658480 (philmac@cs.utexas.edu), and in part by Texas Advanced Research Projects Grants 003658480 and 003658386, and NSF Grant CCR 9023059 (vlr@cs.utexas.edu).

1 Introduction
In this paper we develop algorithms and lower bounds for fundamental problems on the Exclusive Read Concurrent Wri...
Peacock Hashing: Deterministic and Updatable Hashing for High Performance Networking
Abstract

Cited by 4 (0 self)
Hash tables are extensively used in networking to implement data structures that associate a set of keys to a set of values, as they provide O(1) query, insert, and delete operations. However, at moderate or high loads collisions are quite frequent, which not only increases the access time but also induces nondeterminism in the performance. Due to this nondeterminism, the performance of these hash tables degrades sharply in multithreaded, network-processor-based environments, where a collection of threads performs the hashing operations in a loosely synchronized manner. In such systems, it is critical to keep the hash operations more deterministic. A recent series of papers employs a compact on-chip memory to enable deterministic and fast hash queries. While effective, these schemes require substantial on-chip memory, roughly 10 bits for every entry in the hash table. This limits their general usability, specifically in the network processor context, where on-chip resources are scarce. In this paper, we propose a novel hash table construction called Peacock hashing, which reduces the on-chip memory by more than 10-fold while keeping a high degree of determinism in performance. This significantly reduced on-chip memory not only makes Peacock hashing much more appealing for general use but also makes it an attractive choice for the implementation of a hash hardware accelerator on a network processor.
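The on-chip memory saving comes from Peacock hashing's hierarchical structure: a large main table backed by progressively smaller overflow tables, so only a small fraction of entries ever needs filter bits. The sketch below shows only that hierarchy-and-spill idea; the level count, scaling factor, and all names are illustrative assumptions, and the paper's filters and rebalancing are omitted.

```python
import hashlib

class PeacockHash:
    """Toy peacock-style hierarchy: one large main table plus smaller
    backup tables (each here 1/10 the size of the previous). A key that
    collides at one level spills into the next smaller table, so only
    the small backup tables would need on-chip summary structures."""

    def __init__(self, main_size=1000, levels=3, factor=10):
        sizes = [max(1, main_size // factor**i) for i in range(levels)]
        self.tables = [[None] * s for s in sizes]

    def _slot(self, level, key):
        # A distinct hash function per level, derived by salting with the level id.
        h = hashlib.blake2b(f"{level}:{key}".encode(), digest_size=8).digest()
        return int.from_bytes(h, "big") % len(self.tables[level])

    def insert(self, key, value):
        for level in range(len(self.tables)):
            i = self._slot(level, key)
            if self.tables[level][i] is None:
                self.tables[level][i] = (key, value)
                return True
        return False  # collided at every level; a real scheme handles this case

    def lookup(self, key):
        # This sketch probes every level; the scheme's on-chip summaries exist
        # so that the (rarely used) backup levels are seldom probed off-chip.
        for level in range(len(self.tables)):
            entry = self.tables[level][self._slot(level, key)]
            if entry is not None and entry[0] == key:
                return entry[1]
        return None
```

Since the backup tables together hold only a small fraction of all entries, summarizing just those levels on-chip is far cheaper than keeping roughly 10 bits per entry for the whole table.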
Design and Analysis of Dynamic Processes: A Stochastic Approach
 In ESA '98, LNCS 1461
Abstract

Cited by 1 (0 self)
Past research in theoretical computer science has focused mainly on static computation problems, where the input is known before the start of the computation and the goal is to minimize the number of steps until termination with a correct output. Many important processes in today's computing are dynamic processes, whereby input is continuously injected into the system, and the algorithm is measured by its long-term, steady-state performance. Examples of dynamic processes include communication protocols, memory management tools, and time-sharing policies. Our goal is to develop new tools for designing and analyzing the performance of dynamic processes, in particular through modeling the dynamic process as an infinite stochastic process.