Segmented Hash: An Efficient Hash Table Implementation for High Performance Networking Subsystems
- In Proc. Symp. on Architecture for Networking and Communications Systems (ANCS'05)
, 2005
Cited by 11 (2 self)
Hash tables provide efficient table implementations, achieving O(1) query, insert, and delete operations at low loads. However, at moderate or high loads collisions are quite frequent, resulting in decreased performance. In this paper, we propose the segmented hash table architecture, which ensures constant-time hash operations at high loads with high probability. To achieve this, the hash memory is divided into N logical segments so that each incoming key has N potential storage locations; the destination segment is chosen so as to minimize collisions. In this way, collisions, and the associated probe sequences, are dramatically reduced. To keep memory utilization low, probabilistic filters are kept on-chip to allow the N segments to be accessed without increasing the number of off-chip memory operations. These filters are kept small and accurate with the help of a novel algorithm, called selective filter insertion, which keeps the segments balanced while minimizing false positive rates (i.e., incorrect filter predictions). The performance of our scheme is quantified via analytical modeling and software simulations. Moreover, we discuss efficient implementations that are easily realizable in modern device technologies. The performance benefits are significant: average search cost is reduced by 40% or more, while the likelihood of requiring more than one memory operation per search is reduced by several orders of magnitude.
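The segment-selection idea in the abstract above can be illustrated with a minimal sketch. This is not the authors' implementation: the class name `SegmentedHash`, the salted `blake2b` hashing, the chained buckets, and the plain (non-counting) Bloom filter per segment are all assumptions made for illustration, and the paper's selective filter insertion algorithm and deletions are omitted. Each key hashes to one bucket in each of the N segments; an insert goes to the segment whose candidate bucket is currently least loaded, and a lookup probes only the segments whose filter reports a possible match.

```python
import hashlib


def _h(key: str, salt: int, mod: int) -> int:
    """Deterministic salted hash of `key`, reduced modulo `mod`."""
    digest = hashlib.blake2b(f"{salt}:{key}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big") % mod


class SegmentedHash:
    """Toy segmented hash table (illustrative assumption, not the paper's design)."""

    def __init__(self, n_segments=4, buckets_per_segment=64,
                 filter_bits=1024, filter_hashes=3):
        self.n = n_segments
        self.m = buckets_per_segment
        self.filter_bits = filter_bits
        self.filter_hashes = filter_hashes
        # One array of chained buckets per segment (stands in for off-chip memory).
        self.segments = [[[] for _ in range(self.m)] for _ in range(self.n)]
        # One Bloom filter per segment (stands in for the on-chip filters).
        self.filters = [bytearray(filter_bits) for _ in range(self.n)]

    def _bucket(self, seg, key):
        return self.segments[seg][_h(key, seg, self.m)]

    def _filter_positions(self, key):
        return [_h(key, 1000 + i, self.filter_bits)
                for i in range(self.filter_hashes)]

    def _candidates(self, key):
        """Segments whose filter says the key may be present."""
        pos = self._filter_positions(key)
        return [s for s in range(self.n)
                if all(self.filters[s][p] for p in pos)]

    def get(self, key, default=None):
        for s in self._candidates(key):
            for k, v in self._bucket(s, key):
                if k == key:
                    return v
        return default

    def insert(self, key, value):
        # Update in place if the key is already stored somewhere.
        for s in self._candidates(key):
            bucket = self._bucket(s, key)
            for i, (k, _) in enumerate(bucket):
                if k == key:
                    bucket[i] = (key, value)
                    return s
        # Otherwise pick the segment whose candidate bucket is least loaded.
        best = min(range(self.n), key=lambda s: len(self._bucket(s, key)))
        self._bucket(best, key).append((key, value))
        for p in self._filter_positions(key):
            self.filters[best][p] = 1
        return best
```

A false positive in a filter only costs an extra bucket probe in this sketch; it never returns a wrong value, because the bucket contents are always compared against the key.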
Contention Resolution in Hashing Based Shared Memory Simulations
Cited by 9 (3 self)
In this paper we study the problem of simulating shared memory on the Distributed Memory Machine (DMM). Our approach uses multiple copies of shared memory cells, distributed among the memory modules of the DMM via universal hashing. Thus the main problem is to design strategies that resolve contention at the memory modules. Developing ideas from random graphs and very fast randomized algorithms, we present new simulation techniques that enable us to improve the previously best results exponentially. In particular, we show that an n-processor CRCW PRAM can be simulated by an n-processor DMM with delay O(log log log n log* n), with high probability. Next we show a general technique that can be used to turn these simulations into time-processor optimal ones when the simulated machine is an EREW PRAM. We obtain a time-processor optimal simulation of an (n log log log n log* n)-processor EREW PRAM on an n-processor DMM with O(log log log n log* n) delay. When a CRCW PRAM with (n...
Fast Fault-Tolerant Concurrent Access to Shared Objects
, 1996
Cited by 9 (1 self)
We consider a synchronous model of distributed computation in which n nodes communicate via point-to-point messages, subject to the following constraints: (i) in a single "step", a node can only send or receive O(log n) words, and (ii) communication is unreliable in that a constant fraction of all messages are lost at each step due to node and/or link failures. We design and analyze a simple local protocol for providing fast concurrent access to shared objects in this faulty network environment. In our protocol, clients use a hashing-based method to access shared objects. When a large number of clients attempt to read a given object at the same time, the object is rapidly replicated to an appropriate number of servers. Once the necessary level of replication has been achieved, each remaining request for the object is serviced within O(1) expected steps. Our protocol has practical potential for supporting high levels of concurrency in distributed file systems over wide area networks.
Improved Optimal Shared Memory Simulations, and the Power of Reconfiguration
- In Proceedings of the 3rd Israel Symposium on Theory of Computing and Systems
Cited by 8 (6 self)
We present time-processor optimal randomized algorithms for simulating a shared memory machine (EREW PRAM) on a distributed memory machine (DMM). The first algorithm simulates each step of an n-processor EREW PRAM on an n-processor DMM with O(log log n / log log log n) delay with high probability. This simulation is work optimal and can be made time-processor optimal. The best previous optimal simulations require O(log log n) delay. We also study reconfigurable DMMs, which are a "complete network version" of the well-studied reconfigurable meshes. We show an algorithm that simulates each step of an n-processor EREW PRAM on an n-processor reconfigurable DMM with only O(log* n) delay with high probability. We further show how to make this simulation time-processor optimal. 1 Introduction Parallel machines that communicate via a shared memory (Parallel Random Access Machines, PRAMs) are the most commonly used machine model for describing parallel algorithms [J92]. The PRAM is relative...
Simulating shared memory in real time: On the computation power of reconfigurable meshes
- In Proceedings of the 2nd IEEE Workshop on Reconfigurable Architectures
, 1995
Cited by 6 (1 self)
We consider randomized simulations of shared memory on a distributed memory machine (DMM) where the n processors and the n memory modules of the DMM are connected via a reconfigurable architecture. We first present a randomized simulation of a CRCW PRAM on a reconfigurable DMM having a complete reconfigurable interconnection. It guarantees delay O(log* n), with high probability. Next we study a reconfigurable mesh DMM (RM-DMM). Here the n processors and n modules are connected via an n × n reconfigurable mesh. It was already known that an n × m reconfigurable mesh can simulate in constant time an n-processor CRCW PRAM with shared memory of size m. In this paper we present a randomized step-by-step simulation of a CRCW PRAM with arbitrarily large shared memory on an RM-DMM. It guarantees constant delay with high probability, i.e., it simulates in real time. Finally we prove a lower bound showing that size Ω(n^2) for the reconfigurable mesh is necessary for real-time simulations. © 1997 Academic Press * Supported by DFG-Graduiertenkolleg "Parallele Rechnernetzwerke in der Produktionstechnik,"
Contention Resolution with Bounded Delay
- In Proc. FOCS'95, IEEE Computer
Cited by 5 (0 self)
When distributed processes contend for a shared resource, we need a good distributed contention resolution protocol, e.g., for multiple-access channels (ALOHA, Ethernet), PRAM emulation, and optical routing. Under a stochastic model of request generation from n synchronous processes, Raghavan & Upfal have shown a protocol which is stable for a positive request rate; their main result is that for every resource request, its expected delay (time to get serviced) is O(log n). Assuming that the initial clock times of the processes are within a known bound of each other, we present a stable protocol, wherein the expected delay for each request is O(1). We derive this by showing an analogous result for an infinite number of processes, assuming that all processes agree on the time. 1 Introduction In scenarios where a set of distributed processes have a single shared resource that can service at most one process per time slot, the main problem is devising a "good" distributed protocol for re...
Applying Randomized Edge Coloring Algorithms to Distributed Communication: An Experimental Study
, 1995
Cited by 4 (1 self)
We propose a parameterized, randomized edge coloring algorithm for use in coordinating data transfers in fully connected distributed architectures such as parallel I/O subsystems and multimedia information systems. Our approach is to preschedule I/O requests to eliminate contention for I/O ports while maintaining an efficient use of bandwidth. Request scheduling is equivalent to edge coloring a bipartite graph representing pending I/O requests. Although efficient optimal algorithms exist for centralized edge coloring where the global request graph is known, in distributed architectures heuristics must be used. We propose such heuristics and use experimental analysis to determine their ability to approach the centralized optimal. The performance of our algorithms is also compared with the work of other researchers experimentally. Our results show that our algorithms produce schedules within 5% of the optimal schedule, a substantial improvement over existing algorithms. The use of experimental analysis allows us to evaluate the appropriateness of each heuristic for a variety of different architectural models and applications.
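The equivalence between request scheduling and bipartite edge coloring stated in the abstract above can be made concrete with a small sketch. It is an illustrative assumption, not the heuristic evaluated in the paper: the function name `schedule_requests`, the propose-and-accept rounds, and the `seed` parameter are hypothetical. Each request is an edge (client, server), and its color is the round in which the transfer runs; a proper coloring means no client or server is used twice in the same round.

```python
import random


def schedule_requests(requests, seed=0):
    """Assign each distinct (client, server) request to a round.

    Randomized propose-and-accept sketch (hypothetical, for illustration):
    in every round each client proposes one pending request and each
    server accepts one of the proposals it received.
    """
    rng = random.Random(seed)
    pending = list(requests)
    schedule = {}
    round_no = 0
    while pending:
        # Group pending requests by client; each client proposes one at random.
        by_client = {}
        for req in pending:
            by_client.setdefault(req[0], []).append(req)
        proposals = {}
        for reqs in by_client.values():
            req = rng.choice(reqs)
            proposals.setdefault(req[1], []).append(req)
        # Each server accepts exactly one of the proposals addressed to it.
        for reqs in proposals.values():
            schedule[rng.choice(reqs)] = round_no
        pending = [r for r in pending if r not in schedule]
        round_no += 1
    return schedule
```

Every round schedules at least one request, so the loop terminates. The number of rounds can never be smaller than the largest number of requests touching a single client or server, which is the optimum for bipartite graphs; how close a randomized heuristic gets to that bound is the kind of question the paper studies experimentally.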
ERCW PRAMs and Optical Communication
- in Proceedings of the European Conference on Parallel Processing, EUROPAR ’96
, 1996
Cited by 4 (2 self)
This paper presents algorithms and lower bounds for several fundamental problems on the Exclusive Read, Concurrent Write Parallel Random Access Machine (ERCW PRAM) and some results for unbounded fan-in, bounded fan-out (or 'BFO') circuits. Our results for these two models are of importance because of the close relationship of the ERCW model to the OCPC model, a model of parallel computing based on dynamically reconfigurable optical networks, and of BFO circuits to the OCPC model with limited dynamic reconfiguration ability. Topics: Parallel Algorithms, Theory of Parallel and Distributed Computing. This research was supported by Texas Advanced Research Projects Grant 003658480. (philmac@cs.utexas.edu) This research was supported in part by Texas Advanced Research Projects Grants 003658480 and 003658386, and NSF Grant CCR 90-23059. (vlr@cs.utexas.edu) 1 Introduction In this paper we develop algorithms and lower bounds for fundamental problems on the Exclusive Read Concurrent Wri...
Peacock Hashing: Deterministic and Updatable Hashing for High Performance Networking
Cited by 4 (0 self)
Hash tables are extensively used in networking to implement data structures that associate a set of keys with a set of values, as they provide O(1) query, insert, and delete operations. However, at moderate or high loads collisions are quite frequent, which not only increases the access time but also induces non-determinism in the performance. Due to this non-determinism, the performance of these hash tables degrades sharply in multi-threaded network-processor-based environments, where a collection of threads perform the hashing operations in a loosely synchronized manner. In such systems, it is critical to keep the hash operations more deterministic. A recent series of papers has proposed schemes that employ a compact on-chip memory to enable deterministic and fast hash queries. While effective, these schemes require substantial on-chip memory, roughly 10 bits for every entry in the hash table. This limits their general usability, specifically in the network processor context, where on-chip resources are scarce. In this paper, we propose a novel hash table construction called Peacock hash, which reduces the on-chip memory by more than 10-fold while keeping a high degree of determinism in performance. This significantly reduced on-chip memory not only makes Peacock hashing much more appealing for general use but also makes it an attractive choice for the implementation of a hash hardware accelerator on a network processor.
Design and Analysis of Dynamic Processes: A Stochastic Approach
- ESA’1998, LNCS 1461
Cited by 1 (0 self)
Past research in theoretical computer science has focused mainly on static computation problems, where the input is known before the start of the computation and the goal is to minimize the number of steps until termination with a correct output. Many important processes in today’s computing are dynamic processes, whereby input is continuously injected into the system, and the algorithm is measured by its long-term, steady-state performance. Examples of dynamic processes include communication protocols, memory management tools, and time-sharing policies. Our goal is to develop new tools for designing dynamic processes and analyzing their performance, in particular through modeling the dynamic process as an infinite stochastic process.

