Results 1 - 9 of 9
On the Cost-Effectiveness of PRAMs
, 1991
Cited by 33 (12 self)
We introduce a formalism that allows us to treat computer architecture as a formal optimization problem. We apply this to the design of shared memory parallel machines. Present computers of this type support the programming model of a shared memory, but simultaneous access to the shared memory by several processors is in many situations processed sequentially. Theoretical computer science offers asymptotically good solutions to this problem. We modify these constructions from an engineering perspective and improve the price/performance ratio by roughly a factor of 6. The resulting machine has a surprisingly good price/performance ratio, even when compared with distributed memory machines. For almost all access patterns of all processors into the shared memory, access is as fast as the access of only a single processor. 1 Introduction Commercially available parallel machines can be classified as distributed memory machines or shared memory machines. Exchange of data between different proce...
On the Cost-Effectiveness and Realization of the Theoretical PRAM Model
- SONDERFORSCHUNGSBEREICH 124 VLSI ENTWURFSMETHODEN UND PARALLELITÄT, UNIVERSITÄT SAARBRÜCKEN
, 1991
Cited by 15 (0 self)
Today's parallel computers provide good support for problems that can be easily embedded on the machines' topologies with regular and sparse communication patterns, but they perform poorly on problems that do not satisfy these conditions. A general-purpose parallel computer should guarantee good performance on most parallelizable problems and should allow users to program without special knowledge of the underlying architecture. Access to memory cells should be fast for both local and nonlocal cells and should not depend on the access pattern. A theoretical model that reaches this goal is the PRAM, but it was thought to be very expensive in terms of constant factors. Our goal is to show that the PRAM is a realistic approach for a general-purpose architecture for any class of algorithms. To do that, we sketch a measure of cost-effectiveness that allows one to determine constant factors in the cost and speed of machines. This measure is based on the price/performance ratio and can be compu...
How to Sort N items using a sorting network of fixed I/O size
, 1999
Cited by 6 (1 self)
Sorting networks of a fixed I/O size p have been used, thus far, for sorting a set of p elements. Somewhat surprisingly, the important problem of using such a sorting network for sorting arbitrarily large data sets has not been addressed in the literature. Our main contribution is to propose a simple sorting architecture whose main feature is the pipelined use of a sorting network of fixed I/O size p to sort an arbitrarily large data set of N elements. A noteworthy feature of our design is that no extra data memory space is required, other than what is used for storing the input. As it turns out, our architecture is feasible for VLSI implementation and its time performance is virtually independent of the cost and depth of the underlying sorting network. Specifically, we show that by using our design N elements can be sorted in ) time without memory access conflicts. Finally, we show how to use an AT^2-optimal sorting network of fixed I/O size p to construct a similar architecture that sorts N elements in
Key Words: computer architecture, sorting, parallel processing, pipelined processing, sorting networks.
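To make the problem statement above concrete, here is a minimal Python sketch of one simple way to sort N items when the only sorting primitive accepts exactly p inputs. It is not the pipelined architecture proposed in the paper: it is a plain odd-even merge-split scheme over blocks of p/2 items, chosen only because it is short and easy to check. The names `p_sorter` and `sort_with_fixed_sorter` are hypothetical, and the sketch assumes numeric items so that `math.inf` can be used as padding.

```python
import math
from typing import Callable, List, Sequence


def p_sorter(items: Sequence[float]) -> List[float]:
    """Hypothetical stand-in for a sorting network with a fixed number of inputs."""
    return sorted(items)


def sort_with_fixed_sorter(
    data: Sequence[float],
    p: int,
    sorter: Callable[[Sequence[float]], List[float]] = p_sorter,
) -> List[float]:
    """Sort `data` using only calls to `sorter` on exactly p items at a time."""
    if p < 2 or p % 2 != 0:
        raise ValueError("p must be an even number >= 2")
    half = p // 2

    # Assumption: items are numbers, so math.inf is a safe padding value.
    padding = (-len(data)) % half
    work = list(data) + [math.inf] * padding
    blocks = [work[i:i + half] for i in range(0, len(work), half)]

    # Sort each block on its own; the other half of the sorter input is padding.
    blocks = [sorter(block + [math.inf] * half)[:half] for block in blocks]

    # Odd-even transposition over blocks: each step sorts two neighbouring
    # blocks together and splits the result into a low half and a high half.
    m = len(blocks)
    for phase in range(m):
        for i in range(phase % 2, m - 1, 2):
            merged = sorter(blocks[i] + blocks[i + 1])
            blocks[i], blocks[i + 1] = merged[:half], merged[half:]

    flat = [x for block in blocks for x in block]
    return flat[:len(data)]  # padding sorts to the end, so drop the tail
```

This sketch makes on the order of (N/p)^2 sorter calls, so it only illustrates the constraint (every call uses exactly p inputs); it makes no claim about the time bounds stated in the abstract.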
Matching nuts and bolts in O(n log n) time
, 1998
Cited by 6 (0 self)
Given a set of n nuts of distinct widths and a set of n bolts such that each nut corresponds to a unique bolt of the same width, how should we match every nut with its corresponding bolt by comparing nuts with bolts? (No comparison is allowed between two nuts or two bolts.) The problem can be naturally viewed as a variant of the classic sorting problem as follows. Given two lists of n numbers each such that one list is a permutation of the other, how should we sort the lists by comparisons only between numbers in different lists? We give an O(n log n)-time deterministic algorithm for the problem. This is optimal up to a constant factor and answers an open question posed by Alon et al. [Proceedings of the 5th Annual ACM-SIAM Symposium on Discrete Algorithms, 1994, pp. 690-696]. Moreov...
Real-Time Emulations of Bounded-Degree Networks
- Information Processing Letters
, 1998
Cited by 5 (3 self)
In this paper, we survey the state of the art in real-time network emulations. In particular, we consider emulation schemes whereby a host network of one type can mimic, in a step-by-step fashion, any computation that can be performed by a guest network of another type. An emulation is called real-time if the sizes of the guest and the host are equal, to within a constant factor, and the time required by the host and the time used by the guest are also equal, to within a constant factor. We restrict our attention in this paper to bounded-degree
An -size fault-tolerant sorting network
- In Proceedings of the 28th Annual ACM Symposium on the Theory of Computing
, 1996
Cited by 1 (0 self)
This thesis studies sorting circuits, networks, and PRAM algorithms that are tolerant to faults. We consider both worst-case and random fault models, although we mainly focus on the more challenging problem of random faults. In the random fault model, the circuit, network, or algorithm is required to sort all n-input permutations with probability at least $1 - 1/n$ even if the result of each comparison is independently faulty with probability upper bounded by a fixed constant. In particular,
- we construct a passive-fault-tolerant sorting circuit with $O(n \log n \log\log n)$ comparators, thereby answering an open question posed by Yao and Yao in 1985,
- we construct a reversal-fault-tolerant sorting network with $O(n \log^{\log_2 3} n)$ comparators, thereby answering an open question posed by Assaf and Upfal in 1990,
- we design an optimal $O(\log n)$-step $O(n)$-processor deterministic EREW PRAM fault-tolerant sorting algorithm, thereby answering an open question posed by Feige, Peleg, Raghavan, and Upfal in 1990, and
- we prove a tight lower bound of $\Omega(n \log^2 n)$ on the number of comparators needed for any destructive-fault-tolerant sorting or merging network, thereby answering an open question posed by Assaf and Upfal in 1990.
Sorting Omega Networks Simulated with P Systems: Optimal Data Layouts
Cited by 1 (1 self)
Summary. The paper introduces some sorting networks and their simulation with P systems, in which each processor/membrane can hold more than one piece of data, and perform operations on them internally. Several data layouts are discussed in this context, and an optimal one is proposed, together with its implementation as a P system with dynamic communication graphs.
Spiking Neural P Systems – A Natural Model for Sorting Networks
Summary. This paper proposes two simulations of sorting networks with spiking neural P systems. A comparison between different models is also made.
Matching Nuts and Bolts Optimally
, 1995
The nuts and bolts problem is the following: Given a collection of n nuts of distinct sizes and n bolts of distinct sizes such that for each nut there is exactly one matching bolt, find for each nut its corresponding bolt subject to the restriction that we can only compare nuts to bolts. That is, we can compare neither nuts to nuts nor bolts to bolts. This humble restriction on the comparisons appears to make this problem quite difficult to solve. In this paper, we illustrate the existence of an algorithm for solving the nuts and bolts problem that makes O(n lg n) nut-and-bolt comparisons. We show the existence of this algorithm by showing the existence of certain expander-based comparator networks. Our algorithm is asymptotically optimal in terms of the number of nut-and-bolt comparisons it does. Another view of this result is that we show the existence of a decision tree with depth O(n lg n) that solves this problem.
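The comparison restriction in the nuts and bolts problem can be illustrated with the well-known randomized, quicksort-style approach sketched below in Python. This is not the deterministic, expander-based algorithm described in the abstracts listed here; it is a simpler textbook technique that uses O(n lg n) nut-and-bolt comparisons only in expectation. The names `fit` and `match_nuts_and_bolts` are hypothetical, and widths are modeled as plain numbers purely for illustration.

```python
import random
from typing import List, Optional, Sequence, Tuple


def fit(nut: int, bolt: int) -> int:
    """The only allowed comparison: -1 if the nut is too small for the bolt,
    0 if they match, 1 if the nut is too large."""
    return (nut > bolt) - (nut < bolt)


def match_nuts_and_bolts(
    nuts: Sequence[int],
    bolts: Sequence[int],
    rng: Optional[random.Random] = None,
) -> List[Tuple[int, int]]:
    """Return matched (nut, bolt) pairs in increasing order of width."""
    if rng is None:
        rng = random.Random()
    if not nuts:
        return []

    # Pick a random nut and use it to partition the bolts.
    pivot_nut = nuts[rng.randrange(len(nuts))]
    small_bolts, large_bolts, pivot_bolt = [], [], None
    for bolt in bolts:
        result = fit(pivot_nut, bolt)
        if result > 0:
            small_bolts.append(bolt)   # bolt is narrower than the pivot nut
        elif result < 0:
            large_bolts.append(bolt)   # bolt is wider than the pivot nut
        else:
            pivot_bolt = bolt
    if pivot_bolt is None:
        raise ValueError("every nut must have a matching bolt")

    # Use the matching bolt to partition the remaining nuts.
    small_nuts = [n for n in nuts if fit(n, pivot_bolt) < 0]
    large_nuts = [n for n in nuts if fit(n, pivot_bolt) > 0]

    return (
        match_nuts_and_bolts(small_nuts, small_bolts, rng)
        + [(pivot_nut, pivot_bolt)]
        + match_nuts_and_bolts(large_nuts, large_bolts, rng)
    )
```

For example, `match_nuts_and_bolts([5, 1, 4, 2, 3], [3, 5, 2, 1, 4])` pairs each nut with the bolt of the same width. Note that nuts are never compared with nuts, nor bolts with bolts: every comparison goes through `fit`.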

