Special Purpose Parallel Computing
- Lectures on Parallel Computation
, 1993
Cited by 77 (5 self)
A vast amount of work has been done in recent years on the design, analysis, implementation and verification of special purpose parallel computing systems. This paper presents a survey of various aspects of this work. A long, but by no means complete, bibliography is given. 1. Introduction Turing [365] demonstrated that, in principle, a single general purpose sequential machine could be designed which would be capable of efficiently performing any computation which could be performed by a special purpose sequential machine. The importance of this universality result for subsequent practical developments in computing cannot be overstated. It showed that, for a given computational problem, the additional efficiency advantages which could be gained by designing a special purpose sequential machine for that problem would not be great. Around 1944, von Neumann produced a proposal [66, 389] for a general purpose stored-program sequential computer which captured the fundamental principles of...
Optimal Communication Algorithms On Star Graphs Using Spanning Tree Constructions
- Journal of Parallel and Distributed Computing
, 1993
Cited by 23 (4 self)
In this paper we consider three fundamental communication problems on the star interconnection network: the problem of simultaneous broadcasting of the same message from every node to all other nodes, or multinode broadcasting; the problem of a single node sending distinct messages to each one of the other nodes, or single node scattering; and finally the problem of each node sending distinct messages to every other node, or total exchange. All of these problems are studied under two different assumptions: the assumption that each node can transmit a message of fixed length to one of its neighbors and simultaneously receive a message of fixed length from one of its neighbors (not necessarily the same one) at each time step, or single link availability (SLA), and the assumption that each node can exchange messages of fixed length with all of its neighbors at each time step, or multiple link availability (MLA). In both cases communication is assumed to be bidirectional. The cases ...
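To make the network in this abstract concrete, here is a minimal sketch of the standard star graph definition: the nodes are the permutations of n symbols, and two nodes are adjacent when one is obtained from the other by swapping the first symbol with the symbol in some other position. The function names `star_neighbors` and `broadcast_rounds` are hypothetical, and the breadth-first flood below only illustrates how far one message spreads per step when a node may use all of its links at once (the MLA assumption). It is not the paper's spanning tree construction and does not address multinode broadcasting, scattering, or total exchange.

```python
from collections import deque


def star_neighbors(node):
    """Return the n - 1 neighbors of a permutation in the star graph.

    Each neighbor is obtained by swapping the first symbol with the
    symbol at position i, for i = 1 .. n - 1.
    """
    neighbors = []
    for i in range(1, len(node)):
        p = list(node)
        p[0], p[i] = p[i], p[0]
        neighbors.append(tuple(p))
    return neighbors


def broadcast_rounds(n, source=None):
    """Flood one message from `source` by breadth-first search.

    Returns (number of nodes reached, number of steps needed), assuming
    every informed node forwards to all of its neighbors in each step.
    """
    if source is None:
        source = tuple(range(1, n + 1))  # identity permutation (illustrative choice)
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in star_neighbors(u):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return len(dist), max(dist.values())
```

Calling `broadcast_rounds(4)` explores all 4! = 24 permutations. The step count it returns is the eccentricity of the source, which is a lower bound on single-message broadcast time under MLA.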
A Linear-Processor Polylog-Time Algorithm for Shortest Paths in Planar Graphs
, 1993
Cited by 16 (6 self)
We give an algorithm requiring polylog time and a linear number of processors to solve single-source shortest paths in directed planar graphs, bounded-genus graphs, and 2-dimensional overlap graphs. More generally, the algorithm works for any graph provided with a decomposition tree constructed using size-$O(\sqrt{n}\,\operatorname{polylog} n)$ separators.
Wafer-Scale Integration of Systolic Arrays
- IEEE Transactions on Computers
, 1985
Cited by 14 (0 self)
VLSI technologists are fast developing wafer-scale integration. Rather than partitioning a silicon wafer into chips as is usually done, the idea behind wafer-scale integration is to assemble an entire system (or network of chips) on a single wafer, thus avoiding the costs and performance loss associated with individual packaging of chips. A major problem with assembling a large system of microprocessors on a single wafer, however, is that some of the processors, or cells, on the wafer are likely to be defective. In the paper, we describe practical procedures for integrating "around" such faults. The procedures are designed to minimize the length of the longest wire in the system, thus minimizing the communication time between cells. Although the underlying network problems are NP-complete, we prove that the procedures are reliable by assuming a probabilistic model of cell failure. We also discuss applications of the work to problems in VLSI layout theory, graph theory, fault-tolerant systems, planar geometry, and the probabilistic analysis of algorithms. Index Terms: channel width, fault-tolerant systems, matching, probabilistic analysis, spanning tree, systolic arrays, traveling salesman problem, tree of meshes, VLSI, wafer-scale integration, wire length.
Locally connected VLSI architectures for the Viterbi algorithm
- IEEE Journal on Selected Areas in Communications
, 1988
Cited by 13 (0 self)
The Viterbi algorithm is a well-established technique for channel and source decoding in high performance digital communication systems. Implementations of the Viterbi algorithm on three types of locally connected processor arrays are described. This restriction is motivated by the fact that both the cost and performance metrics of VLSI favor architectures in which on-chip interprocessor communication is localized. Each of the structures presented can accommodate arbitrary alphabet sizes and algorithm memory lengths. The relative performance tradeoffs available to the designer are discussed in the context of previous work.
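For readers unfamiliar with the algorithm named in this abstract, the following is a minimal sequential sketch of the Viterbi recursion as a minimum-cost path search through a trellis. It uses the usual add-compare-select step followed by a traceback. The function name `viterbi_min_path`, its parameters, and the `branch_metric` callback are hypothetical names chosen for illustration. The sketch says nothing about the locally connected processor arrays the paper describes.

```python
def viterbi_min_path(num_states, num_steps, branch_metric, start_state=0):
    """Find a minimum-cost state sequence through a fully connected trellis.

    branch_metric(t, r, s) is the (assumed, user-supplied) cost of moving
    from state r to state s at step t. Returns (path, total_cost), where
    path has num_steps + 1 entries and begins at start_state.
    """
    inf = float("inf")
    metric = [inf] * num_states
    metric[start_state] = 0.0
    survivors = []  # survivors[t][s] = best predecessor of state s at step t

    for t in range(num_steps):
        new_metric = [inf] * num_states
        pred = [None] * num_states
        for s in range(num_states):
            for r in range(num_states):
                candidate = metric[r] + branch_metric(t, r, s)  # add
                if candidate < new_metric[s]:                   # compare
                    new_metric[s] = candidate                   # select
                    pred[s] = r
        survivors.append(pred)
        metric = new_metric

    # Traceback from the cheapest final state along the stored predecessors.
    end = min(range(num_states), key=lambda s: metric[s])
    path = [end]
    for pred in reversed(survivors):
        path.append(pred[path[-1]])
    path.reverse()
    return path, metric[end]
```

A caller would supply branch costs derived from the received symbols, for example `viterbi_min_path(2, 3, lambda t, r, s: costs[t][r][s])` with a small hypothetical `costs` table indexed by step, previous state, and next state.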
Edge-Disjoint Spanning Trees On The Star Network With Applications To Fault Tolerance
- IEEE Transactions on Computers
, 1993
Cited by 12 (4 self)
Data communication and fault tolerance are important issues in multiprocessor systems. One way to achieve fault tolerant communication is by exploiting and effectively utilizing the disjoint paths that exist between pairs of source and destination nodes. In this paper we construct a structure, called the multiple edge-disjoint spanning trees, on the star network, denoted by $S_n$. This is used for the derivation of an optimal single node broadcasting algorithm, which offers a speedup of $n - 1$ compared to the straightforward single node broadcasting algorithm that uses a single breadth first spanning tree. It is also used for the derivation of fault tolerant communication algorithms. As a result, fault tolerant algorithms are presented for four basic communication problems: the problem of a single node sending the same message to all other nodes or single node broadcasting, the problem of simultaneous single node broadcasting from all nodes or multinode broadcasting, the problem of a s...
Embedding Mesh of Trees in the Hypercube
- Journal of Parallel and Distributed Computing
, 1991
Cited by 12 (3 self)
Embedding one architecture in another is useful in providing architectural abstractions between different topologies. Through such embeddings the algorithms originally developed for one architecture can be directly mapped to another architecture. This paper describes methods for embedding one-, two-, and three-dimensional meshes of trees in the hypercube. Similar methods may be used for embedding higher-dimensional meshes of trees. This embedding has significant practical importance in enhancing the capabilities of the hypercube, since meshes of trees enable extremely fast parallel computation.
Optimal Communication Primitives On The Generalized Hypercube Network
- Journal of Parallel and Distributed Computing
, 1994
Cited by 10 (2 self)
Efficient interprocessor communication is crucial to increasing the performance of parallel computers. In this paper, a special framework is developed on the generalized hypercube, a network that is currently receiving considerable attention. Using this framework as the basic tool, a number of spanning graphs with special properties to fit various communication needs are constructed on the network. The importance of these spanning graphs is demonstrated with the development of optimal algorithms for four fundamental communication problems on the generalized hypercube network, namely single node and multinode broadcasting and single node and multinode scattering. Broadcasting is the distribution of the same group of messages from a source processor to all other processors, and scattering is the distribution of distinct groups of messages from a source processor to each other processor. We consider broadcasting and scattering from a single processor of the network (single nod...
Optimal Communication Channel Utilization for Matrix Transposition and Related Permutations on Binary Cubes
- Discrete Applied Mathematics
, 1992
Cited by 7 (2 self)
We present optimal schedules for permutations in which each node sends one or several unique messages to every other node. With concurrent communication on all channels of every node in binary cube networks, the number of element transfers in sequence for $K$ elements per node is $K/2$, irrespective of the number of nodes over which the data set is distributed. For a succession of $s$ permutations within disjoint subcubes of $d$ dimensions each, our schedules yield $\min(K/2 + (s - 1)d,\ (s + 3)d,\ K/2 + 2d)$ exchanges in sequence. The algorithms can be organized to avoid indirect addressing in the internode data exchanges, a property that increases the performance on some architectures. For message passing communication libraries, we present a blocking procedure that minimizes the number of block transfers while preserving the utilization of the communication channels. For schedules with optimal channel utilization, the number of block transfers for a binary $d$-cube is $d$. The maximum ...

