Results 1–10 of 30
Special Purpose Parallel Computing
Lectures on Parallel Computation, 1993
Cited by 77 (5 self)
A vast amount of work has been done in recent years on the design, analysis, implementation and verification of special purpose parallel computing systems. This paper presents a survey of various aspects of this work. A long, but by no means complete, bibliography is given. 1. Introduction Turing [365] demonstrated that, in principle, a single general purpose sequential machine could be designed which would be capable of efficiently performing any computation which could be performed by a special purpose sequential machine. The importance of this universality result for subsequent practical developments in computing cannot be overstated. It showed that, for a given computational problem, the additional efficiency advantages which could be gained by designing a special purpose sequential machine for that problem would not be great. Around 1944, von Neumann produced a proposal [66, 389] for a general purpose stored-program sequential computer which captured the fundamental principles of...
Optimal Communication Algorithms On Star Graphs Using Spanning Tree Constructions
Journal of Parallel and Distributed Computing, 1993
Cited by 24 (4 self)
In this paper we consider three fundamental communication problems on the star interconnection network: the problem of simultaneous broadcasting of the same message from every node to all other nodes, or multinode broadcasting; the problem of a single node sending distinct messages to each one of the other nodes, or single node scattering; and finally the problem of each node sending distinct messages to every other node, or total exchange. All of these problems are studied under two different assumptions: the assumption that each node can transmit a message of fixed length to one of its neighbors and simultaneously receive a message of fixed length from one of its neighbors (not necessarily the same one) at each time step, or single link availability (SLA), and the assumption that each node can exchange messages of fixed length with all of its neighbors at each time step, or multiple link availability (MLA). In both cases communication is assumed to be bidirectional. The cases ...
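For concreteness, the topology these problems run on can be generated directly: in the star graph S_n, nodes are the permutations of n symbols, and two nodes are adjacent exactly when one is obtained from the other by swapping the first symbol with the symbol in some position i. A minimal sketch (the function name is ours, not the paper's):

```python
from itertools import permutations

def star_neighbors(perm):
    """Neighbors of a node in the star graph S_n: swap the first
    symbol with the symbol in position i, for each i = 2..n,
    giving degree n - 1."""
    p = list(perm)
    result = []
    for i in range(1, len(p)):
        q = p[:]
        q[0], q[i] = q[i], q[0]
        result.append(tuple(q))
    return result

# S_4: 4! = 24 nodes, each of degree 3.
nodes = list(permutations((1, 2, 3, 4)))
assert all(len(star_neighbors(v)) == 3 for v in nodes)
```

The n!-node star graph thus has degree and diameter O(n) while connecting n! = 2^O(n log n) nodes, which is what makes it attractive relative to the hypercube in this literature.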
Wafer-Scale Integration of Systolic Arrays
IEEE Transactions on Computers, 1985
Cited by 17 (0 self)
VLSI technologists are fast developing wafer-scale integration. Rather than partitioning a silicon wafer into chips as is usually done, the idea behind wafer-scale integration is to assemble an entire system (or network of chips) on a single wafer, thus avoiding the costs and performance loss associated with individual packaging of chips. A major problem with assembling a large system of microprocessors on a single wafer, however, is that some of the processors, or cells, on the wafer are likely to be defective. In the paper, we describe practical procedures for integrating "around" such faults. The procedures are designed to minimize the length of the longest wire in the system, thus minimizing the communication time between cells. Although the underlying network problems are NP-complete, we prove that the procedures are reliable by assuming a probabilistic model of cell failure. We also discuss applications of the work to problems in VLSI layout theory, graph theory, fault-tolerant systems, planar geometry, and the probabilistic analysis of algorithms. Index Terms: Channel width, fault-tolerant systems, matching, probabilistic analysis, spanning tree, systolic arrays, traveling salesman problem, tree of meshes, VLSI, wafer-scale integration, wire length.
A Linear-Processor Polylog-Time Algorithm for Shortest Paths in Planar Graphs
1993
Cited by 16 (5 self)
We give an algorithm requiring polylog time and a linear number of processors to solve single-source shortest paths in directed planar graphs, bounded-genus graphs, and 2-dimensional overlap graphs. More generally, the algorithm works for any graph provided with a decomposition tree constructed using size-O(√n polylog n) separators.
Locally connected VLSI architectures for the Viterbi algorithm
IEEE Journal on Selected Areas in Communications, 1988
Cited by 14 (0 self)
The Viterbi algorithm is a well-established technique for channel and source decoding in high-performance digital communication systems. Implementations of the Viterbi algorithm on three types of locally connected processor arrays are described. This restriction is motivated by the fact that both the cost and performance metrics of VLSI favor architectures in which on-chip interprocessor communication is localized. Each of the structures presented can accommodate arbitrary alphabet sizes and algorithm memory lengths. The relative performance tradeoffs available to the designer are discussed in the context of previous work.
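The recurrence such arrays parallelize is the per-stage add-compare-select step over a trellis. A minimal log-domain sketch, in an HMM-style formulation (the example model and names are ours; a convolutional-code decoder would use branch metrics in place of emission probabilities, but the compare-select structure is the same):

```python
import math

def viterbi(obs, states, start_p, trans_p, emit_p):
    """Most likely state sequence via per-stage add-compare-select
    (log domain, so products become sums and underflow is avoided)."""
    V = [{s: math.log(start_p[s] * emit_p[s][obs[0]]) for s in states}]
    back = []
    for t in range(1, len(obs)):
        scores, ptrs = {}, {}
        for s in states:
            prev, best = max(((p, V[-1][p] + math.log(trans_p[p][s]))
                              for p in states), key=lambda x: x[1])
            scores[s] = best + math.log(emit_p[s][obs[t]])
            ptrs[s] = prev
        V.append(scores)
        back.append(ptrs)
    state = max(V[-1], key=V[-1].get)
    path = [state]
    for ptrs in reversed(back):
        state = ptrs[state]
        path.append(state)
    return path[::-1]

# Hypothetical two-state example (state and observation names are ours).
states = ("H", "F")
start = {"H": 0.6, "F": 0.4}
trans = {"H": {"H": 0.7, "F": 0.3}, "F": {"H": 0.4, "F": 0.6}}
emit = {"H": {"normal": 0.5, "cold": 0.4, "dizzy": 0.1},
        "F": {"normal": 0.1, "cold": 0.3, "dizzy": 0.6}}
assert viterbi(("normal", "cold", "dizzy"), states, start, trans, emit) == ["H", "H", "F"]
```

The inner `max` over predecessor states is the step that a locally connected array maps onto hardware: each processing element owns one trellis state and needs only its predecessors' metrics each step.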
Optimal Communication Primitives On The Generalized Hypercube Network
Journal of Parallel and Distributed Computing, 1994
Cited by 13 (2 self)
Efficient interprocessor communication is crucial to increasing the performance of parallel computers. In this paper, a special framework is developed on the generalized hypercube, a network that is currently receiving considerable attention. Using this framework as the basic tool, a number of spanning graphs with special properties to fit various communication needs are constructed on the network. The importance of these spanning graphs is demonstrated with the development of optimal algorithms for four fundamental communication problems, namely, the single node and multinode broadcasting and the single node and multinode scattering, on the generalized hypercube network. Broadcasting is the distribution of the same group of messages from a source processor to all other processors, and scattering is the distribution of distinct groups of messages from a source processor to each other processor. We consider broadcasting and scattering from a single processor of the network (single node ...
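For reference, the generalized hypercube labels nodes with mixed-radix coordinate vectors, and two nodes are adjacent exactly when their labels differ in one coordinate. A small sketch of the adjacency rule (the function name is ours):

```python
def ghc_neighbors(node, radices):
    """Neighbors in a generalized hypercube with mixed radices
    (m_1, ..., m_r): change one coordinate to any other value in its
    radix, so the degree is sum over i of (m_i - 1)."""
    result = []
    for i, m in enumerate(radices):
        for d in range(m):
            if d != node[i]:
                q = list(node)
                q[i] = d
                result.append(tuple(q))
    return result

# Radices (3, 4): 3 * 4 = 12 nodes, each of degree (3-1) + (4-1) = 5.
assert len(ghc_neighbors((0, 0), (3, 4))) == 5
```

The ordinary binary hypercube is the special case in which every radix equals 2.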
Embedding Mesh of Trees in the Hypercube
Journal of Parallel and Distributed Computing, 1991
Cited by 12 (3 self)
Embedding one architecture in another is useful in providing architectural abstractions between different topologies. Through such embeddings the algorithms originally developed for one architecture can be directly mapped to another architecture. This paper describes methods for embedding one-, two-, and three-dimensional meshes of trees in the hypercube. Similar methods may be used for embedding higher-dimensional meshes of trees. This embedding has significant practical importance in enhancing the capabilities of the hypercube, since the mesh of trees enables extremely fast parallel computation.
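The standard building block for hypercube embeddings of array-like structures is the reflected Gray code, which maps a path (and hence each row or column of a mesh) onto hypercube nodes with dilation 1. This sketch illustrates that building block, not the paper's specific mesh-of-trees construction:

```python
def gray(n):
    """Reflected binary Gray code on n bits: a list of all 2^n labels
    in which consecutive entries differ in exactly one bit, so
    consecutive path nodes map to adjacent hypercube nodes."""
    if n == 0:
        return [0]
    prev = gray(n - 1)
    # Reflect the (n-1)-bit code and set the new top bit on the copy.
    return prev + [x | (1 << (n - 1)) for x in reversed(prev)]

codes = gray(3)
assert all(bin(codes[i] ^ codes[i + 1]).count("1") == 1
           for i in range(len(codes) - 1))
```

Applying an independent Gray code per dimension embeds a multidimensional mesh in the hypercube with dilation 1; handling the tree parts of a mesh of trees is the harder step the paper addresses.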
Edge-disjoint spanning trees on the star network with applications to fault tolerance
IEEE Transactions on Computers, 1996
Cited by 12 (4 self)
Data communication and fault tolerance are important issues in multiprocessor systems. One way to achieve fault tolerant communication is by exploiting and effectively utilizing the disjoint paths that exist between pairs of source and destination nodes. In this paper we construct a structure, called the multiple edge-disjoint spanning trees, on the star network, denoted by Sn. This is used for the derivation of an optimal single node broadcasting algorithm, which offers a speedup of n − 1 compared to the straightforward single node broadcasting algorithm that uses a single breadth-first spanning tree. It is also used for the derivation of fault tolerant communication algorithms. As a result, fault tolerant algorithms are presented for four basic communication problems: the problem of a single node sending the same message to all other nodes, or single node broadcasting; the problem of simultaneous single node broadcasting from all nodes, or multinode broadcasting; the problem of a single node sending distinct messages to each one of the other nodes, or single node scattering; and finally the problem of simultaneous single node scattering from all nodes, or total exchange. Fault tolerance is achieved by sending multiple copies of the message through a number of disjoint paths. These algorithms operate successfully in the presence of up to n − 1 faulty nodes or edges in the system. They also offer the flexibility of controlling the degree of fault tolerance, depending on how reliable the network is. As pointed out in [28], the importance of these algorithms lies in the fact that no knowledge of the faulty nodes or edges is required in advance. All of the algorithms presented make the assumption that each node can exchange messages of fixed length with all of its neighbors simultaneously at each time step, i.e. the all-port communication assumption, and that communication is bidirectional.
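The fault-tolerance argument here is a pigeonhole one: with k edge-disjoint routes carrying copies of the same message, any k − 1 edge faults can block at most k − 1 routes, so at least one copy arrives, with no advance knowledge of where the faults are. A toy sketch of that crash-fault model (edge labels and the function name are ours):

```python
def broadcast_survives(paths, faulty_edges):
    """A copy of the message is sent along each of a set of pairwise
    edge-disjoint paths (each path given as its set of edges);
    delivery succeeds if at least one path avoids every faulty edge."""
    return any(not (path & faulty_edges) for path in paths)

# Three edge-disjoint routes: any two edge faults leave one route intact,
# but three well-placed faults can sever all three.
paths = [{"e1", "e2"}, {"e3", "e4"}, {"e5", "e6"}]
assert broadcast_survives(paths, {"e2", "e5"})
assert not broadcast_survives(paths, {"e1", "e3", "e5"})
```

In the paper's setting the n − 1 edge-disjoint spanning trees of Sn supply the disjoint routes from the source to every destination simultaneously, which is why up to n − 1 faults are tolerated.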