Results 1 – 9 of 9
Performance Analysis of k-ary n-cube Interconnection Networks
 IEEE Transactions on Computers
, 1988
Abstract

Cited by 296 (16 self)
VLSI communication networks are wire limited. The cost of a network is not a function of the number of switches required, but rather a function of the wiring density required to construct the network. This paper analyzes communication networks of varying dimension under the assumption of constant wire bisection. Expressions for the latency, average-case throughput, and hot-spot throughput of k-ary n-cube networks with constant bisection are derived that agree closely with experimental measurements. It is shown that low-dimensional networks (e.g., tori) have lower latency and higher hot-spot throughput than high-dimensional networks (e.g., binary n-cubes) with the same bisection width.
Keywords: Communication networks, interconnection networks, concurrent computing, message-passing multiprocessors, parallel processing, VLSI.
1 Introduction
The critical component of a concurrent computer is its communication network. Many algorithms are communication rather than processing limited. Fi...
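As a back-of-the-envelope illustration of the tradeoff described above, the following sketch (not taken from the paper's latency derivation; it uses only the standard node-count and bidirectional-torus-diameter formulas, N = k^n and n*floor(k/2)) compares a low-dimensional and a high-dimensional network with the same number of nodes:

```python
# Hedged sketch: fewer dimensions mean more hops per message, but under a
# constant wire bisection those hops can travel over wider channels.

def num_nodes(k: int, n: int) -> int:
    """A k-ary n-cube has k**n nodes."""
    return k ** n

def diameter(k: int, n: int) -> int:
    """Diameter of a k-ary n-cube torus with bidirectional channels:
    at most floor(k/2) hops per dimension, across n dimensions."""
    return n * (k // 2)

# 256 nodes arranged two ways:
assert num_nodes(16, 2) == num_nodes(2, 8) == 256
print(diameter(16, 2))  # 16-ary 2-cube (2-D torus): 2 * 8 = 16 hops
print(diameter(2, 8))   # binary 8-cube (hypercube): 8 * 1 = 8 hops
```

The hypercube wins on hop count, but with the bisection held constant its channels must be proportionally narrower, which is the effect the paper quantifies.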
Implementation of a Portable Nested Data-Parallel Language
 Journal of Parallel and Distributed Computing
, 1994
Abstract

Cited by 177 (26 self)
This paper gives an overview of the implementation of Nesl, a portable nested data-parallel language. This language and its implementation are the first to fully support nested data structures as well as nested data-parallel function calls. These features allow the concise description of parallel algorithms on irregular data, such as sparse matrices and graphs. In addition, they maintain the advantages of data-parallel languages: a simple programming model and portability. The current Nesl implementation is based on an intermediate language called Vcode and a library of vector routines called Cvl. It runs on the Connection Machine CM-2, the Cray Y-MP C90, and serial machines. We compare initial benchmark results of Nesl with those of machine-specific code on these machines for three algorithms: least-squares line-fitting, median finding, and a sparse-matrix vector product. These results show that Nesl's performance is competitive with that of machine-specific codes for regular dense da...
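The flavor of nested data-parallelism can be suggested in plain Python, with comprehensions standing in for Nesl's apply-to-each; the sparse representation below (a nested sequence of (column, value) pairs per row) is an illustrative assumption, not Nesl syntax:

```python
# Illustrative sketch only: the outer comprehension is parallel over rows,
# the inner one over the nonzeros of each row, so rows of very different
# lengths (irregular data) are handled uniformly.

def sparse_mvmult(rows, x):
    """Nested data-parallel sparse matrix-vector product."""
    return [sum(v * x[c] for c, v in row) for row in rows]

# The 3x3 matrix [[2,0,0],[0,1,3],[0,0,4]] in nested sparse form:
m = [[(0, 2.0)], [(1, 1.0), (2, 3.0)], [(2, 4.0)]]
print(sparse_mvmult(m, [1.0, 1.0, 1.0]))  # [2.0, 4.0, 4.0]
```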
Collection-Oriented Languages
 PROCEEDINGS OF THE IEEE
, 1991
Abstract

Cited by 51 (5 self)
Several programming languages arising from widely diverse practical and theoretical considerations share a common high-level feature: their basic data type is an aggregate of other more primitive data types, and their primitive functions operate on these aggregates. Examples of such languages (and the collections they support) are FORTRAN 90 (arrays), APL (arrays), Connection Machine LISP (xectors), PARALATION LISP (paralations), and SETL (sets). Acting on large collections of data with a single operation is the hallmark of data-parallel programming and massively parallel computers. These languages, which we call collection-oriented, are thus ideal for use with massively parallel machines, even though many of them were developed before parallelism and associated considerations became important. This paper examines collections and the operations that can be performed on them in a language-independent manner. It also critically reviews and compares a variety of collection-oriented languages...
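The collection-oriented style can be suggested with a minimal Python sketch, where each primitive acts on a whole aggregate rather than on scalars inside explicit loops (the function names are illustrative, not from any of the surveyed languages):

```python
# Minimal sketch of collection-oriented primitives: one operation applied
# across an entire collection at once, as in FORTRAN 90 or APL array ops.

def elementwise_add(xs, ys):
    """Apply + across two whole collections (APL's xs + ys)."""
    return [x + y for x, y in zip(xs, ys)]

def reduce_sum(xs):
    """Collapse a collection with a single reduction (APL's +/xs)."""
    return sum(xs)

print(elementwise_add([1, 2, 3], [10, 20, 30]))  # [11, 22, 33]
print(reduce_sum([1, 2, 3]))                     # 6
```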
Reconfiguration With Time-Division Multiplexed MINs for Multiprocessor Communications
 IEEE Transactions on Parallel and Distributed Systems
, 1994
Abstract

Cited by 35 (29 self)
In this paper, time-division multiplexed multistage interconnection networks (TDM-MINs) are proposed for multiprocessor communications. Connections required by an application are partitioned into a number of subsets called mappings, such that connections in each mapping can be established in a MIN without conflict. Switch settings for establishing connections in each mapping are determined and stored in shift registers. By repeatedly changing switch settings, connections in each mapping are established for a time slot in a round-robin fashion. Thus, all connections required by an application may be established in a MIN in a time-division multiplexed way. TDM-MINs can emulate a completely connected network using N time slots. They can also emulate regular networks such as rings, meshes, Cube-Connected Cycles (CCC), binary trees, and n-dimensional hypercubes using 2, 4, 3, 4, and n time slots, respectively. The problem of partitioning an arbitrary set of requests into a minimal ...
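The round-robin time-slot idea can be illustrated with a hedged sketch (this is not the paper's partitioning algorithm): a completely connected n-node network is emulated in n slots by using the cyclic-shift permutation i -> (i + t) mod n in slot t, each of which is conflict-free because every source and every destination appears exactly once per slot:

```python
# Hedged illustration: n conflict-free mappings (permutations) that together
# realize all n*n source/destination pairs, one mapping per time slot.

def tdm_schedule(n):
    """Slot t establishes the connections i -> (i + t) mod n."""
    return [[(i, (i + t) % n) for i in range(n)] for t in range(n)]

slots = tdm_schedule(4)
# Every ordered pair (src, dst) is served in exactly one of the 4 slots:
pairs = {p for slot in slots for p in slot}
assert len(pairs) == 16
print(slots[1])  # [(0, 1), (1, 2), (2, 3), (3, 0)]
```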
An Algebraic Theory for Modeling Multistage Interconnection Networks
 Journal of Information Science and Engineering
, 1993
Abstract

Cited by 14 (11 self)
We use an algebraic theory based on tensor products to model multistage interconnection networks. This algebraic theory has been used for designing and implementing block recursive numerical algorithms on shared-memory vector multiprocessors. In this paper, we focus on the modeling of multistage interconnection networks. The tensor product representations of the baseline network, the reverse baseline network, the indirect binary n-cube network, the generalized cube network, the omega network, and the flip network are given. We present the use of this theory for specifying and verifying network properties such as network partitioning and topological equivalence. Algorithm mapping using tensor product formulation is demonstrated by mapping the matrix transposition algorithm onto multistage interconnection networks.
Keywords: Tensor product, parallel architecture, multistage interconnection network, partitionability, topological equivalence, algorithm mapping.
1 Introduction
Tensor prod...
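A minimal sketch of the tensor-product formalism, using a pure-Python Kronecker product (the matrices and helper names are illustrative): the factor I_2 (tensor) B applies a 2x2 switch B independently to each adjacent pair of lines, which is the algebraic shape of one stage of a multistage network:

```python
# Hedged sketch: one MIN stage as a Kronecker (tensor) product.

def kron(a, b):
    """Kronecker product of two matrices, in pure Python."""
    return [[a[i][j] * b[k][l]
             for j in range(len(a[0])) for l in range(len(b[0]))]
            for i in range(len(a)) for k in range(len(b))]

def apply(m, x):
    """Matrix-vector product."""
    return [sum(mij * xj for mij, xj in zip(row, x)) for row in m]

B  = [[0, 1], [1, 0]]      # a 2x2 exchange (crossed) switch setting
I2 = [[1, 0], [0, 1]]
stage = kron(I2, B)        # I_2 (x) B acts on 4 lines: swap within each pair

print(apply(stage, [0, 1, 2, 3]))  # [1, 0, 3, 2]
```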
Design of an optically-interconnected multiprocessor
In Proceedings of MPPOI '98
, 1998
Abstract

Cited by 5 (3 self)
This paper presents the design of an optically interconnected multiprocessor. The design is oriented to applications whose performance is bandwidth limited in conventional multiprocessors. The system utilizes board-level polymer waveguides to reduce manufacturing costs. The processor interconnection network, called Gemini, has a Banyan topology and is composed of dual optical and electronic networks. The optical data paths (circuit switched) are used for passing large data blocks, and the matched electrical data paths (packet switched) are used for control of the optical interconnect and for short data messages.
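The dual-network division of labor can be caricatured with a small dispatcher; the threshold and names below are assumptions for illustration, not details from the paper:

```python
# Hedged sketch of a Gemini-style dual network: large blocks go over the
# circuit-switched optical paths, short/control traffic over the matched
# packet-switched electrical paths. LARGE_BLOCK is an invented threshold.

LARGE_BLOCK = 4096  # bytes; illustrative only

def dispatch(message: bytes) -> str:
    """Choose a network for a message by its size."""
    return "optical-circuit" if len(message) >= LARGE_BLOCK else "electrical-packet"

print(dispatch(b"\x00" * 8192))  # optical-circuit
print(dispatch(b"ctrl"))         # electrical-packet
```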
High Performance Interconnection Networks
, 2002
Abstract

Cited by 1 (0 self)
The thesis is concerned with the design of high performance interconnection networks for use predominantly in parallel computing systems and wide area networks. The most important criterion is a combined measure of hardware complexity and worst-case message routing complexity. Furthermore, a high performance network should also have the properties of regular and planar topology, high bisection width, and routing simplicity. Specifically, the following problems are studied: (i) constructing the largest possible networks that simultaneously exhibit a number of other properties, including a small number of edges, high bisection width, and planarity; and (ii) implementing high performance communication networks on a scale comparable to that of the Internet. With respect to specific technology, the thesis addresses the following two questions: (i) exactly how optical internetworking can be achieved on a worldwide scale so as to maximize performance, and (ii) just how big an optical internetwork can be, given present and future technological limits and performance constraints.
The W-Network: A New Low-Cost Fault-Tolerant Multistage Interconnection Network
, 1992
Abstract
This paper presents the W-Network, a low-cost fault-tolerant MIN which is well-suited to a multiprocessor intended for high-performance scientific computations. It provides complete tolerance of single faults without any loss of access or performance, yet is substantially cheaper than previous designs. The redundancy cost is 22% for a 64 × 64 network made of 2 × 2 switch chips, and only 13% for a 256 × 256 network made of 4 × 4 switch chips. This is a significant reduction compared with the previous designs, in which the lowest redundancy costs for corresponding networks are 54% and 67%, respectively, when all hardware costs and performance degradations are taken into account. For a slight additional increase in hardware, the W-Network can be enhanced with extra ports for replacing faulty processors with spare units.
System Design for a Computational-RAM Logic-In-Memory Parallel-Processing Machine
, 1999
Abstract
Integrating several 1-bit processing elements at the sense amplifiers of a standard RAM improves the performance of massively-parallel applications because of the inherent parallelism and high data bandwidth inside the memory chip. However, implementing such a logic-in-memory system on a host computer poses several challenges because of the small bandwidth at the host system buses and the different data formats used on the two systems. In this thesis, solutions to these system design issues, including control of the processing elements, interface to the host, data transposition, and application programming, are considered. A minimal-hardware controller provides high utilization of processing elements while using a simple and general-purpose architecture. A buffer-based host interface unit enhances external data transfers and minimizes the effect of the host on the performance of the logic-in-memory system. A parallel array-based corner-turning scheme reduces the time to convert data...
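The corner-turning step (transposing data between word-oriented and bit-serial formats) can be sketched in a few lines of Python; this illustrates the general idea only, not the thesis's parallel array-based scheme:

```python
# Hedged sketch of corner-turning: transpose word-oriented data into bit
# planes, so that bit-serial processing elements each see one bit position
# of every word.

def corner_turn(words, width=8):
    """Return `width` bit planes; plane b holds bit b of every word."""
    return [[(w >> b) & 1 for w in words] for b in range(width)]

planes = corner_turn([0b1010, 0b0110], width=4)
print(planes)  # [[0, 0], [1, 1], [0, 1], [1, 0]]
```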