MULTIPROCESSOR SCHEDULING TO ACCOUNT FOR INTERPROCESSOR COMMUNICATION
, 1991
Cited by 72 (11 self)
Interprocessor communication (IPC) overheads have emerged as the major performance limitation in parallel processing systems, due to the transmission delays, synchronization overheads, and conflicts for shared communication resources created by data exchange. Accounting for these overheads is essential for attaining efficient hardware utilization. This thesis introduces two new compile-time heuristics for scheduling precedence graphs onto multiprocessor architectures, which account for interprocessor communication overheads and interconnection constraints in the architecture. These algorithms perform scheduling and routing simultaneously to account for irregular interprocessor interconnections, and schedule all communications as well as all computations to eliminate shared resource contention. The first technique, called dynamic-level scheduling, modifies the classical HLFET list scheduling strategy to account for IPC and synchronization overheads. By using dynamically changing priorities to match nodes and processors at each step, this technique attains an equitable tradeoff between load balancing and interprocessor communication cost. This method is fast, flexible, widely targetable, and displays promising performance. The second technique, called declustering, establishes a parallelism hierarchy upon the precedence graph using graph-analysis techniques which explicitly address the tradeoff between exploiting parallelism and incurring communication cost. By systematically decomposing this hierarchy, the declustering process exposes parallelism instances in order of importance, assuring efficient use of the available processing resources. In contrast with traditional clustering schemes, this technique can adjust the level of cluster granularity to suit the characteristics of the specified architecture, leading to a more effective solution.
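The classical HLFET strategy mentioned in this abstract ranks nodes by static level, commonly defined as the largest sum of execution times along any path from a node to an exit node. The sketch below computes only that baseline priority. It does not include the IPC and synchronization adjustments the thesis introduces, and the function name, graph, and execution times are hypothetical.

```python
from functools import lru_cache


def static_levels(exec_time, successors):
    """Return the static level of every node in a precedence graph (a DAG).

    exec_time:  dict mapping node -> estimated execution time
    successors: dict mapping node -> iterable of successor nodes
    """
    @lru_cache(maxsize=None)
    def level(node):
        # A node's level is its own time plus the longest downstream path.
        downstream = (level(s) for s in successors.get(node, ()))
        return exec_time[node] + max(downstream, default=0)

    return {node: level(node) for node in exec_time}


# Hypothetical four-node graph: a -> {b, c} -> d
exec_time = {"a": 2, "b": 3, "c": 1, "d": 4}
successors = {"a": ["b", "c"], "b": ["d"], "c": ["d"]}

levels = static_levels(exec_time, successors)
# Highest static level first gives the baseline list-scheduling order.
priority_order = sorted(levels, key=levels.get, reverse=True)
```

A list scheduler would repeatedly take the highest-priority ready node from such an ordering. Per the abstract, the dynamic-level variant recomputes priorities at each step instead.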
A Unified Theory Of Interconnection Network Structure
 Theoretical Computer Science
, 1986
Cited by 44 (0 self)
The relationship between the topology of interconnection networks and their functional properties is examined. Graph-theoretical characterizations are derived for delta networks, which have a simple routing scheme, and for bidelta networks, which have the delta property in both directions. Delta networks are shown to have a recursive structure. Bidelta networks are shown to have a unique topology. The definition of bidelta network is used to derive in a uniform manner the labeling schemes that define the omega networks, indirect binary cube networks, flip networks, baseline networks, modified data manipulators, and two new networks; these schemes are generalized to arbitrary radices. The labeling schemes are used to characterize networks with simple routing. In another paper, we characterize the networks with optimal performance/cost ratio. Only the multistage shuffle-exchange networks have both optimal performance/cost ratio and simple routing. This helps explain why few fundamentally...
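As an illustration of the kind of simple routing scheme the abstract refers to, the sketch below applies standard destination-tag routing to an omega network built from 2 x 2 switches. Each stage performs a perfect shuffle and then selects the switch output from one bit of the destination address. This is the textbook scheme, not the paper's labeling construction, and the function name is hypothetical.

```python
def omega_route(src, dst, n):
    """Trace a message through an n-stage omega network with 2**n ports.

    Returns the list of link positions: the input port, then the position
    after each of the n stages. The last entry is always dst.
    """
    mask = (1 << n) - 1
    pos = src
    path = [pos]
    for stage in range(n):
        # Perfect shuffle: rotate the n-bit position left by one bit.
        pos = ((pos << 1) | (pos >> (n - 1))) & mask
        # The 2 x 2 switch picks its output from the destination bit,
        # most significant bit first.
        bit = (dst >> (n - 1 - stage)) & 1
        pos = (pos & ~1) | bit
        path.append(pos)
    return path


example_path = omega_route(4, 1, 3)  # 8-port network, input 4 to output 1
```

The routing decision at each switch depends only on the destination address, which is what makes the scheme simple: no global state or path search is needed.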
Reconfiguration With Time Division Multiplexed MINs for Multiprocessor Communications
 IEEE Transactions on Parallel and Distributed Systems
, 1994
Cited by 36 (29 self)
In this paper, time-division multiplexed multistage interconnection networks (TDM-MINs) are proposed for multiprocessor communications. Connections required by an application are partitioned into a number of subsets called mappings, such that connections in each mapping can be established in a MIN without conflict. Switch settings for establishing connections in each mapping are determined and stored in shift registers. By repeatedly changing switch settings, connections in each mapping are established for a time slot in a round-robin fashion. Thus, all connections required by an application may be established in a MIN in a time-division multiplexed way. TDM-MINs can emulate a completely connected network using N time slots. They can also emulate regular networks such as rings, meshes, Cube-Connected-Cycles (CCC), binary trees and n-dimensional hypercubes using 2, 4, 3, 4 and n time slots, respectively. The problem of partitioning an arbitrary set of requests into a minimal ...
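The partitioning step described in this abstract can be sketched as follows, assuming an omega network and a greedy first-fit heuristic. Both are assumptions made for illustration: the paper's own network model and partitioning algorithm are not given here, and first-fit does not guarantee a minimal number of mappings. All function names are hypothetical.

```python
def omega_links(src, dst, n):
    """Links used by src -> dst in an n-stage omega network.

    Entry i is the link position after stage i (entry 0 is the input port,
    entry n is the output port).
    """
    mask = (1 << n) - 1
    return [((src << i) | (dst >> (n - i))) & mask for i in range(n + 1)]


def partition_into_mappings(connections, n):
    """Greedy first-fit split of connections into conflict-free mappings.

    Two connections conflict if they need the same link at the same stage.
    Each returned mapping could be set up in one time slot.
    """
    mappings = []  # list of (connections in mapping, set of (stage, link) used)
    for src, dst in connections:
        needed = set(enumerate(omega_links(src, dst, n)))
        for conns, used in mappings:
            if used.isdisjoint(needed):
                conns.append((src, dst))
                used.update(needed)
                break
        else:
            # No existing mapping can host this connection: open a new slot.
            mappings.append(([(src, dst)], set(needed)))
    return [conns for conns, _ in mappings]


# Inputs 0 and 4 share a first-stage switch and both need its upper output,
# so these two connections cannot share a time slot.
slots = partition_into_mappings([(0, 0), (4, 1)], 3)
```

Cycling through the resulting mappings in round-robin order, one per time slot, would then establish every requested connection once per cycle.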
Permutation capability of optical multistage interconnection network
 JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING, PP. 60
, 2000
Cited by 20 (8 self)
In this paper, we study optical multistage interconnection networks (MINs). Advances in electro-optic technologies have made optical communication a promising networking choice to meet the increasing demands for high channel bandwidth and low communication latency of high-performance computing/communication applications. Although optical MINs hold great promise and have demonstrated advantages over their electronic counterparts, they also pose their own challenges. Due to the unique properties of optics, crosstalk in optical switches should be avoided to make them work properly. Most of the research work described in the literature is for electronic MINs, and hence, crosstalk is not considered. In this paper, we introduce a new concept, semipermutation, to analyze the permutation capability of optical MINs under the constraint of avoiding crosstalk, and apply it to two examples of optical MINs, the banyan network and the Benes network. For the blocking banyan network, we show that not all semipermutations are realizable in one pass, and give the number of realizable semipermutations. For the rearrangeable Benes network, we show that any semipermutation is realizable in one pass and any permutation is realizable in two passes under the constraint of avoiding crosstalk. A routing algorithm for realizing a semipermutation in a Benes network is also presented. With the speed and bandwidth provided by current optical technology, an optical MIN clearly demonstrates a superior overall performance over its electronic MIN counterpart.
Generalized connection networks for parallel processor intercommunication
 IEEE Trans. Comput
, 1978
Cited by 18 (0 self)
Abstract: A generalized connection network (GCN) is a switching network with N inputs and N outputs that can be set to pass any of the N^N mappings of inputs onto outputs. This paper demonstrates an intimate connection between the problems of GCN construction, message routing on SIMD computers, and "resource partitioning." A GCN due to Ofman [7] is here improved to use less than 7.6N log N contact pairs, making it the minimal known construction. Any GCN construction leads to a new algorithm for the broadcast of messages among processing elements of an SIMD computer when each processing element is to receive one message. Previous approaches to message broadcasting have not handled the problem in its full generality. The algorithm arising from this paper's GCN takes 8 log N (or 13N^{1/2}) routing steps on an N-element processor of the perfect shuffle (or mesh-type) variety. If each resource in a multiprocessing environment is assigned one output of a GCN, private buses may be provided for any number of disjoint subsets of the resources. The partitioning construction derived from this paper's GCN has 5.7N log N switches, providing an alternative to "banyan networks" with O(N log N) switches but incomplete functionality. Index Terms: Array processors, connection networks, message broadcasting, parallel algorithms, parallel processing, resource partitioning, SIMD machines.
Performance Evaluation of Switch-Based Wormhole Networks
, 1997
Cited by 14 (3 self)
Multistage interconnection networks (MINs) are a popular class of switch-based network architectures for constructing scalable parallel computers. Four wormhole MINs built from k × k switches, where k = 2^j for some j, are considered in this paper: traditional MINs (TMINs), dilated MINs (DMINs), MINs with virtual channels (VMINs), and bidirectional MINs (BMINs). The first three MINs are unidirectional networks, and we show that the cube interconnection pattern can provide contention-free and channel-balanced partitioning of binary cube clusters. BMINs based on butterfly interconnection are essentially a fat tree, and their routing properties are described. Performance comparison among these four networks using simulation experiments is presented with respect to different network traffic patterns. Both DMINs (dilation two) and BMINs have a similar hardware complexity. We conclude that a two-dilated MIN outperforms the corresponding BMIN (or fat tree) for most of the traffic condi...
On the Communication Throughput of Buffered Multistage Interconnection Networks
 PROC. OF THE 8TH ANNUAL ACM SYMPOSIUM ON PARALLEL ALGORITHMS AND ARCHITECTURES
, 1996
Cited by 7 (0 self)
Multistage networks (MINs) are used as interconnection structures in a large number of applications. Their performance is mainly determined by their communication throughput which, in most cases, has to be investigated by time-consuming simulations or approximated by simple models. In this paper, we investigate the steady-state throughput of single-buffered multistage interconnection networks using the so-called relaxed blocking model, where a message is deleted if the receiving buffer is occupied. We derive upper and lower bounds on the throughput of MINs of arbitrary height and show that the throughput of single-buffered networks is an order of magnitude higher than the throughput of nonbuffered MINs. In detail, we show that the throughput is \Theta(n/\sqrt{\log n}) if n is the size of the network. Because the time dynamics of finite-buffered MINs defy every Markov or semi-Markov approach, we analyze the equilibrium situation of the network and give tight upper and lower bounds on t...
The Representation Of Multistage Interconnection Networks in Queuing Models of Parallel Systems
 Journal of the ACM
, 1990
Cited by 6 (2 self)
Abstract. A major component of a parallel machine is its interconnection network (IN), which provides concurrent communication between the processing elements. It is common to use a multistage interconnection network (MIN) that is constructed using crossbar switches and introduces contention not only for destination addresses but also for internal links. Both types of contention are increased when nonlocal communication across a MIN becomes concentrated on a certain destination address, the hot-spot. This paper considers analytical models of asynchronous, circuit-switched INs in which partial paths are held during path building, beginning with a single crossbar and extending recursively to MINs. Since a path must be held between source and destination processors before data can be transmitted, switching networks are passive resources and queuing networks that include them do not therefore have product-form solutions. Using decomposition techniques, the flow-equivalent server (FES) that represents a bank of devices transmitting through a switching network is determined, under mild approximating assumptions. In the case of a full crossbar, the FES can be solved directly and the result can be applied recursively to model the MIN. Two cases are considered: one in which there is uniform routing and the other where there is a hot-spot at one of the output pins. Validation with respect to simulation for MINs with up to six stages (64-way switching) indicates a high degree of accuracy in the models.
Multicommodity Flows in Simple Multistage Networks
 Networks
, 1994
Cited by 6 (0 self)
In this paper we consider the integral multicommodity flow problem on directed graphs underlying two classes of multistage interconnection networks. In one direction, we consider 3-stage networks. Using existing results on (g, f)-factors of bipartite graphs, we show sufficient and necessary conditions for the existence of a solution when the network has at most 2 secondary switches. In contrast, the problem is shown to be NP-complete if the network has 3 or more secondaries. In a second direction, we introduce a recursive class of networks that includes multistage hypercubic networks (such as the omega network, the indirect binary n-cube, and the generalized cube network) as a proper subset. Networks in the new class may have an arbitrary number of stages. Moreover, each stage may contain identical switches of any arbitrary size. The notion of extra-stage networks is extended to the new class, and the problem is shown to have polynomial time solutions on r-stage networks where r = 3, o...