Results 1 - 5 of 5
Limits on Interconnection Network Performance
- IEEE Transactions on Parallel and Distributed Systems
, 1991
Cited by 166 (4 self)
As the performance of interconnection networks becomes increasingly limited by physical constraints in high-speed multiprocessor systems, the parameters of high-performance network design must be reevaluated, starting with a close examination of assumptions and requirements. This paper models network latency, taking both switch and wire delays into account. A simple closed-form expression for contention in buffered, direct networks is derived and is found to agree closely with simulations. The model includes the effects of packet size and communication locality. Network analysis under various constraints (such as fixed bisection width, fixed channel width, and fixed node size) and under different workload parameters (such as packet size, degree of communication locality, and network request rate) reveals that performance is highly sensitive to these constraints and workloads. A two-dimensional network has the lowest latency only when switch delays and network contention are ignored, but...
Load Balancing Using Time Series Analysis for Soft Real Time Systems with Statistically Periodic Loads
, 1993
The NuMesh: A Modular, Scalable Communications Substrate
- In International Conference on Supercomputing
, 1993
Cited by 5 (2 self)
Many standardized hardware communication interfaces offer runtime flexibility and configurability at the cost of efficiency. An alternate approach is the use of a highly-efficient, minimal communication element, with as much communication decision-making as possible done at compile time. NuMesh is a packaging and interconnect technology supporting high-bandwidth systolic communications on a 3D nearest-neighbor lattice; our goal is to combine Lego-like modularity with supercomputer performance. To date, the primary focus of the project has been the class of applications whose static communication patterns can be precompiled into independent and carefully choreographed finite state machines running on each node. Several extensions of the NuMesh to more general communication paradigms have been implemented, and the issues involved are under active exploration. This paper presents an overview of our approach, as well as an introduction to our current-generation prototype. We also discuss o...
Run-Time Thread Management for Large-Scale Distributed-Memory Multiprocessors
, 1993
Cited by 3 (0 self)
Effective thread management is crucial to achieving good performance on large-scale distributed-memory multiprocessors that support dynamic threads. For a given parallel computation with some associated task graph, a thread-management algorithm produces a running schedule as output, subject to the precedence constraints imposed by the task graph and the constraints imposed by the interprocessor communications network. Optimal thread management is an NP-hard problem, even given full a priori knowledge of the entire task graph and assuming a highly simplified architecture abstraction. Thread management is even more difficult for dynamic data-dependent computations which must use online algorithms because their task graphs are not known a priori. This thesis investigates online thread-management algorithms and presents XTM, an online thread-management system for large-scale distributed-memory multiprocessors. XTM has been implemented for the MIT Alewife Multiprocessor. Simulation results...
Load Balancing: A Programmer's Approach or The Impact of Task-Length Parameters on the Load Balancing Performance of Parallel Programs.
Cited by 1 (0 self)
We consider the problem of dynamic load balancing in an n-processor parallel system. The scheduling process of a parallel program is modeled by randomly throwing weighted balls into n holes. For a given program A, the ball weights (task lengths) are chosen according to a probability distribution D(A), for which we know only some of the following parameters: the expectation μ, variance σ², maximum M and minimum m. According to these parameters, we derive an upper bound for the number of tasks to be generated by A in order to achieve a load balancing ratio for which the run-time is optimal up to a factor (1 + ε)² for any 0 < ε ≤ 0.5, with very high probability. Using the derived relations, the programmer may control the load-balancing of his program by tuning the global parameters of the generated tasks. This can be done regardless of the underlying scheduler used by the parallel machine. We also give experimental results of marine-life simulation in support of our claims. ...
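The weighted balls-into-holes model in the abstract above can be tried out with a small simulation. The following is only an illustrative sketch and is not taken from the paper: the function names, the uniform task-length distribution on [1, 3], and the choice of 16 processors are all assumptions made for the example. It throws tasks uniformly at random onto processors and reports the ratio of the heaviest load to the ideal (perfectly balanced) load.

```python
import random


def simulate_load_ratio(num_tasks, num_procs, draw_weight, seed=0):
    """Throw weighted balls (tasks) uniformly at random into holes (processors).

    Returns the maximum processor load divided by the ideal load
    (total weight / num_procs). A value of 1.0 means perfect balance.
    """
    rng = random.Random(seed)
    loads = [0.0] * num_procs
    for _ in range(num_tasks):
        # Each task lands on a uniformly random processor.
        loads[rng.randrange(num_procs)] += draw_weight(rng)
    ideal = sum(loads) / num_procs
    return max(loads) / ideal


def uniform_weight(rng):
    # Hypothetical task-length distribution (an assumption for this sketch):
    # uniform on [1, 3], so the minimum m is 1 and the maximum M is 3.
    return rng.uniform(1.0, 3.0)


if __name__ == "__main__":
    # Generating more tasks should tend to bring the ratio closer to 1.
    for num_tasks in (100, 1_000, 10_000, 100_000):
        ratio = simulate_load_ratio(num_tasks, 16, uniform_weight)
        print(num_tasks, round(ratio, 3))
```

Varying the number of tasks or swapping in a distribution with a different variance gives a rough feel for how task-length parameters affect balance under random placement. It does not reproduce the bound derived in the paper.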

