MULTIPROCESSOR SCHEDULING TO ACCOUNT FOR INTERPROCESSOR COMMUNICATION
, 1991
Cited by 64 (11 self)
Interprocessor communication (IPC) overheads have emerged as the major performance limitation in parallel processing systems, due to the transmission delays, synchronization overheads, and conflicts for shared communication resources created by data exchange. Accounting for these overheads is essential for attaining efficient hardware utilization. This thesis introduces two new compile-time heuristics for scheduling precedence graphs onto multiprocessor architectures, which account for interprocessor communication overheads and interconnection constraints in the architecture. These algorithms perform scheduling and routing simultaneously to account for irregular interprocessor interconnections, and schedule all communications as well as all computations to eliminate shared resource contention. The first technique, called dynamic-level scheduling, modifies the classical HLFET list scheduling strategy to account for IPC and synchronization overheads. By using dynamically changing priorities to match nodes and processors at each step, this technique attains an equitable tradeoff between load balancing and interprocessor communication cost. This method is fast, flexible, widely targetable, and displays promising performance. The second technique, called declustering, establishes a parallelism hierarchy upon the precedence graph using graph-analysis techniques which explicitly address the tradeoff between exploiting parallelism and incurring communication cost. By systematically decomposing this hierarchy, the declustering process exposes parallelism instances in order of importance, assuring efficient use of the available processing resources. In contrast with traditional clustering schemes, this technique can adjust the level of cluster granularity to suit the characteristics of the specified architecture, leading to a more effective solution.
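The dynamic-priority idea in the abstract above can be illustrated with a small Python sketch. It is not the thesis's algorithm. It assumes identical processors, a fully connected network, and a fixed communication cost per edge that is paid only when producer and consumer run on different processors, and it ignores routing and link contention entirely. The function names and the priority used here (static level minus earliest possible start time) are simplifications chosen for this sketch.

```python
def static_levels(exec_time, succs):
    """Static level: longest execution-time path from a node to any exit node."""
    levels = {}

    def level(node):
        if node not in levels:
            tail = max((level(s) for s in succs.get(node, [])), default=0)
            levels[node] = exec_time[node] + tail
        return levels[node]

    for node in exec_time:
        level(node)
    return levels


def dynamic_level_schedule(exec_time, edges, num_procs):
    """Greedy list scheduling with priorities recomputed at every step.

    exec_time: dict node -> execution time
    edges:     dict (u, v) -> communication cost, charged only when u and v
               are placed on different processors (assumption of this sketch)
    Returns a dict node -> (processor, start, finish).
    """
    succs, preds = {}, {}
    for (u, v) in edges:
        succs.setdefault(u, []).append(v)
        preds.setdefault(v, []).append(u)

    sl = static_levels(exec_time, succs)
    proc_free = [0] * num_procs      # time at which each processor becomes idle
    placed = {}                      # node -> (processor, start, finish)

    while len(placed) < len(exec_time):
        ready = [n for n in exec_time
                 if n not in placed and all(p in placed for p in preds.get(n, []))]
        best = None
        # Evaluate every (ready node, processor) pair and keep the highest level.
        for n in ready:
            for p in range(num_procs):
                data_ready = max(
                    (placed[u][2] + (0 if placed[u][0] == p else edges[(u, n)])
                     for u in preds.get(n, [])),
                    default=0,
                )
                start = max(data_ready, proc_free[p])
                dynamic_level = sl[n] - start
                if best is None or dynamic_level > best[0]:
                    best = (dynamic_level, n, p, start)
        _, n, p, start = best
        finish = start + exec_time[n]
        placed[n] = (p, start, finish)
        proc_free[p] = finish
    return placed
```

Because a node's priority depends on where and when it could start, raising the communication costs in `edges` pushes the sketch toward keeping dependent nodes on one processor, while low costs let it spread work across processors.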
Dynamic Voltage Scaling with Links for Power Optimization of Interconnection Networks
, 2003
Cited by 64 (10 self)
Originally developed to connect processors and memories in multicomputers, prior research and design of interconnection networks have focused largely on performance. As these networks get deployed in a wide range of new applications, where power is becoming a key design constraint, we need to seriously consider power efficiency in designing interconnection networks. As the demand for network bandwidth increases, communication links, already a significant consumer of power, will take up an ever larger portion of the total system power budget. In this paper, we motivate the use of dynamic voltage scaling (DVS) for links, where the frequency and voltage of links are dynamically adjusted to minimize power consumption. We propose a history-based DVS policy that judiciously adjusts link frequencies and voltages based on past utilization. Our approach realizes up to 6.3X power savings (4.6X on average). This is accompanied by a moderate impact on performance (15.2% increase in average latency before network saturation and 2.5% reduction in throughput). To the best of our knowledge, this is the first study that targets dynamic power optimization of interconnection networks.
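The abstract above does not spell out the policy, so the following Python sketch only illustrates the general shape a history-based policy could take: it keeps an exponentially weighted average of past link utilization and steps the link's frequency level up or down when that average crosses a threshold. The class name, the thresholds, the weight, and the frequency levels are all hypothetical values chosen for this sketch, not figures from the paper.

```python
class HistoryBasedLinkDVS:
    """Illustrative history-based frequency selection for one link.

    All parameters are assumptions of this sketch:
      levels: available link frequencies (any unit), lowest to highest
      low, high: utilization thresholds for stepping down or up
      weight: share given to the newest utilization sample
    """

    def __init__(self, levels, low=0.3, high=0.6, weight=0.5):
        self.levels = sorted(levels)
        self.index = len(self.levels) - 1   # start at the fastest level
        self.low = low
        self.high = high
        self.weight = weight
        self.predicted = 0.0                # running estimate of utilization

    def update(self, utilization):
        """Fold in the utilization of the last window and pick a level."""
        self.predicted = (self.weight * utilization
                          + (1 - self.weight) * self.predicted)
        if self.predicted > self.high and self.index < len(self.levels) - 1:
            self.index += 1                 # link is busy: speed up one step
        elif self.predicted < self.low and self.index > 0:
            self.index -= 1                 # link is idle: slow down one step
        return self.levels[self.index]
```

Averaging over past windows keeps a single busy or idle window from triggering a level change, which matters because each voltage or frequency transition has its own cost.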
Routing in Multi-hop Packet Switching Networks: Gbps Challenge
- IEEE Network
, 1995
Cited by 24 (0 self)
The paper is a survey of networking solutions that have been proposed for high-speed packet-switched applications. Using these solutions as examples, we identify the specific problems resulting from very high transmission rates and explain how these problems influence the design of high-speed networks and protocols. We conclude that the solutions based on deflection routing are the most promising ones and we suggest a number of directions for their evolution.
1 Introduction
Not so long ago, computer networks with high transmission rates (e.g. several Mb/s) were naturally confined to local domains. Although such (and higher) transmission rates were available in telephony on long distances, they were used on a point-to-point basis. Concepts of highly-connected fast networks spanning geographical areas larger than the acreage typically covered by a single institution are relatively new and, besides the emerging ATM technology, there are no standard commercially available solutions that c...
MPI and Java-MPI: Contrasts and Comparisons of Low-Level Communication Performance
- In Supercomputing
, 1999
Cited by 16 (2 self)
Java is receiving increasing attention as the most popular platform for distributed and collaborative computing. However, it is still subject to significant performance drawbacks in comparison to other programming languages such as C and Fortran. This paper presents the current status of our ongoing project, which intends to conduct a detailed experimental evaluation of the suitability of Java in these environments, with particular focus on its message-passing performance for one-to-one as well as one-to-many and many-to-many data exchange patterns. We also emphasize both methodology and evaluation guidelines in order to ensure reproducibility, sound interpretation, and comparative analysis of performance results. Some of the important parameters which characterize the communication performance of MPI and Java-MPI such as latency, asymptotic bandwidth and N-half are investigated. In addition, we introduce two different types of pipeline effects -- intra-message and inter-mess...
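The three parameters named in the abstract above are commonly related through a linear timing model, T(n) = t0 + n / r_inf, where t0 is the startup latency and r_inf the asymptotic bandwidth. Under that model, N-half (the message size at which half of r_inf is achieved) equals t0 * r_inf. The Python sketch below fits the model to measured (size, time) pairs by ordinary least squares. Whether the paper uses this exact fitting procedure is not stated in the abstract, and the function name is an assumption of this sketch.

```python
def fit_linear_comm_model(sizes, times):
    """Fit T(n) = latency + n / bandwidth by least squares.

    sizes: message sizes in bytes
    times: measured one-way transfer times in seconds
    Returns (latency_s, asymptotic_bandwidth_bytes_per_s, n_half_bytes).
    """
    count = len(sizes)
    mean_size = sum(sizes) / count
    mean_time = sum(times) / count
    sxx = sum((x - mean_size) ** 2 for x in sizes)
    sxy = sum((x - mean_size) * (y - mean_time) for x, y in zip(sizes, times))
    slope = sxy / sxx                       # seconds per byte
    latency = mean_time - slope * mean_size
    bandwidth = 1.0 / slope
    # Solving n / T(n) = bandwidth / 2 for n gives n = latency * bandwidth.
    n_half = latency * bandwidth
    return latency, bandwidth, n_half
```

In practice the inputs would come from ping-pong measurements over a range of message sizes. A small N-half means the link reaches a good fraction of its peak bandwidth even for short messages.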
Tailoring Router Architectures to Performance Requirements in Cut-Through Networks
, 1999
Cited by 11 (0 self)
Message-passing parallel machines have emerged as a cost-effective platform for exploiting concurrency in a variety of applications. These multicomputer systems employ a wide range of policies for routing, switching, arbitration, queueing, and flow control, implemented in the router hardware that connects an individual processing node to the interconnection fabric and manages traffic flowing through the node en route to other destinations. To address the requirements of emerging applications, we develop new techniques for designing and evaluating new router architectures that tailor network policies to application characteristics. These results facilitate the development of effective support for communication in real-time systems and local area networks, as well as more traditional multicomputer domains like high-speed scientific computing. Most modern routers employ cut-through switching schemes, such as virtual cut-through and wormhole switching, that permit an arriving packet to proceed directly to an idle outgoing link. We develop analytical models for evaluating cut-through routing algorithms with different degrees of adaptivity. The analytical results permit an efficient evaluation of large networks, while detailed comparisons with simulation results characterize the subtle effects of the simplifying assumptions in the analysis; in particular, cut-through networks introduce unique dependencies between adjacent nodes. Additional simulation experiments show that the network topologies, routing algorithms, and traffic patterns in modern multicomputers exacerbate these effects. Based on these results, we present a routing algorithm that capitalizes on inter-node dependencies to improve network performance.
Design-Space Exploration of Power-Aware On/Off Interconnection Networks
Cited by 11 (1 self)
With power a major limiting factor in the design of scalable interconnected systems, power-aware networks will become inherent components of single-chip and multi-chip systems. As communication links consume significant power regardless of utilization, we propose and investigate power-aware networks whose links are turned on and off in response to bursts and dips in traffic. We explore the design space of such on/off networks, outlining a 5-step design methodology along with solutions at each step that can form the building blocks of numerous designs. Two specific designs targeting links with substantially different on/off times are then presented and evaluated. Our simulations show that up to 54.4% power savings can be achieved along with at most 7.5% increase in latency.
Compile-Time Scheduling of Dataflow Program Graphs with Dynamic Constructs
- University of California, Berkeley
, 1992
A Theory for Total Exchange in Multidimensional Interconnection Networks
- IEEE Transactions on Parallel and Distributed Systems
, 1998
Cited by 7 (1 self)
Total exchange (or multiscattering) is one of the important collective communication problems in multiprocessor interconnection networks. It involves the dissemination of distinct messages from every node to every other node. We present a novel theory for solving the problem in any multidimensional (cartesian product) network. These networks have been adopted as cost-effective interconnection structures for distributed-memory multiprocessors. We construct a general algorithm for single-port networks and provide conditions under which it behaves optimally. It is seen that many of the popular topologies, including hypercubes, k-ary n-cubes and general tori satisfy these conditions. The algorithm is also extended to homogeneous networks with 2^k dimensions and with multiport capabilities. Optimality conditions are also given for this model.
Keywords: Collective communications, interconnection networks, multidimensional networks, packet-switched networks, total exchange
This research w...
A Scalable VLSI MIMD Routing Cell
- In DMCC-6 Conference Proceedings
, 1991
Cited by 5 (4 self)
It is a well-known fact that full custom designed computer architectures can achieve much higher performance for specific applications than general purpose computers. This performance has to be paid for: a long design trajectory results in a high cost-performance ratio. Current VLSI design and compilation tools, however, make semi-custom designs feasible with greatly reduced costs and time to market. This paper presents a scalable and flexible communication processor for message passing MIMD systems. This communication processor is implemented as a parameterized VLSI routing cell in a VLSI compilation system. This cell fits into the SCARCE RISC processor framework [1], which is an architectural framework for automatic generation of application specific processors. By use of application analysis, the cell is tuned to the specific requirements during silicon compilation time. This approach is new, in that it avoids the general performance penalty paid for required flexibility. Keyword...
Simulative Performance Analysis of Distributed Switching Fabrics for SCI-based Systems
- Microprocessors and Microsystems, Vol.24, No.1
, 2000
Cited by 4 (2 self)
This document is a copy of the final draft of the paper prepared for the NSA. The final typeset version of the paper as it appeared in the refereed publication can be obtained from the journal volume or conference proceedings in which it appears.

