Results 1 - 10
of
69
Performance Analysis of k-ary n-cube Interconnection Networks
- IEEE Transactions on Computers
, 1988
"... VLSI communication networks are wire limited. The cost of a network is not a function of the number of switches required, but rather a function of the wiring density required to construct the network. This paper analyzes communication networks of varying dimension under the assumption of constant wi ..."
Abstract
-
Cited by 271 (14 self)
- Add to MetaCart
VLSI communication networks are wire limited. The cost of a network is not a function of the number of switches required, but rather a function of the wiring density required to construct the network. This paper analyzes communication networks of varying dimension under the assumption of constant wire bisection. Expressions for the latency, average case throughput, and hot-spot throughput of k-ary n- cube networks with constant bisection are derived that agree closely with experimental measurements. It is shown that low-dimensional networks (e.g., tori) have lower latency and higher hot-spot throughput than high-dimensional networks (e.g., binary n-cubes) with the same bisection width. Keywords Communication networks, interconnection networks, concurrent computing, message-passing multiprocessors, parallel processing, VLSI. 1 Introduction The critical component of a concurrent computer is its communication network. Many algorithms are communication rather than processing limited. Fi...
Limits on Interconnection Network Performance
- IEEE Transactions on Parallel and Distributed Systems
, 1991
"... As the performance of interconnection networks becomes increasingly limited by physical constraints in high-speed multiprocessor systems, the parameters of high-performance network design must be reevaluated, starting with a close examination of assumptions and requirements. This paper models networ ..."
Abstract
-
Cited by 166 (4 self)
- Add to MetaCart
As the performance of interconnection networks becomes increasingly limited by physical constraints in high-speed multiprocessor systems, the parameters of high-performance network design must be reevaluated, starting with a close examination of assumptions and requirements. This paper models network latency, taking both switch and wire delays into account. A simple closed form expression for contention in buffered, direct networks is derived and is found to agree closely with simulations. The model includes the effects of packet size and communication locality. Network analysis under various constraints (such as fixed bisection width, fixed channel width, and fixed node size) and under different workload parameters (such as packet size, degree of communication locality, and network request rate) reveals that performance is highly sensitive to these constraints and workloads. A twodimensional network has the lowest latency only when switch delays and network contention are ignored, but...
The NP-completeness column: an ongoing guide
- Journal of Algorithms
, 1985
"... This is the nineteenth edition of a (usually) quarterly column that covers new developments in the theory of NP-completeness. The presentation is modeled on that used by M. R. Garey and myself in our book ‘‘Computers and Intractability: A Guide to the Theory of NP-Completeness,’ ’ W. H. Freeman & Co ..."
Abstract
-
Cited by 164 (0 self)
- Add to MetaCart
This is the nineteenth edition of a (usually) quarterly column that covers new developments in the theory of NP-completeness. The presentation is modeled on that used by M. R. Garey and myself in our book ‘‘Computers and Intractability: A Guide to the Theory of NP-Completeness,’ ’ W. H. Freeman & Co., New York, 1979 (hereinafter referred to as ‘‘[G&J]’’; previous columns will be referred to by their dates). A background equivalent to that provided by [G&J] is assumed, and, when appropriate, cross-references will be given to that book and the list of problems (NP-complete and harder) presented there. Readers who have results they would like mentioned (NP-hardness, PSPACE-hardness, polynomial-time-solvability, etc.) or open problems they would like publicized, should
A Framework For Solving Vlsi Graph Layout Problems
- JOURNAL OF COMPUTER AND SYSTEM SCIENCES
, 1984
"... This paper introduces a new divide-and-conquer framework for VLSI graph layout. Universally close upper and lower bounds are obtained for important cost functions such as layout area and propagation delay. The framework is also effectively used to design regular and configurable layouts, to assemble ..."
Abstract
-
Cited by 120 (3 self)
- Add to MetaCart
This paper introduces a new divide-and-conquer framework for VLSI graph layout. Universally close upper and lower bounds are obtained for important cost functions such as layout area and propagation delay. The framework is also effectively used to design regular and configurable layouts, to assemble large networks of processors using restructurable chips, and to configure networks around faulty processors. It is also shown how good graph partitioning heuristics may be used to develop a provably good layout strategy.
Randomized routing and sorting on fixed-connection networks
- Journal of Algorithms
, 1994
"... This paper presents a general paradigm for the design of packet routing algorithms for fixed-connection networks. Its basis is a randomized on-line algorithm for scheduling any set of N packets whose paths have congestion c on any bounded-degree leveled network with depth L in O(c + L + log N) steps ..."
Abstract
-
Cited by 84 (13 self)
- Add to MetaCart
This paper presents a general paradigm for the design of packet routing algorithms for fixed-connection networks. Its basis is a randomized on-line algorithm for scheduling any set of N packets whose paths have congestion c on any bounded-degree leveled network with depth L in O(c + L + log N) steps, using constant-size queues. In this paradigm, the design of a routing algorithm is broken into three parts: (1) showing that the underlying network can emulate a leveled network, (2) designing a path selection strategy for the leveled network, and (3) applying the scheduling algorithm. This strategy yields randomized algorithms for routing and sorting in time proportional to the diameter for meshes, butterflies, shuffle-exchange graphs, multidimensional arrays, and hypercubes. It also leads to the construction of an area-universal network: an N-node network with area Θ(N) that can simulate any other network of area O(N) with slowdown O(log N).
The Power of Reconfiguration
, 1998
"... This paper concerns the computational aspects of the reconfigurable network model. The computational power of the model is investigated under several network topologies and assuming several variants of the model. In particular, it is shown that there are reconfigurable machines based on simple netwo ..."
Abstract
-
Cited by 80 (7 self)
- Add to MetaCart
This paper concerns the computational aspects of the reconfigurable network model. The computational power of the model is investigated under several network topologies and assuming several variants of the model. In particular, it is shown that there are reconfigurable machines based on simple network topologies, that are capable of solving large classes of problems in constant time. These classes depend on the kinds of switches assumed for the network nodes. Reconfigurable networks are also compared with various other models of parallel computation, like PRAM's and Branching Programs. Part of this work is to be presented at the 18th International Colloquium on Automata, Languages, and Programming (ICALP), July 1991, Madrid. y Department of Computer Science, The Hebrew University, Jerusalem 91904, Israel. E-mail: yosi@humus.huji.ac.il, Supported by Eshcol Fellowship. z Department of Applied Mathematics and Computer Science, The Weizmann Institute, Rehovot 76100, Israel. E-mail: p...
The Computational Complexity of Universal Hashing
- Theoretical Computer Science
, 2002
"... Any implementation of Carter-Wegman universal hashing from n-bit strings to m-bit strings requires a time-space tradeoff of TS = Ω(nm). The bound holds in the general boolean branching program model, and thus in essentially any model of computation. As a corollary, computing a+b*c in any field ..."
Abstract
-
Cited by 54 (2 self)
- Add to MetaCart
Any implementation of Carter-Wegman universal hashing from n-bit strings to m-bit strings requires a time-space tradeoff of TS = Ω(nm). The bound holds in the general boolean branching program model, and thus in essentially any model of computation. As a corollary, computing a+b*c in any field F requires a quadratic time-space tradeoff, and the bound holds for any representation of the elements of the field. Other lower bounds on the...
Randomized Routing on Fat-Trees
- Advances in Computing Research
, 1996
"... Fat-trees are a class of routing networks for hardware-efficient parallel computation. This paper presents a randomized algorithm for routing messages on a fat-tree. The quality of the algorithm is measured in terms of the load factor of a set of messages to be routed, which is a lower bound on the ..."
Abstract
-
Cited by 47 (10 self)
- Add to MetaCart
Fat-trees are a class of routing networks for hardware-efficient parallel computation. This paper presents a randomized algorithm for routing messages on a fat-tree. The quality of the algorithm is measured in terms of the load factor of a set of messages to be routed, which is a lower bound on the time required to deliver the messages. We show that if a set of messages has load factor on a fat-tree with n processors, the number of delivery cycles (routing attempts) that the algorithm requires is O(+lg n lg lg n) with probability 1 \Gamma O(1=n). The best previous bound was O( lg n) for the off-line problem in which the set of messages is known in advance. In the context of a VLSI model that equates hardware cost with physical volume, the routing algorithm can be used to demonstrate that fat-trees are universal routing networks. Specifically, we prove that any routing network can be efficiently simulated by a fat-tree of comparable hardware cost. 1 Introduction Fat-trees constitute...
Analysis of Power Consumption on Switch Fabrics in Network Routers
- In Proc. Design Automation Conference
, 2002
"... In this paper, we introduce a framework to estimate the power consumption on switch fabrics in network routers. We propose different modeling methodologies for node switches, internal buffers and interconnect wires inside switch fabric architectures. A simulation platform is also implemented to trac ..."
Abstract
-
Cited by 46 (5 self)
- Add to MetaCart
In this paper, we introduce a framework to estimate the power consumption on switch fabrics in network routers. We propose different modeling methodologies for node switches, internal buffers and interconnect wires inside switch fabric architectures. A simulation platform is also implemented to trace the dynamic power consumption with bit-level accuracy. Using this framework, four switch fabric architectures are analyzed under different traffic throughput and different numbers of ingress/egress ports. This framework and analysis can be applied to the architectural exploration for low power high performance network router designs.
Horizons of Parallel Computation
- JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING
, 1993
"... This paper considers the ultimate impact of fundamental physical limitations---notably, speed of light and device size---on parallel computing machines. Although we fully expect an innovative and very gradual evolution to the limiting situation, we take here the provocative view of exploring the ..."
Abstract
-
Cited by 36 (3 self)
- Add to MetaCart
This paper considers the ultimate impact of fundamental physical limitations---notably, speed of light and device size---on parallel computing machines. Although we fully expect an innovative and very gradual evolution to the limiting situation, we take here the provocative view of exploring the consequences of the accomplished attainment of the physical bounds. The main result is that scalability holds only for neighborly interconnections, such as the square mesh, of bounded-size synchronous modules, presumably of the area-universal type. We also discuss the ultimate infeasibility of latencyhiding, the violation of intuitive maximal speedups, and the emerging novel processor-time tradeoffs.

