## On the Fault Tolerance of Some Popular Bounded-Degree Networks (1992)

Venue: | SIAM Journal on Computing |

Citations: | 46 - 8 self |

### BibTeX

@INPROCEEDINGS{Leighton92onthe,
    author = {Tom Leighton and Bruce Maggs and R. Sitaraman},
    title = {On the Fault Tolerance of Some Popular Bounded-Degree Networks},
    booktitle = {SIAM Journal on Computing},
    year = {1992},
    pages = {542--552}
}

### Abstract

In this paper, we analyze the ability of several bounded-degree networks that are commonly used for parallel computation to tolerate faults. Among other things, we show that an $N$-node butterfly containing $N^{1-\epsilon}$ worst-case faults (for any constant $\epsilon > 0$) can emulate a fault-free butterfly of the same size with only constant slowdown. Similar results are proved for the shuffle-exchange graph. Hence, these networks become the first connected bounded-degree networks known to be able to sustain more than a constant number of worst-case faults without suffering more than a constant-factor slowdown in performance. We also show that an $N$-node butterfly whose nodes fail with some constant probability $p$ can emulate a fault-free version of itself with a slowdown of $2^{O(\log^* N)}$, which is a very slowly increasing function of $N$. The proofs of these results combine the technique of redundant computation with new algorithms for (packet) routing around faults in hypercubic networks. Tech...
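
To make the objects in the abstract concrete, the following Python sketch builds a small butterfly, checks that its degree is bounded by 4 regardless of size, and measures the largest connected piece left after a set of nodes fails. It is an illustration only and not the paper's emulation scheme; the names `butterfly` and `largest_component`, and the convention of $d+1$ levels with $2^d$ rows, are assumptions made for this example.

```python
def butterfly(d):
    """Adjacency sets of a butterfly with d+1 levels and 2**d rows.

    Node (lvl, row) is joined to (lvl+1, row) by a straight edge and to
    (lvl+1, row with bit lvl flipped) by a cross edge.
    """
    adj = {(lvl, row): set() for lvl in range(d + 1) for row in range(2 ** d)}
    for lvl in range(d):
        for row in range(2 ** d):
            for nxt in ((lvl + 1, row), (lvl + 1, row ^ (1 << lvl))):
                adj[(lvl, row)].add(nxt)
                adj[nxt].add((lvl, row))
    return adj


def largest_component(adj, faults):
    """Size of the largest connected component once faulty nodes are removed."""
    alive = set(adj) - set(faults)
    seen, best = set(), 0
    for start in alive:
        if start in seen:
            continue
        stack, size = [start], 0
        seen.add(start)
        while stack:  # iterative depth-first search over live nodes
            u = stack.pop()
            size += 1
            for v in adj[u]:
                if v in alive and v not in seen:
                    seen.add(v)
                    stack.append(v)
        best = max(best, size)
    return best


adj = butterfly(3)
max_degree = max(len(nbrs) for nbrs in adj.values())
# Failing every node on level 0 removes the only edges that flip bit 0,
# so the surviving nodes split into two halves.
after_faults = largest_component(adj, [(0, row) for row in range(8)])
```

Here `max_degree` is 4 and `after_faults` is 12 out of 24 surviving nodes, which shows how a well-placed set of worst-case faults can cut the network apart.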

### Citations

1374 | Introduction to Parallel Algorithms and Architectures: Arrays - Leighton - 1992
Citation Context: ...e with $N^{1-\epsilon}$ worst-case faults can emulate any normal algorithm with constant slowdown. (The set of normal algorithms includes FFT, bitonic sort, and other important ascend-descend algorithms [Lei92].) Previously, such results were known only for hypercubes containing $\log^{O(1)} N$ faults [AL91, BCS90, BCS92]. Whether or not an $N$-node hypercube can tolerate more than $\log^{O(1)} N$ faults with constant ... |

488 | Efficient dispersal of information for security, load balancing, and fault tolerance - Rabin - 1989 |

282 | Probabilistic Construction of Deterministic Algorithms: Approximating Packing Integer Programs - Raghavan - 1986
Citation Context: ...sing the fact that algorithm RANDSET will find them with non-zero probability. In this section we construct a deterministic algorithm that always finds these paths using the technique due to Raghavan [Rag88] and Spencer [Spe87] to remove the randomness. Further, like Waksman's algorithm for finding the switch settings in a fault-free Benes network with $N$ input nodes, the algorithm runs in $O(N \log N)$ time... |

240 | Fat-trees: Universal networks for hardware-efficient supercomputing - Leiserson - 1985
Citation Context: ... and size with only constant slowdown. We first consider embedding a complete binary tree in a complete binary tree with faults only at its leaves. This result also holds for fat-trees [21, 33] with faults at the leaves. We use this result to find reconfigurations of butterflies and meshes of trees in which faults may occur at any node. The main result of this section is a proof that an $N$-n... |

217 | A scheme for fast parallel communication - Valiant - 1982
Citation Context: ...des in the butterfly. Note that this is optimal within constant factors since $N$ faults in level $(\log N)/2$ can bisect the butterfly into many small components. We start by describing Valiant's algorithm [Val82] for permutation routing in a butterfly without faults. It will be convenient for us to view the packets in this scheme as being routed on a bigger network with $4 \log N + 1$ levels. Between level 0 and... |

104 | Packet routing and job-shop scheduling in O(congestion+dilation) steps - Leighton, Maggs, et al. - 1994
Citation Context: ...g is the length of the longest path. The load, congestion, and dilation of the embedding determine the time required to emulate each step of $G$ on $H$. In particular, Leighton, Maggs, and Rao have shown [30] that if there is an embedding of $G$ in $H$ with load $l$, congestion $c$, and dilation $d$, then $H$ can emulate any computation on $G$ with slowdown $O(l + c + d)$. In this paper, we are most interested in embeddi... |
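
The three quantities in this context can be computed directly from an embedding. The sketch below is a minimal illustration under the standard definitions (load is the largest number of $G$-nodes mapped to one $H$-node, congestion is the largest number of paths crossing one $H$-edge, dilation is the longest path length in edges); the function name `embedding_cost` and its input layout are assumptions made for this example, not notation from the paper.

```python
from collections import Counter


def embedding_cost(node_map, edge_paths):
    """Return (load, congestion, dilation) of an embedding of G in H.

    node_map:   dict mapping each G-node to the H-node that hosts it.
    edge_paths: dict mapping each G-edge (u, v) to the list of H-nodes on
                the path that carries it, from node_map[u] to node_map[v].
    """
    load = max(Counter(node_map.values()).values())
    edge_use = Counter()
    dilation = 0
    for (u, v), path in edge_paths.items():
        # Every path must start and end at the images of its endpoints.
        assert path[0] == node_map[u] and path[-1] == node_map[v]
        dilation = max(dilation, len(path) - 1)
        for a, b in zip(path, path[1:]):
            edge_use[frozenset((a, b))] += 1  # H-edges are undirected
    congestion = max(edge_use.values(), default=0)
    return load, congestion, dilation


# A 4-cycle a-b-c-d folded onto a single H-edge x-y.
node_map = {"a": "x", "b": "x", "c": "y", "d": "y"}
edge_paths = {
    ("a", "b"): ["x"],
    ("b", "c"): ["x", "y"],
    ("c", "d"): ["y"],
    ("d", "a"): ["y", "x"],
}
cost = embedding_cost(node_map, edge_paths)
```

For this toy embedding `cost` is `(2, 2, 1)`, so the quoted bound gives a slowdown of $O(l + c + d) = O(5)$, a constant.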

89 | Randomized routing and sorting on fixed-connection networks - Leighton, Maggs, et al. - 1994
Citation Context: ...ng as the (leveled) paths of the packets have congestion $O(\log N)$, a Ranade-type queuing protocol can be used to route the packets in $O(\log N)$ steps using constant-size queues, with high probability [29]. Therefore it is sufficient to derive high probability bounds on the congestion of the paths in a routing scheme. Our goal is to identify a large set of "good" nodes in a faulty butterfly between whic... |

77 | A permutation network - Waksman - 1968
Citation Context: ... output node. For a full permutation, there are $2N$ input-output pairs which require paths to be routed between them. The standard algorithm for setting the switches in a Benes network, due to Waksman [Wak68], uses bipartite graph matching to split the set of $2N$ pairs into two sets of $N$ pairs which are each then routed recursively in one of the smaller sub-Benes networks. We now present Waksman's algorith... |

76 | Universal packet routing algorithms - Leighton, Maggs, et al. - 1988
Citation Context: ... is the length of the longest path. The load, congestion, and dilation of the embedding determine the time required to emulate each step of $G$ on $H$. In particular, Leighton, Maggs, and Rao have shown [LMR88] that if there is an embedding of $G$ in $H$ with congestion $c$, load $l$, and dilation $d$, then $H$ can emulate any computation on $G$ with slowdown $O(c + l + d)$. In this paper, we will be most interested in emb... |

63 | On Line Algorithms for Path Selection in a Nonblocking Network - Arora, Leighton, et al. - 1990 |

51 | Randomized routing on fat-trees - Greenberg, Leiserson - 1989
Citation Context: ... and size with only constant slowdown. We first consider embedding a complete binary tree in a complete binary tree with faults only at its leaves. This result also holds for fat-trees [21, 33] with faults at the leaves. We use this result to find reconfigurations of butterflies and meshes of trees in which faults may occur at any node. The main result of this section is a proof that an $N$-n... |

45 | Work-preserving emulations of fixed-connection networks - Koch, Leighton, et al. - 1989
Citation Context: ... ensuring that replicated computations stay consistent (and accurate) over time. This technique was previously used in the context of (fault-free) work-preserving emulations of one network by another [19, 26, 38, 39, 40, 47]. The techniques developed in Section 4 also have applications for hypercubes. For example, in Section 4.4, we use them to show that an $N$-node hypercube with $N^{1-\epsilon}$ worst-case faults can emulate... |

37 | The Extra Stage Cube: A Fault Tolerant Interconnection Network for Supersystems - Adams, Siegel - 1982 |

37 | Lecture Notes on Randomized Algorithms - Raghavan - 1990
Citation Context: ...is at most $2^{\log N - l}/(3N/5)$. Therefore the average number of packets passing through a node in Stage 1 is at most $2^{l} \log N \cdot 2^{\log N - l} \cdot 5/(3N)$, which is $5 \log N / 3$. We can use Chernoff bounds [Rag90] to show that the number of packets through $s$ in Stage 1 is $O(\log N)$ with probability at least $1 - o(1/N^{k})$. The calculation for a node in Stage 2 is exactly analogous. Thus the congestion is O(... |

31 | Asymptotically tight bounds for computing with faulty arrays of processors - Kaklamanis, Karlin, et al. - 1990
Citation Context: ...embedding of an $N$-node (2- or 3-dimensional) array into an array of the same size containing more than a constant number of worst-case faults must have more than constant load, congestion, or dilation [22, 25, 31], and 2. the $N$-node hypercube can be reconfigured around $\log^{O(1)} N$ worst-case faults with constant load, congestion, and dilation [2, 12]. The embeddings that we use in Section 2 are level-preserving... |

28 | Fast Computations using Faulty Hypercubes - Hastad, Leighton, et al. - 1989 |

27 | Fast algorithms for routing around faults in multibutterflies and randomly-wired splitter networks - Leighton, Maggs - 1992
Citation Context: ...t it is possible to route any permutation between the nodes in these rows in $O(\log N)$ steps, with high probability. This is comparable to the results for fault-tolerant routing in a multi-butterfly [LM92]. A special case of this result is that when $f$ equals $\alpha N$ (for any $\alpha < 2/9$) we can route arbitrary permutations between a majority of nodes in the butterfly. Note that this is optimal within constant... |

26 | Fault Tolerant Sorting Networks - Assaf, Upfal - 1991 |

25 | The complexity of switching networks - Pippenger - 1973
Citation Context: ...hare the same input switch (or output switch). The result is a 2-regular $N \times N$ bipartite graph. The second step is to split the edges of this graph into two perfect matchings, $M_0$ and $M_1$. (See [42] for a nice proof that such a split is possible.) Next, we pick a binary value for random variable $X$ at random. If $X = 0$ then we recursively route the paths in matching $M_0$ through the top sub-Benes n... |
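
The splitting step described in this context can be sketched in a few lines of Python. This is an illustrative implementation of one standard approach (walking each cycle of the 2-regular bipartite graph and alternating between the two matchings), not code taken from the paper or from [42]; the name `split_into_matchings` and the edge-list input format are assumptions made for this example.

```python
def split_into_matchings(edges):
    """Split a 2-regular bipartite multigraph into two perfect matchings.

    edges: list of (left, right) pairs in which every left vertex and every
    right vertex appears in exactly two edges. Returns (M0, M1).
    """
    incident = {}
    for idx, (left, right) in enumerate(edges):
        incident.setdefault(("L", left), []).append(idx)
        incident.setdefault(("R", right), []).append(idx)

    color = [None] * len(edges)
    for start in range(len(edges)):
        if color[start] is not None:
            continue
        # Walk the cycle through `start`, leaving it via its right endpoint.
        idx, c = start, 0
        vertex = ("R", edges[start][1])
        while color[idx] is None:
            color[idx] = c
            first, second = incident[vertex]
            idx = second if first == idx else first  # other edge at vertex
            c = 1 - c                                # alternate matchings
            left, right = edges[idx]
            # Continue from the opposite endpoint of the new edge.
            vertex = ("L", left) if vertex[0] == "R" else ("R", right)

    m0 = [e for e, col in zip(edges, color) if col == 0]
    m1 = [e for e, col in zip(edges, color) if col == 1]
    return m0, m1


m0, m1 = split_into_matchings([(0, 0), (0, 1), (1, 1), (1, 0)])
```

Because the graph is bipartite every cycle has even length, so alternating along a cycle gives each vertex exactly one edge in each matching. In the example, `m0` is `[(0, 0), (1, 1)]` and `m1` is `[(0, 1), (1, 0)]`.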

24 | Reconfiguring a hypercube in the presence of faults - Hastad, Leighton, et al. - 1987 |

20 | Designing fault-tolerant systems using automorphisms - Dutt, Hayes - 1991 |

20 | On the computational equivalence of hypercube-derived networks - Schwabe - 1990
Citation Context: ...r $1 \le j \le \epsilon \log N/4 - 1$. Thus $T' = \sum_{0 < i \le T} l_i$ can be no more than $(16/\epsilon + 1)T + 4 \log N$, which is $O(T + \log N)$. We can extend these results to the shuffle-exchange network using Schwabe's proof [Sch90] that an $N$-node butterfly can emulate an $N$-node shuffle-exchange network with constant slowdown, and vice versa. Theorem 4.3.3 Any computation on a fault-free $N$-node shuffle-exchange network $G$ which ta... |

19 | Fault-tolerant meshes with small degree - Bruck, Cypher, et al. - 1993 |

18 | Coding theory, hypercube embeddings, and fault tolerance - Aiello, Leighton - 1991
Citation Context: ... faults must have more than constant load or dilation [GE84, KKL+90], and 2. the $N$-node hypercube can be reconfigured around $\log^{O(1)} N$ worst-case faults with constant load, congestion, and dilation [AL91]. The embeddings that we use in Section 2 are level-preserving, i.e., nodes in a particular level of the fault-free network are mapped to nodes on the same level of the faulty network. We take a signi... |

18 | Configuration of VLSI arrays in the presence of defects - Greene, El Gamal - 1984 |

18 | Wafer-scale integration for systolic arrays - Leighton, Leiserson - 1982
Citation Context: ...embedding of an $N$-node (2- or 3-dimensional) array into an array of the same size containing more than a constant number of worst-case faults must have more than constant load, congestion, or dilation [22, 25, 31], and 2. the $N$-node hypercube can be reconfigured around $\log^{O(1)} N$ worst-case faults with constant load, congestion, and dilation [2, 12]. The embeddings that we use in Section 2 are level-preserving... |

15 | On Designing and Reconfiguring k-fault-tolerant Tree Architectures - Dutt, Hayes - 1990 |

15 | Multi-scale self-simulation: a technique for reconfiguring arrays with faults - Cole, Maggs, et al. - 1993
Citation Context: ...e concerning the fault-tolerance of communication networks. We do not have the space to review all of this literature here, but we would like to cite the papers that are most relevant. In particular, [2, 5, 9, 14, 23, 24, 25, 35, 44, 52] show how to reconfigure a network with faults so that it can emulate a fault-free network of the same type and size. A fault-tolerant area-universal network is presented in [53]. References [4, 10, 1... |

14 | Simple algorithms for routing on butterfly networks with bounded queues - Maggs, Sitaraman - 1999
Citation Context: ...acket routing In this section we show how to route packets in an $N$-input butterfly network with $f$ worst-case faults. A widely studied class of routing problems is the permutation routing problem (see [MS92] for references to permutation routing algorithms). A permutation routing problem between a set of nodes in the network has exactly one packet originating at every node in this set. Each such packet n... |

13 | Routing on butterfly networks with random faults - Cole, Maggs, et al. - 1995 |

12 | Tolerating faults in hypercubes using subcube partitioning - Bruck, Cypher, et al. - 1992
Citation Context: ...dge for communication at every time step and further all of the edges used in a time step belong to the same dimension of the hypercube are called leveled algorithms (also known as regular algorithms [BCS92]). A useful subclass of leveled algorithms is the class of normal algorithms. A normal algorithm has the additional restriction that the dimensions used in consecutive time steps are consecutive. Many algorithms... |
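
The distinction between leveled and normal algorithms in this context can be checked mechanically from the sequence of hypercube dimensions an algorithm uses. The sketch below is only an illustration; the function name `classify`, the per-step set representation, and the optional `wrap` flag (which treats the highest and lowest dimensions as consecutive) are assumptions for this example and are not taken from the paper.

```python
def classify(step_dims, n_dims, wrap=False):
    """Classify a hypercube algorithm as 'normal', 'leveled', or 'general'.

    step_dims: list of sets; step_dims[t] holds every dimension whose edges
               carry a message at time step t.
    n_dims:    number of hypercube dimensions (log N).
    wrap:      assumption flag; if True, dimensions 0 and n_dims - 1 also
               count as consecutive.
    """
    # Leveled: all edges used in a step lie in a single dimension, so each
    # node uses at most one edge per step.
    if any(len(dims) != 1 for dims in step_dims):
        return "general"
    seq = [next(iter(dims)) for dims in step_dims]
    for a, b in zip(seq, seq[1:]):
        gap = abs(a - b)
        adjacent = gap == 1 or (wrap and n_dims > 2 and gap == n_dims - 1)
        if not adjacent:
            return "leveled"  # single dimension per step, but with a jump
    return "normal"


ascend_descend = classify([{0}, {1}, {2}, {1}, {0}], n_dims=3)
with_jump = classify([{0}, {2}], n_dims=3)
mixed = classify([{0, 1}], n_dims=3)
```

Here `ascend_descend` is `"normal"`, `with_jump` is `"leveled"`, and `mixed` is `"general"`.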

12 | Fast fault-tolerant parallel communication and on-line maintenance using information dispersal - Lyuu - 1989 |

11 | Randomized routing on fat-trees - Greenberg, Leiserson - 1989
Citation Context: ...rsion of itself with only constant slowdown. We will first consider embedding a complete binary tree in a complete binary tree with faults only at its leaves. This result will also hold for fat-trees [GL89] with faults at the leaves. We use this result to find reconfigurations of butterflies and meshes of trees in which faults may occur at any node. The primary result of this section is that an $N$-node b... |

10 | Fault tolerant graphs, perfect hash functions and disjoint paths - Ajtai, Alon, et al. - 1992 |

9 | Efficiency of universal parallel computers - Meyer auf der Heide - 1983
Citation Context: ... ensuring that replicated computations stay consistent (and accurate) over time. This technique was previously used in the context of (fault-free) work-preserving emulations of one network by another [19, 26, 38, 39, 40, 47]. The techniques developed in Section 4 also have applications for hypercubes. For example, in Section 4.4, we use them to show that an $N$-node hypercube with $N^{1-\epsilon}$ worst-case faults can emulate... |

9 | Time-optimal simulations of networks by universal parallel computers - Meyer auf der Heide, Wanka - 1989
Citation Context: ... ensuring that replicated computations stay consistent (and accurate) over time. This technique was previously used in the context of (fault-free) work-preserving emulations of one network by another [19, 26, 38, 39, 40, 47]. The techniques developed in Section 4 also have applications for hypercubes. For example, in Section 4.4, we use them to show that an $N$-node hypercube with $N^{1-\epsilon}$ worst-case faults can emulate... |

8 | Layered Cross Product - A technique to construct interconnection networks - Even, Litman - 1992
Citation Context: ...roof follows from Theorems 2.1.2 and 2.2.1. The results of this subsection can also be formulated using the fact that the butterfly and the mesh of trees can be expressed as the Layered Cross Product [EL92] of two complete binary trees (or variations thereof) [Aie92]. 2.3 Limitations on level-preserving embeddings. We do not know whether or not Theorems 2.1.2, 2.2.2, and 2.2.3 can be improved if the leve... |

8 | Encoding Graphs in Graphs - Fellows - 1985 |

7 | Running algorithms efficiently on faulty hypercubes - Bruck, Cypher, et al. - 1990 |

7 | Efficient self-embedding of butterfly networks with random faults - Tamaki - 1998
Citation Context: ...pen questions. Until very recently, no results along these lines were known for the butterfly (unless routing is allowed through faulty nodes [Ann89], which simplifies matters substantially). Tamaki [Tam92a] has recently discovered an emulation scheme with slowdown $O((\log \log N)^{8.2})$. He has also introduced a class of bounded-degree networks called cube-connected arrays [Tam92b], and showed that an $N$-nod... |

7 | Optimal embeddings of butterfly-like graphs in the hypercube - Greenberg, Heath, et al. - 1990
Citation Context: ... butterfly can emulate any normal algorithm of an $N$-node hypercube with constant slowdown. We also require the following well known result concerning the embedding of a butterfly in a hypercube. (See [GHR90] for the stronger result that the butterfly is a subgraph of the hypercube.) Lemma 4.4.2 An $N$-node butterfly can be embedded in an $N$-node hypercube with constant load, congestion, and dilation. Theore... |

6 | Fault tolerance in hypercube-derivative networks - Annexstein - 1989
Citation Context: ... with only constant expected slowdown remain interesting open questions. Until very recently, no results along these lines were known for the butterfly (unless routing is allowed through faulty nodes [Ann89], which simplifies matters substantially). Tamaki [Tam92a] has recently discovered an emulation scheme with slowdown $O((\log \log N)^{8.2})$. He has also introduced a class of bounded-degree networks call... |

6 | Embedding complete binary trees in faulty hypercubes - Wang, Cypher, et al. - 1991 |

5 | Fault-tolerant circuit-switching networks - Pippenger, Lin - 1994
Citation Context: ...resented in [53]. References [4, 10, 11, 16, 17] show how to design a network $H$ that contains $G$ as a subnetwork even if $H$ contains some faults. Algorithms for routing messages around faults appear in [1, 6, 8, 15, 24, 25, 28, 34, 36, 41, 43, 44, 49]. The fault-tolerance of sorting networks is studied in [7, 32]. Finally, [12, 56, 57] show how to perform certain computations in hypercubes containing faults. 1.5. Network definitions. In this secti... |

4 | Fault-tolerant meshes with minimal numbers of spares. Proceedings of the 3rd IEEE Symposium on Parallel and Distributed Processing - Bruck, Cypher, et al. - 1991 |

4 | Highly fault-tolerant sorting circuits - Leighton, Ma, et al. - 1991 |

4 | Expected fault-tolerance of hypercubes - Livingston, Stout |

4 | On a class of rearrangeable switching networks--part II: Enumeration studies and fault diagnosis - Opferman, Tsao-Wu - 1971 |

4 | Efficient Embeddings and Simulations for Hypercubic Networks - Schwabe - 1991
Citation Context: ...n additional property of normal algorithms is that they can be emulated efficiently by bounded-degree networks such as the shuffle-exchange network and the butterfly. We state a result due to Schwabe [Sch91] to this effect. Lemma 4.4.1 An $N$-node butterfly can emulate any normal algorithm of an $N$-node hypercube with constant slowdown. We further require a well known result of the butterfly in the hypercub... |

3 | Asymptotically Optimal PRAM Emulation on Faulty Hypercubes - Aumann, Ben-Or - 1991 |