## An Algebraic Approach to Practical and Scalable Overlay Network Monitoring (2004)

Venue: | IN ACM SIGCOMM |

Citations: | 82 - 9 self |

### BibTeX

@INPROCEEDINGS{Chen04analgebraic,

author = {Yan Chen and David Bindel and Hanhee Song and Randy Katz},

title = {An Algebraic Approach to Practical and Scalable Overlay Network Monitoring},

booktitle = {IN ACM SIGCOMM},

year = {2004},

pages = {55--66},

publisher = {ACM Press}

}

### Abstract

Overlay network monitoring enables distributed Internet applications to detect and recover from path outages and periods of degraded performance within seconds. For an overlay network with n end hosts, existing systems either require O(n²) measurements, and thus lack scalability, or can only estimate the latency but not congestion or failures. Our earlier extended abstract [1] briefly proposes an algebraic approach that selectively monitors k linearly independent paths that can fully describe all the O(n²) paths. The loss rates and latency of these k paths can be used to estimate the loss rates and latency of all other paths. Our scheme only assumes knowledge of the underlying IP topology, with links dynamically varying between lossy and normal. In this
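The idea of monitoring only k linearly independent paths can be illustrated with a small sketch. It is not the authors' implementation: the function name `select_independent_paths`, the toy three-link path matrix, and the use of exact Gaussian elimination (instead of the QR decomposition mentioned in the citation contexts) are assumptions made purely for illustration.

```python
from fractions import Fraction


def select_independent_paths(G):
    """Return indices of a maximal set of linearly independent rows of G.

    G is a 0/1 path matrix (hypothetical layout): G[i][j] == 1 when
    path i traverses link j. Rows are scanned in order and a row is kept
    only if it is not a linear combination of the rows kept so far.
    """
    basis = []   # (pivot column, reduced row with a 1 at the pivot)
    chosen = []  # indices of the paths that would be monitored
    for i, row in enumerate(G):
        r = [Fraction(v) for v in row]
        # Eliminate the components already covered by the basis.
        for pivot, b in basis:
            if r[pivot] != 0:
                factor = r[pivot]
                r = [a - factor * c for a, c in zip(r, b)]
        pivot = next((j for j, v in enumerate(r) if v != 0), None)
        if pivot is not None:
            scale = r[pivot]
            basis.append((pivot, [v / scale for v in r]))
            chosen.append(i)
    return chosen


# Toy example: path 2 traverses exactly the links of paths 0 and 1,
# so it is linearly dependent and need not be measured.
G = [
    [1, 1, 0],
    [0, 0, 1],
    [1, 1, 1],
]
monitored = select_independent_paths(G)
```

In this toy case only two of the three paths are selected, so k = 2 here.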

### Citations

2186 | Random early detection gateways for congestion avoidance
- Floyd, Jacobson
- 1993
Citation Context ...sity of traffic and links makes large and long-lasting spatial link loss dependence unlikely in a real network such as the Internet [15]. Furthermore, the introduction of Random Early Detection (RED) [16] policies in routers will help break such dependence. In addition to [15], formula (1) has also been proven useful in many other link/path loss inference works [10, 9, 17, 14]. Our Internet experiment...

1253 | On power-law relationships of the internet topology
- Faloutsos, Faloutsos, et al.
- 1999
Citation Context ...ILITY ANALYSIS An overlay monitoring system is scalable only when the size of the basis set, k, grows relatively slowly as a function of n. Given that the Internet has moderate hierarchical structure [22, 23], we proved that the number of end hosts is no less than half of the total number of nodes in the Internet. Furthermore, we proved that when all the end hosts are on the overlay network, k = O(n) [1]....

625 | Measuring isp topologies with rocketfuel
- Spring, Mahajan, et al.
Citation Context ...to rank deficiency of the path matrix for overlay networks. As an example, consider an overlay within a single AS. The AS with the largest number of links (exclusive of customer and peering links) in [26] ... [figure residue: plot legend listing the original measurement and regressions on n, n log n, n^1.25, n^1.5, and n^1.75]

531 | Parallel numerical linear algebra
- Demmel, Heath, et al.
- 1993
Citation Context ... hour [20], so these measurements need not be taken simultaneously. Given measured values for $\bar{b}$, we compute a solution $x_G$ using the QR decomposition we constructed during measurement path selection [18, 21]. We choose the unique solution $x_G$ with minimum possible norm by imposing the constraint $x_G = \bar{G}^T y$ where $y = R^{-1} R^{-T} \bar{b}$. Once we have $x_G$, we can compute $b = G x_G$, and from there infer the loss rates of...
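The minimum-norm step quoted above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' code: the function names are hypothetical, R is obtained here by a Cholesky factorization of $\bar{G}\bar{G}^T$ (which satisfies $R^T R = \bar{G}\bar{G}^T$, as the R factor from a QR decomposition of $\bar{G}^T$ does), and each entry of $b$ is assumed to be log(1 - path loss rate), with link losses assumed independent.

```python
import math


def cholesky_upper(A):
    """Return upper-triangular R with R^T R = A (A symmetric positive definite)."""
    k = len(A)
    R = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i, k):
            s = A[i][j] - sum(R[m][i] * R[m][j] for m in range(i))
            R[i][j] = math.sqrt(s) if i == j else s / R[i][i]
    return R


def min_norm_solution(G_bar, b_bar):
    """Solve G_bar x = b_bar for the minimum-norm x = G_bar^T y."""
    k, s = len(G_bar), len(G_bar[0])
    A = [[sum(G_bar[i][c] * G_bar[j][c] for c in range(s)) for j in range(k)]
         for i in range(k)]
    R = cholesky_upper(A)
    # Forward substitution: R^T z = b_bar.
    z = [0.0] * k
    for i in range(k):
        z[i] = (b_bar[i] - sum(R[m][i] * z[m] for m in range(i))) / R[i][i]
    # Back substitution: R y = z, so y = R^{-1} R^{-T} b_bar.
    y = [0.0] * k
    for i in reversed(range(k)):
        y[i] = (z[i] - sum(R[i][m] * y[m] for m in range(i + 1, k))) / R[i][i]
    return [sum(G_bar[i][c] * y[i] for i in range(k)) for c in range(s)]


def infer_path_loss(G, x_G):
    """Compute b = G x_G and map each entry back to a loss rate."""
    return [1.0 - math.exp(sum(g * x for g, x in zip(row, x_G))) for row in G]


# Toy example (made-up numbers): three links with loss rates 0.1, 0.0, 0.2.
G = [[1, 1, 0], [0, 0, 1], [1, 1, 1]]   # all paths
G_bar = G[:2]                            # the two monitored paths
b_bar = [math.log(0.9), math.log(0.8)]   # log(1 - measured path loss)
x_G = min_norm_solution(G_bar, b_bar)
loss = infer_path_loss(G, x_G)
```

Because the unmeasured third path lies in the row space of the monitored paths, its inferred loss rate matches the value implied by the link loss rates, 1 - 0.9 × 0.8 = 0.28, up to floating-point error.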

528 | Predicting Internet network distance with coordinates-based approaches
- Ng, Zhang
Citation Context ...ntally deployable. However, existing network distance estimation systems are insufficient for this end. These existing systems can be categorized as general metric systems [3] and latency-only systems [4, 5, 6, 7]. Systems in the former category can measure any metric, but require O(n²) measurements where n is the number of end hosts, and thus lack scalability. On the other hand, the latency estimation system...

331 | Heuristics for internet map discovery
- Govindan, Tangmunarunkit
- 2000
Citation Context ...us System) hierarchy. We experiment with three types of BRITE [24] router-level topologies - Barabasi-Albert, Waxman and hierarchical models - as well as with a real router topology with 284,805 nodes [25]. For hierarchical topologies, BRITE first generates an autonomous system (AS) level topology with a Barabasi-Albert model or a Waxman model. Then for each AS, BRITE generates the router-level topolog...

252 | Multicast-based Inference of Network-internal Loss Characteristics
- Caceres, Duffield, et al.
- 1999
Citation Context ...loss is independent among links. Caceres et al. argue that the diversity of traffic and links makes large and long-lasting spatial link loss dependence unlikely in a real network such as the Internet [15]. Furthermore, the introduction of Random Early Detection (RED) [16] policies in routers will help break such dependence. In addition to [15], formula (1) has also been proven useful in many other lin...

247 | On the constancy of internet path properties
- Zhang, Duffield, et al.
- 2001
Citation Context ...anges or link failures can affect multiple paths in G. Previous studies have shown that end-to-end Internet paths generally tend to be stable for significant lengths of time, e.g., for at least a day [29, 30]. So we can incrementally measure the topology to detect changes. Each end host measures the paths to all other end hosts daily, and for each end host, such measurement load can be evenly distributed ...

144 | Detecting Shared Congestion of Flows Via End-to-end Measurement
- Rubenstein, Kurose, et al.
- 2000

110 | Internet tomography
- Coates, Nowak, et al.
Citation Context ...ion for its members. Similarly, the coordinates assigned to each end host in the coordinate-based approaches cannot embed any congestion/failure information. Network tomography has been well studied ([8] provides a good survey). Most tomography systems assume limited measurements are available (often in a multicast tree-like structure), and try to infer link characteristics [9, 10] or shared congesti...

98 | Templates for the Solution of Linear Systems: Building Blocks for Iterative Methods
- Barrett, et al.
- 1994
Citation Context ... implementation is not optimized: we can speed up node deletion by processing several paths simultaneously, and we can speed up path addition and deletion with iterative methods such as CGNE or GMRES [32]. Since the time to add/delete a path is O(k²), and to add/delete a node is O(nk²), we expect our updating scheme to be substantially faster than the O(n²k²) cost of re-initialization for larger ...

76 | Server-based inference of Internet link lossiness
- Padmanabhan, Qiu, et al.
Citation Context ...has been well studied ([8] provides a good survey). Most tomography systems assume limited measurements are available (often in a multicast tree-like structure), and try to infer link characteristics [9, 10] or shared congestion [11] in the middle of the network. However, the problem is under-constrained: there exist uniden... [figure residue: Figure 1: Architecture of a TOM system, showing the Overlay Network Operation Center and end hosts]

51 | Computing the unmeasured: an algebraic approach to internet mapping
- Shavitt, Sun, et al.
- 2004
Citation Context ...ot concerned about the characteristics of individual links, and we do not restrict the paths we measure. Shavitt, et al. also use algebraic tools to compute distances that are not explicitly measured [12]. Given certain “Tracer” stations deployed and some direct measurements among the Tracers, they search for path or path segments whose loss rates can be inferred from these measurements. Thus their fo...

36 | Matrix Algorithms I: Basic Decompositions
- Stewart
- 1998

31 | Resilient Overlay Networks
- Andersen, et al.
- 2001
Citation Context ...hich is accurate and incrementally deployable. However, existing network distance estimation systems are insufficient for this end. These existing systems can be categorized as general metric systems [3] and latency-only systems [4, 5, 6, 7]. Systems in the former category can measure any metric, but require O(n²) measurements where n is the number of end hosts, and thus lack scalability. On the othe...

30 | Tomography-based overlay network monitoring
- Chen, Bindel, et al.
Citation Context ...twork with n end hosts, existing systems either require O(n²) measurements, and thus lack scalability, or can only estimate the latency but not congestion or failures. Our earlier extended abstract [1] briefly proposes an algebraic approach that selectively monitors k linearly independent paths that can fully describe all the O(n²) paths. The loss rates and latency of these k paths can be used to...

27 | Scriptroute: A facility for distributed internet measurement
- Spring, Wetherall, et al.
- 2003
Citation Context ... paths is 0.16 second. 8.2.2 Topology error handling The limitation of traceroute, which we use to measure the topology among the end hosts, led to many topology measurement inaccuracies. As found in [34], many of the routers on the paths among PlanetLab nodes have aliases. We did not use sophisticated techniques to resolve these aliases. Thus, the topology we have is far from accurate. Furthermore, i...

25 | On the Cost-Quality Tradeoff in Topology-Aware Overlay Path Probing
- Tang, McKinley
Citation Context ...ch is not applicable for loss rate because it is difficult to estimate link-by-link loss rates from end-to-end measurement. A similar approach was taken for selecting paths to measure overlay network [14]. The minimal set cover selected can only give bounds for metrics like latency, and there is no guarantee as to how far the bounds are from the real values. Furthermore, none of the existing work exa...

25 | End-to-end internet packet dynamics
- Paxson
- 1999
Citation Context ...e loss rate of the link. For a Gilbert model, the link fluctuates between a good state (no packet dropped) and a bad state (all packets dropped). According to Paxson’s measurements of the Internet [31], the probability of remaining in the bad state is set to 35%, as in [10]. Thus, the Gilbert model is more likely to generate bursty losses than the Bernoulli model. The other state transition probabili...
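A two-state Gilbert loss process of the kind described above can be sketched as follows. Only the 35% probability of remaining in the bad state comes from the quoted text. The other transition probability is cut off in the excerpt, so this sketch assumes it is chosen so that the long-run fraction of dropped packets equals a target loss rate. The function names, the seed, and the example numbers are hypothetical.

```python
import random


def good_to_bad_probability(target_loss, stay_bad=0.35):
    """Assumed choice of P(good -> bad) that gives a stationary
    bad-state probability equal to target_loss."""
    return target_loss * (1.0 - stay_bad) / (1.0 - target_loss)


def simulate_gilbert(num_packets, target_loss, stay_bad=0.35, seed=0):
    """Return a list of booleans, True where a packet is dropped.

    Good state: no packet is dropped. Bad state: every packet is dropped.
    """
    rng = random.Random(seed)
    go_bad = good_to_bad_probability(target_loss, stay_bad)
    bad = False
    dropped = []
    for _ in range(num_packets):
        # Move to the next state, then record whether this packet is lost.
        if bad:
            bad = rng.random() < stay_bad
        else:
            bad = rng.random() < go_bad
        dropped.append(bad)
    return dropped


# Example with a made-up 5% target loss rate.
trace = simulate_gilbert(200_000, target_loss=0.05)
observed_loss = sum(trace) / len(trace)
```

Under this assumption the observed loss rate settles near the target, while consecutive drops occur more often than they would under independent (Bernoulli) losses at the same rate.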

21 | End-to-end routing behavior in the internet
- Paxson
- 1997
Citation Context ...anges or link failures can affect multiple paths in G. Previous studies have shown that end-to-end Internet paths generally tend to be stable for significant lengths of time, e.g., for at least a day [29, 30]. So we can incrementally measure the topology to detect changes. Each end host measures the paths to all other end hosts daily, and for each end host, such measurement load can be evenly distributed ...

14 | IDMaps: A Global Internet Host Distance Estimation Service
- Francis, et al.
- 2001
Citation Context ...ntally deployable. However, existing network distance estimation systems are insufficient for this end. These existing systems can be categorized as general metric systems [3] and latency-only systems [4, 5, 6, 7]. Systems in the former category can measure any metric, but require O(n²) measurements where n is the number of end hosts, and thus lack scalability. On the other hand, the latency estimation system...

14 | On the origin of power laws
- Medina, Matta, et al.
- 2000
Citation Context ... for reasonably large n (e.g., 100). We explain it based on the power-law degree distribution of the Internet topology and the AS (Autonomous System) hierarchy. We experiment with three types of BRITE [24] router-level topologies - Barabasi-Albert, Waxman and hierarchical models - as well as with a real router topology with 284,805 nodes [25]. For hierarchical topologies, BRITE first generates an autono...

13 | Network topology generators: Degree-based vs structural
- Tangmunarunkit, et al.
- 2002
Citation Context ...ILITY ANALYSIS An overlay monitoring system is scalable only when the size of the basis set, k, grows relatively slowly as a function of n. Given that the Internet has moderate hierarchical structure [22, 23], we proved that the number of end hosts is no less than half of the total number of nodes in the Internet. Furthermore, we proved that when all the end hosts are on the overlay network, k = O(n) [1]....

11 | Topologically-Aware Overlay Construction and Server Selection
- Ratnasamy, Handley, et al.
- 2002
Citation Context ...ntally deployable. However, existing network distance estimation systems are insufficient for this end. These existing systems can be categorized as general metric systems [3] and latency-only systems [4, 5, 6, 7]. Systems in the former category can measure any metric, but require O(n²) measurements where n is the number of end hosts, and thus lack scalability. On the other hand, the latency estimation system...

10 | Characterizing the Internet hierarchy from multiple vantage points
- Subramanian, Agarwal, et al.
- 2002
Citation Context .... Thus given y is normally much less than n and can be viewed as a constant, only O(n) paths need to be measured for the O(n²) cross-AS paths. Now consider an overlay on multiple ASes. According to [27], there are only 20 ASes (tier-1 providers) which form the dense core of the Internet. These ASes are connected almost as a clique, while the rest of the ASes have far less dense peering connectivity....

9 | On the constancy of Internet path properties
- Zhang, et al.
Citation Context ...o the underdetermined linear system $\bar{G} x_G = \bar{b}$. The vector $\bar{b}$ comes from measurements of the paths. Zhang et al. report that path loss rates remain operationally stable in the time scale of an hour [20], so these measurements need not be taken simultaneously. Given measured values for $\bar{b}$, we compute a solution $x_G$ using the QR decomposition we constructed during measurement path selection [18, 21]. ...

6 | Steps toward an iterative rank-revealing method
- Meyer, Pierce
- 1995
Citation Context ...asurement errors. Both simulation and real Internet implementation yield promising results. For even more efficient monitored path selection, we plan to investigate the use of iterative methods [32], [35] both to select rows and to compute loss rate vectors. In our preliminary experiments, the path matrix G has been well-conditioned, which suggests that iterative methods may converge quickly. 10. ACKNO...

4 | Toward a Scalable, Adaptive and Network-aware Content Distribution Network
- Chen
- 2003
Citation Context ...router-level topologies with another Barabasi-Albert model or Waxman model. So there are four types of possible topologies. We show one of them as an example because they all have similar trends (see [2] for complete results). We randomly select end hosts which have the least degree (i.e., leaf nodes) to form an overlay network. We test by linear regression of k on O(n), O(n log n), O(n^1.25), O(n 1...

3 | Multicast-based loss inference with missing data
- Duffield, Horowitz, Towsley, et al.
Citation Context ...ion of Random Early Detection (RED) [16] policies in routers will help break such dependence. In addition to [15], formula (1) has also been proven useful in many other link/path loss inference works [10, 9, 17, 14]. Our Internet experiments also show that the link loss dependence has little effect on the accuracy of (1). We take logarithms on both sides of (1). Then by defining a column vector $x \in \mathbb{R}^s$ with elements x...

2 | On the stability of network distance estimation
- Chen, et al.
- 2002

2 | Managing end-to-end network performance via optimized monitoring strategies
- Ozmutlu, et al.
Citation Context ... is not on Tracer/path selection. Recently, Ozmutlu, et al. selected a minimal subset of paths to cover all links for monitoring, assuming link-by-link latency is available via end-to-end measurement [13]. But the link-by-link latency obtained from traceroute is often inaccurate. And their approach is not applicable for loss rate because it is difficult to estimate link-by-link loss rates from end-to-...