## Using structure indices for efficient approximation of network properties (2006)

Venue: Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining

Citations: 13 (1 self)

### BibTeX

@INPROCEEDINGS{Rattigan06usingstructure,

author = {Matthew J. Rattigan},

title = {Using structure indices for efficient approximation of network properties},

booktitle = {Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining},

year = {2006},

pages = {357--366},

publisher = {ACM}

}

### Abstract

Statistics on networks have become vital to the study of relational data drawn from areas including bibliometrics, fraud detection, bioinformatics, and the Internet. Calculating many of the most important measures—such as betweenness centrality, closeness centrality, and graph diameter—requires identifying short paths in these networks. However, finding these short paths can be intractable for even moderate-size networks. We introduce the concept of a network structure index (NSI), a composition of (1) a set of annotations on every node in the network and (2) a function that uses the annotations to estimate graph distance between pairs of nodes. We present several varieties of NSIs, examine their time and space complexity, and analyze their performance on synthetic and real data sets. We show that creating an NSI for a given network enables extremely efficient and accurate estimation of a wide variety of network statistics on that network.
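The two parts of an NSI named in the abstract, per-node annotations and a distance function that reads them, can be illustrated with a minimal sketch. It assumes a landmark-style index on an unweighted, undirected graph stored as an adjacency dictionary. The function names, the toy graph, and the choice to estimate distance as the shortest detour through any landmark are illustrative assumptions, not the authors' implementation.

```python
from collections import deque
import random


def bfs_distances(adj, source):
    """Hop distance from source to every reachable node (unweighted graph)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def build_landmark_index(adj, num_landmarks, seed=0):
    """Part 1 of the sketch: annotate each node with its distance to
    a few randomly chosen landmark nodes (one BFS flood per landmark)."""
    rng = random.Random(seed)
    landmarks = rng.sample(sorted(adj), num_landmarks)
    floods = [bfs_distances(adj, lm) for lm in landmarks]
    return {
        node: [flood.get(node, float("inf")) for flood in floods]
        for node in adj
    }


def estimate_distance(index, u, v):
    """Part 2 of the sketch: estimate d(u, v) from annotations alone.
    Assumption: use the shortest detour through any landmark, which is
    an upper bound on the true distance by the triangle inequality."""
    if u == v:
        return 0
    return min(du + dv for du, dv in zip(index[u], index[v]))


# Hypothetical toy graph: the 5-cycle 0-1-2-3-4-0.
adj = {0: [1, 4], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 0]}
index = build_landmark_index(adj, num_landmarks=2)
estimate = estimate_distance(index, 1, 3)
```

Building the index costs one breadth-first search per landmark, and each later distance query touches only the two annotation lists. The estimate can overshoot the true distance when no landmark lies near a shortest path between the two nodes.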

### Citations

2073 | On the evolution of random graphs
- Erdős, Rényi
- 1960
Citation Context ...icate pathologically poor search performance. We evaluate the NSIs from Section 2 on synthetic graphs of 10,000 nodes generated using three models: random networks as defined by Erdős and Rényi [5], rewired lattices defined by Watts and Strogatz [24], and the Forest Fire graph model recently introduced by Leskovec [14]. (See Appendix A for more detail on the network generation procedures.) In F...

895 | Collective dynamics of "small-world" networks. Nature 393(6684)
- Watts, Strogatz
- 1998
Citation Context ...ortunately, this strategy performs rather poorly in practice. Many of today’s “small-world” data sets are characterized by small diameters due to the existence of “short cut” links in the graph [11][24]. As a result, a found path that passes through a landmark often forms two sides of a triangle, resulting in artificially long paths. 2.4 ZONES The ZONE NSI utilizes multiple dimensions, where each di...

687 | The small-world phenomenon: An algorithmic perspective
- Kleinberg
- 2000
Citation Context ... Unfortunately, this strategy performs rather poorly in practice. Many of today’s “small-world” data sets are characterized by small diameters due to the existence of “short cut” links in the graph [11][24]. As a result, a found path that passes through a landmark often forms two sides of a triangle, resulting in artificially long paths. 2.4 ZONES The ZONE NSI utilizes multiple dimensions, where eac...

629 | Centrality in social networks: Conceptual clarification
- Freeman
- 1979
Citation Context ...nness centrality (the proportion of all shortest paths in the network that run through a given node) and closeness centrality (the average distance from the given node to every other node in the network) [8]. For example, centrality measures can help evaluate whether Mr. Bacon lies near the center of the Hollywood universe or Marc Maier Knowledge Discovery Laboratory Department of Computer Science Univer...

560 | Predicting Internet Network Distance with Coordinates-Based Approaches
- Ng, Zhang
- 2002
Citation Context ...I does not perform well in practice, as we show in Section 3. 2.3 LANDMARKS Previous work in network path finding has utilized a system of network landmarks to efficiently navigate graph structure [3][16]. With this technique, we randomly designate a small number of nodes in the network to serve as navigational beacons. Then, we annotate nodes in the graph by flooding out from each landmark and record...

539 | Learning probabilistic relational models
- Getoor, Friedman, et al.
- 2001
Citation Context ...t of Computer Science University of Massachusetts Amherst jensen@cs.umass.edu whether he is near the periphery. Several researchers have used such measures to construct statistical models of networks [9][15]. Recent work in knowledge discovery has begun to study very large networks, often comprising millions of nodes. Given networks of this size, even the most efficient algorithms for calculating net...

337 | Graphs over time: densification laws, shrinking diameters and possible explanations
- Leskovec, Kleinberg, et al.
- 2005
Citation Context ...s generated using three models: random networks as defined by Erdős and Rényi [5], rewired lattices defined by Watts and Strogatz [24], and the Forest Fire graph model recently introduced by Leskovec [14]. (See Appendix A for more detail on the network generation procedures.) In Figure 5, we compare the performance of DEGREE, LANDMARK, ZONE, and DTZ when implemented with increasing numbers of dimensio...

335 | A Faster algorithm for Betweenness Centrality
- Brandes
- 2001
Citation Context ...able. For example, the most efficient known algorithms for calculating betweenness centrality and closeness centrality are $O(ne + n^2 \log n)$, where $n$ and $e$ are the number of nodes and edges in the graph [2]. Ad hoc calculations that use basic path finding can have even higher complexity, as they require bidirectional breadth-first search. Figure 1: The average number of nodes explored by bidirectional b...

258 | Search in power-law networks
- Adamic, Lukose, et al.
- 2001
Citation Context ...Is is provided by previous work that has shown that path finding can be surprisingly efficient in a network that exhibits homophily, the tendency of neighboring nodes to have similar attribute values [1]. Unfortunately, many networks do not “naturally” have attributes that exhibit homophily. However, we can synthetically generate and annotate any arbitrary graph with such an attribute and use it for ...

231 | Navigation in a small world
- Kleinberg
Citation Context ...e.g., actors that sit between winners of Academy Awards for best picture and the IMDb’s “Bottom 100,” the worst 100 movies as voted by users of the Internet Movie Database). 5. RELATED WORK Kleinberg [10][11] demonstrates the notion of similarity-based navigation in small-world networks. He demonstrates how the presence of network homophily can provide a gradient that guides search using local informa...

168 | Virtual landmarks for the internet
- Tang, Crovella
- 2003
Citation Context ...s/Internet community as a basis for determining network latency between hosts on the Internet. Most of the Internet coordinate approaches attempt to minimize network latency through extensions of GNP [22][19][18]. Kleinberg provides a theoretical analysis and framework of all beacon-based strategies, such as GNP and others [12]. This mostly describes the effectiveness of triangulation (determining pos...

152 | Meridian: A Lightweight Network Location Service without Virtual Coordinates
- Wong, Slivkins, et al.
- 2005
Citation Context ... beacon-based approaches. Other strategies in the Internet domain have attempted to create network overlay structures, such as a rings-based approach that does not rely on selection of landmark nodes [26]. This concept has recently been explored theoretically as a technique for distance estimation and nearest neighbor searches by Slivkins [20] and Krauthgamer [13]. However, it is unclear how accurately...

150 | Lighthouses for Scalable Distributed Location
- Pias, Crowcroft, et al.
- 2007
Citation Context ...et community as a basis for determining network latency between hosts on the Internet. Most of the Internet coordinate approaches attempt to minimize network latency through extensions of GNP [22][19][18]. Kleinberg provides a theoretical analysis and framework of all beacon-based strategies, such as GNP and others [12]. This mostly describes the effectiveness of triangulation (determining positions o...

147 | Identity and search in social networks
- Watts, Dodds, et al.
Citation Context ...sence of network homophily can provide a gradient that guides search using local information. Watts investigated a similar approach by constructing a hierarchical model from which to derive homophily [23]. In this paper, we present methods for creating such homophily in domains that may lack local information. We detail a number of ways in which this information can be obtained for both synthetic and r...

136 | Big-bang simulation for embedding network distances in euclidean space
- Shavitt, Tankel
- 2003
Citation Context ...ternet community as a basis for determining network latency between hosts on the Internet. Most of the Internet coordinate approaches attempt to minimize network latency through extensions of GNP [22][19][18]. Kleinberg provides a theoretical analysis and framework of all beacon-based strategies, such as GNP and others [12]. This mostly describes the effectiveness of triangulation (determining positio...

127 | Navigating nets: simple algorithms for proximity search
- Krauthgamer, Lee
- 2004
Citation Context ...ot rely on selection of landmark nodes [26]. This concept has recently been explored theoretically as a technique for distance estimation and nearest neighbor searches by Slivkins [20] and Krauthgamer [13]. However, it is unclear how accurately any of these strategies perform on domains other than the Internet or for the purposes of approximating network statistics. Additionally, our current work focus...

119 | Fast discovery of connection subgraphs
- Faloutsos, McCurley, et al.
- 2004
Citation Context ...hors have pioneered work in this area by identifying efficient methods for finding connection subgraphs (sets of short paths between nodes) and for approximating the size of the neighborhood of a node [6][17]. NSIs may provide an alternative way of representing much of the information needed for both of these tasks. 7. ACKNOWLEDGEMENTS This research is supported by Lawrence Livermore National Laboratory...

102 | ANF: A Fast and Scalable Tool for Data Mining in Massive Graphs
- Palmer, Gibbons, et al.
- 2002
Citation Context ...s have pioneered work in this area by identifying efficient methods for finding connection subgraphs (sets of short paths between nodes) and for approximating the size of the neighborhood of a node [6][17]. NSIs may provide an alternative way of representing much of the information needed for both of these tasks. 7. ACKNOWLEDGEMENTS This research is supported by Lawrence Livermore National Laboratory and...

76 | Triangulation and Embedding using Small Sets of Beacons
- Slivkins, Kleinberg, et al.
- 2004
Citation Context ...e approaches attempt to minimize network latency through extensions of GNP [22][19][18]. Kleinberg provides a theoretical analysis and framework of all beacon-based strategies, such as GNP and others [12]. This mostly describes the effectiveness of triangulation (determining positions of uncertain nodes) in beacon-based approaches. Other strategies in the Internet domain have attempted to create netwo...

67 | Dependency networks for relational data
- Neville, Jensen
- 2004
Citation Context ...f Computer Science University of Massachusetts Amherst jensen@cs.umass.edu whether he is near the periphery. Several researchers have used such measures to construct statistical models of networks [9][15]. Recent work in knowledge discovery has begun to study very large networks, often comprising millions of nodes. Given networks of this size, even the most efficient algorithms for calculating network...

66 | Distance estimation and object location via rings of neighbors
- Slivkins
- 2005
Citation Context ... approach that does not rely on selection of landmark nodes [26]. This concept has recently been explored theoretically as a technique for distance estimation and nearest neighbor searches by Slivkins [20] and Krauthgamer [13]. However, it is unclear how accurately any of these strategies perform on domains other than the Internet or for the purposes of approximating network statistics. Additionally, o...

25 | Algorithm 97
- Floyd
- 1962
Citation Context ...into this table. While this strategy yields optimal results when searching for paths, in many cases it may be infeasible in terms of annotation complexity: the Floyd-Warshall algorithm runs in $O(n^3)$ [7], while more complex approaches using fast matrix multiplication can reduce the exponent to 2.376 [4]. Furthermore, APSP requires $O(n^2)$ to store the distances themselves. Although APSP may seem triv...

19 | Decentralized search in networks using homophily and degree disparity
- Simsek, et al.
- 2005
Citation Context ...date NSIs when nodes and links are added to the network so that dynamic graphs can be successfully indexed. Finally, we are investigating how to apply our own recent developments in network searching [21] to more effectively use NSI annotations to find short paths. We are actively exploring additional applications of network structure indices. Two of the most promising directions are finding connectio...

3 | A graph search heuristic for shortest distance paths
- Chow
- 2004
Citation Context ... NSI does not perform well in practice, as we show in Section 3. 2.3 LANDMARKS Previous work in network path finding has utilized a system of network landmarks to efficiently navigate graph structure [3][16]. With this technique, we randomly designate a small number of nodes in the network to serve as navigational beacons. Then, we annotate nodes in the graph by flooding out from each landmark and re...

2 | Matrix multiplication via arithmetic progressions
- Coppersmith, Winograd
- 1990
Citation Context ...t may be infeasible in terms of annotation complexity: the Floyd-Warshall algorithm runs in $O(n^3)$ [7], while more complex approaches using fast matrix multiplication can reduce the exponent to 2.376 [4]. Furthermore, APSP requires $O(n^2)$ to store the distances themselves. Although APSP may seem trivial, the use of structure indices is a general approach, not specific to a single implementation or a...