Results 1  10
of
141
Efficient Identification of Web Communities
 In Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
, 2000
"... We de ne a community on the web as a set of sites that have more links (in either direction) to members of the community than to nonmembers. Members of such a community can be eciently identi ed in a maximum ow / minimum cut framework, where the source is composed of known members, and the sink c ..."
Abstract

Cited by 228 (12 self)
 Add to MetaCart
We de ne a community on the web as a set of sites that have more links (in either direction) to members of the community than to nonmembers. Members of such a community can be eciently identi ed in a maximum ow / minimum cut framework, where the source is composed of known members, and the sink consists of wellknown nonmembers. A focused crawler that crawls to a xed depth can approximate community membership by augmenting the graph induced by the crawl with links to a virtual sink node. The effectiveness of the approximation algorithm is demonstrated with several crawl results that identify hubs, authorities, web rings, and other link topologies that are useful but not easily categorized. Applications of our approach include focused crawlers and search engines, automatic population of portal categories, and improved ltering.
A Survey and Comparison of PeertoPeer Overlay Network Schemes
 IEEE Communications Surveys and Tutorials
, 2005
"... Abstract — Over the Internet today, computing and communications environments are significantly more complex and chaotic than classical distributed systems, lacking any centralized organization or hierarchical control. There has been much interest in emerging PeertoPeer (P2P) network overlays beca ..."
Abstract

Cited by 147 (0 self)
 Add to MetaCart
Abstract — Over the Internet today, computing and communications environments are significantly more complex and chaotic than classical distributed systems, lacking any centralized organization or hierarchical control. There has been much interest in emerging PeertoPeer (P2P) network overlays because they provide a good substrate for creating largescale data sharing, content distribution and applicationlevel multicast applications. These P2P networks try to provide a long list of features such as: selection of nearby peers, redundant storage, efficient search/location of data items, data permanence or guarantees, hierarchical naming, trust and authentication, and, anonymity. P2P networks potentially offer an efficient routing architecture that is selforganizing, massively scalable, and robust in the widearea, combining fault tolerance, load balancing and explicit notion of locality. In this paper, we present a survey and comparison of various Structured and Unstructured P2P networks. We categorize the various schemes into these two groups in the design spectrum and discuss the applicationlevel network performance of each group.
Statistical properties of community structure in large social and information networks
"... A large body of work has been devoted to identifying community structure in networks. A community is often though of as a set of nodes that has more connections between its members than to the remainder of the network. In this paper, we characterize as a function of size the statistical and structur ..."
Abstract

Cited by 120 (10 self)
 Add to MetaCart
A large body of work has been devoted to identifying community structure in networks. A community is often though of as a set of nodes that has more connections between its members than to the remainder of the network. In this paper, we characterize as a function of size the statistical and structural properties of such sets of nodes. We define the network community profile plot, which characterizes the “best ” possible community—according to the conductance measure—over a wide range of size scales, and we study over 70 large sparse realworld networks taken from a wide range of application domains. Our results suggest a significantly more refined picture of community structure in large realworld networks than has been appreciated previously. Our most striking finding is that in nearly every network dataset we examined, we observe tight but almost trivial communities at very small scales, and at larger size scales, the best possible communities gradually “blend in ” with the rest of the network and thus become less “communitylike.” This behavior is not explained, even at a qualitative level, by any of the commonlyused network generation models. Moreover, this behavior is exactly the opposite of what one would expect based on experience with and intuition from expander graphs, from graphs that are wellembeddable in a lowdimensional structure, and from small social networks that have served as testbeds of community detection algorithms. We have found, however, that a generative model, in which new edges are added via an iterative “forest fire” burning process, is able to produce graphs exhibiting a network community structure similar to our observations.
The LargeScale Structure of Semantic Networks: Statistical Analyses and a Model of Semantic Growth
 Cognitive Science
"... We present statistical analyses of the largescale structure of three types of semantic networks: word associations, WordNet, and Roget's thesaurus. We show that they have a smallworld structure, characterized by sparse connectivity, short average pathlengths between words, and strong local clu ..."
Abstract

Cited by 118 (1 self)
 Add to MetaCart
We present statistical analyses of the largescale structure of three types of semantic networks: word associations, WordNet, and Roget's thesaurus. We show that they have a smallworld structure, characterized by sparse connectivity, short average pathlengths between words, and strong local clustering. In addition, the distributions of the number of connections follow power laws that indicate a scalefree pattern of connectivity, with most nodes having relatively few connections joined together through a small number of hubs with many connections. These regularities have also been found in certain other complex natural networks, such as the world wide web, but they are not consistent with many conventional models of semantic organization, based on inheritance hierarchies, arbitrarily structured networks, or highdimensional vector spaces. We propose that these structures reflect the mechanisms by which semantic networks grow. We describe a simple model for semantic growth, in which each new word or concept is connected to an existing network by differentiating the connectivity pattern of an existing node. This model generates appropriate smallworld statistics and powerlaw connectivity distributions, and also suggests one possible mechanistic basis for the effects of learning history variables (ageofacquisition, usage frequency) on behavioral performance in semantic processing tasks.
Automatic Multimedia Crossmodal Correlation Discovery
, 2004
"... Given an image (or video clip, or audio song), how do we automatically assign keywords to it? The general problem is to find correlations across the media in a collection of multimedia objects like video clips, with colors, and/or motion, and/or audio, and/or text scripts. We propose a novel, graph ..."
Abstract

Cited by 100 (16 self)
 Add to MetaCart
Given an image (or video clip, or audio song), how do we automatically assign keywords to it? The general problem is to find correlations across the media in a collection of multimedia objects like video clips, with colors, and/or motion, and/or audio, and/or text scripts. We propose a novel, graphbased approach, "MMG", to discover such crossmodal correlations. Our
Computing communities in large networks using random walks
 J. of Graph Alg. and App. bf
, 2004
"... Dense subgraphs of sparse graphs (communities), which appear in most realworld complex networks, play an important role in many contexts. Computing them however is generally expensive. We propose here a measure of similarities between vertices based on random walks which has several important advan ..."
Abstract

Cited by 94 (2 self)
 Add to MetaCart
Dense subgraphs of sparse graphs (communities), which appear in most realworld complex networks, play an important role in many contexts. Computing them however is generally expensive. We propose here a measure of similarities between vertices based on random walks which has several important advantages: it captures well the community structure in a network, it can be computed efficiently, and it can be used in an agglomerative algorithm to compute efficiently the community structure of a network. We propose such an algorithm, called Walktrap, which runs in time O(mn 2) and space O(n 2) in the worst case, and in time O(n 2 log n) and space O(n 2) in most realworld cases (n and m are respectively the number of vertices and edges in the input graph). Extensive comparison tests show that our algorithm surpasses previously proposed ones concerning the quality of the obtained community structures and that it stands among the best ones concerning the running time.
Ranking on Data Manifolds
 Advances in Neural Information Processing Systems 16
, 2004
"... The Google search engine has enjoyed huge success with its web page ranking algorithm, which exploits global, rather than local, hyperlink structure of the web using random walks. Here we propose a simple universal ranking algorithm for data lying in the Euclidean space, such as text or image data. ..."
Abstract

Cited by 93 (1 self)
 Add to MetaCart
The Google search engine has enjoyed huge success with its web page ranking algorithm, which exploits global, rather than local, hyperlink structure of the web using random walks. Here we propose a simple universal ranking algorithm for data lying in the Euclidean space, such as text or image data. The core idea of our method is to rank the data with respect to the intrinsic manifold structure collectively revealed by a great amount of data. Encouraging experimental results from synthetic, image, and text data illustrate the validity of our method.
Characterization of complex networks: A survey of measurements
 Advances in Physics
"... Each complex network (or class of networks) presents specific topological features which characterize its connectivity and highly influence the dynamics and function of processes executed on the network. The analysis, discrimination, and synthesis of complex networks therefore rely on the use of mea ..."
Abstract

Cited by 89 (7 self)
 Add to MetaCart
Each complex network (or class of networks) presents specific topological features which characterize its connectivity and highly influence the dynamics and function of processes executed on the network. The analysis, discrimination, and synthesis of complex networks therefore rely on the use of measurements capable of expressing the most relevant topological features. This article presents a survey of such measurements. It includes general considerations about complex network characterization, a brief review of the principal models, and the presentation of the main existing measurements organized into classes. Special attention is given to relating complex network analysis with the areas of pattern recognition and feature selection, as well as on surveying some concepts and measurements from traditional graph theory which are potentially useful for complex network research. Depending on the network and the analysis task one has in mind, a specific set of features may be chosen. It is hoped that the present survey will help the
PowerLaws and the ASlevel Internet Topology
 IEEE/ACM Transactions on Networking
, 2003
"... In this paper, we study and characterize the topology of the Internet at the Autonomous System level. First, we show that the topology can be described efficiently with powerlaws. The elegance and simplicity of the powerlaws provide a novel perspective into the seemingly uncontrolled Internet struc ..."
Abstract

Cited by 87 (10 self)
 Add to MetaCart
In this paper, we study and characterize the topology of the Internet at the Autonomous System level. First, we show that the topology can be described efficiently with powerlaws. The elegance and simplicity of the powerlaws provide a novel perspective into the seemingly uncontrolled Internet structure. Second, we show that powerlaws appear consistently over the last 5 years. We also observe that the powerlaws hold even in the most recent and more complete topology [10] with correlation coefficient above 99% for the degree powerlaw. In addition, we study the evolution of the powerlaw exponents over the 5 year interval and observe a variation for the degree based powerlaw of less than 10%. Third, we provide relationships between the exponents and other topological metrics.
Community structure in large networks: Natural cluster sizes and the absence of large welldefined clusters
, 2008
"... A large body of work has been devoted to defining and identifying clusters or communities in social and information networks, i.e., in graphs in which the nodes represent underlying social entities and the edges represent some sort of interaction between pairs of nodes. Most such research begins wit ..."
Abstract

Cited by 79 (6 self)
 Add to MetaCart
A large body of work has been devoted to defining and identifying clusters or communities in social and information networks, i.e., in graphs in which the nodes represent underlying social entities and the edges represent some sort of interaction between pairs of nodes. Most such research begins with the premise that a community or a cluster should be thought of as a set of nodes that has more and/or better connections between its members than to the remainder of the network. In this paper, we explore from a novel perspective several questions related to identifying meaningful communities in large social and information networks, and we come to several striking conclusions. Rather than defining a procedure to extract sets of nodes from a graph and then attempt to interpret these sets as a “real ” communities, we employ approximation algorithms for the graph partitioning problem to characterize as a function of size the statistical and structural properties of partitions of graphs that could plausibly be interpreted as communities. In particular, we define the network community profile plot, which characterizes the “best ” possible community—according to the conductance measure—over a wide range of size scales. We study over 100 large realworld networks, ranging from traditional and online social networks, to technological and information networks and