Results 1–10 of 209
The link-prediction problem for social networks
J. American Society for Information Science and Technology, 2007
Cited by 906 (6 self)
Given a snapshot of a social network, can we infer which new interactions among its members are likely to occur in the near future? We formalize this question as the link-prediction problem, and we develop approaches to link prediction based on measures for analyzing the “proximity” of nodes in a network. Experiments on large coauthorship networks suggest that information about future interactions can be extracted from network topology alone, and that fairly subtle measures for detecting node proximity can outperform more direct measures.
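The sketch below shows two neighborhood-based proximity scores, common neighbors and Adamic/Adar, used to rank the unlinked pairs of an undirected graph as candidate future links. The abstract does not say which measures the paper evaluates. The choice of these two measures, the toy graph, and the helper names (`build_adj`, `rank_candidate_links`) are illustrative assumptions.

```python
import math
from collections import defaultdict
from itertools import combinations


def build_adj(edges):
    """Undirected adjacency sets from an edge list (hypothetical helper)."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return dict(adj)


def common_neighbors(adj, u, v):
    # Number of nodes adjacent to both u and v.
    return len(adj[u] & adj[v])


def adamic_adar(adj, u, v):
    # Each shared neighbor z contributes 1 / log(deg(z)), so rare neighbors count more.
    # A shared neighbor is adjacent to both u and v, so deg(z) >= 2 and log(deg(z)) > 0.
    return sum(1.0 / math.log(len(adj[z])) for z in adj[u] & adj[v])


def rank_candidate_links(adj, score):
    """Sort all currently unlinked pairs by decreasing proximity score."""
    pairs = [(u, v) for u, v in combinations(sorted(adj), 2) if v not in adj[u]]
    return sorted(pairs, key=lambda p: score(adj, p[0], p[1]), reverse=True)


adj = build_adj([("a", "b"), ("a", "c"), ("b", "d"), ("c", "d"), ("d", "e")])
print(rank_candidate_links(adj, adamic_adar)[:2])  # [('a', 'd'), ('b', 'c')]
```

Under common neighbors, (a, d) and (b, c) tie with two shared neighbors each. Adamic/Adar breaks the tie because b and c share the high-degree node d, which counts for less.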
Algorithms for estimating relative importance in networks
In Proceedings of KDD 2003
Cited by 138 (4 self)
Large and complex graphs representing relationships among sets of entities are an increasingly common focus of interest in data analysis—examples include social networks, Web graphs, telecommunication networks, and biological networks. In interactive analysis of such data a natural query is “which entities are most important in the network relative to a particular individual or set of individuals?” We investigate the problem of answering such queries in this paper, focusing in particular on defining and computing the importance of nodes in a graph relative to one or more root nodes. We define a general framework and a number of different algorithms, building on ideas from social networks, graph theory, Markov models, and Web graph analysis. We experimentally evaluate the different properties of these algorithms on toy graphs and demonstrate how our approach can be used to study relative importance in real-world networks, including a network of interactions among September 11th terrorists, a network of collaborative research in biotechnology among companies and universities, and a network of coauthorship relationships among computer science researchers.
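One common way to make importance relative to a set of root nodes is a random walk with restart, also called personalized PageRank. At each step the walker jumps back to the roots with probability `alpha`. The sketch below is our own illustration of that Markov-model idea and not necessarily one of the paper's algorithms. It assumes an undirected graph stored as adjacency lists, and the function name, `alpha = 0.15`, and the fixed iteration count are hypothetical choices.

```python
def personalized_pagerank(adj, roots, alpha=0.15, iters=100):
    """Random walk with restart to `roots`; returns a score per node summing to 1."""
    nodes = list(adj)
    restart = {n: (1.0 / len(roots) if n in roots else 0.0) for n in nodes}
    rank = dict(restart)
    for _ in range(iters):
        # A fraction alpha of the mass restarts at the roots; the rest flows along edges.
        nxt = {n: alpha * restart[n] for n in nodes}
        for u in nodes:
            if adj[u]:
                share = (1 - alpha) * rank[u] / len(adj[u])
                for v in adj[u]:
                    nxt[v] += share
            else:
                # A node with no edges sends its walking mass back to the roots.
                for r in roots:
                    nxt[r] += (1 - alpha) * rank[u] / len(roots)
        rank = nxt
    return rank


# Path r - a - b - c with r as the only root: non-root scores fall off with distance.
adj = {"r": ["a"], "a": ["r", "b"], "b": ["a", "c"], "c": ["b"]}
scores = personalized_pagerank(adj, roots={"r"})
for node in ("a", "b", "c"):
    print(node, round(scores[node], 3))
```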
Graph mining: laws, generators, and algorithms
ACM Computing Surveys (CSUR), 2006
Cited by 132 (7 self)
How does the Web look? How could we tell an abnormal social network from a normal one? These and similar questions are important in many fields where the data can intuitively be cast as a graph; examples range from computer networks to sociology to biology and many more. Indeed, any M:N relation in database terminology can be represented as a graph. A lot of these questions boil down to the following: “How can we generate synthetic but realistic graphs?” To answer this, we must first understand what patterns are common in real-world graphs and can thus be considered a mark of normality/realism. This survey gives an overview of the incredible variety of work that has been done on these problems. One of our main contributions is the integration of points of view from physics, mathematics, sociology, and computer science. Further, we briefly describe recent advances on some related and interesting graph problems.
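A pattern often reported for real-world graphs is a heavy-tailed, roughly power-law degree distribution, P(d) ∝ d^(−γ). The sketch below is an illustration of that pattern and not a method taken from the survey. It tabulates a degree sequence and estimates γ with the standard discrete maximum-likelihood approximation γ̂ ≈ 1 + n / Σ ln(d_i / (d_min − 0.5)), summed over the n degrees d_i ≥ d_min. The cutoff `d_min` is an assumed input, and the function names are hypothetical.

```python
import math
from collections import Counter


def degree_sequence(edges):
    """Degree of every node in an undirected edge list."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return list(deg.values())


def power_law_exponent(degrees, d_min=1):
    """Approximate discrete MLE for gamma in P(d) ~ d**(-gamma), using d >= d_min."""
    tail = [d for d in degrees if d >= d_min]
    return 1.0 + len(tail) / sum(math.log(d / (d_min - 0.5)) for d in tail)


# Toy graph: one hub with eight spokes plus two extra edges.
edges = [("hub", i) for i in range(8)] + [(0, 1), (2, 3)]
degs = degree_sequence(edges)
print(sorted(Counter(degs).items()))  # [(1, 4), (2, 4), (8, 1)]
print(round(power_law_exponent(degs), 3))
```

Nine nodes are far too few for a meaningful fit. The toy data only exercises the code; in practice `d_min` is also chosen from the data rather than fixed at 1.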
New perspectives and methods in link prediction
In KDD, 2010
Cited by 91 (10 self)
This paper examines important factors for link prediction in networks and provides a general, high-performance framework for the prediction task. Link prediction in sparse networks presents a significant challenge due to the inherent disproportion of links that can form to links that do form. Previous research has typically approached this as an unsupervised problem. While this is not the first work to explore supervised learning, many factors significant in influencing and guiding classification remain unexplored. In this paper, we consider these factors by first motivating the use of a supervised framework through a careful investigation of issues such as network observational period, generality of existing methods, variance reduction, topological causes and degrees of imbalance, and sampling approaches. We also present an effective flow-based predicting algorithm, offer formal bounds on imbalance in sparse network link prediction, and employ an evaluation method appropriate for the observed imbalance. Our careful consideration of the above issues ultimately leads to a completely general framework that outperforms unsupervised link prediction methods by more than 30% AUC.
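Non-links vastly outnumber links in a sparse network, so plain accuracy says little: predicting "no link" everywhere is almost always right. The sketch below illustrates that evaluation issue; it is not the paper's flow-based predictor. It counts the candidate pairs behind the imbalance and computes AUC as the probability that a randomly chosen new link outscores a randomly chosen non-link, with ties counting one half. The function names and all numbers are hypothetical.

```python
def imbalance_ratio(n_nodes, n_existing, n_new):
    """Non-forming candidate pairs per newly formed link in an undirected graph."""
    candidates = n_nodes * (n_nodes - 1) // 2 - n_existing
    return (candidates - n_new) / n_new


def auc(pos_scores, neg_scores):
    """Mann-Whitney estimate of ROC AUC: P(a positive scores above a negative)."""
    wins = 0.0
    for p in pos_scores:
        for q in neg_scores:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # ties count half
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical network: 10,000 nodes, 50,000 existing links, 5,000 new links.
print(round(imbalance_ratio(10_000, 50_000, 5_000)))  # 9988
print(auc([0.9, 0.4], [0.4, 0.1, 0.05]))
```

AUC depends only on how positives rank against negatives, not on the class ratio, which is one reason it suits heavily imbalanced link prediction. The quadratic loop is only for clarity; a rank-based computation avoids comparing every pair.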
Effects of missing data in social networks
Social Networks, 2003
Cited by 80 (1 self)
We perform sensitivity analyses to assess the impact of missing data on the structural properties of social networks. The social network is conceived of as being generated by a bipartite graph, in which actors are linked together via multiple interaction contexts or affiliations. We discuss three principal missing data mechanisms: network boundary specification (non-inclusion of actors or affiliations), survey nonresponse, and censoring by vertex degree (fixed choice design), examining their impact on the scientific collaboration network from the Los Alamos e-Print Archive as well as random bipartite graphs. The simulation results show that network boundary specification and fixed choice designs can dramatically alter estimates of network-level statistics. The observed clustering and assortativity coefficients are overestimated via omission of affiliations or fixed choice thereof, and underestimated via actor nonresponse, which results in inflated measurement error. We also find that social networks with multiple interaction contexts may have certain interesting properties due to the presence of overlapping cliques. In particular, assortativity by degree does not necessarily improve network robustness to random omission of nodes as predicted by current theory.
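The sketch below illustrates how a fixed choice design censors a network. Every actor keeps at most `k` of its ties, chosen at random, and a tie survives if either endpoint names it. The code then compares the global clustering coefficient (transitivity) before and after censoring. This censoring model, the function names, and the toy graph are assumptions for illustration. The paper's simulations work on bipartite affiliation graphs, which this one-mode sketch does not model.

```python
import random
from itertools import combinations


def transitivity(adj):
    """3 x triangles / connected triples, for an undirected adjacency-set dict."""
    closed = triples = 0
    for v, nbrs in adj.items():
        for u, w in combinations(nbrs, 2):
            triples += 1
            if w in adj[u]:
                closed += 1  # each triangle is counted once per center node
    return closed / triples if triples else 0.0


def fixed_choice(adj, k, seed=0):
    """Each actor nominates at most k neighbors; a tie survives if either side names it."""
    rng = random.Random(seed)
    kept = {v: set() for v in adj}
    for v in sorted(adj):
        for u in rng.sample(sorted(adj[v]), min(k, len(adj[v]))):
            kept[v].add(u)
            kept[u].add(v)
    return kept


# A triangle (a, b, c) plus a pendant actor d attached to a.
adj = {"a": {"b", "c", "d"}, "b": {"a", "c"}, "c": {"a", "b"}, "d": {"a"}}
print(transitivity(adj))  # 0.6
print(transitivity(fixed_choice(adj, 1)))
```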
Parallel algorithms for evaluating centrality indices in real-world networks
In Proceedings of the International Conference on Parallel Processing (ICPP), 2006
Cited by 54 (11 self)
This paper discusses fast parallel algorithms for evaluating several centrality indices frequently used in complex network analysis. These algorithms have been optimized to exploit properties typically observed in real-world large-scale networks, such as low average distance, high local density, and heavy-tailed power-law degree distributions. We test our implementations on real datasets such as the web graph, protein-interaction networks, movie-actor networks, and citation networks, and report impressive parallel performance for evaluation of the computationally intensive centrality metrics (betweenness and closeness centrality) on high-end shared-memory symmetric multiprocessor and multithreaded architectures. To our knowledge, these are the first parallel implementations of these widely used social network analysis metrics. We demonstrate that it is possible to rigorously analyze networks three orders of magnitude larger than instances that can be handled by existing social network analysis (SNA) software packages. For instance, we compute the exact betweenness centrality value for each vertex in a large US patent citation network (3 million patents, 16 million citations) in 42 minutes on 16 processors, utilizing 20 GB of RAM on the IBM p5 570. Current SNA packages, on the other hand, cannot handle graphs with more than a hundred thousand edges.
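In an unweighted graph, closeness centrality reduces to one breadth-first search per source vertex. Those searches are independent of one another, which is one natural source of coarse-grained parallelism. The sequential sketch below shows only that per-source kernel. It assumes a connected, unweighted, undirected graph with at least two vertices, and it is not the paper's optimized multithreaded implementation.

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop distance from source to every reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def closeness(adj):
    """(n - 1) / sum of distances, assuming the graph is connected."""
    n = len(adj)
    # Each BFS below is independent of the others, so the loop over
    # source vertices could be split across processors.
    return {v: (n - 1) / sum(bfs_distances(adj, v).values()) for v in adj}


adj = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
print(closeness(adj))  # {'a': 0.6666666666666666, 'b': 1.0, 'c': 0.6666666666666666}
```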
BRAHMS: A WorkBench RDF Store and High Performance Memory System for Semantic Association Discovery
In ISWC
Cited by 49 (6 self)
Discovery of semantic associations in Semantic Web ontologies is an important task in various analytical activities. Several query languages and storage systems have been designed and implemented for storage and retrieval of information in RDF ontologies. However, they are inadequate for semantic association discovery. In this paper we present the design and implementation of BRAHMS, an efficient RDF storage system, specifically designed to support fast semantic association discovery in large RDF bases. We present memory usage and timing results of several tests performed with BRAHMS and compare them to similar tests performed using Jena, Sesame, and Redland, three of the well-known RDF storage systems. Our results show that BRAHMS handles basic association discovery well, while the RDF query languages and even the low-level APIs in the other three tested systems are not suitable for the implementation of semantic association discovery algorithms.
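In its simplest path form, a semantic association between two resources can be read as a chain of RDF statements connecting them. The sketch below is a naive in-memory search for such chains. It ignores property direction and bounds the chain length. It is not BRAHMS's storage layout or algorithm, and the triples, the length bound, and the function names are hypothetical.

```python
from collections import defaultdict


def build_index(triples):
    """Map each resource to (predicate, neighbor) pairs, in both directions."""
    index = defaultdict(list)
    for s, p, o in triples:
        index[s].append((p, o))
        index[o].append((p, s))  # direction is dropped for discovery
    return index


def associations(index, source, target, max_len):
    """All simple statement chains from source to target with <= max_len edges."""
    results = []

    def dfs(node, path, visited):
        if node == target and path:
            results.append(list(path))
            return
        if len(path) == max_len:
            return
        for pred, nxt in index.get(node, []):
            if nxt not in visited:
                visited.add(nxt)
                path.append((node, pred, nxt))
                dfs(nxt, path, visited)
                path.pop()
                visited.remove(nxt)

    dfs(source, [], {source})
    return results


triples = [("alice", "worksFor", "acme"), ("bob", "worksFor", "acme"),
           ("bob", "knows", "carol")]
idx = build_index(triples)
print(associations(idx, "alice", "carol", max_len=3))
```

Exhaustive depth-first search grows quickly with `max_len` on large RDF bases. That cost is the kind of workload for which the paper argues general-purpose RDF stores are ill suited.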
CrimeNet Explorer: a framework for criminal network knowledge discovery
ACM Transactions on Information Systems (TOIS), 2005
Cited by 43 (7 self)
Knowledge about the structure and organization of criminal networks is important for both crime investigation and the development of effective strategies to prevent crimes. However, except for network visualization, criminal network analysis remains primarily a manual process. Existing tools do not provide advanced structural analysis techniques that allow extraction of network knowledge from large volumes of criminal-justice data. To help law enforcement and intelligence agencies discover criminal network knowledge efficiently and effectively, in this research we proposed a framework for automated network analysis and visualization. The framework included four stages: network creation, network partition, structural analysis, and network visualization. Based upon it, we have developed a system called CrimeNet Explorer that incorporates several advanced techniques: a concept space approach, hierarchical clustering, social network analysis methods, and multidimensional scaling. Results from controlled experiments involving student subjects demonstrated that our system could achieve higher clustering recall and precision than did untrained subjects when detecting subgroups from criminal networks. Moreover, subjects identified central members and interaction patterns between groups significantly faster with the help of structural analysis functionality than with only visualization functionality. No significant gain in …
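Two of the listed techniques fit in a few lines. Single-link hierarchical clustering cut at a similarity threshold is equivalent to taking the connected components of the links whose weight meets the threshold. Degree centrality within each resulting subgroup then flags candidate central members. The association weights below stand in for concept-space weights and are invented, as are the threshold and function names. CrimeNet Explorer's actual weighting and clustering details are not reproduced.

```python
def single_link_groups(weights, threshold):
    """Clusters from single-link clustering cut at `threshold` (union-find)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (u, v), w in weights.items():
        find(u)
        find(v)  # register both members even if their link is too weak
        if w >= threshold:
            parent[find(u)] = find(v)

    groups = {}
    for x in parent:
        groups.setdefault(find(x), set()).add(x)
    return sorted(groups.values(), key=lambda g: sorted(g))


def degree_centrality(weights, threshold, group):
    """Number of strong links each member has inside its own group."""
    deg = {m: 0 for m in group}
    for (u, v), w in weights.items():
        if w >= threshold and u in group and v in group:
            deg[u] += 1
            deg[v] += 1
    return deg


weights = {("a", "b"): 0.9, ("b", "c"): 0.8, ("c", "d"): 0.2, ("d", "e"): 0.95}
groups = single_link_groups(weights, 0.5)
print(groups)
print(degree_centrality(weights, 0.5, groups[0]))
```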
Approximating Betweenness Centrality
2007
Cited by 42 (4 self)
Betweenness is a centrality measure based on shortest paths, widely used in complex network analysis. It is computationally expensive to exactly determine betweenness; currently the fastest known algorithm by Brandes requires O(nm) time for unweighted graphs and O(nm + n² log n) time for weighted graphs, where n is the number of vertices and m is the number of edges in the network. These are also the worst-case time bounds for computing the betweenness score of a single vertex. In this paper, we present a novel approximation algorithm for computing betweenness centrality of a given vertex, for both weighted and unweighted graphs. Our approximation algorithm is based on an adaptive sampling technique that significantly reduces the number of single-source shortest path computations for vertices with high centrality. We conduct an extensive experimental study on real-world graph instances, and observe that our random sampling algorithm gives very good betweenness approximations for biological networks, road networks, and web crawls.
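The adaptive-sampling idea can be sketched as follows, under our reading of the description above:

1. Pick source vertices s uniformly at random.
2. Run one single-source shortest-path computation from each source (Brandes's BFS plus dependency accumulation) to get the dependency δ_s(v) of the target vertex v.
3. Stop once the running sum exceeds c·n.
4. Take n times the sample mean as the estimate.

High-centrality vertices reach the threshold after few samples. The sketch covers unweighted graphs only, and the constant `c`, the sample cap, and the function names are assumptions rather than the paper's exact parameters. It estimates the sum over ordered (source, target) pairs, so halve the result for the usual undirected convention.

```python
import random
from collections import deque


def dependency(adj, s, v):
    """Brandes single-source phase: dependency delta_s(v) of source s on vertex v."""
    if s == v:
        return 0.0
    sigma, dist, preds, order = {s: 1}, {s: 0}, {s: []}, []
    queue = deque([s])
    while queue:  # BFS counting shortest paths (sigma) and their predecessors
        u = queue.popleft()
        order.append(u)
        for w in adj[u]:
            if w not in dist:
                dist[w], sigma[w], preds[w] = dist[u] + 1, 0, []
                queue.append(w)
            if dist[w] == dist[u] + 1:
                sigma[w] += sigma[u]
                preds[w].append(u)
    delta = dict.fromkeys(order, 0.0)
    for w in reversed(order):  # accumulate dependencies in reverse BFS order
        for u in preds[w]:
            delta[u] += sigma[u] / sigma[w] * (1.0 + delta[w])
    return delta.get(v, 0.0)


def approx_betweenness(adj, v, c=5.0, max_samples=None, seed=0):
    """Adaptive sampling: sample sources until the dependency sum exceeds c * n."""
    rng = random.Random(seed)
    nodes = list(adj)
    n = len(nodes)
    cap = max_samples if max_samples is not None else n  # assumed safety cap
    total, k = 0.0, 0
    while total <= c * n and k < cap:
        total += dependency(adj, rng.choice(nodes), v)
        k += 1
    return n * total / k  # ordered-pair betweenness; halve for undirected


star = {"h": ["1", "2", "3", "4"], "1": ["h"], "2": ["h"], "3": ["h"], "4": ["h"]}
print(dependency(star, "1", "h"))  # 3.0
print(approx_betweenness(star, "h", c=200.0, max_samples=100_000))  # roughly 12
```

In the star, each leaf source depends on the hub for its paths to the three other leaves. The exact ordered-pair betweenness of the hub is therefore 4 × 3 = 12, or 6 in the undirected convention.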
Clique relaxations in social network analysis: The maximum k-plex problem
2006
Cited by 40 (5 self)
This paper introduces and studies the maximum k-plex problem, which arises in social network analysis but can also be used in several other important application areas, including wireless networks, telecommunications, and graph-based data mining. We establish NP-completeness of the decision version of the problem on arbitrary graphs. An integer programming formulation is presented and a basic polyhedral study of the problem is carried out. A branch-and-cut implementation is discussed, and computational test results on the proposed benchmark instances and real-life scale-free graphs are also provided.
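Under the standard definition, a vertex set S is a k-plex if every vertex of S is adjacent to at least |S| − k other vertices of S, so a 1-plex is exactly a clique. The sketch below checks that definition and finds a maximum k-plex by exhaustive search on a tiny graph. It is exponential and only makes the definition concrete; it does not stand in for the integer-programming and branch-and-cut approach the paper studies. The function names are hypothetical.

```python
from itertools import combinations


def is_kplex(adj, S, k):
    """True if every vertex in S has at least |S| - k neighbors inside S."""
    S = set(S)
    return all(len(adj[v] & S) >= len(S) - k for v in S)


def max_kplex(adj, k):
    """Largest k-plex by exhaustive search (exponential; tiny graphs only)."""
    nodes = sorted(adj)
    for size in range(len(nodes), 0, -1):
        for S in combinations(nodes, size):
            if is_kplex(adj, S, k):
                return set(S)
    return set()


# 4-cycle a-b-c-d-a: not a clique, but each vertex misses only one other vertex.
cycle = {"a": {"b", "d"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"a", "c"}}
print(len(max_kplex(cycle, 1)), len(max_kplex(cycle, 2)))  # 2 4
```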