Results 1 - 4 of 4
Panther: Fast Top-k Similarity Search on Large Networks
2015
Abstract

Cited by 1 (1 self)
Estimating similarity between vertices is a fundamental issue in network analysis across various domains, such as social networks and biological networks. Methods based on common neighbors and structural contexts have received much attention. However, both categories of methods are difficult to scale up to handle large networks (with billions of nodes). In this paper, we propose a sampling method that provably and accurately estimates the similarity between vertices. The algorithm is based on the novel idea of a random path. Specifically, given a network, we perform R random walks, each starting from a randomly picked vertex and walking T steps. Theoretically, the algorithm guarantees that the sampling size R = O(2ε⁻² log₂ T) depends on the error bound ε, the confidence level (1 − δ), and the path length T of each random walk. We perform an extensive empirical study on a Tencent microblogging network of 1,000,000,000 edges. We show that our algorithm can return the top-k similar vertices for any vertex in a network 300× faster than state-of-the-art methods. We also use two applications, identity resolution and structural hole spanner finding, to evaluate the accuracy of the estimated similarities. Our results demonstrate that the proposed algorithm achieves clearly better performance than several alternative methods.
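The random-path sampling described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names are hypothetical, the graph is a plain adjacency dictionary, and the estimator (the fraction of sampled paths in which two vertices co-occur) is an assumption made for illustration, since the abstract does not spell out the similarity formula.

```python
import random
from collections import defaultdict
from itertools import combinations

def sample_random_paths(adj, R, T, seed=0):
    """Sample R random walks, each starting at a uniformly random vertex
    and taking up to T steps (a walk stops early at a vertex with no neighbors)."""
    rng = random.Random(seed)
    vertices = sorted(adj)
    paths = []
    for _ in range(R):
        v = rng.choice(vertices)
        path = [v]
        for _ in range(T):
            if not adj[v]:
                break
            v = rng.choice(adj[v])
            path.append(v)
        paths.append(path)
    return paths

def estimate_similarity(paths):
    """Assumed estimator: share of sampled paths that contain both vertices."""
    counts = defaultdict(int)
    for path in paths:
        for u, v in combinations(sorted(set(path)), 2):
            counts[(u, v)] += 1
    total = len(paths)
    return {pair: c / total for pair, c in counts.items()}

def top_k(sim, v, k):
    """Return the k vertices with the highest estimated similarity to v."""
    scored = [(s, a if b == v else b) for (a, b), s in sim.items() if v in (a, b)]
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [(u, s) for s, u in scored[:k]]
```

Calling `top_k(estimate_similarity(sample_random_paths(adj, R, T)), v, k)` on a small graph, with `adj` mapping each vertex to a list of neighbors, returns the candidates for vertex `v`. Larger R tightens the estimate, in line with the dependence of R on ε stated in the abstract.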
Panther: Fast Top-k Similarity Search on Large Networks
KDD'15, 2015
Scalable Link Prediction in Dynamic Networks via Non-Negative Matrix Factorization
Abstract
We study the temporal link prediction problem, where, given past interactions, our goal is to predict new interactions. We propose a dynamic link prediction method based on non-negative matrix factorization. This method assumes that interactions are more likely between users that are similar to each other in the latent space representation. We propose a global optimization algorithm to effectively learn the temporal latent space with a quadratic convergence rate and bounded error. In addition, we propose two alternative algorithms with local and incremental updates, which provide much better scalability without deteriorating prediction accuracy. We evaluate our model on a number of real-world dynamic networks and demonstrate that our model significantly outperforms existing approaches for temporal link prediction in terms of both scalability and predictive power.
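The latent-space idea in this abstract can be sketched on a single static snapshot. The code below is not the paper's temporal algorithm: it swaps in the classic Lee-Seung multiplicative updates for non-negative matrix factorization, ignores the time dimension, and uses hypothetical function names. It only shows how a non-negative factorization A ≈ WH yields a score for every vertex pair.

```python
import random

def matmul(X, Y):
    """Plain-Python matrix product of two lists of rows."""
    cols = list(zip(*Y))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in X]

def transpose(X):
    return [list(col) for col in zip(*X)]

def nmf(A, k, iters=200, seed=0, eps=1e-9):
    """Factor a non-negative matrix A (n x m) into W (n x k) and H (k x m)
    using Lee-Seung multiplicative updates for the squared error."""
    rng = random.Random(seed)
    n, m = len(A), len(A[0])
    W = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        Wt = transpose(W)
        num, den = matmul(Wt, A), matmul(matmul(Wt, W), H)
        H = [[H[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(k)]
        Ht = transpose(H)
        num, den = matmul(A, Ht), matmul(W, matmul(H, Ht))
        W = [[W[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)]
             for i in range(n)]
    return W, H

def frobenius_error(A, W, H):
    """Frobenius norm of the reconstruction residual A - WH."""
    approx = matmul(W, H)
    return sum((a - b) ** 2
               for row_a, row_b in zip(A, approx)
               for a, b in zip(row_a, row_b)) ** 0.5

def link_scores(W, H):
    """Entry (i, j) of WH serves as the predicted affinity between i and j."""
    return matmul(W, H)
```

With `A` set to an adjacency matrix, `link_scores(*nmf(A, k))` gives a dense score matrix; unobserved pairs with high scores are the candidate new links. A temporal variant in the spirit of the abstract would additionally tie the factors of consecutive snapshots together, which this sketch does not attempt.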
Role Discovery in Networks
Abstract
Roles represent node-level connectivity patterns such as star-center, star-edge nodes, near-cliques or nodes that act as bridges to different regions of the graph. Intuitively, two nodes belong to the same role if they are structurally similar. Roles have been mainly of interest to sociologists, but more recently, roles have become increasingly useful in other domains. Traditionally, the notion of roles was defined based on graph equivalences such as structural, regular, and stochastic equivalences. We briefly revisit these notions and instead propose a more general formulation of roles based on the similarity of a feature representation (in contrast to the graph representation). This leads us to propose a taxonomy of two general classes of techniques for discovering roles: (i) graph-based roles and (ii) feature-based roles. This survey focuses primarily on feature-based roles. In particular, we also introduce a flexible framework for discovering roles using the notion of structural similarity on a feature-based representation. The framework consists of two fundamental components: (1) role feature construction and (2) role assignment using the learned feature representation. We discuss the relevant decisions for discovering feature-based roles and highlight the advantages and disadvantages of the many techniques that can be used for this purpose. Finally, we discuss potential applications and future directions and challenges.
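The two-component framework named in this abstract, feature construction followed by role assignment, can be illustrated with a deliberately small sketch. The choices here are assumptions for illustration only: the features are degree and local clustering coefficient, role assignment simply groups nodes with identical feature vectors, and all function names are hypothetical. The survey covers far richer features and assignment methods.

```python
from collections import defaultdict

def node_features(adj):
    """Step 1, feature construction: (degree, local clustering coefficient)
    for each node. adj maps a node to the set of its neighbors."""
    feats = {}
    for v, nbrs in adj.items():
        d = len(nbrs)
        # Count edges among the neighbors of v.
        links = sum(1 for a in nbrs for b in nbrs if a < b and b in adj[a])
        cc = 2 * links / (d * (d - 1)) if d > 1 else 0.0
        feats[v] = (d, round(cc, 3))
    return feats

def assign_roles(feats):
    """Step 2, role assignment: nodes with identical feature vectors
    receive the same role id (a crude stand-in for clustering)."""
    groups = defaultdict(list)
    for v in sorted(feats):
        groups[feats[v]].append(v)
    return {role: members
            for role, (_, members) in enumerate(sorted(groups.items()))}
```

On a star graph this separates the center from the leaves, matching the star-center and star-edge roles mentioned in the abstract. Replacing the exact-match grouping with a clustering or factorization step over the feature vectors would be the natural next refinement.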