Fast computation of SimRank for static and dynamic information networks
In: EDBT, 2010
Cited by 30 (1 self)
Information networks are ubiquitous in many applications, and analysis of such networks has attracted significant attention in the academic community. One of the most important aspects of information network analysis is measuring the similarity between nodes in a network. SimRank is a simple and influential measure of this kind, based on a solid theoretical “random surfer” model. Existing work computes SimRank similarity scores iteratively. We argue that the iterative method can be infeasible and inefficient when, as in many real-world scenarios, the networks change dynamically and frequently. We envision a non-iterative method to bridge the gap. It allows users not only to update the similarity scores incrementally, but also to derive similarity scores for an arbitrary subset of nodes. To enable the non-iterative computation, we propose to rewrite the SimRank equation into a non-iterative form by using the Kronecker product and vectorization operators. Based on this, we develop a family of novel approximate SimRank computation algorithms for static and dynamic information networks, and give their corresponding theoretical justification and analysis. The non-iterative method supports efficient processing of various node analyses, including similarity tracking and centrality tracking on evolving information networks. The effectiveness and efficiency of our proposed methods are evaluated on synthetic and real data sets.
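The abstract does not spell out the rewritten equation. The sketch below assumes the linearized matrix form often used in this line of work, S = c * W^T S W + (1 - c) * I, where W is the column-normalized adjacency matrix (W[i, a] = 1/|I(a)| when i links to a). This form approximates, but is not identical to, the original SimRank definition (its diagonal is not forced to 1). Applying the identity vec(AXB) = (B^T kron A) vec(X) turns the fixed point into one linear system, which the sketch checks against plain iteration. Function names and c = 0.6 are illustrative assumptions; the dense n^2 x n^2 system is only practical for tiny graphs and is not the paper's approximate algorithm.

```python
import numpy as np


def column_normalized(adj: np.ndarray) -> np.ndarray:
    """W[i, a] = 1 / |I(a)| if i -> a is an edge; columns of nodes without in-links stay zero."""
    in_deg = adj.sum(axis=0)
    w = np.zeros(adj.shape)
    has_in = in_deg > 0
    w[:, has_in] = adj[:, has_in] / in_deg[has_in]
    return w


def simrank_iterative(adj: np.ndarray, c: float = 0.6, iters: int = 100) -> np.ndarray:
    """Fixed-point iteration of S = c * W^T S W + (1 - c) * I."""
    n = adj.shape[0]
    w = column_normalized(adj)
    s = np.eye(n)
    for _ in range(iters):
        s = c * (w.T @ s @ w) + (1 - c) * np.eye(n)
    return s


def simrank_kronecker(adj: np.ndarray, c: float = 0.6) -> np.ndarray:
    """Non-iterative form: vec(S) = (1 - c) (I - c W^T kron W^T)^{-1} vec(I)."""
    n = adj.shape[0]
    w = column_normalized(adj)
    system = np.eye(n * n) - c * np.kron(w.T, w.T)   # vec(W^T S W) = (W^T kron W^T) vec(S)
    rhs = (1 - c) * np.eye(n).flatten(order="F")     # column-major vec(I)
    return np.linalg.solve(system, rhs).reshape((n, n), order="F")


adj = np.array([[0, 1, 1, 0],
                [0, 0, 1, 1],
                [1, 0, 0, 1],
                [1, 1, 0, 0]], dtype=float)
print(np.allclose(simrank_iterative(adj), simrank_kronecker(adj)))  # True
```

Once the system is solved, any entry of S is available without iterating. Restricting the solve to a subset of nodes, or updating it when W changes, is where the paper's approximations come in.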
Parallel SimRank computation on large graphs with iterative aggregation
KDD'10, 2010
Cited by 18 (2 self)
Recently there has been a lot of interest in graph-based analysis. One of the most important aspects of graph-based analysis is measuring the similarity between nodes in a graph. SimRank is a simple and influential measure of this kind, based on a solid graph-theoretical model. However, existing methods for SimRank computation suffer from two limitations: 1) the computing cost can be very high in practice; and 2) they can only be applied to static graphs. In this paper, we exploit the inherent parallelism and high memory bandwidth of graphics processing units (GPUs) to accelerate the computation of SimRank on large graphs. Furthermore, based on the observation that SimRank is essentially a first-order Markov chain, we propose to use iterative aggregation techniques for uncoupling Markov chains to compute SimRank scores in parallel for large graphs. The iterative aggregation method can be applied to dynamic graphs. Moreover, it can handle not only the link-updating problem but also the node-updating problem. Extensive experiments on synthetic and real data sets verify that the proposed methods are efficient and effective.
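For reference, the original iterative SimRank computation, which this abstract identifies as costly, can be written with two dense matrix products per iteration: S_{k+1} = c * W^T S_k W, with the diagonal reset to 1. Those products are the data-parallel work a GPU implementation would target. This is a plain CPU sketch in NumPy. It does not reproduce the paper's GPU kernels or its iterative aggregation scheme, and the function name, c = 0.8 and the iteration count are assumptions.

```python
import numpy as np


def simrank_jeh_widom(adj: np.ndarray, c: float = 0.8, iters: int = 10) -> np.ndarray:
    """Original SimRank: s(a, a) = 1, and for a != b
    s(a, b) = c / (|I(a)| |I(b)|) * sum over i in I(a), j in I(b) of s(i, j)."""
    n = adj.shape[0]
    in_deg = adj.sum(axis=0)
    # Column-normalize so that (W^T S W)[a, b] averages s over pairs of in-neighbours.
    w = np.divide(adj, in_deg, out=np.zeros((n, n)), where=in_deg > 0)
    s = np.eye(n)
    for _ in range(iters):
        s = c * (w.T @ s @ w)      # the dense products a GPU would parallelize
        np.fill_diagonal(s, 1.0)   # self-similarity is fixed at 1
    return s


# Node 0 points to nodes 1 and 2, so 1 and 2 share their only in-neighbour.
adj = np.array([[0, 1, 1], [0, 0, 0], [0, 0, 0]], dtype=float)
print(simrank_jeh_widom(adj)[1, 2])  # 0.8
```

Each iteration costs O(n^3) with dense matrices, which is why both parallel hardware and decomposition schemes such as iterative aggregation are attractive for large graphs.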
Link Prediction on Evolving Data using Matrix and Tensor Factorizations
IEEE International Conference on Data Mining Workshops, 2009
Cited by 17 (1 self)
The data in many disciplines, such as social networks and web analysis, is link-based, and the link structure can be exploited for many different data mining tasks. In this paper, we consider the problem of temporal link prediction: given link data for time periods 1 through T, can we predict the links in time period T + 1? Specifically, we look at bipartite graphs changing over time and consider matrix- and tensor-based methods for predicting links. We present a weight-based method for collapsing multi-year data into a single matrix. We show how the well-known Katz method for link prediction can be extended to bipartite graphs and, moreover, approximated in a scalable way using a truncated singular value decomposition. Using a CANDECOMP/PARAFAC tensor decomposition of the data, we illustrate the usefulness of exploiting the natural three-dimensional structure of temporal link data. Through several numerical experiments, we demonstrate that both matrix- and tensor-based techniques are effective for temporal link prediction despite the inherent difficulty of the problem.
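The three ingredients named in the abstract can be sketched in NumPy, under stated assumptions. The collapse step assumes an exponential decay, Z = sum_t (1 - theta)^(T - t) X_t, with an illustrative theta. The bipartite Katz score uses the fact that, for A = [[0, X], [X^T, 0]] and X = U S V^T, only odd powers of A reach the user-to-item block, so the sum over k of beta^k A^k restricted to that block equals U diag(beta*s / (1 - beta^2 s^2)) V^T when beta * s_max < 1. Keeping only the top singular triplets gives the truncated approximation. The CP part is a bare-bones alternating least squares fit with no normalization or convergence test. Forecasting time T + 1 by averaging the last rows of the time factor is a simple heuristic chosen for illustration, not necessarily the authors' method. All function names are hypothetical.

```python
import numpy as np


def collapse(slices, theta=0.2):
    """Weighted sum of time slices; the most recent slice gets weight 1 (theta is an assumed decay)."""
    T = len(slices)
    return sum((1 - theta) ** (T - 1 - t) * x for t, x in enumerate(slices))


def katz_bipartite_exact(x, beta):
    """User-item block of sum_{k>=1} beta^k A^k for A = [[0, X], [X^T, 0]] (dense reference)."""
    m, n = x.shape
    a = np.block([[np.zeros((m, m)), x], [x.T, np.zeros((n, n))]])
    katz = np.linalg.inv(np.eye(m + n) - beta * a) - np.eye(m + n)
    return katz[:m, m:]


def katz_bipartite_truncated(x, beta, rank):
    """U diag(beta*s / (1 - (beta*s)^2)) V^T over the top `rank` singular triplets; needs beta*s_max < 1."""
    u, s, vt = np.linalg.svd(x, full_matrices=False)
    psi = beta * s[:rank] / (1 - (beta * s[:rank]) ** 2)
    return (u[:, :rank] * psi) @ vt[:rank]


def khatri_rao(b, c):
    """Column-wise Kronecker product: column r equals kron(b[:, r], c[:, r])."""
    return np.einsum("ir,jr->ijr", b, c).reshape(-1, b.shape[1])


def cp_als(x, rank, iters=50, seed=0):
    """Rank-`rank` CP decomposition of a 3-way array (users x items x time) by alternating least squares."""
    rng = np.random.default_rng(seed)
    factors = [rng.random((dim, rank)) for dim in x.shape]
    for _ in range(iters):
        for mode in range(3):
            p, q = [factors[m] for m in range(3) if m != mode]
            unfolded = np.moveaxis(x, mode, 0).reshape(x.shape[mode], -1)  # C-order mode unfolding
            gram = (p.T @ p) * (q.T @ q)
            factors[mode] = unfolded @ khatri_rao(p, q) @ np.linalg.pinv(gram)
    return factors


def forecast_scores(factors, window=2):
    """Score every (user, item) pair for time T + 1 from the mean of the last `window` time-factor rows."""
    a, b, c = factors
    gamma = c[-window:].mean(axis=0)
    return (a * gamma) @ b.T
```

With rank equal to min(m, n), the truncated Katz formula reproduces the dense reference exactly; smaller ranks trade accuracy for the scalability the abstract mentions.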
Community Detection Using a Measure of Global Influence
Cited by 16 (3 self)
The growing popularity of online social networks gave researchers access to a large amount of network data and renewed interest in methods for automatic community detection. Existing algorithms, including the popular modularity-optimization methods, look for regions of the network that are better connected internally, e.g., that have a higher-than-expected number of edges within them. We believe, however, that edges do not give the true measure of network connectivity. Instead, we argue that influence, which we define as the number of paths, of any length, that exist between two nodes, gives a better measure of network connectivity. We use the influence metric to partition a network into groups or communities by looking for regions of the network where nodes have more influence over each other than over nodes outside the community. We evaluate our approach on several networks and show that it often outperforms the edge-based modularity algorithm. Key words: community, social networks, influence, modularity
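Counting paths "of any length" needs some form of attenuation, because raw counts diverge on any graph with a cycle. The sketch below assumes a Katz-style factor alpha < 1/lambda_max and uses powers of the adjacency matrix, which count walks (paths that may revisit nodes), the usual tractable stand-in for paths. The abstract does not give the paper's exact definition, so alpha_scale and both function names are illustrative assumptions. The check compares the average influence within a candidate community against the average influence across the boundary.

```python
import numpy as np


def influence_matrix(adj: np.ndarray, alpha_scale: float = 0.5) -> np.ndarray:
    """Attenuated count of walks of every length k >= 1: R = sum_k (alpha*A)^k = (I - alpha*A)^{-1} - I.
    alpha = alpha_scale / lambda_max keeps the series convergent (an assumed attenuation)."""
    n = adj.shape[0]
    lam_max = np.max(np.abs(np.linalg.eigvals(adj)))
    alpha = alpha_scale / lam_max
    return np.linalg.inv(np.eye(n) - alpha * adj) - np.eye(n)


def mean_influence(r: np.ndarray, group_a, group_b) -> float:
    """Average influence of nodes in group_a over nodes in group_b."""
    return float(r[np.ix_(group_a, group_b)].mean())


# Two triangles {0, 1, 2} and {3, 4, 5} joined by the single edge 2-3.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
adj = np.zeros((6, 6))
for u, v in edges:
    adj[u, v] = adj[v, u] = 1.0
r = influence_matrix(adj)
print(mean_influence(r, [0, 1, 2], [0, 1, 2]) > mean_influence(r, [0, 1, 2], [3, 4, 5]))  # True
```

A partitioning method would search over groupings to maximize within-group influence relative to some baseline, analogous to modularity. That search is not shown here.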
Non-Negative Residual Matrix Factorization with Application to Graph Anomaly Detection
Cited by 9 (1 self)
Given an IP source-destination traffic network, how do we spot misbehaving IP sources (e.g., port scanners)? How do we find strange users in a user-movie rating graph? Moreover, how can we present the results intuitively so that they are easier for data analysts to interpret? We propose NrMF, a non-negative residual matrix factorization framework, to address such challenges. We present an optimization formulation as well as an effective algorithm to solve it. Our method can naturally capture abnormal behaviors on graphs. In addition, the proposed algorithm is linear with respect to the size of the graph and is therefore suitable for large graphs. The experimental results on several data sets validate its effectiveness as well as its efficiency.
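The abstract does not give NrMF's optimization problem. The sketch below only illustrates the general idea of residual-based anomaly scoring that the name suggests: fit a low-rank model of normal behaviour, then treat what it cannot explain as evidence of anomaly. It swaps in a plain truncated SVD for the paper's constrained factorization, and it clips the residual to its non-negative part on observed links afterwards instead of enforcing non-negativity during fitting. The function name and the rank are assumptions.

```python
import numpy as np


def residual_anomaly_scores(a: np.ndarray, rank: int) -> np.ndarray:
    """Positive part of the residual of a rank-`rank` truncated-SVD fit, kept only on observed (non-zero) entries."""
    u, s, vt = np.linalg.svd(a, full_matrices=False)
    low_rank = (u[:, :rank] * s[:rank]) @ vt[:rank]
    residual = a - low_rank
    # Links the low-rank "normal" structure cannot explain get a positive score; everything else scores 0.
    return np.where(a > 0, np.clip(residual, 0.0, None), 0.0)


# Two communities of 4 sources x 4 targets; source 0 also links to target 7 of the other community.
a = np.zeros((8, 8))
a[:4, :4] = 1.0
a[4:, 4:] = 1.0
a[0, 7] = 1.0
scores = residual_anomaly_scores(a, rank=2)
row, col = np.unravel_index(np.argmax(scores), scores.shape)
print(int(row), int(col))  # 0 7
```

Summing the scores along rows ranks suspicious sources (such as a port scanner touching many unrelated destinations); summing along columns ranks suspicious targets.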
Proximity-Based Anomaly Detection using Sparse Structure Learning
Cited by 7 (1 self)
We consider the task of performing anomaly detection in highly noisy multivariate data. In many applications involving real-valued time-series data, such as physical sensor data and economic metrics, discovering changes and anomalies in the way variables depend on one another is of particular importance. Our goal is to robustly compute the “correlation anomaly” score of each variable by comparing the test data with reference data, even when some of the variables are highly correlated (and thus collinearity exists). To remove spurious dependencies introduced by noise, we focus on the most significant dependencies for each variable. We perform this “neighborhood selection” in an adaptive manner by fitting a sparse graphical Gaussian model. Instead of traditional covariance selection procedures, we solve this problem as maximum likelihood estimation of the precision matrix (inverse covariance matrix) under an L1 penalty. The anomaly score for each variable is then computed by evaluating the distances between the fitted conditional distributions within the Markov blanket for that variable, for the two data sets to be compared. Using real-world data, we demonstrate that our matrix-based sparse structure learning approach successfully detects correlation anomalies under collinearity and heavy noise.
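The scoring step can be sketched once a precision matrix is available for each data set. In practice those would come from the L1-penalized maximum likelihood fit the abstract describes (for example scikit-learn's `sklearn.covariance.GraphicalLasso`, whose fitted `precision_` attribute holds the estimate); here they are written by hand. Under a zero-mean Gaussian, x_i given the other variables depends only on its Markov blanket (the non-zero entries of row i). The sketch scores each variable by a one-directional KL divergence between its reference and test conditionals, averaged over samples. This is a simplification of the paper's distance, and the function names are hypothetical.

```python
import numpy as np


def conditional_gaussians(precision: np.ndarray, x: np.ndarray):
    """Under a zero-mean Gaussian with the given precision matrix P, x_i given the other variables has
    mean -sum_{j != i} P_ij x_j / P_ii and variance 1 / P_ii. Returns (means[n_samples, d], variances[d])."""
    diag = np.diag(precision)
    off_diag = precision - np.diag(diag)
    means = -(x @ off_diag.T) / diag
    return means, 1.0 / diag


def correlation_anomaly_scores(prec_ref, prec_test, x):
    """Per-variable KL(p_ref(x_i | rest) || p_test(x_i | rest)), averaged over the rows of x."""
    m_ref, v_ref = conditional_gaussians(prec_ref, x)
    m_test, v_test = conditional_gaussians(prec_test, x)
    kl = 0.5 * np.log(v_test / v_ref) + (v_ref + (m_ref - m_test) ** 2) / (2 * v_test) - 0.5
    return kl.mean(axis=0)


# Reference: variables 0 and 1 are coupled; test: the coupling is gone, variable 2 is unchanged.
prec_ref = np.array([[2.0, 0.8, 0.0], [0.8, 2.0, 0.0], [0.0, 0.0, 2.0]])
prec_test = np.diag([2.0, 2.0, 2.0])
x = np.random.default_rng(0).standard_normal((200, 3))
print(correlation_anomaly_scores(prec_ref, prec_test, x))  # positive for 0 and 1, zero for 2
```

Only variables whose neighbourhood changed receive a non-zero score, which is the property that lets this kind of score localize a broken dependency even when many variables are collinear.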
Real Time Discovery of Dense Clusters in Highly Dynamic Graphs: Identifying Real World Events in Highly Dynamic Environments
Cited by 6 (1 self)
Due to their real-time nature, microblog streams are a rich source of dynamic information, for example, about emerging events. Existing techniques for discovering such events from a microblog stream in real time (such as Twitter trending topics) have several lacunae when used for discovering emerging events: extant graph-based event detection techniques are not practical in microblog settings due to their complexity, and conventional techniques, which have been developed for blogs, web pages, etc., and involve the use of keyword search, are only useful for finding information about known events. Hence, in this paper, we present techniques to discover events that are unfolding in microblog message streams in real time so that such events can be reported as soon as they occur. We model the problem as discovering dense clusters in highly dynamic graphs. Despite many recent advances in graph analysis, ours is the first technique to identify dense clusters in massive and highly dynamic graphs in real time. Given the characteristics of microblog streams, in order to find clusters without missing any events, we propose and exploit a novel graph property, which we call the short-cycle property. Our algorithms find these clusters efficiently in spite of rapid changes to the microblog streams. Further, we present a novel ranking function to identify the important events. Besides proving the correctness of our algorithms, we show their practical utility by evaluating them on real-world microblog data. These evaluations demonstrate our technique’s ability to discover, with high precision and recall, emerging events in high-intensity data streams in real time. Many recent web applications create data that can be represented as massive dynamic graphs. Our technique can be easily extended to discover, in real time, interesting patterns in such graphs.
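The abstract does not define the short-cycle property. As a loose illustration of the incremental flavour of the problem, the sketch below assumes the shortest possible cycles, triangles, signal density. When an edge arrives, the triangles it closes are found from the common neighbours of its endpoints, and their nodes are merged into clusters with a union-find structure. The class name is hypothetical, edge expiry and the paper's ranking function are not modelled, and this is not the authors' algorithm.

```python
from collections import defaultdict


class TriangleClusterer:
    """Maintains clusters of nodes linked through triangles while edges stream in."""

    def __init__(self):
        self.adj = defaultdict(set)   # undirected adjacency sets
        self.parent = {}              # union-find forest over nodes seen in a triangle

    def _find(self, v):
        self.parent.setdefault(v, v)
        while self.parent[v] != v:
            self.parent[v] = self.parent[self.parent[v]]   # path halving
            v = self.parent[v]
        return v

    def _union(self, a, b):
        root_a, root_b = self._find(a), self._find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a

    def add_edge(self, u, v):
        """Insert the undirected edge u-v and return the triangles it closes."""
        if u == v or v in self.adj[u]:
            return []
        common = self.adj[u] & self.adj[v]   # each common neighbour w closes the cycle u-v-w-u
        self.adj[u].add(v)
        self.adj[v].add(u)
        for w in common:
            self._union(u, v)
            self._union(u, w)
        return [tuple(sorted((u, v, w))) for w in sorted(common)]

    def clusters(self):
        """Groups of nodes connected through at least one triangle."""
        groups = defaultdict(set)
        for v in list(self.parent):
            groups[self._find(v)].add(v)
        return [nodes for nodes in groups.values() if len(nodes) > 1]


tc = TriangleClusterer()
tc.add_edge("a", "b")
tc.add_edge("b", "c")
print(tc.add_edge("c", "a"))  # [('a', 'b', 'c')]
```

Each insertion costs time proportional to the smaller neighbourhood of its endpoints (the set intersection), so work is local to the change rather than to the whole graph.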
Spotting significant changing subgraphs in evolving graphs
In Proc. 2008 Int. Conf. Data Mining (ICDM’08), 2008
Cited by 6 (2 self)
Graphs are widely used to model structural relationships between objects. In many application domains, such as social networks, sensor networks and telecommunication, graphs evolve over time. In this paper, we study a new problem: discovering the subgraphs that exhibit significant changes in evolving graphs. This problem is challenging because it is hard to define changing regions that are closely related to the actual changes (i.e., additions/deletions of edges/nodes) in graphs. We formalize the problem and design an efficient algorithm that is able to identify the changing subgraphs incrementally. Our experimental results on real datasets show that our solution is very efficient and the resulting subgraphs are of high quality.
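The abstract does not state how a changing region is defined. A naive baseline makes the problem concrete: take the symmetric difference of two snapshots' edge sets and group the changed edges into connected components, ranked by how many edges changed. It recomputes from scratch rather than incrementally and is not the paper's method; the function name is hypothetical.

```python
from collections import defaultdict


def changed_regions(old_edges, new_edges):
    """Connected components of the edges added or removed between two undirected snapshots,
    returned as (nodes, changed_edges) pairs with the most-changed region first."""
    def normalize(edges):
        return {frozenset(e) for e in edges if e[0] != e[1]}   # undirected, no self-loops

    changed = normalize(old_edges) ^ normalize(new_edges)
    adj = defaultdict(set)
    for edge in changed:
        u, v = tuple(edge)
        adj[u].add(v)
        adj[v].add(u)

    seen, regions = set(), []
    for start in list(adj):
        if start in seen:
            continue
        seen.add(start)
        stack, nodes = [start], set()
        while stack:                      # depth-first search over changed edges only
            node = stack.pop()
            nodes.add(node)
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        regions.append((nodes, {e for e in changed if e <= nodes}))
    return sorted(regions, key=lambda region: len(region[1]), reverse=True)


old = [(1, 2), (2, 3), (3, 4), (7, 8)]
new = [(1, 2), (2, 3), (3, 5), (5, 6), (7, 8), (8, 9)]
for nodes, edges in changed_regions(old, new):
    print(sorted(nodes), len(edges))
# [3, 4, 5, 6] 3
# [8, 9] 1
```

A baseline like this ignores how close a change is to unchanged structure, and it costs a full diff per snapshot; both are points an incremental formulation can improve on.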
Facilitating Real-Time Graph Mining
Cited by 5 (0 self)
Real-time data processing is increasingly gaining momentum as the preferred method for analytical applications. Many of these applications are built on top of large graphs with hundreds of millions of vertices and edges. A fundamental requirement for real-time processing is the ability to do incremental processing. However, graph algorithms are inherently difficult to compute incrementally due to data dependencies. At the same time, devising incremental graph algorithms is a challenging programming task. This paper introduces GraphInc, a system that builds on top of the Pregel model and provides efficient incremental processing of graphs. Importantly, GraphInc supports incremental computations automatically, hiding the complexity from the programmers. Programmers write graph analytics in the Pregel model without worrying about the continuous nature of the data. GraphInc integrates new data in real time in a transparent manner, by automatically identifying opportunities for incremental processing. We discuss the basic mechanisms of GraphInc and report on the initial evaluation of our approach.
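The abstract does not detail GraphInc's mechanisms. To show what incremental processing in the Pregel model means, the sketch hand-writes, for one simple case, what GraphInc aims to derive automatically. It runs single-source shortest paths in vertex-centric supersteps with a min combiner; inserting an edge then re-activates only the edge's source vertex and reuses the previous distances instead of recomputing from scratch. It assumes non-negative weights and handles insertions only (deletions can lengthen paths and need more work). All names are hypothetical and this is not GraphInc's API.

```python
import math
from collections import defaultdict


def pregel_sssp(graph, dist, active):
    """Vertex-centric supersteps for single-source shortest paths.
    graph: {u: [(v, weight), ...]}; dist: known distances, updated in place;
    active: vertices that send messages in the first superstep. Returns the number of supersteps."""
    supersteps = 0
    while active:
        inbox = defaultdict(lambda: math.inf)
        for u in active:                                  # compute phase: message every out-neighbour
            for v, w in graph.get(u, []):
                inbox[v] = min(inbox[v], dist[u] + w)     # combiner keeps only the smallest message
        active = set()
        for v, msg in inbox.items():                      # vertices that improve stay active
            if msg < dist.get(v, math.inf):
                dist[v] = msg
                active.add(v)
        supersteps += 1
    return supersteps


def add_edge_incrementally(graph, dist, u, v, w):
    """Insert u -> v and resume from the previous fixpoint by re-activating only u."""
    graph.setdefault(u, []).append((v, w))
    return pregel_sssp(graph, dist, {u} if u in dist else set())


graph = {"s": [("a", 4), ("b", 1)], "a": [("c", 1)], "b": [("c", 1)]}
dist = {"s": 0}
pregel_sssp(graph, dist, {"s"})                   # full computation from the source
add_edge_incrementally(graph, dist, "b", "a", 1)  # new edge b -> a: only b is re-activated
print(dist)  # {'s': 0, 'a': 2, 'b': 1, 'c': 2}
```

The incremental run touches only the vertices whose distance can improve, which is the kind of opportunity the abstract says GraphInc identifies without the programmer writing a separate incremental algorithm.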
General
Cited by 4 (0 self)
The data in many disciplines, such as social networks and Web analysis, is link-based, and the link structure can be exploited for many different data mining tasks. In this article, we consider the problem of temporal link prediction: given link data for times 1 through T, can we predict the links at time T + 1? If our data has underlying periodic structure, can we predict even further out in time, i.e., links at time T + 2, T + 3, etc.? In this article, we consider bipartite graphs that evolve over time and consider matrix- and tensor-based methods for predicting future links. We present a weight-based method for collapsing multi-year data into a single matrix. We show how the well-known Katz method for link prediction can be extended to bipartite graphs and, moreover, approximated in a scalable way using a truncated singular value decomposition. Using a CANDECOMP/PARAFAC tensor decomposition of the data, we illustrate the usefulness of exploiting the natural three-dimensional structure of temporal link data. Through several numerical experiments, we demonstrate that both matrix- and tensor-based techniques are effective for temporal link prediction despite the inherent difficulty of the problem. Additionally, we show that tensor-based