Results 1 to 10 of 27
Random-walk computation of similarities between nodes of a graph, with application to collaborative recommendation
IEEE Transactions on Knowledge and Data Engineering, 2006
Cited by 189 (19 self)
Abstract—This work presents a new perspective on characterizing the similarity between elements of a database or, more generally, nodes of a weighted and undirected graph. It is based on a Markov-chain model of random walk through the database. More precisely, we compute quantities (the average commute time, the pseudoinverse of the Laplacian matrix of the graph, etc.) that provide similarities between any pair of nodes, having the nice property of increasing when the number of paths connecting those elements increases and when the “length” of paths decreases. It turns out that the square root of the average commute time is a Euclidean distance and that the pseudoinverse of the Laplacian matrix is a kernel matrix (its elements are inner products closely related to commute times). A principal component analysis (PCA) of the graph is introduced for computing the subspace projection of the node vectors in a manner that preserves as much variance as possible in terms of the Euclidean commute-time distance. This graph PCA provides a nice interpretation of the “Fiedler vector,” widely used for graph partitioning. The model is evaluated on a collaborative-recommendation task where suggestions are made about which movies people should watch based upon what they watched in the past. Experimental results on the MovieLens database show that the Laplacian-based similarities perform well in comparison with other methods. The model, which nicely fits into the so-called “statistical relational learning” framework, could also be used to compute document or word similarities, and, more generally, it could be applied to machine-learning and pattern-recognition tasks involving a relational database.
Index Terms—Graph analysis, graph and database mining, collaborative recommendation, graph kernels, spectral clustering, Fiedler vector, proximity measures, statistical relational learning.
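The commute-time quantities above can be illustrated with a small sketch. Assuming a connected, undirected graph given as a dense weighted adjacency matrix, it computes the Laplacian pseudoinverse L+ through the identity L+ = (L + eeᵀ/n)⁻¹ − eeᵀ/n, then the average commute time n(i, j) = V_G (l+_ii + l+_jj − 2 l+_ij), where V_G is the graph volume (the sum of all entries of the adjacency matrix). The function names are hypothetical and a pure-Python Gauss-Jordan inverse stands in for a numerical library; this is not the authors' code.

```python
def invert(m):
    """Invert a small dense matrix with Gauss-Jordan elimination and partial pivoting."""
    n = len(m)
    a = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        p = a[col][col]
        a[col] = [v / p for v in a[col]]
        for r in range(n):
            if r != col:
                f = a[r][col]
                a[r] = [v - f * w for v, w in zip(a[r], a[col])]
    return [row[n:] for row in a]


def laplacian_pinv(adj):
    """Pseudoinverse L+ of L = D - A for a connected undirected graph,
    via L+ = (L + ee^T/n)^{-1} - ee^T/n."""
    n = len(adj)
    deg = [sum(row) for row in adj]
    shifted = [[(deg[i] if i == j else 0.0) - adj[i][j] + 1.0 / n for j in range(n)]
               for i in range(n)]
    inv = invert(shifted)
    return [[inv[i][j] - 1.0 / n for j in range(n)] for i in range(n)]


def commute_time(adj, i, j):
    """Average commute time n(i, j) = V_G * (l+_ii + l+_jj - 2 l+_ij)."""
    volume = sum(sum(row) for row in adj)  # V_G: sum of all node degrees
    lp = laplacian_pinv(adj)
    return volume * (lp[i][i] + lp[j][j] - 2.0 * lp[i][j])
```

For the three-node path graph [[0, 1, 0], [1, 0, 1], [0, 1, 0]], commute_time(path, 0, 2) is 8 up to rounding: the effective resistance between the endpoints is 2 and the volume is 4. Its square root is the Euclidean commute-time distance mentioned in the abstract.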
The Slashdot zoo: Mining a social network with negative edges
In WWW, 2009
An experimental investigation of graph kernels on a collaborative recommendation task
Proceedings of the 6th International Conference on Data Mining (ICDM 2006), 2006
Cited by 27 (7 self)
This paper presents a survey as well as a systematic empirical comparison of seven graph kernels and two related similarity matrices (simply referred to as graph kernels), namely the exponential diffusion kernel, the Laplacian exponential diffusion kernel, the von Neumann diffusion kernel, the regularized Laplacian kernel, the commute-time kernel, the random-walk-with-restart similarity matrix, and finally, three graph kernels introduced in this paper: the regularized commute-time kernel, the Markov diffusion kernel, and the cross-entropy diffusion matrix. The kernel-on-a-graph approach is simple and intuitive. It is illustrated by applying the nine graph kernels to a collaborative-recommendation task and to a semi-supervised classification task, both on several databases. The graph methods compute proximity measures between nodes that help study the structure of the graph. Our comparisons suggest that the regularized commute-time and the Markov diffusion kernels perform best, closely followed by the regularized Laplacian kernel.
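As a concrete instance of one kernel from the list above, the regularized Laplacian kernel can be written K = (I + αL)⁻¹ with L = D − A and α > 0. The sketch below, with hypothetical function names and a plain dense inverse, builds K and ranks nodes by kernel similarity to a query node, which is the basic kernel-on-a-graph recipe for recommendation. It does not reproduce the paper's experimental protocol.

```python
def invert(m):
    """Invert a small dense matrix with Gauss-Jordan elimination and partial pivoting."""
    n = len(m)
    a = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        p = a[col][col]
        a[col] = [v / p for v in a[col]]
        for r in range(n):
            if r != col:
                f = a[r][col]
                a[r] = [v - f * w for v, w in zip(a[r], a[col])]
    return [row[n:] for row in a]


def regularized_laplacian_kernel(adj, alpha):
    """K = (I + alpha * L)^{-1}, with L = D - A."""
    n = len(adj)
    deg = [sum(row) for row in adj]
    m = [[(1.0 if i == j else 0.0) + alpha * ((deg[i] if i == j else 0.0) - adj[i][j])
          for j in range(n)] for i in range(n)]
    return invert(m)


def recommend(kernel, query, exclude):
    """Rank nodes by kernel similarity to `query`, skipping `query` and `exclude`."""
    scores = kernel[query]
    candidates = [j for j in range(len(scores)) if j != query and j not in exclude]
    return sorted(candidates, key=lambda j: -scores[j])
```

On the three-node path with α = 1, the first row of K is [0.625, 0.25, 0.125], so node 1 ranks before node 2 for query node 0. Because Le = 0, every row of this kernel sums to 1.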
Spectral Analysis of Signed Graphs for Clustering, Prediction and Visualization
Cited by 26 (0 self)
We study the application of spectral clustering, prediction and visualization methods to graphs with negatively weighted edges. We show that several characteristic matrices of graphs can be extended to graphs with positively and negatively weighted edges, giving signed spectral clustering methods, signed graph kernels and network visualization methods that apply to signed graphs. In particular, we review a signed variant of the graph Laplacian. We derive our results by considering random walks, graph clustering, graph drawing and electrical networks, showing that they all result in the same formalism for handling negatively weighted edges. We illustrate our methods using examples from social networks with negative edges and bipartite rating graphs.
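The signed Laplacian reviewed above replaces the ordinary degree by the absolute degree: L̄ = D̄ − A with D̄_ii = Σ_j |a_ij|. A minimal sketch, with hypothetical helper names, builds it from a dense signed adjacency matrix and evaluates the quadratic form xᵀL̄x, which equals the sum over edges of |a_ij|(x_i − sgn(a_ij) x_j)². For a connected graph this form vanishes for some nonzero x exactly when the graph is balanced.

```python
def signed_laplacian(adj):
    """L_bar = D_bar - A, where D_bar[i][i] is the sum of absolute edge weights at node i."""
    n = len(adj)
    abs_deg = [sum(abs(w) for w in row) for row in adj]
    return [[(abs_deg[i] if i == j else 0.0) - adj[i][j] for j in range(n)]
            for i in range(n)]


def quadratic_form(m, x):
    """Compute x^T M x."""
    n = len(x)
    return sum(x[i] * m[i][j] * x[j] for i in range(n) for j in range(n))


def edge_form(adj, x):
    """Sum over edges i < j of |a_ij| * (x_i - sgn(a_ij) * x_j)^2."""
    n = len(x)
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            w = adj[i][j]
            if w != 0:
                sign = 1.0 if w > 0 else -1.0
                total += abs(w) * (x[i] - sign * x[j]) ** 2
    return total
```

In the balanced triangle [[0, 1, −1], [1, 0, −1], [−1, −1, 0]] (one positive and two negative edges), x = (1, 1, −1) gives 0; in the all-negative triangle, which is unbalanced, the same x gives 4.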
A family of dissimilarity measures between nodes generalizing both the shortest-path and the commute-time distances
In Proceedings of the 14th SIGKDD International Conference on Knowledge Discovery and Data Mining
Cited by 25 (11 self)
This work introduces a new family of link-based dissimilarity measures between nodes of a weighted directed graph. This measure, called the randomized shortest-path (RSP) dissimilarity, depends on a parameter θ and has the interesting property of reducing, on one end, to the standard shortest-path distance when θ is large and, on the other end, to the commute-time (or resistance) distance when θ is small (near zero). Intuitively, it corresponds to the expected cost incurred by a random walker in order to reach a destination node from a starting node while maintaining a constant entropy (related to θ) spread in the graph. The parameter θ therefore gradually biases the simple random walk on the graph towards the shortest-path policy. By adopting a statistical physics approach and computing a sum over all the possible paths (discrete path integral), it is shown that the RSP dissimilarity from every node to a particular node of interest can be computed efficiently by solving two linear systems of n equations, where n is the number of nodes. On the other hand, the dissimilarity between every pair of nodes is obtained by inverting an n × n matrix. The proposed measure can be used for various graph mining tasks such as computing betweenness centrality, finding dense communities, etc., as shown in the experimental section.
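The single-target computation described above can be sketched as follows. Under the RSP model, a path ℘ to the target t gets weight π_ref(℘) exp(−θ c(℘)). Collecting these into W = P_ref ∘ exp(−θC), with the target made absorbing (its row of W set to zero), the path sums z = (I − W)⁻¹ e_t come from one linear system and the cost-weighted path sums y = (I − W)⁻¹ (C ∘ W) z from a second one; the expected cost from node i is y_i / z_i. The uniform reference walk (each node picks a neighbor with equal probability), the cost-matrix format with None for missing edges, and all function names are assumptions for illustration, not the paper's code.

```python
import math


def solve(m, b):
    """Solve m x = b by Gaussian elimination with partial pivoting."""
    n = len(m)
    a = [[float(v) for v in row] + [float(b[i])] for i, row in enumerate(m)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            a[r] = [v - f * w for v, w in zip(a[r], a[col])]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (a[i][n] - sum(a[i][j] * x[j] for j in range(i + 1, n))) / a[i][i]
    return x


def rsp_to_target(cost, target, theta):
    """Expected RSP cost from every node to `target`.
    cost[i][j] > 0 is the cost of edge i -> j, or None if there is no edge."""
    n = len(cost)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        if i == target:
            continue  # absorbing target: its row of W stays zero
        nbrs = [j for j in range(n) if cost[i][j] is not None]
        for j in nbrs:
            # uniform reference probability times the Boltzmann factor
            w[i][j] = math.exp(-theta * cost[i][j]) / len(nbrs)
    i_minus_w = [[(1.0 if i == j else 0.0) - w[i][j] for j in range(n)] for i in range(n)]
    e_t = [1.0 if i == target else 0.0 for i in range(n)]
    z = solve(i_minus_w, e_t)  # first linear system: path sums
    cw_z = [sum((cost[i][j] or 0.0) * w[i][j] * z[j] for j in range(n)) for i in range(n)]
    y = solve(i_minus_w, cw_z)  # second linear system: cost-weighted path sums
    return [y[i] / z[i] for i in range(n)]
```

With a large θ the result approaches the shortest-path cost. At θ = 0 it is the expected cost of the unbiased random walk until it first reaches the target, which for unit costs is the hitting time (4 steps from one end of a three-node path to the other).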
Semi-supervised Classification from Discriminative Random Walks
Cited by 17 (6 self)
This paper describes a novel technique, called D-walks, to tackle semi-supervised classification problems in large graphs. We introduce here a betweenness measure based on passage times during random walks of bounded lengths. Such walks are further constrained to start and end in nodes within the same class, defining a distinct betweenness for each class. Unlabeled nodes are classified according to the class showing the highest betweenness. Forward and backward recurrences are derived to efficiently compute the passage times. D-walks can deal with directed or undirected graphs with a linear time complexity with respect to the number of edges, the maximum walk length considered and the number of classes. Experiments on various real-life databases show that D-walks outperforms NetKit [5], the approach of Zhou and Schölkopf [15] and the regularized Laplacian kernel [2]. The benefit of D-walks is particularly noticeable when few labeled nodes are available. The computation time of D-walks is also substantially lower in all cases.
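The forward and backward recurrences can be illustrated with a deliberately simple sketch. It assumes that a walk for class y starts at a uniformly chosen node labeled y, stops the first time it re-enters a node labeled y, and has at most max_len steps. A forward table alpha holds the probability of being at a node at step t without touching class y in between; a backward table beta holds the probability of first reaching class y in exactly s more steps; the betweenness of an unlabeled node sums alpha·beta over all split points, normalized by the total probability of such walks. The normalization and all names are hypothetical, and this naive version is quadratic in max_len rather than linear as in the paper.

```python
def transition_matrix(adj):
    """Row-normalize a weighted adjacency matrix into random-walk probabilities."""
    return [[w / sum(row) for w in row] for row in adj]


def dwalk_betweenness(p, labeled, max_len):
    """Normalized expected passages through each unlabeled node during bounded walks
    that start in `labeled` and stop at the first return to `labeled`."""
    n = len(p)
    # alpha[t][q]: prob. of being at q at step t, with steps 1..t-1 outside `labeled`
    alpha = [[1.0 / len(labeled) if q in labeled else 0.0 for q in range(n)]]
    for t in range(1, max_len + 1):
        prev = alpha[-1]
        alpha.append([sum(prev[r] * p[r][q] for r in range(n) if t == 1 or r not in labeled)
                      for q in range(n)])
    # beta[s][q]: prob. of first entering `labeled` exactly s steps after leaving q
    beta = [None, [sum(p[q][r] for r in labeled) for q in range(n)]]
    for _ in range(2, max_len):
        prev = beta[-1]
        beta.append([sum(p[q][r] * prev[r] for r in range(n) if r not in labeled)
                     for q in range(n)])
    norm = sum(alpha[t][r] for t in range(1, max_len + 1) for r in labeled)
    scores = [0.0] * n
    for q in range(n):
        if q not in labeled:
            scores[q] = sum(alpha[t][q] * beta[s][q]
                            for t in range(1, max_len)
                            for s in range(1, max_len - t + 1)) / norm
    return scores


def classify(adj, labels, max_len):
    """Assign each unlabeled node to the class with the highest betweenness."""
    p = transition_matrix(adj)
    classes = sorted(set(labels.values()))
    scores = {c: dwalk_betweenness(p, {q for q, y in labels.items() if y == c}, max_len)
              for c in classes}
    return {q: max(classes, key=lambda c: scores[c][q])
            for q in range(len(adj)) if q not in labels}
```

On a five-node chain whose end nodes are labeled A and B, the node next to the A end is assigned A and the node next to the B end is assigned B.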
An experimental investigation of kernels on graphs for collaborative . . .
Neural Networks, 2012
A random-walk based scoring algorithm with application to recommender systems for large-scale e-commerce
Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2006
Cited by 8 (0 self)
Recommender systems are an emerging technology that helps consumers to find interesting products. A recommender system makes personalized product suggestions by extracting knowledge from users' previous interactions. In this paper, we present “ItemRank”, a random-walk based scoring algorithm, which can be used to rank products according to expected user preferences, in order to recommend top-ranked items to potentially interested users. We tested our algorithm on a standard database, the MovieLens data set, which contains data collected from a popular recommender system on movies, and we compared ItemRank with other state-of-the-art ranking techniques (in particular the algorithms described in [1, 2]). Our experiments show that ItemRank performs better than the other algorithms we compared to and, at the same time, is less complex than other proposed algorithms with respect to both memory usage and computational cost. The presentation of the method is accompanied by an analysis that helps to discover some intriguing properties of the MovieLens data set, which has been widely used as a benchmark for evaluating recently proposed approaches to recommender systems (e.g., [1, 3]).
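The ranking step can be sketched as a personalized PageRank over an item-correlation graph, which is how we read ItemRank: the correlation weight between two items counts the training users who rated both, each column is normalized to sum to 1, and scores are iterated as IR ← α C IR + (1 − α) d, where d is the target user's normalized preference vector. The column normalization, α = 0.85, the fixed iteration count, and the function name are assumptions rather than details given in the abstract.

```python
def itemrank(user_items, prefs, alpha=0.85, iterations=100):
    """Score items for one user.
    user_items: list of sets, the items rated by each training user.
    prefs: the target user's non-negative preference per item (not all zero)."""
    n = len(prefs)
    # co-occurrence counts: number of users who rated both item i and item j
    counts = [[0.0] * n for _ in range(n)]
    for items in user_items:
        for i in items:
            for j in items:
                if i != j:
                    counts[i][j] += 1.0
    # column-normalize; an item never co-rated with anything keeps a zero column
    col_sums = [sum(counts[i][j] for i in range(n)) for j in range(n)]
    c = [[counts[i][j] / col_sums[j] if col_sums[j] else 0.0 for j in range(n)]
         for i in range(n)]
    total = sum(prefs)
    d = [v / total for v in prefs]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        scores = [alpha * sum(c[i][j] * scores[j] for j in range(n)) + (1 - alpha) * d[i]
                  for i in range(n)]
    return scores
```

With two training users who rated items {0, 1} and {1, 2}, a user who liked only item 0 gets a higher score for item 1 than for item 2, and the scores sum to 1 because every column of the normalized matrix sums to 1.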
Algebraic Distance on Graphs
Cited by 6 (2 self)
Measuring the connection strength between a pair of vertices in a graph is one of the most vital concerns in many graph applications. Simple measures such as edge weights may not be sufficient for capturing the local connectivity. In this paper, we consider a neighborhood of each graph vertex and propagate a certain property value through direct neighbors. We present a measure of the connection strength (called the algebraic distance, see [21]) defined from an iterative process based on this consideration. The proposed measure is attractive in that the process is simple, linear, and easily parallelized. A rigorous analysis of the convergence property of the process confirms the underlying intuition that vertices are mutually reinforced and that the local neighborhoods play an important role in influencing the vertex connectivity. We demonstrate the practical effectiveness of the proposed measure through several combinatorial optimization problems on graphs and hypergraphs.
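The iterative process can be sketched with Jacobi over-relaxation on Lx = 0: starting from random vectors, each sweep blends every value with the weighted average of its neighbors, x ← (1 − ω)x + ωD⁻¹Ax. After a few sweeps, strongly connected vertices carry similar values, so their differences are small. The relaxation parameter ω = 0.5, the rescaling of each vector to [0, 1], taking the maximum difference over test vectors, and the function name are our assumptions; the precise definition is the one in [21]. The graph must have no isolated vertices.

```python
import random


def algebraic_distance(adj, num_vectors=5, sweeps=30, omega=0.5, seed=0):
    """Return an n x n matrix of algebraic-distance-style dissimilarities."""
    rng = random.Random(seed)
    n = len(adj)
    deg = [sum(row) for row in adj]
    vectors = []
    for _ in range(num_vectors):
        x = [rng.uniform(-0.5, 0.5) for _ in range(n)]
        for _ in range(sweeps):
            # Jacobi over-relaxation sweep for L x = 0 (every update reads the old x)
            x = [(1 - omega) * x[i]
                 + omega * sum(adj[i][j] * x[j] for j in range(n)) / deg[i]
                 for i in range(n)]
        lo, hi = min(x), max(x)
        vectors.append([(v - lo) / (hi - lo) for v in x])  # rescale to [0, 1]
    return [[max(abs(v[i] - v[j]) for v in vectors) for j in range(n)]
            for i in range(n)]
```

On two triangles joined by a single bridge edge, two vertices inside a triangle end up much closer than the two bridge endpoints, matching the intuition that local neighborhoods reinforce connectivity.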
Regularized Laplacian estimation and fast eigenvector approximation
In Advances in Neural Information Processing Systems 25: Proceedings of the 2011 Conference, 2011
Cited by 6 (3 self)
Recently, Mahoney and Orecchia demonstrated that popular diffusion-based procedures to compute a quick approximation to the first nontrivial eigenvector of a data graph Laplacian exactly solve certain regularized Semi-Definite Programs (SDPs). In this paper, we extend that result by providing a statistical interpretation of their approximation procedure. Our interpretation will be analogous to the manner in which ℓ2-regularized or ℓ1-regularized ℓ2-regression (often called Ridge regression and Lasso regression, respectively) can be interpreted in terms of a Gaussian prior or a Laplace prior, respectively, on the coefficient vector of the regression problem. Our framework will imply that the solutions to the Mahoney-Orecchia regularized SDP can be interpreted as regularized estimates of the pseudoinverse of the graph Laplacian. Conversely, it will imply that the solution to this regularized estimation problem can be computed very quickly by running, e.g., the fast diffusion-based PageRank procedure for computing an approximation to the first nontrivial eigenvector of the graph Laplacian. Empirical results are also provided to illustrate the manner in which approximate eigenvector computation implicitly performs statistical regularization, relative to running the corresponding exact algorithm.
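A worked identity shows why a PageRank-style diffusion yields a regularized Laplacian inverse. For an undirected graph with degree matrix D and Laplacian L = D − A, the personalized PageRank vector with teleport probability β satisfies pr = βs + (1 − β)AD⁻¹pr. Since I − (1 − β)AD⁻¹ = (βD + (1 − β)L)D⁻¹, this rearranges to pr = βD(βD + (1 − β)L)⁻¹s, a D-regularized inverse of L applied to the seed vector s. The sketch checks numerically that power iteration and the direct solve agree. It illustrates only this algebra, not the paper's statistical framework, and the function names are hypothetical.

```python
def solve(m, b):
    """Solve m x = b by Gaussian elimination with partial pivoting."""
    n = len(m)
    a = [[float(v) for v in row] + [float(b[i])] for i, row in enumerate(m)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            a[r] = [v - f * w for v, w in zip(a[r], a[col])]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (a[i][n] - sum(a[i][j] * x[j] for j in range(i + 1, n))) / a[i][i]
    return x


def personalized_pagerank(adj, seed, beta, iterations=500):
    """Power iteration for pr = beta * s + (1 - beta) * A D^{-1} pr."""
    n = len(adj)
    deg = [sum(row) for row in adj]
    pr = list(seed)
    for _ in range(iterations):
        pr = [beta * seed[i]
              + (1 - beta) * sum(adj[i][j] * pr[j] / deg[j] for j in range(n))
              for i in range(n)]
    return pr


def pagerank_via_regularized_laplacian(adj, seed, beta):
    """pr = beta * D (beta * D + (1 - beta) * L)^{-1} s, with L = D - A."""
    n = len(adj)
    deg = [sum(row) for row in adj]
    lap = [[(deg[i] if i == j else 0.0) - adj[i][j] for j in range(n)] for i in range(n)]
    m = [[beta * (deg[i] if i == j else 0.0) + (1 - beta) * lap[i][j] for j in range(n)]
         for i in range(n)]
    y = solve(m, seed)
    return [beta * deg[i] * y[i] for i in range(n)]
```

On a triangle with a pendant vertex, seeded at vertex 0 with β = 0.15, both functions return the same vector up to floating-point error, and its entries sum to 1.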