Random-walk computation of similarities between nodes of a graph, with application to collaborative recommendation
 IEEE Transactions on Knowledge and Data Engineering
, 2006
Cited by 128 (19 self)
Abstract: This work presents a new perspective on characterizing the similarity between elements of a database or, more generally, nodes of a weighted and undirected graph. It is based on a Markov-chain model of random walk through the database. More precisely, we compute quantities (the average commute time, the pseudoinverse of the Laplacian matrix of the graph, etc.) that provide similarities between any pair of nodes, having the nice property of increasing when the number of paths connecting those elements increases and when the “length” of paths decreases. It turns out that the square root of the average commute time is a Euclidean distance and that the pseudoinverse of the Laplacian matrix is a kernel matrix (its elements are inner products closely related to commute times). A principal component analysis (PCA) of the graph is introduced for computing the subspace projection of the node vectors in a manner that preserves as much variance as possible in terms of the Euclidean commute-time distance. This graph PCA provides a nice interpretation to the “Fiedler vector,” widely used for graph partitioning. The model is evaluated on a collaborative-recommendation task where suggestions are made about which movies people should watch based upon what they watched in the past. Experimental results on the MovieLens database show that the Laplacian-based similarities perform well in comparison with other methods. The model, which nicely fits into the so-called “statistical relational learning” framework, could also be used to compute document or word similarities, and, more generally, it could be applied to machine-learning and pattern-recognition tasks involving a relational database.
Index Terms: Graph analysis, graph and database mining, collaborative recommendation, graph kernels, spectral clustering, Fiedler vector, proximity measures, statistical relational learning.
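The quantities named in this abstract can be illustrated with a small NumPy sketch. It assumes the standard identity relating the average commute time to the Laplacian pseudoinverse, n(i, j) = V_G (l+_ii + l+_jj - 2 l+_ij), where V_G is the sum of all node degrees; the function name `commute_times` and the toy path graph are illustrative choices, not taken from the paper.

```python
import numpy as np

def commute_times(A):
    """Average commute times between all node pairs of a connected,
    weighted, undirected graph with symmetric adjacency matrix A."""
    A = np.asarray(A, dtype=float)
    degrees = A.sum(axis=1)
    L = np.diag(degrees) - A          # graph Laplacian L = D - A
    L_plus = np.linalg.pinv(L)        # Moore-Penrose pseudoinverse of L
    volume = degrees.sum()            # V_G: sum of all degrees
    d = np.diag(L_plus)
    # n(i, j) = V_G * (l+_ii + l+_jj - 2 * l+_ij)
    return volume * (d[:, None] + d[None, :] - 2.0 * L_plus)

# Toy example (hypothetical data): a path graph 0 - 1 - 2 with unit weights.
A = np.array([[0.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 0.0]])
C = commute_times(A)
ectd = np.sqrt(C)  # square root of commute time, the Euclidean distance mentioned above
```

On this path graph the commute time between the two end nodes is twice that between adjacent nodes, which matches the intuition that longer paths mean lower similarity.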
Graph nodes clustering with the sigmoid commute-time kernel: A . . .
 DATA & KNOWLEDGE ENGINEERING
, 2009
A Link-Analysis Extension of Correspondence Analysis for Mining Relational Databases
Cited by 4 (4 self)
Abstract: This work introduces a link-analysis procedure for discovering relationships in a relational database or a graph, generalizing both simple and multiple correspondence analysis. It is based on a random-walk model through the database defining a Markov chain having as many states as elements in the database. Suppose we are interested in analyzing the relationships between some elements (or records) contained in two different tables of the relational database. To this end, in a first step, a reduced, much smaller, Markov chain containing only the elements of interest and preserving the main characteristics of the initial chain, is extracted by stochastic complementation [41]. This reduced chain is then analyzed by projecting jointly the elements of interest in the diffusion-map subspace [42] and visualizing the results. This two-step procedure reduces to simple correspondence analysis when only two tables are defined and to multiple correspondence analysis when the database takes the form of a simple star schema. On the other hand, a kernel version of the diffusion-map distance, generalizing the basic diffusion-map distance to directed graphs, is also introduced and the links with spectral clustering are discussed. Several datasets are analyzed by using the proposed methodology, showing the usefulness of the technique for extracting relationships in relational databases or graphs.
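The first step described in this abstract, stochastic complementation, can be sketched as follows. This assumes the usual textbook definition: with the transition matrix partitioned into blocks for the states of interest (1) and the remaining states (2), the reduced chain is S = P11 + P12 (I - P22)^{-1} P21. The function name `stochastic_complement` and the three-state chain are hypothetical, and the sketch assumes I - P22 is invertible (for example, when the chain is irreducible).

```python
import numpy as np

def stochastic_complement(P, keep):
    """Reduce a row-stochastic matrix P to the states listed in `keep`."""
    P = np.asarray(P, dtype=float)
    keep = np.asarray(keep)
    drop = np.setdiff1d(np.arange(P.shape[0]), keep)
    P11 = P[np.ix_(keep, keep)]   # transitions among kept states
    P12 = P[np.ix_(keep, drop)]   # kept -> dropped
    P21 = P[np.ix_(drop, keep)]   # dropped -> kept
    P22 = P[np.ix_(drop, drop)]   # transitions among dropped states
    # S = P11 + P12 (I - P22)^{-1} P21, computed with a linear solve
    return P11 + P12 @ np.linalg.solve(np.eye(len(drop)) - P22, P21)

# Toy example (hypothetical data): uniform random walk on a triangle,
# keeping states 0 and 1 and censoring state 2.
P = np.array([[0.0, 0.5, 0.5],
              [0.5, 0.0, 0.5],
              [0.5, 0.5, 0.0]])
S = stochastic_complement(P, keep=[0, 1])
```

The rows of the reduced matrix still sum to one, so it defines a valid Markov chain on the kept states only.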
Constrained Text Clustering Using Word Trigrams
Abstract. In recent years there has emerged the field of Constrained Clustering, which proposes clustering algorithms that are able to accommodate domain information to obtain a better final grouping. This information is usually provided as pairwise constraints, whose acquisition from humans can be costly. In this paper we propose a novel method based on word n-grams to automatically extract positive constraints from text collections. Clustering experiments in text collections composed of different types of documents show that the constraints created with our method attain statistically significant improvements over the results obtained with constraints created using named entities and over the results of a high-performing non-constrained algorithm.
unknown title
, 2006
A novel way of computing similarities between nodes of a graph, with application to collaborative filtering and subspace projection of the graph nodes
Link-based Community Detection with the Commute-Time Kernel
, 2007
The main purpose of this work is to find communities in a weighted, undirected graph by using kernel-based clustering methods, directly partitioning the graph according to a well-defined similarity measure between the nodes (a kernel on a graph). The algorithm is based on a two-step procedure. First, the sigmoid commute-time kernel (KCT), providing a meaningful similarity measure between any couple of nodes, is computed from the adjacency matrix of the graph. Then, the nodes of the graph are clustered by performing a kernel clustering on this CT kernel matrix. For this purpose, simple, prototype-based, kernel versions of the k-means, the fuzzy k-means, the entropy-based fuzzy k-means, the Gaussian mixtures model, as well as Ward’s hierarchical clustering, are introduced. The joint use of the CT kernel matrix and kernel clustering appears to be quite effective. Indeed, this methodology provides good results, outperforming the spherical k-means and spectral clustering, on a document clustering problem involving the newsgroups database, where the set of documents is viewed as a graph. Finally, the links between the proposed hierarchical kernel clustering and spectral clustering are examined.
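The first step of the procedure described here might look roughly like the sketch below. The abstract does not give the formula for the sigmoid commute-time kernel, so the form used is an assumption: a logistic function applied elementwise to the Laplacian pseudoinverse, scaled by the standard deviation of its entries, with a free parameter `a`. The function name and the toy graph are hypothetical, and the clustering step is not shown.

```python
import numpy as np

def sigmoid_commute_time_kernel(A, a=1.0):
    """Assumed form: K_ij = 1 / (1 + exp(-a * l+_ij / sigma)), where l+_ij are
    the entries of the Laplacian pseudoinverse and sigma is their standard
    deviation. The parameter `a` and the scaling are assumptions of this sketch."""
    A = np.asarray(A, dtype=float)
    L = np.diag(A.sum(axis=1)) - A    # graph Laplacian L = D - A
    L_plus = np.linalg.pinv(L)        # commute-time kernel before the sigmoid
    sigma = L_plus.std()
    return 1.0 / (1.0 + np.exp(-a * L_plus / sigma))

# Toy example (hypothetical data): two triangles joined by the single edge 2 - 3.
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1.0
K = sigmoid_commute_time_kernel(A)
```

In this toy graph, pairs of nodes inside the same triangle get similarities above 0.5 and pairs in different triangles get similarities below 0.5, which is the kind of contrast a kernel clustering step would then exploit.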
unknown title
, 2006
A novel way of computing dissimilarities between nodes of a graph, with application to collaborative filtering and subspace projection of the graph nodes
Analyzing the Reduced Markov Chain with the Basic Diffusion Map
This work introduces a link-analysis procedure for discovering relationships in a relational database or a graph, generalizing both simple and multiple correspondence analysis. It is based on a random-walk model through the database defining a Markov chain having as many states as elements in the database. Suppose we are interested in analyzing the relationships between some elements (or records) contained in two different tables of the relational database. To this end, in a first step, a reduced, much smaller, Markov chain containing only the elements of interest and preserving the main characteristics of the initial chain is extracted by stochastic complementation. This reduced chain is then analyzed by projecting jointly the elements of interest in the diffusion-map subspace and visualizing the results. This two-step procedure reduces to simple correspondence analysis when only two tables are defined and to multiple correspondence analyses when the database takes the form of a simple star schema. On the other hand, a kernel version of the diffusion-map distance, generalizing the basic diffusion-map distance to directed graphs, is also introduced and the links with spectral clustering are discussed. Several datasets are analyzed by using the proposed methodology, showing the usefulness of the technique for extracting relationships in relational databases or graphs.
Analyze Condensed MARKOV Link Chain with KDM PCA
This work introduces a link-analysis procedure for discovering relationships in a relational database or a graph, generalizing both simple and multiple correspondence analysis. It is based on a random-walk model through the database defining a Markov chain having as many states as elements in the database. Suppose we are interested in analyzing the relationships between some elements (or records) contained in two different tables of the relational database. To this end, in a first step, a reduced, much smaller, Markov chain containing only the elements of interest and preserving the main characteristics of the initial chain is extracted by stochastic complementation. This reduced chain is then analyzed by projecting jointly the elements of interest in the diffusion-map subspace and visualizing the results. This two-step procedure reduces to simple correspondence analysis when only two tables are defined and to multiple correspondence analyses when the database takes the form of a simple star schema. On the other hand, a kernel version of the diffusion-map distance, generalizing the basic diffusion-map distance to directed graphs, is also introduced and the links with spectral clustering are discussed. Several datasets are analyzed by using the proposed methodology, showing the usefulness of the technique for extracting relationships in relational databases or graphs.