Results 1–10 of 64
Consistency of spectral clustering
, 2004
Abstract
Cited by 302 (15 self)
Consistency is a key property of statistical algorithms when the data is drawn from some underlying probability distribution. Surprisingly, despite decades of work, little is known about the consistency of most clustering algorithms. In this paper we investigate consistency of a popular family of spectral clustering algorithms, which cluster the data with the help of eigenvectors of graph Laplacian matrices. We show that one of the two major classes of spectral clustering (normalized clustering) converges under some very general conditions, while the other (unnormalized) is only consistent under strong additional assumptions, which, as we demonstrate, are not always satisfied in real data. We conclude that our analysis provides strong evidence for the superiority of normalized spectral clustering in practical applications. We believe that the methods used in our analysis will provide a basis for future exploration of Laplacian-based methods in a statistical setting.
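The normalized/unnormalized distinction the abstract draws can be made concrete. A minimal NumPy sketch follows (toy 4-node graph with hypothetical edge weights; `L_sym` is the symmetric normalized Laplacian, one of several normalized variants in the literature, not code from the paper):

```python
# Sketch: the two graph Laplacians compared in the consistency analysis,
# built from a toy affinity matrix. All data here is illustrative.
import numpy as np

# Symmetric affinity matrix W for a 4-node graph (hypothetical example).
W = np.array([[0., 1., 1., 0.],
              [1., 0., 1., 0.],
              [1., 1., 0., 1.],
              [0., 0., 1., 0.]])

deg = W.sum(axis=1)                    # node degrees
D = np.diag(deg)                       # degree matrix
L_unnorm = D - W                       # unnormalized graph Laplacian

D_inv_sqrt = np.diag(1.0 / np.sqrt(deg))
# Normalized (symmetric) Laplacian: I - D^{-1/2} W D^{-1/2}.
L_sym = np.eye(len(deg)) - D_inv_sqrt @ W @ D_inv_sqrt

# Both Laplacians are symmetric positive semi-definite;
# the smallest eigenvalue of each is 0 for a connected graph.
evals_unnorm = np.linalg.eigvalsh(L_unnorm)
evals_sym = np.linalg.eigvalsh(L_sym)
```

Spectral clustering then runs k-means on the rows of the matrix formed from the eigenvectors of the first few eigenvalues; the paper's point is that this step behaves consistently for `L_sym` under much weaker conditions than for `L_unnorm`.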
Random-walk computation of similarities between nodes of a graph, with application to collaborative recommendation
 IEEE Transactions on Knowledge and Data Engineering
, 2006
Abstract
Cited by 122 (18 self)
Abstract—This work presents a new perspective on characterizing the similarity between elements of a database or, more generally, nodes of a weighted and undirected graph. It is based on a Markov-chain model of random walk through the database. More precisely, we compute quantities (the average commute time, the pseudoinverse of the Laplacian matrix of the graph, etc.) that provide similarities between any pair of nodes, having the nice property of increasing when the number of paths connecting those elements increases and when the “length” of paths decreases. It turns out that the square root of the average commute time is a Euclidean distance and that the pseudoinverse of the Laplacian matrix is a kernel matrix (its elements are inner products closely related to commute times). A principal component analysis (PCA) of the graph is introduced for computing the subspace projection of the node vectors in a manner that preserves as much variance as possible in terms of the Euclidean commute-time distance. This graph PCA provides a nice interpretation to the “Fiedler vector,” widely used for graph partitioning. The model is evaluated on a collaborative-recommendation task where suggestions are made about which movies people should watch based upon what they watched in the past. Experimental results on the MovieLens database show that the Laplacian-based similarities perform well in comparison with other methods. The model, which nicely fits into the so-called “statistical relational learning” framework, could also be used to compute document or word similarities, and, more generally, it could be applied to machine-learning and pattern-recognition tasks involving a relational database. Index Terms—Graph analysis, graph and database mining, collaborative recommendation, graph kernels, spectral clustering, Fiedler vector, proximity measures, statistical relational learning.
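The central quantity here, the average commute time recovered from the Laplacian pseudoinverse, is easy to sketch. A toy illustration follows (hypothetical 4-node graph; `commute_time` and `ectd` are illustrative names, not the authors' code; the formula n(i,j) = vol(G)·(l⁺ᵢᵢ + l⁺ⱼⱼ − 2 l⁺ᵢⱼ) is the standard one for commute times):

```python
# Sketch: average commute time between graph nodes via the
# Moore-Penrose pseudoinverse of the graph Laplacian.
import numpy as np

# Adjacency of a small undirected graph (hypothetical data).
W = np.array([[0., 1., 0., 0.],
              [1., 0., 1., 1.],
              [0., 1., 0., 1.],
              [0., 1., 1., 0.]])
d = W.sum(axis=1)
L = np.diag(d) - W                 # graph Laplacian
Lp = np.linalg.pinv(L)             # pseudoinverse of L (a kernel matrix)
vol = d.sum()                      # volume of the graph (sum of degrees)

def commute_time(i, j):
    """Average commute time n(i,j) = vol * (l+_ii + l+_jj - 2 l+_ij)."""
    return vol * (Lp[i, i] + Lp[j, j] - 2 * Lp[i, j])

def ectd(i, j):
    """Euclidean commute-time distance: the square root of n(i,j)."""
    return np.sqrt(commute_time(i, j))
```

Because the square root of the commute time is a Euclidean distance, `ectd` is symmetric and satisfies the triangle inequality, which is the property the recommendation experiments rely on.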
The Principal Components Analysis of a Graph, and its Relationships to Spectral Clustering
 Proceedings of the 15th European Conference on Machine Learning (ECML 2004). Lecture Notes in Artificial Intelligence
, 2004
Abstract
Cited by 69 (18 self)
This work presents a novel procedure for computing (1) distances between nodes of a weighted, undirected, graph, called the Euclidean Commute Time Distance (ECTD), and (2) a subspace projection of the nodes of the graph that preserves as much variance as possible, in terms of the ECTD, a principal components analysis of the graph. It is based on a Markov-chain model of random walk through the graph. The model assigns transition probabilities to the links between nodes, so that a random walker can jump from node to node. A quantity, called the average commute time, computes the average time taken by a random walker for reaching node j when starting from node i, and coming back to node i. The square root of this quantity, the ECTD, is a distance measure between any two nodes, and has the nice property of decreasing when the number of paths connecting two nodes increases and when the "length" of any path decreases. The ECTD can be computed from the pseudoinverse of the Laplacian matrix of the graph, which is a kernel. We finally define the Principal Components Analysis (PCA) of a graph as the subspace projection that preserves as much variance as possible, in terms of the ECTD. This graph PCA has some interesting links with spectral graph theory, in particular spectral clustering.
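The graph-PCA construction sketched in this abstract amounts to embedding nodes so that squared Euclidean distances reproduce the ECTD, then truncating dimensions. A minimal sketch, assuming the standard eigendecomposition route through the Laplacian pseudoinverse (toy graph, illustrative variable names):

```python
# Sketch: ECTD-preserving node coordinates from the eigendecomposition
# of the Laplacian pseudoinverse; graph PCA keeps the leading columns.
import numpy as np

# Hypothetical 4-node weighted graph.
W = np.array([[0., 1., 1., 0.],
              [1., 0., 1., 0.],
              [1., 1., 0., 1.],
              [0., 0., 1., 0.]])
d = W.sum(axis=1)
L = np.diag(d) - W
Lp = np.linalg.pinv(L)                 # kernel matrix L+

# Eigendecomposition of L+, sorted by decreasing eigenvalue (variance).
evals, evecs = np.linalg.eigh(Lp)
order = np.argsort(evals)[::-1]
evals, evecs = evals[order], evecs[:, order]

# Node coordinates: rows of U * sqrt(Lambda). Squared distances between
# rows equal l+_ii + l+_jj - 2 l+_ij (the ECTD up to the factor vol(G)).
X = evecs * np.sqrt(np.clip(evals, 0.0, None))

# Graph PCA = projecting onto the first k columns of X.
sq_dist = ((X[0] - X[1]) ** 2).sum()
target = Lp[0, 0] + Lp[1, 1] - 2 * Lp[0, 1]
```

Keeping only the top eigenvector of `Lp` (equivalently, the Fiedler direction of `L`) recovers the link to spectral clustering mentioned at the end of the abstract.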
Metrics on state spaces
 Doc. Math
, 1999
Abstract
Cited by 39 (4 self)
This article is dedicated to Richard V. Kadison in anticipation of his completing his seventy-fifth circumnavigation of the sun. Abstract. In contrast to the usual Lipschitz seminorms associated to ordinary metrics on compact spaces, we show by examples that Lipschitz seminorms on possibly noncommutative compact spaces are usually not determined by the restriction of the metric they define on the state space, to the extreme points of the state space. We characterize the Lipschitz norms which are determined by their metric on the whole state space as being those which are lower semicontinuous. We show that their domain of Lipschitz elements can be enlarged so as to form a dual Banach space, which generalizes the situation for ordinary Lipschitz seminorms. We give a characterization of the metrics on state spaces which come from Lipschitz seminorms. The natural (broader) setting for these results is provided by the “function spaces” of Kadison. A variety of methods for constructing Lipschitz seminorms is indicated. In noncommutative geometry (based on C*-algebras), the natural way to specify a metric is by means of a suitable “Lipschitz seminorm”. This idea was first suggested by Connes [C1] and developed further in [C2, C3]. Connes pointed out [C1, C2] that from a Lipschitz seminorm one obtains in a simple way an ordinary metric on the state space of the C*-algebra. This metric generalizes the Monge–Kantorovich metric on probability measures [KA, Ra, RR]. In this article we make more precise the relationship between metrics on the state space and Lipschitz seminorms. Let ρ be an ordinary metric on a compact space X. The Lipschitz seminorm, Lρ, determined by ρ is defined on functions f on X by (0.1) Lρ(f) = sup{ |f(x) − f(y)| / ρ(x, y) : x ≠ y }.
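For orientation, the construction the abstract alludes to runs in the opposite direction as well: a Lipschitz seminorm L on a C*-algebra A induces a metric on the state space S(A). This is textbook material (Connes's metric), not specific to this article:

```latex
% Metric on the state space S(A) induced by a Lipschitz seminorm L:
\rho_L(\mu, \nu) \;=\; \sup \bigl\{\, |\mu(a) - \nu(a)| \;:\; a \in A,\ L(a) \le 1 \,\bigr\}.
% In the commutative case A = C(X) with L = L_\rho as in (0.1), this
% recovers the Monge--Kantorovich metric on probability measures on X.
```

The article's question is then which metrics on S(A) arise this way, and when L can be recovered from the metric it induces.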
The Slashdot zoo: Mining a social network with negative edges
 In WWW
, 2009
Electricity based external similarity of categorical attributes
 In PAKDD 2003
, 2003
Abstract
Cited by 32 (10 self)
Abstract. Similarity or distance measures are fundamental and critical properties for data mining tools. Categorical attributes abound in databases. The Car Make, Gender, Occupation, etc. fields in an automobile insurance database are very informative. Sadly, categorical data is not easily amenable to similarity computations. A domain expert might manually specify some or all of the similarity relationships, but this is error-prone and not feasible for attributes with large domains, nor is it useful for cross-attribute similarities, such as between Gender and Occupation. External similarity functions define a similarity between, say, Car Makes by looking at how they co-occur with the other categorical attributes. We exploit a rich duality between random walks on graphs and electrical circuits to develop REP, an external similarity function. REP is theoretically grounded while the only prior work was ad hoc. The usefulness of REP is shown in two experiments. First, we cluster categorical attribute values showing improved inferred relationships. Second, we use REP effectively as a nearest neighbour classifier.
Prediction on a graph with the perceptron
 in Neural Information Processing Systems
, 2006
Abstract
Cited by 25 (6 self)
We study the problem of online prediction of a noisy labeling of a graph with the perceptron. We address both label noise and concept noise. Graph learning is framed as an instance of prediction on a finite set. To treat label noise we show that the hinge loss bounds derived by Gentile [1] for online perceptron learning can be transformed to relative mistake bounds with an optimal leading constant when applied to prediction on a finite set. These bounds depend crucially on the norm of the learned concept. Often the norm of a concept can vary dramatically with only small perturbations in a labeling. We analyze a simple transformation that stabilizes the norm under perturbations. We derive an upper bound that depends only on natural properties of the graph – the graph diameter and the cut size of a partitioning of the graph – which are only indirectly dependent on the size of the graph. The impossibility of such bounds for the graph geodesic nearest neighbors algorithm will be demonstrated.
Clustering Using a Random Walk Based Distance Measure
, 2005
Abstract
Cited by 24 (3 self)
This work proposes a simple way to improve a clustering algorithm.
A Novel Way of Computing Dissimilarities between Nodes of a Graph, with Application to Collaborative Filtering
, 2004
Abstract
Cited by 18 (0 self)
This work presents some general procedures for computing dissimilarities between elements of a database or, more generally, nodes of a weighted, undirected, graph. It is based on a Markov-chain model of random walk through the database. The model assigns transition probabilities to the links between elements, so that a random walker can jump from element to element. A quantity, called the average first-passage cost, computes the average cost incurred by a random walker for reaching element k for the first time when starting from element i.
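The average first-passage quantity described here can be computed by solving a small linear system over the non-target nodes. A sketch follows, assuming unit transition costs, so the quantity reduces to the average first-passage time (toy graph, hypothetical data; `first_passage_times` is an illustrative name, not the authors' code):

```python
# Sketch: average first-passage times of a random walk on a graph,
# via the absorbing-chain linear system (I - Q) m = 1.
import numpy as np

# Adjacency of a small weighted, undirected graph (hypothetical data).
W = np.array([[0., 1., 1., 0.],
              [1., 0., 1., 0.],
              [1., 1., 0., 1.],
              [0., 0., 1., 0.]])
P = W / W.sum(axis=1, keepdims=True)   # random-walk transition matrix

def first_passage_times(k):
    """Average number of steps m(k|i) to reach node k for the first time
    from each start node i; m(k|k) = 0 by convention."""
    n = P.shape[0]
    others = [i for i in range(n) if i != k]
    Q = P[np.ix_(others, others)]      # transitions among non-target nodes
    m = np.linalg.solve(np.eye(n - 1) - Q, np.ones(n - 1))
    out = np.zeros(n)
    out[others] = m
    return out

m3 = first_passage_times(3)            # expected steps to reach node 3
```

With per-edge costs instead of the constant 1 on the right-hand side, the same system yields the average first-passage cost of the abstract; summing m(j|i) and m(i|j) gives the average commute time used by the related papers above.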
Online Prediction on Large Diameter Graphs
Abstract
Cited by 16 (2 self)
We continue our study of online prediction of the labelling of a graph. We show a fundamental limitation of Laplacian-based algorithms: if the graph has a large diameter then the number of mistakes made by such algorithms may be proportional to the square root of the number of vertices, even when tackling simple problems. We overcome this drawback by means of an efficient algorithm which achieves a logarithmic mistake bound. It is based on the notion of a spine, a path graph which provides a linear embedding of the original graph. In practice, graphs may exhibit cluster structure; thus in the last part, we present a modified algorithm which achieves the “best of both worlds”: it performs well locally in the presence of cluster structure, and globally on large diameter graphs.