Results 1–10 of 14
Random-walk computation of similarities between nodes of a graph, with application to collaborative recommendation
IEEE Transactions on Knowledge and Data Engineering, 2006
"... Abstract—This work presents a new perspective on characterizing the similarity between elements of a database or, more generally, nodes of a weighted and undirected graph. It is based on a Markovchain model of random walk through the database. More precisely, we compute quantities (the average comm ..."
Abstract

Cited by 116 (14 self)
 Add to MetaCart
Abstract—This work presents a new perspective on characterizing the similarity between elements of a database or, more generally, nodes of a weighted and undirected graph. It is based on a Markov-chain model of random walk through the database. More precisely, we compute quantities (the average commute time, the pseudoinverse of the Laplacian matrix of the graph, etc.) that provide similarities between any pair of nodes, having the nice property of increasing when the number of paths connecting those elements increases and when the “length” of paths decreases. It turns out that the square root of the average commute time is a Euclidean distance and that the pseudoinverse of the Laplacian matrix is a kernel matrix (its elements are inner products closely related to commute times). A principal component analysis (PCA) of the graph is introduced for computing the subspace projection of the node vectors in a manner that preserves as much variance as possible in terms of the Euclidean commute-time distance. This graph PCA provides a nice interpretation to the “Fiedler vector,” widely used for graph partitioning. The model is evaluated on a collaborative-recommendation task where suggestions are made about which movies people should watch based upon what they watched in the past. Experimental results on the MovieLens database show that the Laplacian-based similarities perform well in comparison with other methods. The model, which nicely fits into the so-called “statistical relational learning” framework, could also be used to compute document or word similarities, and, more generally, it could be applied to machine-learning and pattern-recognition tasks involving a relational database. Index Terms—Graph analysis, graph and database mining, collaborative recommendation, graph kernels, spectral clustering, Fiedler vector, proximity measures, statistical relational learning.
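The commute-time quantities this abstract describes can be sketched in a few lines of NumPy; the 4-node weighted graph below is made-up illustrative data, not anything from the paper.

```python
# Minimal sketch of average commute time via the Laplacian pseudoinverse.
# Graph and weights are invented example data.
import numpy as np

# Weighted, undirected adjacency matrix of a small example graph.
A = np.array([
    [0.0, 1.0, 1.0, 0.0],
    [1.0, 0.0, 1.0, 0.0],
    [1.0, 1.0, 0.0, 1.0],
    [0.0, 0.0, 1.0, 0.0],
])

D = np.diag(A.sum(axis=1))      # degree matrix
L = D - A                       # graph Laplacian
L_plus = np.linalg.pinv(L)      # Moore-Penrose pseudoinverse (a kernel matrix)

volume = A.sum()                # V_G: total weight of the graph

# Average commute time: n(i, j) = V_G * (l+_ii + l+_jj - 2 l+_ij).
diag = np.diag(L_plus)
commute = volume * (diag[:, None] + diag[None, :] - 2 * L_plus)

# Its square root is a Euclidean distance between nodes.
ect_distance = np.sqrt(commute)
print(np.round(commute, 3))
```

As the abstract states, `commute` grows when fewer or longer paths connect two nodes: here node 3, attached by a single edge, is farther from node 0 than node 1 is.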
Diffusion maps and coarse-graining: A unified framework for dimensionality reduction, graph partitioning and data set parameterization
IEEE Transactions on Pattern Analysis and Machine Intelligence, 2006
"... We provide evidence that nonlinear dimensionality reduction, clustering and data set parameterization can be solved within one and the same framework. The main idea is to define a system of coordinates with an explicit metric that reflects the connectivity of a given data set and that is robust to ..."
Abstract

Cited by 96 (5 self)
 Add to MetaCart
We provide evidence that nonlinear dimensionality reduction, clustering and data set parameterization can be solved within one and the same framework. The main idea is to define a system of coordinates with an explicit metric that reflects the connectivity of a given data set and that is robust to noise. Our construction, which is based on a Markov random walk on the data, offers a general scheme of simultaneously reorganizing and subsampling graphs and arbitrarily shaped data sets in high dimensions using intrinsic geometry. We show that clustering in embedding spaces is equivalent to compressing operators. The objective of data partitioning and clustering is to coarse-grain the random walk on the data while at the same time preserving a diffusion operator for the intrinsic geometry or connectivity of the data set up to some accuracy. We show that the quantization distortion in diffusion space bounds the error of compression of the operator, thus giving a rigorous justification for k-means clustering in diffusion space and a precise measure of the performance of general clustering algorithms.
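A minimal sketch of the diffusion-map coordinates underlying this framework; the toy data set and the parameter choices (`sigma`, `t`, `n_dims`) are illustrative assumptions, not values from the paper.

```python
# Sketch of a diffusion-map embedding from a Markov random walk on data.
# Data and parameters are invented for illustration.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 2))                     # toy data set

# Gaussian affinity matrix reflecting connectivity of the data.
sigma = 1.0
sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
W = np.exp(-sq / (2 * sigma ** 2))

d = W.sum(axis=1)
# Symmetric conjugate of the random-walk matrix P = D^-1 W:
# S = D^-1/2 W D^-1/2 shares its eigenvalues with P.
S = W / np.sqrt(d[:, None] * d[None, :])
vals, vecs = np.linalg.eigh(S)
order = np.argsort(vals)[::-1]
vals, vecs = vals[order], vecs[:, order]

# Right eigenvectors of P; psi_0 is constant and is dropped.
psi = vecs / np.sqrt(d)[:, None]

t, n_dims = 2, 2                                 # diffusion time, embedding dim
embedding = (vals[1:n_dims + 1] ** t) * psi[:, 1:n_dims + 1]
print(embedding.shape)
```

Running k-means on `embedding` rather than on `X` is the "k-means clustering in diffusion space" that the abstract justifies.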
Commute Times for a Directed Graph using an Asymmetric Laplacian
"... The expected commute times for a strongly connected directed graph are related to an asymmetric Laplacian matrix as a direct extension to similar well known formulas for undirected graphs. We show the close relationships between the asymmetric Laplacian and the socalled Fundamental matrix. We give ..."
Abstract

Cited by 6 (3 self)
 Add to MetaCart
The expected commute times for a strongly connected directed graph are related to an asymmetric Laplacian matrix as a direct extension of similar well-known formulas for undirected graphs. We show the close relationships between the asymmetric Laplacian and the so-called Fundamental matrix. We give bounds for the commute times in terms of the stationary probabilities for a random walk over the graph together with the asymmetric Laplacian, and show how this can be approximated by a symmetrized Laplacian derived from a related weighted undirected graph.
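One standard route to directed-graph commute times goes through the Fundamental matrix the abstract mentions, Z = (I − P + 1πᵀ)⁻¹; this is a generic sketch of that route, not the paper's own derivation, and the 3-node chain is made-up example data.

```python
# Sketch: commute times of a strongly connected directed graph via the
# Fundamental matrix. The transition matrix P is invented example data.
import numpy as np

# Row-stochastic transition matrix of a strongly connected directed graph.
P = np.array([
    [0.0, 1.0, 0.0],
    [0.5, 0.0, 0.5],
    [1.0, 0.0, 0.0],
])
n = P.shape[0]

# Stationary distribution: left eigenvector of P with eigenvalue 1.
vals, vecs = np.linalg.eig(P.T)
pi = np.real(vecs[:, np.argmin(np.abs(vals - 1.0))])
pi = pi / pi.sum()

# Fundamental matrix Z = (I - P + 1 pi^T)^{-1}.
Z = np.linalg.inv(np.eye(n) - P + np.outer(np.ones(n), pi))

# Expected hitting time H[i, j] = (Z[j, j] - Z[i, j]) / pi[j].
H = (np.diag(Z)[None, :] - Z) / pi[None, :]

# Commute time is the symmetrized sum of hitting times.
C = H + H.T
print(np.round(C, 3))
```

Note that `H` itself is asymmetric, as expected for a directed walk, while `C` is symmetric by construction.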
Spectral connectivity analysis
Journal of the American Statistical Association, 2010
"... Spectral kernel methods are techniques for transforming data into a coordinate system that efficiently reveals the geometric structure — in particular, the “connectivity”—of the data. These methods depend on certain tuning parameters. We analyze the dependence of the method on these tuning parameter ..."
Abstract

Cited by 3 (1 self)
 Add to MetaCart
Spectral kernel methods are techniques for transforming data into a coordinate system that efficiently reveals the geometric structure — in particular, the “connectivity” — of the data. These methods depend on certain tuning parameters, and we analyze the dependence of the method on these parameters. We focus on one particular technique—diffusion maps—but our analysis can be used for other methods as well. We identify the population quantities implicitly being estimated, we explain how these methods relate to classical kernel smoothing, and we define an appropriate risk function for analyzing the estimators. We also show that, in some cases, fast rates of convergence are possible even in high dimensions. Key Words: graph Laplacian, kernels, manifold learning, spectral clustering, smoothing, diffusion maps
Combinatorial and algebraic tools for optimal multilevel algorithms, 2007
"... This dissertation presents combinatorial and algebraic tools that enable the design of the first linear work parallel iterative algorithm for solving linear systems involving Laplacian matrices of planar graphs. The major departure of this work from prior suboptimal and inherently sequential approac ..."
Abstract

Cited by 3 (0 self)
 Add to MetaCart
This dissertation presents combinatorial and algebraic tools that enable the design of the first linear work parallel iterative algorithm for solving linear systems involving Laplacian matrices of planar graphs. The major departure of this work from prior suboptimal and inherently sequential approaches is centered around: (i) the partitioning of planar graphs into fixed-size pieces that share small boundaries, by means of a local “bottom-up” approach that improves the customary “top-down” approach of recursive bisection, (ii) the replacement of monolithic global preconditioners by graph approximations that are built as aggregates of miniature preconditioners. In addition, we present extensions to the theory and analysis of Steiner tree preconditioners. We construct more general Steiner graphs that lead to natural linear-time solvers for classes of graphs that are known a priori to have certain structural properties. We also present a graph-theoretic approach to classical algebraic multigrid algorithms. We show that their design can be
Neighbor Search with Global Geometry: A Minimax Message Passing Algorithm
"... Neighbor search is a fundamental task in machine learning, especially in classification and retrieval. Efficient nearest neighbor search methods have been widely studied, with their emphasis on data structures but most of them did not consider the underlying global geometry of a data set. Recent gra ..."
Abstract

Cited by 2 (0 self)
 Add to MetaCart
Neighbor search is a fundamental task in machine learning, especially in classification and retrieval. Efficient nearest neighbor search methods have been widely studied, with an emphasis on data structures, but most of them do not consider the underlying global geometry of a data set. Recent graph-based semi-supervised learning methods capture the global geometry, but suffer from scalability and parameter-tuning problems. In this paper we present a (nearest) neighbor search method where the underlying global geometry is incorporated and parameter tuning is not required. To this end, we introduce deterministic walks as a deterministic counterpart of Markov random walks, leading us to use the minimax distance as a global dissimilarity measure. We then develop a message passing algorithm for efficient minimax distance calculation, which scales linearly in both time and space. An empirical study reveals the useful behavior of the method in image retrieval and semi-supervised learning.
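The minimax distance the abstract uses is the smallest achievable maximum edge weight over all paths between two nodes. This is not the paper's message-passing algorithm, but a simple Kruskal-style sketch that computes the same quantity: processing edges in increasing weight order, each component merge fixes the minimax distance of every cross pair. The small weighted graph is invented example data.

```python
# Sketch: all-pairs minimax (bottleneck) distances by merging components
# over edges sorted by weight. Graph data is invented for illustration.
import itertools
import numpy as np

n = 5
# (u, v, weight) edges of a small undirected graph.
edges = [(0, 1, 1.0), (1, 2, 2.0), (2, 3, 5.0), (3, 4, 1.5), (0, 2, 4.0)]

minimax = np.full((n, n), np.inf)
np.fill_diagonal(minimax, 0.0)

# Process edges in increasing weight; when two components merge, the
# current weight is the minimax distance for every cross pair.
components = [{i} for i in range(n)]
comp_of = list(range(n))
for u, v, w in sorted(edges, key=lambda e: e[2]):
    cu, cv = comp_of[u], comp_of[v]
    if cu == cv:
        continue
    for a, b in itertools.product(components[cu], components[cv]):
        minimax[a, b] = minimax[b, a] = w
    components[cu] |= components[cv]
    for node in components[cv]:
        comp_of[node] = cu
    components[cv] = set()

print(minimax)
```

For instance, nodes 0 and 2 are joined directly by an edge of weight 4, but the path 0–1–2 has maximum edge weight 2, so their minimax distance is 2.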
A linear work, O(n^1/6) time, parallel algorithm for solving planar Laplacians
"... We present a linear work parallel iterative algorithm for solving linear systems involving Laplacians of planar graphs. In particular, if Ax = b, where A is the Laplacian of any planar graph with n nodes, the algorithm produces a vector ¯x such that x − ¯xA ≤ ɛ, in O(n 1/6+c log(1/ɛ)) parallel t ..."
Abstract

Cited by 1 (0 self)
 Add to MetaCart
We present a linear work parallel iterative algorithm for solving linear systems involving Laplacians of planar graphs. In particular, if Ax = b, where A is the Laplacian of any planar graph with n nodes, the algorithm produces a vector x̄ such that ‖x − x̄‖_A ≤ ε, in O(n^(1/6+c) log(1/ε)) parallel time, doing O(n log(1/ε)) work, where c is any positive constant. One of the key ingredients of the solver is an O(nk log² k) work, O(k log n) time, parallel algorithm for decomposing any embedded planar graph into components of size O(k) that are delimited by O(n/√k) boundary edges. The result also applies to symmetric diagonally dominant matrices of planar structure.
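For context only, here is a purely sequential conjugate-gradient sketch of the problem being solved, Lx = b for a planar (grid) Laplacian; it has none of the parallel or work-optimal structure of the algorithm above, and the 4×4 grid is invented example data.

```python
# Sequential CG sketch for a consistent singular system L x = b,
# where L is the Laplacian of a small grid graph (invented data).
import numpy as np

side = 4
n = side * side
L = np.zeros((n, n))
for r in range(side):
    for c in range(side):
        i = r * side + c
        for dr, dc in ((0, 1), (1, 0)):          # right and down grid edges
            rr, cc = r + dr, c + dc
            if rr < side and cc < side:
                j = rr * side + cc
                L[i, i] += 1; L[j, j] += 1
                L[i, j] -= 1; L[j, i] -= 1

b = np.zeros(n)
b[0], b[-1] = 1.0, -1.0        # b must sum to zero: L is singular

# Plain conjugate gradient; b lies in range(L), so CG converges.
x = np.zeros(n)
r_vec = b - L @ x
p = r_vec.copy()
rs = r_vec @ r_vec
for _ in range(200):
    Lp = L @ p
    alpha = rs / (p @ Lp)
    x += alpha * p
    r_vec -= alpha * Lp
    rs_new = r_vec @ r_vec
    if np.sqrt(rs_new) < 1e-10:
        break
    p = r_vec + (rs_new / rs) * p
    rs = rs_new

print(np.linalg.norm(L @ x - b))
```

The whole point of the paper is to replace this kind of sequential iteration with a preconditioned solver that runs in O(n^(1/6+c) log(1/ε)) parallel time with linear work.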
Graphs and Networks, Lecture 13: Effective Resistance, 2010
"... We introduce the effective resistance between two vertices in a graph. It is a very useful measure of how close two vertices are. I becomes lower as one adds paths between vertices. Effective resistance proves very useful in many applications. Some cool theoretical papers that use it include [Tet91, ..."
Abstract
 Add to MetaCart
We introduce the effective resistance between two vertices in a graph. It is a very useful measure of how close two vertices are. It becomes lower as one adds paths between vertices. Effective resistance proves very useful in many applications. Some cool theoretical papers that use it include [Tet91, CRR+96, KR93, GBS08, SS10]. Notable papers that use it in Machine Learning include [FPS05, YFD+07]. Other treatments may be found in the books by Doyle and Snell [DS84] and Bollobas [Bol98].

13.2 A Quick Review

I want to quickly review the notation from last lecture. A potential/voltage on the vertices of a graph is denoted v. A flow on the edges of a graph is denoted i. The external flow at vertices is denoted i_ext. The signed edge-vertex adjacency matrix is U, the diagonal matrix of edge weights is W, and the diagonal matrix of edge resistances is R. These satisfy the relations

L = Uᵀ W U,  W = R⁻¹,  i = W U v,  i_ext = Uᵀ i,  i_ext = L v.

At the end of last class, I mentioned that the voltages v can be found from i_ext by multiplying i_ext by the pseudoinverse of L, denoted L⁺. Let me quickly say a few words about the pseudoinverse of a symmetric matrix. For a symmetric matrix L, let Π denote the symmetric projection matrix onto the span of L. For a connected graph, this Π just projects a vector orthogonal to 1.

Definition 13.2.1. The pseudoinverse of L, written L⁺, satisfies
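The relations just reviewed give a direct way to compute effective resistance with the pseudoinverse L⁺: inject one unit of external current at a and extract it at b, solve v = L⁺ i_ext, and read off R_eff(a, b) = v[a] − v[b]. A minimal sketch, using a unit-weight triangle as made-up example data:

```python
# Sketch: effective resistance from the Laplacian pseudoinverse,
# following v = L+ i_ext. The triangle graph is invented example data.
import numpy as np

# Adjacency of a triangle with unit edge weights (resistance 1 per edge).
A = np.array([
    [0.0, 1.0, 1.0],
    [1.0, 0.0, 1.0],
    [1.0, 1.0, 0.0],
])
L = np.diag(A.sum(axis=1)) - A
L_plus = np.linalg.pinv(L)

a, b = 0, 1
i_ext = np.zeros(3)
i_ext[a], i_ext[b] = 1.0, -1.0      # unit current in at a, out at b

v = L_plus @ i_ext                  # voltages, from i_ext = L v
r_eff = v[a] - v[b]
print(r_eff)                        # 2/3 for the unit triangle
```

The value 2/3 matches the circuit calculation: the direct 1 Ω edge in parallel with the 2 Ω series path through the third vertex.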
Exploiting Non-Linear Structure in Astronomical Data for Improved Statistical Inference
"... Abstract Many estimation problems in astrophysics are highly complex, with highdimensional, nonstandard data objects (e.g., images, spectra, entire distributions, etc.) that are not amenable to formal statistical analysis. To utilize such data and make accurate inferences, it is crucial to transfo ..."
Abstract
 Add to MetaCart
Many estimation problems in astrophysics are highly complex, with high-dimensional, non-standard data objects (e.g., images, spectra, entire distributions, etc.) that are not amenable to formal statistical analysis. To utilize such data and make accurate inferences, it is crucial to transform the data into a simpler, reduced form. Spectral kernel methods are nonlinear data transformation methods that efficiently reveal the underlying geometry of observable data. Here we focus on one particular technique: diffusion maps, or more generally spectral connectivity analysis (SCA). We give examples of applications in astronomy, e.g., photometric redshift estimation, prototype selection for estimation of star formation history, and supernova light curve classification. We outline some computational and statistical challenges that remain, and we discuss some promising future directions for astronomy and data mining.
Asymmetric Laplacian
"... The expected commute times for a strongly connected directed graph are related to an asymmetric Laplacian matrix as a direct extension to similar well known formulas for undirected graphs. We show the close relationships between the asymmetric Laplacian and the socalled Fundamental matrix. We give ..."
Abstract
 Add to MetaCart
The expected commute times for a strongly connected directed graph are related to an asymmetric Laplacian matrix as a direct extension of similar well-known formulas for undirected graphs. We show the close relationships between the asymmetric Laplacian and the so-called Fundamental matrix. We give bounds for the commute times in terms of the stationary probabilities for a random walk over the graph together with the asymmetric Laplacian, and show how this can be approximated by a symmetrized Laplacian derived from a related weighted undirected graph.