Results 1 - 10 of 106
Consistency of spectral clustering
, 2004
Abstract

Cited by 289 (15 self)
Consistency is a key property of statistical algorithms when the data is drawn from some underlying probability distribution. Surprisingly, despite decades of work, little is known about the consistency of most clustering algorithms. In this paper we investigate the consistency of a popular family of spectral clustering algorithms, which cluster the data with the help of eigenvectors of graph Laplacian matrices. We show that one of the two major classes of spectral clustering (normalized clustering) converges under some very general conditions, while the other (unnormalized) is consistent only under strong additional assumptions, which, as we demonstrate, are not always satisfied in real data. We conclude that our analysis provides strong evidence for the superiority of normalized spectral clustering in practical applications. We believe that the methods used in our analysis will provide a basis for future exploration of Laplacian-based methods in a statistical setting.
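The two classes differ in which Laplacian they take eigenvectors of. The sketch below builds both matrices from a symmetric weight matrix W with degree matrix D. The unnormalized Laplacian is L = D - W. For the normalized case it uses L_sym = I - D^{-1/2} W D^{-1/2}, which is one common choice and is assumed here. The function name and toy graph are hypothetical, and this is not the paper's code. It also assumes no isolated vertices.

```python
import math


def laplacians(w):
    """Return (L, L_sym) for a symmetric, non-negative weight matrix w.

    L = D - W is the unnormalized Laplacian; L_sym = I - D^{-1/2} W D^{-1/2}
    is one common normalized Laplacian. Assumes every vertex has positive
    degree (no isolated vertices), otherwise 1 / sqrt(d) is undefined.
    """
    n = len(w)
    deg = [sum(row) for row in w]  # d_i = sum_j w_ij
    unnorm = [[(deg[i] if i == j else 0.0) - w[i][j] for j in range(n)]
              for i in range(n)]
    inv_sqrt = [1.0 / math.sqrt(d) for d in deg]
    norm = [[(1.0 if i == j else 0.0) - inv_sqrt[i] * w[i][j] * inv_sqrt[j]
             for j in range(n)]
            for i in range(n)]
    return unnorm, norm


# Hypothetical toy example: path graph 0 - 1 - 2 with unit weights.
W = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
L, L_sym = laplacians(W)
```

Both matrices are positive semidefinite. Every row of L sums to zero, and L_sym maps the vector D^{1/2}1 to zero. Spectral clustering then takes the eigenvectors belonging to the smallest eigenvalues of whichever matrix is used.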
Graph sparsification by effective resistances
 SIAM J. Comput
Abstract

Cited by 64 (4 self)
We present a nearly-linear time algorithm that produces high-quality sparsifiers of weighted graphs. Given as input a weighted graph $G = (V, E, w)$ and a parameter $\varepsilon > 0$, we produce a weighted subgraph $H = (V, \tilde{E}, \tilde{w})$ of $G$ such that $|\tilde{E}| = O(n \log n / \varepsilon^2)$ and for all vectors $x \in \mathbb{R}^V$

$$(1-\varepsilon) \sum_{uv \in E} (x(u) - x(v))^2 \, w_{uv} \;\le\; \sum_{uv \in \tilde{E}} (x(u) - x(v))^2 \, \tilde{w}_{uv} \;\le\; (1+\varepsilon) \sum_{uv \in E} (x(u) - x(v))^2 \, w_{uv}. \tag{1}$$

This improves upon the sparsifiers constructed by Spielman and Teng, which had $O(n \log^c n)$ edges for some large constant $c$, and upon those of Benczúr and Karger, which only satisfied (1) for $x \in \{0, 1\}^V$. We conjecture the existence of sparsifiers with $O(n)$ edges, noting that these would generalize the notion of expander graphs, which are constant-degree sparsifiers for the complete graph. A key ingredient in our algorithm is a subroutine of independent interest: a nearly-linear time algorithm that builds a data structure from which we can query the approximate effective resistance between any two vertices in a graph in $O(\log n)$ time.
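Each sum in inequality (1) is the Laplacian quadratic form $x^T L x$ of a graph. The sketch below evaluates both sides of (1) for one given test vector. It does not construct a sparsifier, and it does not implement the effective-resistance sampling described above. The function names and toy graphs are hypothetical.

```python
def quadratic_form(edges, x):
    """x^T L x = sum over weighted edges (u, v, w) of w * (x[u] - x[v])**2."""
    return sum(w * (x[u] - x[v]) ** 2 for u, v, w in edges)


def satisfies_inequality_1(g_edges, h_edges, x, eps):
    """Check (1 - eps) x^T L_G x <= x^T L_H x <= (1 + eps) x^T L_G x for one x."""
    qg = quadratic_form(g_edges, x)
    qh = quadratic_form(h_edges, x)
    return (1 - eps) * qg <= qh <= (1 + eps) * qg


# Hypothetical toy graphs: G is a unit-weight triangle, H a reweighted path.
G = [(0, 1, 1.0), (1, 2, 1.0), (0, 2, 1.0)]
H = [(0, 1, 1.5), (1, 2, 1.5)]
```

A single vector that violates (1) proves that H is not an ε-sparsifier of G. Passing any finite set of vectors proves nothing, because (1) must hold for all $x \in \mathbb{R}^V$. Restricting x to $\{0, 1\}^V$ turns each sum into the weight of a cut, which is the weaker guarantee attributed above to Benczúr and Karger.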
On Social Networks and Collaborative Recommendation
Abstract

Cited by 39 (0 self)
Social network systems, like last.fm, play a significant role in Web 2.0, containing large amounts of multimedia-enriched data that are enhanced both by explicit user-provided annotations and by implicit aggregated feedback describing the personal preferences of each user. It is also common for these systems to encourage the creation of virtual networks among their users by allowing them to establish bonds of friendship, thus providing a novel and direct medium for the exchange of data. We investigate the role of these additional relationships in developing a track recommendation system. Taking into account both the social annotations and the friendships inherent in the social graph established among users, items, and tags, we created a collaborative recommendation system that effectively adapts to the personal information needs of each user. We adopt the generic framework of Random Walk with Restarts in order to provide a more natural and efficient way to represent social networks. In this work we collected a sufficiently representative portion of the music social network last.fm, capturing the explicitly expressed bonds of friendship among users as well as social tags. We performed a series of comparison experiments between the Random Walk with Restarts model and a user-based collaborative filtering method using the Pearson correlation similarity. The results show that the graph model system benefits from the additional information embedded in social knowledge. In addition, the graph model outperforms the standard collaborative filtering method.
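The sketch below shows a generic Random Walk with Restarts on a graph that mixes users, tracks, and tags. It uses power iteration for r = (1 - c) P^T r + c e_seed, where P is the row-normalized transition matrix and c is the restart probability. The node names, the restart probability 0.15, and the iteration count are hypothetical and are not the paper's settings. The code also assumes that every node has at least one neighbor.

```python
def random_walk_with_restart(adj, seed, restart=0.15, iters=100):
    """Power iteration for r = (1 - c) P^T r + c e_seed (c = restart).

    adj maps each node to a dict {neighbor: weight}; every node is assumed
    to have at least one neighbor, and every neighbor to be a key of adj.
    """
    r = {v: (1.0 if v == seed else 0.0) for v in adj}
    for _ in range(iters):
        nxt = {v: (restart if v == seed else 0.0) for v in adj}
        for u, nbrs in adj.items():
            out = sum(nbrs.values())  # total outgoing weight of u
            for v, w in nbrs.items():
                nxt[v] += (1 - restart) * r[u] * w / out
        r = nxt
    return r


# Hypothetical undirected graph: users (u*), tracks (t*), and a tag (g1).
edges = [("u1", "t1"), ("u1", "u2"), ("u2", "t2"), ("t2", "g1"), ("t3", "g1")]
graph = {}
for a, b in edges:
    graph.setdefault(a, {})[b] = 1.0
    graph.setdefault(b, {})[a] = 1.0

scores = random_walk_with_restart(graph, "u1")
# Rank tracks that u1 has not listened to yet by their RWR score.
recs = sorted(["t2", "t3"], key=scores.get, reverse=True)
```

Track t2 outranks t3 because it is reachable from u1 through a friendship edge. This is the kind of social signal the abstract credits for the graph model's advantage.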
The slashdot zoo: Mining a social network with negative edges
 In WWW
, 2009
"... christian.bauckhage ..."
Audience selection for online brand advertising: privacy-friendly social network targeting
 In KDD ’09: Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining
, 2009
Abstract

Cited by 23 (2 self)
This paper describes and evaluates privacy-friendly methods for extracting quasi-social networks from browser behavior on user-generated content sites, for the purpose of finding good audiences for brand advertising (as opposed to click maximizing, for example). Targeting social-network neighbors resonates well with advertisers, and online browsing behavior data counterintuitively can allow the identification of good audiences anonymously. Besides being, to our knowledge, one of the first papers on data mining for online brand advertising, this paper makes several important contributions. We introduce a framework for evaluating brand audiences, in analogy to predictive-modeling holdout evaluation. We introduce methods for extracting quasi-social networks from data on visitations to social networking pages, without collecting any information on the identities of the browsers or the content of the social-network pages. We introduce measures of brand proximity in the network, and show that audiences with high brand proximity indeed show substantially higher brand affinity. Finally, we provide evidence that the quasi-social network embeds a true social network, which, along with results from social theory, offers one explanation for the increases in audience brand affinity.
A Comprehensive Survey of Neighborhood-based Recommendation Methods
, 2011
Abstract

Cited by 22 (0 self)
Among collaborative recommendation approaches, methods based on nearest neighbors still enjoy a huge amount of popularity, due to their simplicity, their efficiency, and their ability to produce accurate and personalized recommendations. This chapter presents a comprehensive survey of neighborhood-based methods for the item recommendation problem. In particular, the main benefits of such methods, as well as their principal characteristics, are described. Furthermore, this document addresses the essential decisions that are required while implementing a neighborhood-based recommender system, and gives practical information on how to make such decisions. Finally, the problems of sparsity and limited coverage, often observed in large commercial recommender systems, are discussed, and a few solutions to overcome these problems are presented.
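A user-based neighborhood recommender needs a similarity measure, a rule for choosing neighbors, and a way to combine neighbor ratings. The sketch below makes one assumed choice for each: Pearson correlation over co-rated items, the top-k raters of the target item, and a mean-centered weighted average. These are only one set of the design decisions the survey discusses, and item-based variants are not shown. The ratings and function names are hypothetical.

```python
import math


def pearson(ra, rb):
    """Pearson correlation between two users over their co-rated items."""
    common = set(ra) & set(rb)
    if len(common) < 2:
        return 0.0
    ma = sum(ra[i] for i in common) / len(common)
    mb = sum(rb[i] for i in common) / len(common)
    num = sum((ra[i] - ma) * (rb[i] - mb) for i in common)
    den = math.sqrt(sum((ra[i] - ma) ** 2 for i in common)) * math.sqrt(
        sum((rb[i] - mb) ** 2 for i in common))
    return num / den if den else 0.0


def mean(r):
    """Average of one user's ratings."""
    return sum(r.values()) / len(r)


def predict(ratings, user, item, k=2):
    """Mean-centered prediction from the k users most similar to `user`
    (by Pearson correlation) among those who have rated `item`."""
    candidates = [(pearson(ratings[user], r), v)
                  for v, r in ratings.items() if v != user and item in r]
    neighbors = sorted(candidates, reverse=True)[:k]
    num = sum(s * (ratings[v][item] - mean(ratings[v])) for s, v in neighbors)
    den = sum(abs(s) for s, _ in neighbors)
    return mean(ratings[user]) + (num / den if den else 0.0)


# Hypothetical ratings on a 1-5 scale.
ratings = {
    "alice": {"a": 5, "b": 3, "c": 4},
    "bob": {"a": 5, "b": 3, "c": 4, "d": 5},
    "carol": {"a": 1, "b": 5, "c": 2, "d": 1},
}
```

With k = 1, only bob contributes, so alice's predicted rating for item d is her mean of 4 plus bob's deviation of 0.75. Mean-centering corrects for users who rate systematically high or low.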
Learning Spectral Graph Transformations for Link Prediction
Abstract

Cited by 21 (2 self)
We present a unified framework for learning link prediction and edge weight prediction functions in large networks, based on the transformation of a graph’s algebraic spectrum. Our approach generalizes several graph kernels and dimensionality reduction methods and provides a method to estimate their parameters efficiently. We show how the parameters of these prediction functions can be learned by reducing the problem to a one-dimensional regression problem whose runtime depends only on the method’s reduced rank and that can be inspected visually. We derive variants that apply to undirected, weighted, unweighted, unipartite, and bipartite graphs. We evaluate our method experimentally using examples from social networks, collaborative filtering, trust networks, citation networks, authorship graphs, and hyperlink networks.
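A spectral transformation applies a function f to every eigenvalue of the adjacency matrix A while keeping its eigenvectors. For f(λ) = exp(αλ) this equals the matrix power series Σ_k (αA)^k / k!, so it can be sketched without an eigendecomposition. The paper's framework learns f from data. Here f is fixed by hand, and the value of α, the truncation length, and the function names are hypothetical.

```python
def matmul(a, b):
    """Plain-Python product of two dense matrices given as lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def exponential_kernel(adj, alpha=0.5, terms=20):
    """Approximate exp(alpha * A) by truncating sum_k (alpha * A)^k / k!.

    Equivalent to applying f(lambda) = exp(alpha * lambda) to each
    eigenvalue of the symmetric adjacency matrix A.
    """
    n = len(adj)
    term = [[float(i == j) for j in range(n)] for i in range(n)]  # (alpha A)^0 / 0! = I
    result = [row[:] for row in term]
    for k in range(1, terms):
        # (alpha A)^k / k! = (alpha A)^(k-1) / (k-1)! * A * alpha / k
        term = [[x * alpha / k for x in row] for row in matmul(term, adj)]
        for i in range(n):
            for j in range(n):
                result[i][j] += term[i][j]
    return result


# Hypothetical path graph 0 - 1 - 2 - 3; score the non-edges (0, 2) and (0, 3).
P4 = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
K = exponential_kernel(P4)
```

Larger entries of K mark node pairs joined by many short walks. Such pairs are the likelier candidates for a new link, so (0, 2) scores above (0, 3).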
An experimental investigation of graph kernels on a collaborative recommendation task
 Proceedings of the 6th International Conference on Data Mining (ICDM 2006)
, 2006
Abstract

Cited by 18 (6 self)
This paper presents a survey as well as a systematic empirical comparison of seven graph kernels and two related similarity matrices (simply referred to as graph kernels), namely the exponential diffusion kernel, the Laplacian exponential diffusion kernel, the von Neumann diffusion kernel, the regularized Laplacian kernel, the commute-time kernel, the random-walk-with-restart similarity matrix, and finally, three graph kernels introduced in this paper: the regularized commute-time kernel, the Markov diffusion kernel, and the cross-entropy diffusion matrix. The kernel-on-a-graph approach is simple and intuitive. It is illustrated by applying the nine graph kernels to a collaborative-recommendation task and to a semi-supervised classification task, both on several databases. The graph methods compute proximity measures between nodes that help study the structure of the graph. Our comparisons suggest that the regularized commute-time and the Markov diffusion kernels perform best, closely followed by the regularized Laplacian kernel.
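The sketch below makes one of the compared kernels concrete: the regularized Laplacian kernel K = (I + αL)^{-1}, where L = D - A. It uses plain Python with a hand-rolled Gauss-Jordan inverse. The value α = 1, the toy path graph, and the function names are hypothetical, and this is not the paper's code. The code assumes α > 0, which makes I + αL positive definite and therefore invertible.

```python
def invert(m):
    """Gauss-Jordan inversion with partial pivoting (m must be non-singular)."""
    n = len(m)
    aug = [[float(x) for x in row] + [float(i == j) for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]  # scale pivot row to 1
        for r in range(n):
            if r != col:  # eliminate this column from every other row
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def regularized_laplacian_kernel(adj, alpha=1.0):
    """K = (I + alpha * L)^-1 with L = D - A (unnormalized Laplacian)."""
    n = len(adj)
    deg = [sum(row) for row in adj]
    m = [[(1.0 if i == j else 0.0)
          + alpha * ((deg[i] if i == j else 0.0) - adj[i][j])
          for j in range(n)] for i in range(n)]
    return invert(m)


# Hypothetical path graph 0 - 1 - 2.
K = regularized_laplacian_kernel([[0, 1, 0], [1, 0, 1], [0, 1, 0]], alpha=1.0)
```

Because every row of L sums to zero, (I + αL) maps the all-ones vector to itself, so each row of K also sums to one. Proximity decays with graph distance, which is why K[0][1] exceeds K[0][2].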
A Sketch-Based Distance Oracle for Web-Scale Graphs
Abstract

Cited by 16 (1 self)
We study the fundamental problem of computing distances between nodes in large graphs such as the web graph and social networks. Our objective is to be able to answer distance queries between pairs of nodes in real time. Since standard shortest-path algorithms are expensive, our approach moves the time-consuming shortest-path computation offline, and at query time only looks up precomputed values and performs simple and fast computations on them. More specifically, during the offline phase we compute and store a small “sketch” for each node in the graph, and at query time we look up the sketches of the source and destination nodes and perform a simple computation using these two sketches to estimate the distance.
Categories and Subject Descriptors: G.2.2 [Graph Theory]: Graph algorithms, path and circuit problems
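The abstract does not say what a sketch contains. One simple way to realize the offline/online split is a landmark sketch, which is assumed here and is not necessarily the paper's construction. Each node stores its BFS distance to a few chosen landmark nodes. A query returns min over landmarks l of d(s, l) + d(l, t), which by the triangle inequality is an upper bound on the true distance. The landmark choice, toy graph, and function names are hypothetical.

```python
from collections import deque


def bfs(adj, source):
    """Unweighted shortest-path distances from source (dict node -> hops)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def build_sketches(adj, landmarks):
    """Offline phase: each node stores its distance to every reachable landmark."""
    sketches = {v: {} for v in adj}
    for l in landmarks:
        for v, d in bfs(adj, l).items():
            sketches[v][l] = d
    return sketches


def estimate_distance(sketches, s, t):
    """Query phase: upper bound min_l d(s, l) + d(l, t) over shared landmarks."""
    shared = sketches[s].keys() & sketches[t].keys()
    return min((sketches[s][l] + sketches[t][l] for l in shared),
               default=float("inf"))


# Hypothetical path graph a - b - c - d - e.
graph = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"],
         "d": ["c", "e"], "e": ["d"]}
sketches = build_sketches(graph, ["c"])
```

With the single landmark c, the estimate for (a, e) is exact at 4. The estimate for (a, b) is 3, although the true distance is 1. Adding landmarks tightens the bound at the cost of larger sketches, and the query still touches only the two sketches involved.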