Results 1-10 of 383
The link-prediction problem for social networks
J. AMERICAN SOCIETY FOR INFORMATION SCIENCE AND TECHNOLOGY, 2007
Cited by 903 (6 self)
Given a snapshot of a social network, can we infer which new interactions among its members are likely to occur in the near future? We formalize this question as the link-prediction problem, and we develop approaches to link prediction based on measures for analyzing the “proximity” of nodes in a network. Experiments on large co-authorship networks suggest that information about future interactions can be extracted from network topology alone, and that fairly subtle measures for detecting node proximity can outperform more direct measures.
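As a rough illustration of what a topology-only proximity score can look like, the sketch below ranks currently unlinked pairs by the number of neighbors they share. The toy snapshot, the function names, and the choice of common neighbors as the measure are assumptions made for illustration; the abstract does not say which measures the paper evaluates.

```python
from itertools import combinations

def build_adjacency(edges):
    """Return an undirected adjacency map {node: set(neighbors)}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def common_neighbor_scores(adj):
    """Score every currently unlinked pair by its number of shared neighbors."""
    scores = {}
    for u, v in combinations(sorted(adj), 2):
        if v in adj[u]:
            continue  # already linked, nothing to predict
        scores[(u, v)] = len(adj[u] & adj[v])
    return scores

def predict_links(edges, k):
    """Return the k unlinked pairs with the highest score (ties broken by name)."""
    scores = common_neighbor_scores(build_adjacency(edges))
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [pair for pair, _ in ranked[:k]]

# Hypothetical co-authorship snapshot (illustrative data only).
snapshot = [("ann", "bob"), ("ann", "cat"), ("bob", "cat"),
            ("bob", "dan"), ("cat", "dan"), ("dan", "eve")]
print(predict_links(snapshot, 2))
```

Evaluating such a predictor would mean comparing the returned pairs with the links that actually appear in a later snapshot of the same network.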
Scaling Personalized Web Search
In Proceedings of the Twelfth International World Wide Web Conference, 2002
Cited by 409 (2 self)
Recent web search techniques augment traditional text matching with a global notion of "importance" based on the linkage structure of the web, such as in Google's PageRank algorithm. For more refined searches, this global notion of importance can be specialized to create personalized views of importance; for example, importance scores can be biased according to a user-specified set of initially interesting pages. Computing and storing all possible personalized views in advance is impractical, as is computing personalized views at query time, since the computation of each view requires an iterative computation over the web graph. We present new graph-theoretical results, and a new technique based on these results, that encode personalized views as partial vectors. Partial vectors are shared across multiple personalized views, and their computation and storage costs scale well with the number of views.
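To make the "iterative computation over the web graph" concrete, here is a minimal sketch of a generic personalized PageRank power iteration. It is not the partial-vector technique the abstract describes. The teleport probability `alpha = 0.15`, the toy graph, and the function name are assumptions. The loop at the end checks a standard property of this recurrence: the result is linear in the preference vector, so a blended view equals the blend of the individual views.

```python
def personalized_pagerank(out_links, preference, alpha=0.15, iterations=200):
    """Power iteration for r = alpha * preference + (1 - alpha) * P^T r.

    out_links maps each page to the list of pages it links to. Every page is
    assumed to have at least one out-link, and every link target is a key.
    preference maps pages to teleport probabilities that sum to 1.
    """
    pages = list(out_links)
    rank = {p: preference.get(p, 0.0) for p in pages}
    for _ in range(iterations):
        # Teleport mass goes straight to the preferred pages.
        nxt = {p: alpha * preference.get(p, 0.0) for p in pages}
        # The remaining mass follows out-links, split evenly.
        for p in pages:
            share = (1.0 - alpha) * rank[p] / len(out_links[p])
            for q in out_links[p]:
                nxt[q] += share
        rank = nxt
    return rank

# Hypothetical four-page web graph.
web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["a", "c"]}
view_a = personalized_pagerank(web, {"a": 1.0})
view_d = personalized_pagerank(web, {"d": 1.0})
view_mix = personalized_pagerank(web, {"a": 0.5, "d": 0.5})
for page in web:
    blended = 0.5 * view_a[page] + 0.5 * view_d[page]
    print(page, round(view_mix[page], 4), round(blended, 4))
```

Each call repeats the full iteration, which is the per-view cost the abstract calls impractical at query time.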
Fast random walk with restart and its applications
In ICDM ’06: Proceedings of the 6th IEEE International Conference on Data Mining, 2006
Cited by 179 (19 self)
How closely related are two nodes in a graph? How to compute this score quickly, on huge, disk-resident, real graphs? Random walk with restart (RWR) provides a good relevance score between two nodes in a weighted graph, and it has been successfully used in numerous settings, like automatic captioning of images, generalizations to the “connection subgraphs”, personalized PageRank, and many more. However, the straightforward implementations of RWR do not scale for large graphs, requiring either quadratic space and cubic precomputation time, or slow response time on queries. We propose fast solutions to this problem. The heart of our approach is to exploit two important properties shared by many real graphs: (a) linear correlations and (b) block-wise, community-like structure. We exploit the linearity by using low-rank matrix approximation, and the community structure by graph partitioning, followed by the Sherman-Morrison lemma for matrix inversion. Experimental results on the Corel image and the DBLP datasets demonstrate that our proposed methods achieve significant savings over the straightforward implementations: they can save several orders of magnitude in precomputation and storage cost, and they achieve up to 150x speedup with 90%+ quality preservation.
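The trade-off between the two straightforward implementations can be read off the RWR fixed-point equation. The notation below is an assumption, since the abstract does not give its own: \tilde{W} is the column-normalized adjacency matrix, c is the restart probability, and \mathbf{e}_i is the indicator vector of query node i.

```latex
\mathbf{r}_i = (1 - c)\,\tilde{W}\,\mathbf{r}_i + c\,\mathbf{e}_i
\quad\Longrightarrow\quad
\mathbf{r}_i = c\,\bigl(I - (1 - c)\,\tilde{W}\bigr)^{-1}\mathbf{e}_i
```

Iterating the left-hand form needs no precomputation but is slow for each query. Precomputing the dense inverse on the right for a graph with n nodes answers queries quickly but takes O(n^2) space and O(n^3) time, which matches the costs the abstract attributes to the straightforward implementations.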
Optimizing web search using social annotations
In: WWW ’07, 2007
Cited by 169 (2 self)
This paper explores the use of social annotations to improve web search. Nowadays, many services, e.g. del.icio.us, have been developed for web users to organize and share their favorite web pages online by using social annotations. We observe that the social annotations can benefit web search in two aspects: 1) the annotations are usually good summaries of corresponding web pages; 2) the count of annotations indicates the popularity of web pages. Two novel algorithms are proposed to incorporate the above information into page ranking: 1) SocialSimRank (SSR) calculates the similarity between social annotations and web queries; 2) SocialPageRank (SPR) captures the popularity of web pages. Preliminary experimental results show that SSR can find the latent semantic association between queries and annotations, while SPR successfully measures the quality (popularity) of a web page from the web users’ perspective. We further evaluate the proposed methods empirically with 50 manually constructed queries and 3000 auto-generated queries on a dataset crawled from del.icio.us. Experiments show that both SSR and SPR benefit web search significantly.
Truth discovery with multiple conflicting information providers on the web
In Proc. 2007 ACM SIGKDD Int. Conf. Knowledge Discovery in Databases (KDD’07), 2007
Cited by 129 (26 self)
The world-wide web has become the most important information source for most of us. Unfortunately, there is no guarantee for the correctness of information on the web. Moreover, different web sites often provide conflicting information on a subject, such as different specifications for the same product. In this paper we propose a new problem called Veracity, i.e., conformity to truth, which studies how to find true facts from a large amount of conflicting information on many subjects that is provided by various web sites. We design a general framework for the Veracity problem, and invent an algorithm called TruthFinder, which utilizes the relationships between web sites and their information, i.e., a web site is trustworthy if it provides many pieces of true information, and a piece of information is likely to be true if it is provided by many trustworthy web sites. Our experiments show that TruthFinder successfully finds true facts among conflicting information, and identifies trustworthy web sites better than the popular search engines.
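The circular definition in the abstract (trustworthy sites provide true facts, and true facts come from trustworthy sites) suggests an alternating update. The sketch below is a deliberately simplified illustration of that idea, not the TruthFinder algorithm itself: the update rules, the starting trust of 0.5, and the toy claims are all assumptions, and the sketch does not penalize facts for conflicting with each other.

```python
def mutual_reinforcement(claims, initial_trust=0.5, rounds=20):
    """Alternate between fact confidence and site trust.

    claims maps each site to the non-empty set of facts it asserts.
    Returns (trust per site, confidence per fact).
    """
    trust = {site: initial_trust for site in claims}
    facts = set().union(*claims.values())
    confidence = {}
    for _ in range(rounds):
        # A fact is wrong only if every site asserting it is wrong,
        # treating sites as independent (an assumption of this sketch).
        for fact in facts:
            miss = 1.0
            for site, asserted in claims.items():
                if fact in asserted:
                    miss *= 1.0 - trust[site]
            confidence[fact] = 1.0 - miss
        # A site's trust is the mean confidence of the facts it asserts.
        for site, asserted in claims.items():
            trust[site] = sum(confidence[f] for f in asserted) / len(asserted)
    return trust, confidence

# Hypothetical claims about the author of one book.
claims = {
    "site_a": {"author=Smith"},
    "site_b": {"author=Smith"},
    "site_c": {"author=Smith"},
    "site_d": {"author=Jones"},
}
trust, confidence = mutual_reinforcement(claims)
print(sorted(confidence.items(), key=lambda item: -item[1]))
```

On this toy input the fact backed by three sites ends up with higher confidence than the fact backed by one, and the three agreeing sites end up with higher trust.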
Measure of similarity between graph vertices: Applications to synonym extraction and web searching
SIAM Review, Society for Industrial and Applied Mathematics, 2004
Cited by 109 (4 self)
We introduce a concept of similarity between vertices of directed graphs. Let G_A and G_B be two directed graphs with n_A and n_B vertices, respectively. We define an n_A × n_B similarity matrix S whose real entry s_ij expresses how similar vertex i (in G_A) is to vertex j (in G_B); we say that s_ij is their similarity score. In the special case where G_A = G_B = G, the score s_ij is the similarity score between the vertices i and j of G and the square similarity matrix S is the self-similarity matrix of the graph G. We point out that Kleinberg's "hub and authority" method to identify webpages relevant to a given query can be viewed as a special case of our definition in the case where one of the graphs has two vertices and a unique directed edge between them. In analogy to Kleinberg, we show that our similarity scores are given by the components of a dominant vector of a nonnegative matrix and we propose a simple iterative method to compute them. Potential applications of our similarity concept are manifold and we illustrate one application for the automatic extraction of synonyms in a monolingual dictionary.
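For readers unfamiliar with the special case mentioned above, the sketch below shows Kleinberg's hub and authority iteration in its usual textbook form: authority scores sum the hub scores of in-neighbors, hub scores sum the authority scores of out-neighbors, and both are renormalized each round. It covers only this special case, not the general similarity matrix of the paper. The function name and the toy edge list are assumptions, and the graph is assumed to contain at least one edge.

```python
import math

def hits(edges, iterations=50):
    """Hub and authority scores by power iteration with L2 normalization."""
    nodes = sorted({n for edge in edges for n in edge})
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority: sum of hub scores of the pages pointing to the node.
        auth = {n: 0.0 for n in nodes}
        for src, dst in edges:
            auth[dst] += hub[src]
        # Hub: sum of authority scores of the pages the node points to.
        new_hub = {n: 0.0 for n in nodes}
        for src, dst in edges:
            new_hub[src] += auth[dst]
        # Rescale both vectors to unit Euclidean length.
        for scores in (auth, new_hub):
            norm = math.sqrt(sum(v * v for v in scores.values()))
            for n in scores:
                scores[n] /= norm
        hub = new_hub
    return hub, auth

# Hypothetical link graph: p, q, r point at x; q also points at y.
links = [("p", "x"), ("q", "x"), ("q", "y"), ("r", "x")]
hub, auth = hits(links)
print(max(auth, key=auth.get), max(hub, key=hub.get))
```

The normalized vectors approach dominant eigenvectors of nonnegative matrices built from the adjacency matrix, which is the sense in which the abstract speaks of a dominant vector.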
A survey on pagerank computing
Internet Mathematics, 2005
Cited by 106 (0 self)
This survey reviews the research related to PageRank computing. Components of a PageRank vector serve as authority weights for web pages independent of their textual content, solely based on the hyperlink structure of the web. PageRank is typically used as a web search ranking component. This defines the importance of the model and the data structures that underlie PageRank processing. Computing even a single PageRank is a difficult computational task. Computing many PageRanks is a much more complex challenge. Recently, significant effort has been invested in building sets of personalized PageRank vectors. PageRank is also used in many diverse applications other than ranking. We are interested in the theoretical foundations of the PageRank formulation, in the acceleration of PageRank computing, in the effects of particular aspects of web graph structure on the optimal organization of computations, and in PageRank stability. We also review alternative models that lead to authority indices similar to PageRank and the role of such indices in applications other than web search. We also discuss link-based search personalization and outline some aspects of PageRank infrastructure from associated measures of convergence to link preprocessing.
Graph Clustering Based on Structural/Attribute Similarities
Cited by 99 (7 self)
The goal of graph clustering is to partition vertices in a large graph into different clusters based on various criteria such as vertex connectivity or neighborhood similarity. Graph clustering techniques are very useful for detecting densely connected groups in a large graph. Many existing graph clustering methods mainly focus on the topological structure for clustering, but largely ignore the vertex properties, which are often heterogeneous. In this paper, we propose a novel graph clustering algorithm, SA-Cluster, based on both structural and attribute similarities through a unified distance measure. Our method partitions a large graph associated with attributes into k clusters so that each cluster contains a densely connected subgraph with homogeneous attribute values. An effective method is proposed to automatically learn the degree of contributions of structural similarity and attribute similarity. Theoretical analysis is provided to show that SA-Cluster converges. Extensive experimental results demonstrate the effectiveness of SA-Cluster through comparison with the state-of-the-art graph clustering and summarization methods.
Exploiting hierarchical domain structure to compute similarity
ACM Trans. Inf. Syst.
Cited by 86 (0 self)
The notion of similarity between objects finds use in many contexts, e.g., in search engines, collaborative filtering, and clustering. Objects being compared often are modeled as sets, with their similarity traditionally determined based on set intersection. Intersection-based measures do not accurately capture similarity in certain domains, such as when the data is sparse or when there are known relationships between items within sets. We propose new measures that exploit a hierarchical domain structure in order to produce more intuitive similarity scores. We also extend our similarity measures to provide appropriate results in the presence of multisets (also handled unsatisfactorily by traditional measures), e.g., to correctly compute the similarity between customers who buy several instances of the same product (say milk), or who buy several products in the same category (say dairy products). We also provide an experimental comparison of our measures against traditional similarity measures, and describe an informal user study that evaluated how well our measures match human intuition.
Ranking-based clustering of heterogeneous information networks with star network schema
In: Proc. 2009 ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining (KDD 2009), 2009
Cited by 85 (30 self)
A heterogeneous information network is an information network composed of multiple types of objects. Clustering on such a network may lead to better understanding of both hidden structures of the network and the individual role played by every object in each cluster. However, although clustering on homogeneous networks has been studied over decades, clustering on heterogeneous networks has not been addressed until recently. A recent study proposed a new algorithm, RankClus, for clustering on bi-typed heterogeneous networks. However, a real-world network may consist of more than two types, and the interactions among multi-typed objects play a key role in disclosing the rich semantics that a network carries. In this paper, we study clustering of multi-typed heterogeneous networks with a star network schema and propose a novel algorithm, NetClus, that utilizes links across multi-typed objects to generate high-quality net-clusters. An iterative enhancement method is developed that leads to effective ranking-based clustering in such heterogeneous networks. Our experiments on DBLP data show that NetClus generates more accurate clustering results than the baseline topic model algorithm PLSA and the recently proposed algorithm, RankClus. Further, NetClus generates informative clusters, presenting good ranking and cluster membership information for each attribute object in each net-cluster.