Authoritative Sources in a Hyperlinked Environment
 JOURNAL OF THE ACM
, 1999
Abstract
Cited by 2788
The network structure of a hyperlinked environment can be a rich source of information about the content of the environment, provided we have effective means for understanding it. We develop a set of algorithmic tools for extracting information from the link structures of such environments, and report on experiments that demonstrate their effectiveness in a variety of contexts on the World Wide Web. The central issue we address within our framework is the distillation of broad search topics, through the discovery of “authoritative” information sources on such topics. We propose and test an algorithmic formulation of the notion of authority, based on the relationship between a set of relevant authoritative pages and the set of “hub pages” that join them together in the link structure. Our formulation has connections to the eigenvectors of certain matrices associated with the link graph; these connections in turn motivate additional heuristics for link-based analysis.
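The mutual reinforcement between hub pages and authoritative pages described in this abstract can be sketched as a simple iteration. The following is a minimal illustration, not the authors' implementation: the function names `hits` and `_normalize`, the dictionary input format, and the fixed iteration count are all assumptions made for this example.

```python
from math import sqrt


def _normalize(scores):
    """Scale scores to unit Euclidean length (left unchanged if all zero)."""
    norm = sqrt(sum(v * v for v in scores.values()))
    if norm == 0.0:
        return dict(scores)
    return {k: v / norm for k, v in scores.items()}


def hits(links, iterations=50):
    """Hypothetical helper: links maps each page to the list of pages it links to.

    Returns (hub, auth) dictionaries of scores.
    """
    nodes = set(links) | {q for targets in links.values() for q in targets}
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # A page's authority score is the sum of the hub scores of pages linking to it.
        new_auth = {n: 0.0 for n in nodes}
        for p, targets in links.items():
            for q in targets:
                new_auth[q] += hub[p]
        # A page's hub score is the sum of the authority scores of pages it links to.
        new_hub = {n: 0.0 for n in nodes}
        for p, targets in links.items():
            for q in targets:
                new_hub[p] += new_auth[q]
        auth = _normalize(new_auth)
        hub = _normalize(new_hub)
    return hub, auth
```

Repeating the two update steps with normalization is a form of power iteration, which is one way to see the connection to eigenvectors mentioned in the abstract.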
The link-prediction problem for social networks
 J. American Society for Information Science and Technology
Abstract
Cited by 499
Given a snapshot of a social network, can we infer which new interactions among its members are likely to occur in the near future? We formalize this question as the link-prediction problem, and we develop approaches to link prediction based on measures for analyzing the “proximity” of nodes in a network. Experiments on large co-authorship networks suggest that information about future interactions can be extracted from network topology alone, and that fairly subtle measures for detecting node proximity can outperform more direct measures.
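As a rough illustration of scoring candidate links by node proximity, the sketch below ranks pairs of unconnected nodes by how many neighbors they share. Common-neighbor counting is used here only because it is one of the simplest proximity measures; the abstract does not single it out, and the function name `predict_links` and the edge-list input are assumptions for this example.

```python
from itertools import combinations


def predict_links(edges, top_k=3):
    """Hypothetical helper: rank non-adjacent pairs by number of shared neighbors.

    edges is an iterable of undirected (u, v) pairs.
    Returns up to top_k tuples (u, v, shared_count), best first.
    """
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)

    scored = []
    for u, v in combinations(sorted(neighbors), 2):
        if v in neighbors[u]:
            continue  # already connected, nothing to predict
        shared = len(neighbors[u] & neighbors[v])
        if shared:
            scored.append((shared, u, v))

    # Highest score first; ties broken alphabetically for reproducibility.
    scored.sort(key=lambda t: (-t[0], t[1], t[2]))
    return [(u, v, shared) for shared, u, v in scored[:top_k]]
```

A more subtle measure could be swapped in by replacing the `shared` computation while keeping the same ranking loop.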
Focused crawling: a new approach to topic-specific Web resource discovery
, 1999
Abstract
Cited by 497
The rapid growth of the World-Wide Web poses unprecedented scaling challenges for general-purpose crawlers and search engines. In this paper we describe a new hypertext resource discovery system called a Focused Crawler. The goal of a focused crawler is to selectively seek out pages that are relevant to a predefined set of topics. The topics are specified not using keywords, but using exemplary documents. Rather than collecting and indexing all accessible Web documents to be able to answer all possible ad-hoc queries, a focused crawler analyzes its crawl boundary to find the links that are likely to be most relevant for the crawl, and avoids irrelevant regions of the Web. This leads to significant savings in hardware and network resources, and helps keep the crawl more up-to-date. To achieve such goal-directed crawling, we designed two hypertext mining programs that guide our crawler: a classifier that evaluates the relevance of a hypertext document with respect to the focus topics, ...
Document Language Models, Query Models, and Risk Minimization for Information Retrieval
 In Proceedings of SIGIR’01
, 2001
A computational model of trust and reputation
 In Proceedings of the 35th Hawaii International Conference on System Science (HICSS)
, 2002
Abstract
Cited by 131
Despite their many advantages, eBusinesses lag behind brick and mortar businesses in several fundamental respects. This paper concerns one of these: relationships based on trust and reputation. Recent studies on simple reputation systems for eBusinesses such as eBay have pointed to the importance of such rating systems for deterring moral hazard and encouraging trusting interactions. However, despite numerous studies on trust and reputation systems, few have taken studies across disciplines to provide an integrated account of these concepts and their relationships. This paper first surveys existing literatures on trust, reputation and a related concept: reciprocity. Based on sociological and biological understandings of these concepts, a computational model is proposed. This model can be implemented in a real system to consistently calculate agents’ trust and reputation scores.
Random-walk computation of similarities between nodes of a graph, with application to collaborative recommendation
 IEEE Transactions on Knowledge and Data Engineering
, 2006
Abstract
Cited by 122
This work presents a new perspective on characterizing the similarity between elements of a database or, more generally, nodes of a weighted and undirected graph. It is based on a Markov-chain model of random walk through the database. More precisely, we compute quantities (the average commute time, the pseudoinverse of the Laplacian matrix of the graph, etc.) that provide similarities between any pair of nodes, having the nice property of increasing when the number of paths connecting those elements increases and when the “length” of paths decreases. It turns out that the square root of the average commute time is a Euclidean distance and that the pseudoinverse of the Laplacian matrix is a kernel matrix (its elements are inner products closely related to commute times). A principal component analysis (PCA) of the graph is introduced for computing the subspace projection of the node vectors in a manner that preserves as much variance as possible in terms of the Euclidean commute-time distance. This graph PCA provides a nice interpretation to the “Fiedler vector,” widely used for graph partitioning. The model is evaluated on a collaborative-recommendation task where suggestions are made about which movies people should watch based upon what they watched in the past. Experimental results on the MovieLens database show that the Laplacian-based similarities perform well in comparison with other methods. The model, which nicely fits into the so-called “statistical relational learning” framework, could also be used to compute document or word similarities, and, more generally, it could be applied to machine-learning and pattern-recognition tasks involving a relational database. Index Terms: Graph analysis, graph and database mining, collaborative recommendation, graph kernels, spectral clustering, Fiedler vector, proximity measures, statistical relational learning.
Algorithms for estimating relative importance in networks
 In Proceedings of KDD 2003
, 2003
Abstract
Cited by 101
Large and complex graphs representing relationships among sets of entities are an increasingly common focus of interest in data analysis; examples include social networks, Web graphs, telecommunication networks, and biological networks. In interactive analysis of such data a natural query is “which entities are most important in the network relative to a particular individual or set of individuals?” We investigate the problem of answering such queries in this paper, focusing in particular on defining and computing the importance of nodes in a graph relative to one or more root nodes. We define a general framework and a number of different algorithms, building on ideas from social networks, graph theory, Markov models, and Web graph analysis. We experimentally evaluate the different properties of these algorithms on toy graphs and demonstrate how our approach can be used to study relative importance in real-world networks including a network of interactions among September 11th terrorists, a network of collaborative research in biotechnology among companies and universities, and a network of co-authorship relationships among computer science researchers.
A survey on PageRank computing
 Internet Mathematics
, 2005
Abstract
Cited by 68
This survey reviews the research related to PageRank computing. Components of a PageRank vector serve as authority weights for web pages independent of their textual content, solely based on the hyperlink structure of the web. PageRank is typically used as a web search ranking component. This defines the importance of the model and the data structures that underlie PageRank processing. Computing even a single PageRank is a difficult computational task. Computing many PageRanks is a much more complex challenge. Recently, significant effort has been invested in building sets of personalized PageRank vectors. PageRank is also used in many diverse applications other than ranking. We are interested in the theoretical foundations of the PageRank formulation, in the acceleration of PageRank computing, in the effects of particular aspects of web graph structure on the optimal organization of computations, and in PageRank stability. We also review alternative models that lead to authority indices similar to PageRank and the role of such indices in applications other than web search. We also discuss link-based search personalization and outline some aspects of PageRank infrastructure from associated measures of convergence to link preprocessing.
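For readers unfamiliar with how a PageRank vector is computed from hyperlink structure alone, a basic power iteration can be sketched as follows. This is a small illustrative version, not one of the accelerated methods the survey covers; the function name `pagerank`, the damping value 0.85, the uniform treatment of pages without outlinks, and the stopping tolerance are assumptions chosen for this example.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=200):
    """Hypothetical helper: links maps each page to the list of pages it links to.

    Returns a dictionary of scores that sum to 1.
    """
    nodes = sorted(set(links) | {q for targets in links.values() for q in targets})
    n = len(nodes)
    rank = {p: 1.0 / n for p in nodes}
    for _ in range(max_iter):
        # Assumption: rank held by pages with no outlinks is spread evenly over all pages.
        dangling = sum(rank[p] for p in nodes if not links.get(p))
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {p: base for p in nodes}
        # Each page passes an equal share of its damped rank along every outlink.
        for p, targets in links.items():
            if targets:
                share = damping * rank[p] / len(targets)
                for q in targets:
                    new_rank[q] += share
        change = sum(abs(new_rank[p] - rank[p]) for p in nodes)
        rank = new_rank
        if change < tol:
            break
    return rank
```

On web-scale graphs the same update is applied to sparse matrix representations rather than Python dictionaries, which is where the data-structure and acceleration questions raised in the abstract come in.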
Centrality in valued graphs: A measure of betweenness based on network flow
, 1991
Abstract
Cited by 59
A new measure of centrality, C_F, is introduced. It is based on the concept of network flows. While conceptually similar to Freeman’s original measure, C_B, the new measure differs from the original in two important ways. First, C_F is defined for both valued and non-valued graphs. This makes C_F applicable to a wider variety of network datasets. Second, the computation of C_F is not based on geodesic paths as is C_B, but on all the independent paths between all pairs of points in the network.
Eigenvector-like measures of centrality for asymmetric relations
 Social Networks
Abstract
Cited by 52
Eigenvectors of adjacency matrices are useful as measures of centrality or of status. However, they are misapplied to asymmetric networks in which some positions are unchosen. For these networks, an alternative measure of centrality is suggested that equals an eigenvector when eigenvectors can be used and provides meaningfully comparable results when they cannot. © 2001 Elsevier Science B.V. All rights reserved. JEL classification: C00 General mathematical and quantitative methods