Results 1 - 10 of 235
Deeper Inside PageRank
 Internet Mathematics
, 2004
Cited by 207 (4 self)
Abstract. This paper serves as a companion or extension to the “Inside PageRank” paper by Bianchini et al. [Bianchini et al. 03]. It is a comprehensive survey of all issues associated with PageRank, covering the basic PageRank model, available and recommended solution methods, storage issues, existence, uniqueness, and convergence properties, possible alterations to the basic model, suggested alternatives to the traditional solution methods, sensitivity and conditioning, and finally the updating problem. We introduce a few new results, provide an extensive reference list, and speculate about exciting areas of future research.
The Query-flow Graph: Model and Applications
, 2008
Cited by 111 (19 self)
Query logs record the queries and the actions of the users of search engines, and as such they contain valuable information about the interests, the preferences, and the behavior of the users, as well as their implicit feedback to search-engine results. Mining the wealth of information available in the query logs has many important applications including query-log analysis, user profiling and personalization, advertising, query recommendation, and more. In this paper we introduce the query-flow graph, a graph representation of the interesting knowledge about latent querying behavior. Intuitively, in the query-flow graph a directed edge from query qi to query qj means that the two queries are likely to be part of the same “search mission”. Any path over the query-flow graph may be seen as a searching behavior, whose likelihood is given by the strength of the edges along the path. The query-flow graph is an outcome of query-log mining and, at the same time, a useful tool for it. We propose a methodology that builds such a graph by mining time and textual information as well as aggregating queries from different users. Using this approach we build a real-world query-flow graph from a large-scale query log and we demonstrate its utility in concrete applications, namely, finding logical sessions, and query recommendation. We believe, however, that the usefulness of the query-flow graph goes beyond these two applications.
Query Suggestions Using Query-Flow Graphs
Cited by 52 (10 self)
The query-flow graph [Boldi et al., CIKM 2008] is an aggregated representation of the latent querying behavior contained in a query log. Intuitively, in the query-flow graph a directed edge from query qi to query qj means that the two queries are likely to be part of the same search mission. Any
A reference collection for Web spam
 SIGIR Forum
, 2006
Cited by 70 (13 self)
We describe the WEBSPAM-UK2006 collection, a large set of Web pages that have been manually annotated with labels indicating whether or not the hosts include Web spam aspects. This is the first publicly available Web spam collection that includes page contents and links, and that has been labelled by a large and diverse set of judges.
Layered Label Propagation: A Multi-Resolution Coordinate-Free Ordering for Compressing Social Networks
Cited by 73 (3 self)
We continue the line of research on graph compression started in [6], but we move our focus to the compression of social networks in a proper sense (e.g., LiveJournal): the approaches that have been used for a long time to compress web graphs rely on a specific ordering of the nodes (lexicographical URL ordering) whose extension to general social networks is not trivial. In this paper, we propose a solution that mixes clusterings and orders, and devise a new algorithm, called Layered Label Propagation, that builds on previous work on scalable clustering and can be used to reorder very large graphs (billions of nodes). Our implementation uses task decomposition to perform aggressively on multicore architecture, making it possible to reorder graphs of more than 600 million nodes in a few hours. Experiments performed on a wide array of web graphs and social networks show that combining the order produced by the proposed algorithm with the WebGraph compression framework provides a major increase in compression with respect to all currently known techniques, both on web graphs and on social networks. These improvements make it possible to analyse significantly larger graphs in main memory.
Efficient semi-streaming algorithms for local triangle counting in massive graphs
 in KDD’08, 2008
Cited by 70 (4 self)
In this paper we study the problem of local triangle counting in large graphs. Namely, given a large graph G = (V, E) we want to estimate as accurately as possible the number of triangles incident to every node v ∈ V in the graph. The problem of computing the global number of triangles in a graph has been considered before, but to our knowledge this is the first paper that addresses the problem of local triangle counting with a focus on the efficiency issues arising in massive graphs. The distribution of the local number of triangles and the related local clustering coefficient can be used in many interesting applications. For example, we show that the measures we compute can help to detect the presence of spamming activity in large-scale Web graphs, as well as to provide useful features to assess content quality in social networks. For computing the local number of triangles we propose two approximation algorithms, which are based on the idea of min-wise independent permutations (Broder et al. 1998). Our algorithms operate in a semi-streaming fashion, using O(|V|) space in main memory and performing O(log |V|) sequential scans over the edges of the graph. The first algorithm we describe in this paper also uses O(|E|) space in external memory during computation, while the second algorithm uses only main memory. We present the theoretical analysis as well as experimental results in massive graphs demonstrating the practical efficiency of our approach.
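For orientation, the two quantities named in this abstract (the number of triangles incident to each node and the local clustering coefficient) can be sketched on a toy graph. This is a minimal exact, in-memory baseline, not the approximate semi-streaming algorithms the abstract describes; the function names and the adjacency-set input format are assumptions made for illustration.

```python
from itertools import combinations


def local_triangles(adj):
    """Exact number of triangles incident to each node.

    adj: {node: set of neighbours} for an undirected simple graph
    (hypothetical input format, chosen for this sketch).
    """
    counts = {}
    for v, nbrs in adj.items():
        # A triangle at v is a pair of neighbours of v that are adjacent.
        counts[v] = sum(
            1 for u, w in combinations(sorted(nbrs), 2) if w in adj[u]
        )
    return counts


def local_clustering(adj):
    """Local clustering coefficient: triangles at v over possible neighbour pairs."""
    tri = local_triangles(adj)
    coeff = {}
    for v, nbrs in adj.items():
        d = len(nbrs)
        coeff[v] = 2.0 * tri[v] / (d * (d - 1)) if d > 1 else 0.0
    return coeff


# Toy graph with two triangles: (1, 2, 3) and (1, 3, 4).
adj = {1: {2, 3, 4}, 2: {1, 3}, 3: {1, 2, 4}, 4: {1, 3}}
triangles = local_triangles(adj)
clustering = local_clustering(adj)
```

The exact count inspects every pair of neighbours of every node, which is the kind of cost that motivates approximation on massive graphs.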
PageRank as a Function of the Damping Factor
, 2005
Cited by 59 (10 self)
PageRank is defined as the stationary state of a Markov chain. The chain is obtained by perturbing the transition matrix induced by a web graph with a damping factor α that spreads uniformly part of the rank. The choice of α is eminently empirical, and in most cases the original suggestion α = 0.85 by Brin and Page is still used. Recently, however, the behaviour of PageRank with respect to changes in α was discovered to be useful in link-spam detection [21]. Moreover, an analytical justification of the value chosen for α is still missing. In this paper, we give the first mathematical analysis of PageRank when α changes. In particular, we show that, contrary to popular belief, for real-world graphs values of α close to 1 do not give a more meaningful ranking. Then, we give closed-form formulae for PageRank derivatives of any order, and an extension of the Power Method that approximates them with convergence O(t^k α^t) for the k-th derivative. Finally, we show a tight connection between iterated computation and analytical behaviour by proving that the k-th iteration of the Power Method gives exactly the PageRank value obtained using a Maclaurin polynomial of degree k. The latter result paves the way towards the application of analytical methods to the study of PageRank.
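As a rough illustration of the model this abstract refers to, the sketch below runs the Power Method on a tiny graph with a damping factor (called `alpha` here, defaulting to the 0.85 mentioned above). The successor-list input format, the uniform starting vector, and the uniform redistribution of rank held by dangling nodes are assumptions of this sketch, not details taken from the paper.

```python
def pagerank(out_links, alpha=0.85, iterations=50):
    """Power Method for PageRank on {node: [successors]} (assumed format)."""
    nodes = sorted(out_links)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}  # uniform starting vector (assumption)
    for _ in range(iterations):
        # Rank held by nodes without out-links is spread uniformly (assumption).
        dangling = sum(rank[v] for v in nodes if not out_links[v])
        nxt = {v: (1.0 - alpha) / n + alpha * dangling / n for v in nodes}
        for v in nodes:
            for w in out_links[v]:
                # Each node passes a damped, equal share to its successors.
                nxt[w] += alpha * rank[v] / len(out_links[v])
        rank = nxt
    return rank


# Toy graph: "d" has no in-links, "c" is pointed to by every other node.
graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(graph)
```

Re-running the same function with different values of `alpha` is one simple way to observe how the ranking responds to the damping factor on a small example.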
Four Degrees of Separation
, 2012
Cited by 54 (4 self)
Frigyes Karinthy, in his 1929 short story “Láncszemek” (“Chains”) suggested that any two persons are distanced by at most six friendship links. Stanley Milgram in his famous experiment [20, 23] challenged people to route postcards to a fixed recipient by passing them only through direct acquaintances. The average number of intermediaries on the path of the postcards lay between 4.4 and 5.7, depending on the sample of people chosen. We report the results of the first world-scale social-network graph-distance computations, using the entire Facebook network of active users (≈ 721 million users, ≈ 69 billion friendship links). The average distance we observe is 4.74, corresponding to 3.74 intermediaries or “degrees of separation”, showing that the world is even smaller than we expected, and
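The relation between average distance and intermediaries quoted in this abstract (4.74 links, hence 3.74 intermediaries) can be made concrete on a toy graph. The sketch below computes the exact average distance with one breadth-first search per node; this only illustrates the definition and is feasible for small graphs only. The function names and input format are assumptions.

```python
from collections import deque


def bfs_distances(adj, source):
    """Shortest-path distances (in links) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def average_distance(adj):
    """Mean distance over all ordered pairs of distinct, connected nodes."""
    total = 0
    pairs = 0
    for s in adj:
        for t, d in bfs_distances(adj, s).items():
            if t != s:
                total += d
                pairs += 1
    return total / pairs


# Toy friendship chain a - b - c - d.
chain = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
avg = average_distance(chain)
intermediaries = avg - 1  # people strictly between the two endpoints
```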
Symmetry Breaking in Anonymous Networks: Characterizations
, 1996
Cited by 48 (10 self)
We characterize exactly the cases in which it is possible to elect a leader in an anonymous network of processors by a deterministic algorithm, and we show that for every network there is a weak election algorithm (i.e., if election is impossible all processors detect this fact in a distributed way). 1 Introduction. We consider the problem of electing a leader in an anonymous network of processors. More precisely, our model is that of a directed graph, with vertices corresponding to processors, and arcs to communication links (we freely interchange symmetric digraphs and undirected graphs). We make no assumption on the structure of the network: self-loops and parallel arcs are allowed. In particular, processors are anonymous: they do not have unique identifiers. We consider both synchronous and asynchronous processor activation models, and models with and without "port awareness" (local names for outgoing and/or for incoming arcs). We consider both unidirectional and bidirectional links. ...
Main-memory triangle computations for very large (sparse (power-law)) graphs
 Theor. Comput. Sci
Cited by 43 (0 self)
Finding, counting and/or listing triangles (three vertices with three edges) in massive graphs are natural fundamental problems, which have recently received much attention because of their importance in complex network analysis. We provide here a detailed survey of proposed main-memory solutions to these problems, in a unified way. We note that previous authors paid surprisingly little attention to the space complexity of main-memory solutions, despite its fundamental and practical interest. We therefore detail the space complexities of known algorithms and discuss their implications. We also present new algorithms which are time-optimal for triangle listing and beat previous algorithms concerning space needs. They have the additional advantage of performing better on power-law graphs, which we also detail. We finally show with an experimental study that these two algorithms perform very well in practice, making it possible to handle cases which were previously out of reach. 1 Introduction. A triangle in an undirected graph is a set of three vertices such that each possible edge between them is present in the graph. Following classical conventions, we call finding, counting and listing the problems of