Link spam detection based on mass estimation
- In Proceedings of the 32nd International Conference on Very Large Data Bases. ACM, 2006
Cited by 21 (2 self)
Link spamming intends to mislead search engines and trigger an artificially high link-based ranking of specific target web pages. This paper introduces the concept of spam mass, a measure of the impact of link spamming on a page’s ranking. We discuss how to estimate spam mass and how the estimates can help identify pages that benefit significantly from link spamming. In our experiments on the host-level Yahoo! web graph we use spam mass estimates to successfully identify tens of thousands of instances of heavy-weight link spamming.
Fast Parallel PageRank: A Linear System Approach
- 2004
Cited by 20 (2 self)
In this paper we investigate the convergence of iterative stationary and Krylov subspace methods for the PageRank linear system, including the convergence dependency on teleportation. We demonstrate that linear system iterations converge faster than the simple power method and are less sensitive to the changes in teleportation. In order to perform this study we developed a framework for parallel PageRank computing. We describe the details of the parallel implementation and provide experimental results obtained on a 70-node Beowulf cluster.
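The contrast between the power method and a stationary iteration on the PageRank linear system can be sketched in a few lines. The example below is an illustration only, not the authors' parallel framework: it assumes a uniform teleportation vector, a teleportation coefficient `c`, a graph given as adjacency lists in which every node has at least one out-link and no self-loops, and it uses Gauss-Seidel as one representative stationary method. The function names are hypothetical.

```python
def power_method(out_links, c=0.85, tol=1e-10, max_iter=10000):
    """Power iteration x <- c * P^T x + (1 - c) * v with uniform v."""
    n = len(out_links)
    x = [1.0 / n] * n
    for it in range(1, max_iter + 1):
        new = [(1.0 - c) / n] * n
        for i, targets in enumerate(out_links):
            share = c * x[i] / len(targets)
            for j in targets:
                new[j] += share
        err = sum(abs(a - b) for a, b in zip(new, x))
        x = new
        if err < tol:
            return x, it
    return x, max_iter


def gauss_seidel(out_links, c=0.85, tol=1e-10, max_iter=10000):
    """Gauss-Seidel sweeps on (I - c * P^T) x = (1 - c) * v with uniform v."""
    n = len(out_links)
    # Build in-link lists so each row of the linear system can be evaluated.
    in_links = [[] for _ in range(n)]
    for i, targets in enumerate(out_links):
        for j in targets:
            in_links[j].append(i)
    x = [1.0 / n] * n
    for it in range(1, max_iter + 1):
        err = 0.0
        for j in range(n):
            # Entries of x updated earlier in this sweep are reused immediately.
            val = (1.0 - c) / n + c * sum(
                x[i] / len(out_links[i]) for i in in_links[j]
            )
            err += abs(val - x[j])
            x[j] = val
        if err < tol:
            return x, it
    return x, max_iter


if __name__ == "__main__":
    graph = [[1, 2], [2], [0], [0, 2]]  # hypothetical 4-node graph
    for c in (0.85, 0.99):
        _, power_iters = power_method(graph, c)
        _, gs_iters = gauss_seidel(graph, c)
        print(c, power_iters, gs_iters)
```

Running the script prints the iteration counts of both methods for two values of `c`, which gives a small-scale view of how each method reacts to a change in teleportation. Both functions should return the same vector up to the tolerance.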
Unsupervised modeling of object categories using link analysis techniques
- In CVPR, 2008
Cited by 16 (4 self)
We propose an approach for learning visual models of object categories in an unsupervised manner. We first build a large-scale complex network that captures the interactions of all unit visual features across the entire training set, and we then infer information, such as which features belong to which categories, directly from the graph by using link analysis techniques. These techniques are based on well-established graph mining methods used in diverse applications such as the WWW, bioinformatics, and social networks. They operate directly on the patterns of connections between features in the graph rather than on statistical properties, e.g., from clustering in feature space. We argue that the resulting techniques are simpler, and we show that they perform similarly to or better than state-of-the-art techniques on common data sets. We also show results on more challenging data sets than those used in prior work on unsupervised modeling.
Efficient and decentralized pagerank approximation in a peer-to-peer web search network
- In VLDB, 2006
Cited by 14 (4 self)
PageRank-style (PR) link analyses are a cornerstone of Web search engines and Web mining, but they are computationally expensive. Recently, various techniques have been proposed for speeding up these analyses by distributing the link graph among multiple sites. However, none of these advanced methods is suitable for a fully decentralized PR computation in a peer-to-peer (P2P) network with autonomous peers, where each peer can independently crawl Web fragments according to the user’s thematic interests. In such a setting the graph fragments that different peers have locally available or know about may arbitrarily overlap among peers, creating additional complexity for the PR computation. This paper presents the JXP algorithm for dynamically and collaboratively computing PR scores of Web pages that are arbitrarily distributed in a P2P network. The algorithm runs at every peer, and it works by combining locally computed PR scores with random meetings among the peers in the network. It is scalable as the number of peers on the network grows, and experiments as well as theoretical arguments show that JXP scores converge to the true PR scores that one would obtain by a centralized computation.
Dynamic context-sensitive pagerank for expertise mining
- In Social Informatics, volume 6430 of LNCS, 2010
Cited by 12 (10 self)
Online tools for collaboration and social platforms have become omnipresent in Web-based environments. The interests and skills of people evolve over time depending on performed activities and joint collaborations. We believe that ranking models for recommending experts or collaboration partners should not rely only on profiles or skill information that must be manually maintained and updated by the user. In this work we address the problem of expertise mining based on performed interactions between people. We argue that an expertise mining algorithm must consider a person’s interest and activity level in a certain collaboration context. Our approach is based on the PageRank algorithm enhanced by techniques to incorporate contextual link information. It comprises two steps: first, an offline analysis of human interactions that considers tagged interaction links; second, the composition of ranking scores based on preferences. We evaluate our approach using an email interaction network.
Competing to Share Expertise: the Taskcn Knowledge Sharing Community
Cited by 7 (0 self)
"Witkeys" are websites in China that form a rapidly growing web-based knowledge market. A user who posts a task also offers a small fee, and many other users submit their answers to compete. The "Witkey" sites fall between the now-defunct Google Answers (vetted experts answer questions for a fee) and Yahoo Answers (anyone can answer or ask a question). As such, these sites promise new possibilities for knowledge-sharing online communities, perhaps fostering the freelance marketplace of the future. In this paper, we investigate one of the biggest "Witkey" websites in China, Taskcn.com. In particular, we apply social network prestige measures to a novel construction of user and task networks based on competitive outcomes to discover the underlying properties of both users and tasks. Our results demonstrate the power of this approach: our analysis allows us to infer the relative expertise of the users and provides an understanding of the participation structure in Taskcn. The results suggest challenges and opportunities for this kind of knowledge sharing medium.
PAGERANK COMPUTATION, WITH SPECIAL ATTENTION TO DANGLING NODES
Cited by 5 (0 self)
We present a simple algorithm for computing the PageRank (stationary distribution) of the stochastic Google matrix G. The algorithm lumps all dangling nodes into a single node. We express lumping as a similarity transformation of G, and show that the PageRank of the nondangling nodes can be computed separately from that of the dangling nodes. The algorithm applies the power method only to the smaller lumped matrix, but the convergence rate is the same as that of the power method applied to the full matrix G. The efficiency of the algorithm increases as the number of dangling nodes increases. We also extend the expression for PageRank and the algorithm to more general Google matrices that have several different dangling node vectors, when it is required to distinguish among different classes of dangling nodes. We also analyze the effect of the dangling node vector on the PageRank, and show that the PageRank of the dangling nodes depends strongly on that of the nondangling nodes but not vice versa. Finally, we present a Jordan decomposition of the Google matrix for the (theoretical) extreme case when all web pages are dangling nodes.
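The lumping idea in this abstract can be illustrated on a toy graph. The sketch below is an assumption-laden reading of the abstract, not the authors' code: it uses dense matrices, a uniform personalization vector and a uniform dangling node vector, and it requires the nondangling nodes to be numbered before the dangling ones. All function names are hypothetical. Since every dangling row of G equals the same vector u, the dangling nodes are merged into one state, the power method runs on the (k+1)-by-(k+1) lumped matrix, and the dangling scores are recovered with one extra vector-matrix product.

```python
def google_matrix(out_links, alpha=0.85):
    """Dense row-stochastic Google matrix; uniform v and w are assumptions."""
    n = len(out_links)
    G = []
    for targets in out_links:
        if targets:
            row = [(1.0 - alpha) / n] * n
            for j in targets:
                row[j] += alpha / len(targets)
        else:
            # Dangling row: alpha * w + (1 - alpha) * v, both uniform here.
            row = [1.0 / n] * n
        G.append(row)
    return G


def stationary(M, tol=1e-12, max_iter=10000):
    """Power method for the left stationary vector of a row-stochastic matrix."""
    n = len(M)
    x = [1.0 / n] * n
    for _ in range(max_iter):
        new = [sum(x[i] * M[i][j] for i in range(n)) for j in range(n)]
        if sum(abs(a - b) for a, b in zip(new, x)) < tol:
            return new
        x = new
    return x


def lumped_pagerank(out_links, alpha=0.85):
    """PageRank via lumping; nodes 0..k-1 nondangling, k..n-1 dangling."""
    n = len(out_links)
    k = sum(1 for targets in out_links if targets)
    if not 0 < k < n:
        raise ValueError("need at least one dangling and one nondangling node")
    if not all(out_links[i] for i in range(k)):
        raise ValueError("nondangling nodes must be numbered first")
    G = google_matrix(out_links, alpha)
    u = G[k]  # every dangling row of G is this same vector
    # Lumped matrix: the dangling columns are summed into one extra column.
    lumped = [G[i][:k] + [sum(G[i][k:])] for i in range(k)]
    lumped.append(u[:k] + [sum(u[k:])])
    sigma = stationary(lumped)
    # Recover the individual dangling scores from sigma in one product.
    rows = G[:k] + [u]
    tail = [sum(sigma[i] * rows[i][j] for i in range(k + 1)) for j in range(k, n)]
    return sigma[:k] + tail
```

On a small graph such as `[[1, 2, 3], [2, 4], [0, 3], [], []]`, `lumped_pagerank` should agree with `stationary(google_matrix(...))` on the full matrix up to the iteration tolerance, while iterating on a 4-by-4 matrix instead of a 5-by-5 one.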
A singular perturbation approach for choosing PageRank damping factor
- arXiv:math.PR/0612079. G. Kollias, E. Gallopoulos, 2006
Cited by 4 (0 self)
We study the PageRank mass of principal components in a bow-tie Web graph as a function of the damping factor c. It is known that the Web graph can be divided into three principal components: SCC, IN and OUT. The Giant Strongly Connected Component (SCC) contains a large group of pages all having a hyperlink path to each other. The pages in the IN (OUT) component have a path to (from) the SCC, but not back. Using a singular perturbation approach, we show that the PageRank share of the IN and SCC components remains high even for very large values of the damping factor, in spite of the fact that it drops to zero when c tends to one. However, a detailed study of the OUT component reveals the presence of “dead-ends” (small groups of pages linking only to each other) that receive an unfairly high ranking when c is close to one. We argue that this problem can be mitigated by choosing c as small as 1/2.
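The effect of the damping factor on dead-ends can be observed on a toy bow-tie graph. The following sketch is illustrative only and does not reproduce the singular perturbation analysis: the six-node graph, the uniform teleportation vector, and the names `BOW_TIE`, `DEAD_END`, `pagerank` and `dead_end_mass` are all assumptions made for this example. Node 0 plays the role of IN, nodes 1 to 3 form a small SCC, and nodes 4 and 5 form a dead-end pair in OUT that link only to each other.

```python
# Hypothetical bow-tie graph: 0 is IN, 1-3 form the SCC, 4-5 are a dead-end pair.
BOW_TIE = [[1], [2], [3], [1, 4], [5], [4]]
DEAD_END = (4, 5)


def pagerank(out_links, c, tol=1e-12, max_iter=200000):
    """Power iteration with damping factor c and uniform teleportation."""
    n = len(out_links)
    x = [1.0 / n] * n
    for _ in range(max_iter):
        new = [(1.0 - c) / n] * n
        for i, targets in enumerate(out_links):
            share = c * x[i] / len(targets)
            for j in targets:
                new[j] += share
        if sum(abs(a - b) for a, b in zip(new, x)) < tol:
            return new
        x = new
    return x


def dead_end_mass(c):
    """Total PageRank held by the dead-end pair for damping factor c."""
    scores = pagerank(BOW_TIE, c)
    return sum(scores[i] for i in DEAD_END)


if __name__ == "__main__":
    for c in (0.5, 0.85, 0.99):
        print(c, round(dead_end_mass(c), 4))
```

In this toy graph the share of the dead-end pair grows as c increases and dominates the ranking when c is close to one, while the IN and SCC nodes lose almost all of their mass, which matches the qualitative behavior described in the abstract.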

