Using Rank Propagation and Probabilistic Counting for Link-Based Spam Detection (2006)

by Luca Becchetti, Carlos Castillo, Debora Donato, Stefano Leonardi, Ricardo Baeza-Yates
In Proceedings of the Workshop on Web Mining and Web Usage Analysis (WebKDD

Abstract:

This paper describes a technique for automating the detection of Web link spam, that is, groups of pages that are linked together with the sole purpose of obtaining an undeservedly high score in search engines. The problem of Web spam is widespread and difficult to solve, mostly because the large size of Web collections makes many algorithms infeasible in practice.
