Web Spam Detection with Anti-Trust Rank (2006)
Abstract:
Spam pages on the web use various techniques to artificially achieve high rankings in search engine results. Human experts can do a good job of identifying spam pages and pages whose information is of dubious quality, but it is practically infeasible to use human effort for a large number of pages. Similar to the Trust Rank algorithm [1], we propose a method of selecting a seed set of pages to be evaluated by a human. We then use the link structure of the web and the manually labeled seed set to detect other spam pages. Our experiments on the WebGraph dataset [3] show that our approach is very effective at detecting spam pages from a small seed set. It achieves higher precision of spam page detection than the Trust Rank algorithm and, on average, detects spam pages with higher PageRank [10, 11].
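The abstract does not spell out how the labeled seed set and the link structure are combined. One plausible reading, offered here only as a minimal sketch, is a seed-biased PageRank run on the reversed link graph, so that distrust flows from a known spam page to the pages that link to it. The function name `anti_trust_scores`, its parameters, the damping value 0.85, and the handling of pages without in-links are all assumptions made for illustration and are not taken from the paper.

```python
def anti_trust_scores(out_links, spam_seeds, damping=0.85, iterations=50):
    """Propagate distrust backwards along links from hand-labeled spam seeds.

    out_links: dict mapping a page to the list of pages it links to.
    spam_seeds: iterable of pages a human has labeled as spam.
    All names and defaults here are illustrative assumptions.
    """
    # Collect every page that appears as a source or a target.
    nodes = set(out_links)
    for targets in out_links.values():
        nodes.update(targets)

    # Build the reversed graph: for each page, the pages that link to it.
    in_links = {n: [] for n in nodes}
    for src, targets in out_links.items():
        for dst in targets:
            in_links[dst].append(src)

    seeds = list(set(spam_seeds) & nodes)
    if not seeds:
        raise ValueError("at least one seed must be a page in the graph")

    # Teleport distribution is uniform over the spam seeds only.
    teleport = {n: 0.0 for n in nodes}
    for s in seeds:
        teleport[s] = 1.0 / len(seeds)

    scores = dict(teleport)
    for _ in range(iterations):
        nxt = {n: (1.0 - damping) * teleport[n] for n in nodes}
        for page, score in scores.items():
            linkers = in_links[page]
            if linkers:
                # Split this page's distrust among the pages linking to it.
                share = damping * score / len(linkers)
                for linker in linkers:
                    nxt[linker] += share
            else:
                # Assumption: pages with no in-links return mass to the seeds.
                for s in seeds:
                    nxt[s] += damping * score / len(seeds)
        scores = nxt
    return scores
```

Under this sketch, pages with a high score are the ones that reach the seed spam pages through short link paths, and a threshold on the score would flag them for review. Pages with no path to any seed keep a score of zero.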
Citations
| 1870 | The anatomy of a large-scale hypertextual web search engine – Brin, Page - 1998 |
| 357 | The EigenTrust algorithm for reputation management in P2P networks – Kamvar, Schlosser, et al. |
| 9 | The PageRank citation ranking: Bringing order to the – Page, Brin, et al. - 1998 |
| 3 | Combating Web Spam with Trust – Gyöngyi, Garcia-Molina, et al. - 2004 |
| 3 | Taher Haveliwala – Rank - 2002 |
| 3 | to Whom: Mining Linkage between Web – Links - 2001 |
| 2 | Link Spam Alliances – Gyöngyi, Garcia-Molina - 2005 |

