
Web Spam Detection with Anti-Trust Rank (2006)

by Vijay Krishnan, Stanford

Abstract:

Spam pages on the web use various techniques to artificially achieve high rankings in search engine results. Human experts can do a good job of identifying spam pages and pages whose information is of dubious quality, but it is practically infeasible to apply human effort to a large number of pages. Similar to the TrustRank algorithm [1], we propose a method of selecting a seed set of pages to be evaluated by a human. We then use the link structure of the web and the manually labeled seed set to detect other spam pages. Our experiments on the WebGraph dataset [3] show that our approach is very effective at detecting spam pages from a small seed set. On average, it achieves higher precision of spam page detection than the TrustRank algorithm, and the pages it detects have higher PageRank [10, 11].
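The abstract does not spell out the propagation rule, so the following is only a minimal Python sketch of one plausible reading, assuming Anti-Trust Rank inverts TrustRank: a handful of high-PageRank pages are shown to a human, and the ones judged to be spam seed a biased (personalized) PageRank that runs on the *reversed* link graph, so distrust flows from a spam page back to the pages that link to it. The helper names (`pagerank`, `reverse_links`, `anti_trust_rank`, `pick_seed_candidates`), the damping factor 0.85, the 50 iterations and the toy graph are all hypothetical choices for illustration, not taken from the paper.

```python
from collections import defaultdict


def pagerank(out_links, alpha=0.85, iters=50, bias=None):
    """Power-iteration PageRank over a dict {page: [linked pages]}.

    `bias` is the teleport distribution (uniform if None). Assumption: mass
    from dangling pages (no out-links) is simply dropped, so scores may sum
    to less than 1; this is enough for ranking in a sketch.
    """
    nodes = set(out_links)
    for targets in out_links.values():
        nodes.update(targets)
    if bias is None:
        bias = {n: 1.0 / len(nodes) for n in nodes}
    score = {n: bias.get(n, 0.0) for n in nodes}
    for _ in range(iters):
        # Teleport step: restart only at pages in the bias distribution.
        nxt = {n: (1 - alpha) * bias.get(n, 0.0) for n in nodes}
        # Link step: each page splits its score evenly over its out-links.
        for u in nodes:
            targets = out_links.get(u, [])
            if targets:
                share = alpha * score[u] / len(targets)
                for v in targets:
                    nxt[v] += share
        score = nxt
    return score


def reverse_links(out_links):
    """Transpose the graph: an edge u -> v becomes v -> u."""
    rev = defaultdict(list)
    for u, targets in out_links.items():
        for v in targets:
            rev[v].append(u)
    return dict(rev)


def pick_seed_candidates(pr, k):
    """Hypothetical seed selection: the k pages with the highest PageRank,
    which would then be handed to a human for spam/non-spam labeling."""
    return sorted(pr, key=pr.get, reverse=True)[:k]


def anti_trust_rank(out_links, spam_seeds, alpha=0.85, iters=50):
    """Biased PageRank on the reversed graph, teleporting only to the
    human-labeled spam seeds; high scores mark pages that link (directly
    or within a few hops) to known spam."""
    bias = {s: 1.0 / len(spam_seeds) for s in spam_seeds}
    return pagerank(reverse_links(out_links), alpha, iters, bias)


if __name__ == "__main__":
    # Toy graph (hypothetical): spam1 and spam2 form a link farm, while
    # c and d link into it; f -> g never reaches the spam pages.
    links = {
        "a": ["b"], "b": ["a", "c"], "c": ["spam1"], "d": ["spam1"],
        "e": ["a"], "spam1": ["spam2"], "spam2": ["spam1"],
        "f": ["g"], "g": [],
    }
    print("candidates for human review:", pick_seed_candidates(pagerank(links), 3))
    scores = anti_trust_rank(links, {"spam1"})
    for page in sorted(scores, key=scores.get, reverse=True):
        print(f"{page:6s} {scores[page]:.4f}")
```

In this sketch a page's anti-trust decays with its link distance from a labeled spam page, so `d` (one hop from `spam1`) scores well above `e` (several hops away), and pages with no path into the spam seeds, such as `f` and `g`, score exactly zero. Thresholding or ranking these scores would yield the list of suspected spam pages.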

Citations

The anatomy of a large-scale hypertextual web search engine – Brin, Page (1998)
The EigenTrust algorithm for reputation management in P2P networks – Kamvar, Schlosser, et al.
The PageRank citation ranking: Bringing order to the … – Page, Brin, et al. (1998)
Combating Web Spam with Trust – Gyöngyi, Garcia-Molina, et al. (2004)
… Rank – Haveliwala (2002)
… Links to Whom: Mining Linkage between Web … (2001)
Link Spam Alliances – Gyöngyi, Garcia-Molina (2005)