Mining page farms and its application in link spam detection (2007)
Citations: 4 (2 self)
BibTeX
@TECHREPORT{Zhou07miningpage,
author = {Bin Zhou},
title = {Mining page farms and its application in link spam detection},
institution = {},
year = {2007}
}
Abstract
Understanding the general relations between Web pages and their environments is important and has several interesting applications, such as Web spam detection. In this thesis, we study the novel problem of page farm mining and its application in link spam detection. A page farm is the set of Web pages contributing to (a major portion of) the PageRank score of a target page. We show that extracting page farms is computationally expensive, and propose heuristic methods. We propose the concept of link spamicity, based on page farms, to measure the degree to which a Web page is link spam. Using a real sample of more than 3 million Web pages, we analyze the statistics of page farms. We examine the effectiveness of our spamicity-based link spam detection methods using a newly available real data set of spam pages. The empirical results strongly indicate that our methods are effective.
Keywords: page farm; link spam; PageRank; link spamicity
Subject Terms: Web search engines; Text processing (Computer science); World Wide Web







