Mercator: A Scalable, Extensible Web Crawler (1999) [102 citations — 4 self]
http://www.research.digital.com/SRC/mercator/paper
http://research.microsoft.com/~najork/mercator.pdf
http://mias.uiuc.edu/files/tutorials/mercator.pdf
Abstract:
This paper describes Mercator, a scalable, extensible web crawler written entirely in Java. Scalable web ...

1 Introduction

Designing a scalable web crawler comparable to the ones used by the major search engines is a complex endeavor. However, due to the competitive nature of the search engine business, there are few papers in the literature describing the challenges and tradeoffs inherent in web crawler design. This paper's main contribution is to fill that gap. It de...

