Mercator: A Scalable, Extensible Web Crawler (1999) [102 citations — 4 self]

Abstract:

This paper describes Mercator, a scalable, extensible web crawler written entirely in Java. Scalable web...

1 Introduction

Designing a scalable web crawler comparable to the ones used by the major search engines is a complex endeavor. However, due to the competitive nature of the search engine business, there are few papers in the literature describing the challenges and tradeoffs inherent in web crawler design. This paper's main contribution is to fill that gap. It de...

Citations

1839 The Anatomy of a Large-Scale Hypertextual Web Search Engine – Brin, Page - 1998
825 Space/time trade-offs in hash coding with allowable errors – Bloom - 1970
200 Efficient crawling through URL ordering – Cho, Garcia-Molina, et al. - 1998
129 Fingerprinting by random polynomials – Rabin - 1981
113 Finding What People Want: Experiences with the Web Crawler – Pinkerton - 1994
93 GENVL and WWWW: Tools for Taming the Web – McBryan - 1994
62 SPHINX: A Framework for Creating Personal, Site-Specific Web Crawlers – Miller, Bharat - 1998
53 Measuring Index Quality using Random Walks on the Web – Henzinger, Heydon, et al. - 1999
52 Some applications of Rabin's fingerprinting method – Broder - 1993
48 The RBSE Spider – Balancing Effective Search Against Web Load – Eichmann - 1994
36 Crawling Towards Eternity: Building an Archive of the World – Burner - 1997
22 Performance Limitations of the Java Core Libraries – Heydon, Najork - 1999
5 The truth about the Web: crawling towards eternity – Smith - 1997
1 srcjava home – Ghemawat