Mercator: A scalable, extensible web crawler (1999)

by Allan Heydon, Marc Najork
Venue: World Wide Web
Citations: 172 (5 self)
BibTeX

@ARTICLE{Heydon99mercator:a,
    author = {Allan Heydon and Marc Najork},
    title = {Mercator: A scalable, extensible web crawler},
    journal = {World Wide Web},
    year = {1999}
}

Abstract

This paper describes Mercator, a scalable, extensible web crawler written entirely in Java. Scalable web crawlers are an important component of many web services, but their design is not well-documented in the literature. We enumerate the major components of any scalable web crawler, comment on alternatives and tradeoffs in their design, and describe the particular components used in Mercator. We also describe Mercator’s support for extensibility and customizability. Finally, we comment on Mercator’s performance, which we have found to be comparable to that of other crawlers for which performance numbers have been published.

Keyphrases

extensible web crawler    scalable web crawler    major component    particular component    mercator support    performance number    mercator performance    important component    many web service   
