Effective Page Refresh Policies for Web Crawlers (2003)
| Venue: | ACM TRANSACTIONS ON DATABASE SYSTEMS |
| Citations: | 50 - 3 self |
BibTeX
@ARTICLE{Cho03effectivepage,
author = {Junghoo Cho and Hector Garcia-Molina},
title = {Effective Page Refresh Policies for Web Crawlers},
journal = {ACM TRANSACTIONS ON DATABASE SYSTEMS},
year = {2003},
volume = {28},
pages = {390--426}
}
Abstract
In this paper we study how to keep local copies of remote data sources "fresh" when the source data is updated autonomously and independently. In particular, we study the problem of Web crawlers that maintain local copies of remote Web pages for Web search engines. In this context, remote data sources (Web sites) do not notify the copies (Web crawlers) of new changes, so we need to poll the sources periodically to keep the copies up to date. Since polling the sources takes significant time and resources, it is very difficult to keep the copies completely up to date. This paper







