Sic Transit Gloria Telae: Towards an Understanding of the Web's Decay
- In Proceedings of the 13th conference on World Wide Web, 2004
Cited by 37 (0 self)
The rapid growth of the web has been noted and tracked extensively. Recent studies have however documented the dual phenomenon: web pages have small half lives, and thus the web exhibits rapid death as well. Consequently, page creators are faced with an increasingly burdensome task of keeping links up-to-date, and many are falling behind. In addition to just individual pages, collections of pages or even entire neighborhoods of the web exhibit significant decay, rendering them less effective as information resources. Such neighborhoods are identified only by frustrated searchers, seeking a way out of these stale neighborhoods, back to more up-to-date sections of the web; measuring the decay of a page purely on the basis of dead links on the page is too naive to reflect this frustration. In this paper we formalize a strong notion of a decay measure and present algorithms for computing it efficiently. We explore this measure by presenting a number of validations, and use it to identify interesting artifacts on today's web. We then describe a number of applications of such a measure to search engines, web page maintainers, ontologists, and individual users.
The availability and persistence of web references in D-Lib Magazine
- In Proceedings of the 5th International Web Archiving Workshop (IWAW ’05), 2005
Cited by 4 (2 self)
Abstract. We explore the availability and persistence of URLs cited in articles published in D-Lib Magazine. We extracted 4387 unique URLs referenced in 453 articles published from July 1995 to August 2004. The availability was checked three times a week for 25 weeks from September 2004 to February 2005. We found that approximately 28% of those URLs failed to resolve initially, and 30% failed to resolve at the last check. A majority of the unresolved URLs were due to 404 (page not found) and 500 (internal server error) errors. The content pointed to by the URLs was relatively stable; only 16% of the content registered more than a 1 KB change during the testing period. We explore possible factors which may cause a URL to fail by examining its age, path depth, top-level domain and file extension. Based on the data collected, we found the half-life of a URL referenced in a D-Lib Magazine article is approximately 10 years. We also found that URLs were more likely to be unavailable if they pointed to resources in the .net, .edu or country-specific top-level domain, used non-standard ports (i.e., not port 80), or pointed to resources with uncommon or deprecated extensions (e.g., .shtml, .ps, .txt).
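The URL properties this abstract names (path depth, top-level domain, port, file extension) can be read off a URL with the Python standard library. The sketch below is illustrative only and is not the authors' method: the helper names `url_features` and `risk_factors` are hypothetical, path depth is assumed to mean the number of non-empty path segments, a two-letter TLD is assumed to be country-specific, and a URL with no explicit port is assumed to use port 80.

```python
from urllib.parse import urlsplit
import posixpath

# Extensions the abstract lists as examples of uncommon or deprecated ones.
RISKY_EXTENSIONS = {".shtml", ".ps", ".txt"}
# Generic TLDs the abstract associates with lower availability.
RISKY_GENERIC_TLDS = {"net", "edu"}


def url_features(url):
    """Extract the URL properties mentioned in the abstract (hypothetical helper)."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    tld = host.rsplit(".", 1)[-1] if "." in host else ""
    # Assumption: an HTTP URL without an explicit port uses port 80.
    port = parts.port if parts.port is not None else 80
    # Assumption: path depth is the number of non-empty path segments.
    segments = [s for s in parts.path.split("/") if s]
    extension = posixpath.splitext(parts.path)[1].lower()
    return {
        "tld": tld,
        # Assumption: two-letter alphabetic TLDs are country-specific.
        "country_tld": len(tld) == 2 and tld.isalpha(),
        "port": port,
        "path_depth": len(segments),
        "extension": extension,
    }


def risk_factors(url):
    """Return which of the abstract's three availability risk factors apply."""
    f = url_features(url)
    factors = []
    if f["tld"] in RISKY_GENERIC_TLDS or f["country_tld"]:
        factors.append("tld")
    if f["port"] != 80:
        factors.append("port")
    if f["extension"] in RISKY_EXTENSIONS:
        factors.append("extension")
    return factors
```

For example, `risk_factors("http://example.net:8080/docs/paper.ps")` flags all three factors, while `risk_factors("http://www.example.com/index.html")` flags none. The output only labels which factors are present; it does not estimate how likely a URL is to fail.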

