Topical Locality in the Web
In Proceedings of the 23rd Annual International Conference on Research and Development in Information Retrieval (SIGIR 2000), 2000
Cited by 108 (8 self)
Most web pages are linked to others with related content. This idea, combined with another that says that text in, and possibly around, HTML anchors describes the pages to which they point, is the foundation for a usable World-Wide Web. In this paper, we examine to what extent these ideas hold by empirically testing whether topical locality mirrors spatial locality of pages on the Web. In particular, we find that the likelihood of linked pages having similar textual content is high; that the similarity of sibling pages increases when the links from the parent are close together; that titles, descriptions, and anchor text represent at least part of the target page; and that anchor text may be a useful discriminator among unseen child pages. These results show the foundations necessary for the success of many web systems, including search engines, focused crawlers, linkage analyzers, and intelligent web agents.
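A common way to quantify "similar textual content" between linked pages is cosine similarity over TF-IDF term vectors. The abstract does not say which measure the paper uses, so the sketch below simply assumes TF-IDF cosine; the helper names `tokenize`, `tfidf_vectors`, and `cosine` and the three page texts are hypothetical. It compares a parent page against a linked child page and an unrelated page:

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and keep runs of letters/digits as terms (a crude tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs: dict[str, str]) -> dict[str, dict[str, float]]:
    """Build a sparse TF-IDF vector (term -> weight) for every document."""
    term_counts = {name: Counter(tokenize(text)) for name, text in docs.items()}
    n_docs = len(docs)
    # Document frequency: the number of documents each term appears in.
    df = Counter(term for counts in term_counts.values() for term in counts)
    return {
        name: {term: tf * math.log(n_docs / df[term]) for term, tf in counts.items()}
        for name, counts in term_counts.items()
    }


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity of two sparse vectors; 0.0 if either vector has zero norm."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical pages: a parent, a page it links to, and an unrelated page.
pages = {
    "parent": "caching and prefetching web proxies reduce latency",
    "child": "web proxy caching reduces retrieval latency",
    "random": "recipes for baking sourdough bread at home",
}
vecs = tfidf_vectors(pages)
linked = cosine(vecs["parent"], vecs["child"])
unrelated = cosine(vecs["parent"], vecs["random"])
print(f"linked={linked:.3f} unrelated={unrelated:.3f}")
```

An empirical test of topical locality along these lines would average such scores over many (parent, linked page) pairs and compare them with scores for randomly paired pages.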
The Design and Evaluation of Web Prefetching and Caching Techniques
2002
Cited by 7 (2 self)
User-perceived retrieval latencies in the World Wide Web can be improved by pre-loading a local cache with resources likely to be accessed. A user requesting content that can be served by the cache is able to avoid the delays inherent in the Web, such as congested networks and slow servers. The difficulty, then, is to determine what content to prefetch into the cache. This work explores machine learning algorithms for user sequence prediction, both in general and specifically for sequences of Web requests. We also consider information retrieval techniques to allow the use of the content of Web pages to help predict future requests. Although history-based mechanisms can provide strong performance in predicting future requests, performance can be improved by including predictions from additional sources. While past researchers have used a variety of techniques for evaluating caching algorithms and systems, most of those methods were not applicable to the evaluation of prefetching algorithms or systems. Therefore, two new mechanisms for evaluation are introduced. The first is a detailed trace-based simulator, built from scratch,
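A minimal example of a history-based mechanism for predicting Web requests is a first-order Markov model: count which URL has followed each URL in past traffic and prefetch the most frequent successors of the current one. This is an illustrative choice, not necessarily one of the algorithms evaluated in the thesis; the class `MarkovPrefetcher`, the request trace, and the parameter `k` (how many URLs to prefetch) are hypothetical:

```python
from collections import Counter, defaultdict
from typing import Optional


class MarkovPrefetcher:
    """First-order Markov predictor: counts which URL followed each URL in past requests."""

    def __init__(self) -> None:
        # successors[a][b] = number of times a request for b immediately followed a.
        self.successors = defaultdict(Counter)
        self.previous: Optional[str] = None

    def observe(self, url: str) -> None:
        """Record a request and credit the transition from the previous request."""
        if self.previous is not None:
            self.successors[self.previous][url] += 1
        self.previous = url

    def predict(self, k: int = 2) -> list[str]:
        """Return up to k URLs most often seen right after the current request."""
        if self.previous is None:
            return []
        counts = self.successors.get(self.previous, Counter())
        return [url for url, _ in counts.most_common(k)]


# Hypothetical request trace for a single user.
trace = ["/index", "/news", "/index", "/news", "/index", "/sports", "/index"]
p = MarkovPrefetcher()
for url in trace:
    p.observe(url)
to_prefetch = p.predict(k=2)
print(to_prefetch)
```

After this trace, `/index` has been followed by `/news` twice and by `/sports` once, so those two URLs are the candidates to load into the cache. Predictions from other sources, such as page content, could be merged with these counts, in line with the abstract's observation that additional sources can improve on history alone.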
Topical Locality in the Web: Experiments and Observations
2000
Cited by 3 (2 self)
Most web pages are linked to others with related content. This idea, combined with another that says that text in, and possibly around, HTML anchors describes the pages to which they point, is the foundation for a usable World-Wide Web. In this paper, we examine to what extent these ideas hold by empirically testing whether topical locality mirrors spatial locality of pages on the Web. In particular, we find that the likelihood of linked pages having similar textual content is high; that the similarity of sibling pages increases when the links from the parent are close together; that titles, descriptions, and anchor text represent at least part of the target page; and that anchor text may be a useful discriminator among unseen child pages. These results present the foundations necessary for the success of many web systems, including search engines, focused crawlers, linkage analyzers, and intelligent web agents.

