A taxonomy of web search
SIGIR Forum, 2002
Cited by 319 (4 self)
Classic IR (information retrieval) is inherently predicated on users searching for information, the so-called "information need". But the need behind a web search is often not informational -- it might be navigational (give me the URL of the site I want to reach) or transactional (show me sites where I can perform a certain transaction, e.g., shop, download a file, or find a map). We explore this taxonomy of web searches and discuss how global search engines evolved to deal with web-specific needs.
Stuff I've seen: A system for personal information retrieval and re-use
SIGIR '03, 2003
Cited by 191 (7 self)
Most information retrieval technologies are designed to facilitate information discovery. However, much knowledge work involves finding and re-using previously seen information. We describe the design and evaluation of a system, called Stuff I've Seen (SIS), that facilitates information re-use. This is accomplished in two ways. First, the system provides a unified index of information that a person has seen, whether it was seen as email, web page, document, appointment, etc. Second, because the information has been seen before, rich contextual cues can be used in the search interface. The system has been used internally by more than 230 employees. We report on both qualitative and quantitative aspects of system use. Initial findings show that time and people are important retrieval cues. Users find information more easily using SIS, and use other search tools less frequently after installation.
Efficient data mining for path traversal patterns
IEEE Transactions on Knowledge and Data Engineering, 1998
Cited by 128 (10 self)
In this paper, we explore a new data mining capability that involves mining path traversal patterns in a distributed information-providing environment where documents or objects are linked together to facilitate interactive access. Our solution procedure consists of two steps. First, we derive an algorithm to convert the original sequence of log data into a set of maximal forward references. By doing so, we can filter out the effect of some backward references, which are mainly made for ease of traveling, and concentrate on mining meaningful user access sequences. Second, we derive algorithms to determine the frequent traversal patterns, i.e., large reference sequences, from the maximal forward references obtained. Two algorithms are devised for determining large reference sequences; one is based on some hashing and pruning techniques, and the other is further improved with the option of determining large reference sequences in batch so as to reduce the number of database scans required. Performance of these two methods is comparatively analyzed. It is shown that the option of selective scan is very advantageous and can lead to prominent performance improvement. Sensitivity analysis on various parameters is conducted. Index Terms: Data mining, traversal patterns, distributed information system, World Wide Web, performance analysis.
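The first step described in this abstract, converting a traversal log into maximal forward references, can be sketched roughly as follows. This is a minimal illustration based only on the abstract's description, not the authors' published algorithm; the function name and the single-letter page identifiers are assumptions made for the example.

```python
def maximal_forward_references(path):
    """Split one user's page sequence into maximal forward references.

    A backward reference is assumed to be a revisit of a page already on
    the current forward path. When one occurs, the forward path built so
    far is emitted and the path is rewound to the revisited page.
    """
    current = []        # pages on the current forward path
    result = []         # emitted maximal forward references
    extending = False   # True while the most recent move was forward
    for page in path:
        if page in current:
            # Backward move: emit the path once, then rewind to that page.
            if extending:
                result.append(list(current))
            current = current[: current.index(page) + 1]
            extending = False
        else:
            current.append(page)
            extending = True
    if extending and current:
        result.append(list(current))
    return result


# Hypothetical session: the user goes A, B, C, D, then backs up and branches.
session = list("ABCDCBEGHGWAOUOV")
references = ["".join(r) for r in maximal_forward_references(session)]
```

Each emitted sequence ends where the user turned back, so back-button moves made only for ease of traveling do not appear in the output.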
Summary of WWW Characterizations
World Wide Web, 1998
Cited by 78 (0 self)
To date there have been a number of efforts that attempt to characterize various aspects of the World Wide Web. This paper presents a summary of these efforts, highlighting regularities and invariants that have been discovered. Keywords: Statistics, Metrics, Analysis, and Modeling
Distributions of Surfers' Paths through the World Wide Web: Empirical Characterizations
World Wide Web, 1999
Cited by 52 (2 self)
Surfing the World Wide Web (WWW) involves traversing hyperlink connections among documents. The ability to predict surfing patterns could solve many problems facing producers and consumers of WWW content. We analyzed WWW server logs for a WWW site, collected over ten days, to compare different path reconstruction methods and to investigate how past surfing behavior predicts future surfing choices. Since log files do not explicitly contain user paths, various methods have evolved to reconstruct user paths. Session times, number of clicks per visit, and Levenshtein Distance analyses were performed to show the impact of various reconstruction methods. Different methods for measuring surfing patterns were also compared. Markov model approximations were used to model the probability of users choosing links conditional on past surfing paths. Information-theoretic (entropy) measurements suggest that information is gained by using longer paths to estimate the conditional probability of link choice given surf path. The improvements diminish, however, as one increases the length of path beyond one. Information-theoretic (Total Divergence to the Average entropy) measurements suggest that the conditional probabilities of link choice given surf path are more stable over time for shorter paths than longer paths. Direct examination of the accuracy of the conditional probability models in predicting test data also suggests that shorter paths yield more stable models and can be estimated reliably with less data than longer paths. Keywords: WWW, paths, prediction, user modeling, log file analysis
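The entropy measurement mentioned here can be illustrated with a small sketch that estimates the conditional entropy of the next page given the preceding `order` pages. It is an assumption-laden illustration of a k-th order Markov approximation, not the paper's procedure; the function names and toy paths are hypothetical.

```python
import math
from collections import Counter, defaultdict


def transition_counts(paths, order=1):
    """Count next-page choices conditioned on the previous `order` pages."""
    counts = defaultdict(Counter)
    for path in paths:
        for i in range(order, len(path)):
            context = tuple(path[i - order:i])
            counts[context][path[i]] += 1
    return counts


def conditional_entropy(counts):
    """Return H(next | context) in bits, estimated from raw counts."""
    total = sum(sum(nxt.values()) for nxt in counts.values())
    entropy = 0.0
    for nxt in counts.values():
        n_context = sum(nxt.values())
        for n in nxt.values():
            # p(context, next) * log2 p(next | context)
            entropy -= (n / total) * math.log2(n / n_context)
    return entropy


# Hypothetical reconstructed paths from a server log.
paths = [["A", "B", "C"], ["A", "B", "D"]]
h1 = conditional_entropy(transition_counts(paths, order=1))
```

Lower values mean the next link is more predictable from the chosen context. Comparing values across different `order` settings on real logs is the kind of analysis the abstract describes.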
Aliasing on the World Wide Web: Prevalence and Performance Implications
2002
Cited by 34 (3 self)
Aliasing occurs in Web transactions when requests containing different URLs elicit replies containing identical data payloads. Aliasing can cause cache misses, and there is reason to suspect that off-the-shelf Web authoring tools might increase aliasing on the Web. Existing research literature, however, says little about the prevalence of aliasing in user-initiated transactions or its impact on end-to-end performance in large multi-level cache hierarchies.
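The definition of aliasing given above suggests a simple detection approach: group URLs by a digest of the reply payload and report any digest seen under more than one URL. The sketch below assumes a trace of `(url, payload)` pairs and uses SHA-256. Both are illustrative choices, not details taken from the paper.

```python
import hashlib
from collections import defaultdict


def find_aliases(transactions):
    """Return groups of distinct URLs whose reply payloads are identical.

    `transactions` is an iterable of (url, payload_bytes) pairs, a
    hypothetical trace format chosen for this example.
    """
    urls_by_digest = defaultdict(set)
    for url, payload in transactions:
        digest = hashlib.sha256(payload).hexdigest()
        urls_by_digest[digest].add(url)
    return [sorted(urls) for urls in urls_by_digest.values() if len(urls) > 1]


# Hypothetical trace: two URLs serve the same bytes, one does not.
trace = [
    ("http://example.com/logo.gif", b"GIF89a-logo"),
    ("http://mirror.example.com/img/logo.gif", b"GIF89a-logo"),
    ("http://example.com/index.html", b"<html>home</html>"),
]
aliases = find_aliases(trace)
```

A URL-indexed cache would store the two aliased replies separately, which is how aliasing can lead to avoidable misses.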
Predicting Web Actions from HTML Content
In Proceedings of the Thirteenth ACM Conference on Hypertext and Hypermedia (HT'02), 2002
Cited by 32 (5 self)
This paper examines the accuracy of predicting a user's next action based on analysis of the content of the pages requested recently by the user. Predictions are made using the similarity of a model of the user's interest to the text in and around the hypertext anchors of recently requested Web pages. This approach can make predictions of actions that have never been taken by the user and potentially make predictions that reflect current user interests. We evaluate this technique using data from a full-content log of Web activity and find that textual similarity-based predictions outperform simpler approaches.
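A content-based prediction of this kind might be sketched as follows: build a term-count model of the user's interest from recently requested pages, then rank candidate links by how closely their anchor text matches it. The abstract does not name the similarity measure, so cosine similarity over raw term counts is an assumption here, as are the function names and sample data.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (a deliberately simple assumption)."""
    return text.lower().split()


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_links(recent_page_texts, anchors):
    """Order candidate URLs by similarity of anchor text to user interest.

    `anchors` maps each URL to the text in and around its anchor.
    """
    interest = Counter()
    for text in recent_page_texts:
        interest.update(tokenize(text))
    scored = [(cosine(interest, Counter(tokenize(text))), url)
              for url, text in anchors.items()]
    # Python's sort is stable, so tied scores keep their input order.
    return [url for _, url in sorted(scored, key=lambda pair: -pair[0])]


# Hypothetical recent pages and candidate links.
recent = ["web cache proxy performance", "proxy cache evaluation"]
links = {"/sports": "football scores today", "/caching": "proxy cache survey"}
ranking = rank_links(recent, links)
```

Because the score depends only on text, a link the user has never followed can still rank first, which matches the abstract's point about predicting actions that have never been taken.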
A Survey of Proxy Cache Evaluation Techniques
In Proceedings of the Fourth International Web Caching Workshop (WCW99), 1999
Cited by 28 (8 self)
Proxy caches are increasingly used around the world to reduce bandwidth requirements and alleviate delays associated with the World-Wide Web. In order to compare proxy cache performances, objective measurements must be made. In this paper, we define a space of proxy evaluation methodologies based on source of workload used and form of algorithm implementation. We then survey recent publications and show their locations within this space. 1 Introduction Proxy caches are increasingly used around the world to reduce bandwidth and alleviate delays associated with the World-Wide Web. This paper describes the space of proxy cache evaluation methodologies and places current research within that space. The primary contributions of this paper are threefold: 1) definition and description of the space of evaluation techniques; 2) appraisal of the different methods within that space; and 3) a survey of cache evaluation techniques from the research literature. In the next section we provide backgro...
Elastic Windows: A Hierarchical Multi-Window World-Wide Web Browser
1997
Cited by 26 (0 self)
The World-Wide Web is becoming an invaluable source for the information needs of many users. However, current browsers are still primitive, in that they do not support many of the navigation needs of users, as indicated by user studies. They do not provide an overview and a sense of location in the information structure being browsed. Also they do not facilitate organization and filtering of information nor aid users in accessing already visited pages without high cognitive demands. In this paper, a new browsing interface is proposed with multiple hierarchical windows and efficient multiple window operations. It provides a flexible environment where users can quickly organize, filter, and restructure the information on the screen as they reformulate their goals. Overviews can give the user a sense of location in the browsing history as well as provide fast access to a hierarchy of pages.
Smartback: supporting users in back navigation
In Proc. WWW 2004, 2004
Cited by 26 (2 self)
This paper presents the design and user evaluation of SmartBack, a feature that complements the standard Back button by enabling users to jump directly to key pages in their navigation session, making common navigation activities more efficient. Defining key pages was informed by the findings of a user study that involved detailed monitoring of Web usage and analysis of Web browsing in terms of navigation trails. The pages accessible through SmartBack are determined automatically based on the structure of the user’s navigation trails or page association with specific user’s activities, such as search or browsing bookmarked sites. We discuss implementation decisions and present results of a usability study in which we deployed the SmartBack prototype and monitored usage for a month in both corporate and home settings. The results show that the feature brings qualitative improvement to the browsing experience of individuals who use it.

