Results 1 - 4 of 4
Optimizing Result Prefetching in Web Search Engines with Segmented Indices
- In VLDB, 2001
Cited by 26 (1 self)
Abstract
We study the process in which search engines with segmented indices serve queries. In particular, we investigate the number of result pages that search engines should prepare during the query processing phase. Search engine users have been observed to browse through very few pages of results for the queries they submit. This behavior suggests that prefetching many results when processing an initial query is inefficient, since most of the prefetched results will never be requested by the user who initiated the search. However, a policy that abandons result prefetching in favor of retrieving only the first page of search results may not make optimal use of system resources either. We argue that, for a certain pattern of user behavior, engines should prefetch a constant number of result pages per query. We define a concrete query processing model for search engines with segmented indices and analyze the cost of such prefetching policies. Based on these costs, we show how to determine the constant that optimizes the prefetching policy. Our results apply mostly to local index partitions of the inverted files, but also to the processing of short queries in global index architectures.
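The abstract does not state the paper's cost model. As a rough illustration only, the sketch below assumes a simplified model that is not the paper's: the user always views page 1 and moves on to each further page with a fixed probability `p_next`, and each query execution that fetches `k` pages costs `setup + per_page * k`. Under these assumptions the expected number of executions per session is 1 / (1 - p_next**k), so the best constant `k` can be found by direct search. All names and parameters here are hypothetical.

```python
def expected_cost(k: int, setup: float, per_page: float, p_next: float) -> float:
    """Expected cost per search session when k result pages are prefetched per execution.

    Hypothetical model (not from the paper): the user views page 1 and then
    continues to each next page with probability p_next, so the number of pages
    viewed N satisfies P(N > m) = p_next**m. The engine re-executes the query
    whenever the user asks for a page beyond those already fetched, so the
    expected number of executions is sum over j >= 0 of P(N > j*k) = 1 / (1 - p_next**k).
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    if not 0.0 <= p_next < 1.0:
        raise ValueError("p_next must be in [0, 1)")
    cost_per_execution = setup + per_page * k
    return cost_per_execution / (1.0 - p_next ** k)


def optimal_k(setup: float, per_page: float, p_next: float, k_max: int) -> int:
    """Smallest k in 1..k_max that minimizes expected_cost (exhaustive search)."""
    return min(range(1, k_max + 1),
               key=lambda k: expected_cost(k, setup, per_page, p_next))


if __name__ == "__main__":
    print(optimal_k(setup=10.0, per_page=1.0, p_next=0.5, k_max=20))  # prints 3
```

In this toy model a larger fixed overhead per execution, or a higher chance that users keep paging, pushes the optimal constant upward; with `p_next = 0` it collapses to fetching a single page.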
The Mirror DBMS at TREC
- In Proceedings of the eighth Text Retrieval Conference, TREC-8. NIST Special Publications
Cited by 7 (4 self)
Abstract
The database group at the University of Twente participates in TREC-8 using the Mirror DBMS, a prototype database system designed specifically for multimedia and web retrieval. From a database perspective, the purpose has been to check whether we can achieve sufficient performance, and to prepare for the Very Large Corpus track, in which we plan to participate next year. From an IR perspective, the experiments have been designed to learn more about the effect of global statistics on ranking.
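The abstract does not say how global statistics enter the Mirror DBMS ranking. A minimal sketch, assuming plain tf·idf with idf = log(N/df) and a hypothetical two-segment toy collection, shows why the choice matters: idf computed from each segment's local statistics can rank documents differently from idf computed over the whole collection.

```python
import math

# Hypothetical toy collection split into two index segments: doc id -> tokens.
SEGMENTS = {
    "A": {"a1": ["t", "x"], "a2": ["x", "y"]},
    "B": {"b1": ["t", "t"], "b2": ["t", "y"], "b3": ["t", "x"]},
}


def idf(n_docs: int, doc_freq: int) -> float:
    """Plain log(N / df) inverse document frequency; 0.0 if the term never occurs."""
    return math.log(n_docs / doc_freq) if doc_freq else 0.0


def tf_idf_scores(segments: dict, term: str, use_global_stats: bool) -> dict:
    """Score every document by tf * idf for a one-term query.

    With use_global_stats=False, each segment computes idf from its own N and df;
    with True, all segments share N and df taken over the whole collection.
    """
    all_docs = [toks for seg in segments.values() for toks in seg.values()]
    global_n = len(all_docs)
    global_df = sum(term in toks for toks in all_docs)
    scores = {}
    for seg in segments.values():
        if use_global_stats:
            weight = idf(global_n, global_df)
        else:
            local_df = sum(term in toks for toks in seg.values())
            weight = idf(len(seg), local_df)
        for doc_id, toks in seg.items():
            scores[doc_id] = toks.count(term) * weight
    return scores


def top_doc(scores: dict) -> str:
    """Document id with the highest score."""
    return max(scores, key=scores.get)


if __name__ == "__main__":
    print(top_doc(tf_idf_scores(SEGMENTS, "t", use_global_stats=False)))  # a1
    print(top_doc(tf_idf_scores(SEGMENTS, "t", use_global_stats=True)))   # b1
```

Here the term occurs in every document of segment B, so its local idf there is zero and the single match in segment A wins; with collection-wide statistics, the document with the highest term frequency comes first.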
Analyzing Web Robots and Their Impact on Caching
- In Proceedings of the 6th Web Caching and Content Delivery Workshop, 2001
Cited by 5 (2 self)
Abstract
Understanding the nature and characteristics of Web robots is an essential step in analyzing their impact on caching. Using a multi-layer hierarchical workload model, this paper characterizes the workload generated by autonomous agents and robots. The characterization focuses on the statistical properties of the arrival process and on the robot behavior graph model. A set of criteria is proposed for identifying robots in real logs. We then identify and characterize robots from real logs by applying a multi-layered approach. Using an analytical model based on stack distance for the interaction between robots and Web site caching, we assess the impact of robots' requests on Web caches. Our analyses show that robots cause a significant increase in the miss ratio of a server-side cache. Robots have a referencing pattern that completely disrupts locality assumptions. These results indicate not only the need for a better understanding of robot behavior, but also the need for Web caching policies that treat robots' requests differently from human-generated requests.
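The stack distance of a reference is its depth in an LRU stack at the moment it is made; for an LRU cache holding C objects, a reference misses exactly when its distance exceeds C, and first references have infinite distance. A minimal sketch of that computation, using hypothetical traces rather than the paper's logs or its model parameters, illustrates how an interleaved robot-style scan of never-repeated objects can destroy the locality a small cache relies on.

```python
from typing import Hashable, List, Optional, Sequence


def stack_distances(trace: Sequence[Hashable]) -> List[Optional[int]]:
    """LRU stack distance of each reference (1 = most recently used); None on first reference."""
    stack: List[Hashable] = []  # index 0 holds the most recently used object
    distances: List[Optional[int]] = []
    for obj in trace:
        if obj in stack:
            depth = stack.index(obj) + 1
            stack.pop(depth - 1)
        else:
            depth = None  # never seen before: infinite stack distance
        stack.insert(0, obj)
        distances.append(depth)
    return distances


def lru_miss_ratio(trace: Sequence[Hashable], cache_size: int) -> float:
    """Fraction of references that miss in an LRU cache holding cache_size objects."""
    distances = stack_distances(trace)
    misses = sum(1 for d in distances if d is None or d > cache_size)
    return misses / len(distances)


if __name__ == "__main__":
    human = ["a", "b", "a", "b", "a", "b"]
    # The same human requests interleaved with a robot that never revisits a page.
    mixed = ["a", "r1", "b", "r2", "a", "r3", "b", "r4", "a", "r5", "b", "r6"]
    print(lru_miss_ratio(human, cache_size=2))  # 0.333...
    print(lru_miss_ratio(mixed, cache_size=2))  # 1.0
```

Because one pass over the trace yields the distances for every cache size at once, the same list can be reused to read off the miss ratio for any C.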
Evaluating Answer Quality/Efficiency Tradeoffs
- In KRDB Workshop, volume 10 of CEUR Workshop Proceedings, 1998
Cited by 1 (0 self)
Abstract
For many emerging applications and environments, information systems designers and implementers must consider the tradeoffs between efficiency and the quality of query answers. This flexibility, while allowing numerous opportunities for optimization, complicates the development of various system components and their performance evaluation. In this short paper, we first outline the issues that arise in evaluating answer quality/efficiency tradeoffs. We then describe how we address these issues in two ongoing projects that adopt notions of answer quality from the field of Information Retrieval.
1 Introduction
Traditionally, database systems have been designed based on the idea that for a given query on a given database, there is a single, correct answer that must be returned. It is clear, however, that for many emerging applications and environments it is neither practical nor desirable to enforce such a stringent requirement. In particular, when scaling query processing to wide-area ...
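The abstract does not specify which IR notions of answer quality the two projects adopt. Assuming the familiar precision and recall, with the exact answer to a query treated as the set of relevant items, the quality of a cheaper approximate answer could be scored as in this sketch; the function name, the empty-set conventions, and the example sets are hypothetical.

```python
from typing import Hashable, Set, Tuple


def precision_recall(approx: Set[Hashable], exact: Set[Hashable]) -> Tuple[float, float]:
    """Precision and recall of an approximate answer, treating the exact answer as the relevant set.

    By convention here, precision is 1.0 for an empty approximate answer (it contains no
    wrong items) and recall is 1.0 when the exact answer is empty (nothing was missed).
    """
    hits = len(approx & exact)
    precision = hits / len(approx) if approx else 1.0
    recall = hits / len(exact) if exact else 1.0
    return precision, recall


if __name__ == "__main__":
    exact = {1, 2, 3, 4}   # answer the full (expensive) evaluation would return
    approx = {1, 2, 5}     # answer from a cheaper, approximate evaluation
    print(precision_recall(approx, exact))  # (0.666..., 0.5)
```

Pairing such a quality score with the measured cost of each evaluation strategy gives one way to plot the quality/efficiency tradeoff the paper discusses.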

