ODISSEA: A Peer-to-Peer Architecture for Scalable Web Search and Information Retrieval
In WebDB, 2003
Cited by 86 (3 self)
[...] this paper appears in [15], and updated information is available at http://cis.poly.edu/westlab/odissea/.
Optimized Query Execution in Large Search Engines with Global Page Ordering
2003
Cited by 45 (7 self)
Large web search engines have to answer thousands of queries per second with interactive response times. A major factor in the cost of executing a query is given by the lengths of the inverted lists for the query terms, which increase with the size of the document collection and are often in the range of many megabytes. To address this issue, IR and database researchers have proposed pruning techniques that compute or approximate term-based ranking functions without scanning over the full inverted lists.
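The pruning idea can be illustrated with a minimal Python sketch. It assumes, rather than reproduces, the paper's setup: every inverted list is sorted by one global page score g(d) (a stand-in for the "global page ordering" in the title), term scores are nonnegative, and the hypothetical ranking function is g(d) plus the sum of the document's term scores. Under those assumptions a document not yet reached can score at most its own g plus the sum of the per-list maximum term scores, so the scan can stop once the current k-th best score reaches that bound. The function name top_k_global_order and the posting layout are illustrative only.

```python
import heapq
from itertools import groupby
from typing import Dict, List, Tuple

# Hypothetical posting layout: (doc_id, term_score), term_score >= 0.
Posting = Tuple[int, float]


def top_k_global_order(
    lists: List[List[Posting]], g: Dict[int, float], k: int
) -> Tuple[List[Tuple[float, int]], int]:
    """Top-k for score(d) = g[d] + sum of d's term scores (assumed ranking).

    Assumes every list is sorted by (-g[doc], doc), i.e. one global order.
    Returns ((score, doc) pairs best first, number of documents scored).
    """
    if k <= 0:
        return [], 0
    # Upper bound on the term part of any score (valid for nonnegative
    # scores). A real index would store per-list maxima; computing them
    # here keeps the sketch self-contained.
    term_bound = sum(max((s for _, s in lst), default=0.0) for lst in lists)
    # All lists share the global order, so a k-way merge keeps each
    # document's postings adjacent and yields documents by descending g.
    merged = heapq.merge(*lists, key=lambda p: (-g[p[0]], p[0]))
    top: List[Tuple[float, int]] = []  # min-heap holding the best k so far
    scored = 0
    for doc, postings in groupby(merged, key=lambda p: p[0]):
        # No document from here on can score more than g[doc] + term_bound.
        if len(top) == k and top[0][0] >= g[doc] + term_bound:
            break
        score = g[doc] + sum(s for _, s in postings)
        scored += 1
        if len(top) < k:
            heapq.heappush(top, (score, doc))
        elif score > top[0][0]:
            heapq.heapreplace(top, (score, doc))
    return sorted(top, reverse=True), scored
```

For example, with g = {1: 0.9, 2: 0.8, 3: 0.5, 4: 0.1} and the two lists [(1, 0.2), (3, 0.1), (4, 0.3)] and [(2, 0.1), (3, 0.2), (4, 0.1)], a top-1 query stops after scoring two of the four documents, because no later document can exceed 0.5 + 0.5 = 1.0 while document 1 already scores 1.1.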
Three-level caching for efficient query processing in large web search engines
In Proc. of the 14th Int. World Wide Web Conference, 2005
Cited by 32 (5 self)
Large web search engines have to answer thousands of queries per second with interactive response times. Due to the sizes of the data sets involved, often in the range of multiple terabytes, a single query may require the processing of hundreds of megabytes or more of index data. To keep up with this immense workload, large search engines employ clusters of hundreds or thousands of machines, and a number of techniques such as caching, index compression, and index and query pruning are used to improve scalability. In particular, two-level caching techniques cache results of repeated identical queries at the frontend, while index data for frequently used query terms are cached in each node at a lower level. We propose and evaluate a three-level caching scheme that adds an intermediate level of caching for additional performance gains. This intermediate level attempts to exploit frequently occurring pairs of terms by caching intersections or projections of the corresponding inverted lists. We propose and study several offline and online algorithms for the resulting weighted caching problem, which turns out to be surprisingly rich in structure. Our experimental evaluation based on a large web crawl and real search engine query log shows significant performance gains for the best schemes, both in isolation and in combination with the other caching levels. We also observe that a careful selection of cache admission and eviction policies is crucial for best overall performance.
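The lookup path through the three cache levels can be sketched in Python as follows. This is a simplification under stated assumptions, not the paper's scheme: every level uses plain LRU eviction (the abstract stresses that admission and eviction policies matter, and the paper studies weighted caching algorithms instead), queries are conjunctive and return unranked document IDs, the intermediate level caches the intersection of the first two terms of a query in sorted order (an arbitrary choice), and load_list is a hypothetical callback standing in for reading an inverted list from disk.

```python
from collections import OrderedDict
from typing import Callable, Hashable, List, Optional, Tuple


class LRU:
    """Fixed-capacity least-recently-used map (stand-in for tuned policies)."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.data: "OrderedDict[Hashable, List[int]]" = OrderedDict()

    def get(self, key: Hashable) -> Optional[List[int]]:
        if key not in self.data:
            return None
        self.data.move_to_end(key)  # mark as most recently used
        return self.data[key]

    def put(self, key: Hashable, value: List[int]) -> None:
        if self.capacity <= 0:
            return
        self.data[key] = value
        self.data.move_to_end(key)
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict least recently used


class ThreeLevelCache:
    """Result cache -> pair-intersection cache -> inverted-list cache."""

    def __init__(self, load_list: Callable[[str], List[int]],
                 result_cap: int, pair_cap: int, list_cap: int) -> None:
        self.results = LRU(result_cap)  # level 1: query -> result
        self.pairs = LRU(pair_cap)      # level 2: (t1, t2) -> intersection
        self.lists = LRU(list_cap)      # level 3: term -> inverted list
        self.load_list = load_list      # hypothetical disk/index access
        self.disk_reads = 0

    def _list(self, term: str) -> List[int]:
        lst = self.lists.get(term)
        if lst is None:
            self.disk_reads += 1
            lst = self.load_list(term)
            self.lists.put(term, lst)
        return lst

    def _pair(self, a: str, b: str) -> List[int]:
        hit = self.pairs.get((a, b))
        if hit is None:
            hit = sorted(set(self._list(a)) & set(self._list(b)))
            self.pairs.put((a, b), hit)
        return hit

    def query(self, terms: List[str]) -> List[int]:
        """Conjunctive (AND) query returning sorted doc IDs; no ranking."""
        key: Tuple[str, ...] = tuple(sorted(set(terms)))
        if not key:
            return []
        cached = self.results.get(key)
        if cached is not None:
            return cached
        if len(key) == 1:
            docs = list(self._list(key[0]))
        else:
            # Arbitrary choice: cache the intersection of the first two terms.
            common = set(self._pair(key[0], key[1]))
            for t in key[2:]:
                common &= set(self._list(t))
            docs = sorted(common)
        self.results.put(key, docs)
        return docs
```

With an index {"a": [1, 2, 3, 4], "b": [2, 4, 6], "c": [4, 5]}, the query ["a", "b"] reads two lists, a later query ["c", "b", "a"] reuses the cached (a, b) intersection and reads only "c", and repeating ["b", "a"] is answered from the result cache without any list access.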
When Does Fast Recovery Trump High Reliability?
In Proc. 2nd Workshop on Evaluating and Architecting System Dependability, 2002
Cited by 20 (2 self)
In this paper, we argue that for interactive Internet applications, a decrease in MTTR (mean time to repair) is sometimes more valuable than the increase in MTTF (mean time to failure) that would improve availability by the same amount, and we make a case for adopting MTTR as the primary metric for reasoning about system availability and for focusing designs on fast recovery.
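The comparison rests on the standard steady-state availability formula A = MTTF / (MTTF + MTTR). Because A depends only on the ratio MTTF/MTTR, dividing MTTR by some factor gives the same availability as multiplying MTTF by that factor. The short Python sketch below checks this with hypothetical numbers; it illustrates only the premise that both routes can reach the same availability, not the paper's argument about which route is more valuable.

```python
def availability(mttf: float, mttr: float) -> float:
    """Steady-state availability: fraction of time the system is up."""
    return mttf / (mttf + mttr)


# Hypothetical baseline: fails every 1000 h on average, 10 h to recover.
base = availability(1000.0, 10.0)
# Route 1: recover 10x faster (MTTR 10 h -> 1 h).
fast_recovery = availability(1000.0, 1.0)
# Route 2: fail 10x less often (MTTF 1000 h -> 10000 h).
high_reliability = availability(10000.0, 10.0)

print(f"baseline         {base:.6f}")
print(f"faster recovery  {fast_recovery:.6f}")
print(f"higher MTTF      {high_reliability:.6f}")
```

Both routes land on 1000/1001, roughly 0.999001, up from roughly 0.990099; the paper's point is that for interactive services the MTTR route is sometimes the cheaper or more valuable way to get there.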
Scaling a Shared Object Space to the Internet: Case Study of Virat
Journal of Object Technology, Sept./Oct.
Cited by 3 (1 self)
Scalability is an important issue in the construction of distributed systems. Shared object spaces provide an elegant and easy-to-program abstraction for building applications. However, existing shared object spaces have been realized at the cluster level. Use of centralized components, lack of effective failure handling mechanisms, lack of efficient object lookup mechanisms, and consistency maintenance are the key issues that inhibit the scalability of existing shared object spaces. We present a case study of scaling an existing shared object space (Virat) to the Internet. Bottlenecks in Virat include the granularity of consistency maintenance and Object Meta-data Repository (OMR) failures. Both the design and the implementation of Virat have been modified in order to increase the granularity at which consistency is maintained. Virat has also been redesigned so that the OMRs form a Peer-to-Peer (P2P) overlay, in order to handle OMR failures and improve scalability. Experimental evaluations show that the optimized version of Virat scales better, especially over a wide-area network. In addition, this paper explains how to develop applications over the shared object space, with code sketches.
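The abstract says only that the OMRs form a P2P overlay; it does not say which overlay. As a hedged illustration, the Python sketch below assumes a Chord-style consistent-hashing ring (an assumption, not Virat's documented design) in which each object's metadata lives on the first OMR clockwise from the object's hash. The class name OmrRing, the helper ring_hash, and the node names are hypothetical. The property it shows is the one relevant to failure handling: when an OMR leaves, only the objects it held move to its successor, while every other object keeps its OMR.

```python
import bisect
import hashlib
from typing import List, Tuple


def ring_hash(key: str) -> int:
    """Map a string to a 64-bit ring position (first 8 bytes of SHA-1)."""
    return int.from_bytes(hashlib.sha1(key.encode("utf-8")).digest()[:8], "big")


class OmrRing:
    """Hypothetical consistent-hashing overlay of OMR nodes."""

    def __init__(self, nodes: List[str]) -> None:
        self._ring: List[Tuple[int, str]] = sorted((ring_hash(n), n) for n in nodes)

    def add(self, node: str) -> None:
        bisect.insort(self._ring, (ring_hash(node), node))

    def remove(self, node: str) -> None:
        # Models an OMR failure or departure.
        self._ring.remove((ring_hash(node), node))

    def lookup(self, object_id: str) -> str:
        """Return the OMR responsible for object_id's metadata."""
        if not self._ring:
            raise LookupError("no OMR nodes available")
        # First node at or after the object's position; "" sorts before any name.
        i = bisect.bisect_left(self._ring, (ring_hash(object_id), ""))
        return self._ring[i % len(self._ring)][1]  # wrap around the ring
```

Usage: build OmrRing(["omr-a", "omr-b", "omr-c"]), record lookup(...) for a set of object IDs, then call remove("omr-b"); a second round of lookups changes only the objects that omr-b was responsible for.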

