Dissemination of Collection Wide Information in a Distributed Information Retrieval System
- In Proceedings of the 18th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 1995
Cited by 64 (10 self)
We find that dissemination of collection wide information (CWI) in a distributed collection of documents is needed to achieve retrieval effectiveness comparable to a centralized collection. Complete dissemination is unnecessary. The required dissemination level depends upon how documents are allocated among sites. Low dissemination is needed for random document allocation, but higher levels are needed when documents are allocated based on content. We define parameters to control dissemination and document allocation and present results from four test collections. We define the notion of iso-knowledge lines with respect to the number of sites and level of dissemination in the distributed archive, and show empirically that iso-knowledge lines are also iso-effectiveness lines when documents are randomly allocated.
1 Introduction
In the rapidly evolving internetworks of today, a vast diversity of information is becoming electronically available. The information environment is highly dist...
Building a distributed full-text index for the web
- ACM Trans. Inf. Syst., 2001
Cited by 63 (3 self)
We identify crucial design issues in building a distributed inverted index for a large collection of Web pages. We introduce a novel pipelining technique for structuring the core index-building system that substantially reduces the index construction time. We also propose a storage scheme for creating and managing inverted files using an embedded database system. We suggest and compare different strategies for collecting global statistics from distributed inverted indexes. Finally, we present performance results from experiments on a testbed distributed Web indexing system that we have implemented.
Maintaining Retrieval Effectiveness in Distributed, Dynamic Information Retrieval Systems
- 1996
Cited by 3 (1 self)
Traditional information retrieval (IR) techniques were developed under the tacit assumptions of static, centralized archives of documents. Advanced techniques invariably use information derived from the entire collection in an effort to produce high-quality responses to user queries. In dynamic, distributed information environments these assumptions are clearly not met. Heretofore easily obtainable collection wide information (CWI) may be unavailable to some or all member sites in a distributed document archive, so some degree of incompleteness or inconsistency must be tolerated. In this dissertation, we present a rigorous empirical study investigating how allowing the view of CWI to drift from rigorously defined values influences retrieval effectiveness. We give a generic model for searching a document collection that allows for the use of CWI derived from a subset of the collection. Within this model, we identify two realistic scenarios where the use of subset-derived collection stat...

