A Document-Centric Approach to Static Index Pruning in Text Retrieval Systems
2006
Cited by 21 (1 self)
We present a static index pruning method, to be used in ad-hoc document retrieval tasks, that follows a document-centric approach to decide whether a posting for a given term should remain in the index or not. The decision is made based on the term's contribution to the document's Kullback-Leibler divergence from the text collection's global language model. Our technique can be used to decrease the size of the index by over 90%, at only a minor decrease in retrieval effectiveness. It thus allows us to make the index small enough to fit entirely into the main memory of a single PC, even for large text collections containing millions of documents. This results in great efficiency gains, superior to those of earlier pruning methods, and an average response time around 20 ms on the GOV2 document collection.
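The pruning criterion described in this abstract can be illustrated with a minimal sketch. It assumes maximum-likelihood language models and a fixed number of terms kept per document; the function names and the `keep` parameter are hypothetical, and the paper's actual smoothing and cut-off rules are not given in the abstract.

```python
import math
from collections import Counter


def kld_contributions(doc_tokens, collection_tf, collection_len):
    """Per-term contribution P_d(t) * log(P_d(t) / P_C(t)) to the KL divergence
    between the document model and the collection model (unsmoothed estimates)."""
    tf = Counter(doc_tokens)
    doc_len = len(doc_tokens)
    scores = {}
    for term, freq in tf.items():
        p_doc = freq / doc_len
        # Assumes the document is part of the collection, so this is never zero.
        p_col = collection_tf[term] / collection_len
        scores[term] = p_doc * math.log(p_doc / p_col)
    return scores


def prune_document(doc_tokens, collection_tf, collection_len, keep):
    """Return the `keep` terms whose postings would stay in the index."""
    scores = kld_contributions(doc_tokens, collection_tf, collection_len)
    # Highest contribution first; ties broken alphabetically for determinism.
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return set(ranked[:keep])
```

Terms that are common across the whole collection (such as stop words) tend to receive low or negative contributions and are pruned first, which is why most of the index can be dropped.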
Scalable peer-to-peer web retrieval with highly discriminative keys
In ICDE, 2007
Cited by 20 (7 self)
The suitability of Peer-to-Peer (P2P) approaches for full-text web retrieval has recently been questioned because of the claimed unacceptable bandwidth consumption induced by retrieval from very large document collections. In this contribution we formalize a novel indexing/retrieval model that achieves high-performance, cost-efficient retrieval by indexing with highly discriminative keys (HDKs) stored in a distributed global index maintained in a structured P2P network. HDKs correspond to carefully selected terms and term sets appearing in a small number of collection documents. We provide a theoretical analysis of the scalability of our retrieval model and report experimental results obtained with our HDK-based P2P retrieval engine. These results show that, despite increased indexing costs, the total traffic generated with the HDK approach is significantly smaller than that obtained with distributed single-term indexing strategies. Furthermore, our experiments show that the retrieval performance obtained with a random set of real queries is comparable to that of a centralized, single-term solution using the best state-of-the-art BM25 relevance computation scheme. Finally, our scalability analysis demonstrates that the HDK approach can scale to large networks of peers indexing web-size document collections, thus opening the way towards viable, truly decentralized web retrieval.
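The notion of a key that appears in only a small number of documents can be sketched as follows. This is a simplified illustration, not the paper's algorithm: the threshold `df_max`, the size limit `max_size`, and the rule that skips a term set when one of its subsets is already discriminative are all assumptions made for the example.

```python
from itertools import combinations


def discriminative_keys(docs, df_max, max_size=2):
    """Map each key (a tuple of terms) to the set of documents containing it,
    keeping only keys found in at most `df_max` documents.

    docs: dict mapping a document id to an iterable of terms.
    """
    postings = {}
    for doc_id, terms in docs.items():
        vocabulary = sorted(set(terms))
        for size in range(1, max_size + 1):
            for key in combinations(vocabulary, size):
                postings.setdefault(key, set()).add(doc_id)

    keys = {}
    for key, doc_ids in postings.items():
        if len(doc_ids) > df_max:
            continue  # too frequent to be discriminative
        # Assumed redundancy filter: skip a term set if a smaller subset
        # already qualifies on its own.
        redundant = any(
            len(postings[sub]) <= df_max
            for r in range(1, len(key))
            for sub in combinations(key, r)
        )
        if not redundant:
            keys[key] = doc_ids
    return keys
```

Because every stored key has a short posting list, answering a query requires transferring only a bounded number of document identifiers per key, which is the intuition behind the reduced retrieval traffic reported in the abstract.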
University of Glasgow at TREC 2005: Experiments in Terabyte and Enterprise Tracks with Terrier
In Proceedings of TREC-05, 2005
Cited by 20 (12 self)
With our participation in TREC 2005, we continue experiments using Terrier, a modular and scalable Information Retrieval (IR) framework, in four tasks from the Terabyte and Enterprise tracks. In the Terabyte track, we investigate new Divergence From Randomness weighting models, and a novel query expansion approach that can take into account various document fields, namely content, title and anchor text. In addition, we test a new selective query expansion mechanism which determines the appropriateness of using query expansion on a per-query basis, using statistical information from a low-cost query performance predictor. In the Enterprise track, we investigate combining document field evidence with other information occurring in an Enterprise setting. In the email known-item task, we also investigate temporal and thread priors suitable for email search. In the expert search task, for each candidate, we generate profiles of expertise evidence from the W3C collection. Moreover, we propose a model for ranking these candidate profiles in response to a query.
Beyond term indexing: A P2P framework for web information retrieval
Informatica, 2006
Cited by 9 (5 self)
Web search over peer-to-peer (P2P) networks shows promise to become an alternative to the state-of-the-art search engines since P2P overlays offer means for decentralized search across widely distributed document collections. However, the design of effective techniques for P2P indexing and retrieval raises a number of technical challenges due to potentially unscalable resource (e.g. bandwidth, storage) consumption. The paper presents a framework for full-text information retrieval in structured P2P networks and introduces a novel retrieval model based on highly discriminative keys (terms and term sets appearing in a restricted number of documents) that ensure efficient and scalable retrieval. Our goal is to design scalable techniques for building a global key index in structured P2P overlays for large document collections. We present experimental results that show acceptable indexing and retrieval costs while the retrieval quality is comparable to standard centralized solutions with the BM25 relevance computation scheme. Summary: A P2P framework for web search engines has been developed.
Predicting Query Performance in Intranet Search
2005
Cited by 6 (0 self)
The issue of query performance prediction has been studied in the context of text retrieval and Web search. In this paper, we investigate this issue in an intranet environment. The collection used is a crawl of the dcs.gla.ac.uk domain, and the queries are logged from the domain search engine, which is powered by the Terrier platform. We propose an automatic evaluation methodology that generates the mean average precision of each query by cross-comparing the output of diverse search engines. We measure the correlation of two pre-retrieval predictors with mean average precision, which is obtained by our proposed evaluation methodology. Results show that the predictors are very effective for 1- and 2-term queries, which make up the majority of the real queries in the intranet environment.
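Measuring how well a predictor tracks per-query effectiveness comes down to correlating two lists of numbers. The abstract names neither the predictors nor the correlation coefficient, so the sketch below assumes Spearman's rank correlation, computed from scratch; all function names are illustrative.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences with non-zero variance."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def spearman(predictor_scores, average_precisions):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(predictor_scores), average_ranks(average_precisions))
```

A value near 1 means the predictor orders queries almost exactly as their measured effectiveness does; a value near 0 means it carries little information about query difficulty.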
Combining fields for query expansion and adaptive query expansion
2007
Cited by 2 (0 self)
In this paper, we aim to improve query expansion for ad-hoc retrieval by proposing a more fine-grained term reweighting process. This fine-grained process uses statistics from the representation of documents in various fields, such as their titles, the anchor text of their incoming links, and their body content. The contribution of this paper is twofold. First, we propose a novel query expansion mechanism on fields by combining field evidence available in a corpus. Second, we propose an adaptive query expansion mechanism that selects an appropriate collection resource, either the local collection or a high-quality external resource, for query expansion on a per-query basis. The two proposed query expansion approaches are thoroughly evaluated using two standard Text Retrieval Conference (TREC) Web collections, namely the WT10G collection and the large-scale .GOV2 collection. From the experimental results, we observe a statistically significant improvement compared with the baselines. Moreover, we conclude that the adaptive query expansion mechanism is very effective when the external collection used is much larger than the local collection.
A Performance Prediction Approach to Enhance Collaborative Filtering Performance
Performance prediction has gained increasing attention in the IR field since the middle of the past decade and has become an established research topic in the field. The present work restates the problem in the area of Collaborative Filtering (CF), where it has barely been researched so far. We investigate the adaptation of clarity-based query performance predictors to predict neighbor performance in CF. A predictor is proposed and introduced in a kNN CF algorithm to produce a dynamic variant where neighbor ratings are weighted based on their predicted performance. The properties of the predictor are empirically studied by first checking the correlation of the predictor output with a proposed measure of neighbor performance. Then, the performance of the dynamic kNN variant is examined under different sparsity and neighborhood size conditions, where the variant consistently outperforms the baseline algorithm, with increasing difference on small neighborhoods.
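The idea of weighting neighbor ratings by predicted performance can be sketched with a small example. The abstract does not give the exact formula, so the multiplicative combination of similarity and predicted performance below is an assumption, as are the function and parameter names.

```python
def predict_rating(neighbors, use_predictor=True):
    """Predict a rating as a weighted average of neighbor ratings.

    neighbors: list of (similarity, predicted_performance, rating) tuples.
    With use_predictor=False this reduces to a plain similarity-weighted kNN.
    Returns None when the total weight is zero.
    """
    numerator = 0.0
    denominator = 0.0
    for similarity, performance, rating in neighbors:
        # Assumed combination rule: scale the similarity by the predictor output.
        weight = similarity * (performance if use_predictor else 1.0)
        numerator += weight * rating
        denominator += abs(weight)
    return numerator / denominator if denominator else None
```

For two equally similar neighbors where only the first is predicted to perform well, the dynamic variant follows the first neighbor's rating, while the baseline averages both.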
Predicting Neighbor Goodness in Collaborative Filtering
Performance prediction has gained increasing attention in the IR field since the middle of the past decade and has become an established research topic in the field. The present work restates the problem in the subarea of Collaborative Filtering (CF), where it has barely been researched so far. We investigate the adaptation of clarity-based query performance predictors to define predictors of neighbor performance in CF. The proposed predictors are introduced in a memory-based CF algorithm to produce a dynamic variant where neighbor ratings are weighted based on their predicted performance. The approach is tested with encouraging empirical results, as the dynamic variants consistently outperform the baseline algorithms, with increasing difference on small neighborhoods.
Ranking Experts with Discriminative Probabilistic Models
In realistic settings of expert finding, the evidence for expertise often comes from heterogeneous knowledge sources. As some sources tend to be more reliable and indicative than others, different data sources need to receive different weights to reflect their degrees of importance. However, most previous studies in expert finding did not differentiate data sources, which may lead to unsatisfactory performance in settings where the data sources are heterogeneous. In this paper, we investigate how to merge and weight heterogeneous knowledge sources in the context of expert finding. A relevance-based supervised learning framework is presented to learn the combination weights from training data. Beyond just learning a fixed combination strategy for all the queries and experts, we propose a series of probabilistic models which have increasing capability to associate the combination weights with specific experts and queries. In the last (and also the most sophisticated) proposed model, the combination weights depend on both expert classes and query topics, and these classes and topics are derived from expert and query features. Compared with expert- and query-independent combination methods, the proposed combination strategy can better adjust to different types of experts and queries. Consequently, the model offers considerable flexibility in combining data sources when dealing with a broad range of expertise areas and a large variation in experts. Empirical studies on a real-world faculty expertise testbed demonstrate the effectiveness and robustness of the proposed learning-based models.
A Complete-Computerised Delphi Process with a Multi-agent System
Looking for alternative ways of coordinating agents, this paper explores the adaptation of the Delphi protocol to agent systems. The Delphi protocol can be applied when a community of experts is required to deliver a consensual answer. In these cases, consensus stands for reaching an agreement among the experts about what the answer should be. This consensus-reaching problem has already been considered in the literature, though its automation remains a challenge. Intuitively, the experts should engage in dialogue, exchange ideas, and change their minds as the discussion progresses. This paper presents a computerisation of discussion among expert agents and shows how they can be drawn towards a conclusion by means of the Delphi process. The proof of concept is made with a document relevance evaluation problem where a community of experts decides whether a document is relevant or not. In conclusion, this paper makes an important contribution to people using Delphi processes, because the presented system is the first complete-computerised Delphi process. With respect to multi-agent systems, it has the potential to solve coordination in an original way, different from everything that has been done before. Key words: agent oriented software engineering, multi-agent systems, development

