Results 1 - 5 of 5
Estimation of global term weights for distributed and ubiquitous IR
- In Proc. of UKDU’06, 2006
Cited by 2 (0 self)
Abstract. This paper reports on information retrieval experiments aimed at application in ubiquitous or P2P environments. The main question to be investigated is whether global term statistics such as IDF – which normally require a global view on the document collection – can be replaced with estimates obtained from a representative (i.e. well-balanced) reference corpus without losing too much effectiveness. Experiments with the British National Corpus (as a reference corpus) and two different IR test collections show that this is indeed possible. More interestingly still, lists of estimates can be compressed to a great extent without degrading performance, indicating that robust information retrieval is possible with very little knowledge of term characteristics, namely just an extended list of stop words. This makes it possible to distribute compressed term lists onto mobile devices without taking up too much bandwidth or storage capacity.
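As a rough illustration of the idea in this abstract, the following minimal sketch estimates IDF from a small stand-in reference corpus and then compresses the list to the most frequent terms. The names `reference_idf`, `term_weight`, and `keep_top`, the toy documents, and the choice of default weight for unlisted terms are all assumptions made for this example; the abstract does not specify them.

```python
import math
from collections import Counter

def reference_idf(reference_docs, keep_top=3):
    """Estimate IDF from a reference corpus, keeping only the most frequent terms."""
    n_docs = len(reference_docs)
    df = Counter()
    for doc in reference_docs:
        # Count each term once per document (document frequency).
        df.update(set(doc.lower().split()))
    # Keep the keep_top terms with the highest document frequency
    # (ties broken alphabetically): the "extended stop word list".
    frequent = sorted(df.items(), key=lambda kv: (-kv[1], kv[0]))[:keep_top]
    idf = {term: math.log(n_docs / count) for term, count in frequent}
    # Assumption: every unlisted term gets the weight of a term
    # that occurs in exactly one reference document.
    default_idf = math.log(n_docs)
    return idf, default_idf

def term_weight(term, idf, default_idf):
    """Look up a term weight; terms outside the list are all treated equally."""
    return idf.get(term, default_idf)

# Toy stand-in for a reference corpus.
reference_docs = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "a dog and a bird",
    "the bird sang",
]
idf, default_idf = reference_idf(reference_docs, keep_top=3)
```

Only the short `idf` dictionary and the single `default_idf` value would need to be shipped to a device, which is the kind of compression the abstract alludes to.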
Global Term Weights in Distributed Environments
Cited by 2 (1 self)
This paper examines the estimation of global term weights (such as IDF) in information retrieval scenarios where a global view on the collection is not available. In particular, the two options of either sampling documents or of using a reference corpus independent of the retrieval collection are compared using standard IR test collections. In addition, the possibility of pruning term lists based on frequency is evaluated. The results show that very good retrieval performance can be reached when just the most frequent terms of a collection – an “extended stop word list” – are known and all terms which are not in that list are treated equally. However, the list cannot always be fully estimated from a general-purpose reference corpus; some “domain-specific stop words” need to be added. A good solution for achieving this is to mix estimates from small samples of the retrieval collection with ones derived from a reference corpus.
Key words: distributed information retrieval, term weighting, language modeling
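One way to picture the mixing of sample-based and reference-based estimates is a linear interpolation of relative document frequencies. This is a sketch under assumptions: the interpolation formula, the function name `mixed_idf`, the weight `lam`, and the toy counts are illustrative and are not taken from the paper.

```python
import math

def mixed_idf(term, sample_df, sample_size, ref_df, ref_size, lam=0.5):
    """Interpolate relative document frequencies from a sample and a reference corpus.

    lam is a hypothetical mixing weight: 1.0 trusts only the sample,
    0.0 trusts only the reference corpus.
    """
    p_sample = sample_df.get(term, 0) / sample_size
    p_ref = ref_df.get(term, 0) / ref_size
    p = lam * p_sample + (1 - lam) * p_ref
    if p == 0:
        return None  # unseen in both sources; the caller applies a default weight
    return -math.log(p)

# Toy counts: "patient" is rare in the general reference corpus but
# very common in a small sample of a (hypothetical) medical collection.
sample_df = {"the": 10, "patient": 8}
ref_df = {"the": 900, "patient": 10}

mixed = mixed_idf("patient", sample_df, 10, ref_df, 1000)
reference_only = mixed_idf("patient", sample_df, 10, ref_df, 1000, lam=0.0)
```

With these counts, `mixed` is much lower than `reference_only`, so "patient" is weighted like a domain-specific stop word once the sample is taken into account.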
Preface, 2006
In this paper we discuss the central role played by context in providing adaptive interfaces to a user within a ubiquitous environment.
The Workshop on
In this paper we discuss the central role played by context in providing adaptive interfaces to a user within a ubiquitous environment.
Comparison on the Effectiveness of Different Statistical Similarity Measures
Document retrieval is the process of matching a stated user query against a set of free-text records (documents); it is one major technique for organizing and managing information. This project studied which of the different statistical measures in IR is most effective for document retrieval on a unified set of documents. The results show that the Cosine Similarity Measure performs better than the other seven measures (Inner Product,
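To make the two measures named above concrete, here is a minimal sketch of the inner product and cosine similarity over sparse term-weight vectors. The vectors and weights are invented for illustration, and the example shows only how the two measures can rank documents differently; it does not reproduce the project's experiments.

```python
import math

def inner_product(u, v):
    """Sum of products of weights for terms shared by both vectors."""
    return sum(w * v.get(t, 0.0) for t, w in u.items())

def cosine(u, v):
    """Inner product normalised by the lengths of both vectors."""
    norm_u = math.sqrt(inner_product(u, u))
    norm_v = math.sqrt(inner_product(v, v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return inner_product(u, v) / (norm_u * norm_v)

# Hypothetical term-weight vectors.
query = {"retrieval": 1.0, "model": 1.0}
short_doc = {"retrieval": 1.0, "model": 1.0}
long_doc = {"retrieval": 2.0, "model": 1.0, "corpus": 5.0, "index": 5.0}
```

Here the inner product scores `long_doc` above `short_doc` because it is not length-normalised, while cosine similarity ranks `short_doc` first.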

