Results 1 - 5 of 5
Modeling Score Distributions for Combining the Outputs of Search Engines
2001
Cited by 72 (4 self)
In this paper the score distributions of a number of text search engines are modeled. It is shown empirically that the score distributions on a per query basis may be fitted using an exponential distribution for the set of non-relevant documents and a normal distribution for the set of relevant documents. Experiments show that this model fits TREC-3 and TREC-4 data for not only probabilistic search engines like INQUERY but also vector space search engines like SMART for English. We have also used this model to fit the output of other search engines like LSI search engines and search engines indexing other languages like Chinese. It is then shown that given a query for which relevance information is not available, a mixture model consisting of an exponential and a normal distribution can be fitted to the score distribution. These distributions can be used to map the scores of a search engine to probabilities. We also discuss how the shape of the score distributions arises given certain assumptions about word distributions in documents. We hypothesize that all 'good' text search engines operating on any language have similar characteristics. This model has many possible applications. For example, the outputs of different search engines can be combined by averaging the probabilities (optimal if the search engines are independent) or by using the probabilities to select the best engine for each query. Results show that the technique performs as well as the best current combination techniques. This material is based on work supported in part by the National Science Foundation, Library of Congress and Department of Commerce under cooperative agreement number EEC-9209623, in part by the National Science Foundation under grant numbers IRI-9619117 and IIS-9909073, in part by N...
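The mapping from scores to probabilities described in this abstract can be illustrated with a minimal sketch. It assumes the mixture parameters are already known; the abstract does not say how they are fitted, and no fitting step is shown here. The function names and the parameters `mix_rel`, `mu`, `sigma`, and `lam` are hypothetical, chosen only for illustration. The probability of relevance is computed with Bayes' rule from a normal density for relevant documents and an exponential density for non-relevant ones.

```python
import math


def normal_pdf(score, mu, sigma):
    """Density of a normal distribution (assumed model for relevant documents)."""
    z = (score - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def exponential_pdf(score, lam):
    """Density of an exponential distribution (assumed model for non-relevant documents)."""
    return lam * math.exp(-lam * score) if score >= 0 else 0.0


def prob_relevant(score, mix_rel, mu, sigma, lam):
    """Posterior probability of relevance for one score, by Bayes' rule.

    mix_rel is the assumed mixing weight of the relevant component.
    """
    rel = mix_rel * normal_pdf(score, mu, sigma)
    nonrel = (1.0 - mix_rel) * exponential_pdf(score, lam)
    total = rel + nonrel
    return rel / total if total > 0 else 0.0


def combine_by_averaging(probabilities):
    """Combine per-engine probabilities for one document by simple averaging."""
    return sum(probabilities) / len(probabilities)


# Hypothetical parameters for two engines; real values would come from fitting.
p_engine_a = prob_relevant(0.6, mix_rel=0.1, mu=0.6, sigma=0.1, lam=10.0)
p_engine_b = prob_relevant(0.1, mix_rel=0.1, mu=0.6, sigma=0.1, lam=10.0)
combined = combine_by_averaging([p_engine_a, p_engine_b])
```

Because each engine's scores are mapped to the same probability scale before averaging, scores from engines with different ranges become comparable.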
QuASM: A System for Question Answering Using Semi-Structured Data
In Proceedings of the Joint Conference on Digital Libraries (JCDL), 2002
Cited by 14 (3 self)
This paper describes a system for question answering using semistructured metadata, QuASM (pronounced "chasm"). Question answering systems aim to improve search performance by providing users with specific answers, rather than having users scan retrieved documents for these answers. Our goal is to answer factual questions by exploiting the structure inherent in documents found on the World Wide Web (WWW). Based on this structure, documents are indexed into smaller units and associated with metadata. Transforming table cells into smaller units associated with metadata is an important part of this task. In addition, we report on work to improve question classification using language models. The domain used to develop this system is documents retrieved from a crawl of www.fedstats.gov.
A Formal Approach to Score Normalization for Meta-search
Cited by 11 (0 self)
... engines in response to a query, has been shown to improve performance. Since the scores produced by different search engines are not comparable, researchers have often decomposed the metasearch problem into a score normalization step followed by a combination step. Combination has been studied by many researchers. While appropriate normalization can affect performance, most of the normalization schemes suggested are ad hoc in nature.
Segmentation of Search Engine Results for Effective Data-Fusion
Cited by 5 (0 self)
Metasearch and data-fusion techniques combine the rank lists of multiple document retrieval systems with the aim of improving search coverage and precision. We propose a new fusion method that partitions the rank lists of document retrieval systems into chunks. The size of chunks grows exponentially in the rank list. Using a small number of training queries, the probabilities of relevance of documents in different chunks are approximated for each search system. The estimated probabilities and normalized document scores are used to compute the final document ranks in the merged list. We show that our proposed method produces higher average precision values than previous systems across a range of testbeds.
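The partitioning step mentioned in this abstract can be sketched as follows. This covers only the splitting of a rank list into chunks whose sizes grow geometrically; it does not reproduce the paper's probability estimation or its final scoring formula, which the abstract does not specify. The function name and the parameters `first_size` and `growth` are hypothetical values chosen for illustration.

```python
def chunk_boundaries(list_length, first_size=2, growth=2):
    """Split ranks 0..list_length-1 into chunks of geometrically growing size.

    Returns a list of (start, end) pairs, 0-based with end exclusive.
    first_size and growth are assumed integer parameters, not values from the paper.
    """
    if list_length < 0 or first_size < 1 or growth < 1:
        raise ValueError("list_length must be >= 0; first_size and growth must be >= 1")
    bounds = []
    start, size = 0, first_size
    while start < list_length:
        end = min(start + size, list_length)  # the last chunk may be cut short
        bounds.append((start, end))
        start = end
        size *= growth
    return bounds


# A rank list of 20 documents yields chunks of size 2, 4, 8, and a final partial chunk.
example = chunk_boundaries(20)
```

Growing chunks keep the top of the list finely divided, where a change of a few ranks matters most, while grouping the long tail into a few large chunks.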
UCD SIFT in the TREC 2009 Web Track
The SIFT (SIFT Information Fusion Techniques) group in UCD is dedicated to researching Data Fusion in Information Retrieval. This area of research involves the merging of multiple sets of results into a single result set that is presented to the user. As a means of evaluating the effectiveness of this work, the group entered Category B of the TREC 2009 Web Track. This paper discusses the strategies and experiments employed by the UCD SIFT group in entering the TREC Web Track 2009. This involved the use of freely-available Information Retrieval tools to provide inputs to the data fusion process, with the aim of contrasting with more sophisticated systems.

