Combining Document Representations for Known-Item Search (2003)

by Paul Ogilvie, Jamie Callan

Results 1 - 10 of 46

Simple BM25 Extension to Multiple Weighted Fields

by Stephen Robertson, Hugo Zaragoza, Michael Taylor, 2004
Cited by 106 (10 self)
This paper describes a simple way of adapting the BM25 ranking formula to deal with structured documents. In the past it has been common to compute scores for the individual fields (e.g. title and body) independently and then combine these scores (typically linearly) to arrive at a final score for the document. We highlight how this approach can lead to poor performance by breaking the carefully constructed non-linear saturation of term frequency in the BM25 function. We propose a much more intuitive alternative which weights term frequencies before the nonlinear term frequency saturation function is applied. In this scheme, a structured document with a title weight of two is mapped to an unstructured document with the title content repeated twice. This more verbose unstructured document is then ranked in the usual way. We demonstrate the advantages of this method with experiments on Reuters Vol1 and the TREC dotGov collection.
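The contrast drawn in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the default `k1=1.2`, the unit `idf`, and the omission of document-length normalisation are all assumptions made to keep the example small.

```python
def saturate(tf, k1=1.2):
    """BM25-style term-frequency saturation (length normalisation omitted)."""
    return tf * (k1 + 1.0) / (tf + k1)


def score_by_combining_field_scores(field_tf, field_weight, idf=1.0, k1=1.2):
    """Older approach: saturate each field separately, then combine linearly."""
    return idf * sum(field_weight[f] * saturate(tf, k1) for f, tf in field_tf.items())


def score_by_weighting_tf(field_tf, field_weight, idf=1.0, k1=1.2):
    """Approach described in the abstract: weight term frequencies, saturate once."""
    combined_tf = sum(field_weight[f] * tf for f, tf in field_tf.items())
    return idf * saturate(combined_tf, k1)


# Hypothetical document: the query term occurs once, in the title only.
field_tf = {"title": 1, "body": 0}
field_weight = {"title": 2.0, "body": 1.0}

linear = score_by_combining_field_scores(field_tf, field_weight)
weighted = score_by_weighting_tf(field_tf, field_weight)
```

With a title weight of two, `weighted` equals the score of an unstructured document in which the term occurs twice, which matches the mapping the abstract describes. The linear combination has no shared ceiling: it grows with every extra field and weight, whereas the weighted-frequency score stays below `idf * (k1 + 1)`.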

Discriminative Models for Information Retrieval

by Ramesh Nallapati - SIGIR '04, 2004
Cited by 66 (1 self)
Discriminative models have been preferred over generative models in many machine learning problems in the recent past owing to some of their attractive theoretical properties. In this paper, we explore the applicability of discriminative classifiers for IR. We compare the performance of two popular discriminative models, namely the maximum entropy model and support vector machines, with that of language modeling, the state-of-the-art generative model for IR. Our experiments on ad-hoc retrieval indicate that although maximum entropy is significantly worse than language models, support vector machines are on par with language models. We argue that the main reason to prefer SVMs over language models is their ability to learn arbitrary features automatically, as demonstrated by our experiments on the home-page finding task of TREC-10.

Hierarchical language models for expert finding in enterprise corpora

by Desislava Petkova, W. Bruce Croft - Int. J. on AI Tools
Cited by 35 (0 self)
Enterprise corpora contain evidence of what employees work on and therefore can be used to automatically find experts on a given topic. We present a general approach for representing the knowledge of a potential expert as a mixture of language models from associated documents. First we retrieve documents given the expert’s name using a generative probabilistic technique and weight the retrieved documents according to an expert-specific posterior distribution. Then we model the expert indirectly through the set of associated documents, which allows us to exploit their underlying structure and complex language features. Experiments show that our method has excellent performance on the TREC 2005 expert search task and that it effectively collects and combines evidence for expertise in a heterogeneous collection.
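The idea of an expert represented as a mixture of document language models can be sketched as follows. This is an illustrative reading of the abstract, not the authors' model: the document weights are taken as given rather than derived from a name query, and the linear smoothing with `lam=0.5`, the function names, and the toy data are all assumptions.

```python
import math


def expert_language_model(doc_models, doc_weights):
    """Mix per-document term distributions into one expert model.

    doc_models:  {doc_id: {term: p(term | doc)}}
    doc_weights: {doc_id: p(doc | expert)}, assumed to sum to 1
    """
    expert_model = {}
    for doc_id, weight in doc_weights.items():
        for term, prob in doc_models[doc_id].items():
            expert_model[term] = expert_model.get(term, 0.0) + weight * prob
    return expert_model


def query_log_likelihood(query_terms, expert_model, background, lam=0.5):
    """Log p(query | expert), linearly smoothed against a background model."""
    return sum(
        math.log((1.0 - lam) * expert_model.get(t, 0.0) + lam * background[t])
        for t in query_terms
    )


# Hypothetical toy data: two documents and two candidate experts.
doc_models = {
    "d1": {"retrieval": 0.5, "index": 0.5},
    "d2": {"retrieval": 0.25, "budget": 0.75},
}
background = {"retrieval": 0.1, "index": 0.1, "budget": 0.1}

expert_a = expert_language_model(doc_models, {"d1": 0.8, "d2": 0.2})
expert_b = expert_language_model(doc_models, {"d1": 0.2, "d2": 0.8})

score_a = query_log_likelihood(["retrieval"], expert_a, background)
score_b = query_log_likelihood(["retrieval"], expert_b, background)
```

In this toy setup the expert whose weight is concentrated on the retrieval-heavy document ranks higher for the query "retrieval", so the document weights directly control which evidence dominates the expert's model.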

Boosting web retrieval through query operations

by Gilad Mishne, Maarten de Rijke - Proceedings ECIR 2005, 2005
Cited by 24 (2 self)
We explore the use of phrase and proximity terms in the context of web retrieval, which is different from traditional ad-hoc retrieval both in document structure and in query characteristics. We show that for this type of task, the usage of both phrase and proximity terms is highly beneficial for early precision as well as for overall retrieval effectiveness. We also analyze why phrase and proximity terms are far more effective for web retrieval than for ad-hoc retrieval.

Using Wikipedia at the TREC QA Track

by David Ahn, Valentin Jijkoun, Gilad Mishne, Karin Müller, Maarten de Rijke, Stefan Schlobach, 2004
Cited by 18 (5 self)
We describe our participation in the TREC 2004 Question Answering track. We

Probabilistic Models for Combining Diverse Knowledge Sources in Multimedia Retrieval

by Rong Yan - In Ph.D. Thesis, 2006
Cited by 18 (2 self)
In recent years, the multimedia retrieval community has been gradually shifting its emphasis from analyzing one media source at a time to exploring the opportunities of combining diverse knowledge sources from correlated media types and context. This thesis presents a conditional probabilistic retrieval model as a principled framework to combine diverse knowledge sources. An efficient rank-based learning approach has been developed to explicitly model the ranking relations in the learning process. Under this retrieval framework, we overview and develop a number of state-of-the-art approaches for extracting ranking features from multimedia knowledge sources. To incorporate query information in the combination model, this thesis develops a number of query analysis models that can automatically discover the mixing structure of the query space based on previous retrieval results. To adapt the combination function on a per-query basis, this thesis also presents a probabilistic local context analysis (pLCA) model to automatically leverage additional retrieval sources to improve initial retrieval outputs. All the proposed approaches are evaluated on multimedia retrieval tasks with large-scale video collections as well as meta-search tasks with large-scale text collections.

Indri at TREC 2005: Terabyte Track

by Donald Metzler, Trevor Strohman, Yun Zhou, W. B. Croft, 2004
Cited by 17 (2 self)
This work details the experiments carried out using the Indri search engine during the TREC 2005 Terabyte Track. Results are presented for each of the three tasks, including efficiency, ad hoc, and named page finding. Our efficiency runs focused on query optimization techniques, our ad hoc runs look at the importance of term proximity and document quality, and our named-page finding runs investigate the use of document priors and document structure.

Web object retrieval

by Zaiqing Nie, Yunxiao Ma, Shuming Shi, Ji-rong Wen, Wei-ying Ma - In Proc. WWW, 2007
Cited by 17 (1 self)
The primary function of current Web search engines is essentially relevance ranking at the document level. However, myriad structured information about real-world objects is embedded in static Web pages and online Web databases. In this paper, we propose a paradigm shift to enable searching at the object level. In traditional information retrieval models, documents are taken as the retrieval units and the content of a document is considered reliable. However, this reliability assumption is no longer valid in the object retrieval context, where multiple copies of information about the same object typically exist. These copies may be inconsistent because of the varying quality of Web sites and the limited performance of current information extraction techniques. In this paper, we propose several language models for Web object retrieval. We test these models on our academic search engine, called Libra, and compare their performance.

Automatic Feature Selection in the Markov Random Field Model for Information Retrieval

by Donald Metzler - In Proceedings of CIKM’07, 2007
Cited by 16 (6 self)
Previous applications of the Markov random field model for information retrieval have used manually chosen features. However, it is often difficult or impossible to know, a priori, the best set of features to use for a given task or data set. Therefore, there is a need to develop automatic feature selection techniques. In this paper we describe a greedy procedure for automatically selecting features to use within the Markov random field model for information retrieval. We also propose a novel, robust method for describing classes of textual information retrieval features. Experimental results, evaluated on standard TREC test collections, show that our feature selection algorithm produces models that are either significantly more effective than, or equally effective as, models with manually selected features, such as those used in the past.
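A greedy selection procedure of the general kind this abstract mentions can be sketched as below. This is generic forward selection, not the paper's algorithm: the `evaluate` callback, the `min_gain` stopping threshold, the feature names, and the toy gains are hypothetical stand-ins for training and scoring a retrieval model on a test collection.

```python
def greedy_select(candidates, evaluate, min_gain=1e-6):
    """Add, one at a time, the feature that most improves the metric.

    evaluate(features) is assumed to return an effectiveness score
    (higher is better) for a model built from the given feature list.
    """
    selected = []
    best_score = evaluate(selected)
    remaining = list(candidates)
    while remaining:
        # Score every remaining feature when added to the current set.
        scored = [(evaluate(selected + [f]), f) for f in remaining]
        score, feature = max(scored, key=lambda pair: pair[0])
        if score - best_score < min_gain:
            break  # no remaining feature helps enough
        selected.append(feature)
        remaining.remove(feature)
        best_score = score
    return selected, best_score


# Hypothetical per-feature gains; a real evaluate() would run retrieval.
toy_gain = {"unigram": 0.20, "ordered_window": 0.05,
            "unordered_window": 0.03, "noise": 0.0}


def toy_evaluate(features):
    return sum(toy_gain[f] for f in features)


chosen, final_score = greedy_select(toy_gain, toy_evaluate)
```

In the toy run the three useful features are picked in order of their gain and the zero-gain feature is left out, which is the behaviour that makes greedy selection attractive when the best feature set is not known in advance.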

Language models for searching in Web corpora

by Jaap Kamps, Gilad Mishne, Maarten de Rijke - The Thirteenth Text Retrieval Conference (TREC 2004). National Institute of Standards and Technology. NIST Special Publication, 2005
Cited by 15 (8 self)
We describe our participation in the