Learning Concept Importance Using a Weighted Dependence Model
Cited by 9 (6 self)
Modeling query concepts through term dependencies has been shown to have a significant positive effect on retrieval performance, especially for tasks such as web search, where relevance at high ranks is particularly critical. Most previous work, however, treats all concepts as equally important, an assumption that often does not hold, especially for longer, more complex queries. In this paper, we show that one of the most effective existing term dependence models can be naturally extended by assigning weights to concepts. We demonstrate that the weighted dependence model can be trained using existing learning-to-rank techniques, even with a relatively small number of training queries. Our study compares the effectiveness of both endogenous (collection-based) and exogenous (based on external sources) features for determining concept importance. To test the weighted dependence model, we perform experiments on both publicly available TREC corpora and a proprietary web corpus. Our experimental results indicate that our model consistently and significantly outperforms both the standard bag-of-words model and the unweighted term dependence model, and that combining endogenous and exogenous features generally results in the best retrieval effectiveness.
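The idea of weighting concepts by their features can be illustrated with a minimal sketch. It assumes that a concept's weight is a linear combination of its features, which the abstract does not state; the feature names (`collection_idf`, `external_freq`), the parameter values, and the per-concept match scores are all hypothetical and chosen only for illustration.

```python
def concept_weight(features, params):
    """Weight of one concept as a linear combination of its features (an assumption)."""
    return sum(params[name] * value for name, value in features.items())


def score_document(match_scores, concept_features, params):
    """Sum each concept's match score, scaled by that concept's learned weight."""
    return sum(
        concept_weight(concept_features[concept], params) * match
        for concept, match in match_scores.items()
    )


# Hypothetical parameters, as a learning-to-rank procedure might produce them.
params = {"collection_idf": 0.5, "external_freq": 0.25}

# Hypothetical endogenous and exogenous features for a unigram and a bigram concept.
concept_features = {
    "civil": {"collection_idf": 2.0, "external_freq": 4.0},
    "civil war": {"collection_idf": 4.0, "external_freq": 8.0},
}

# Hypothetical match scores of each concept against one document.
match_scores = {"civil": 1.0, "civil war": 2.0}

score = score_document(match_scores, concept_features, params)
print(score)
```

Setting every concept weight to the same constant would recover an unweighted model, which is the baseline the abstract compares against.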
Reducing the risk of query expansion via robust constrained optimization
- Proceedings of the Eighteenth International Conference on Information and Knowledge Management (CIKM 2009). ACM. Hong
Cited by 9 (5 self)
We introduce a new theoretical derivation, evaluation methods, and extensive empirical analysis for an automatic query expansion framework in which model estimation is cast as a robust constrained optimization problem. This framework provides a powerful method for modeling and solving complex expansion problems, by allowing multiple sources of domain knowledge or evidence to be encoded as simultaneous optimization constraints. Our robust optimization approach provides a clean theoretical way to model not only expansion benefit, but also expansion risk, by optimizing over uncertainty sets for the data. In addition, we introduce risk-reward curves to visualize expansion algorithm performance and analyze parameter sensitivity. We show that a robust approach significantly reduces the number and magnitude of expansion failures for a strong baseline algorithm, with no loss in average gain. Our approach is implemented as a highly efficient post-processing step that assumes little about the baseline expansion method used as input, making it easy to apply to existing expansion methods. We provide analysis showing that this approach is a natural and effective way to do selective expansion, automatically reducing or avoiding expansion in risky scenarios, and successfully attenuating noise in poor baseline methods.
Clickthrough-Based Translation Models for Web Search: from Word Models to Phrase Models
Cited by 7 (4 self)
Web search is challenging partly due to the fact that search queries and Web documents use different language styles and vocabularies. This paper provides a quantitative analysis of the language discrepancy issue, and explores the use of clickthrough data to bridge documents and queries. We assume that a query is parallel to the titles of documents clicked on for that query. Two translation models are trained and integrated into retrieval models: a word-based translation model that learns the translation probability between single words, and a phrase-based translation model that learns the translation probability between multi-term phrases. Experiments are carried out on a real world data set. The results show that the retrieval systems that use the translation models significantly outperform the systems that do not. The paper also demonstrates that standard statistical machine translation techniques, such as word alignment, bilingual phrase extraction, and phrase-based decoding, can be adapted for building a better Web document retrieval system.
Placing Flickr Photos on a Map
Cited by 7 (0 self)
In this paper we investigate generic methods for placing photos uploaded to Flickr on the World map. As primary input for our methods we use the textual annotations provided by the users to predict the single most probable location where the image was taken. Central to our approach is a language model based entirely on the annotations provided by users. We define extensions to improve over the language model using tag-based smoothing and cell-based smoothing, and leveraging spatial ambiguity. Further, we demonstrate how to incorporate GeoNames, a large external database of locations. For varying levels of granularity, we are able to place images on a map with at least twice the precision of the state-of-the-art reported in the literature.
Dynamic Ranked Retrieval
, 2011
Cited by 3 (1 self)
We present a theoretically well-founded retrieval model for dynamically generating rankings based on interactive user feedback. Unlike conventional rankings that remain static after the query was issued, dynamic rankings allow and anticipate user activity, thus providing a way to combine the otherwise contradictory goals of result diversification and high recall. We develop a decision-theoretic framework to guide the design and evaluation of algorithms for this interactive retrieval setting. Furthermore, we propose two dynamic ranking algorithms, both of which are computationally efficient. We prove that these algorithms provide retrieval performance that is guaranteed to be at least as good as the optimal static ranking algorithm. In empirical evaluations, dynamic ranking shows substantial improvements in retrieval performance over conventional static rankings.
Classifying and Filtering Blind Feedback Terms to Improve Information Retrieval Effectiveness
Cited by 2 (1 self)
The classification of blind relevance feedback (BRF) terms described in this paper aims at increasing precision or recall by determining which terms decrease, increase or do not change the corresponding information retrieval (IR) performance metric. Classification and IR experiments are performed on the German and English GIRT data, using the BM25 retrieval model. Several basic memory-based classifiers are trained on different feature sets, grouping together features from different query expansion (QE) approaches. Combined classifiers employ the results of the basic classifiers and correctness predictions as features. The best combined classifiers for German (English) yield 22.9% (26.4%) and 5.8% (1.9%) improvement for term classification with respect to precision and recall compared to the best basic classifiers. IR experiments based on this term classification have also been performed. Filtering out different types of BRF terms shows that selecting feedback terms predicted to increase precision improves the average precision significantly compared to experiments without BRF. MAP is improved by 19.8% compared to the best standard BRF experiment (11% for German). BRF term classification also increases the number of relevant and retrieved documents, geometric MAP, and P@10 in comparison to standard BRF. Experiments based on an optimal classification show that there is potential for improving IR effectiveness even more.
Positional Relevance Model for Pseudo-Relevance Feedback
, 2010
Cited by 2 (0 self)
Pseudo-relevance feedback is an effective technique for improving retrieval results. Traditional feedback algorithms use a whole feedback document as a unit to extract words for query expansion, which is not optimal as a document may cover several different topics and thus contain much irrelevant information. In this paper, we study how to effectively select from feedback documents those words that are focused on the query topic based on positions of terms in feedback documents. We propose a positional relevance model (PRM) to address this problem in a unified probabilistic way. The proposed PRM is an extension of the relevance model to exploit term positions and proximity so as to assign more weight to words closer to query words, based on the intuition that words closer to query words are more likely to be related to the query topic. We develop two methods to estimate PRM based on different sampling processes. Experimental results on two large retrieval datasets show that the proposed PRM is effective and robust for pseudo-relevance feedback, significantly outperforming the relevance model in both document-based feedback and passage-based feedback.
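The intuition that words near query words deserve more weight can be sketched as follows. This is not the PRM estimator from the paper: the Gaussian kernel, the `sigma` parameter, the function name `proximity_weights`, and the toy document are all assumptions made for illustration.

```python
import math
from collections import defaultdict


def proximity_weights(doc_tokens, query_terms, sigma=2.0):
    """Weight each non-query word by its Gaussian-kernel proximity to query words.

    Returns a dict of normalized weights that sum to 1, or an empty dict
    if the document contains no query term.
    """
    query_positions = [i for i, tok in enumerate(doc_tokens) if tok in query_terms]
    weights = defaultdict(float)
    for i, tok in enumerate(doc_tokens):
        if tok in query_terms:
            continue  # only candidate expansion words receive a weight
        for q in query_positions:
            weights[tok] += math.exp(-((i - q) ** 2) / (2 * sigma ** 2))
    total = sum(weights.values())
    return {tok: w / total for tok, w in weights.items()} if total > 0 else {}


doc = "jaguar speed cat runs fast while the car dealer sells sedans".split()
weights = proximity_weights(doc, {"jaguar"})

# Words next to the query term rank highest.
for tok, w in sorted(weights.items(), key=lambda kv: kv[1], reverse=True)[:3]:
    print(tok, round(w, 3))
```

A whole-document feedback method would instead treat every word in `doc` equally, regardless of how far it sits from the query term.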
Query Expansion for Language Modeling using Sentence Similarities
Cited by 1 (1 self)
We propose a novel method of query expansion for Language Modeling (LM) in Information Retrieval (IR) based on the similarity of the query with sentences in the top ranked documents from an initial retrieval run. In justification of our approach, we argue that the terms in the expanded query obtained by the proposed method roughly follow a Dirichlet distribution which, being the conjugate prior of the multinomial distribution used in the LM retrieval model, helps the feedback step. IR experiments on the TREC ad-hoc retrieval test collections using the sentence based query expansion (SBQE) show a significant increase in Mean Average Precision (MAP) compared to baselines obtained using standard term-based query expansion using LM selection score and the Relevance Model (RLM). The proposed approach to query expansion for LM increases the likelihood of generation of the pseudo-relevant documents by adding sentences with maximum term overlap with the query sentences for each top ranked pseudo-relevant document, thus making the query look more like these documents. A per-topic analysis shows that the new method hurts fewer queries compared to the baseline feedback methods, and improves average precision (AP) over a broad range of queries ranging from easy to difficult in terms of the initial retrieval AP. We also show that the new method is able to add a higher number of good feedback terms (the gold standard of good terms being the set of terms added by True Relevance Feedback). Additional experiments on the challenging search topics of the TREC-2004 Robust track show that the new method is able to improve MAP by 5.7% without the use of external resources and query hardness prediction typically used for these topics.
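The step of adding the sentences with the greatest term overlap to the query can be sketched roughly as below. This is a simplified illustration, not the SBQE method itself: splitting sentences on periods, lowercased whitespace tokenization, the `sentences_per_doc` parameter, and the function name `expand_query` are all assumptions.

```python
def expand_query(query, documents, sentences_per_doc=1):
    """Append to the query the sentences of each document that overlap it most.

    Overlap is the number of distinct query terms a sentence contains.
    """
    query_terms = set(query.lower().split())
    expanded = query.lower().split()
    for doc in documents:
        # Naive sentence splitting on periods (an assumption for this sketch).
        sentences = [s.strip() for s in doc.split(".") if s.strip()]
        ranked = sorted(
            sentences,
            key=lambda s: len(query_terms & set(s.lower().split())),
            reverse=True,
        )
        for sentence in ranked[:sentences_per_doc]:
            expanded.extend(sentence.lower().split())
    return expanded


# Hypothetical top-ranked documents from an initial retrieval run.
top_docs = ["Solar panels convert energy from sunlight. Cats sleep a lot."]
expanded = expand_query("solar energy", top_docs)
print(expanded)
```

Because whole sentences are appended, query terms that recur in the selected sentences appear several times in the expanded query, which is one way the expanded query comes to resemble the pseudo-relevant documents.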
Learning Lexicon Models from Search Logs for Query Expansion
Cited by 1 (0 self)
This paper explores log-based query expansion (QE) models for Web search. Three lexicon models are proposed to bridge the lexical gap between Web documents and user queries. These models are trained on pairs of user queries and titles of clicked documents. Evaluations on a real world data set show that the lexicon models, integrated into a ranker-based QE system, not only significantly improve the document retrieval performance but also outperform two state-of-the-art log-based QE methods.

