Evaluating search engines by modeling the relationship between relevance and clicks (2008)

by B Carterette, R Jones
Venue: In Proceedings of NIPS 20

Results 1 - 10 of 24

A dynamic bayesian network click model for web search ranking

by Olivier Chapelle, Ya Zhang - In WWW, 2009
Cited by 36 (7 self)
As with any application of machine learning, web search ranking requires labeled data. The labels usually come in the form of relevance assessments made by editors. Click logs can also provide an important source of implicit feedback and can be used as a cheap proxy for editorial labels. The main difficulty, however, comes from the so-called position bias: URLs appearing in lower positions are less likely to be clicked even if they are relevant. In this paper, we propose a Dynamic Bayesian Network which aims at providing us with unbiased estimation of the relevance from the click logs. Experiments show that the proposed click model outperforms other existing click models in predicting both click-through rate and relevance.
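To make the position-bias problem concrete, here is a minimal Python sketch of a much simpler correction than the Dynamic Bayesian Network proposed in the paper: "clicks over expected clicks" (COEC), which divides a URL's clicks by the clicks an average result would have received at the same positions. The log format (`url`, `position`, `clicked` tuples) and the function names are assumptions made for illustration only.

```python
from collections import defaultdict

def position_ctr(impressions):
    """Average click-through rate observed at each rank position."""
    shown = defaultdict(int)
    clicked = defaultdict(int)
    for _url, pos, was_clicked in impressions:
        shown[pos] += 1
        clicked[pos] += int(was_clicked)
    return {pos: clicked[pos] / shown[pos] for pos in shown}

def coec(impressions):
    """Clicks over expected clicks per URL (hypothetical log format).

    A score above 1 means the URL attracts more clicks than an average
    result shown at the same positions; below 1 means fewer.
    """
    baseline = position_ctr(impressions)
    clicks = defaultdict(int)
    expected = defaultdict(float)
    for url, pos, was_clicked in impressions:
        clicks[url] += int(was_clicked)
        expected[url] += baseline[pos]
    return {url: clicks[url] / expected[url] for url in clicks if expected[url] > 0}

# Toy log: (url, position, clicked). "c" is only ever shown at position 2.
log = [
    ("a", 1, True), ("a", 1, True), ("b", 1, True), ("b", 1, False),
    ("a", 2, False), ("b", 2, False), ("c", 2, True), ("c", 2, False),
]
scores = coec(log)
```

In this toy log the raw click-through rate of "a" (2/3) is higher than that of "c" (1/2), yet COEC ranks "c" first because all of its impressions were at the less-clicked second position.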

Expected Reciprocal Rank for Graded Relevance

by Olivier Chapelle, Ya Zhang - CIKM '09, November 2–6, 2009, Hong Kong, China
Cited by 32 (6 self)
While numerous metrics for information retrieval are available in the case of binary relevance, there is only one commonly used metric for graded relevance, namely the Discounted Cumulative Gain (DCG). A drawback of DCG is its additive nature and the underlying independence assumption: a document in a given position always has the same gain and discount, independently of the documents shown above it. Inspired by the “cascade” user model, we present a new editorial metric for graded relevance which overcomes this difficulty and implicitly discounts documents which are shown below very relevant documents. More precisely, this new metric is defined as the expected reciprocal length of time that the user will take to find a relevant document. This can be seen as an extension of the classical reciprocal rank to the graded relevance case and we call this metric Expected Reciprocal Rank (ERR). We conduct an extensive evaluation on the query logs of a commercial search engine and show that ERR correlates better with click metrics than other editorial metrics.
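The cascade-style definition above can be sketched in a few lines of Python. The abstract does not say how a relevance grade is turned into the probability that the user is satisfied and stops; the mapping (2^g - 1) / 2^g_max used below, the five-level grade scale, and the function names are assumptions for illustration.

```python
from typing import Sequence

def stop_probability(grade: int, max_grade: int) -> float:
    """Assumed mapping from an editorial grade to P(user is satisfied)."""
    return (2 ** grade - 1) / (2 ** max_grade)

def expected_reciprocal_rank(grades: Sequence[int], max_grade: int = 4) -> float:
    """ERR under a cascade model: the user scans top-down and stops at the
    first satisfying document; the score is the expected value of 1/rank
    of the stopping position."""
    err = 0.0
    p_reach = 1.0  # probability the user gets as far as the current rank
    for rank, grade in enumerate(grades, start=1):
        p_stop = stop_probability(grade, max_grade)
        err += p_reach * p_stop / rank
        p_reach *= 1.0 - p_stop
    return err

perfect_alone_at_2 = expected_reciprocal_rank([0, 4])   # 0.46875
perfect_below_perfect = expected_reciprocal_rank([4, 4])  # 0.966796875
```

With these assumptions, a perfect document at rank 2 contributes about 0.469 when nothing relevant precedes it, but only about 0.029 when a perfect document already sits at rank 1. This is the implicit discounting that an additive metric such as DCG cannot express.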

How Does Clickthrough Data Reflect Retrieval Quality?

by Filip Radlinski, Madhu Kurup, Thorsten Joachims
Cited by 31 (4 self)
Automatically judging the quality of retrieval functions based on observable user behavior holds promise for making retrieval evaluation faster, cheaper, and more user centered. However, the relationship between observable user behavior and retrieval quality is not yet fully understood. We present a sequence of studies investigating this relationship for an operational search engine on the arXiv.org e-print archive. We find that none of the eight absolute usage metrics we explore (e.g., number of clicks, frequency of query reformulations, abandonment) reliably reflect retrieval quality for the sample sizes we consider. However, we find that paired experiment designs adapted from sensory analysis produce accurate and reliable statements about the relative quality of two retrieval functions. In particular, we investigate two paired comparison tests that analyze clickthrough data from an interleaved presentation of ranking pairs, and we find that both give accurate and consistent results. We conclude that both paired comparison tests give substantially more accurate and sensitive evaluation results than absolute usage metrics in our domain.
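As an illustration of a paired comparison on interleaved rankings, the following Python sketch implements one well-known scheme, team-draft interleaving, together with a simple click-credit count. The abstract does not spell out the two tests it studies, so this should be read as an example of the general idea rather than the paper's exact procedure; the function names and the tie handling are assumptions.

```python
import random

def team_draft_interleave(ranking_a, ranking_b, rng):
    """Merge two rankings; remember which ranker ("team") contributed each doc."""
    rankings = {"A": ranking_a, "B": ranking_b}
    picks = {"A": 0, "B": 0}
    interleaved, team_of = [], {}
    total_docs = len(set(ranking_a) | set(ranking_b))
    while len(interleaved) < total_docs:
        # The team with fewer picks goes next; ties are broken by a coin flip.
        if picks["A"] < picks["B"]:
            team = "A"
        elif picks["B"] < picks["A"]:
            team = "B"
        else:
            team = rng.choice(["A", "B"])
        # Take that team's highest-ranked document not shown yet.
        doc = next((d for d in rankings[team] if d not in team_of), None)
        if doc is None:
            # This team has nothing new to offer; the other team picks instead.
            team = "B" if team == "A" else "A"
            doc = next(d for d in rankings[team] if d not in team_of)
        interleaved.append(doc)
        team_of[doc] = team
        picks[team] += 1
    return interleaved, team_of

def credit_clicks(team_of, clicked_docs):
    """Count clicks per team; the team with more clicks wins the impression."""
    wins = {"A": 0, "B": 0}
    for doc in clicked_docs:
        if doc in team_of:
            wins[team_of[doc]] += 1
    return wins

a = ["d1", "d2", "d3", "d4"]
b = ["d4", "d3", "d2", "d1"]
shown, team_of = team_draft_interleave(a, b, random.Random(0))
```

Aggregating `credit_clicks` over many query impressions and testing whether one team wins significantly more often gives a relative statement about the two retrieval functions, without relying on any absolute usage metric.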

Predicting bounce rates in sponsored search advertisements

by D. Sculley, Robert Malkin, Sugato Basu, Roberto J. Bayardo (Google Inc) - In SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), 2009
Cited by 14 (2 self)
This paper explores an important and relatively unstudied quality measure of a sponsored search advertisement: bounce rate. The bounce rate of an ad can be informally defined as the fraction of users who click on the ad but almost immediately move on to other tasks. A high bounce rate can lead to poor advertiser return on investment, and suggests search engine users may be having a poor experience following the click. In this paper, we first provide quantitative analysis showing that bounce rate is an effective measure of user satisfaction. We then address the question, can we predict bounce rate by analyzing the features of the advertisement? An affirmative answer would allow advertisers and search engines to predict the effectiveness and quality of advertisements before they are shown. We propose solutions to this problem involving large-scale learning methods that leverage features drawn from ad creatives in addition
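The informal definition of bounce rate given above can be written down directly. In this Python sketch the `AdClick` record, the use of dwell time as the signal, and the 5-second cutoff for "almost immediately" are all hypothetical choices; the abstract does not quantify any of them.

```python
from dataclasses import dataclass
from typing import Dict, Iterable

@dataclass
class AdClick:
    ad_id: str
    dwell_seconds: float  # time spent on the landing page after the click

def bounce_rates(clicks: Iterable[AdClick],
                 threshold_seconds: float = 5.0) -> Dict[str, float]:
    """Fraction of clicks per ad whose dwell time falls below the threshold."""
    totals: Dict[str, int] = {}
    bounces: Dict[str, int] = {}
    for click in clicks:
        totals[click.ad_id] = totals.get(click.ad_id, 0) + 1
        if click.dwell_seconds < threshold_seconds:
            bounces[click.ad_id] = bounces.get(click.ad_id, 0) + 1
    return {ad: bounces.get(ad, 0) / n for ad, n in totals.items()}

sample = [
    AdClick("ad-1", 2.0), AdClick("ad-1", 30.0),
    AdClick("ad-1", 1.0), AdClick("ad-1", 60.0),
    AdClick("ad-2", 40.0),
]
rates = bounce_rates(sample)
```

Rates computed this way from logged clicks would serve as the target variable; predicting them from ad features before any clicks exist is the learning problem the paper addresses.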

Matching task profiles and user needs in personalized web search

by Julia Luxenburger, Shady Elbassuoni, Gerhard Weikum - In CIKM ’08: Proceedings of the 17th ACM Conference on Information and Knowledge Management, 2008
Cited by 8 (0 self)
Personalization has been deemed one of the major challenges in information retrieval with a significant potential for providing better search experience to individual users. In particular, the need for enhanced user models that better capture elements such as users’ goals, tasks, and contexts has been identified. In this paper, we introduce a statistical language model for user tasks representing different granularity levels of a user profile, ranging from very specific search goals to broad topics. We propose a personalization framework that selectively matches the actual user information need with relevant past user tasks, and allows the course of personalization to switch dynamically from re-finding very precise information to biasing results towards general user interests. In the extreme, our model is able to detect when the user’s search and browse history is not appropriate for aiding the user in satisfying her current information quest. Instead of blindly applying personalization to all user queries, our approach refrains from undue actions in these cases, accounting for the user’s desire to discover new topics and for interests that change over time. The effectiveness of our method is demonstrated by an empirical user study.

Efficient multiple-click models in web search

by Fan Guo, Chao Liu, Yi-min Wang - In WSDM ’09: Proceedings of the Second International Conference on Web Search and Data Mining, 2009
Cited by 8 (4 self)
Many tasks that leverage web search users’ implicit feedback rely on a proper and unbiased interpretation of user clicks. Previous eye-tracking experiments and studies on explaining position-bias of user clicks provide a spectrum of hypotheses and models on how an average user examines and possibly clicks web documents returned by a search engine with respect to the submitted query. In this paper, we attempt to close the gap between previous work, which studied how to model a single click, and the reality that multiple clicks on web documents in a single result page are not uncommon. Specifically, we present two multiple-click models: the independent click model (ICM), which is reformulated from previous work, and the dependent click model (DCM), which takes into consideration dependencies between multiple clicks. Both models can be efficiently learned with linear time and space complexities. More importantly, they can be incrementally updated as new click logs flow in. These properties are in high demand in practice. We systematically evaluate the two models on click logs obtained in July 2008 from a major commercial search engine. The data set, after preprocessing, contains over 110 thousand distinct queries and 8.8 million query sessions. Extensive experimental studies demonstrate the gain of modeling multiple clicks and their dependencies. Finally, we note that since our experimental setup does not rely on tweaking search result rankings, it can be easily adopted by future studies.

Interactively Optimizing Information Retrieval Systems as a Dueling Bandits Problem

by Yisong Yue, Thorsten Joachims
Cited by 7 (4 self)
We present an on-line learning framework tailored towards real-time learning from observed user behavior in search engines and other information retrieval systems. In particular, we only require pairwise comparisons which were shown to be reliably inferred from implicit feedback (Joachims et al., 2007; Radlinski et al., 2008b). We will present an algorithm with theoretical guarantees as well as simulation results.

Mining user web search activity with layered bayesian networks or how to capture a click in its context

by Benjamin Piwowarski, Georges Dupret, Rosie Jones - In WSDM ’09, 2009
Cited by 5 (0 self)
Mining user web search activity potentially has a broad range of applications including web result pre-fetching, automatic search query reformulation, click spam detection, estimation of document relevance and prediction of user satisfaction. This analysis is difficult because the data recorded by search engines while users interact with them, although abundant, is very noisy. In this work, we explore the utility of mining search behavior of users, represented by observed variables including the time the user spends on the page, and whether the user reformulated his or her query. As a case study, we examine the contribution this data makes to predicting the relevance of a document in the absence of document content models. To this end, we first propose a method for grouping the interactions of a particular user according to the different tasks he or she undertakes. With each task corresponding to a distinct information need, we then propose a Bayesian Network to holistically model these interactions. The aim is to identify distinct patterns of search behaviors. Finally, we join these patterns to a list of custom features and we use gradient boosted decision trees to predict the relevance of a set of query document pairs for which we have relevance assessments. The experimental results confirm the potential of our model, with significant improvements in precision for predicting the relevance of documents based on a model of the user’s search and click behavior, over a baseline model using only click and query features, with no Bayesian Network input.

Global Ranking by Exploiting User Clicks

by Shihao Ji, Gui-rong Xue, Ke Zhou, O. Chapelle, Gordon Sun, Ciya Liao, Zhaohui Zheng, Hongyuan Zha
Cited by 5 (3 self)
It is now widely recognized that user interactions with search results can provide substantial relevance information on the documents displayed in the search results. In this paper, we focus on extracting relevance information from one source of user interactions, i.e., user click data, which records the sequence of documents being clicked and not clicked in the result set during a user search session. We formulate the problem as a global ranking problem, emphasizing the importance of the sequential nature of user clicks, with the goal to predict the relevance labels of all the documents in a search session. This is distinct from conventional learning to rank methods that usually design a ranking model defined on a single document; in contrast, in our model the relational information among the documents as manifested by an aggregation of user clicks is exploited to rank all the documents jointly. In particular, we adapt several sequential supervised learning algorithms, including the conditional random field (CRF), the sliding window method and the recurrent sliding window method, to the global ranking problem. Experiments on the click data collected from a commercial search engine demonstrate that our methods can outperform the baseline models for search results re-ranking.

Tailoring click models to user goals

by Fan Guo, Lei Li, Christos Faloutsos - In WSCD ’09: Proceedings of the 2009 workshop on Web Search Click Data, 2009
Cited by 4 (3 self)
Click models provide a principled way of understanding user interaction with web search results in a query session and a statistical tool for leveraging search engine click logs to analyze and improve user experience. An important component in all existing click models is the user behavior assumption: how users scan, examine and click web documents listed in the result page. Usually the average user behavior pattern is summarized in a small set of global parameters. Can we fit multiple models with different user behavior parameters on a click data set? A previous study showed that the mixture modeling approach did not lead to better performance despite extra computational cost. In this paper, we present how to tailor click models to user goals in web search through query term classification. We demonstrate that better predictive power could be achieved by fitting two click models for navigational queries and informational queries respectively, as evidenced by the likelihood and perplexity evaluation results on a subset of the MSN 2006 RFP data which consists of 121,179 distinct query terms and over 2.8 million query sessions. We also propose search relevance score (SRS) as a flexible evaluation metric of search engine performance. This metric can be derived as summary statistics under any click model, and is applicable to a single query session, a particular query term and the search engine overall.
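The perplexity evaluation mentioned above is not defined in the abstract. A minimal Python sketch, assuming the definition commonly used for click models (2 raised to the negative mean log2-likelihood of the observed binary click events at one rank position), looks as follows; the function name and the clamping constant `eps` are assumptions.

```python
import math
from typing import Sequence

def click_perplexity(clicks: Sequence[int],
                     predicted_probs: Sequence[float],
                     eps: float = 1e-12) -> float:
    """Perplexity of predicted click probabilities against observed clicks.

    clicks[i] is 1 if the result was clicked in session i, else 0;
    predicted_probs[i] is the model's click probability for that session.
    1.0 is a perfect score; 2.0 is what a constant 0.5 prediction achieves.
    """
    if len(clicks) != len(predicted_probs) or not clicks:
        raise ValueError("need two non-empty sequences of equal length")
    log_likelihood = 0.0
    for clicked, q in zip(clicks, predicted_probs):
        q = min(max(q, eps), 1.0 - eps)  # keep log2 finite
        log_likelihood += clicked * math.log2(q) + (1 - clicked) * math.log2(1.0 - q)
    return 2.0 ** (-log_likelihood / len(clicks))

observed = [1, 0, 0, 1]
coin_flip = click_perplexity(observed, [0.5, 0.5, 0.5, 0.5])
informed = click_perplexity(observed, [0.9, 0.1, 0.2, 0.8])
```

Under this definition, comparing one model fitted to all queries against separate models fitted to navigational and informational queries comes down to computing the perplexity of each on held-out sessions and preferring the lower value.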