Query suggestion using hitting time
- In Proc. of Conf. on Inf. and Knowledge Manage. (CIKM’08)
Cited by 29 (2 self)
Generating alternative queries, also known as query suggestion, has long proved useful for helping a user explore and express his information need. In many scenarios, such suggestions can be generated from a large-scale graph of queries and other accessory information, such as clickthrough data. However, how to generate suggestions while ensuring their semantic consistency with the original query remains a challenging problem. In this work, we propose a novel query suggestion algorithm based on ranking queries by their hitting time on a large-scale bipartite graph. Without twisted heuristics or heavy parameter tuning, this method clearly captures the semantic consistency between the suggested query and the original query. Empirical experiments on a large-scale query log of a commercial search engine and a scientific literature collection show that hitting time is effective for generating semantically consistent query suggestions. The proposed algorithm and its variations can successfully boost long-tail queries, accommodate personalized query suggestion, and find related authors in research.
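To make the hitting-time idea concrete, here is a minimal sketch, not the paper's implementation. It assumes a query-URL click graph given as a dictionary of click counts, a random walk that follows edges in proportion to those counts, and a fixed number of update steps (a truncated hitting time). The function name, the `steps` parameter, and the toy data are all hypothetical.

```python
from collections import defaultdict


def truncated_hitting_time(clicks, target, steps=20):
    """Approximate the expected number of random-walk steps from each query to `target`.

    clicks: dict mapping (query, url) -> click count (assumed input format).
    Returns {query: approximate hitting time}; smaller means closer to `target`.
    """
    # Build a weighted bipartite adjacency; tag nodes so queries and URLs never collide.
    adj = defaultdict(dict)
    for (query, url), count in clicks.items():
        adj[("q", query)][("u", url)] = count
        adj[("u", url)][("q", query)] = count

    goal = ("q", target)
    h = {node: 0.0 for node in adj}
    for _ in range(steps):
        new_h = {}
        for node, neighbours in adj.items():
            if node == goal:
                new_h[node] = 0.0  # the walk stops once it hits the target
                continue
            total = sum(neighbours.values())
            # One step, plus the expected remaining time from the next node.
            new_h[node] = 1.0 + sum(w / total * h[n] for n, w in neighbours.items())
        h = new_h
    return {name: t for (kind, name), t in h.items() if kind == "q" and name != target}


# Toy click log (made-up data, for illustration only).
toy_clicks = {
    ("jaguar car", "jaguar.com"): 5,
    ("jaguar", "jaguar.com"): 3,
    ("jaguar", "wikipedia/jaguar"): 2,
    ("big cats", "wikipedia/jaguar"): 1,
    ("big cats", "zoo.org"): 9,
    ("zoo tickets", "zoo.org"): 4,
}
times = truncated_hitting_time(toy_clicks, "jaguar")
suggestions = sorted(times, key=times.get)  # candidate suggestions, closest first
```

Ranking candidates by ascending hitting time places queries that reach the original query quickly ahead of queries that are only loosely connected to it.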
Learning diverse rankings with multi-armed bandits
- In Proceedings of the 25th ICML, 2008
Cited by 27 (3 self)
Algorithms for learning to rank Web documents usually assume a document’s relevance is independent of other documents. This leads to learned ranking functions that produce rankings with redundant results. In contrast, user studies have shown that diversity at high ranks is often preferred. We present two online learning algorithms that directly learn a diverse ranking of documents based on users’ clicking behavior. We show that these algorithms minimize abandonment, or alternatively, maximize the probability that a relevant document is found in the top k positions of a ranking. Moreover, one of our algorithms asymptotically achieves optimal worst-case performance even if users’ interests change.
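The abandonment objective named in this abstract can be illustrated with a small sketch. It assumes each user is represented by the set of documents they would consider relevant, and that a user abandons the result page when none of the top k documents is in that set. The function name and the toy data are hypothetical; this shows only the metric, not the learning algorithms.

```python
def abandonment_at_k(ranking, user_relevant_sets, k):
    """Fraction of users who find no relevant document in the top k positions."""
    top_k = set(ranking[:k])
    abandoned = sum(1 for relevant in user_relevant_sets if not (top_k & relevant))
    return abandoned / len(user_relevant_sets)


# Toy population: three users want "a" documents, one user wants "b1".
users = [{"a1", "a2"}, {"a1", "a2"}, {"a1", "a2"}, {"b1"}]

redundant = abandonment_at_k(["a1", "a2", "b1"], users, k=2)  # top 2 serve one intent
diverse = abandonment_at_k(["a1", "b1", "a2"], users, k=2)    # top 2 cover both intents
```

In this toy case the redundant ranking loses the minority-intent user, while the diverse ranking satisfies everyone within the top two positions.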
Smoothing Clickthrough Data for Web Search Ranking
Cited by 14 (6 self)
Incorporating features extracted from clickthrough data (called clickthrough features) has been demonstrated to significantly improve the performance of ranking models for Web search applications. Such benefits, however, are severely limited by the data sparseness problem, i.e., many queries and documents have no or very few clicks. The ranker thus cannot rely strongly on clickthrough features for document ranking. This paper presents two smoothing methods to expand clickthrough data: query clustering via Random Walk on click graphs and a discounting method inspired by the Good-Turing estimator. Both methods are evaluated on real-world data in three Web search domains. Experimental results show that the ranking models trained on smoothed clickthrough features consistently outperform those trained on unsmoothed features. This study demonstrates both the importance and the benefits of dealing with the sparseness problem in clickthrough data.
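For readers unfamiliar with the Good-Turing estimator that inspired the discounting method, the following is a sketch of the textbook adjusted count r* = (r + 1) N_{r+1} / N_r, where N_r is the number of items observed exactly r times. It is not the paper's discounting method. The fallback to the raw count when N_{r+1} is zero is a simplifying assumption, and the names and data are hypothetical.

```python
from collections import Counter


def good_turing_adjusted_counts(counts):
    """Return (adjusted counts, probability mass reserved for unseen items).

    counts: dict mapping item -> observed count (e.g. clicks per query-document pair).
    """
    freq_of_freq = Counter(counts.values())  # N_r: how many items were seen r times
    total = sum(counts.values())
    adjusted = {}
    for item, r in counts.items():
        n_r = freq_of_freq[r]
        n_next = freq_of_freq.get(r + 1, 0)
        # Assumption: keep the raw count when N_{r+1} is zero (no smoothing of N_r).
        adjusted[item] = (r + 1) * n_next / n_r if n_next else float(r)
    unseen_mass = freq_of_freq.get(1, 0) / total  # N_1 / N
    return adjusted, unseen_mass


toy_counts = {"d1": 1, "d2": 1, "d3": 1, "d4": 2, "d5": 2, "d6": 3}
adjusted, unseen_mass = good_turing_adjusted_counts(toy_counts)
```

The point of the discounting is that a little mass is taken away from observed pairs and reserved for pairs with no clicks, which is where the sparseness problem bites.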
Click-Through Prediction for News Queries
Cited by 10 (1 self)
A growing trend in commercial search engines is the display of specialized content such as news, products, etc. interleaved with web search results. Ideally, this content should be displayed only when it is highly relevant to the search query, as it competes for space with “regular” results and advertisements. One measure of relevance to the search query is the click-through rate the specialized content achieves when displayed; hence, if we can predict this click-through rate accurately, we can use it as the basis for deciding when to show specialized content. In this paper, we consider the problem of estimating the click-through rate for dedicated news search results. For queries for which news results have been displayed repeatedly before, the click-through rate can be tracked online; however, the key challenge of deciding for which previously unseen queries to display news results remains. We propose a supervised model that offers accurate prediction of news click-through rates and satisfies the requirement of adapting quickly to emerging news events.
Preference Handling – An Introductory Tutorial
Cited by 8 (0 self)
We present a tutorial introduction to the area of preference handling – one of the core issues in the design of any system that automates or supports decision making. The main goal of this tutorial is to provide a framework, or perspective, within which current work on preference handling – representation, reasoning, and elicitation – can be understood. Our intention is not to provide a technical description of the diverse methods used, but rather, to provide a general perspective on the problem and its varied solutions and to highlight central ideas and techniques.
Efficient multiple-click models in web search
- In WSDM ’09: Proceedings of the Second International Conference on Web Search and Data Mining, 2009
Cited by 8 (4 self)
Many tasks that leverage web search users’ implicit feedback rely on a proper and unbiased interpretation of user clicks. Previous eye-tracking experiments and studies on explaining position bias in user clicks provide a spectrum of hypotheses and models on how an average user examines and possibly clicks web documents returned by a search engine with respect to the submitted query. In this paper, we attempt to close the gap between previous work, which studied how to model a single click, and the reality that multiple clicks on web documents in a single result page are not uncommon. Specifically, we present two multiple-click models: the independent click model (ICM), which is reformulated from previous work, and the dependent click model (DCM), which takes into consideration dependencies between multiple clicks. Both models can be learned efficiently, with linear time and space complexity. More importantly, they can be incrementally updated as new click logs flow in. These properties are highly desirable in practice. We systematically evaluate the two models on click logs obtained in July 2008 from a major commercial search engine. The data set, after preprocessing, contains over 110 thousand distinct queries and 8.8 million query sessions. Extensive experimental studies demonstrate the gain of modeling multiple clicks and their dependencies. Finally, we note that since our experimental setup does not rely on tweaking search result rankings, it can be easily adopted by future studies.
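One way to picture a dependency between multiple clicks is a cascade-style sketch like the one below. It is an illustration under stated assumptions, not necessarily the paper's exact DCM formulation: the user scans results top-down, always moves on after skipping a result, and after a click continues with a position-dependent probability. The function and parameter names are hypothetical.

```python
def click_probabilities(relevance, continue_after_click):
    """Per-position click probabilities under a simple cascade-style assumption.

    relevance[i]: probability that an examined result at position i is clicked.
    continue_after_click[i]: probability of examining position i + 1 after a click at i.
    """
    examine = 1.0  # the first result is always examined (assumption)
    probs = []
    for r, lam in zip(relevance, continue_after_click):
        probs.append(examine * r)
        # Reach the next position after a skip (always) or after a click (with prob. lam).
        examine *= (1.0 - r) + r * lam
    return probs


probs = click_probabilities([0.5, 0.5], [0.4, 0.4])
```

Because a click at one position lowers the chance that later positions are examined, identical results receive fewer clicks further down the page.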
Entropy-biased Models for Query Representation On The Click Graph
- 2009
Cited by 7 (0 self)
Query log analysis has received substantial attention in recent years, and the click graph is an important technique for describing the relationship between queries and URLs. State-of-the-art approaches that model the click graph from raw click frequencies, however, do not eliminate noise, nor do they handle heterogeneous query-URL pairs well. In this paper, we investigate and develop a novel entropy-biased framework for modeling click graphs. The intuition behind this model is that various query-URL pairs should be treated differently, i.e., common clicks on less frequent but more specific URLs are of greater value than common clicks on frequent and general URLs. Based on this intuition, we utilize the entropy information of the URLs and introduce a new concept, namely the inverse query frequency (IQF), to weigh the importance (discriminative ability) of a click on a certain URL. The IQF weighting scheme has never been explicitly explored or statistically examined for any bipartite graph in the information retrieval literature. We not only formally define and quantify this scheme, but also combine it with the click frequency and user frequency information on the click graph for an effective query representation. To illustrate our methodology, we conduct experiments with the AOL query log data for query similarity analysis and query suggestion tasks. Experimental results demonstrate that considerable improvements in performance are obtained with our entropy-biased models.
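The intuition behind IQF can be sketched by analogy with inverse document frequency. The formula used here, log(|Q| / n(u)) with n(u) the number of distinct queries that clicked URL u, is an assumption modeled on IDF and may differ from the paper's exact definition. The function names and the toy data are hypothetical.

```python
import math
from collections import defaultdict


def inverse_query_frequency(clicks):
    """IQF per URL, assumed here to be log(|Q| / number of distinct queries clicking it)."""
    queries = {query for query, _ in clicks}
    queries_per_url = defaultdict(set)
    for query, url in clicks:
        queries_per_url[url].add(query)
    return {url: math.log(len(queries) / len(qs)) for url, qs in queries_per_url.items()}


def weighted_query_vectors(clicks):
    """Represent each query as {url: click count * IQF(url)}."""
    iqf = inverse_query_frequency(clicks)
    vectors = defaultdict(dict)
    for (query, url), count in clicks.items():
        vectors[query][url] = count * iqf[url]
    return dict(vectors)


# Toy click log: every query clicks general.com, only q1 clicks specific.org.
toy_clicks = {
    ("q1", "general.com"): 10,
    ("q2", "general.com"): 8,
    ("q3", "general.com"): 5,
    ("q1", "specific.org"): 2,
}
iqf = inverse_query_frequency(toy_clicks)
vectors = weighted_query_vectors(toy_clicks)
```

Under this weighting, the ten clicks on the URL that every query shares contribute nothing to q1's representation, while the two clicks on the specific URL carry all of the weight.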
Global Ranking by Exploiting User Clicks
Cited by 5 (3 self)
It is now widely recognized that user interactions with search results can provide substantial relevance information on the documents displayed in the search results. In this paper, we focus on extracting relevance information from one source of user interactions, i.e., user click data, which records the sequence of documents clicked and not clicked in the result set during a user search session. We formulate the problem as a global ranking problem, emphasizing the importance of the sequential nature of user clicks, with the goal of predicting the relevance labels of all the documents in a search session. This is distinct from conventional learning-to-rank methods, which usually design a ranking model defined on a single document; in contrast, our model exploits the relational information among the documents, as manifested by an aggregation of user clicks, to rank all the documents jointly. In particular, we adapt several sequential supervised learning algorithms, including the conditional random field (CRF), the sliding window method and the recurrent sliding window method, to the global ranking problem. Experiments on click data collected from a commercial search engine demonstrate that our methods can outperform the baseline models for search results re-ranking.
Using Clicks as Implicit Judgments: Expectations Versus Observations
Cited by 4 (0 self)
Clickthrough data has become increasingly popular as an implicit indicator of user feedback. Previous analysis has suggested that user click behaviour is subject to a quality bias: users click at different rank positions when viewing effective search results than when viewing less effective search results. Based on this observation, it should be possible to use click data to infer the quality of the underlying search system. In this paper we carry out a user study to systematically investigate how click behaviour changes for different levels of search system effectiveness as measured by information retrieval performance metrics. Our results show that click behaviour does not vary systematically with the quality of search results. However, click behaviour does vary significantly between individual users, and between search topics. This suggests that using direct click behaviour (click rank and click frequency) to infer the quality of the underlying search system is problematic. Further analysis of our user click data indicates that the correspondence between clicks in a search result list and subsequent confirmation that the clicked resource is actually relevant is low. Using clicks as an implicit indication of relevance should therefore be done with caution.
Bypass rates: reducing query abandonment using negative inferences
- In Proc. of the ACM SIGKDD Int’l Conf. on Knowledge Discovery and Data Mining, 2008
Cited by 3 (2 self)
We introduce a new approach to analyzing click logs by examining both the documents that are clicked and those that are bypassed—documents returned higher in the ordering of the search results but skipped by the user. This approach complements the popular click-through rate analysis, and helps to draw negative inferences in the click logs. We formulate a natural objective that finds sets of results that are unlikely to be collectively bypassed by a typical user. This is closely related to the problem of reducing query abandonment. We analyze a greedy approach to optimizing this objective, and establish theoretical guarantees of its performance. We evaluate our approach on a large set of queries, and demonstrate that it compares favorably to the maximal marginal relevance approach on a number of metrics including mean average precision and mean reciprocal rank.
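The two evaluation metrics named at the end of this abstract have standard definitions, sketched below for a single query; averaging each over a set of queries gives mean reciprocal rank and mean average precision. The function names and the toy ranking are hypothetical, and this does not reproduce the paper's evaluation setup.

```python
def reciprocal_rank(ranking, relevant):
    """1 / position of the first relevant document, or 0.0 if none is retrieved."""
    for position, doc in enumerate(ranking, start=1):
        if doc in relevant:
            return 1.0 / position
    return 0.0


def average_precision(ranking, relevant):
    """Mean of precision@position taken at each relevant document's position."""
    hits = 0
    precision_sum = 0.0
    for position, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            precision_sum += hits / position
    return precision_sum / len(relevant) if relevant else 0.0


def mean_over_queries(metric, runs):
    """Average a per-query metric over (ranking, relevant set) pairs."""
    return sum(metric(ranking, relevant) for ranking, relevant in runs) / len(runs)


toy_runs = [(["a", "b", "c", "d"], {"b", "d"}), (["x", "y"], {"x"})]
mrr = mean_over_queries(reciprocal_rank, toy_runs)
mean_ap = mean_over_queries(average_precision, toy_runs)
```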

