Beyond independent relevance: methods and evaluation metrics for subtopic retrieval (2003)

by C Zhai, W W Cohen, J Lafferty
Venue:In SIGIR ’03
Results 11 - 20 of 77

Turning down the noise in the blogosphere

by Khalid El-Arini, Gaurav Veda, Dafna Shahaf, Carlos Guestrin - In KDD, 2009
Cited by 13 (5 self)
In recent years, the blogosphere has experienced a substantial increase in the number of posts published daily, forcing users to cope with information overload. The task of guiding users through this flood of information has thus become critical. To address this issue, we present a principled approach for picking a set of posts that best covers the important stories in the blogosphere. We define a simple and elegant notion of coverage and formalize it as a submodular optimization problem, for which we can efficiently compute a near-optimal solution. In addition, since people have varied interests, the ideal coverage algorithm should incorporate user preferences in order to tailor the selected posts to individual tastes. We define the problem of learning a personalized coverage function by providing an appropriate user-interaction model and formalizing an online learning framework for this task. We then provide a no-regret algorithm which can quickly learn a user’s preferences from limited feedback. We evaluate our coverage and personalization algorithms extensively over real blog data. Results from a user study show that our simple coverage algorithm does as well as most popular blog aggregation sites, including Google Blog Search, Yahoo! Buzz, and Digg. Furthermore, we demonstrate empirically that our algorithm can successfully adapt to user preferences. We believe that our technique, especially with personalization, can dramatically reduce information overload.
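The abstract above frames post selection as a submodular coverage problem with an efficiently computable near-optimal solution, but it does not name the algorithm or define the coverage function. As a rough sketch only, the standard greedy heuristic for monotone submodular maximization under a size budget could look like the following. The function name `greedy_cover`, the plain set-cover objective, and the toy posts are assumptions made for illustration, not the paper's method.

```python
def greedy_cover(posts, budget):
    """Greedily pick up to `budget` posts that cover the most features.

    posts: dict mapping a post id to the set of features (e.g. story
    topics) it covers. Plain set coverage is an assumption here; the
    paper defines its own coverage function.
    """
    selected = []
    covered = set()
    remaining = dict(posts)
    for _ in range(budget):
        best_id, best_gain = None, 0
        # Iterate in sorted order so ties are broken deterministically.
        for post_id in sorted(remaining):
            gain = len(remaining[post_id] - covered)  # marginal coverage
            if gain > best_gain:
                best_id, best_gain = post_id, gain
        if best_id is None:  # no remaining post adds new coverage
            break
        selected.append(best_id)
        covered |= remaining.pop(best_id)
    return selected, covered


# Hypothetical toy data: four posts and the topics each one covers.
posts = {
    "a": {"election", "economy"},
    "b": {"economy"},
    "c": {"sports", "weather", "election"},
    "d": {"weather"},
}
selected, covered = greedy_cover(posts, budget=2)
print(selected, sorted(covered))
```

Each round adds the post with the largest marginal gain, which is the step that gives the greedy method its well-known (1 - 1/e) approximation guarantee for monotone submodular objectives under a cardinality constraint.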

Learning to rank relational objects and its application to web search

by Tao Qin, Tie-Yan Liu, De-Sheng Wang, Wen-Ying Xiong, Xu-Dong Zhang, Hang Li - In WWW ’08, 2008
Cited by 12 (5 self)
Learning to rank is a new statistical learning technology for creating a ranking model for sorting objects. The technology has been successfully applied to web search, and is becoming one of the key machineries for building search engines. Existing approaches to learning to rank, however, did not consider the cases in which relationships exist between the objects to be ranked, despite the fact that such situations are very common in practice. For example, in web search, given a query, certain relationships usually exist among the retrieved documents, e.g., URL hierarchy, similarity, etc., and sometimes it is necessary to utilize this information in ranking the documents. This paper addresses the issue and formulates it as a novel learning problem, referred to as ‘learning to rank relational objects’. In the new learning

LETOR: A Benchmark Collection for Research on Learning to Rank for Information Retrieval

by Tao Qin, Tie-Yan Liu, Jun Xu, Hang Li
Cited by 11 (1 self)
LETOR is a benchmark collection for research on learning to rank for information retrieval, released by Microsoft Research Asia. In this paper, we describe the details of the LETOR collection and show how it can be used in different kinds of research. Specifically, we describe how the document corpora and query sets in LETOR are selected, how the documents are sampled, how the learning features and meta information are extracted, and how the datasets are partitioned for comprehensive evaluation. We then compare several state-of-the-art learning to rank algorithms on LETOR, report their ranking performance, and discuss the results. After that, we discuss possible new research topics that can be supported by LETOR, in addition to algorithm comparison. We hope that this paper can help people gain a deeper understanding of LETOR, and enable more interesting research projects on learning to rank and related topics.

Ambiguous requests: implications for retrieval tests, systems and theories

by Karen Spärck-Jones, Stephen E. Robertson, Mark S
Cited by 10 (1 self)
In early 2006, as a result of a series of conversations between Steve Robertson, Mark Sanderson and Karen Spärck-Jones, Karen circulated a note summing up our discussions, which were on the topic of ambiguous requests. At the core of our discussion was the question: is too much information retrieval research focussed on search tasks where the query unambiguously defines the user’s need? Karen took great interest in this topic and examined it from many angles. There was input from the two of us, but as can be seen from the writing style, the text is principally and delightfully Karen’s.

Utility-based information distillation over temporally sequenced documents

by Yiming Yang, Abhay Harpale, Abhimanyu Lad, Bryan Kisiel, Ni Lao, Monica Rogati - Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval, 2007
Cited by 8 (4 self)
This paper examines a new approach to information distillation over temporally ordered documents, and proposes a novel evaluation scheme for such a framework. It combines the strengths of and extends beyond conventional adaptive filtering, novelty detection and non-redundant passage ranking with respect to long-lasting information needs (‘tasks’ with multiple queries). Our approach supports fine-grained user feedback via highlighting of arbitrary spans of text, and leverages such information for utility optimization in adaptive settings. For our experiments, we defined hypothetical tasks based on news events in the TDT4 corpus, with multiple queries per task. Answer keys (nuggets) were generated for each query and a semiautomatic procedure was used for acquiring rules that allow automatically matching nuggets against system responses. We also propose an extension of the NDCG metric for assessing the utility of ranked passages as a combination of relevance and novelty. Our results show encouraging utility enhancements using the new approach, compared to the baseline systems without incremental learning or the novelty detection components.

Redundancy, Diversity and Interdependent Document Relevance

by Filip Radlinski, Paul N. Bennett, Ben Carterette, Thorsten Joachims, 2009
Cited by 7 (2 self)
Abstract not found

Explicit search result diversification through sub-queries

by Rodrygo L. T. Santos, Jie Peng, Craig Macdonald, Iadh Ounis - In Proc. of ECIR, 2010
Cited by 7 (5 self)
Queries submitted to a retrieval system are often ambiguous. In such a situation, a sensible strategy is to diversify the ranking of results to be retrieved, in the hope that users will find at least one of these results to be relevant to their information need. In this paper, we introduce xQuAD, a novel framework for search result diversification that builds such a diversified ranking by explicitly accounting for the relationship between documents retrieved for the original query and the possible aspects underlying this query, in the form of sub-queries. We evaluate the effectiveness of xQuAD using a standard TREC collection. The results show that our framework markedly outperforms state-of-the-art diversification approaches under a simulated best-case scenario. Moreover, we show that its effectiveness can be further improved by estimating the relative importance of each identified sub-query. Finally, we show that our framework can still outperform the simulated best-case scenario of the state-of-the-art diversification approaches using sub-queries automatically derived from the baseline document ranking itself.
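The abstract describes building a diversified ranking from the relationship between retrieved documents and sub-queries, weighted by sub-query importance, but it gives no scoring formula. The sketch below uses the greedy mixture commonly associated with xQuAD, (1 - λ)·P(d|q) + λ·Σ_i P(q_i|q)·P(d|q_i)·Π_{d_j in S}(1 - P(d_j|q_i)), and should be checked against the paper before being relied on. The function name `xquad_rerank`, the parameter names, and all probabilities in the toy data are hypothetical.

```python
def xquad_rerank(rel, sub_weight, sub_rel, k, lam=0.5):
    """Greedy sub-query-aware re-ranking in the spirit of xQuAD.

    rel:        dict doc -> estimated relevance to the original query
    sub_weight: dict sub-query -> estimated importance of that sub-query
    sub_rel:    dict sub-query -> dict doc -> relevance to the sub-query
    lam:        trade-off between relevance (0.0) and diversity (1.0)
    """
    selected = []
    # Probability that each sub-query is still unsatisfied by `selected`.
    not_covered = {s: 1.0 for s in sub_weight}
    candidates = set(rel)

    def score(doc):
        diversity = sum(
            sub_weight[s] * sub_rel[s].get(doc, 0.0) * not_covered[s]
            for s in sub_weight
        )
        return (1.0 - lam) * rel[doc] + lam * diversity

    while candidates and len(selected) < k:
        # Sorted iteration keeps tie-breaking deterministic.
        best = max(sorted(candidates), key=score)
        selected.append(best)
        candidates.remove(best)
        for s in sub_weight:
            not_covered[s] *= 1.0 - sub_rel[s].get(best, 0.0)
    return selected


# Hypothetical ambiguous query with two equally likely sub-queries.
rel = {"d1": 0.9, "d2": 0.8, "d3": 0.3}
sub_weight = {"jaguar car": 0.5, "jaguar cat": 0.5}
sub_rel = {
    "jaguar car": {"d1": 0.9, "d2": 0.9},
    "jaguar cat": {"d3": 0.9},
}
print(xquad_rerank(rel, sub_weight, sub_rel, k=2, lam=0.8))
print(xquad_rerank(rel, sub_weight, sub_rel, k=2, lam=0.0))
```

With a high λ the second slot goes to the document covering the still-unsatisfied sub-query, while λ = 0 reduces to ranking by relevance alone.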

Evaluating diversified search results using per-intent graded relevance

by Tetsuya Sakai, Ruihua Song - In Proceedings of ACM SIGIR 2011, 2011
Cited by 6 (4 self)
Search queries are often ambiguous and/or underspecified. To accommodate different user needs, search result diversification has received attention in the past few years. Accordingly, several new metrics for evaluating diversification have been proposed, but their properties are little understood. We compare the properties of existing metrics given the premises that (1) queries may have multiple intents; (2) the likelihood of each intent given a query is available; and (3) graded relevance assessments are available for each intent. We compare a wide range of traditional and diversified IR metrics after adding graded relevance assessments to the TREC 2009 Web track diversity task test collection which originally had binary relevance assessments. Our primary criterion is discriminative power, which represents the reliability of a metric in an experiment. Our results show that diversified IR experiments with a given number of topics can be as reliable as traditional IR experiments with the same number of topics, provided that the right metrics are used. Moreover, we compare the intuitiveness of diversified IR metrics by closely examining the actual ranked lists from TREC. We show that a family of metrics called D♯-measures have several advantages over other metrics such as α-nDCG and Intent-Aware metrics.

Mobile Information Retrieval with Search Results Clustering: Prototypes and Evaluations

by Claudio Carpineto, Stefano Mizzaro, Giovanni Romano, Matteo Snidero - Journal of American Society for Information Science and Technology (JASIST), 2009
Cited by 6 (3 self)
Web searches from mobile devices such as PDAs and cell phones are becoming increasingly popular. However, the traditional list-based search interface paradigm does not scale well to mobile devices due to their inherent limitations. In this article, we investigate the application of search results clustering, used with some success for desktop computer searches, to the mobile scenario. Building on CREDO (Conceptual Reorganization of Documents), a Web clustering engine based on concept lattices, we present its mobile versions Credino and SmartCREDO, for PDAs and cell phones, respectively. Next, we evaluate the retrieval performance of the three prototype systems. We measure the effectiveness of their clustered results compared to a ranked list of results on a subtopic retrieval task, by means of the device-independent notion of subtopic reach time together with a reusable test collection built from Wikipedia ambiguous entries. Then, we make a cross-comparison of methods (i.e., clustering and ranked list) and devices (i.e., desktop, PDA, and cell phone), using an interactive information-finding task performed by external participants. The main finding is that clustering engines are a viable complementary approach to plain search engines both for desktop and mobile searches especially, but not only, for multitopic informational queries.

Diversifying Web Search Results

by Davood Rafiei, Krishna Bharat, Anand Shukla
Cited by 6 (0 self)
Result diversity is a topic of great importance as more facets of queries are discovered and users expect to find their desired facets in the first page of the results. However, the underlying questions of how ‘diversity’ interplays with ‘quality’ and when preference should be given to one or both are not well understood. In this work, we model the problem as expectation maximization and study the challenges of estimating the model parameters and reaching an equilibrium. One model parameter, for example, is correlations between pages which we estimate using textual contents of pages and click data (when available). We conduct experiments on diversifying randomly selected queries from a query log and the queries chosen from the disambiguation topics of Wikipedia. Our algorithm improves upon Google in terms of the diversity of random queries, retrieving 14% to 38% more aspects of queries in top 5, while maintaining a precision very close to Google. On a more selective set of queries that are expected to benefit from diversification, our algorithm improves upon Google in terms of precision and diversity of the results, and significantly outperforms another baseline system for result diversification.