Letor: Benchmark dataset for research on learning to rank for information retrieval
- In Proceedings of SIGIR 2007 Workshop on Learning to Rank for Information Retrieval, 2007
- Cited by 73 (11 self)
This paper is concerned with learning to rank for information retrieval (IR). Ranking is the central problem for information retrieval, and employing machine learning techniques to learn the ranking function is viewed as a promising approach to IR. Unfortunately, there was no benchmark dataset that could be used in comparison of existing learning algorithms and in evaluation of newly proposed algorithms, which stood in the way of the related research. To deal with the problem, we have constructed a benchmark dataset referred to as LETOR and distributed it to the research communities. Specifically, we have derived the LETOR data from the existing data sets widely used in IR, namely, OHSUMED and TREC data. The two collections contain queries, the contents of the retrieved documents, and human judgments on the relevance of the documents with respect to the queries. We have extracted features from the datasets, including both conventional features, such as term frequency, inverse document frequency, BM25, and language models for IR, and features proposed recently at SIGIR, such as HostRank, feature propagation, and topical PageRank. We have then packaged LETOR with the extracted features, queries, and relevance judgments. We have also provided the results of several state-of-the-art learning to rank algorithms on the data. This paper describes LETOR in detail.
Ranking with multiple hyperplanes
- Proceedings of the 30th Annual International ACM SIGIR Conference, 2007
- Cited by 15 (7 self)
The central problem for many applications in Information Retrieval is ranking, and learning to rank is considered a promising approach for addressing the issue. Ranking SVM, for example, is a state-of-the-art method for learning to rank and has been empirically demonstrated to be effective. In this paper, we study the issue of learning to rank, particularly the approach of using SVM techniques to perform the task. We point out that although Ranking SVM is advantageous, it still has shortcomings. Ranking SVM employs a single hyperplane in the feature space as the model for ranking, which is too simple to tackle complex ranking problems. Furthermore, the training of Ranking SVM is also computationally costly. In this paper, we look at an alternative approach to Ranking SVM, which we call “Multiple Hyperplane Ranker” (MHR), and make comparisons between the two approaches. MHR takes a divide-and-conquer strategy. It employs multiple hyperplanes to rank instances and finally aggregates the ranking results given by the hyperplanes. MHR contains Ranking SVM as a special case, and MHR can overcome the shortcomings which Ranking SVM suffers from. Experimental results on two information retrieval datasets show that MHR can outperform Ranking SVM in ranking.
Query-level loss functions for information retrieval
- Information Processing and Management, 2008
- Cited by 15 (9 self)
Many machine learning technologies such as support vector machines, boosting, and neural networks have been applied to the ranking problem in information retrieval. However, since the methods were not originally developed for this task, their loss functions do not directly link to the criteria used in the evaluation of ranking. Specifically, the loss functions are defined on the level of documents or document pairs, in contrast to the fact that the evaluation criteria are defined on the level of queries. Therefore, minimizing the loss functions does not necessarily imply enhancing ranking performance. To solve this problem, we propose using query-level loss functions in learning of ranking functions. We discuss the basic properties that a query-level loss function should have and propose a query-level loss function based on the cosine similarity between a ranking list and the corresponding ground truth. We further design a coordinate descent algorithm, referred to as RankCosine, which utilizes the proposed loss function to create a generalized additive ranking model. We also discuss whether the loss functions of existing ranking algorithms can be extended to query-level. Experimental results on the datasets of TREC web track, OHSUMED, and a commercial web search engine show that with the use of the proposed query-level loss function we can significantly improve ranking accuracies. Furthermore, we found that it is difficult to extend the document-level loss functions to query-level loss functions.
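To make the idea of a cosine-based query-level loss concrete, here is a minimal Python sketch. It assumes the loss takes the form 0.5 * (1 - cosine similarity) between the vector of predicted scores for one query and the vector of ground-truth relevance labels; the abstract does not spell out the exact form, so the function name `cosine_query_loss`, the 0.5 scaling, and the handling of zero vectors are illustrative assumptions rather than the paper's definition.

```python
import math


def cosine_query_loss(scores, labels):
    """Loss for ONE query: 0.5 * (1 - cos(scores, labels)).

    scores: predicted ranking scores for the query's documents.
    labels: ground-truth relevance values, in the same document order.
    The result lies in [0, 1] regardless of how many documents the
    query has, so every query contributes on the same scale.
    """
    dot = sum(s * g for s, g in zip(scores, labels))
    norm_s = math.sqrt(sum(s * s for s in scores))
    norm_g = math.sqrt(sum(g * g for g in labels))
    if norm_s == 0.0 or norm_g == 0.0:
        # Assumption: an undefined cosine is treated as 0 (loss 0.5).
        return 0.5
    return 0.5 * (1.0 - dot / (norm_s * norm_g))
```

Because the loss is computed per query and is bounded, a query with many retrieved documents does not dominate training the way it can when losses are summed over documents or document pairs.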
LETOR: A Benchmark Collection for Research on Learning to Rank for Information Retrieval
- Cited by 11 (1 self)
LETOR is a benchmark collection for research on learning to rank for information retrieval, released by Microsoft Research Asia. In this paper, we describe the details of the LETOR collection and show how it can be used in different kinds of research. Specifically, we describe how the document corpora and query sets in LETOR are selected, how the documents are sampled, how the learning features and meta information are extracted, and how the datasets are partitioned for comprehensive evaluation. We then compare several state-of-the-art learning to rank algorithms on LETOR, report their ranking performance, and discuss the results. After that, we discuss possible new research topics that can be supported by LETOR, in addition to algorithm comparison. We hope that this paper can help people gain a deeper understanding of LETOR, and enable more interesting research projects on learning to rank and related topics.
Learning random walks to rank nodes in graphs
- In ICML’07, 2007
- Cited by 10 (1 self)
Ranking nodes in graphs is of much recent interest. Edges, via the graph Laplacian, are used to encourage local smoothness of node scores in SVM-like formulations with generalization guarantees. In contrast, Pagerank variants are based on Markovian random walks. For directed graphs, there is no simple known correspondence between these views of scoring/ranking. Recent scalable algorithms for learning the Pagerank transition probabilities do not have generalization guarantees. In this paper we show some correspondence results between the Laplacian and the Pagerank approaches, and give new generalization guarantees for the latter. We enhance the Pagerank-learning approaches to use an additive margin. We also propose a general framework for rank-sensitive score-learning, and apply it to Laplacian smoothing. Experimental results are promising.
Learning to Rank for Information Retrieval Using Genetic Programming
- Cited by 8 (0 self)
One central problem of information retrieval (IR) is to determine which documents are relevant to the user's information need and which are not. In practice, this problem is handled by a ranking function which defines an ordering among documents according to their degree of relevance to the user query. This paper discusses work on using machine learning to automatically generate an effective ranking function for IR. This task is referred to as “learning to rank for IR” in the field. In this paper, a learning method, RankGP, is presented to address this task. RankGP employs genetic programming to learn a ranking function by combining various types of evidence in IR, including content features, structure features, and query-independent features. The proposed method is evaluated using the LETOR benchmark datasets and found to be competitive with Ranking SVM and RankBoost.
Linear feature-based models for information retrieval
- Information Retrieval, 2007
- Cited by 6 (2 self)
There have been a number of linear, feature-based models proposed by the information retrieval community recently. Although each model is presented differently, they all share a common underlying framework. In this paper we explore and discuss the theoretical issues of this framework, including a novel look at the parameter space. We then detail supervised training algorithms that directly maximize the evaluation metric under consideration, such as mean average precision. We present results that show training models in this way can lead to significantly better test set performance compared to other training methods that do not directly maximize the metric. Finally, we show that linear feature-based models can consistently and significantly outperform current state of the art retrieval models with the correct choice of features.
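For readers unfamiliar with the metric named above, the following Python sketch computes average precision for one ranked list and mean average precision over several queries. It is an illustration of the evaluation metric only, not of the paper's training algorithms. The function names are hypothetical, and the sketch assumes every relevant document for a query appears somewhere in its ranked list, so the number of hits equals the number of relevant documents.

```python
def average_precision(ranked_relevance):
    """Average precision of one ranked list.

    ranked_relevance: 0/1 relevance flags in ranked order, best first.
    Assumption: all relevant documents are present in the list.
    """
    hits = 0
    precision_sum = 0.0
    for rank, rel in enumerate(ranked_relevance, start=1):
        if rel:
            hits += 1
            # Precision at this rank, counted only at relevant positions.
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(queries):
    """Mean of average precision over a non-empty list of ranked lists."""
    return sum(average_precision(q) for q in queries) / len(queries)
```

The metric depends on the model parameters only through the induced ordering, so it is piecewise constant in those parameters; this is why directly maximizing it calls for search-style optimizers rather than plain gradient descent.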
A Cascade Ranking Model for Efficient Ranked Retrieval
- Cited by 5 (3 self)
There is a fundamental tradeoff between effectiveness and efficiency when designing retrieval models for large-scale document collections. Effectiveness tends to derive from sophisticated ranking functions, such as those constructed using learning to rank, while efficiency gains tend to arise from improvements in query evaluation and caching strategies. Given their inherently disjoint nature, it is difficult to jointly optimize effectiveness and efficiency in end-to-end systems. To address this problem, we formulate and develop a novel cascade ranking model, which, unlike previous approaches, can simultaneously improve both top k ranked effectiveness and retrieval efficiency. The model constructs a cascade of increasingly complex ranking functions that progressively prunes and refines the set of candidate documents to minimize retrieval latency and maximize result set quality. We present a novel boosting algorithm for learning such cascades to directly optimize the tradeoff between effectiveness and efficiency. Experimental results show that our cascades are faster and return higher quality results than comparable ranking models.
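The prune-and-refine structure described above can be sketched in a few lines of Python. This is only an illustration of the general cascade idea: the function name `cascade_rank`, the representation of stages as (scoring function, cutoff) pairs, and the fixed cutoffs are all assumptions made here. The paper learns its cascades with a boosting algorithm, which this sketch does not attempt to reproduce.

```python
def cascade_rank(candidates, stages):
    """Run candidates through a cascade of ranking stages.

    candidates: iterable of documents (any objects).
    stages: list of (score_fn, keep) pairs ordered from cheapest to
        most expensive; after each stage only the `keep` best-scoring
        documents survive, so later, costlier scorers see fewer of them.
    Returns the surviving documents, best first under the last stage.
    """
    survivors = list(candidates)
    for score_fn, keep in stages:
        # Rank the current survivors with this stage's scorer...
        survivors.sort(key=score_fn, reverse=True)
        # ...and prune everything below the cutoff.
        survivors = survivors[:keep]
    return survivors
```

The efficiency gain comes from the expensive scoring functions being evaluated only on the documents that survive the cheap early stages.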
A Meta-Learning Approach for Robust Rank Learning
- Cited by 3 (3 self)
Learning effective feature-based ranking functions is a fundamental task for search engines, and has recently become an active area of research [10, 3, 2]. Many of these recent algorithms are based on the pairwise preference framework, in which instead of taking documents in isolation, document pairs are used as instances in the learning process. One disadvantage of this process is that a noisy relevance judgment on a single document can lead to a large number of mis-labeled document pairs. This can jeopardize robustness and deteriorate overall ranking performance. In this paper we study the effects of outlying pairs in rank learning with pairwise preferences and introduce a new meta-learning algorithm capable of suppressing these undesirable effects. This algorithm works as a second optimization step in which any linear baseline ranker can be used as input. Experiments on eight different ranking datasets show that this optimization step produces statistically significant performance gains over various state-of-the-art baseline rankers.
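The claim that one noisy judgment can corrupt many training pairs is easy to check with a small Python sketch. The helper names below are hypothetical and the code only illustrates the pairwise preference framework in general; it is not the meta-learning algorithm the paper proposes.

```python
from itertools import combinations


def preference_pairs(labels):
    """Build (preferred, other) index pairs for one query's documents.

    labels: relevance grades, one per document. A pair is produced for
    every two documents whose grades differ; ties yield no pair.
    """
    pairs = []
    for i, j in combinations(range(len(labels)), 2):
        if labels[i] > labels[j]:
            pairs.append((i, j))
        elif labels[j] > labels[i]:
            pairs.append((j, i))
    return pairs


def pairs_affected_by_flip(labels, doc, noisy_label):
    """Count training pairs that change when one document is mislabeled."""
    clean = set(preference_pairs(labels))
    noisy_labels = list(labels)
    noisy_labels[doc] = noisy_label
    noisy = set(preference_pairs(noisy_labels))
    # Symmetric difference: pairs lost plus pairs wrongly introduced.
    return len(clean ^ noisy)
```

With one relevant document among n, mislabeling that single document changes all n - 1 pairs for the query, which is the amplification effect the abstract describes.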
Direct Optimization of Ranking Measures
- Cited by 3 (0 self)
Web page ranking requires the optimization of sophisticated performance measures. Current approaches only minimize measures indirectly related to performance scores. We present a new approach which allows optimization of an upper bound of the appropriate loss function. This is achieved via structured estimation, where in our case the input corresponds to a set of documents and the output is a ranking. Training is efficient since computing the loss function can be done via a linear assignment problem. At test time, a sorting operation suffices, as our algorithm assigns a relevance score to every (document, query) pair. Moreover, we provide a general method for finding tighter nonconvex relaxations of structured loss functions. Experiments show that our algorithm yields improved accuracies on several public and commercial ranking datasets.

