## Listwise approach to learning to rank - theory and algorithm (2008)

Venue: Proceedings of the 25th International Conference on Machine Learning

Citations: 52 (13 self)

### BibTeX

```bibtex
@inproceedings{Liu08listwiseapproach,
  author    = {Fen Xia and Tie-Yan Liu and Jue Wang and Wensheng Zhang and Hang Li},
  title     = {Listwise Approach to Learning to Rank - Theory and Algorithm},
  booktitle = {Proceedings of the 25th International Conference on Machine Learning},
  year      = {2008},
  pages     = {1192--1199}
}
```

### Abstract

This paper aims to conduct a study on the listwise approach to learning to rank. The listwise approach learns a ranking function by taking individual lists as instances and minimizing a loss function defined on the predicted list and the ground-truth list. Existing work on the approach mainly focused on the development of new algorithms; methods such as RankCosine and ListNet have been proposed and good performances by them have been observed. Unfortunately, the underlying theory was not sufficiently studied so far. To amend the problem, this paper proposes conducting theoretical analysis of learning to rank algorithms through investigations on the properties of the loss functions, including consistency, soundness, continuity, differentiability, convexity, and efficiency. A sufficient condition on consistency for ranking is given, which seems to be the first such result obtained in related research. The paper then conducts analysis on three loss functions: likelihood loss, cosine loss, and cross entropy loss. The latter two were used in RankCosine and ListNet. The use of the likelihood loss leads to the development of a new listwise method for ranking, referred to as ListMLE, whose loss function offers better properties and also leads to better experimental results.
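To make the abstract's "likelihood loss" concrete: the loss used by ListMLE is the negative log-likelihood of the ground-truth permutation under a Plackett-Luce model over the predicted scores. A minimal sketch, assuming higher scores mean higher rank; the function and variable names are illustrative, not from the paper's code:

```python
import math

def listmle_loss(scores, truth_order):
    """Negative log-likelihood of the ground-truth permutation under the
    Plackett-Luce model induced by the predicted scores (higher = better)."""
    # Reorder scores so position i holds the item the ground truth ranks i-th.
    s = [scores[j] for j in truth_order]
    loss = 0.0
    for i in range(len(s)):
        # Each factor is a softmax over the not-yet-ranked items.
        denom = sum(math.exp(s[k]) for k in range(i, len(s)))
        loss += math.log(denom) - s[i]
    return loss

# Scores that agree with the ground truth give a lower loss than scores that don't.
good = listmle_loss([3.0, 2.0, 1.0], truth_order=[0, 1, 2])
bad  = listmle_loss([1.0, 2.0, 3.0], truth_order=[0, 1, 2])
```

Minimizing this loss in `scores` pushes the predicted ordering toward the ground-truth permutation, which is the intuition behind the consistency and soundness claims analyzed in the paper.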

### Citations

2686 | Modern Information Retrieval
- Baeza-Yates, Ribeiro-Neto
- 1999
Citation Context: ...ents were repeated 20 times with different initial values of parameters in the Neural Network model. Table 2 shows the means and standard deviations of the accuracies and Mean Average Precision (MAP) (Baeza-Yates & Ribeiro-Neto, 1999) of the three algorithms. The accuracy measures the proportion of correctly ranked instances and MAP is a commonly used measure in IR. As shown in the table, ListMLE achieves the best performance am...
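The MAP measure mentioned in this excerpt is the mean over queries of average precision. A small sketch for binary relevance labels (names are illustrative):

```python
def average_precision(relevance):
    """Average precision of one ranked list of binary labels (1 = relevant)."""
    hits, precision_sum = 0, 0.0
    for rank, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant document
    return precision_sum / hits if hits else 0.0

# MAP is the mean of the per-query average precisions.
queries = [[1, 0, 1, 0], [0, 1, 1]]
mean_ap = sum(average_precision(q) for q in queries) / len(queries)
```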

804 | The Elements of Statistical Learning: Data Mining, Inference and Prediction
- Hastie, Tibshirani, et al.
- 2001

Citation Context: ...t the confidence of prediction. For example, hinge loss, exponential loss, and logistic loss are sound for classification. In contrast, square loss is sound for regression but not for classification (Hastie et al., 2001). 3. Listwise Approach: We give a formal definition of the listwise approach to learning to rank. Let X be the input space... In a broad sense, methods directly optimizing evaluation measures, su...
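The soundness point in this excerpt can be checked numerically: for margin z = y·f(x), hinge, exponential, and logistic losses never penalize a larger margin, while square loss grows again for z > 1. A small sketch (names are illustrative):

```python
import math

# Margin-based surrogate losses for binary classification, z = y * f(x).
hinge    = lambda z: max(0.0, 1.0 - z)
exp_loss = lambda z: math.exp(-z)
logistic = lambda z: math.log(1.0 + math.exp(-z))
square   = lambda z: (1.0 - z) ** 2

# Square loss penalizes an over-confident *correct* prediction (z = 3)
# more than a barely correct one (z = 1) -- unsound for classification.
assert square(3.0) > square(1.0)
# The classification-sound losses are non-increasing in the margin.
assert hinge(3.0) <= hinge(1.0) and exp_loss(3.0) <= exp_loss(1.0)
```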

566 | An efficient boosting algorithm for combining preferences. The Journal of Machine Learning Research
- Freund, Iyer, et al.
- 2003
Citation Context: ...pati, 2004) transforms ranking into regression or classification on single objects. The pairwise approach (Herbrich et al., 1999) (Freund et al., 1998) (Burges et al., 2005) transforms ranking into classification on object pairs. The advantage for these two approaches is that existing theories and algorithms on regression or classification can be d...

388 | Learning to rank using gradient descent
- Burges, Shaked, et al.
- 2005
Citation Context: ...ranking into regression or classification on single objects. The pairwise approach (Herbrich et al., 1999) (Freund et al., 1998) (Burges et al., 2005) transforms ranking into classification on object pairs. The advantage for these two approaches is that existing theories and algorithms on regression or classification can be directly applied, but t...

320 | IR evaluation methods for retrieving highly relevant documents
- Järvelin, Kekäläinen
- 2000

Citation Context: ...a split provided in LETOR to conduct five-fold cross validation experiments. In evaluation, besides MAP, we adopted another measure commonly used in IR: Normalized Discounted Cumulative Gain (NDCG) (Järvelin & Kekäläinen, 2000). Note that here the ground truth in the data is given as partial ranking, while the methods need to use total ranking (permutation) in training. To bridge the gap, for RankCosine and ListNet, we ado...
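A sketch of NDCG as it is commonly computed in learning-to-rank evaluation (the 2^rel − 1 gain with log2 discount; other formulations exist, so treat this as one variant):

```python
import math

def dcg(gains):
    """Discounted cumulative gain with gain 2**rel - 1 and log2(rank + 1) discount."""
    return sum((2 ** g - 1) / math.log2(rank + 1)
               for rank, g in enumerate(gains, start=1))

def ndcg(gains):
    """DCG normalized by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(gains, reverse=True))
    return dcg(gains) / ideal if ideal else 0.0
```

An ideally ordered list scores 1.0; any misordering of graded labels scores strictly less.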

147 | Learning to rank: from pairwise approach to listwise approach
- Cao, Qin, et al.
- 2007
Citation Context: ...y than previous work. Several methods such as RankCosine and ListNet have been proposed. Previous experiments demonstrate that the listwise approach usually performs better than the other approaches (Cao et al., 2007) (Qin et al., 2007). Existing work on the listwise approach mainly focused on the development of new algorithms, such as RankCosine and ListNet. However, there was no sufficient theoretical foundation...

127 | A support vector method for optimizing average precision
- Yue, Finley, et al.
- 2007
Citation Context: ...Approach We give a formal definition of the listwise approach to learning to rank. Let X be the input space... In a broad sense, methods directly optimizing evaluation measures, such as SVM-MAP (Yue et al., 2007) and AdaRank (Xu & Li, 2007) can also be regarded as listwise algorithms. We will, however, limit our discussions in this paper to algorithms like ListNet and RankCosine...

121 | Convexity, classification, and risk bounds
- Bartlett, Jordan, et al.
- 2006
Citation Context: ...ifier can achieve the optimal Bayes error rate in the large sample limit. Many well known loss functions such as hinge loss, exponential loss, and logistic loss are all consistent (cf. (Zhang, 2004), (Bartlett et al., 2003), (Lin, 2002)). Soundness of a loss function guarantees that the loss can represent well the targeted learning problem. That is, an incorrect prediction should receive a larger penalty than a correct p...

108 | Letor: Benchmark dataset for research on learning to rank for information retrieval
- Liu, Xu, et al.
- 2007
Citation Context: ...xperimental Results We conducted two experiments to verify the correctness of the theoretical findings. One data set is synthetic data, and the other is the LETOR benchmark data for learning to rank (Liu et al., 2007). [Table 1 of the paper compares the surrogate losses on consistency, soundness, continuity, differentiability, convexity, and complexity.]

102 | AdaRank: a boosting algorithm for information retrieval
- Xu, Li
- 2007
Citation Context: ...ition of the listwise approach to learning to rank. Let X be the input space... In a broad sense, methods directly optimizing evaluation measures, such as SVM-MAP (Yue et al., 2007) and AdaRank (Xu & Li, 2007) can also be regarded as listwise algorithms. We will, however, limit our discussions in this paper to algorithms like ListNet and RankCosine...

101 | Analyzing and modeling rank data
- Marden
- 1995

Citation Context: ...e predicted result (by the ranking function), and define the loss function as the negative log likelihood of the ground truth list. The probability distribution turns out to be a Plackett-Luce model (Marden, 1995). The likelihood loss function has the nice properties below. First, the likelihood loss is consistent. The following proposition shows...
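The Plackett-Luce model referenced in this excerpt assigns each permutation a probability by drawing items one at a time, without replacement, with probability proportional to exp(score) among the items still remaining. A sketch with illustrative names:

```python
import math
from itertools import permutations

def plackett_luce_prob(scores, perm):
    """Probability of ranking `perm` when items are drawn one at a time
    with probability proportional to exp(score) among those remaining."""
    w = [math.exp(scores[j]) for j in perm]
    prob = 1.0
    for i in range(len(w)):
        prob *= w[i] / sum(w[i:])
    return prob

# The probabilities of all permutations form a proper distribution (sum to one).
scores = [0.5, 1.5, -0.2]
total = sum(plackett_luce_prob(scores, p) for p in permutations(range(3)))
```

The negative log of `plackett_luce_prob` evaluated at the ground-truth permutation is exactly the likelihood loss discussed in the paper.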

86 | Support vector machines and the Bayes rule in classification
- Lin
- 1999

Citation Context: ...ptimal Bayes error rate in the large sample limit. Many well known loss functions such as hinge loss, exponential loss, and logistic loss are all consistent (cf. (Zhang, 2004), (Bartlett et al., 2003), (Lin, 2002)). Soundness of a loss function guarantees that the loss can represent well the targeted learning problem. That is, an incorrect prediction should receive a larger penalty than a correct prediction, ...

80 | Discriminative models for information retrieval
- Nallapati
- 2004
Citation Context: ...s are reported in Section 6 and the conclusion and future work are given in the last section. 2. Related Work: Existing methods for learning to rank fall into three categories. The pointwise approach (Nallapati, 2004) transforms ranking into regression or classification on single objects. The pairwise approach (Herbrich et al., 1999) (Freund et ...

35 | Subset ranking using regression
- Cossock, Zhang
- 2006

22 | Query-level loss functions for information retrieval
- Qin, Zhang, et al.
- 2008
Citation Context: ...rk. Several methods such as RankCosine and ListNet have been proposed. Previous experiments demonstrate that the listwise approach usually performs better than the other approaches (Cao et al., 2007) (Qin et al., 2007). Existing work on the listwise approach mainly focused on the development of new algorithms, such as RankCosine and ListNet. However, there was no sufficient theoretical foundation laid down. Furthe...

2 | Support vector learning for ordinal regression
- Herbrich, Graepel, et al.
- 1999

Citation Context: ...ointwise approach (Nallapati, 2004) transforms ranking into regression or classification on single objects. The pairwise approach (Herbrich et al., 1999) (Freund et al., 1998) (Burges et al., 2005) transforms ranking into classification on object pairs. The advantage for these two approaches is that existing theories and algorithms on regression or c...