## A support vector method for optimizing average precision (2007)

### Download Links

- [www.cs.cornell.edu]
- [www.joachims.org]
- [www.yisongyue.com]
- [radlinski.org]
- DBLP

### Other Repositories/Bibliography

Venue: Proceedings of SIGIR ’07

Citations: 115 (5 self)

### BibTeX

```bibtex
@INPROCEEDINGS{Yue07asupport,
  author    = {Yisong Yue and Thomas Finley and Filip Radlinski and Thorsten Joachims},
  title     = {A support vector method for optimizing average precision},
  booktitle = {Proceedings of SIGIR '07},
  year      = {2007},
  pages     = {271--278},
  publisher = {ACM}
}
```

### Abstract

Machine learning is commonly used to improve ranked retrieval systems. Due to computational difficulties, few learning techniques have been developed to directly optimize for mean average precision (MAP), despite its widespread use in evaluating such systems. Existing approaches optimizing MAP either do not find a globally optimal solution, or are computationally expensive. In contrast, we present a general SVM learning algorithm that efficiently finds a globally optimal solution to a straightforward relaxation of MAP. We evaluate our approach using the TREC 9 and TREC 10 Web Track corpora (WT10g), comparing against SVMs optimized for accuracy and ROCArea. In most cases we show our method to produce statistically significant improvements in MAP scores.
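For reference, average precision (AP) for a single ranked list is the mean of the precision values measured at the rank of each relevant document, and MAP averages AP over queries. A minimal Python sketch of the measure the paper optimizes (function names are illustrative, not from the paper):

```python
def average_precision(relevance):
    """AP for one query: mean of precision@k taken at the ranks k
    of the relevant documents, given 0/1 relevance labels listed
    in ranked order (best-scored document first)."""
    hits = 0
    precisions = []
    for k, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            precisions.append(hits / k)  # precision at rank k
    return sum(precisions) / hits if hits else 0.0

def mean_average_precision(rankings):
    """MAP: the per-query AP values averaged over all queries."""
    return sum(average_precision(r) for r in rankings) / len(rankings)

# Demoting a relevant document from rank 1 to rank 3 lowers AP:
print(average_precision([1, 0, 1, 0]))  # (1/1 + 2/3) / 2 ≈ 0.833
print(average_precision([0, 0, 1, 1]))  # (1/3 + 2/4) / 2 ≈ 0.417
```

Because each relevant document contributes precision at its own rank, AP rewards placing relevant documents near the top far more than pairwise measures such as ROCArea do.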

### Citations

8961 | The nature of statistical learning theory
- Vapnik
- 1999
Citation Context: ...number of times. Using this setup, we performed the same experiments while using our method (SVM∆map), an SVM optimizing for ROCArea (SVM∆roc) [13], and a conventional classification SVM (SVMacc) [20]. All SVM methods used a linear kernel. We reported the average performance of all models over the 50 trials. 5.1 Comparison with Base Functions In analyzing our results, the first question to answer ...

371 | Large margin methods for structured and interdependent output variables
- Tsochantaridis, Joachims, et al.
- 2005
Citation Context: ... not clear how to incorporate non-linear multivariate loss functions such as MAP loss directly into global optimization problems such as SVM training. We now present a method based on structural SVMs [19] to address this problem. We use the structural SVM formulation, presented in Optimization Problem 1, to learn a w ∈ R^N. Optimization Problem 1. (Structural SVM) min_{w, ξ≥0} ½‖w‖² + (C/n) Σᵢ ξᵢ s.t. ∀i, ∀y...

348 | Learning to rank using gradient descent
- Burges, Shaked, et al.
Citation Context: ... approach is to learn a function that maximizes a surrogate measure. Performance measures optimized include accuracy [17, 15], ROCArea [1, 5, 10, 11, 13, 21] or modifications of ROCArea [4], and NDCG [2, 3]. Learning a model to optimize for such measures might result in suboptimal MAP performance. In fact, although some previous systems have obtained good MAP performance, it is known that neither achiev...

304 | Document language models, query models, and risk minimization for information retrieval
- Lafferty, Zhai

299 | Large margin rank boundaries for ordinal regression
- Herbrich, Graepel, et al.
- 2000
Citation Context: ...ining data to achieve the same MAP performance. The second common approach is to learn a function that maximizes a surrogate measure. Performance measures optimized include accuracy [17, 15], ROCArea [1, 5, 10, 11, 13, 21] or modifications of ROCArea [4], and NDCG [2, 3]. Learning a model to optimize for such measures might result in suboptimal MAP performance. In fact, although some previous systems have obtained good...

295 | IR Evaluation Methods for Retrieving Highly Relevant Documents
- Järvelin, Kekäläinen
- 2000
Citation Context: ...work used by our method is fairly general. A natural extension of this framework would be to develop methods to optimize for other important IR measures, such as Normalized Discounted Cumulative Gain [2, 3, 4, 12] and Mean Reciprocal Rank. 7. ACKNOWLEDGMENTS This work was funded under NSF Award IIS-0412894, NSF CAREER Award 0237381, and a gift from Yahoo! Research. The third author was also partly supported by...

194 | The relationship between precision-recall and roc curves
- Davis, Goadrich
- 2006
Citation Context: ...timal MAP performance. In fact, although some previous systems have obtained good MAP performance, it is known that neither achieving optimal accuracy nor ROCArea can guarantee optimal MAP performance [7]. In this paper, we present a general approach for learning ranking functions that maximize MAP performance. Specifically, we present an SVM algorithm that globally optimizes a hinge-loss relaxation o...
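The claim attributed to [7], that optimizing ROCArea does not guarantee optimal MAP, is easy to verify numerically: two rankings can have identical ROCArea (the fraction of relevant/non-relevant pairs ordered correctly) yet different AP, because AP weights the top of the ranking more heavily. A small self-contained check (illustrative code, not taken from the paper):

```python
def average_precision(relevance):
    """AP: mean of precision@k at the ranks of relevant documents."""
    hits, precisions = 0, []
    for k, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            precisions.append(hits / k)
    return sum(precisions) / hits if hits else 0.0

def roc_area(relevance):
    """ROCArea: fraction of (relevant, non-relevant) pairs where
    the relevant document is ranked above the non-relevant one."""
    rel = [k for k, r in enumerate(relevance) if r]
    non = [k for k, r in enumerate(relevance) if not r]
    correct = sum(1 for i in rel for j in non if i < j)
    return correct / (len(rel) * len(non))

a = [1, 0, 0, 1]  # one relevant document at the very top
b = [0, 1, 1, 0]  # relevant documents in the middle
print(roc_area(a), roc_area(b))            # 0.5 0.5 (identical)
print(average_precision(a))                # 0.75
print(average_precision(b))                # ≈ 0.583
```

Both rankings order exactly half of the relevant/non-relevant pairs correctly, yet `a` scores markedly higher AP because it places a relevant document at rank 1.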

191 | A support vector method for multivariate performance measures
- Joachims
- 2005
Citation Context: ...ining data to achieve the same MAP performance. The second common approach is to learn a function that maximizes a surrogate measure. Performance measures optimized include accuracy [17, 15], ROCArea [1, 5, 10, 11, 13, 21] or modifications of ROCArea [4], and NDCG [2, 3]. Learning a model to optimize for such measures might result in suboptimal MAP performance. In fact, although some previous systems have obtained good...

183 | A Markov random field model for term dependencies
- Metzler, Croft
- 2005
Citation Context: ...s it conceptually just as easy to optimize SVMs for MAP as was previously possible only for accuracy and ROCArea. In contrast to recent work directly optimizing for MAP performance by Metzler & Croft [16] and Caruana et al. [6], our technique is computationally efficient while finding a globally optimal solution. Like [6, 16], our method learns a linear model, but is much more efficient in practice an...

140 | Automatic combination of multiple ranked retrieval systems
- Bartell, Cottrell, et al.
- 1994
Citation Context: ...ining data to achieve the same MAP performance. The second common approach is to learn a function that maximizes a surrogate measure. Performance measures optimized include accuracy [17, 15], ROCArea [1, 5, 10, 11, 13, 21] or modifications of ROCArea [4], and NDCG [2, 3]. Learning a model to optimize for such measures might result in suboptimal MAP performance. In fact, although some previous systems have obtained good...

111 | Learning to rank with nonsmooth cost functions
- Burges, Ragno, et al.
- 2007
Citation Context: ... approach is to learn a function that maximizes a surrogate measure. Performance measures optimized include accuracy [17, 15], ROCArea [1, 5, 10, 11, 13, 21] or modifications of ROCArea [4], and NDCG [2, 3]. Learning a model to optimize for such measures might result in suboptimal MAP performance. In fact, although some previous systems have obtained good MAP performance, it is known that neither achiev...

100 | Combining statistical learning with a knowledge-based approach - a case study in intensive care monitoring
- Morik, Brockhausen, et al.
- 1999
Citation Context: ...requiring more training data to achieve the same MAP performance. The second common approach is to learn a function that maximizes a surrogate measure. Performance measures optimized include accuracy [17, 15], ROCArea [1, 5, 10, 11, 13, 21] or modifications of ROCArea [4], and NDCG [2, 3]. Learning a model to optimize for such measures might result in suboptimal MAP performance. In fact, although some pre...

83 | Adapting ranking svm to document retrieval
- Cao, Xu, et al.
- 2006
Citation Context: ... second common approach is to learn a function that maximizes a surrogate measure. Performance measures optimized include accuracy [17, 15], ROCArea [1, 5, 10, 11, 13, 21] or modifications of ROCArea [4], and NDCG [2, 3]. Learning a model to optimize for such measures might result in suboptimal MAP performance. In fact, although some previous systems have obtained good MAP performance, it is known th...

80 | Overview of the TREC-9 web track
- Hawking
- 1999
Citation Context: ...n total. For each query, we considered the scores of documents found in the union of the top 1000 documents of each base function. For our second set of base functions, we used scores from the TREC 9 [8] and TREC 10 [9] Web Track submissions. We used only the non-manual, non-short submissions from both years. For TREC 9 and TREC 10, there were 53 and 18 such submissions, respectively. A typical submi...

74 | Support vector machines for classification in nonstandard situations
- Lin, Lee, et al.
Citation Context: ...requiring more training data to achieve the same MAP performance. The second common approach is to learn a function that maximizes a surrogate measure. Performance measures optimized include accuracy [17, 15], ROCArea [1, 5, 10, 11, 13, 21] or modifications of ROCArea [4], and NDCG [2, 3]. Learning a model to optimize for such measures might result in suboptimal MAP performance. In fact, although some pre...

67 | Overview of the TREC-2001 web track
- Hawking, Craswell
- 2001
Citation Context: ...h query, we considered the scores of documents found in the union of the top 1000 documents of each base function. For our second set of base functions, we used scores from the TREC 9 [8] and TREC 10 [9] Web Track submissions. We used only the non-manual, non-short submissions from both years. For TREC 9 and TREC 10, there were 53 and 18 such submissions, respectively. A typical submission contained ...

56 | Ensemble selection from libraries of models
- Caruana, Niculescu-Mizil
Citation Context: ...s easy to optimize SVMs for MAP as was previously possible only for accuracy and ROCArea. In contrast to recent work directly optimizing for MAP performance by Metzler & Croft [16] and Caruana et al. [6], our technique is computationally efficient while finding a globally optimal solution. Like [6, 16], our method learns a linear model, but is much more efficient in practice and, unlike [16], can han...

39 | Optimizing classifier performance via an approximation to the Wilcoxon-Mann-Whitney statistic
- Yan, Dodier, et al.
- 2003

32 | Optimising area under the ROC curve using gradient descent
- Herschtal, Raskutti
- 2004

9 | Learning a ranking from pairwise preferences
- Carterette, Petkova
- 2006

3 | The probability ranking principle in IR. Journal of Documentation 33(4):294–304
- Robertson
- 1977
Citation Context: ...7 ACM 978-1-59593-597-7/07/0007 ...$5.00. Filip Radlinski Cornell University Ithaca, NY, USA filip@cs.cornell.edu Thorsten Joachims Cornell University Ithaca, NY, USA tj@cs.cornell.edu a query (e.g., [18, 14]). If solved effectively, the ranking with best MAP performance can easily be derived from the probabilities of relevance. However, achieving high MAP only requires finding a good ordering of the docu...