Yahoo! Learning to Rank Challenge Overview, 2011
Cited by 72 (6 self)
Learning to rank for information retrieval has gained a lot of interest in recent years, but there is a lack of large real-world datasets to benchmark algorithms. That led us to publicly release two datasets used internally at Yahoo! for learning the web search ranking function. To promote these datasets and foster the development of state-of-the-art learning to rank algorithms, we organized the Yahoo! Learning to Rank Challenge in spring 2010. This paper provides an overview and an analysis of this challenge, along with a detailed description of the released datasets.
Metric learning to rank
- In Proceedings of the 27th Annual International Conference on Machine Learning (ICML), 2010
Cited by 60 (9 self)
We study metric learning as a problem of information retrieval. We present a general metric learning algorithm, based on the structural SVM framework, to learn a metric such that rankings of data induced by distance from a query can be optimized against various ranking measures, such as AUC, Precision-at-k, MRR, MAP or NDCG. We demonstrate experimental results on standard classification data sets, and a large-scale online dating recommendation problem.
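To make "rankings of data induced by distance from a query" concrete, here is a minimal Python sketch. It is not the paper's structural SVM algorithm: the diagonal `weights` vector is a hypothetical stand-in for a learned metric, and the toy data and function names are assumptions for illustration only. It shows how changing the metric changes the ranking, and therefore Precision-at-k.

```python
import math


def rank_by_distance(query, items, weights):
    """Return item indices sorted by weighted Euclidean distance to the query.

    `weights` is a hypothetical diagonal metric (one non-negative weight per
    feature); a learned metric would take its place.
    """
    def dist(x):
        return math.sqrt(sum(w * (a - b) ** 2 for w, a, b in zip(weights, query, x)))

    return sorted(range(len(items)), key=lambda i: dist(items[i]))


def precision_at_k(ranking, relevant, k):
    """Fraction of the top-k ranked items that are relevant."""
    return sum(1 for i in ranking[:k] if i in relevant) / k


# Toy data (assumed): items 0 and 2 are the relevant ones for this query.
query = [0.0, 0.0]
items = [[1.0, 0.0], [0.0, 2.0], [3.0, 0.0], [0.0, 0.5]]
relevant = {0, 2}

uniform = rank_by_distance(query, items, [1.0, 1.0])      # plain Euclidean
reweighted = rank_by_distance(query, items, [0.01, 1.0])  # down-weights feature 0

print(uniform, precision_at_k(uniform, relevant, 2))
print(reweighted, precision_at_k(reweighted, relevant, 2))
```

Under the plain Euclidean metric only one relevant item reaches the top two; after down-weighting the first feature both do, which is the kind of effect a ranking-aware metric learner tries to achieve.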
Interactively Optimizing Information Retrieval Systems as a Dueling Bandits Problem, 2009
Cited by 43 (8 self)
We present an on-line learning framework tailored towards real-time learning from observed user behavior in search engines and other information retrieval systems. In particular, we only require pairwise comparisons which were shown to be reliably inferred from implicit feedback (Joachims et al., 2007; Radlinski et al., 2008b). We will present an algorithm with theoretical guarantees as well as simulation results.
Predicting Structured Objects with Support Vector Machines, 2009
Cited by 20 (1 self)
Machine Learning today offers a broad repertoire of methods for classification and regression. But what if we need to predict complex objects like trees, orderings, or alignments? Such problems arise naturally in natural language processing, search engines, and bioinformatics. The following explores a generalization of Support Vector Machines (SVMs) for such complex prediction problems.
The Infinite Push: A New Support Vector Ranking Algorithm that Directly Optimizes Accuracy at the Absolute Top of the List
Cited by 16 (1 self)
Ranking problems have become increasingly important in machine learning and data mining in recent years, with applications ranging from information retrieval and recommender systems to computational biology and drug discovery. In this paper, we describe a new ranking algorithm that directly maximizes the number of relevant objects retrieved at the absolute top of the list. The algorithm is a support vector style algorithm, but due to the different objective, it no longer leads to a quadratic programming problem. Instead, the dual optimization problem involves ℓ1,∞ constraints; we solve this dual problem using the recent ℓ1,∞ projection method of Quattoni et al. (2009). Our algorithm can be viewed as an ℓ∞-norm extreme of the ℓp-norm based algorithm of Rudin (2009) (albeit in a support vector setting rather than a boosting setting); thus we refer to the algorithm as the ‘Infinite Push’. Experiments on real-world data sets confirm the algorithm’s focus on accuracy at the absolute top of the list.
A general approximation framework for direct optimization of information retrieval measures, 2008
Cited by 15 (4 self)
Direct optimization of information retrieval (IR) measures has recently become a new trend in learning to rank. Several methods have been proposed, and their effectiveness has been empirically verified. However, theoretical justification for the algorithms has been insufficient, and many open problems remain. In this paper, we theoretically justify the approach of directly optimizing IR measures, and further propose a new general framework for this approach, which enjoys several theoretical advantages. The general framework, which can be used to optimize most IR measures, addresses the task by approximating the IR measures and optimizing the approximated surrogate functions. Theoretical analysis shows that a high approximation accuracy can be achieved by the approach. We take average precision (AP) and normalized discounted cumulative gain (NDCG) as examples to demonstrate how to realize the proposed framework. Experiments on benchmark datasets show that our approach is very effective when compared to existing methods. The empirical results also agree well with the theoretical results obtained in the paper.
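The idea of "approximating the IR measures and optimizing the approximated surrogate functions" can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the paper's exact formulation: it assumes the non-smooth rank position of each document is replaced by a sum of logistic functions of score differences, with a hypothetical sharpness parameter `alpha`, and uses the common `2**gain - 1` gain with a `log2` discount for NDCG.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def approx_positions(scores, alpha=10.0):
    """Smooth stand-in for 1-based rank positions.

    position_i = 1 + sum over j != i of sigmoid(alpha * (s_j - s_i)),
    which approaches the true position as alpha grows (assuming no ties).
    """
    n = len(scores)
    return [
        1.0 + sum(sigmoid(alpha * (scores[j] - scores[i])) for j in range(n) if j != i)
        for i in range(n)
    ]


def dcg(gains, positions):
    """Discounted cumulative gain for the given (possibly fractional) positions."""
    return sum((2 ** g - 1) / math.log2(1 + p) for g, p in zip(gains, positions))


def approx_ndcg(scores, gains, alpha=10.0):
    """Smooth NDCG surrogate: DCG at approximate positions over the ideal DCG."""
    ideal = dcg(sorted(gains, reverse=True), range(1, len(gains) + 1))
    if ideal == 0:
        return 0.0
    return dcg(gains, approx_positions(scores, alpha)) / ideal


gains = [2, 0, 1]                                    # assumed relevance labels
good = approx_ndcg([3.0, 1.0, 2.0], gains, alpha=50.0)  # scores agree with labels
bad = approx_ndcg([1.0, 3.0, 2.0], gains, alpha=50.0)   # best document scored lowest
print(good, bad)
```

Because every term is differentiable in the scores, a surrogate of this kind can be maximized with gradient-based methods; larger `alpha` gives a closer approximation at the cost of a less smooth objective.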
Direct Optimization of Ranking Measures
Cited by 12 (0 self)
Web page ranking requires the optimization of sophisticated performance measures. Current approaches only minimize measures indirectly related to performance scores. We present a new approach which allows optimization of an upper bound of the appropriate loss function. This is achieved via structured estimation, where in our case the input corresponds to a set of documents and the output is a ranking. Training is efficient since computing the loss function can be done via a linear assignment problem. At test time, a sorting operation suffices, as our algorithm assigns a relevance score to every (document, query) pair. Moreover, we provide a general method for finding tighter nonconvex relaxations of structured loss functions. Experiments show that our algorithm yields improved accuracies on several public and commercial ranking datasets.
Optimizing mean reciprocal rank for person re-identification
- In AVSS, 2011
Cited by 11 (6 self)
Person re-identification is one of the most challenging issues in network-based surveillance. The difficulties mainly come from the great appearance variations induced by illumination, camera view and body pose changes. Perhaps influenced by the research on face recognition and general object recognition, this problem is habitually treated as a verification or classification problem, and much effort has been put into optimizing standard recognition criteria. However, we found that in practical applications the users usually have different expectations. For example, in a real surveillance system, we may expect that a visual user interface can show us the relevant images in the first few (e.g. 20) candidates, but not necessarily before all the irrelevant ones. In other words, it is acceptable to leave the final judgement to the users. Based on such an observation, this paper treats the re-identification problem as a ranking problem and directly optimizes a listwise ranking function named Mean Reciprocal Rank (MRR), which we consider to generate results closest to human expectations. Using a maximum-margin based structured learning model, we are able to show improved re-identification results on widely-used benchmark datasets.
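For readers unfamiliar with the measure, the following minimal Python sketch computes Mean Reciprocal Rank in its standard form. It only evaluates given rankings and does not reproduce the paper's maximum-margin learning model; the function names and the toy gallery identifiers are assumptions for illustration.

```python
def reciprocal_rank(ranking, relevant):
    """Return 1 / (1-based position of the first relevant item), or 0.0 if none."""
    for position, item in enumerate(ranking, start=1):
        if item in relevant:
            return 1.0 / position
    return 0.0


def mean_reciprocal_rank(rankings, relevants):
    """Average the reciprocal rank over all queries."""
    values = [reciprocal_rank(r, rel) for r, rel in zip(rankings, relevants)]
    return sum(values) / len(values)


# Three toy queries (assumed data): the match is ranked first, third, and missing.
rankings = [["a", "b", "c"], ["b", "c", "a"], ["c", "b", "a"]]
relevants = [{"a"}, {"a"}, {"x"}]

mrr = mean_reciprocal_rank(rankings, relevants)
print(mrr)
```

The measure depends only on where the first correct match appears, which fits the setting described above where a user inspects the first few candidates.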
Learning content similarity for music recommendation
- IEEE TASLP, 2012
Cited by 10 (6 self)
Many tasks in music information retrieval, such as recommendation, and playlist generation for online radio, fall naturally into the query-by-example setting, wherein a user queries the system by providing a song, and the system responds with a list of relevant or similar song recommendations. Such applications ultimately depend on the notion of similarity between items to produce high-quality results. Current state-of-the-art systems employ collaborative filter methods to represent musical items, effectively comparing items in terms of their constituent users. While collaborative filter techniques perform well when historical data is available for each item, their reliance on historical data impedes performance on novel or unpopular items. To combat this problem, practitioners rely on content-based similarity, which naturally extends to novel items, but is typically outperformed by collaborative filter methods. In this paper, we propose a method for optimizing content-based similarity by learning from a sample of collaborative filter data. The optimized content-based similarity metric can then be applied to answer queries on novel and unpopular items, while still maintaining high recommendation accuracy. The proposed system yields accurate and efficient representations of audio content, and experimental results show significant improvements in accuracy over competing content-based recommendation techniques.
Automatic Factual Question Generation from Text
Cited by 9 (1 self)
Texts with potential educational value are becoming available through the Internet (e.g., Wikipedia, news services). However, using these new texts in classrooms introduces many challenges, one of which is that they usually lack practice exercises and assessments. Here, we address part of this challenge by automating the creation of a specific type of assessment item. Specifically, we focus on automatically generating factual WH questions. Our goal is to create an automated system that can take as input a text and produce as output questions for assessing a reader’s knowledge of the information in the text. The questions could then be presented to a teacher, who could select and revise the ones that he or she judges to be useful. After introducing the problem, we describe some of the computational and linguistic challenges presented by factual question generation. We then present an implemented system that leverages existing natural language processing techniques to address some of these challenges. The system uses a combination of manually encoded transformation rules and a statistical question ranker trained on a tailored dataset of labeled system output. We present experiments that evaluate individual components of the system as well as the system as a whole. We found, among other things, that the question ranker roughly doubled the acceptability