Results 1–10 of 150
Learning to rank using gradient descent
In ICML, 2005. Cited by 346 (16 self).
Abstract: We investigate using gradient descent methods for learning ranking functions; we propose a simple probabilistic cost function, and we introduce RankNet, an implementation of these ideas using a neural network to model the underlying ranking function. We present test results on toy data and on data from a commercial internet search engine.
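To make the pairwise idea concrete, here is a minimal sketch of a RankNet-style probabilistic cost, with a linear scorer standing in for the paper's neural network (function and variable names are illustrative, not from the paper):

import numpy as np

def ranknet_pair_cost(w, x_i, x_j, p_target=1.0):
    # Linear stand-in for the ranking function: f(x) = w . x
    o = np.dot(w, x_i) - np.dot(w, x_j)      # score difference o_ij
    p = 1.0 / (1.0 + np.exp(-o))             # modeled P(i ranked above j)
    # Cross-entropy cost against the target pair probability, and its gradient
    cost = -p_target * np.log(p) - (1.0 - p_target) * np.log(1.0 - p)
    grad = (p - p_target) * (x_i - x_j)
    return cost, grad

# One gradient-descent step on a toy feature pair
w = np.zeros(3)
x_i, x_j = np.array([1.0, 0.5, 0.0]), np.array([0.2, 0.1, 0.3])
cost, grad = ranknet_pair_cost(w, x_i, x_j)
w -= 0.1 * grad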
Classification by pairwise coupling
1998. Cited by 273 (0 self).
Abstract: We discuss a strategy for polychotomous classification that involves estimating class probabilities for each pair of classes, and then coupling the estimates together. The coupling model is similar to the Bradley–Terry method for paired comparisons. We study the nature of the class probability estimates that arise, and examine the performance of the procedure in real and simulated datasets. Classifiers used include linear discriminants, nearest neighbors, and the support vector machine.
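A minimal sketch of the coupling step, assuming uniform pair weights and the iterative scheme Hastie and Tibshirani describe (names are illustrative):

import numpy as np

def pairwise_couple(R, n_iter=100):
    # R[i, j] estimates P(class i | class i or j); R[j, i] = 1 - R[i, j].
    # Returns coupled class probabilities p.
    k = R.shape[0]
    p = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        for i in range(k):
            mu = p[i] / (p[i] + np.delete(p, i))   # model's implied r_ij
            r = np.delete(R[i], i)                 # observed estimates
            p[i] *= r.sum() / mu.sum()
        p /= p.sum()
    return p

# Example: 3 classes with consistent pairwise estimates
R = np.array([[0.0, 0.6, 0.7],
              [0.4, 0.0, 0.55],
              [0.3, 0.45, 0.0]])
print(pairwise_couple(R))   # coupled class probabilities, sum to 1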
Active exploration for learning rankings from clickthrough data
In Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2007. Cited by 55 (3 self).
Abstract: We address the task of learning rankings of documents from search engine logs of user behavior. Previous work on this problem has relied on passively collected clickthrough data. In contrast, we show that an active exploration strategy can provide data that leads to much faster learning. Specifically, we develop a Bayesian approach for selecting rankings to present to users so that interactions result in more informative training data. Our results using the TREC-10 Web corpus, as well as synthetic data, demonstrate that a directed exploration strategy quickly leads to users being presented improved rankings in an online learning setting. We find that active exploration substantially outperforms passive observation and random exploration.
Label Ranking by Learning Pairwise Preferences
"... Preference learning is an emerging topic that appears in different guises in the recent literature. This work focuses on a particular learning scenario called label ranking, where the problem is to learn a mapping from instances to rankings over a finite number of labels. Our approach for learning s ..."
Abstract

Cited by 46 (16 self)
 Add to MetaCart
Abstract: Preference learning is an emerging topic that appears in different guises in the recent literature. This work focuses on a particular learning scenario called label ranking, where the problem is to learn a mapping from instances to rankings over a finite number of labels. Our approach for learning such a mapping, called ranking by pairwise comparison (RPC), first induces a binary preference relation from suitable training data using a natural extension of pairwise classification. A ranking is then derived from the preference relation thus obtained by means of a ranking procedure, whereby different ranking methods can be used for minimizing different loss functions. In particular, we show that a simple (weighted) voting strategy minimizes risk with respect to the well-known Spearman rank correlation. We compare RPC to existing label ranking methods, which are based on scoring individual labels instead of comparing pairs of labels. Both empirically and theoretically, it is shown that RPC is superior in terms of computational efficiency, and at least competitive in terms of accuracy.
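As a sketch of the (weighted) voting step described above, assume a matrix of learned pairwise preference probabilities for one instance (the values below are hypothetical):

import numpy as np

def rpc_rank(P):
    # P[a, b] is a learned probability that label a is preferred to
    # label b for the given instance (P[b, a] = 1 - P[a, b]).
    np.fill_diagonal(P, 0.0)
    scores = P.sum(axis=1)        # each label's total votes
    return np.argsort(-scores)    # label indices, best first

P = np.array([[0.0, 0.8, 0.6],
              [0.2, 0.0, 0.3],
              [0.4, 0.7, 0.0]])
print(rpc_rank(P))                # [0, 2, 1]: label 0 ranked first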
Multilabel classification via calibrated label ranking
In Machine Learning, 2008. Cited by 32 (7 self).
Abstract: Label ranking studies the problem of learning a mapping from instances to rankings over a predefined set of labels. Hitherto existing approaches to label ranking implicitly operate on an underlying (utility) scale which is not calibrated in the sense that it lacks a natural zero point. We propose a suitable extension of label ranking that incorporates the calibrated scenario and substantially extends the expressive power of these approaches. In particular, our extension suggests a conceptually novel technique for extending the common learning by pairwise comparison approach to the multilabel scenario, a setting previously not amenable to the pairwise decomposition technique. The key idea of the approach is to introduce an artificial calibration label that, in each example, separates the relevant from the irrelevant labels. We show that this technique can be viewed as a combination of pairwise preference learning and the conventional relevance classification technique, where a separate classifier is trained to predict whether a label is relevant or not. Empirical results in the area of text categorization, image classification and gene analysis underscore the merits of the calibrated model in comparison to state-of-the-art multilabel learning methods.
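A minimal sketch of the calibration idea, assuming the pairwise preference matrix has been extended with the artificial calibration label as its last row and column:

import numpy as np

def calibrated_labels(P_ext):
    # P_ext[a, b] is the probability that a is preferred to b; the last
    # row/column is the artificial calibration label. Labels ranked
    # above the calibration label are predicted relevant.
    np.fill_diagonal(P_ext, 0.0)
    scores = P_ext.sum(axis=1)
    cal_score = scores[-1]                        # calibration label's votes
    return np.where(scores[:-1] > cal_score)[0]   # indices of relevant labels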
MM algorithms for generalized Bradley–Terry models
In The Annals of Statistics, 2004. Cited by 29 (1 self).
Abstract: The Bradley–Terry model for paired comparisons is a simple and much-studied means to describe the probabilities of the possible outcomes when individuals are judged against one another in pairs. Among the many studies of the model in the past 75 years, numerous authors have generalized it in several directions, sometimes providing iterative algorithms for obtaining maximum likelihood estimates for the generalizations. Building on a theory of algorithms known by the initials MM, for minorization–maximization, this paper presents a powerful technique for producing iterative maximum likelihood estimation algorithms for a wide class of generalizations of the Bradley–Terry model. While algorithms for problems of this type have tended to be custom-built in the literature, the techniques in this paper enable their mass production. Simple conditions are stated that guarantee that each algorithm described will produce a sequence that converges to the unique maximum likelihood estimator. Several of the algorithms and convergence results herein are new.
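For the basic Bradley–Terry case the MM iteration takes a particularly simple form; a sketch, assuming every individual has at least one win and one loss so the maximum likelihood estimate exists:

import numpy as np

def bt_mm(wins, n_iter=200):
    # wins[i, j] = number of times individual i beat individual j.
    # Returns skill parameters gamma, normalized to sum to 1.
    k = wins.shape[0]
    N = wins + wins.T                  # n_ij: comparisons between i and j
    W = wins.sum(axis=1)               # total wins of each individual
    gamma = np.ones(k)
    for _ in range(n_iter):
        denom = N / (gamma[:, None] + gamma[None, :])
        np.fill_diagonal(denom, 0.0)
        gamma = W / denom.sum(axis=1)  # MM update for each gamma_i
        gamma /= gamma.sum()
    return gamma

wins = np.array([[0, 3, 1],
                 [1, 0, 2],
                 [2, 1, 0]])
print(bt_mm(wins))                     # estimated skills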
The choice axiom after twenty years
In Journal of Mathematical Psychology, 1977. Cited by 28 (0 self).
Abstract: This survey is divided into three major sections. The first concerns mathematical results about the choice axiom and the choice models that devolve from it. For example, its relationship to Thurstonian theory is satisfyingly understood; much is known about how choice and ranking probabilities may relate, although little of this knowledge seems empirically useful; and there are certain interesting statistical facts. The second section describes attempts that have been made to test and apply these models. The testing has been done mostly, though not exclusively, by psychologists; the applications have been mostly in economics and sociology. Although it is clear from many experiments that the conditions under which the choice axiom holds are surely delicate, the need for simple, rational underpinnings in complex theories, as in economics and sociology, leads one to accept assumptions that are at best approximate. And the third section concerns alternative, more general theories which, in spirit, are much like the choice axiom. Perhaps I had best admit at the outset that, as a commentator on this scene, I am qualified no better than many others and rather less well than some who have been working in this area recently, which I have not been. My pursuits have led me along other, …
Generalized Bradley–Terry models and multiclass probability estimates
In Journal of Machine Learning Research. Cited by 26 (3 self).
Abstract: The Bradley–Terry model for obtaining individual skill from paired comparisons has been popular in many areas. In machine learning, this model is related to multiclass probability estimates by coupling all pairwise classification results. Error-correcting output codes (ECOC) are a general framework to decompose a multiclass problem into several binary problems. To obtain probability estimates under this framework, this paper introduces a generalized Bradley–Terry model in which paired individual comparisons are extended to paired team comparisons. We propose a simple algorithm with convergence proofs to solve the model and obtain individual skill. Experiments on synthetic and real data demonstrate that the algorithm is useful for obtaining multiclass probability estimates. Moreover, we discuss four extensions of the proposed model: 1) weighted individual skill, 2) home-field advantage, 3) ties, and 4) comparisons with more than two teams.
Keywords: Bradley–Terry model, probability estimates, error-correcting output codes, support vector machines
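The core modeling step of the team generalization is easy to state; a sketch in which a team's strength is the sum of its members' individual skills (the skill values below are hypothetical):

import numpy as np

def team_win_prob(gamma, team_a, team_b):
    # P(A beats B) = strength(A) / (strength(A) + strength(B)),
    # where a team's strength is the sum of its members' skills.
    sa = sum(gamma[i] for i in team_a)
    sb = sum(gamma[i] for i in team_b)
    return sa / (sa + sb)

gamma = np.array([1.5, 0.5, 1.0, 2.0])        # individual skills
print(team_win_prob(gamma, [0, 1], [2, 3]))   # P({0,1} beats {2,3})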
Reducing Multiclass to Binary By Coupling Probability Estimates
2001. Cited by 16 (1 self).
Abstract: This paper presents a method for obtaining class membership probability estimates for multiclass classification problems by coupling the probability estimates produced by binary classifiers. This is an extension, to arbitrary code matrices, of a method due to Hastie and Tibshirani for pairwise coupling of probability estimates. Experimental results with Boosted Naive Bayes show that our method produces calibrated class membership probability estimates, while having classification accuracy similar to loss-based decoding, a method for obtaining the most likely class that does not generate probability estimates.
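The coupling step fits class probabilities whose implied binary estimates match the classifiers' outputs. The sketch below is a simplified least-squares stand-in for that idea (not the paper's exact objective), assuming 0/1 code-matrix columns where every class sits on one side of each binary problem:

import numpy as np

def couple_code_matrix(M, r, steps=2000, lr=0.5):
    # M is a k x b 0/1 code matrix: column j's classifier separates
    # classes with M[i, j] == 1 from the rest, and r[j] is its estimated
    # probability for the positive side. Fit p = softmax(z) to minimize
    # sum_j (M[:, j] . p - r[j])^2 by gradient descent.
    k, b = M.shape
    z = np.zeros(k)
    for _ in range(steps):
        p = np.exp(z - z.max()); p /= p.sum()
        resid = M.T @ p - r                  # one residual per column
        grad_p = M @ resid                   # d loss / d p (factor 2 dropped)
        grad_z = p * (grad_p - p @ grad_p)   # backprop through softmax
        z -= lr * grad_z
    p = np.exp(z - z.max())
    return p / p.sum()

M = np.array([[1, 1, 0],
              [0, 1, 1],
              [0, 0, 1]])                # hypothetical 3-class code matrix
r = np.array([0.5, 0.8, 0.6])            # binary positive-side estimates
print(couple_code_matrix(M, r))          # coupled class probabilities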