Results 1 - 10 of 35
Multi-instance multi-label learning
- Artificial Intelligence
"... In this paper, we propose the MIML (Multi-Instance Multi-Label learning) framework where an example is described by multiple instances and associated with multiple class labels. Compared to traditional learning frameworks, the MIML framework is more convenient and natural for representing complicate ..."
Abstract - Cited by 38 (16 self)
In this paper, we propose the MIML (Multi-Instance Multi-Label learning) framework, where an example is described by multiple instances and associated with multiple class labels. Compared to traditional learning frameworks, the MIML framework is more convenient and natural for representing complicated objects that have multiple semantic meanings. To learn from MIML examples, we propose the MimlBoost and MimlSvm algorithms based on a simple degeneration strategy, and experiments show that solving problems involving complicated objects with multiple semantic meanings in the MIML framework can lead to good performance. Considering that the degeneration process may lose information, we propose the D-MimlSvm algorithm, which tackles MIML problems directly in a regularization framework. Moreover, we show that even when we do not have access to the real objects, and thus cannot capture more information from them by using the MIML representation, MIML is still useful. We propose the InsDif and SubCod algorithms: InsDif works by transforming single-instance examples into the MIML representation for learning, while SubCod works by transforming single-label examples into the MIML representation for learning. Experiments show that in some tasks they are able to achieve better performance than learning from the single-instance or single-label examples directly.
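As a rough illustration of the representation this abstract describes (not code from the paper; names and data are made up), an MIML example is simply a bag of instance feature vectors paired with a set of labels:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class MIMLExample:
    instances: List[List[float]]  # multiple instances (feature vectors) describing one object
    labels: List[str]             # multiple class labels associated with that object

# e.g. an image described by several region features and tagged with two labels
example = MIMLExample(
    instances=[[0.1, 0.7], [0.4, 0.2], [0.9, 0.5]],
    labels=["lion", "grassland"],
)
```

The degeneration strategy mentioned above amounts to collapsing one of the two dimensions: dropping the bag structure gives a multi-label problem, while dropping the label set gives a multi-instance problem.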
Iterative ranking from pair-wise comparisons
- Advances in Neural Information Processing Systems 25 (NIPS)
, 2012
"... The question of aggregating pairwise comparisons to obtain a global ranking over a collection of objects has been of interest for a very long time: be it ranking of online gamers (e.g. MSR’s TrueSkill system) and chess players, aggregating social opinions, or deciding which product to sell based on ..."
Abstract - Cited by 31 (3 self)
The question of aggregating pairwise comparisons to obtain a global ranking over a collection of objects has been of interest for a very long time: be it ranking of online gamers (e.g. MSR’s TrueSkill system) and chess players, aggregating social opinions, or deciding which product to sell based on transactions. In most settings, in addition to obtaining a ranking, finding ‘scores’ for each object (e.g. a player’s rating) is of interest for understanding the intensity of the preferences. In this paper, we propose a novel iterative rank aggregation algorithm for discovering scores for objects from pairwise comparisons. The algorithm has a natural random walk interpretation over the graph of objects, with edges present between two objects if they are compared; the scores turn out to be the stationary probabilities of this random walk. The algorithm is model independent. To establish the efficacy of our method, however, we consider the popular Bradley-Terry-Luce (BTL) model, in which each object has an associated score that determines the probabilistic outcomes of pairwise comparisons between objects. We bound the finite sample error rates between the scores assumed by the BTL model and those estimated by our algorithm. This, in essence, leads to order-optimal dependence on the number of samples required by our algorithm to learn the scores well. Indeed, the experimental evaluation shows that our (model independent) algorithm performs as well as the Maximum Likelihood Estimator of the BTL model and outperforms a recently proposed algorithm by Ammar and Shah [1].
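A minimal Python sketch of the kind of random-walk aggregation the abstract describes, assuming aggregated pairwise outcomes as input; the function name, transition construction, and toy data are illustrative rather than the authors' exact algorithm:

```python
import numpy as np

def pairwise_scores(comparisons, n, n_iter=1000):
    """Sketch: build a Markov chain on the comparison graph and use its
    stationary distribution as the object scores."""
    # A[i, j] = empirical fraction of comparisons between i and j that j won,
    # i.e. the tendency of the walk to move from the loser towards the winner.
    A = np.zeros((n, n))
    for i, j, wins_i, wins_j in comparisons:
        total = wins_i + wins_j
        A[i, j] = wins_j / total
        A[j, i] = wins_i / total
    # Scale by the maximum degree so every row sums to at most 1,
    # and put the remaining mass on a self-loop.
    d_max = max(int((A > 0).sum(axis=1).max()), 1)
    P = A / d_max
    P[np.diag_indices(n)] = 1.0 - P.sum(axis=1)
    # Stationary distribution via power iteration.
    pi = np.full(n, 1.0 / n)
    for _ in range(n_iter):
        pi = pi @ P
    return pi / pi.sum()

# Toy usage: (i, j, wins_i, wins_j) aggregated outcomes for 3 objects.
scores = pairwise_scores([(0, 1, 2, 8), (1, 2, 3, 7), (0, 2, 1, 9)], n=3)
print(np.argsort(-scores))  # ranking, best object first
```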
On NDCG Consistency of Listwise Ranking Methods
"... We study the consistency of listwise ranking methods with respect to the popular Normalized Discounted Cumulative Gain (NDCG) criterion. State of the art listwise approaches replace NDCG with a surrogate loss that is easier to optimize. We characterize NDCG consistency of surrogate losses to discove ..."
Abstract - Cited by 18 (1 self)
We study the consistency of listwise ranking methods with respect to the popular Normalized Discounted Cumulative Gain (NDCG) criterion. State of the art listwise approaches replace NDCG with a surrogate loss that is easier to optimize. We characterize NDCG consistency of surrogate losses to discover a surprising fact: several commonly used surrogates are NDCG inconsistent. We then show how to modify them so that they become NDCG consistent. We then state a stronger but more natural notion of strong NDCG consistency, and, surprisingly, are able to provide an explicit characterization of all strongly NDCG consistent surrogates. Going beyond qualitative consistency considerations, we also give quantitative statements that enable us to transform the excess error, as measured in the surrogate, to the excess error in comparison to the Bayes optimal ranking function for NDCG. Finally, we also derive improved results if a certain natural “low noise” or “large margin” condition holds. Our experiments demonstrate that ensuring NDCG consistency does improve the performance of listwise ranking methods on real-world datasets. Moreover, a novel surrogate function suggested by our theoretical results leads to further improvements over even the NDCG consistent versions of existing surrogates.
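For reference, one common parameterization of the NDCG criterion discussed above (the paper treats more general gain and discount functions):

$$\mathrm{DCG@}k(\pi, r) = \sum_{i=1}^{k} \frac{2^{r_{\pi(i)}} - 1}{\log_2(i+1)}, \qquad \mathrm{NDCG@}k(\pi, r) = \frac{\mathrm{DCG@}k(\pi, r)}{\mathrm{DCG@}k(\pi^{*}, r)},$$

where $\pi$ is the predicted ranking, $r$ the relevance labels, and $\pi^{*}$ the ideal ranking obtained by sorting items in decreasing order of relevance.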
The Infinite Push: A New Support Vector Ranking Algorithm that Directly Optimizes Accuracy at the Absolute Top of the List
"... Ranking problems have become increasingly important in machine learning and data mining in recent years, with applications ranging from information retrieval and recommender systems to computational biology and drug discovery. In this paper, we describe a new ranking algorithm that directly maximize ..."
Abstract - Cited by 16 (1 self)
Ranking problems have become increasingly important in machine learning and data mining in recent years, with applications ranging from information retrieval and recommender systems to computational biology and drug discovery. In this paper, we describe a new ranking algorithm that directly maximizes the number of relevant objects retrieved at the absolute top of the list. The algorithm is a support vector style algorithm, but due to the different objective, it no longer leads to a quadratic programming problem. Instead, the dual optimization problem involves ℓ1,∞ constraints; we solve this dual problem using the recent ℓ1,∞ projection method of Quattoni et al. (2009). Our algorithm can be viewed as an ℓ∞-norm extreme of the ℓp-norm based algorithm of Rudin (2009) (albeit in a support vector setting rather than a boosting setting); thus we refer to the algorithm as the ‘Infinite Push’. Experiments on real-world data sets confirm the algorithm’s focus on accuracy at the absolute top of the list.
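A hedged sketch of the push family the abstract alludes to, written here in the unregularized ℓp-norm form of Rudin (2009); the paper's own formulation is a regularized support-vector version, so treat this only as intuition for the ℓ∞ limit:

$$R_p(f) = \sum_{j=1}^{n_-} \Big( \sum_{i=1}^{n_+} \ell\big(f(x_i^+) - f(x_j^-)\big) \Big)^{p}, \qquad \lim_{p\to\infty} R_p(f)^{1/p} = \max_{1 \le j \le n_-} \sum_{i=1}^{n_+} \ell\big(f(x_i^+) - f(x_j^-)\big),$$

where $\ell$ is a convex pairwise loss, $x_i^+$ the relevant objects and $x_j^-$ the irrelevant ones. As $p \to \infty$ the objective concentrates on the highest-scoring irrelevant object, which is what pushes irrelevant objects away from the absolute top of the list.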
Learning Scoring Functions with Order-Preserving Losses and Standardized Supervision
"... We address the problem of designing surrogate losses for learning scoring functions in the context of label ranking. We extend to ranking problems a notion of orderpreserving losses previously introduced for multiclass classification, and show that these losses lead to consistent formulations with r ..."
Abstract - Cited by 12 (1 self)
We address the problem of designing surrogate losses for learning scoring functions in the context of label ranking. We extend to ranking problems a notion of order-preserving losses previously introduced for multiclass classification, and show that these losses lead to consistent formulations with respect to a family of ranking evaluation metrics. An order-preserving loss can be tailored for a given evaluation metric by appropriately setting some weights depending on this metric and the observed supervision. These weights, called the standard form of the supervision, do not always exist, but we show that previous consistency results for ranking were proved in special cases where they do. We then evaluate a new pairwise loss consistent with the (Normalized) Discounted Cumulative Gain on benchmark datasets.
On the (non-)existence of convex, calibrated surrogate losses for ranking
- In Advances in Neural Information Processing Systems 25
, 2012
"... Abstract We study surrogate losses for learning to rank, in a framework where the rankings are induced by scores and the task is to learn the scoring function. We focus on the calibration of surrogate losses with respect to a ranking evaluation metric, where the calibration is equivalent to the gua ..."
Abstract - Cited by 10 (0 self)
We study surrogate losses for learning to rank, in a framework where the rankings are induced by scores and the task is to learn the scoring function. We focus on the calibration of surrogate losses with respect to a ranking evaluation metric, where calibration is equivalent to the guarantee that near-optimal values of the surrogate risk imply near-optimal values of the risk defined by the evaluation metric. We prove that if a surrogate loss is a convex function of the scores, then it is not calibrated with respect to two evaluation metrics widely used for search engine evaluation, namely the Average Precision and the Expected Reciprocal Rank. We also show that such convex surrogate losses cannot be calibrated with respect to the Pairwise Disagreement, an evaluation metric used when learning from pairwise preferences. Our results cast light on the intrinsic difficulty of some ranking problems, as well as on the limitations of learning-to-rank algorithms based on the minimization of a convex surrogate risk.
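In one standard form, the calibration guarantee described in this abstract asks that for every data distribution and every sequence of scoring functions $f_n$,

$$R_\phi(f_n) \longrightarrow \inf_{f} R_\phi(f) \quad\Longrightarrow\quad R_\ell(f_n) \longrightarrow \inf_{f} R_\ell(f),$$

where $R_\phi$ is the risk under the convex surrogate $\phi$ and $R_\ell$ is the risk under the loss induced by the evaluation metric (e.g. Average Precision or Expected Reciprocal Rank).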
Classification Calibration Dimension for General Multiclass Losses
"... We study consistency properties of surrogate loss functions for general multiclass classification problems, defined by a general loss matrix. We extend the notion of classification calibration, which has been studied for binary and multiclass 0-1 classification problems (and for certain other specif ..."
Abstract - Cited by 9 (6 self)
We study consistency properties of surrogate loss functions for general multiclass classification problems, defined by a general loss matrix. We extend the notion of classification calibration, which has been studied for binary and multiclass 0-1 classification problems (and for certain other specific learning problems), to the general multiclass setting, and derive necessary and sufficient conditions for a surrogate loss to be classification calibrated with respect to a loss matrix in this setting. We then introduce the notion of the classification calibration dimension of a multiclass loss matrix, which measures the smallest ‘size’ of a prediction space for which it is possible to design a convex surrogate that is classification calibrated with respect to the loss matrix. We derive both upper and lower bounds on this quantity, and use these results to analyze various loss matrices. In particular, as one application, we provide a different route from the recent result of Duchi et al. (2010) for analyzing the difficulty of designing ‘low-dimensional’ convex surrogates that are consistent with respect to pairwise subset ranking losses. We anticipate that the classification calibration dimension may prove to be a useful tool in the study and design of surrogate losses for general multiclass learning problems.
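Spelled out from the prose definition above (notation illustrative, not the paper's exact statement), the classification calibration dimension of a loss matrix $L$ is

$$\mathrm{CCdim}(L) = \min\big\{\, d \in \mathbb{N} \;:\; \text{there exists a convex surrogate over predictions in } \mathbb{R}^{d} \text{ that is classification calibrated w.r.t. } L \,\big\}.$$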
Statistical consistency of ranking methods in a rank-differentiable probability space
- In Advances in Neural Information Processing Systems 25
, 2012
"... Abstract This paper is concerned with the statistical consistency of ranking methods. Recently, it was proven that many commonly used pairwise ranking methods are inconsistent with the weighted pairwise disagreement loss (WPDL), which can be viewed as the true loss of ranking, even in a low-noise s ..."
Abstract - Cited by 9 (0 self)
This paper is concerned with the statistical consistency of ranking methods. Recently, it was proven that many commonly used pairwise ranking methods are inconsistent with the weighted pairwise disagreement loss (WPDL), which can be viewed as the true loss of ranking, even in a low-noise setting. This result is interesting but also surprising, given that pairwise ranking methods have been shown to be very effective in practice. In this paper, we argue that the aforementioned result might not be conclusive, depending on what kind of assumptions are used. We introduce a new assumption that the labels of objects to rank lie in a rank-differentiable probability space (RDPS), and prove that pairwise ranking methods become consistent with WPDL under this assumption. What is especially inspiring is that RDPS is actually not stronger than, but similar to, the low-noise setting. Our study provides theoretical justification for some previously unexplained empirical findings on pairwise ranking methods, bridging the gap between theory and applications.
Rank centrality: Ranking from pair-wise comparisons
- arXiv:1209.1688
, 2012
"... The question of aggregating pairwise comparisons to obtain a global ranking over a collection of objects has been of interest for a very long time: be it ranking of online gamers (e.g. MSR’s TrueSkill system) and chess players, aggregating social opinions, or deciding which product to sell based on ..."
Abstract - Cited by 6 (3 self)
The question of aggregating pairwise comparisons to obtain a global ranking over a collection of objects has been of interest for a very long time: be it ranking of online gamers (e.g. MSR’s TrueSkill system) and chess players, aggregating social opinions, or deciding which product to sell based on transactions. In most settings, in addition to obtaining a ranking, finding ‘scores’ for each object (e.g. a player’s rating) is of interest for understanding the intensity of the preferences. In this paper, we propose Rank Centrality, an iterative rank aggregation algorithm for discovering scores for objects (or items) from pairwise comparisons. The algorithm has a natural random walk interpretation over the graph of objects, with an edge present between a pair of objects if they are compared; the score of an object, which we call its Rank Centrality, turns out to be its stationary probability under this random walk. To study the efficacy of the algorithm, we consider the popular Bradley-Terry-Luce (BTL) model (also known as the MultiNomial Logit (MNL) model), in which each object has an associated score that determines the probabilistic outcomes of pairwise comparisons between objects. We bound the finite sample error rates between the scores assumed by the BTL model and those estimated by our algorithm. In particular, the number of samples required to learn the scores well with high probability depends on the structure of the …
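For reference, the BTL model referred to here assigns each object $i$ a positive weight (score) $w_i$ and generates each pairwise comparison independently as

$$\Pr[\, i \succ j \,] = \frac{w_i}{w_i + w_j},$$

so the scores recovered by the algorithm can be compared directly against the weights $w_i$ assumed by the model.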
Learning mixed multinomial logit model from ordinal data.
- In Advances in Neural Information Processing Systems
, 2014
"... Abstract Motivated by generating personalized recommendations using ordinal (or preference) data, we study the question of learning a mixture of MultiNomial Logit (MNL) model, a parameterized class of distributions over permutations, from partial ordinal or preference data (e.g. pair-wise compariso ..."
Abstract - Cited by 5 (1 self)
Motivated by generating personalized recommendations using ordinal (or preference) data, we study the question of learning a mixture of MultiNomial Logit (MNL) models, a parameterized class of distributions over permutations, from partial ordinal or preference data (e.g. pair-wise comparisons). Despite its long-standing importance across disciplines including social choice, operations research and revenue management, little is known about this question. In the case of a single MNL model (no mixture), computationally and statistically tractable learning from pair-wise comparisons is feasible. However, even learning a mixture with two MNL components is infeasible in general. Given this state of affairs, we seek conditions under which it is feasible to learn the mixture model in a computationally and statistically efficient manner. We present a sufficient condition as well as an efficient algorithm for learning mixed MNL models from partial preference/comparison data. In particular, a mixture of r MNL components over n objects can be learnt using a number of samples that scales polynomially in n and r (concretely, r^3.5 n^3 (log n)^4, with r ≲ n^{2/7} when the model parameters are sufficiently incoherent). The algorithm has two phases: first, learn the pair-wise marginals for each component using tensor decomposition; second, learn the model parameters for each component using RankCentrality, introduced by Negahban et al. In the process of proving these results, we obtain a generalization of existing analysis for tensor decomposition to a more realistic regime where only partial information about each sample is available.
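Restricted to pairwise comparisons, a mixture of r MNL components in its standard form (notation illustrative) generates outcomes as

$$\Pr[\, i \succ j \,] = \sum_{c=1}^{r} \pi_c \, \frac{w^{(c)}_i}{w^{(c)}_i + w^{(c)}_j},$$

where $\pi_c$ are the mixture weights and $w^{(c)}$ is the positive score vector of component $c$; with $r = 1$ this reduces to the BTL model of the previous entry.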