Results 1–10 of 15
Combining Predictions in Pairwise Classification: An Optimal Adaptive Voting Strategy and Its Relation to Weighted Voting
TO APPEAR IN PATTERN RECOGNITION, 2009
Cited by 13 (0 self)
Weighted voting is the commonly used strategy for combining predictions in pairwise classification. Even though it shows good classification performance in practice, it is often criticized for lacking a sound theoretical justification. In this paper, we study the problem of combining predictions within a formal framework of label ranking and, under some model assumptions, derive a generalized voting strategy in which predictions are properly adapted according to the strengths of the corresponding base classifiers. We call this strategy adaptive voting and show that it is optimal in the sense of yielding a MAP prediction of the class label of a test instance. Moreover, we offer a theoretical justification for weighted voting by showing that it yields a good approximation of the optimal adaptive voting prediction. This result is further corroborated by empirical evidence from experiments with real and synthetic data sets showing that, even though adaptive voting is sometimes able to achieve consistent improvements, weighted voting is in general quite competitive, all the more in cases where the aforementioned model assumptions underlying adaptive voting are not met. In this sense, weighted voting appears to be a more robust aggregation strategy.
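To make the weighted-voting baseline discussed in this abstract concrete (the plain strategy, not the paper's adaptive variant), here is a minimal sketch in which each pairwise base classifier casts its predicted probability as a fractional vote; the probability values are invented for illustration.

```python
def weighted_voting(pairwise_probs, n_classes):
    """Combine pairwise predictions by weighted voting.

    pairwise_probs[(i, j)] (with i < j) is the estimated probability
    that class i beats class j on the test instance; each base
    classifier contributes its probability mass as a fractional vote.
    """
    scores = [0.0] * n_classes
    for (i, j), p in pairwise_probs.items():
        scores[i] += p        # vote for i, weighted by confidence
        scores[j] += 1.0 - p  # complementary vote for j
    return max(range(n_classes), key=scores.__getitem__)

# Hypothetical pairwise probabilities for a 3-class problem:
probs = {(0, 1): 0.4, (0, 2): 0.7, (1, 2): 0.9}
print(weighted_voting(probs, 3))  # class 1 accumulates the most votes
```

Adaptive voting, as studied in the paper, would additionally rescale these votes according to the strengths of the individual base classifiers.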
Machine Learning Techniques—Reductions Between Prediction Quality Metrics
Cited by 9 (0 self)
Machine learning involves optimizing a loss function on unlabeled data points given examples of labeled data points, where the loss function measures the performance of a learning algorithm. We give an overview of techniques, called reductions, for converting a problem of minimizing one loss function into a problem of minimizing another, simpler loss function. This tutorial discusses how to create robust reductions that perform well in practice. The reductions discussed here can be used to solve any supervised learning problem with a standard binary classification or regression algorithm available in any machine learning toolkit. We also discuss common design flaws in folklore reductions.
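One well-known reduction in this line of work is "costing", which converts importance-weighted classification into ordinary classification by rejection sampling; a rough sketch (the dataset and weights below are invented for illustration):

```python
import random

def costing_resample(examples, weights, rng=None):
    """Costing-style reduction: turn an importance-weighted sample into
    an unweighted one that any standard binary classifier can consume.
    Each example survives rejection sampling with probability w / max_w,
    so maximum-weight examples are always retained."""
    rng = rng or random.Random(0)
    w_max = max(weights)
    return [ex for ex, w in zip(examples, weights)
            if rng.random() < w / w_max]

data = list(range(10))
kept = costing_resample(data, [1.0] * 9 + [3.0])
print(9 in kept)  # the maximum-weight example is always kept
```

Training an unweighted classifier on several such resampled draws and averaging is the usual way the reduction is deployed.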
Surrogate regret bounds for bipartite ranking via strongly proper losses.
Journal of Machine Learning Research, 2014
Cited by 5 (1 self)
The problem of bipartite ranking, where instances are labeled positive or negative and the goal is to learn a scoring function that minimizes the probability of misranking a pair of positive and negative instances (or equivalently, that maximizes the area under the ROC curve), has been widely studied in recent years. A dominant theoretical and algorithmic framework for the problem has been to reduce bipartite ranking to pairwise classification; in particular, it is well known that the bipartite ranking regret can be formulated as a pairwise classification regret, which in turn can be upper bounded using usual regret bounds for classification problems. Recently, …
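The quantity at the heart of this reduction is the empirical pairwise misranking loss, i.e. the fraction of positive–negative pairs a scoring function orders incorrectly (one minus the AUC); a direct quadratic-time sketch, with made-up scores and labels:

```python
def pairwise_misranking_loss(scores, labels):
    """Fraction of positive/negative pairs ranked incorrectly, with
    ties counted as half an error.  Equals 1 - AUC on this sample."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    errs = 0.0
    for sp in pos:
        for sn in neg:
            if sp < sn:
                errs += 1.0
            elif sp == sn:
                errs += 0.5
    return errs / (len(pos) * len(neg))

# One of four positive/negative pairs is misranked here:
print(pairwise_misranking_loss([0.3, 0.4, 0.6, 0.2], [1, 0, 1, 0]))  # 0.25
```

The surrogate regret bounds in the paper control this quantity via losses that are much cheaper to minimize than the pairwise objective itself.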
A two-stage ensemble of diverse models for advertisement ranking in KDD Cup 2012
 Department of Computer Science and Information Engineering, National Taiwan University
, 2012
Cited by 4 (3 self)
This paper describes the solution of National Taiwan University …
Preference-based learning to rank
MACHINE LEARNING, 2010
Cited by 4 (1 self)
This paper presents an efficient preference-based ranking algorithm running in two stages. In the first stage, the algorithm learns a preference function defined over pairs, as in a standard binary classification problem. In the second stage, it makes use of that preference function to produce an accurate ranking, thereby reducing the learning problem of ranking to binary classification. This reduction is based on the familiar QuickSort and guarantees an expected pairwise misranking loss of at most twice that of the binary classifier derived in the first stage. Furthermore, in the important special case of bipartite ranking, the factor of two in loss is reduced to one. This improved bound also applies to the regret achieved by our ranking and that of the binary classifier obtained. Our algorithm is randomized, but we prove a lower bound for any deterministic reduction of ranking to binary classification showing that randomization is necessary to achieve our guarantees. This, and a recent result by Balcan et al., who show a regret bound of two for a deterministic algorithm in the bipartite case, suggest a tradeoff between achieving low regret and determinism in this context. Our reduction also admits an improved running time guarantee with respect to that deterministic algorithm. In particular, the number of calls to the preference function in the reduction is improved from Ω(n²) to O(n log n). In addition, when the top k ranked elements only are required (k ≪ n), as in many applications in information extraction or search engine design, the time complexity of our algorithm can be further reduced to O(k log k + n). Our algorithm is thus practical for realistic applications where the number of points to rank exceeds several thousand.
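The second-stage reduction can be pictured as randomized QuickSort driven by the learned preference function; a toy sketch, with an exact comparator standing in for the learned (and possibly non-transitive) one:

```python
import random

def quicksort_rank(items, prefer, rng=None):
    """Rank items using a pairwise preference function: prefer(a, b)
    is True when a should be ranked above b.  The random pivot choice
    is what the expected misranking-loss guarantee relies on."""
    rng = rng or random.Random(0)
    if len(items) <= 1:
        return list(items)
    rest = list(items)
    pivot = rest.pop(rng.randrange(len(rest)))
    above = [x for x in rest if prefer(x, pivot)]
    below = [x for x in rest if not prefer(x, pivot)]
    return (quicksort_rank(above, prefer, rng) + [pivot]
            + quicksort_rank(below, prefer, rng))

# With a transitive comparator this recovers a descending sort:
print(quicksort_rank([3, 1, 4, 5, 2], lambda a, b: a > b))  # [5, 4, 3, 2, 1]
```

The O(n log n) expected number of comparisons corresponds exactly to the bound on calls to the preference function stated in the abstract.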
Ranking with kernels in Fourier space
Cited by 3 (0 self)
In typical ranking problems the total number n of items to be ranked is relatively large, but each data instance involves only k ≪ n items. This paper examines the structure of such partial rankings in Fourier space. Specifically, we develop a kernel-based framework for solving ranking problems, define some canonical kernels on permutations, and show that by transforming to Fourier space, the complexity of computing the kernel between two partial rankings can be reduced from O(((n−k)!)²) to O((2k)^(2k+3)).
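To make "kernels on permutations" concrete, here is one simple choice (mine, for illustration): a Kendall-style kernel that scores two full rankings by their pairwise agreements. Computed naively it costs O(n²) per evaluation; taming such costs for partial rankings is exactly what the paper's Fourier-space treatment is about.

```python
from itertools import combinations

def kendall_kernel(sigma, tau):
    """Kendall-style kernel between two permutations given as rank
    vectors: sigma[i] is the rank of item i.  Counts concordant minus
    discordant item pairs, normalized to [-1, 1]."""
    n = len(sigma)
    total = n * (n - 1) // 2
    conc = 0
    for i, j in combinations(range(n), 2):
        agree = (sigma[i] - sigma[j]) * (tau[i] - tau[j]) > 0
        conc += 1 if agree else -1
    return conc / total

print(kendall_kernel([0, 1, 2], [0, 1, 2]))  # identical rankings -> 1.0
print(kendall_kernel([0, 1, 2], [2, 1, 0]))  # reversed rankings  -> -1.0
```

This kernel is known to be positive definite on full permutations, which is what makes it usable inside standard kernel methods.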
Bipartite Ranking through Minimization of Univariate Loss
2011
Cited by 1 (0 self)
Minimization of the rank loss or, equivalently, maximization of the AUC in bipartite ranking calls for minimizing the number of disagreements between pairs of instances. Since the complexity of this problem is inherently quadratic in the number of training examples, it is tempting to ask how much is actually lost by minimizing a simple univariate loss function, as done by standard classification methods, as a surrogate. In this paper, we first note that minimization of 0/1 loss is not an option, as it may yield an arbitrarily high rank loss. We show, however, that better results can be achieved by means of a weighted (cost-sensitive) version of 0/1 loss. Yet, the real gain is obtained through margin-based loss functions, for which we are able to derive proper bounds, not only for rank risk but, more importantly, also for rank regret. The paper is completed with an experimental study in which we address specific questions raised by our theoretical analysis.
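A cost-sensitive 0/1 loss of the kind referred to above can be sketched as follows; the class-balancing weights used here are one standard choice, assumed for illustration rather than taken from the paper:

```python
def balanced_zero_one_loss(preds, labels):
    """Cost-sensitive 0/1 loss for binary labels in {0, 1}: an error on
    a positive example costs the negative-class proportion and vice
    versa, so each class contributes equally under class imbalance."""
    n = len(labels)
    n_pos = sum(labels)
    n_neg = n - n_pos
    cost = {1: n_neg / n, 0: n_pos / n}  # misclassification cost by true label
    return sum(cost[y] for p, y in zip(preds, labels) if p != y) / n

# One error on a positive example in a balanced sample of four:
print(balanced_zero_one_loss([1, 0, 0, 0], [1, 1, 0, 0]))  # 0.125
```

The univariate structure is the point: this loss is computed in a single pass over the examples, avoiding the quadratic pair enumeration of the rank loss.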
NEW LEARNING FRAMEWORKS FOR INFORMATION RETRIEVAL
2011
Cited by 1 (0 self)
Recent advances in machine learning have enabled the training of increasingly complex information retrieval models. This dissertation proposes principled approaches to formalize the learning problems for information retrieval, with an eye towards developing a unified learning framework. This will conceptually simplify the overall development process, making it easier to reason about higher level goals and properties of the retrieval system. This dissertation advocates two complementary approaches, structured prediction and interactive learning, to learn feature-rich retrieval models that can perform well in practice.
How to Rank with Fewer Errors: A PTAS for Feedback Arc Set in Tournaments
We present the first polynomial time approximation scheme (PTAS) for the minimum feedback arc set problem in tournaments. A weighted generalization gives the first PTAS for Kemeny rank aggregation. The runtime is singly exponential in 1/ǫ, improving on the conference version of this work, which was doubly exponential.
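For intuition, the objective being approximated is the number of tournament arcs a candidate ranking violates; a brute-force sketch for a tiny instance (the PTAS is what makes larger instances tractable):

```python
from itertools import permutations

def backward_arcs(order, beats):
    """Number of tournament arcs violated by a candidate order:
    pairs (u, v) where u beats v but v precedes u in the order."""
    pos = {v: i for i, v in enumerate(order)}
    return sum(1 for (u, v) in beats if pos[u] > pos[v])

def min_feedback_order(vertices, beats):
    """Exact minimum feedback arc set ordering by exhaustive search --
    only viable for tiny tournaments."""
    return min(permutations(vertices), key=lambda o: backward_arcs(o, beats))

# A 3-cycle tournament: a beats b, b beats c, c beats a.
# Every ordering must violate exactly one arc.
beats = {("a", "b"), ("b", "c"), ("c", "a")}
best = min_feedback_order("abc", beats)
print(backward_arcs(best, beats))  # 1
```

In the weighted generalization mentioned above, each violated arc contributes its weight instead of 1, which yields the Kemeny rank aggregation objective.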