Results 1  10
of
39
A support vector method for multivariate performance measures
 Proceedings of the 22nd International Conference on Machine Learning
, 2005
"... This paper presents a Support Vector Method for optimizing multivariate nonlinear performance measures like the F1score. Taking a multivariate prediction approach, we give an algorithm with which such multivariate SVMs can be trained in polynomial time for large classes of potentially nonlinear per ..."
Abstract

Cited by 208 (5 self)
 Add to MetaCart
This paper presents a Support Vector Method for optimizing multivariate nonlinear performance measures like the F1score. Taking a multivariate prediction approach, we give an algorithm with which such multivariate SVMs can be trained in polynomial time for large classes of potentially nonlinear performance measures, in particular ROCArea and all measures that can be computed from the contingency table. The conventional classification SVM arises as a special case of our method. 1.
A support vector method for optimizing average precision
 In SIGIR ’07
, 2007
"... Machine learning is commonly used to improve ranked retrieval systems. Due to computational difficulties, few learning techniques have been developed to directly optimize for mean average precision (MAP), despite its widespread use in evaluating such systems. Existing approaches optimizing MAP ei ..."
Abstract

Cited by 127 (5 self)
 Add to MetaCart
Machine learning is commonly used to improve ranked retrieval systems. Due to computational difficulties, few learning techniques have been developed to directly optimize for mean average precision (MAP), despite its widespread use in evaluating such systems. Existing approaches optimizing MAP either do not find a globally optimal solution, or are computationally expensive. In contrast, we present a general SVM learning algorithm that efficiently finds a globally optimal solution to a straightforward relaxation of MAP. We evaluate our approach using the TREC 9 and TREC 10 Web Track corpora (WT10g), comparing against SVMs optimized for accuracy and ROCArea. In most cases we show our method to produce statistically significant improvements in MAP scores.
Supervised Random Walks: Predicting and Recommending Links in Social Networks
"... Predicting the occurrence of links is a fundamental problem in networks. In the link prediction problem we are given a snapshot of a network and would like to infer which interactions among existing members are likely to occur in the near future or which existing interactions are we missing. Althoug ..."
Abstract

Cited by 62 (0 self)
 Add to MetaCart
Predicting the occurrence of links is a fundamental problem in networks. In the link prediction problem we are given a snapshot of a network and would like to infer which interactions among existing members are likely to occur in the near future or which existing interactions are we missing. Although this problem has been extensively studied, the challenge of how to effectively combine the information from the network structure with rich node and edge attribute data remains largely open. We develop an algorithm based on Supervised Random Walks that naturally combines the information from the network structure with node and edge level attributes. We achieve this by using these attributes to guide a random walk on the graph. We formulate a supervised learning task where the goal is to learn a function that assigns strengths to edges in the network such that a random walker is more likely to visit the nodes to which new links will be created in the future. We develop an efficient training algorithm to directly learn the edge strength estimation function. Our experiments on the Facebook social graph and large collaboration networks show that our approach outperforms stateoftheart unsupervised approaches as well as approaches that are based on feature extraction.
Cost curves: an improved method for visualizing classifier performance
 Machine Learning
, 2006
"... Abstract. This paper introduces cost curves, a graphical technique for visualizing the performance (error rate or expected cost) of 2class classifiers over the full range of possible class distributions and misclassification costs. Cost curves are shown to be superior to ROC curves for visualizing ..."
Abstract

Cited by 49 (7 self)
 Add to MetaCart
Abstract. This paper introduces cost curves, a graphical technique for visualizing the performance (error rate or expected cost) of 2class classifiers over the full range of possible class distributions and misclassification costs. Cost curves are shown to be superior to ROC curves for visualizing classifier performance for most purposes. This is because they visually support several crucial types of performance assessment that cannot be done easily with ROC curves, such as showing confidence intervals on a classifier’s performance, and visualizing the statistical significance of the difference in performance of two classifiers. A software tool supporting all the cost curve analysis described in this paper is available from the authors.
The PNorm Push: A Simple Convex Ranking Algorithm that Concentrates at the Top of the List
, 2009
"... We are interested in supervised ranking algorithms that perform especially well near the top of the ranked list, and are only required to perform sufficiently well on the rest of the list. In this work, we provide a general form of convex objective that gives highscoring examples more importance. T ..."
Abstract

Cited by 25 (14 self)
 Add to MetaCart
We are interested in supervised ranking algorithms that perform especially well near the top of the ranked list, and are only required to perform sufficiently well on the rest of the list. In this work, we provide a general form of convex objective that gives highscoring examples more importance. This “push ” near the top of the list can be chosen arbitrarily large or small, based on the preference of the user. We choose ℓpnorms to provide a specific type of push; if the user sets p larger, the objective concentrates harder on the top of the list. We derive a generalization bound based on the pnorm objective, working around the natural asymmetry of the problem. We then derive a boostingstyle algorithm for the problem of ranking with a push at the top. The usefulness of the algorithm is illustrated through experiments on repository data. We prove that the minimizer of the algorithm’s objective is unique in a specific sense. Furthermore, we illustrate how our objective is related to quality measurements for information retrieval.
Learning to rank by maximizing auc with linear programming
 IN IEEE INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN 2006
, 2006
"... Area Under the ROC Curve (AUC) is often used to evaluate ranking performance in binary classification problems. Several researchers have approached AUC optimization by approximating the equivalent WicoxonMannWhitney (WMW) statistic. We present a linear programming approach similar to 1norm Suppo ..."
Abstract

Cited by 14 (3 self)
 Add to MetaCart
Area Under the ROC Curve (AUC) is often used to evaluate ranking performance in binary classification problems. Several researchers have approached AUC optimization by approximating the equivalent WicoxonMannWhitney (WMW) statistic. We present a linear programming approach similar to 1norm Support Vector Machines (SVMs) for instance ranking by an approximation to the WMW statistic. Our formulation can be applied to nonlinear problems by using a kernel function. Our ranking algorithm outperforms SVMs in both AUC and classification performance when using RBF kernels, but curiously not with polynomial kernels. We experiment with variations of chunking to handle the quadratic growth of the number of constraints in our formulation.
Ensemble Feature Ranking
 Proceedings of ECMLPKDD’04
, 2004
"... Abstract. A crucial issue for Machine Learning and Data Mining is Feature Selection, selecting the relevant features in order to focus the learning search. A relaxed setting for Feature Selection is known as Feature Ranking, ranking the features with respect to their relevance. This paper proposes a ..."
Abstract

Cited by 13 (6 self)
 Add to MetaCart
Abstract. A crucial issue for Machine Learning and Data Mining is Feature Selection, selecting the relevant features in order to focus the learning search. A relaxed setting for Feature Selection is known as Feature Ranking, ranking the features with respect to their relevance. This paper proposes an ensemble approach for Feature Ranking, aggregating feature rankings extracted along independent runs of an evolutionary learning algorithm named ROGER. The convergence of ensemble feature ranking is studied in a theoretical perspective, and a statistical model is devised for the empirical validation, inspired from the complexity framework proposed in the Constraint Satisfaction domain. Comparative experiments demonstrate the robustness of the approach for learning (a limited kind of) nonlinear concepts, specifically when the features significantly outnumber the examples. 1
Logistic Regression for Data Mining and HighDimensional Classification
, 2004
"... The focus of this thesis is fast and robust adaptations of logistic regression (LR) for data mining and highdimensional classification problems. LR is wellunderstood and widely used in the statistics, machine learning, and data analysis communities. Its benefits include a firm statistical foundati ..."
Abstract

Cited by 12 (1 self)
 Add to MetaCart
The focus of this thesis is fast and robust adaptations of logistic regression (LR) for data mining and highdimensional classification problems. LR is wellunderstood and widely used in the statistics, machine learning, and data analysis communities. Its benefits include a firm statistical foundation and a probabilistic model useful for ``explaining'' the data. There is a perception that LR is slow, unstable, and unsuitable for large learning or classification tasks. Through fast approximate numerical methods, regularization to avoid numerical instability, and an efficient implementation we will show that LR can outperform modern algorithms like Support Vector Machines (SVM) on a variety of learning tasks. Our novel implementation, which uses a modified iteratively reweighted least squares estimation procedure, can compute model parameters for sparse binary datasets with hundreds of thousands of rows and attributes, and millions or tens of millions of nonzero elements in just a few seconds. Our implementation also handles realvalued dense datasets of similar size.
A fast algorithm for learning large scale preference relations
 Proceedings of the Eleventh International Conference on Artificial Intelligence and Statistics
, 2007
"... We consider the problem of learning the ranking function that maximizes a generalization of the WilcoxonMannWhitney statistic on training data. Relying on an ɛexact approximation for the errorfunction, we reduce the computational complexity of each iteration of a conjugate gradient algorithm for ..."
Abstract

Cited by 6 (3 self)
 Add to MetaCart
We consider the problem of learning the ranking function that maximizes a generalization of the WilcoxonMannWhitney statistic on training data. Relying on an ɛexact approximation for the errorfunction, we reduce the computational complexity of each iteration of a conjugate gradient algorithm for learning ranking functions from O(m 2), to O(m), where m is the size of the training data. Experiments on public benchmarks for ordinal regression and collaborative filtering show that the proposed algorithm is as accurate as the best available methods in terms of ranking accuracy, when trained on the same data, and is several orders of magnitude faster. 1