Results 1–10 of 40
A support vector method for multivariate performance measures
 Proceedings of the 22nd International Conference on Machine Learning, 2005
Cited by 200 (5 self)
This paper presents a Support Vector Method for optimizing multivariate nonlinear performance measures like the F1-score. Taking a multivariate prediction approach, we give an algorithm with which such multivariate SVMs can be trained in polynomial time for large classes of potentially nonlinear performance measures, in particular ROCArea and all measures that can be computed from the contingency table. The conventional classification SVM arises as a special case of our method.
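As an illustrative aside (not from the paper itself), the contingency-table measures this method can optimize directly, such as the F1-score, are simple functions of the four cell counts (tp, fp, fn, tn):

```python
def f1(tp, fp, fn, tn):
    """F1-score: harmonic mean of precision and recall,
    equivalently 2*tp / (2*tp + fp + fn)."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def accuracy(tp, fp, fn, tn):
    """Fraction of correct decisions over the whole table."""
    return (tp + tn) / (tp + fp + fn + tn)

# Example table: 8 true positives, 2 false positives,
# 4 false negatives, 6 true negatives.
f = f1(8, 2, 4, 6)        # precision 0.8, recall 2/3, F1 = 8/11 ≈ 0.727
a = accuracy(8, 2, 4, 6)  # 14/20 = 0.7
```

Any measure expressible this way (F1, precision/recall at k, etc.) falls into the class the paper's algorithm handles.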
A support vector method for optimizing average precision
 In Proceedings of SIGIR’07, 2007
Cited by 120 (5 self)
Machine learning is commonly used to improve ranked retrieval systems. Due to computational difficulties, few learning techniques have been developed to directly optimize for mean average precision (MAP), despite its widespread use in evaluating such systems. Existing approaches to optimizing MAP either do not find a globally optimal solution or are computationally expensive. In contrast, we present a general SVM learning algorithm that efficiently finds a globally optimal solution to a straightforward relaxation of MAP. We evaluate our approach using the TREC 9 and TREC 10 Web Track corpora (WT10g), comparing against SVMs optimized for accuracy and ROCArea. In most cases we show that our method produces statistically significant improvements in MAP scores.
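For reference, MAP is the mean over queries of average precision; for a single ranked list, average precision can be computed as follows (a minimal sketch, not the paper's relaxation):

```python
def average_precision(relevance):
    """Average precision of one ranked list.
    relevance[i] is 1 if the document at rank i+1 is relevant, else 0."""
    hits, total = 0, 0.0
    for i, rel in enumerate(relevance):
        if rel:
            hits += 1
            total += hits / (i + 1)  # precision at this relevant rank
    return total / hits if hits else 0.0

# Relevant documents at ranks 1 and 3:
ap = average_precision([1, 0, 1, 0])  # (1/1 + 2/3) / 2 = 5/6 ≈ 0.833
```

The difficulty the paper addresses is that this quantity is a non-smooth function of the ranking, which is what makes direct optimization hard.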
Supervised Random Walks: Predicting and Recommending Links in Social Networks
Cited by 61 (0 self)
Predicting the occurrence of links is a fundamental problem in networks. In the link prediction problem we are given a snapshot of a network and would like to infer which interactions among existing members are likely to occur in the near future, or which existing interactions we are missing. Although this problem has been extensively studied, the challenge of how to effectively combine the information from the network structure with rich node and edge attribute data remains largely open. We develop an algorithm based on Supervised Random Walks that naturally combines the information from the network structure with node- and edge-level attributes. We achieve this by using these attributes to guide a random walk on the graph. We formulate a supervised learning task where the goal is to learn a function that assigns strengths to edges in the network such that a random walker is more likely to visit the nodes to which new links will be created in the future. We develop an efficient training algorithm to directly learn the edge strength estimation function. Our experiments on the Facebook social graph and large collaboration networks show that our approach outperforms state-of-the-art unsupervised approaches as well as approaches that are based on feature extraction.
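A minimal, hypothetical sketch of the forward computation described here: edge features are mapped to positive edge strengths by a logistic function of weights w, and the weighted graph drives a random walk with restart from a source node. All names are illustrative, and the paper's gradient-based learning of w is omitted; it also assumes every node has at least one outgoing edge.

```python
import math

def edge_strength(features, w):
    """Logistic map from edge features to a positive strength."""
    return 1.0 / (1.0 + math.exp(-sum(f * wi for f, wi in zip(features, w))))

def walk_scores(edges, n, w, s, alpha=0.15, iters=100):
    """Random walk with restart probability alpha to source s.
    edges: {(u, v): feature_vector}.  Returns stationary visit scores."""
    strength = {e: edge_strength(f, w) for e, f in edges.items()}
    p = [1.0 / n] * n
    for _ in range(iters):
        q = [alpha if v == s else 0.0 for v in range(n)]
        for (u, v), st in strength.items():
            out = sum(s2 for (u2, _), s2 in strength.items() if u2 == u)
            q[v] += (1 - alpha) * p[u] * st / out  # walk along weighted edge
        p = q
    return p

# Tiny 3-node graph; nodes reachable via strong edges score higher.
edges = {(0, 1): [1.0], (0, 2): [-1.0], (1, 0): [0.0], (2, 0): [0.0]}
scores = walk_scores(edges, n=3, w=[1.0], s=0)
```

Candidate nodes would then be ranked by `scores`; learning adjusts w so future link endpoints rank highly.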
Cost curves: an improved method for visualizing classifier performance
 Machine Learning, 2006
Cited by 45 (7 self)
This paper introduces cost curves, a graphical technique for visualizing the performance (error rate or expected cost) of 2-class classifiers over the full range of possible class distributions and misclassification costs. Cost curves are shown to be superior to ROC curves for visualizing classifier performance for most purposes, because they visually support several crucial types of performance assessment that cannot be done easily with ROC curves, such as showing confidence intervals on a classifier’s performance and visualizing the statistical significance of the difference in performance of two classifiers. A software tool supporting all the cost curve analysis described in this paper is available from the authors.
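For orientation (an illustrative sketch, not the authors' tool): in cost space each classifier is a straight line from (0, FPR) to (1, FNR), and its height at an operating condition PC(+) (the normalized probability-times-cost of the positive class) is its normalized expected cost:

```python
def cost_line(fpr, fnr, pc_plus):
    """Normalized expected cost of a classifier with false positive rate
    fpr and false negative rate fnr, at operating condition pc_plus.
    Linear interpolation between (0, fpr) and (1, fnr)."""
    return fnr * pc_plus + fpr * (1.0 - pc_plus)

# A classifier with FPR = 0.1, FNR = 0.3:
c0 = cost_line(0.1, 0.3, 0.0)  # only negatives matter: equals fpr
c1 = cost_line(0.1, 0.3, 1.0)  # only positives matter: equals fnr
ch = cost_line(0.1, 0.3, 0.5)  # equal priors and costs: the error rate, 0.2
```

Sweeping pc_plus over [0, 1] traces the classifier's cost line; the lower envelope over a set of classifiers shows which is best at each operating condition.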
Learning to rank by maximizing AUC with linear programming
 In IEEE International Joint Conference on Neural Networks (IJCNN), 2006
Cited by 13 (2 self)
Area Under the ROC Curve (AUC) is often used to evaluate ranking performance in binary classification problems. Several researchers have approached AUC optimization by approximating the equivalent Wilcoxon-Mann-Whitney (WMW) statistic. We present a linear programming approach, similar to 1-norm Support Vector Machines (SVMs), for instance ranking via an approximation to the WMW statistic. Our formulation can be applied to nonlinear problems by using a kernel function. Our ranking algorithm outperforms SVMs in both AUC and classification performance when using RBF kernels, but curiously not with polynomial kernels. We experiment with variations of chunking to handle the quadratic growth in the number of constraints in our formulation.
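For reference, the WMW statistic these formulations approximate is exactly the AUC: the fraction of (positive, negative) pairs the scorer orders correctly, with ties counted as half. A brute-force sketch:

```python
def wmw_auc(scores, labels):
    """AUC as the Wilcoxon-Mann-Whitney statistic over all
    (positive, negative) pairs; ties contribute 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    total = 0.0
    for p in pos:
        for n in neg:
            total += 1.0 if p > n else 0.5 if p == n else 0.0
    return total / (len(pos) * len(neg))

# 3 of the 4 positive/negative pairs are ordered correctly:
auc = wmw_auc([0.9, 0.8, 0.4, 0.2], [1, 0, 1, 0])  # 0.75
```

The quadratic number of pairs is precisely what produces the quadratically growing constraint set the abstract mentions.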
Ensemble Feature Ranking
 Proceedings of ECML/PKDD’04, 2004
Cited by 13 (6 self)
A crucial issue for Machine Learning and Data Mining is Feature Selection: selecting the relevant features in order to focus the learning search. A relaxed setting for Feature Selection, known as Feature Ranking, ranks the features with respect to their relevance. This paper proposes an ensemble approach to Feature Ranking, aggregating feature rankings extracted along independent runs of an evolutionary learning algorithm named ROGER. The convergence of ensemble feature ranking is studied from a theoretical perspective, and a statistical model is devised for the empirical validation, inspired by the complexity framework proposed in the Constraint Satisfaction domain. Comparative experiments demonstrate the robustness of the approach for learning (a limited kind of) nonlinear concepts, specifically when the features significantly outnumber the examples.
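One simple aggregation rule consistent with this idea, though not necessarily the paper's exact scheme, is to rank features by their average position across independent runs:

```python
def ensemble_rank(rankings):
    """Aggregate several feature rankings (each a list of feature
    indices, best first) into one ranking by average position."""
    n = len(rankings[0])
    avg = [0.0] * n
    for r in rankings:
        for pos, feat in enumerate(r):
            avg[feat] += pos / len(rankings)  # running mean of positions
    return sorted(range(n), key=lambda f: avg[f])

# Three hypothetical independent runs ranking features 0..2:
runs = [[0, 1, 2], [1, 0, 2], [0, 2, 1]]
combined = ensemble_rank(runs)  # feature 0 wins on average
```

Averaging over runs smooths out the variance of any single stochastic run, which is the intuition behind the convergence analysis.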
Logistic Regression for Data Mining and High-Dimensional Classification, 2004
Cited by 10 (1 self)
The focus of this thesis is fast and robust adaptations of logistic regression (LR) for data mining and high-dimensional classification problems. LR is well-understood and widely used in the statistics, machine learning, and data analysis communities. Its benefits include a firm statistical foundation and a probabilistic model useful for “explaining” the data. There is a perception that LR is slow, unstable, and unsuitable for large learning or classification tasks. Through fast approximate numerical methods, regularization to avoid numerical instability, and an efficient implementation, we show that LR can outperform modern algorithms like Support Vector Machines (SVMs) on a variety of learning tasks. Our novel implementation, which uses a modified iteratively reweighted least squares estimation procedure, can compute model parameters for sparse binary datasets with hundreds of thousands of rows and attributes, and millions or tens of millions of nonzero elements, in just a few seconds. Our implementation also handles real-valued dense datasets of similar size.
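A toy sketch of the underlying idea, assuming a one-dimensional model with ridge regularization (the thesis's actual modified IRLS is more elaborate): each IRLS step is a Newton update w ← w + grad/hess on the penalized log-likelihood, and the penalty keeps the Hessian away from zero, which is the numerical-stability point made above.

```python
import math

def irls(x, y, lam=1.0, iters=25):
    """Ridge-penalized IRLS (Newton) for a single-coefficient logistic
    model p(y=1|x) = sigmoid(w*x).  lam > 0 keeps the Hessian bounded
    away from zero, avoiding instability on separable data."""
    w = 0.0
    for _ in range(iters):
        p = [1.0 / (1.0 + math.exp(-w * xi)) for xi in x]
        grad = sum(xi * (yi - pi) for xi, yi, pi in zip(x, y, p)) - lam * w
        hess = sum(xi * xi * pi * (1 - pi) for xi, pi in zip(x, p)) + lam
        w += grad / hess  # Newton step
    return w

# Labels flip sign around x = 0, so the fitted coefficient is positive:
w = irls([-2.0, -1.0, 1.0, 2.0], [0, 0, 1, 1])
```

Without `lam`, this perfectly separable example would drive w toward infinity; with it, the iteration converges in a handful of steps.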
A fast algorithm for learning large scale preference relations
 Proceedings of the Eleventh International Conference on Artificial Intelligence and Statistics, 2007
Cited by 6 (3 self)
We consider the problem of learning the ranking function that maximizes a generalization of the Wilcoxon-Mann-Whitney statistic on training data. Relying on an ε-exact approximation of the error function, we reduce the computational complexity of each iteration of a conjugate gradient algorithm for learning ranking functions from O(m²) to O(m), where m is the size of the training data. Experiments on public benchmarks for ordinal regression and collaborative filtering show that the proposed algorithm is as accurate as the best available methods in terms of ranking accuracy, when trained on the same data, and is several orders of magnitude faster.
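To illustrate why pairwise statistics need not cost O(m²): the plain WMW statistic (i.e., AUC) can already be computed in O(m log m) by sorting instead of enumerating pairs. This is a sketch of that general idea, not the paper's ε-exact error-function method.

```python
def auc_fast(scores, labels):
    """Wilcoxon-Mann-Whitney statistic in O(m log m) via sorting
    (assumes no tied scores, for brevity).  Walking the items in
    ascending score order, each positive is correctly ranked above
    every negative already seen."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    correct, neg_seen = 0, 0
    for i in order:
        if labels[i] == 0:
            neg_seen += 1
        else:
            correct += neg_seen  # this positive beats all lower negatives
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    return correct / (n_pos * n_neg)

auc = auc_fast([0.9, 0.8, 0.4, 0.2], [1, 0, 1, 0])  # 3 of 4 pairs: 0.75
```

The paper pushes further, getting each gradient iteration down to O(m) via an analytic approximation rather than sorting.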
A maximal figure-of-merit (MFoM) learning approach to robust classifier design for text categorization
 ACM Transactions on Information Systems, 2006
Cited by 5 (1 self)
We propose a maximal figure-of-merit (MFoM) learning approach for robust classifier design, which directly optimizes performance metrics of interest for different target classifiers. The proposed approach, embedding the decision functions of classifiers and performance metrics into the overall training objective, learns the parameters of classifiers in a decision-feedback manner to effectively take into account both positive and negative training samples, and therefore reduces the required size of the positive training data. It has three desirable properties: (a) it is performance-metric-oriented learning; (b) the optimized metric is consistent across both training and evaluation sets; and (c) it is more robust and less sensitive to data variation, and can handle scenarios with insufficient training data. We evaluate it on the text categorization task using the Reuters-21578 dataset. Training an F1-based binary tree classifier using MFoM, we observed significantly improved performance and enhanced robustness compared to the baseline and SVM, especially on categories with insufficient training samples. The generality of designing other metric-based classifiers is also demonstrated by comparing precision-, recall-, and F1-based classifiers. The results clearly show the consistency in performance.
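A hypothetical sketch of the general idea of embedding a metric into a differentiable training objective: replacing hard decisions with a sigmoid of the discriminant yields smooth TP/FP/FN counts and hence a smooth F1 surrogate that gradient methods can optimize. The names and the exact smoothing are illustrative, not the paper's MFoM formulation.

```python
import math

def sigmoid(z, beta=5.0):
    """Smooth 0/1 decision; beta controls the sharpness."""
    return 1.0 / (1.0 + math.exp(-beta * z))

def smooth_f1(discriminants, labels, beta=5.0):
    """Differentiable F1 surrogate: soft counts replace hard TP/FP/FN."""
    tp = sum(sigmoid(d, beta) for d, y in zip(discriminants, labels) if y == 1)
    fp = sum(sigmoid(d, beta) for d, y in zip(discriminants, labels) if y == 0)
    fn = sum(1 - sigmoid(d, beta) for d, y in zip(discriminants, labels) if y == 1)
    return 2 * tp / (2 * tp + fp + fn)

# Confident, correct discriminants give a smoothed F1 near 1:
good = smooth_f1([3.0, 2.5, -3.0], [1, 1, 0])
```

Because the surrogate is smooth in the classifier parameters (through the discriminants), the metric being optimized during training matches the metric reported at evaluation, which is property (b) above.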