Results 1–10 of 28
A support vector method for multivariate performance measures
Proceedings of the 22nd International Conference on Machine Learning, 2005
Cited by 192 (5 self)
Abstract: This paper presents a Support Vector Method for optimizing multivariate nonlinear performance measures like the F1-score. Taking a multivariate prediction approach, we give an algorithm with which such multivariate SVMs can be trained in polynomial time for large classes of potentially nonlinear performance measures, in particular ROCArea and all measures that can be computed from the contingency table. The conventional classification SVM arises as a special case of our method.
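The contingency-table measures this method optimizes are computed directly from the four counts; a minimal sketch of the F1 case (the function name is ours, not the paper's):

```python
def f1_from_contingency(tp, fp, fn, tn):
    """F1 score from contingency-table counts (tn is unused by F1)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

Because F1 depends on the counts jointly rather than example-by-example, it is one of the multivariate measures that motivate the paper's formulation.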
A support vector method for optimizing average precision
In Proceedings of SIGIR’07, 2007
Cited by 113 (5 self)
Abstract: Machine learning is commonly used to improve ranked retrieval systems. Due to computational difficulties, few learning techniques have been developed to directly optimize for mean average precision (MAP), despite its widespread use in evaluating such systems. Existing approaches to optimizing MAP either do not find a globally optimal solution or are computationally expensive. In contrast, we present a general SVM learning algorithm that efficiently finds a globally optimal solution to a straightforward relaxation of MAP. We evaluate our approach using the TREC 9 and TREC 10 Web Track corpora (WT10g), comparing against SVMs optimized for accuracy and ROCArea. In most cases, our method produces statistically significant improvements in MAP scores.
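MAP is the mean of per-query average precision; for a single query, average precision can be sketched as follows (an illustrative helper, not the paper's code):

```python
def average_precision(ranked_labels):
    """Average precision for one ranked list.

    ranked_labels: relevance labels (1 = relevant, 0 = not),
    ordered by the system's ranking, best first.
    """
    hits, precisions = 0, []
    for i, rel in enumerate(ranked_labels, start=1):
        if rel:
            hits += 1
            precisions.append(hits / i)  # precision at each relevant rank
    return sum(precisions) / hits if hits else 0.0
```

MAP is then the mean of this quantity over all evaluation queries, which is what the paper's relaxation targets directly.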
Supervised Random Walks: Predicting and Recommending Links in Social Networks
Cited by 56 (0 self)
Abstract: Predicting the occurrence of links is a fundamental problem in networks. In the link prediction problem we are given a snapshot of a network and would like to infer which interactions among existing members are likely to occur in the near future, or which existing interactions we are missing. Although this problem has been extensively studied, the challenge of how to effectively combine the information from the network structure with rich node and edge attribute data remains largely open. We develop an algorithm based on Supervised Random Walks that naturally combines the information from the network structure with node- and edge-level attributes. We achieve this by using these attributes to guide a random walk on the graph. We formulate a supervised learning task where the goal is to learn a function that assigns strengths to edges in the network such that a random walker is more likely to visit the nodes to which new links will be created in the future. We develop an efficient training algorithm to directly learn the edge strength estimation function. Our experiments on the Facebook social graph and large collaboration networks show that our approach outperforms state-of-the-art unsupervised approaches as well as approaches based on feature extraction.
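The core idea — biasing a random walk with restart using learned edge strengths — can be sketched as a power iteration over a feature-parameterized transition matrix. Everything here (the logistic edge-strength function, the helper names, the restart parameter) is an illustrative assumption, not the paper's implementation:

```python
import numpy as np

def random_walk_scores(edge_features, w, source, alpha=0.15, iters=50):
    """Visit probabilities of a random walk with restart at `source`,
    where edge strengths are a function of edge features and weights w.

    edge_features: dict mapping (u, v) -> feature vector.
    Nodes with high scores are candidate future links of `source`.
    """
    nodes = sorted({u for u, _ in edge_features} | {v for _, v in edge_features})
    idx = {n: i for i, n in enumerate(nodes)}
    n = len(nodes)
    # Edge strength: logistic function of features (one common choice).
    A = np.zeros((n, n))
    for (u, v), x in edge_features.items():
        A[idx[u], idx[v]] = 1.0 / (1.0 + np.exp(-w @ x))
    row = A.sum(axis=1, keepdims=True)
    P = np.divide(A, row, out=np.zeros_like(A), where=row > 0)
    restart = np.zeros(n)
    restart[idx[source]] = 1.0
    p = restart.copy()
    for _ in range(iters):
        p = (1 - alpha) * (P.T @ p) + alpha * restart
    return dict(zip(nodes, p))
```

Training, which the abstract describes but this sketch omits, adjusts `w` so the walker's stationary distribution concentrates on nodes that later gained links.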
Cost curves: an improved method for visualizing classifier performance
Machine Learning, 2006
Cited by 44 (7 self)
Abstract: This paper introduces cost curves, a graphical technique for visualizing the performance (error rate or expected cost) of two-class classifiers over the full range of possible class distributions and misclassification costs. Cost curves are shown to be superior to ROC curves for visualizing classifier performance for most purposes. This is because they visually support several crucial types of performance assessment that cannot be done easily with ROC curves, such as showing confidence intervals on a classifier’s performance, and visualizing the statistical significance of the difference in performance of two classifiers. A software tool supporting all of the cost curve analysis described in this paper is available from the authors.
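The key construction is that each point (FPR, TPR) in ROC space becomes a straight line in cost space, giving normalized expected cost as a function of the probability-cost value pc in [0, 1]. A minimal sketch (helper name ours):

```python
def cost_line(fpr, tpr):
    """Map one ROC point to its cost-space line.

    Returns f(pc) = FNR * pc + FPR * (1 - pc): the normalized expected
    cost of the classifier at probability-cost value pc in [0, 1].
    """
    fnr = 1.0 - tpr
    return lambda pc: fnr * pc + fpr * (1.0 - pc)
```

Plotting these lines for several classifiers lets one read off, for each operating condition pc, which classifier has the lower expected cost — the comparison the paper argues ROC curves make difficult.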
Learning to rank by maximizing auc with linear programming
In IEEE International Joint Conference on Neural Networks (IJCNN 2006), 2006
Cited by 12 (2 self)
Abstract: Area Under the ROC Curve (AUC) is often used to evaluate ranking performance in binary classification problems. Several researchers have approached AUC optimization by approximating the equivalent Wilcoxon-Mann-Whitney (WMW) statistic. We present a linear programming approach, similar to 1-norm Support Vector Machines (SVMs), for instance ranking via an approximation to the WMW statistic. Our formulation can be applied to nonlinear problems by using a kernel function. Our ranking algorithm outperforms SVMs in both AUC and classification performance when using RBF kernels, but curiously not with polynomial kernels. We experiment with variations of chunking to handle the quadratic growth in the number of constraints in our formulation.
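The WMW statistic being approximated is simply the fraction of positive/negative pairs a scorer orders correctly; computed directly it costs one comparison per pair, which is where the quadratic constraint growth comes from. A minimal sketch (function name ours):

```python
def wmw_auc(scores_pos, scores_neg):
    """AUC as the Wilcoxon-Mann-Whitney statistic: the fraction of
    positive/negative score pairs ordered correctly, ties counting half."""
    correct = sum(1.0 if p > n else 0.5 if p == n else 0.0
                  for p in scores_pos for n in scores_neg)
    return correct / (len(scores_pos) * len(scores_neg))
```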
Ensemble Feature Ranking
Proceedings of ECML/PKDD’04, 2004
Cited by 12 (6 self)
Abstract: A crucial issue for Machine Learning and Data Mining is Feature Selection: selecting the relevant features in order to focus the learning search. A relaxed setting for Feature Selection is known as Feature Ranking: ranking the features with respect to their relevance. This paper proposes an ensemble approach for Feature Ranking, aggregating feature rankings extracted along independent runs of an evolutionary learning algorithm named ROGER. The convergence of ensemble feature ranking is studied from a theoretical perspective, and a statistical model is devised for the empirical validation, inspired by the complexity framework proposed in the Constraint Satisfaction domain. Comparative experiments demonstrate the robustness of the approach for learning (a limited kind of) nonlinear concepts, specifically when the features significantly outnumber the examples.
Logistic Regression for Data Mining and High-Dimensional Classification
2004
Cited by 10 (1 self)
Abstract: The focus of this thesis is fast and robust adaptations of logistic regression (LR) for data mining and high-dimensional classification problems. LR is well-understood and widely used in the statistics, machine learning, and data analysis communities. Its benefits include a firm statistical foundation and a probabilistic model useful for "explaining" the data. There is a perception that LR is slow, unstable, and unsuitable for large learning or classification tasks. Through fast approximate numerical methods, regularization to avoid numerical instability, and an efficient implementation, we show that LR can outperform modern algorithms like Support Vector Machines (SVMs) on a variety of learning tasks. Our novel implementation, which uses a modified iteratively reweighted least squares estimation procedure, can compute model parameters for sparse binary datasets with hundreds of thousands of rows and attributes, and millions or tens of millions of nonzero elements, in just a few seconds. Our implementation also handles real-valued dense datasets of similar size.
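The baseline procedure being adapted — iteratively reweighted least squares with a ridge term for the numerical stability the abstract mentions — can be sketched as follows (an illustrative dense-matrix version, not the thesis's optimized sparse implementation):

```python
import numpy as np

def logistic_irls(X, y, lam=0.1, iters=15):
    """Fit logistic regression weights by IRLS (Newton's method),
    with ridge regularization lam for numerical stability."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ w))       # current probabilities
        W = p * (1 - p)                         # per-example IRLS weights
        H = X.T @ (W[:, None] * X) + lam * np.eye(d)   # regularized Hessian
        g = X.T @ (y - p) - lam * w                    # regularized gradient
        w = w + np.linalg.solve(H, g)                  # Newton step
    return w
```

Each iteration solves one weighted least-squares system, which is why sparse linear algebra makes the method fast at the scales the thesis targets.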
A fast algorithm for learning large scale preference relations
Proceedings of the Eleventh International Conference on Artificial Intelligence and Statistics, 2007
Cited by 6 (3 self)
Abstract: We consider the problem of learning the ranking function that maximizes a generalization of the Wilcoxon-Mann-Whitney statistic on training data. Relying on an ɛ-exact approximation of the error function, we reduce the computational complexity of each iteration of a conjugate gradient algorithm for learning ranking functions from O(m²) to O(m), where m is the size of the training data. Experiments on public benchmarks for ordinal regression and collaborative filtering show that the proposed algorithm is as accurate as the best available methods in terms of ranking accuracy, when trained on the same data, and is several orders of magnitude faster.
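The naive objective motivating the speedup is a smooth pairwise surrogate of the WMW statistic, which costs O(m²) to evaluate because it touches every positive/negative pair; the paper's ɛ-exact error-function approximation brings each gradient evaluation down to O(m). A sketch of the naive form (the sigmoid surrogate and names are our illustrative choice):

```python
import numpy as np

def wmw_surrogate_loss(f_pos, f_neg, beta=2.0):
    """Smooth pairwise surrogate for 1 - WMW: a sigmoid of the score
    difference for every positive/negative pair. Direct evaluation is
    O(m^2) in the number of training points."""
    diffs = f_pos[:, None] - f_neg[None, :]   # all pairwise differences
    return np.mean(1.0 / (1.0 + np.exp(beta * diffs)))
```

A correctly ordered scorer (every positive scored well above every negative) drives this loss toward zero, mirroring a WMW statistic near one.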
Semi-Supervised Training of Models for Appearance-Based Statistical Object Detection Methods
2004
Cited by 5 (1 self)
Abstract: Appearance-based object detection systems using statistical models have proven quite successful. They can reliably detect textured, rigid objects in a variety of poses, lighting conditions, and scales. However, the construction of these systems is time-consuming and difficult because a large number of training examples must be collected and manually labeled in order to capture variations in object appearance. Typically, this requires indicating which regions of the image correspond to the object to be detected and which belong to background clutter, as well as marking key landmark locations on the object. The goal of this work is to pursue and evaluate approaches which reduce the number of fully labeled examples needed, by training these models in a semi-supervised manner. To this end, we develop approaches based on Expectation-Maximization and self-training that utilize a small number of fully labeled training examples in combination with a set of "weakly labeled" examples. This is advantageous in that weakly labeled data are inherently less costly to generate, since the label information is specified in an uncertain or incomplete fashion. For example, a weakly labeled image might be labeled as containing the training object, with the object location and scale left unspecified. In this work we analyze the performance of the techniques developed through a comprehensive empirical investigation. We find that supplementing a small fully labeled training set with weakly labeled data in the training process reliably improves detector performance for a variety of detection approaches. The outcome is the identification of successful approaches and key issues that are central to achieving good performance in the semi-supervised training of object detection systems.
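Self-training on weakly labeled data follows a standard loop: fit on the fully labeled set, pseudo-label the confident examples from the weakly labeled pool, fold them in, and refit. A generic sketch (the model interface, threshold, and names are illustrative, not the paper's detection system):

```python
def self_train(model, labeled, weak, confidence=0.9, rounds=3):
    """Generic self-training loop for a binary classifier.

    labeled: (examples, labels) with full labels.
    weak: pool of unlabeled/weakly labeled examples.
    model: object with fit(X, y) and predict_proba(x) -> P(positive).
    """
    X, y = list(labeled[0]), list(labeled[1])
    pool = list(weak)
    for _ in range(rounds):
        model.fit(X, y)
        keep = []
        for x in pool:
            p = model.predict_proba(x)
            if p >= confidence or p <= 1 - confidence:
                # Confident prediction: adopt it as a pseudo-label.
                X.append(x)
                y.append(1 if p >= confidence else 0)
            else:
                keep.append(x)   # still uncertain; try again next round
        pool = keep
    model.fit(X, y)
    return model
```

The EM-based variants the abstract mentions differ mainly in using soft (fractional) labels for the pool rather than the hard thresholded pseudo-labels shown here.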