Results 1 - 4 of 4
Learning to rank using gradient descent
In ICML, 2005
Cited by 346 (16 self)
Abstract:
We investigate using gradient descent methods for learning ranking functions; we propose a simple probabilistic cost function, and we introduce RankNet, an implementation of these ideas using a neural network to model the underlying ranking function. We present test results on toy data and on data from a commercial internet search engine.
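The probabilistic cost the abstract mentions is, in the RankNet formulation, a cross-entropy on the logistic of a score difference. A minimal sketch (function and variable names here are illustrative, not taken from the paper):

```python
import math

def ranknet_cost(s_i, s_j, p_target):
    """Pairwise cross-entropy cost in the style of RankNet.

    s_i, s_j: model scores for items i and j.
    p_target: target probability that i should be ranked above j.
    """
    # Modelled probability that i ranks above j: logistic of the score difference.
    p_model = 1.0 / (1.0 + math.exp(-(s_i - s_j)))
    # Cross-entropy between the target and modelled pairwise probabilities.
    return -p_target * math.log(p_model) - (1.0 - p_target) * math.log(1.0 - p_model)
```

With equal scores and an uncertain target (p_target = 0.5) the cost is log 2; when the score difference agrees with the target ordering the cost shrinks, so gradient descent on the scores pushes pairs toward the desired order.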
Probability Estimates for Multiclass Classification by Pairwise Coupling
Journal of Machine Learning Research, 2003
Cited by 187 (1 self)
Abstract:
Pairwise coupling is a popular multiclass classification method that combines the comparisons from every pair of classes. This paper presents two approaches for obtaining class probabilities. Both methods can be reduced to linear systems and are easy to implement.
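The reduction to a linear system can be sketched as a constrained least-squares coupling: given pairwise estimates r[i][j] ≈ P(class i | class i or j), find class probabilities p minimising Σ (r_ji·p_i − r_ij·p_j)² subject to Σ p_i = 1. Whether this matches the paper's exact objective is an assumption of this sketch; the KKT system below solves that particular least-squares problem.

```python
import numpy as np

def couple_pairwise(r):
    """Couple pairwise probability estimates into class probabilities.

    r: (k, k) array with r[i, j] + r[j, i] = 1 for i != j (diagonal unused).
    Minimises sum_{i<j} (r[j,i]*p[i] - r[i,j]*p[j])**2 subject to sum(p) = 1.
    """
    k = r.shape[0]
    q = np.zeros((k, k))
    for i in range(k):
        for j in range(k):
            if i == j:
                # Diagonal: sum of squared "column" estimates for class i.
                q[i, i] = sum(r[s, i] ** 2 for s in range(k) if s != i)
            else:
                q[i, j] = -r[i, j] * r[j, i]
    # KKT system: [Q 1; 1^T 0] [p; b] = [0; 1] enforces the sum-to-one constraint
    # via a Lagrange multiplier b.
    kkt = np.zeros((k + 1, k + 1))
    kkt[:k, :k] = q
    kkt[:k, k] = 1.0
    kkt[k, :k] = 1.0
    rhs = np.zeros(k + 1)
    rhs[k] = 1.0
    sol = np.linalg.solve(kkt, rhs)
    return sol[:k]
```

For two classes with r[0,1] = 0.7 the coupled estimate reduces to p = (0.7, 0.3), as one would expect; the point of the construction is the k > 2 case, where the pairwise estimates are generally inconsistent and the linear system finds the best compromise.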
Pairwise Neural Network Classifiers with Probabilistic Outputs
In Advances in Neural Information Processing Systems 7, 1994
Cited by 28 (0 self)
Abstract:
Multiclass classification problems can be efficiently solved by partitioning the original problem into subproblems involving only two classes: for each pair of classes, a (potentially small) neural network is trained using only the data of these two classes. We show how to combine the outputs of the two-class neural networks in order to obtain posterior probabilities for the class decisions. The resulting probabilistic pairwise classifier is part of a handwriting recognition system which is currently applied to check reading. We present results on real-world databases and show that, from a practical point of view, these results compare favorably to other neural network approaches.

1 Introduction
Generally, a pattern classifier consists of two main parts: a feature extractor and a classification algorithm. Both parts have the same ultimate goal, namely to transform a given input pattern into a representation that is easily interpretable as a class decision. In the case of feedforward...
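One classical rule for turning pairwise posteriors p_ij ≈ P(class i | x, class ∈ {i, j}) into multiclass posteriors sets P_i ∝ 1 / (Σ_{j≠i} 1/p_ij − (K − 2)); this rule is exact when the pairwise estimates are consistent with some underlying distribution. Whether it is the precise combination rule of this paper is an assumption of the sketch below.

```python
def combine_pairwise_posteriors(p, eps=1e-12):
    """Combine pairwise posteriors into normalised multiclass posteriors.

    p: k x k nested lists with p[i][j] ~ P(class i | x, class in {i, j}).
    Uses the rule P_i proportional to 1 / (sum_{j != i} 1/p[i][j] - (k - 2)).
    """
    k = len(p)
    scores = []
    for i in range(k):
        # eps guards against division by a zero pairwise estimate.
        denom = sum(1.0 / max(p[i][j], eps) for j in range(k) if j != i) - (k - 2)
        scores.append(1.0 / max(denom, eps))
    total = sum(scores)
    return [s / total for s in scores]
```

If the pairwise estimates are derived from true class probabilities q via p[i][j] = q_i / (q_i + q_j), the rule recovers q exactly, which is a convenient sanity check for an implementation.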
Kernel Methods and Their Application to Structured Data
Thesis (adviser: Roni Khardon), 2009
Abstract:
Supervised machine learning is concerned with the study of algorithms that take examples and their corresponding labels, and learn a general classification function that can predict the label of future examples. For example, an algorithm may take as input a set of molecules, each labeled “toxic” or “nontoxic”, and try to predict the toxicity of new molecules based on the function learned from the input. In the astronomy domain, one might try to predict the type of a star given a series of measurements of the star’s brightness, based on a set of known stars and measurements of their brightness. The thesis investigates three aspects of machine learning algorithms that use linear classification functions that work implicitly in feature spaces by using similarity functions known as kernels. The first aspect is robustness to noise, that is, learning when some of the labels in the known examples are not reliable. An extensive experimental evaluation reveals a surprising result: the Perceptron Algorithm with margin is an excellent algorithm in such contexts, and it is competitive with or better than more sophisticated alternatives. The second aspect is producing estimates of the confidence of predictions from such classifiers, especially...
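The Perceptron Algorithm with margin, run implicitly in a kernel-induced feature space, can be sketched in dual form: the classifier is a kernel expansion over the training points, and an update fires not only on outright mistakes but whenever the score fails to clear a margin threshold. All names and the fixed-margin variant below are illustrative assumptions, not the thesis's exact algorithm.

```python
def margin_perceptron(xs, ys, kernel, margin=1.0, epochs=10):
    """Dual-form kernel perceptron with a margin.

    xs: training inputs; ys: labels in {-1, +1}; kernel: k(x, x') -> float.
    Returns per-example dual coefficients alpha.
    """
    n = len(xs)
    alpha = [0.0] * n
    for _ in range(epochs):
        for i in range(n):
            # Kernel expansion: score(x) = sum_j alpha_j * y_j * k(x_j, x).
            score = sum(alpha[j] * ys[j] * kernel(xs[j], xs[i]) for j in range(n))
            if ys[i] * score <= margin:  # margin violation, not just a mistake
                alpha[i] += 1.0
    return alpha

def predict(alpha, xs, ys, kernel, x):
    """Sign of the kernel expansion at a new point x."""
    score = sum(a * y * kernel(xj, x) for a, y, xj in zip(alpha, ys, xs))
    return 1 if score > 0 else -1
```

Requiring ys[i] * score > margin before skipping an update is what distinguishes this from the plain perceptron; the extra updates tend to produce a larger-margin separator, which is one plausible source of the noise robustness the abstract reports.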