Results 1-10 of 36
Statistical analysis of Bayes optimal subset ranking
IEEE Transactions on Information Theory, 2008
Cited by 23 (0 self)
The ranking problem has become increasingly important in modern applications of statistical methods in automated decision-making systems. In particular, we consider a formulation of the statistical ranking problem which we call subset ranking, and focus on the DCG (discounted cumulated gain) criterion, which measures the quality of items near the top of the rank-list. As with error minimization for binary classification, direct optimization of natural ranking criteria such as DCG leads to nonconvex optimization problems that can be NP-hard. A computationally more tractable approach is therefore needed. We present bounds that relate the approximate optimization of DCG to the approximate minimization of certain regression errors. These bounds justify the use of convex learning formulations for solving the subset ranking problem. The resulting estimation methods are not conventional, in that we focus on the estimation quality in the top portion of the rank-list. We further investigate the asymptotic statistical behavior of these formulations. Under appropriate conditions, the consistency of the estimation schemes with respect to the DCG metric can be derived.
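To illustrate the DCG criterion named in the abstract above, here is a minimal sketch. The abstract does not state the discount, so the logarithmic discount 1 / log2(position + 1) used here is an assumption (a commonly used choice), and the relevance grades are hypothetical.

```python
import math

def dcg(relevances, k=None):
    """Discounted cumulated gain of a rank-list given in ranked order."""
    top = relevances if k is None else relevances[:k]
    # Assumed discount: 0-based position i is weighted by 1 / log2(i + 2),
    # so items near the top of the list dominate the sum.
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(top))

grades = [0, 3, 1, 2]                 # hypothetical grades, in the order a model ranked them
ideal = sorted(grades, reverse=True)  # best possible ordering of the same items
print(dcg(grades, k=3), dcg(ideal, k=3))
```

Truncating at k makes the criterion sensitive only to the top portion of the list, which is the regime the abstract emphasizes.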
How to compare different loss functions and their risks
2006
Cited by 17 (2 self)
Many learning problems are described by a risk functional which in turn is defined by a loss function, and a straightforward and widely known approach to learning such problems is to minimize a (modified) empirical version of this risk functional. However, in many cases this approach suffers from substantial problems such as computational requirements in classification or robustness concerns in regression. To resolve these issues, many successful learning algorithms instead try to minimize a (modified) empirical risk of a surrogate loss function. Of course, such a surrogate loss must be “reasonably related” to the original loss function, since otherwise this approach cannot work well. For classification, good surrogate loss functions have recently been identified, and the relationship between the excess classification risk and the excess risk of these surrogate loss functions has been exactly described. Beyond the classification problem, however, little is known so far about good surrogate loss functions. In this work we establish a general theory that provides powerful tools for comparing excess risks of different loss functions. We then apply this theory to several learning problems including (cost-sensitive) classification, regression, density estimation, and density level detection.
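As a small illustration of what a surrogate loss is, the sketch below compares the 0-1 classification loss with the hinge loss on a few margins. The abstract names no specific surrogate, so the choice of the hinge loss and the sample margins are assumptions. The sketch only shows that the convex surrogate dominates the 0-1 loss pointwise; it does not reproduce the excess-risk comparisons the paper develops.

```python
def zero_one_loss(margin):
    """0-1 loss on the margin y * f(x), y in {-1, +1}; a non-positive margin is an error."""
    return 1.0 if margin <= 0 else 0.0

def hinge_loss(margin):
    """Convex surrogate: max(0, 1 - margin)."""
    return max(0.0, 1.0 - margin)

margins = [-2.0, -0.5, 0.0, 0.3, 1.0, 2.5]  # hypothetical values of y * f(x)
for m in margins:
    print(m, zero_one_loss(m), hinge_loss(m))
```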
Relative novelty detection
Twelfth International Conference on Artificial Intelligence and Statistics, volume 5 of JMLR Workshop and Conference Proceedings, 2009
Cited by 15 (0 self)
Novelty detection is an important tool for unsupervised data analysis. It relies on finding regions of low density within which events are then flagged as novel. By design this is dependent on the underlying measure of the space. In this paper we derive a formulation which is able to address this problem by allowing for a reference measure to be given in the form of a sample from an alternative distribution. We show that this optimization problem can be solved efficiently and that it works well in practice.
On the Consistency of Ranking Algorithms
Cited by 11 (0 self)
We present a theoretical analysis of supervised ranking, providing necessary and sufficient conditions for the asymptotic consistency of algorithms based on minimizing a surrogate loss function. We show that many commonly used surrogate losses are inconsistent; surprisingly, we show inconsistency even in low-noise settings. We present a new value-regularized linear loss, establish its consistency under reasonable assumptions on noise, and show that it outperforms conventional ranking losses in a collaborative filtering experiment.

The goal in ranking is to order a set of inputs in accordance with the preferences of an individual or a population. In this paper we consider a general formulation of the supervised ranking problem in which each training example consists of a query q, a set of inputs x, sometimes called results, and a weighted graph G representing preferences over the results. The learning task is to discover a function that provides a query-specific ordering of the inputs that best respects the observed preferences. This query-indexed setting is natural for tasks like web search in which a different ranking is needed for each query. Following existing literature, we assume the existence of a scoring function f(x,q) that gives a score to each result in x; the scores are sorted to produce a ranking (Herbrich et al., 2000; Freund et al., 2003). We assume simply that the observed preference graph G is a directed acyclic graph (DAG). Finally, we cast our work in a decision-theoretic framework in which ranking procedures are evaluated via a loss function L(f(x,q),G).
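The setup described above (scores sorted into a ranking, then evaluated against a weighted preference DAG) can be sketched as follows. The weighted pairwise-disagreement loss is only one possible instance of L(f(x,q),G) and is an assumption here, as are the function names, scores, and edges.

```python
def rank_by_score(scores):
    """Return result indices ordered from highest to lowest score."""
    return sorted(range(len(scores)), key=lambda i: -scores[i])

def pairwise_disagreement(scores, preference_edges):
    """Sum the weights of edges (i, j, w), read as "i is preferred to j",
    that the scores fail to respect (result i is not scored strictly above j)."""
    return sum(w for i, j, w in preference_edges if scores[i] <= scores[j])

scores = [0.2, 1.5, 0.7]             # hypothetical values of f(x, q) for three results
edges = [(1, 0, 1.0), (2, 1, 2.0)]   # hypothetical DAG: 1 over 0 (weight 1), 2 over 1 (weight 2)
print(rank_by_score(scores))
print(pairwise_disagreement(scores, edges))
```

Because the loss depends on the scores only through their order, it is not convex in the scores, which is why surrogate losses of the kind analyzed in the paper are used in practice.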
Composite Multiclass Losses
Cited by 9 (5 self)
We consider loss functions for multiclass prediction problems. We show when a multiclass loss can be expressed as a “proper composite loss”, which is the composition of a proper loss and a link function. We extend existing results for binary losses to multiclass losses. We determine the stationarity condition, Bregman representation, order-sensitivity, existence and uniqueness of the composite representation for multiclass losses. We subsume existing results on “classification calibration” by relating it to properness and show that the simple integral representation for binary proper losses cannot be extended to multiclass losses.
VC Theory of Large Margin Multi-Category Classifiers
Cited by 7 (4 self)
In the context of discriminant analysis, Vapnik’s statistical learning theory has mainly been developed in three directions: the computation of dichotomies with binary-valued functions, the computation of dichotomies with real-valued functions, and the computation of polytomies with functions taking their values in finite sets, typically the set of categories itself. The case of classes of vector-valued functions used to compute polytomies has seldom been considered independently, which is unsatisfactory, for three main reasons. First, this case encompasses the other ones. Second, it cannot be treated appropriately through a naïve extension of the results devoted to the computation of dichotomies. Third, most of the classification problems met in practice involve multiple categories. In this paper, a VC theory of large margin multi-category classifiers is introduced. Central in this theory are generalized VC dimensions called the γ-Ψ-dimensions. First, a uniform convergence bound on the risk of the classifiers of interest is derived. The capacity measure involved in this bound is a covering number. This covering number can be upper bounded in terms of the γ-Ψ-dimensions thanks to generalizations of Sauer’s lemma, as is illustrated in the specific case of the scale-sensitive Natarajan dimension. A bound on this latter dimension is then computed for the class of functions on which multi-class SVMs are based. This makes it possible to apply the structural risk minimization inductive principle to those machines.
A framework for kernel-based multi-category classification
2005
Cited by 5 (1 self)
A geometric framework for understanding multi-category classification is introduced, through which many existing ‘all-together’ algorithms can be understood. The structure allows the derivation of a parsimonious optimisation function, which is a direct extension of the binary
Robust truncated hinge loss support vector machines
Journal of the American Statistical Association 102, 974–983 (MR2411659), 2007
Cited by 5 (1 self)
The support vector machine (SVM) has been widely applied for classification problems in both machine learning and statistics. Despite its popularity, however, SVM has some drawbacks in certain situations. In particular, the SVM classifier can be very sensitive to outliers in the training sample. Moreover, the number of support vectors (SVs) can be very large in many applications. To circumvent these drawbacks, we propose the robust truncated hinge loss SVM (RSVM), which uses a truncated hinge loss. The RSVM is shown to be more robust to outliers and to deliver more accurate classifiers using a smaller set of SVs than the standard SVM. Our theoretical results show that the RSVM is Fisher-consistent, even when there is no dominating class, a scenario that is particularly challenging for multi-category classification. Similar results are obtained for a class of margin-based classifiers.
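The abstract does not give the formula for the truncated hinge loss. The sketch below assumes one common way to write it, as a difference of two hinge functions, H_1(u) - H_s(u) with H_s(u) = max(0, s - u) and a truncation point s <= 0. The default s = -1 is a hypothetical value chosen for illustration.

```python
def hinge(u, s=1.0):
    """H_s(u) = max(0, s - u), where u is the functional margin y * f(x)."""
    return max(0.0, s - u)

def truncated_hinge(u, s=-1.0):
    """Assumed form H_1(u) - H_s(u): equals the hinge loss for u >= s
    and stays flat at 1 - s for u < s, so far-off outliers stop adding loss."""
    return hinge(u, 1.0) - hinge(u, s)

for u in [2.0, 0.5, -1.0, -5.0, -50.0]:  # hypothetical margins
    print(u, hinge(u), truncated_hinge(u))
```

The cap is what bounds the influence of a badly misclassified point. The price is that the loss is no longer convex.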
ABC-Boost: Adaptive Base Class Boost for Multi-class Classification
Cited by 3 (2 self)
We propose abc-boost (adaptive base class boost) for multi-class classification and present abc-mart, an implementation of abc-boost, based on the multinomial logit model. The key idea is that, at each boosting iteration, we adaptively and greedily choose a base class. Our experiments on public datasets demonstrate the improvement of abc-mart over the original mart algorithm.
Radius-margin bound on the leave-one-out error of multi-class SVMs
n° RR-5780, INRIA, 2005, http://www.inria.fr/rrrt/rr5780.html
Cited by 3 (2 self)
Using a support vector machine (SVM) requires setting the values of two types of hyperparameters: the soft margin parameter C and the parameters of the kernel. To perform this model selection task, the method of choice is cross-validation. Its leave-one-out variant is known to produce an estimator of the generalization error which is almost unbiased. Its major drawback lies in its time requirement. To overcome this difficulty, several upper bounds on the leave-one-out error of the pattern recognition SVM have been derived. Among those bounds, the most popular one is probably the radius-margin bound. In this report, we establish a generalized radius-margin bound dedicated to the multi-class SVM of Lee, Lin and Wahba.
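To make the time cost of leave-one-out cross-validation concrete, the sketch below retrains a model once per training example. A toy nearest-centroid classifier on 1-D data stands in for the SVM; that classifier, the function names, and the data are all assumptions made for illustration and are not taken from the report.

```python
def fit_centroids(xs, ys):
    """Toy stand-in for training: the mean of each class's 1-D inputs."""
    sums, counts = {}, {}
    for x, y in zip(xs, ys):
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}

def predict(centroids, x):
    """Assign x to the class with the nearest centroid."""
    return min(centroids, key=lambda y: abs(x - centroids[y]))

def leave_one_out_error(xs, ys):
    """Hold out each example in turn, refit on the rest, and count mistakes."""
    errors = 0
    for i in range(len(xs)):
        model = fit_centroids(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        errors += predict(model, xs[i]) != ys[i]
    return errors / len(xs)

xs = [0.0, 0.2, 0.4, 1.0, 1.2, 3.0]    # hypothetical inputs
ys = ["a", "a", "a", "b", "b", "b"]    # hypothetical labels
print(leave_one_out_error(xs, ys))
```

With n examples the loop performs n full fits, which is what motivates cheap upper bounds such as the radius-margin bound when each fit is an SVM training run.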